manticore, formerly sphinx search, has been rock solid for us for the past 16 years, now serving searches across nearly 300M short documents. we're using it in the old mode, where the full index is re-created every 24h.
it's great to see that the project is alive and adding embeddings-related functions needed for semantic search.
aspect worth noting: to my knowledge, HE's tunnel will work only if your ISP assigns you a public IPv4 address. if you're behind carrier-grade NAT - too bad, you'll need another solution to get IPv6 to your home.
Go Fiber (Shentel) is one such ISP, and they will gladly switch you to a public IP for no cost if you contact their support. Sadly they don’t support IPv6 yet.
environment: KVM VMs running on physical hardware managed by us.
we have a belt & suspenders approach:
* backups of selected files / database dumps [ via dedicated tools like mysqldump or pg_dumpall ] from within VMs
* backups of whole VMs
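for the file-level dumps, a minimal sketch of what such an in-VM job could look like - paths and the backup directory are hypothetical, not taken from our actual setup:

```shell
#!/bin/sh
# hypothetical nightly dump job run inside a VM
set -eu
STAMP=$(date +%F)
# MySQL: --single-transaction gives a consistent InnoDB dump without locking tables
mysqldump --single-transaction --all-databases | gzip > "/backup/mysql-$STAMP.sql.gz"
# PostgreSQL: pg_dumpall also captures globals (roles, tablespaces), not just databases
su - postgres -c pg_dumpall | gzip > "/backup/pg-$STAMP.sql.gz"
```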
backups of whole VMs are done by creating snapshot files via virsh snapshot-create-as, rsync, followed by virsh blockcommit. this provides crash-consistent images of the whole virtual disks. we zero-fill virtual disks before each backup.
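a sketch of one such backup cycle - VM name, disk target and paths are hypothetical examples:

```shell
#!/bin/sh
# sketch of the snapshot -> rsync -> blockcommit cycle described above
set -eu
VM=myvm
DISK=vda
# 1. external, disk-only snapshot: new writes go to an overlay file,
#    leaving the base qcow2 image quiescent for copying
virsh snapshot-create-as "$VM" backup --disk-only --atomic --no-metadata
# 2. copy the now effectively read-only base image off the host
#    (--sparse keeps zero-filled regions small on the target)
rsync -a --sparse "/var/lib/libvirt/images/$VM.qcow2" /backup/vms/
# 3. merge the overlay back into the base image and pivot the VM onto it
virsh blockcommit "$VM" "$DISK" --active --pivot
```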
all types of backups later go to a borg backup[1] deduplicated, compressed repository [ kopia[2] would be fine as well ].
this approach is wasteful in terms of bandwidth and backup size, but gives us peace of mind - even if we forget to take a file-level backup of some folder, we'll still have it in the VM-level backups.
Allegro has amazing metadata allowing you to precisely filter the results. Search experience on amazon is an utter abomination compared to Allegro.
We're using BTRFS to host PostgreSQL and MySQL replication slaves. We're snapshotting the drives holding data for both every 15 minutes, 1h, 8h and 12h, and keep a few snapshots for each frequency.
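A minimal sketch of one snapshot tick for a single frequency - the subvolume paths and retention count below are hypothetical, not our actual layout:

```shell
#!/bin/sh
# sketch: take one read-only snapshot and prune old ones for the 15-minute tier
set -eu
SRC=/srv/pgdata            # btrfs subvolume holding the replica's data directory
DST=/srv/.snapshots/15min  # where snapshots for this frequency live
KEEP=8                     # how many snapshots of this tier to retain
# read-only snapshot named by timestamp; cheap and instant thanks to COW
btrfs subvolume snapshot -r "$SRC" "$DST/$(date +%Y%m%d-%H%M)"
# drop the oldest snapshots beyond the retention count
ls -1 "$DST" | head -n -"$KEEP" | while read -r s; do
    btrfs subvolume delete "$DST/$s"
done
```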
Those replicas are not used for any workload, besides nightly consistency checks for MySQL via pt-table-checksum to ensure we don't have data drift.
Snapshots are crash consistent. Once in a while they give us the ability to very quickly inspect what data looked like a few minutes or hours ago. This can be a life-saver in case of fat-fingering production data, and has saved us from lengthy grepping of backups when we needed to recover a few records from a specific table.
Yes, I know soft deletes, audit logs - all of those could help and we do have them, but sometimes that's not enough or not feasible.
Due to its COW nature, BTRFS is far from perfect for data that changes all the time [ databases busy with writes, images of VMs with plenty of disk write activity ]. There's plenty of write amplification, but that can be solved by throwing NVMe drives at the problem.
How do you avoid heavy fragmentation caused by random writes? Do you disable COW (sounds like "no", given you snapshot)? Or autodefrag (how's performance)?
Also - because of that, FIDO2 does not seem to be usable with Microsoft's MS365 services [ Teams, Outlook, Excel etc ] on Android or iOS. there's no way to provide a PIN for the security key, regardless of whether it's plugged in via the USB port or used via NFC.
coincidentally - we've been using first sphinx and then manticore for over 15 years as well. in our case it's fed each night with XML generated by Java code from data stored in MySQL databases. we index over 294M pseudo-documents.
borg is great. we've been using it for the past 3 years to archive hundreds of file-level backups of servers, database dumps and VM images. the average size of each borg repo is a few GB, but there are a few outliers up to a few hundred GB. most backups are done daily, with 7-24 past days preserved in the borg archive. borg repos are verified, copied to external disks, verified again and rotated offline each week.
borg replaced https://rdiff-backup.net/ for us and gave:
* nice speedup of backups/backup tests,
* decent saving in the disk space thanks to compression and deduplication,
* decreased backup replication time [ a borg repo tends to have far fewer, larger files compared to rdiff, which has in its repo at least as many files as your source data; rsync likes that ].
to finish backups in a reasonable time we had to parallelize backup gathering [ each server / vm goes to a separate borg repo; this limits the failure domain in case of a corrupted repo, but denies us the benefit of deduplication at larger scale - across servers ] and borg archiving. without that - we would be limited by single-core CPU performance [ borg is not multithreaded yet ].
it's worth testing the backups - we're doing it each day by using borg's repo self test and by extracting a few key files and checking their checksums and content... just in case.
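a sketch of such a daily create-and-verify pass - repo path, archive name and the file being spot-checked are hypothetical examples, not our actual config:

```shell
#!/bin/sh
# sketch: daily borg archive plus the two verification steps mentioned above
set -eu
export BORG_REPO=/backup/borg/server1
ARCHIVE="server1-$(date +%F)"
borg create --compression zstd "::$ARCHIVE" /etc /var/www
# repo self test: --verify-data also re-checksums the stored chunks
borg check --verify-data
# restore test: extract a known file into a scratch dir and compare its content
cd "$(mktemp -d)"
borg extract "::$ARCHIVE" etc/hostname
cmp etc/hostname /etc/hostname
```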
echoing other comments - https://kopia.io/ looks interesting but we have not tried it yet.
as far as i understand, the apparent death of sphinx and the demand for continued development/support from its big users led to the creation of manticore.