Important ASA (Admins Service Announcement) for Mastodon admins and PostgreSQL admins in general: due to changes in glibc, some distribution upgrades will corrupt PostgreSQL text indexes, potentially leading to unique indexes not being correctly enforced and to inconsistent application data.

THIS WILL ALSO HAPPEN IF YOU DO NOT UPDATE YOUR POSTGRESQL, A DISTRO UPDATE IS ENOUGH!

wiki.postgresql.org/wiki/Local

Affected upgrades are Ubuntu 18.04 or lower to 18.10 or higher (so especially the current LTS upgrade, 18.04 to 20.04), Debian 9 or lower to Debian 10, and RHEL/CentOS 7 or lower to RHEL/CentOS 8.

Not heeding the linked advice will at some point lead to strange search and ordering results from your db and will probably cause duplicate entries despite unique indexes. Fixing this afterwards involves rebuilding the indexes and somehow resolving every duplicate key error you encounter.
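
To see why a changed sort order breaks unique indexes, here is a minimal sketch in Python. It is a toy model, not PostgreSQL's actual B-tree code: `old_key` and `new_key` are made-up stand-ins for the collation rules before and after the glibc change (loosely modelled on how strings containing hyphens moved around), and `contains` is a plain binary search standing in for an index lookup.

```python
def old_key(s):
    # Hypothetical "old" collation: hyphens are ignored at first
    # and only break ties (shorter string first).
    return (s.replace("-", ""), len(s))


def new_key(s):
    # Hypothetical "new" collation: plain code point order,
    # so "-" sorts before digits.
    return s


def contains(index, value, key):
    """Binary search over a sorted list, standing in for an index lookup."""
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(index[mid]) < key(value):
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and index[lo] == value


# The "index" was built (sorted) under the old rules.
index = sorted(["1-1", "11", "1-2", "12", "2"], key=old_key)

found_before = contains(index, "1-2", old_key)  # lookup with the old rules
found_after = contains(index, "1-2", new_key)   # same data, new rules
print(index)
print(found_before, found_after)
```

The value is still in the list, but the lookup under the new rules walks to the wrong spot and reports it missing, so a uniqueness check would let a second copy in. Re-sorting under the new rules (the equivalent of a reindex) repairs lookups, but not any duplicates that were already inserted.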

We did not know about this beforehand as it is not noted in any docs we could see regarding distro or pg upgrades. We noticed strange things approx. 6 hours after the distro+pg upgrade with pg_upgrade and had ~10 unique index violations in our relatively small Mastodon instance, ~10 across 2 smallish HackMD instances, and ~30-40 across 2 rather lively Matrix Synapse instances. It took us around 3h to get everything sorted, and we can only _hope_ everything is good now. Please read the linked wiki.

@thegcat Thing that I say all the fucking time: please set your system locale to C or C.UTF-8 (and your timezone to UTC).
en_US.UTF-8 or similar is for humans only.
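
A rough illustration of why C is stable, sketched in Python rather than with the real libc: comparing under the C collation boils down to comparing the raw bytes, so there is no locale data that a glibc update could change. `c_order` is a name made up for this example.

```python
def c_order(strings):
    # Sort by the UTF-8 bytes of each string, ignoring any locale rules.
    return sorted(strings, key=lambda s: s.encode("utf-8"))


names = ["b", "B", "a", "é", "1-1", "11"]
print(c_order(names))
```

The result is the same on every machine, which is the point, but it is also why uppercase sorts before lowercase and accented letters land at the end: fine for indexes, not so nice for humans.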

@thegcat Should maybe tag this with #mastoadmin now that people might be tempted to move to the new Ubuntu LTS.

(glibc upgrade affects Postgres indices, see thread down from original post)

@galaxis Didn't know that that tag was "standard", thanks :-)

@thegcat Yeah, it's not really a standard unfortunately - some use "mastoadmins" too. (And I've also seen "fediadmins" - I think Pleroma uses a Postgres backend too? So other Fediverse software will likely be affected as well, btw.)

There's also a (mostly unused) mailing list and a forum at discourse.joinmastodon.org where instance admins are around.

@mmu_man
Thanks. G3L is not affected. The servers running PostgreSQL are still on Debian 9.
@thegcat @vincib @jcasseron

@jpfox @mmu_man @vincib @jcasseron Something to watch out for when you move to Debian 10, but I read that the Debian 10 release notes point out this issue and the steps to follow for the upgrade.

@thegcat
Yes, yes, duly noted 😉
But on a hosted production server, a version upgrade often means switching servers... so it's dump+import, and apparently there's no problem with that.
@mmu_man @vincib @jcasseron

@jpfox @mmu_man @vincib @jcasseron dump+import doesn't have this problem, since it doesn't "carry over" the physical files with the indexes, indeed. So all is well 🙂

@thegcat I wish this #MastoAdmin tip was more widely known 2 months ago because I started having weirdness with my instance when I upgraded to Debian 10 and battled with masto weirdness throughout March.

If one or more of your mastodon docker containers starts randomly flapping in the breeze (restarting) and you did an OS update look at your database indexes. My instance and a few others have already dealt with this.

Also postgresql in a docker container seems to be a bad idea.

@msh Note that the Debian 10 release notes did include this piece of information, so kudos to them for that. And from what I understood the pg Docker containers only did dump and restore pg upgrades, which would be unaffected?

@thegcat I will have to read the release notes more carefully in the future!

It should be noted that when I started having problems I was using a docker container for the db and it was upgraded on a different day than the host OS, and I'm not sure if there are issues with clashing host/container glibc versions as well.

If you have issues with random broken emojis, @ mentions, certain public profiles going 404, masto containers flapping etc. then your instance has probably hit this gotcha.

@msh @thegcat

"Also postgresql in a docker container seems to be a bad idea."

Can you expand on this a little? Postgres in the container would come along with glibc in the container, so why is this an issue, instead of a solution?

@atrus @thegcat it is an issue if you have an LTS host but upgrade the container without reindexing or doing a dump and reload of the database in the attached volume.

Let's say you upgrade the db container during a docker-compose rebuild, and it is still a v11 database but the base image has updated glibc. The container could theoretically just reattach the data volume and then your indexes will start corrupting if you don't manually reindex ASAP, just as if you upgraded OS on a base image...
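
One way an image author could catch this, sketched in Python under loud assumptions: the entrypoint records the glibc version that last touched the data volume and refuses to carry on quietly when it differs. `collation_may_have_changed`, `parse_version`, and the idea of a recorded version are made up for this sketch; `platform.libc_ver()` is a real standard-library call, but it reports the libc of the Python process (so it would have to run in the same image as the database) and may return empty strings where it cannot detect glibc.

```python
import platform


def parse_version(text):
    # "2.28" -> (2, 28); returns None if the text is not a dotted number.
    try:
        return tuple(int(part) for part in text.split("."))
    except ValueError:
        return None


def collation_may_have_changed(recorded, current):
    # Be cautious: an unknown version on either side counts as "changed".
    old, new = parse_version(recorded), parse_version(current)
    if old is None or new is None:
        return True
    return old != new


# Version of the C library this Python runs against (may be empty).
current_libc = platform.libc_ver()[1]
# Hypothetical value that would have been saved next to the data volume.
recorded_libc = "2.24"

if collation_may_have_changed(recorded_libc, current_libc):
    print("glibc differs from the one that built the indexes: reindex before serving traffic")
```

This is deliberately conservative: not every glibc release changes sort order, but treating any version change as suspect is cheaper than chasing duplicates afterwards.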

@msh @thegcat So to rephrase:

The issue is that in a non-container environment, deb packages could (but don't) automatically catch this case and reindex, whereas switching container base images rules out the possibility of the packaging scripts handling this case more intelligently.

@atrus @thegcat ...so docker isn't a solution unto itself, however if the creator of the docker image is mindful they could address the issue. All I know is that I started noticing an increasing number of glitches after upgrading (I rebuilt the containers one weekend, and updated the OS of the machine hosting them to Buster the next IIRC) and it appears to have been this glibc issue anyways, so nothing seemed to catch the problem.

Aside from that I had other reasons to ditch docker...

@atrus @thegcat

For me, docker just kept getting in the way. For some reason I couldn't determine, the database ran noticeably slower in docker despite there not being anything that caused obvious overhead. Also, when errors did start to appear, docker-compose kept slamming services off and on abruptly to recover, including the database. Occasionally graceful shutdown would time out and docker would just force kill. Going through logs was annoying. All those overlayFS mounts were annoying etc...

@atrus @thegcat

...it's not so much that docker is bad, m'kay? I know there are answers to all the docker problems I had...But for my use case it ultimately made no sense. I have a small instance and a pretty simple environment. Tuning the stock docker-compose to work *optimally* with my environment (including learning the intricacies of docker) turned out to be much more of a time sink than just following the docs to install masto straight onto a stock debian setup.

@msh @thegcat Thanks muchly for the detailed explanations.

It gets me thinking on what sorts of best practices and plumbing are missing in this space.

@thegcat pretty sure I'm not affected but just to be safe

mastodon=# reindex database concurrently mastodon;
