I've been running Synapse for about a year at this point and it's been fairly consistently the software that gives me the most trouble for my homelab.
A part I've started to address recently is the ineffectiveness of the healthcheck endpoints. It always works! Which is a problem when the service isn't working and the healthcheck says it is. This is commonly when my Postgres server moves for some reason, Synapse never reconnects. It's a known issue.
I've worked around this with a healthcheck script that will probe the Synapse API
curl -fv http://127.0.0.1:{{ PORT }}/_matrix/client/v3/publicRooms -H "Authorization: Bearer {{ TOKEN }}"
The TOKEN
needs to be a user token which also gives this a bootstrapping problem where this check can only be in place after the server works. And a user changing their password will invalidate this token and take the server offline...
After a few days though it's solved my problem of Synapse breaking but pretending to be fine.