You can use pg_upgrade with -k (hard links); it completes within seconds. Afterwards, queries will be slow until a complete ANALYZE updates the statistics, but kicking that off only takes seconds.
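As a rough sketch of that flow (all paths here are placeholders for a hypothetical 9.0 → 9.5 upgrade; adjust to your installation):

```shell
# Stop the old cluster, then run pg_upgrade in link mode (-k),
# which hard-links data files instead of copying them:
pg_upgrade -k \
  -b /usr/lib/postgresql/9.0/bin  -B /usr/lib/postgresql/9.5/bin \
  -d /var/lib/postgresql/9.0/main -D /var/lib/postgresql/9.5/main

# Start the new cluster, then rebuild statistics in stages so
# the planner gets usable estimates quickly:
pg_ctl -D /var/lib/postgresql/9.5/main start
vacuumdb --all --analyze-in-stages
```

`vacuumdb --analyze-in-stages` (available since 9.4) runs ANALYZE three times with increasing statistics targets, so you get rough plans almost immediately.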
I have updated ~2TB of database from 9.0 all the way to 9.5 over the years.
The problem with this is that if anything fails, you can potentially corrupt your data and have no backup plan. To make that option safe, you would have to copy your data directory first, and you need to be offline for that. So you have to add the time it takes to make that copy.
This is why I ensure that the slaves are up to date, then disconnect them, pg_upgrade the master and resync the slaves (which is required anyways). If something goes wrong, I would fail over to the slave.
Also: You don't need to be offline to copy the data directory. Check `pg_start_backup` or `pg_basebackup` (which calls the former).
That requires the master and slaves to run different versions for a while. And that is not possible with stock postgresql, is it?
Regarding your second point, I meant copying the data directory as in a 'cp' command. Or rsync if you will. The functions you mentioned are only useful when doing a dump, aren't they? And recovering from an upgrade problem using a dump is way slower than just starting the previous version in the backup data directory.
> That requires the master and slaves to run different versions for a while. And that is not possible with stock postgresql, is it?
Correct, that's not possible. But if I announce the downtime, bring master and slave down, migrate the slave and run our test-suite, migrate the master, run the test-suite again and bring the site back up, then I know whether the migration worked.
If the migration on the slave fails, well, then I can figure out where the problem lies and just bring master back.
If the migration on master fails, but works on slave, then I can bring slave up as the new master.
No matter what, there's always one working copy and the downtime is limited to two `pg_upgrade -k` runs (which is measured in minutes).
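That runbook might look roughly like this (hostnames, paths, and the test-suite command are all placeholders, not anything from the thread):

```shell
#!/bin/sh
# Sketch of the upgrade procedure described above, assuming a
# hypothetical 9.0 -> 9.5 upgrade on hosts "master" and "slave".
set -e

# 1. Announce downtime, then bring both nodes down cleanly.
ssh master 'pg_ctl -D /pg/9.0/data stop -m fast'
ssh slave  'pg_ctl -D /pg/9.0/data stop -m fast'

# 2. Upgrade the slave first and verify it.
ssh slave 'pg_upgrade -k -b /pg/9.0/bin -B /pg/9.5/bin \
           -d /pg/9.0/data -D /pg/9.5/data'
ssh slave 'pg_ctl -D /pg/9.5/data start'
run-test-suite --against slave    # placeholder for your checks

# 3. If that worked, upgrade the master the same way and re-verify.
ssh master 'pg_upgrade -k -b /pg/9.0/bin -B /pg/9.5/bin \
            -d /pg/9.0/data -D /pg/9.5/data'
ssh master 'pg_ctl -D /pg/9.5/data start'
run-test-suite --against master   # placeholder for your checks

# 4. Resync the slave from the upgraded master (required after
#    pg_upgrade anyway), e.g. with a fresh pg_basebackup.
```

If step 2 fails, you still have the untouched master; if step 3 fails, the already-upgraded slave can be promoted. Either way one working copy survives.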
> Regarding your second point, I meant copying the data directory as in a 'cp' command. Or rsync if you will.
Yes. You execute `select pg_start_backup()` to tell the server that you're now going to run cp or rsync and to thus keep the data files in a consistent state. Once you have finished cp/rsync, you execute `select pg_stop_backup()` to put the server back in the original mode.
This works while the server is running.
If you don't want the hassle of executing these commands, you can also invoke the command-line tool `pg_basebackup` which does all of this for you.
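Concretely, the two variants look something like this (directories are made-up examples; on 9.x the second argument to `pg_start_backup` requests a fast checkpoint):

```shell
# Low-level variant: bracket the file copy with the backup functions.
psql -c "SELECT pg_start_backup('upgrade-copy', true);"
rsync -a /var/lib/postgresql/9.5/main/ /backup/pgdata/
psql -c "SELECT pg_stop_backup();"
# Note: you also need the WAL generated during the copy
# (via WAL archiving) to make the backup consistent.

# Convenience variant: pg_basebackup does all of the above,
# and -X stream includes the needed WAL in the backup itself.
pg_basebackup -D /backup/pgdata -X stream -P
```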
Any chance you have a blog or website where you could write up/post an example of this entire process? It sounds like the details that you've posted above would be of extreme assistance to many others.
Yes, that does allow you to copy the database directory with the "cp" command. The command tells postgres to stop deleting obsolete WAL files until further notice. As long as you start your copy after you issue the command, and copy across at least all of the files that are present (as in, you can ignore new files that are created), then the data is safe. Just don't forget to tell postgres that the backup has finished afterwards.
pg_start_backup/pg_basebackup are used when doing an rsync-style copy. You'll end up with a copy of the data directory, rather than a dump. You can then start up a server instance directly in the resulting backup directory.
How do I learn all this stuff, as a person that has no reason to touch Postgres other than personal interest? I never get to encounter these types of problems in my day to day.
You don't need to be offline to make a copy of the data directory. You can do that ahead of time, keeping all the WAL segments up until the point that you make the switch.
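Keeping the WAL segments around between the copy and the switch means having WAL archiving enabled; a minimal postgresql.conf fragment might look like this (archive path is a placeholder, and the `wal_level` value differs by version):

```
# postgresql.conf
wal_level = hot_standby      # 'replica' on 9.6 and later
archive_mode = on
archive_command = 'test ! -f /walarchive/%f && cp %p /walarchive/%f'
```

At switch time you replay the archived WAL on the copy, so it catches up to the exact point where you stopped the old server.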