Alright team, now that we're able to vacuum all of our Postgres tables no matter how big they are, it's time for us to set up a plan for vacuuming all of our tables on a regular basis. That way, when we do get a chance to perform a VACUUM FULL on the tables, they won't fill back up with dead tuples and allocated free space.
For those who have sudo access on db2, I've written a little program, vacstat.py, that can help analyze our needs. The first command people should know about is:

    sudo -u postgres vacstat.py schema
When run in this mode, vacstat will attempt to get a list of all databases and the tables in those databases. It will compare those lists against the copy saved from a previous run. If they are the same, vacstat will be happy. If they differ, vacstat will save the new data to a file and print a message telling you to set up a vacuum policy for the new table.
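To give a rough idea of what that comparison does, here's a sketch in Python (illustrative only, not the actual vacstat internals; the file path and function names are made up):

    # Sketch of the schema-comparison idea; not vacstat's real code.
    import pickle
    import psycopg2

    SCHEMA_FILE = '/var/lib/vacstat/schema.pickle'  # hypothetical location

    def current_schema():
        # Map each database name to the set of its user tables.
        schema = {}
        conn = psycopg2.connect(database='postgres', user='postgres')
        cur = conn.cursor()
        cur.execute("SELECT datname FROM pg_database WHERE datallowconn")
        databases = [row[0] for row in cur.fetchall()]
        conn.close()
        for db in databases:
            conn = psycopg2.connect(database=db, user='postgres')
            cur = conn.cursor()
            cur.execute("SELECT schemaname || '.' || tablename FROM pg_tables"
                        " WHERE schemaname NOT IN"
                        " ('pg_catalog', 'information_schema')")
            schema[db] = set(row[0] for row in cur.fetchall())
            conn.close()
        return schema

    def check_schema():
        new = current_schema()
        try:
            old = pickle.load(open(SCHEMA_FILE, 'rb'))
        except IOError:
            old = {}
        if new != old:
            pickle.dump(new, open(SCHEMA_FILE, 'wb'))
            for db in new:
                for table in sorted(new[db] - old.get(db, set())):
                    print('New table %s in %s: set up a vacuum'
                          ' policy for it' % (table, db))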
I've set up vacstat to run from cron on db2 in this mode. If we get email from vacstat telling us that there's a new database or table, we'll need to make sure to enter those databases or tables in our vacuum script and then follow the directions to let vacstat know we're aware of the new tables.
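For reference, the cron entry is roughly of this shape (the path and schedule here are illustrative, not the exact entry on db2):

    # /etc/cron.d/vacstat -- illustrative only
    0 5 * * * postgres /usr/local/bin/vacstat.py schema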
The next major mode is:

    sudo -u postgres vacstat.py stattuple-start --database DBNAME
In this mode, vacstat will vacuum the database and then take samples of how dirty the tables have gotten over the course of a day. vacstat is currently set to take a sample immediately after vacuuming, after one hour, after six hours, and after a full day. The information is recorded in a pickle file under /var/lib/vacstat on db2. Once we have that information, we can see how quickly dead tuples accumulate in each table and, from that, come up with a plan for how frequently to vacuum on a table-by-table basis.
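The mode's name suggests it's built on the pgstattuple contrib module, so taking a single sample likely boils down to something like this sketch (assuming pgstattuple is installed in each database; the sample format and file naming here are made up):

    # Sketch of taking one dead-tuple sample; not vacstat's real code.
    import pickle
    import time
    import psycopg2

    SAMPLE_DIR = '/var/lib/vacstat'

    def sample_database(dbname, tables):
        # Record dead-tuple stats for each table at the current time.
        conn = psycopg2.connect(database=dbname, user='postgres')
        cur = conn.cursor()
        sample = {'time': time.time(), 'tables': {}}
        for table in tables:
            cur.execute("SELECT dead_tuple_count, dead_tuple_percent,"
                        " free_space FROM pgstattuple(%s)", (table,))
            sample['tables'][table] = cur.fetchone()
        conn.close()
        out = '%s/%s-%d.pickle' % (SAMPLE_DIR, dbname, sample['time'])
        pickle.dump(sample, open(out, 'wb'))

Comparing the sample taken right after the vacuum with the ones from an hour, six hours, and a day later gives us the accumulation rate for each table.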
I'm currently testing the latter functionality with a run on some of our smaller databases. Once that's working, I'll be collecting stats for all of our databases.
-Toshio