It has been running since yesterday morning. I have enabled the postgres plugins, since the postgres log showed that the agent was still active after "uninventory", but with the wrong credentials.
I had to restart the rhq-server this morning (only the server, neither Postgres nor the agent), since the number of open files rose above 10K.
The processes before restart:
root@ct-front:~# ps auxw|grep postgres|grep rhq
postgres  1631  0.0  0.1 121300 44740 ?  Ss  Sep23  0:05 postgres: rhq rhq 127.0.0.1(46903) idle
postgres  1735  0.0  0.1 118792 40384 ?  Ss  Sep23  0:09 postgres: rhq rhq 127.0.0.1(36888) idle
postgres  3669  0.4  0.1 118572 29676 ?  Ss  08:40  0:00 postgres: rhq rhq 127.0.0.1(54749) idle
postgres  3670  0.3  0.1 119164 27232 ?  Ss  08:40  0:00 postgres: rhq rhq 127.0.0.1(54750) idle
postgres  5669  0.0  0.0 102028  5644 ?  Ss  Sep23  0:00 postgres: rhq rhq 127.0.0.1(56115) idle
postgres  5680  0.0  0.1 115072 39132 ?  Ss  Sep23  0:05 postgres: rhq rhq 127.0.0.1(56125) idle
postgres  6025  0.0  0.0 103304 10648 ?  Ss  Sep23  0:27 postgres: rhq rhq 127.0.0.1(56210) idle
postgres  6041  0.0  0.1 123280 48024 ?  Ss  Sep23  0:11 postgres: rhq rhq 127.0.0.1(56215) idle
postgres  6060  0.0  0.1 119700 42168 ?  Ss  Sep23  0:17 postgres: rhq rhq 127.0.0.1(56237) idle
postgres  7387  0.0  0.0 106948 16364 ?  Ss  Sep23  1:01 postgres: postgres rhq 127.0.0.1(60327) idle
postgres  8535  0.0  0.1 114120 36108 ?  Ss  02:51  0:04 postgres: rhq rhq 127.0.0.1(48651) idle
postgres 11169  0.0  0.1 118580 39900 ?  Ss  00:00  0:07 postgres: rhq rhq 127.0.0.1(59908) idle
postgres 11409  0.0  0.1 123716 46292 ?  Ss  Sep23  0:11 postgres: rhq rhq 127.0.0.1(46198) idle
postgres 11431  0.0  0.2 124624 52432 ?  Ss  Sep23  0:47 postgres: rhq rhq 127.0.0.1(46289) idle
postgres 11568  0.0  0.1 116364 40484 ?  Ss  Sep23  0:05 postgres: rhq rhq 127.0.0.1(34407) idle
postgres 12416  0.0  0.1 122788 49160 ?  Ss  Sep23  0:17 postgres: rhq rhq 127.0.0.1(52744) idle
postgres 15707  0.0  0.0 103324 10892 ?  Ss  Sep23  0:25 postgres: rhq rhq 127.0.0.1(60936) idle
postgres 19382  0.0  0.1 122916 48488 ?  Ss  Sep23  0:08 postgres: rhq rhq 127.0.0.1(56676) idle
postgres 19868  0.1  0.1 118104 38276 ?  Ss  07:01  0:09 postgres: rhq rhq 127.0.0.1(45391) idle
postgres 20554  0.0  0.1 118244 40764 ?  Ss  04:00  0:08 postgres: rhq rhq 127.0.0.1(55602) idle
postgres 21534  0.0  0.2 125260 50348 ?  Ss  01:00  0:12 postgres: rhq rhq 127.0.0.1(37533) idle
postgres 22480  0.0  0.1 105776 30264 ?  Ss  Sep23  0:03 postgres: rhq rhq 127.0.0.1(58453) idle
postgres 23255  0.0  0.1 114344 43000 ?  Ss  Sep23  0:28 postgres: rhq rhq 127.0.0.1(52114) idle
postgres 23435  0.1  0.1 120388 44036 ?  Ss  07:21  0:05 postgres: rhq rhq 127.0.0.1(47287) idle
postgres 24113  0.0  0.1 123212 48060 ?  Ss  04:21  0:05 postgres: rhq rhq 127.0.0.1(57681) idle
postgres 25861  0.0  0.1 119720 43260 ?  Ss  01:25  0:05 postgres: rhq rhq 127.0.0.1(40007) idle
postgres 27396  0.0  0.1 124076 48696 ?  Ss  04:40  0:08 postgres: rhq rhq 127.0.0.1(59573) idle
postgres 28973  0.0  0.1 116924 39768 ?  Ss  Sep23  0:05 postgres: rhq rhq 127.0.0.1(43984) idle
postgres 30394  0.2  0.1 125260 39752 ?  Ss  08:01  0:06 postgres: rhq rhq 127.0.0.1(51100) idle
postgres 30455  0.0  0.2 123336 50372 ?  Ss  Sep23  0:15 postgres: rhq rhq 127.0.0.1(34681) idle
postgres 30576  0.2  0.1 124888 39200 ?  Ss  08:01  0:06 postgres: rhq rhq 127.0.0.1(51154) idle
postgres 30841  0.1  0.1 121776 38844 ?  Ss  08:03  0:04 postgres: rhq rhq 127.0.0.1(51340) idle
postgres 31674  0.2  0.1 125424 39604 ?  Ss  08:08  0:04 postgres: rhq rhq 127.0.0.1(51823) idle
postgres 31702  0.0  0.2 122984 49660 ?  Ss  Sep23  0:07 postgres: rhq rhq 127.0.0.1(40525) idle
postgres 32761  0.0  0.1 123000 47332 ?  Ss  Sep23  0:07 postgres: rhq rhq 127.0.0.1(53885) idle
and after restart:
root@ct-front:~# ps auxw|grep postgres|grep rhq
postgres  4571  4.6  0.0 105016 15876 ?  Ss  08:43  0:01 postgres: rhq rhq 127.0.0.1(55153) idle
postgres  4572  0.0  0.0 102052  5656 ?  Ss  08:43  0:00 postgres: rhq rhq 127.0.0.1(55156) idle
postgres  7387  0.0  0.0 106948 16364 ?  Ss  Sep23  1:01 postgres: postgres rhq 127.0.0.1(60327) idle
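To compare listings like the two above without counting lines by hand, something along these lines might help. It is only a sketch: count_rhq_backends is a made-up helper name, and it assumes the backend process title has the form seen in the listings ("postgres: <user> <database> <client> <state>"), which can differ between PostgreSQL versions and settings.

```shell
# Hypothetical helper: count PostgreSQL backends connected to the "rhq"
# database, grouped by database user. Reads `ps auxw` output on stdin.
# Assumes the process title format "postgres: <user> <database> ...".
count_rhq_backends() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "postgres:" && $(i + 2) == "rhq") { n[$(i + 1)]++; break }
    }
    END { for (u in n) print u, n[u] }' | sort
}

# Usage:
#   ps auxw | count_rhq_backends
```

Running it from cron every few minutes and appending the output to a file would show whether the number of rhq connections grows steadily or in jumps.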
I have uploaded a screenshot here: https://www.dropbox.com/s/ysmh9uq37xxacs0/RHQ-restart.png
Regards,
Attila
2013/9/23 Attila Heidrich attila.heidrich@gmail.com
Postgres database stopped again... I guess the problem was the enormous number of open files...
We use 9.2, and we also use the postgres plugin - which still doesn't really support 9.2 as far as I know.
Altogether, Postgres (only the instance storing the RHQ database) can only run for a few days; then the number of open files and open connections rises really high, and finally I have to restart it.
Usually I restart RHQ as well, since I generally have no time to experiment with what exactly needs restarting.
The log is practically endless. I can quote from it, but I would need to know what to look for.
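To see which process actually holds the open files as they pile up, the per-process descriptor count can be read from /proc. A minimal sketch, assuming a Linux host and enough privileges to read /proc/<pid>/fd (root or the process owner); count_open_fds is a made-up helper name:

```shell
# Hypothetical helper: number of open file descriptors of one process,
# read from /proc (Linux only). Prints 0 if the directory is unreadable
# or the process does not exist.
count_open_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Usage (assumes the server runs as OS user "postgres"):
#   for pid in $(pgrep -u postgres); do
#       echo "$pid $(count_open_fds "$pid")"
#   done | sort -k2 -n
```

Logging this alongside the connection count over a day or two should show whether the descriptors accumulate in the Postgres backends or in the RHQ server process.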
Regards,
Attila