Checkpoint Monitord Process Consumes Excess Memory - 91Sec

Wednesday, August 26, 2015

Checkpoint Monitord Process Consumes Excess Memory

During a routine review of firewall memory and CPU usage, I found that some of our Checkpoint UTM272 gateways running R77.10 were using a lot of memory, and SSH and SNMP access was sometimes slow. With the top command, I can sort processes by memory or CPU usage and see which one is hogging resources.
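For a non-interactive check (for example from a script), the same ranking can be approximated with ps and awk. This is only a minimal sketch and not part of the original procedure: mem_hogs is a hypothetical helper name, the 20% threshold in the usage line is an arbitrary assumption, and the usage line assumes a procps-style ps.

```shell
# mem_hogs THRESHOLD: read "%MEM PID COMMAND" lines on stdin (one header line
# first) and print the command and %MEM of every process at or above THRESHOLD.
# Hypothetical helper, not a Gaia or Check Point command.
mem_hogs() {
    awk -v limit="$1" 'NR > 1 && $1 + 0 >= limit + 0 { print $3, $1 }'
}

# Usage on a live system (assumes a procps-style ps):
#   ps -eo pmem,pid,comm | mem_hogs 20
```

Fed the %MEM, PID and COMMAND values from the first top listing in Step 1, only the monitord line would pass a 20% threshold.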

The culprit turned out to be the monitord service. The monitord service is used by the device sensors to monitor hardware, and it saves its data in a DB file stored locally. Before R76, it kept one year of data in the DB. From R76 on, it keeps only three months of history, to save device resources when processing the data. In my case, the DB file was more than 350M, which caused monitord to consume a lot of memory while processing it. Although we are running R77.10, it seems that upgrading to R77.10, as opposed to a fresh installation, won't reset the DB file structure.

There is a workaround provided in sk93587. Here are all the steps I recorded while fixing this.


1. Before applying the workaround, monitord is using 42.5% MEM.


top - 10:56:37 up 10 days,  1:08,  1 user,  load average: 0.00, 0.06, 0.43
Tasks:  83 total,   3 running,  80 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.2%us,  1.1%sy,  0.0%ni, 97.3%id,  0.2%wa,  0.1%hi,  0.1%si,  0.0%st
Mem:    957272k total,   947392k used,     9880k free,     2772k buffers
Swap:  2096472k total,    43292k used,  2053180k free,   209280k cached
%MEM   PID USER      PR  NI  VIRT  RES  SHR S %CPU    TIME+  COMMAND             
 5.0  4226 admin     15   0  263m  47m  11m S  0.4  59:12.98 cpd                 
 0.1  2782 admin     15   0  2172 1084  836 R  0.2   0:00.05 top                 
 0.8  3988 admin     15   0 24344 7956 5780 S  0.2  22:38.83 snmpd               
 1.4  3947 admin     16   0 33796  13m 7964 S  0.1   2947:10 confd               
42.5  3952 admin     15   0  400m 397m 2332 S  0.1 119:05.53 monitord            
 0.1  3545 admin     18   0  1708  688  584 S  0.1   2:38.13 syslogd             
 0.1     1 admin     15   0  2040  580  548 S  0.0   0:01.47 init                
 0.0     2 admin     RT  -5     0    0    0 S  0.0   0:00.00 migration/0         
 0.0     3 admin     15   0     0    0    0 S  0.0   0:00.67 ksoftirqd/0         
 0.0     4 admin     RT  -5     0    0    0 S  0.0   0:00.00 watchdog/0          
 0.0     5 admin     10  -5     0    0    0 S  0.0   0:01.56 events/0                                                                                             



Next is the top output sorted by %MEM:
top - 10:58:15 up 10 days,  1:10,  1 user,  load average: 0.00, 0.04, 0.38
Tasks:  83 total,   3 running,  80 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3%us,  0.3%sy,  0.0%ni, 99.0%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    957272k total,   947972k used,     9300k free,     3036k buffers
Swap:  2096472k total,    43292k used,  2053180k free,   209708k cached

%MEM   PID USER      PR  NI  VIRT  RES  SHR S %CPU    TIME+  COMMAND             
42.5  3952 admin     15   0  400m 397m 2332 S  0.3 119:05.63 monitord            
 6.9  6938 admin     19   0  122m  64m 3836 S  0.0  19:09.09 DAService           
 5.0  4226 admin     15   0  263m  47m  11m S  0.0  59:13.25 cpd                 
 2.0  4386 admin     15   0  284m  18m  10m S  0.0   1:23.18 fw_full             
 1.5  3948 admin     15   0 38032  13m 1704 S  0.0  70:42.63 searchd             
 1.4  3947 admin     15   0 33796  13m 7964 S  0.0   2947:10 confd               
 1.4  6779 admin     15   0  163m  13m 7252 S  0.0   0:03.49 rtmd                
 0.8  3988 admin     15   0 24344 7956 5780 S  0.0  22:39.07 snmpd                

2. Rebuild the monitord DB

[Expert@CP-DMZ-1:0]# tellpm process:monitord
[Expert@CP-DMZ-1:0]# 
Message from syslogd@ at Wed Aug 26 10:59:39 2015 ...
CP-DMZ-1 monitord[3952]: monitord got killed 
[Expert@CP-DMZ-1:0]# top  (Sorted result by %MEM)
                 
top - 11:00:09 up 10 days,  1:12,  1 user,  load average: 0.00, 0.02, 0.33
Tasks:  82 total,   2 running,  80 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.3%us,  1.7%sy,  0.0%ni, 95.7%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    957272k total,   542928k used,   414344k free,     3620k buffers
Swap:  2096472k total,    42700k used,  2053772k free,   208824k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND             
 6938 admin     19   0  122m  64m 3836 S  0.0  6.9  19:09.09 DAService           
 4226 admin     15   0  263m  47m  11m S  1.0  5.0  59:13.62 cpd                 
 4386 admin     15   0  284m  18m  10m S  0.0  2.0   1:23.18 fw_full             
 3948 admin     15   0 38032  13m 1704 S  0.0  1.5  70:42.63 searchd             
 3947 admin     15   0 33796  13m 7968 S  0.0  1.4   2947:10 confd               
 6779 admin     15   0  163m  13m 7252 S  0.0  1.4   0:03.49 rtmd                
 3930 admin     15   0 25300 7996 6340 S  0.0  0.8   0:00.41 pm                  
 3988 admin     15   0 24344 7956 5780 S  0.3  0.8  22:39.35 snmpd               
 4339 admin     15   0  149m 7352 5748 S  0.0  0.8   0:00.51 cphamcset           
 4367 admin     15   0 32944 7224 6472 S  0.0  0.8   1:09.32 routed              
 4374 admin     16   0 33044 7168 6976 S  0.0  0.7   0:13.16 routed              
 3951 admin     18   0 99768 7024 6620 S  0.0  0.7   0:06.79 rconfd              
 3983 admin     17   0 25272 6816 6136 S  0.0  0.7   0:00.34 cloningd            
 2228 admin     15   0 21000 5972 3324 S  0.0  0.6   0:00.52 clish               
 4240 admin     15   0  150m 5732 5592 S  0.0  0.6   0:00.75 mpdaemon                                                                                                                                                              
[Expert@CP-DMZ-1:0]# cd /var/log
[Expert@CP-DMZ-1:0]# ls -l db
-rw-r--r-- 1 admin root 356237312 Aug 26 10:45 db
[Expert@CP-DMZ-1:0]# cp /var/log/db  /var/log/db_ORIGINAL
[Expert@CP-DMZ-1:0]#  sqlite3 /var/log/db 
SQLite version 3.6.20
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> VACUUM;
sqlite> .exit 
[Expert@CP-DMZ-1:0]# tellpm process:monitord t
[Expert@CP-DMZ-1:0]# 
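The backup copy made before the VACUUM needs as much free space as the DB file itself, and VACUUM rebuilds the database through a temporary copy, so it is worth checking free disk space first. The following is a minimal sketch of such a pre-check, not something taken from the SK: has_room is a hypothetical helper name, and the default factor of 3 is an assumed safety margin.

```shell
# has_room FILE [FACTOR]: succeed if the filesystem holding FILE has at least
# FACTOR times the file's size free. FACTOR defaults to 3, an assumed margin
# covering the backup copy plus VACUUM's temporary copy.
# Hypothetical helper, not a Gaia or Check Point command.
has_room() {
    file=$1
    factor=${2:-3}
    # File size in KB, rounded up.
    size_kb=$(( ( $(wc -c < "$file") + 1023 ) / 1024 ))
    # "Available" column of POSIX-format df output, in KB.
    free_kb=$(df -Pk "$(dirname "$file")" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -ge $(( size_kb * factor )) ]
}

# Example:
#   has_room /var/log/db && cp /var/log/db /var/log/db_ORIGINAL
```

Once the rebuilt DB has been confirmed to work, the db_ORIGINAL copy can be removed to get the space back.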


3. Check memory usage after the workaround is applied

The memory usage of monitord has dropped to only 4.9%, down from the 42.5% seen in Step 1.

top - 11:15:24 up 10 days,  1:27,  1 user,  load average: 0.00, 0.05, 0.18
Tasks:  83 total,   2 running,  81 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7%us,  0.3%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.3%hi,  0.3%si,  0.0%st
Mem:    957272k total,   446428k used,   510844k free,     4808k buffers
Swap:  2096472k total,    42696k used,  2053776k free,    67228k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND             
 6938 admin     17   0  122m  64m 3836 S  0.0  6.9  19:09.09 DAService           
 4226 admin     15   0  263m  47m  11m S  0.0  5.0  59:16.10 cpd                 
 3088 admin     15   0 49684  45m 2320 S  0.0  4.9   0:01.55 monitord            
 4386 admin     15   0  284m  18m  10m S  0.0  2.0   1:23.23 fw_full             
 3948 admin     15   0 38032  13m 1704 S  0.0  1.5  70:42.63 searchd             
 3947 admin     15   0 33796  13m 7968 S  0.0  1.4   2947:10 confd               
 6779 admin     15   0  163m  13m 7252 S  0.0  1.4   0:03.49 rtmd                
 3930 admin     16   0 25300 8012 6340 S  0.0  0.8   0:00.41 pm                  
 3988 admin     15   0 24344 7956 5780 S  0.0  0.8  22:41.56 snmpd               
 4339 admin     15   0  149m 7352 5748 S  0.0  0.8   0:00.51 cphamcset           
 4367 admin     15   0 32944 7224 6472 S  0.0  0.8   1:09.33 routed              
 4374 admin     15   0 33044 7168 6976 S  0.0  0.7   0:13.19 routed              
 3951 admin     18   0 99768 7024 6620 S  0.0  0.7   0:06.79 rconfd              
 3983 admin     17   0 25272 6816 6136 S  0.0  0.7   0:00.34 cloningd            
 2228 admin     15   0 21000 5972 3324 S  0.0  0.6   0:00.52 clish               
 4240 admin     15   0  150m 5732 5592 S  0.0  0.6   0:00.75 mpdaemon            
 4787 admin     18   0 20936 5512 5508 S  0.0  0.6   0:00.28 cpviewd             
 4347 nobody    17   0 18748 5108 5104 S  0.0  0.5   0:00.21 ci_http_server        

The DB size also dropped, from more than 350M to less than 40M:

[Expert@CP-DMZ-1:0]# ls -l db
-rw-r--r-- 1 admin root 37168128 Aug 26 11:32 db
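To put a number on the reduction, the two byte counts from the ls -l listings can be compared with shell arithmetic. This is a small sketch; shrink_pct is a hypothetical helper name, and it rounds down to a whole percent.

```shell
# shrink_pct BEFORE AFTER: print the whole-number percentage by which a file
# shrank, given its size in bytes before and after (integer division, so the
# result is rounded down). Hypothetical helper name.
shrink_pct() {
    echo $(( ( $1 - $2 ) * 100 / $1 ))
}

# Sizes taken from the two "ls -l db" listings above.
shrink_pct 356237312 37168128   # prints 89
```

So the VACUUM shrank the DB file by roughly 89%.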



Reference: 

sk93587 - Output of 'top' command on Gaia OS shows that 'monitord' process consumes memory or CPU at high level









