Thursday, April 23, 2015

Oracle Cluster Health Monitor (CHM) using large amount of space (crfclust.bdb)

Last night my 2-node RAC cluster was taken down for OS patching and rebooted, but after the reboot the CRS resources did not come up on either node:

Connect as the root user and check the status of all resources:

[root@oradev11 bin]# ./crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  OFFLINE
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE
ora.cssdmonitor
      1        ONLINE  ONLINE       oradev11
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        ONLINE  OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       oradev11
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  OFFLINE
ora.gpnpd
      1        ONLINE  OFFLINE
ora.mdnsd
      1        ONLINE  OFFLINE                               STARTING
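With this many rows it helps to filter out only the resources that are not where they should be. Below is a minimal sketch of a shell function (the name mismatched_resources is hypothetical) that reads "crsctl stat res -t -init" output on stdin and prints every resource whose STATE differs from its TARGET. It assumes the -init layout shown above, where each "ora." name line is followed by an instance line of the form "1  TARGET  STATE ...".

```shell
#!/bin/sh
# Print "NAME TARGET STATE" for each resource whose STATE differs from TARGET.
# Input: output of "crsctl stat res -t -init" on stdin. Layout assumed from
# the example above: a line starting with "ora." followed by an instance
# line "1  TARGET  STATE  [SERVER]  [STATE_DETAILS]".
mismatched_resources() {
  awk '
    /^ora\./ { name = $1; next }             # remember the resource name
    name != "" && $1 ~ /^[0-9]+$/ {          # instance line under that name
      if ($2 != $3) print name, $2, $3       # compare TARGET with STATE
    }
  '
}
```

Usage: ./crsctl stat res -t -init | mismatched_resources. On the output above it would list every resource except ora.cssdmonitor and ora.drivers.acfs. Lines such as "Cluster Resources" and the dashed separators are skipped because their first field is not an instance number.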


The CRS alert log shows:

[root@oradev11 ] # cd $GRID_HOME/log/hostname
[root@oradev11 oradev11]# tail -50f alertoradev11.log

(output trimmed)

2015-04-22 20:06:01.173:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(24990)]CRS-5818:Aborted
2015-04-22 20:06:05.177:
[ohasd(12696)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.mdnsd'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0.4/grid/log/sl73vmhasd/ohasd.log.
2015-04-22 20:06:05.658:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(25614)]CRS-0037:An error occurred while attempting to write to file "/u01/app/11.2.0.4/grid/log/oradev11/agent/ohasd/oraagenagent_grid.log". Additional diagnostics: LFI-00004: Call to lfibwrt() failed.
LFI-01518: write() failed(OSD return value = 28) in slfiwl.

2015-04-22 20:06:05.659:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(25614)]CRS-0004:logging terminated for the process. log file: "/u01/app/11.2.0.4/grid/log/oradev11/agent/ohasd/oraagent_gridgrid.log"
2015-04-22 20:06:06.176:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(25631)]CRS-0037:An error occurred while attempting to write to file "/u01/app/11.2.0.4/grid/log/oradev11/agent/ohasd/oraagenagent_grid.log". Additional diagnostics: LFI-00004: Call to lfibwrt() failed.
LFI-01518: write() failed(OSD return value = 28) in slfiwl.

2015-04-22 20:06:06.176:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(25631)]CRS-0004:logging terminated for the process. log file: "/u01/app/11.2.0.4/grid/log/oradev11/agent/ohasd/oraagent_gridgrid.log"
2015-04-22 20:06:06.272:
[gpnpd(25644)]CRS-0037:An error occurred while attempting to write to file "/u01/app/11.2.0.4/grid/log/oradev11/gpnpd/gpnpd.log". Additional diagnostics: LFI-00004: ibwrt() failed.
LFI-01518: write() failed(OSD return value = 28) in slfiwl.

2015-04-22 20:06:06.272:
[gpnpd(25644)]CRS-0004:logging terminated for the process. log file: "/u01/app/11.2.0.4/grid/log/oradev11/gpnpd/gpnpd.log"
2015-04-22 20:06:09.314:
[gpnpd(25644)]CRS-2329:GPNPD on node oradev11 shutdown.
2015-04-22 20:08:06.226:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(25631)]CRS-5818:Aborted command 'start' for resource 'ora.gpnpd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0.4/grid/logbd001/agent/ohasd/oraagent_grid/oraagent_grid.log.
2015-04-22 20:08:10.229:
[ohasd(12696)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.gpnpd'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0.4/grid/log/sl73vmhasd/ohasd.log.
2015-04-22 20:08:10.710:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(26582)]CRS-0037:An error occurred while attempting to write to file "/u01/app/11.2.0.4/grid/log/oradev11/agent/ohasd/oraagenagent_grid.log". Additional diagnostics: LFI-00004: Call to lfibwrt() failed.
LFI-01518: write() failed(OSD return value = 28) in slfiwl.

2015-04-22 20:08:10.710:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(26582)]CRS-0004:logging terminated for the process. log file: "/u01/app/11.2.0.4/grid/log/oradev11/agent/ohasd/oraagent_gridgrid.log"
2015-04-22 20:08:11.280:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(26604)]CRS-0037:An error occurred while attempting to write to file "/u01/app/11.2.0.4/grid/log/oradev11/agent/ohasd/oraagenagent_grid.log". Additional diagnostics: LFI-00004: Call to lfibwrt() failed.
LFI-01518: write() failed(OSD return value = 28) in slfiwl.

2015-04-22 20:08:11.280:
[/u01/app/11.2.0.4/grid/bin/oraagent.bin(26604)]CRS-0004:logging terminated for the process. log file: "/u01/app/11.2.0.4/grid/log/oradev11/agent/ohasd/oraagent_gridgrid.log"
2015-04-22 20:08:11.347:
[mdnsd(26617)]CRS-0037:An error occurred while attempting to write to file "/u01/app/11.2.0.4/grid/log/oradev11/mdnsd/mdnsd.log". Additional diagnostics: LFI-00004: ibwrt() failed.
LFI-01518: write() failed(OSD return value = 28) in slfiwl.

2015-04-22 20:08:11.347:
[mdnsd(26617)]CRS-0004:logging terminated for the process. log file: "/u01/app/11.2.0.4/grid/log/oradev11/mdnsd/mdnsd.log"
2015-04-22 20:08:11.351:
[mdnsd(26617)]CRS-5602:mDNS service stopping by request.

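After spending a lot of time troubleshooting, I checked the free space on the server and realized the cause: the mount point holding the Grid home (/u01/app) was 100% full, so the CRS daemons could not even write their log files, and that is why the CRS resources were not coming up. In hindsight the alert log already pointed to this: "OSD return value = 28" is the Linux errno ENOSPC ("No space left on device").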
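To confirm the diagnosis from the alert log itself, here is a minimal sketch of a shell function (the name enospc_files is hypothetical) that lists every file whose write failed with OSD return value 28. It assumes the two-line layout in the excerpt above: a CRS-0037 line naming the file, immediately followed by the LFI-01518 line carrying the return value.

```shell
#!/bin/sh
# List files that Clusterware could not write because the disk was full.
# Assumes a CRS-0037 line naming the file, followed on the very next line by
# "LFI-01518: write() failed(OSD return value = N)". On Linux, 28 is ENOSPC.
enospc_files() {
  awk '
    /CRS-0037/ {
      f = ""
      # Extract the path from: write to file "<path>"
      if (match($0, /write to file "[^"]+"/))
        f = substr($0, RSTART + 15, RLENGTH - 16)
      next
    }
    {
      # Only the line right after CRS-0037 carries its OSD return value
      if (f != "" && /OSD return value = 28\)/) print f
      f = ""
    }
  ' | sort -u
}
```

Usage, assuming the alert log location used earlier in this post: enospc_files < $GRID_HOME/log/oradev11/alertoradev11.log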

[root@oradev11 bin]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/rootvg-rootlv
                      5.8G  1.8G  3.8G  33% /
tmpfs                 3.0G     0  3.0G   0% /dev/shm
/dev/sda1             190M   86M   95M  48% /boot
/dev/mapper/rootvg-homelv
                      2.0G  9.2M  1.8G   1% /home
/dev/mapper/rootvg-optlv
                      9.8G  2.0G  7.3G  22% /opt
/dev/mapper/rootvg-securlv
                      1.5G  211M  1.2G  16% /opt/security
/dev/mapper/rootvg-tmplv
                      2.0G  375M  1.5G  21% /tmp
/dev/mapper/rootvg-varlv
                      9.8G  1.1G  8.2G  12% /var
/dev/mapper/datavg-gridbaselv
                       50G   49G     0 100% /u01/app
/dev/mapper/datavg-rdbmsbaselv
                       50G  4.8G   42G  11% /u01/app/oracle
/dev/mapper/datavg-adrrepolv
                       50G  2.6G   45G   6% /oratrace
/dev/mapper/datavg-oemagentlv
                       20G  651M   18G   4% /u01/app/emagent
/dev/mapper/datavg-gglv
                       50G   52M   47G   1% /gg
/dev/mapper/datavg-dbawslv
                       99G   16G   79G  17% /oraworkspace
/dev/mapper/datavg-auditfslv
                       50G  230M   47G   1% /oradbaudit
/dev/mapper/datavg-dbtoolslv
                      9.8G   86M  9.2G   1% /oratools
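A full filesystem is easy to miss in a long df listing, and the df -h output above even wraps long device names onto a second line. As a minimal sketch (the function name full_filesystems and the default 90% threshold are my own assumptions), the check can be done on "df -P" output, which keeps each filesystem on a single line:

```shell
#!/bin/sh
# Print "MOUNTPOINT USE%" for every filesystem at or above a threshold.
# Expects "df -P" output on stdin; -P (POSIX format) keeps each filesystem
# on one line. Threshold is the first argument, default 90 (percent).
# Mount points containing spaces are not handled.
full_filesystems() {
  awk -v t="${1:-90}" '
    NR > 1 {                       # skip the header line
      use = $5
      sub(/%$/, "", use)           # "100%" -> "100"
      if (use + 0 >= t) print $6, $5
    }
  '
}
```

Usage: df -P | full_filesystems 90. On the server above this would have reported only /u01/app.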

Checking what I could delete on the /u01/app mount point, I found that "crfclust.bdb" was consuming far more space than anything else:
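Rather than walking directories by hand, the biggest files on the mount point can be listed directly. This is a minimal sketch (the function name largest_files is hypothetical); the important detail is -xdev, because /u01/app/oracle and /u01/app/emagent are separate filesystems mounted under /u01/app and should not be counted against the full gridbaselv volume.

```shell
#!/bin/sh
# Print the COUNT largest files (disk usage in KB, then path) under DIR.
# -xdev keeps find on DIR's own filesystem, so nested mount points such as
# /u01/app/oracle and /u01/app/emagent are not scanned.
largest_files() {
  dir=${1:?usage: largest_files DIR [COUNT]}
  count=${2:-10}
  find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null |
    sort -rn |
    head -n "$count"
}
```

Usage: largest_files /u01/app 20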

[root@oradev11 bin]# cd ../crf/db
[root@oradev11 db]# ls -lrht
total 4.0K
drwxr-x--- 2 root oinstall 4.0K Apr 22 20:45 oradev11
[root@oradev11 db]# cd oradev11

[root@oradev11 oradev11]# ls -lrth
total 38G
-rw-r--r-- 1 root root 1.1M Sep  8  2014 08-SEP-2014-09:24:06.txt
-rw-r--r-- 1 root root 1.9M Sep  8  2014 08-SEP-2014-10:07:28.txt
-rw-r--r-- 1 root root 1.2M Sep  8  2014 08-SEP-2014-10:20:00.txt
-rw-r----- 1 root root 8.0K Nov 20 09:44 repdhosts.bdb
-rw-r--r-- 1 root root  74K Mar  9 10:53 09-MAR-2015-10:53:37.txt
-rw-r--r-- 1 root root 856K Mar  9 10:56 09-MAR-2015-10:56:42.txt
-rw-r--r-- 1 root root  77K Mar 13 19:21 13-MAR-2015-19:21:26.txt
-rw-r--r-- 1 root root 218K Mar 13 19:21 13-MAR-2015-19:21:44.txt
-rw-r----- 1 root root  16M Apr 22 12:19 log.0000007983
-rw-r----- 1 root root  24K Apr 22 20:42 __db.001
-rw-r--r-- 1 root root 115M Apr 22 20:42 oradev11.ldb
-rw-r----- 1 root root 8.0K Apr 22 20:43 crfconn.bdb
-rw-r--r-- 1 root root 777K Apr 22 20:45 22-APR-2015-20:45:53.txt
-rw-r----- 1 root root  56K Apr 22 20:56 __db.006
-rw-r----- 1 root root 392K Apr 22 20:56 __db.002
-rw-r----- 1 root root 812M Apr 22 20:56 crfloclts.bdb
-rw-r----- 1 root root 668M Apr 22 20:56 crfcpu.bdb
-rw-r----- 1 root root 743M Apr 22 20:56 crfalert.bdb
-rw-r----- 1 root root 526M Apr 22 20:56 crfts.bdb
-rw-r----- 1 root root 607M Apr 22 20:56 crfhosts.bdb
-rw-r----- 1 root root  34G Apr 22 20:56 crfclust.bdb
-rw-r----- 1 root root  16M Apr 22 20:56 log.0000007984
-rw-r----- 1 root root 1.2M Apr 22 20:56 __db.005
-rw-r----- 1 root root 2.1M Apr 22 20:56 __db.004
-rw-r----- 1 root root 2.6M Apr 22 20:56 __db.003

From the output above, "crfclust.bdb" alone accounts for 34G of the 38G in this directory, so I followed the steps in the Oracle support note referenced at the end of this post to free up the space:


Stop the ora.crf resource, then remove crfclust.bdb:

[root@oradev11 bin]# ./crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'oradev11'
CRS-2677: Stop of 'ora.crf' on 'oradev11' succeeded

[root@oradev11 oradev11]# rm crfclust.bdb

[root@oradev11 oradev11]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/rootvg-rootlv
                      5.8G  1.8G  3.8G  33% /
tmpfs                 3.0G  854M  2.2G  28% /dev/shm
/dev/sda1             190M   86M   95M  48% /boot
/dev/mapper/rootvg-homelv
                      2.0G  9.2M  1.8G   1% /home
/dev/mapper/rootvg-optlv
                      9.8G  2.0G  7.3G  22% /opt
/dev/mapper/rootvg-securlv
                      1.5G  211M  1.2G  16% /opt/security
/dev/mapper/rootvg-tmplv
                      2.0G  376M  1.5G  21% /tmp
/dev/mapper/rootvg-varlv
                      9.8G  1.1G  8.2G  12% /var
/dev/mapper/datavg-gridbaselv
                       50G   13G   34G  28% /u01/app
/dev/mapper/datavg-rdbmsbaselv
                       50G  4.8G   42G  11% /u01/app/oracle
/dev/mapper/datavg-adrrepolv
                       50G  2.6G   45G   6% /oratrace
/dev/mapper/datavg-oemagentlv
                       20G  651M   18G   4% /u01/app/emagent
/dev/mapper/datavg-gglv
                       50G   52M   47G   1% /gg
/dev/mapper/datavg-dbawslv
                       99G   16G   79G  17% /oraworkspace
/dev/mapper/datavg-auditfslv
                       50G  231M   47G   1% /oradbaudit
/dev/mapper/datavg-dbtoolslv
                      9.8G   86M  9.2G   1% /oratools
/dev/asm/ggatevol-387
                       20G  562M   20G   3% /gg/GG11
                                                           
Start ora.crf again:

[root@oradev11 bin]# ./crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'oradev11'

CRS-2676: Start of 'ora.crf' on 'oradev11' succeeded
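The stop / remove / start sequence above can be wrapped in a small function. This is a minimal sketch under stated assumptions, not a replacement for the support note: the function name purge_chm_clust_db is hypothetical, it must run as root with GRID_HOME set, and it assumes the CHM repository lives in $GRID_HOME/crf/db/<node> as on this server. Like the manual steps, it removes only crfclust.bdb, and it deletes nothing if ora.crf fails to stop.

```shell
#!/bin/sh
# Stop CHM (ora.crf), remove crfclust.bdb, start CHM again.
# Assumptions: run as root, GRID_HOME set (e.g. /u01/app/11.2.0.4/grid),
# repository in $GRID_HOME/crf/db/<node>; node defaults to the short hostname.
purge_chm_clust_db() {
  grid_home=${GRID_HOME:?GRID_HOME must be set}
  node=${1:-$(hostname -s)}
  db_dir="$grid_home/crf/db/$node"
  crsctl="$grid_home/bin/crsctl"

  if [ ! -f "$db_dir/crfclust.bdb" ]; then
    echo "no crfclust.bdb in $db_dir" >&2
    return 1
  fi
  # Stop ora.crf first; if that fails, leave the repository untouched
  "$crsctl" stop res ora.crf -init || return 1
  rm -f "$db_dir/crfclust.bdb"
  "$crsctl" start res ora.crf -init
}
```

Usage: GRID_HOME=/u01/app/11.2.0.4/grid purge_chm_clust_db oradev11, then confirm with ./crsctl stat res -t -init and df -h as shown in this post.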

[root@oradev11 bin]# ./crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       oradev11           Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       oradev11
ora.crf
      1        ONLINE  ONLINE       oradev11
ora.crsd
      1        ONLINE  ONLINE       oradev11
ora.cssd
      1        ONLINE  ONLINE       oradev11
ora.cssdmonitor
      1        ONLINE  ONLINE       oradev11
ora.ctssd
      1        ONLINE  ONLINE       oradev11           OBSERVER
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       oradev11
ora.evmd
      1        ONLINE  ONLINE       oradev11
ora.gipcd
      1        ONLINE  ONLINE       oradev11
ora.gpnpd
      1        ONLINE  ONLINE       oradev11
ora.mdnsd
      1        ONLINE  ONLINE       oradev11


All the resources are now up and running. ora.diskmon still shows OFFLINE, but its TARGET is OFFLINE as well, so that is expected.

Reference:
Oracle Cluster Health Monitor (CHM) using large amount of space (more than default) (Doc ID 1343105.1)


