Tuesday, November 1, 2016

PING[ARC3]: Heartbeat failed to connect to standby

Today after server patching (reboot of servers) my standby recovery stopped working and LAG keeps increasing.

Primary: prod on RHEL 6 11.2.0.3
Standby:  on RHEL 6 11.2.0.3

On STANDBY:

SQL> select status  from v$instance;

STATUS
------------
MOUNTED

SQL>  SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;

PROCESS   STATUS
--------- ------------
ARCH      CONNECTED
ARCH      CONNECTED
ARCH      CONNECTED
ARCH      CONNECTED
MR(fg)    WAIT_FOR_LOG            (Don’t see RFS running)


Don’t see any archive GAPS and I see recovery is Running fine and applied all the logs whatever have on standby, NO archive log GAPS

Tried to switch logs on primary and they are not shipping, so first need to find out why logs not shipping to standby


Problem:   Archive logs NOT shipping to standby side

Things checked:

·         Tried stopping and restarting of recovery process (NO Luck)
·         Bounced standby database (NO Luck)
·         Checked archive log free space
·         Parameters looks good nothing changed
·         Log dest on primary ENABLED
·         Can able to ping server
·         Listener up and running


Primary alert log :
Error 1034 received logging on to the standby
Fri Sep 23 13:21:11 2016
Error 1034 received logging on to the standby
Fri Sep 23 13:22:14 2016
Error 1034 received logging on to the standby
Fri Sep 23 13:23:16 2016


Check error message

SQL> SELECT gvi.thread#, timestamp, message FROM gv$dataguard_status gvds, gv$instance gvi WHERE gvds.inst_id = gvi.inst_id AND severity in ('Error','Fatal') and rownum <11 ORDER BY timestamp, thread#;

THREAD#  TIMESTAMP    MESSAGE
---------- --------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
         1 23-SEP-16 Error 1034 received logging on to the standby
         1 23-SEP-16 PING[ARC3]: Heartbeat failed to connect to standby 'primdb_sb1'. Error is 1034.
         1 23-SEP-16 Error 1034 received logging on to the standby
         1 23-SEP-16 PING[ARC3]: Heartbeat failed to connect to standby 'primdb_sb1'. Error is 1034.
         1 23-SEP-16 Error 1034 received logging on to the standby
         1 23-SEP-16 PING[ARC3]: Heartbeat failed to connect to standby 'primdb_sb1'. Error is 1034.
         1 23-SEP-16 Error 1034 received logging on to the standby
         1 23-SEP-16 PING[ARC3]: Heartbeat failed to connect to standby 'primdb_sb1'. Error is 1034.
         1 23-SEP-16 Error 1034 received logging on to the standby
         1 23-SEP-16 PING[ARC3]: Heartbeat failed to connect to standby 'primdb_sb1'. Error is 1034.

10 rows selected.


Solution:
After reviewing error messages from above output I again checked the listener on standby

oradev13:primdb:/opt/oracle/product/11.2.0.3.64/network/admin $ ps -e f| grep tns| grep -i prim
110542 ?        Ssl    0:37 /opt/oracle/product/12.1.0.2.64/bin/tnslsnr LSNR_DBPRIMDBSB1 -inherit



oradev13:primdb:/opt/oracle/product/11.2.0.3.64/network/admin $ lsnrctl status LSNR_DBPRIMDBSB1

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 23-SEP-2016 10:30:37

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=dbprimdbsb1)(PORT=1781)))
STATUS of the LISTENER
------------------------
Alias                     LSNR_DBPRIMDBSB1
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                22-SEP-2016 01:41:35
Uptime                    1 days 8 hr. 49 min. 2 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/oracle/product/12.1.0.2.64/network/admin/listener.ora
Listener Log File         /opt/oracle/diag/tnslsnr/oradev13/lsnr_dbprimdbsb1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.27.68)(PORT=1781)))
Services Summary...
Service "primdb" has 1 instance(s).
  Instance "primdb", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully



Notice my listener is running from 12c home but my primary and standby databases are on 11g version

oradev13:primdb:/opt/oracle/product/11.2.0.3.64/network/admin $ echo $ORACLE_HOME
/opt/oracle/product/11.2.0.3.64

oradev13:primdb:/opt/oracle/product/11.2.0.3.64/network/admin $ lsnrctl stop LSNR_DBPRIMDBSB1

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 23-SEP-2016 10:32:44

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=dbprimdbsb1)(PORT=1781)))
The command completed successfully


oradev13:primdb:/opt/oracle/product/11.2.0.3.64/network/admin $ lsnrctl start LSNR_DBPRIMDBSB1

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 23-SEP-2016 10:33:20

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Starting /opt/oracle/product/11.2.0.3.64/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 11.2.0.3.0 - Production
System parameter file is /opt/oracle/product/11.2.0.3.64/network/admin/listener.ora
Log messages written to /opt/oracle/diag/tnslsnr/oradev13/lsnr_dbprimdbsb1/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.27.68)(PORT=1781)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=dbprimdbsb1)(PORT=1781)))
STATUS of the LISTENER
------------------------
Alias                     LSNR_DBPRIMDBSB1
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                23-SEP-2016 10:33:20
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/oracle/product/11.2.0.3.64/network/admin/listener.ora
Listener Log File         /opt/oracle/diag/tnslsnr/oradev13/lsnr_dbprimdbsb1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.20.27.68)(PORT=1781)))
Services Summary...
Service "primdb" has 1 instance(s).
  Instance "primdb", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully



oradev13:primdb:/opt/oracle/product/11.2.0.3.64/network/admin $ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Fri Sep 23 10:33:39 2016

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;

PROCESS   STATUS
--------- ------------
ARCH      CONNECTED
ARCH      CONNECTED
ARCH      CONNECTED
ARCH      CONNECTED
MR(fg)    WAIT_FOR_GAP
RFS       RECEIVING
RFS       RECEIVING
RFS       RECEIVING

8 rows selected.

SQL> SELECT ARCH.THREAD# "Thread", ARCH.SEQUENCE# "Last Sequence Received", APPL.SEQUENCE# "Last Sequence Applied", (ARCH.SEQUENCE# - APPL.SEQUENCE#) "Difference"
FROM (SELECT THREAD# ,SEQUENCE# FROM V$ARCHIVED_LOG WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$ARCHIVED_LOG GROUP BY THREAD#)) ARCH,
(SELECT THREAD# ,SEQUENCE# FROM V$LOG_HISTORY WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$LOG_HISTORY GROUP BY THREAD#)) APPL
WHERE ARCH.THREAD# = APPL.THREAD# ORDER BY 1;
  2    3    4
    Thread Last Sequence Received Last Sequence Applied Difference
---------- ---------------------- --------------------- ----------
         1                  63578                 63576          2

SQL> /

    Thread Last Sequence Received Last Sequence Applied Difference
---------- ---------------------- --------------------- ----------
         1                  63580                 63578          2

SQL> /

    Thread Last Sequence Received Last Sequence Applied Difference
---------- ---------------------- --------------------- ----------
         1                  63621                 63621          0


That means from the above I can say that after my server patching is done someone had started this database from the wrong oracle home (THIS SERVER HOSTS 11G and 12c databases) this was not failed  because in my 12 home I have 11g listener also which is not supposed to be.


Now I see my database is in Sync and running fine. Hope this helps, Thanks J


0 comments:

Post a Comment

Auto Scroll Stop Scroll