Overview:
An archive gap is a
range of missing archived redo logs created whenever the standby system is
unable to receive the next archived redo log generated by the
primary db.
For example, an archive gap
occurs when the network becomes unavailable and automatic archiving from the
primary database to the standby database stops. When the network is available
again, automatic transmission of the redo data from the primary database to the
failed standby database resumes.
Possible and common Causes for Archive Gaps are:
- Network Disconnects
or stop of Log Transport Services
- Outages of the
Standby Database
- Misconfigurations of
Log Transport Services
- I/O-Issues on the
Standby Site
- Manual Deletion of
ArchiveLogs before they are applied to the Standby
- Insufficient
Bandwith in the Network between the Primary and Standby Site
Methods of Gap Resolution:
Data
Guard provides 2 methods for gap resolution, Automatic and FAL (Fetch Archive Log). The
automatic method requires no configuration while FAL requires configuration via
init.ora parameters.
Automatic Gap Resolution:
In both 9.0.1 and
9.2.0 Automatic Gap Resolution is implemented during log transport processing.
As the LGWR or ARCH process begins to send redo over to the standby, the sequence
number of the log being archived is compared to the last sequence received by
the RFS process on the standby. If the RFS process detects that the archive log
being received is greater than the last sequence received plus one, then the
RFS will piggyback a request to the primary to send the missing archive logs.
Since the standby destination requesting the gap resolution is already defined
by the LOG_ARCHIVE_DEST_n parameter on the primary, the ARCH
process on the primary sends the logs to the standby and notifies the LGWR that
the gaps have been resolved.
Starting
in 9.2.0, automatic gap resolution has been enhanced. In addition to the
above, the ARCH process on the primary database polls all standby databases
every minute to see if there is a gap in the sequence of archived redo logs. If
a gap is detected then the ARCH process sends the missing archived redo log
files to the standby databases that reported the gap. Once the gap is resolved,
the LGWR process is notified that the site is up to date.
FAL Gap Resolution:
As the RFS process on
the standby receives an archived log, it updates the standby controlfile with
the name and location of the file. Once the MRP
(Media Recovery Process) sees
the update to the controlfile, it attempts to recover that file. If the MRP
process finds that the archived log is missing or is corrupt, FAL is called to
resolve the gap or obtain a new copy. Since MRP has no direct communications
link with the primary, it must use the FAL_SERVER and FAL_CLIENT initialization parameters to resolve
the gap.Both of these parameters must be set in the standby init.ora. The 2
parameters are defined as:
FAL_SERVER: An OracleNet service
name that exist in the standby tnsnames.ora file that points to the primary
db listener. The FAL_SERVER parameter can contain a comma delimited list of
locations that should be attempted during gap resolution.
FAL_CLIENT: An OracleNet service
name that exist in the primary tnsnames.ora file that points to the standby
database listener. The value of FAL_CLIENT should also be listed as the service
in a remote archive destination pointing to the standby.
Once MRP needs to resolve a gap
it uses the value from FAL_SERVER to call the primary database. Once
communication with the primary has been established, MRP passes the FAL_CLIENT
value to the primary ARCH process. The primary ARCH process locates the remote
archive destination with the corresponding service name and ships the missing
archived redo logs. If the first destination listed in FAL_SERVER is unable to
resolve the gap then the next destination is attempted until either the gap is
resolved or all FAL_SERVER destination have been tried.
As of 9.2.0 FAL Gap Resolution
only works with Physical Standby databases as the process is tied to MRP. Gap
recovery on a logical standby database is handled through the
heartbeat mechanism.
Simulating Gap Recovery
The follow steps can
be used to illustrate and verify both automatic and FAL gap recovery. As the
steps involve shutting down the standby database, which can impact disaster
recovery, it is recommended to perform these procedures in a test environment.
Automatic Gap Resolution:
1. Shutdown the physical standby
database.
2. Determine the
current sequence on the primary database.
3. Perform at least
three log switches on the primary database.
4. Verify that the
logs did not get transferred to the standby archive dest.
5. Start the standby
database.
6. Perform a log
switch on the primary and verify that the gap gets resolved on the standby.
FAL Gap Resolution:
1. In the standby init.ora define
the fal_server and fal_client parameters.
2. Bounce the standby
database so that the parameters are put into effect.
3. Perform three log
switches on the primary database.
4. In the
standby_archive_dest directory delete the middle archive log on the standby.
5. Start managed
recovery and verify that the gap is resolved by FAL_SERVER and FAL_CLIENT.
Manually Resolving a Gap:
In some rare cases it
might be necessary to manually resolve gaps. The following section describes how to
query the appropriate views to determine if a gap exists.
On your physical standby database:
Query the V$ARCHIVE_GAP view:
SQL> SELECT * FROM
V$ARCHIVE_GAP;
THREAD# LOW_SEQUENCE#
HIGH_SEQUENCE#
-----------
-------------
--------------
1
443
446
The query results show that your
physical standby database is currently missing logs from sequence 443 to
sequence 446 for thread 1. After you identify the gap, issue the following
SQL statement on the primary database to locate the archived redo logs on
your primary database:
SQL> SELECT NAME FROM
V$ARCHIVED_LOG WHERE THREAD#=1 AND
DEST_ID=1 AND SEQUENCE# BETWEEN 443 AND 446;
NAME
-------------------------------------------------------
/u01/oradata/arch/arch_1_443.arc
/u01/oradata/arch/arch_1_444.arc
/u01/oradata/arch/arch_1_445.arc
Copy the logs returned by the
query to your physical standby database and
register using the
ALTER DATABASE REGISTER LOGFILE command.
SQL> ALTER DATABASE
REGISTERLOGFILE '/u01/oradata/stby/arch/arch_1_443.arc';
SQL> ALTER
DATABASE REGISTER LOGFILE '/u01/oradata/stby/arch/arch_1_444.arc';
SQL> ALTER
DATABASE REGISTER LOGFILE '/u01/oradata/stby/arch/arch_1_445.arc';
Once the log files have been
registered in the standby controlfile, you can restart the MRP process.
On a logical standby
database:
Query the DBA_LOGSTDBY_LOG view.
SQL> SELECT THREAD#, SEQUENCE#, FILE_NAME FROM DBA_LOGSTDBY_LOG
L WHERE NEXT_CHANGE# NOT IN (SELECT
FIRST_CHANGE# FROM DBA_LOGSTDBY_LOG WHERE L.THREAD# =THREAD#) ORDER
BY THREAD#,SEQUENCE#;
THREAD#
SEQUENCE#
FILE_NAME
----------
----------
-----------------------------------------------
1
451
/u01/oradata/logical_stby/arch/arch_1_451.arc
1
453
/u01/oradata/logical_stby/arch/arch_1_453.arc
Copy
the missing logs to the logical standby system and register them using the ALTER DATABASE REGISTER LOGICAL
LOGFILE statement on your
logical standby db.
SQL>
ALTER DATABASE REGISTER LOGICAL LOGFILE
/u01/oradata/logical_stby/arch/arch_1_452.arc;
After
you register these logs on the logical standby database, you can restart log apply services.
Ref:
https://sysdbagurus.wordpress.com/2017/04/03/data-guard-gap-detection-and-resolution/
Data Guard Gap Detection and Resolution Possibilities (Doc ID 1537316.1)