📋Steps in this guide1/15

How I Set Up Archive Log Lag Monitoring in Oracle Data Guard?.. And Why You Should Too

Learn how to set up archive log lag monitoring in Oracle Data Guard using shell scripts, crontab, & real-time alerts to protect your standby database.

oracle configurationintermediate

by OracleDba

29 views

Why This Matters (And What Happened With Me)

A couple of years ago, I was handling a DR setup for a logistics client running round-the-clock operations. One night, the primary database went down , and we had to failover to the standby . But the standby was almost 2 hours behind because the MRP process had silently failed, and nobody noticed. It was a mess. After that, I made it a non-negotiable rule for myself: Always monitor log lag, transport delay, and apply lag, and alert yourself before your NOC does .

What Is Archive Log Lag, Again?

If you’re new to Oracle Data Guard or just never gave this much attention, let me quickly explain. There are two types of lags that you should monitor: - Transport Lag : Time delay in sending archive logs from primary to standby - Apply Lag : Delay in applying those logs to the standby DB For example, if your primary shipped archive log 1000 at 10:00 AM, and your standby applied it at 10:10 AM, that’s a 10-minute apply lag . Sometimes it’s 2 minutes, sometimes 2 hours, and without monitoring, you’ll never know.

How You Can Check Lag Manually

If you want to see your current lag, log in to the standby database and run this: ou’ll get output like: That means your logs are 3 minutes late in arriving and 7 minutes late in being applied.

Code/Command (click line numbers to comment):

sql
SELECT NAME, VALUE, UNIT 
FROM V$DATAGUARD_STATS 
WHERE NAME IN ('transport lag', 'apply lag');

sql
NAME VALUE UNIT -------------- ----------- --------------------- 
transport lag +00 00:03:00 day(2) to second(0) 
apply lag +00 00:07:00 day(2) to second(0)

How I Monitor Archive Log Lag Using a Shell Script

Here’s the simple shell script I use in most projects. It runs via cron every 5 minutes and shoots out a mail if the apply lag crosses a set threshold (say 10 minutes).

Script: archive_lag_check.sh

This small script has saved me so many times , especially in high-load windows like year-end processing or sudden redo surges.

Code/Command (click line numbers to comment):

bash
#!/bin/bash

# Set environment
. ~/.bash_profile

export ORACLE_SID=your_sid
export ORAENV_ASK=NO
. oraenv

THRESHOLD="00:10:00"

LAG=$(sqlplus -s / as sysdba <<EOF
SET pagesize 0 feedback off verify off heading off echo off
SELECT VALUE FROM V\\$DATAGUARD_STATS WHERE NAME='apply lag';
EOF
)

LAG=${LAG#+}
echo "Apply Lag: $LAG"

if [[ "$LAG" > "$THRESHOLD" ]]; then
  echo "ALERT: Apply Lag is $LAG" | mailx -s "Data Guard Apply Lag Alert"
[email protected]
fi

Automating It with Crontab

Add this to your crontab like: It checks every 5 minutes and logs the result. You can set up SMS alerts, Teams notifications, or Grafana dashboards depending on your infra.

Code/Command (click line numbers to comment):

1
2

bash
*/5 * * * * /u01/scripts/archive_lag_check.sh >> /u01/logs/lag_check.log 2>&1

Real-Time Log Sequence Tracking

Sometimes, I just want to know the log difference between primary and standby. Run this on both:

On Primary:

Code/Command (click line numbers to comment):

1
2

sql
SELECT MAX(SEQUENCE#) FROM V$ARCHIVED_LOG WHERE DEST_ID=1;

On Standby:

If primary is at 1020 and standby is at 1018, you’ve got a 2 log lag .

Code/Command (click line numbers to comment):

1
2
3

sql
SELECT MAX(SEQUENCE#) FROM V$ARCHIVED_LOG WHERE APPLIED='YES';
_ID=1;

Common Reasons for Lag (From My Experience)

I’ve seen several causes across client environments: - Slow network or firewall issues - Disk I/O bottlenecks on standby - Heavy DML activity causing redo spikes - MRP or LNS process stuck silently - Archivelog destination full or inaccessible Sometimes it’s as silly as an OS patch that accidentally disabled the listener. Always check alert logs and trace files when things feel off.

How Much Lag Is Okay?

Honestly, it depends on the business. - For banking or real-time trading , I’ve seen RPO < 30 seconds - For mid-level CRMs , 5–10 mins lag is tolerable - For daily batch systems , even 30–60 mins can be okay But never assume, always align your alerting threshold with the business team’s recovery expectations .

Pro Tip: Add This to Your DR Checklist

Here’s what I always advise my juniors and even clients: > “Your standby is your safety net. If you’re not watching it, it’s just a false sense of security.” “Your standby is your safety net. If you’re not watching it, it’s just a false sense of security.” Make archive lag checks part of your daily DBA checklist . Don’t leave it to chance.

Advanced Setup (For DevOps-Ready Teams)

If you’re using Prometheus + Grafana , you can integrate Oracle exporter and build dashboards that show: - Transport lag trends - Apply lag spikes - Log apply rate per hour - Real-time sequence difference This kind of visibility is gold when you’re handling multi-TB databases and mission-critical workloads .

Summary

Here’s a quick wrap-up checklist: - Monitor transport & apply lag using V$DATAGUARD_STATS - Use shell script + cron + mail alerts - Align thresholds with RPO needs - Regularly compare sequence# on both ends - Investigate alert logs when things look slow - Never assume things are fine unless you’ve checked

Final Thoughts

Being an Oracle DBA isn’t just about keeping the database “up”, it’s about being proactive . Archive log lag might sound like a small detail, but trust me, it can ruin your DR plan if ignored. If you found this helpful, feel free to drop a message. I can also share a follow-up post on Data Guard broker-based monitoring or using OEM custom metrics . Let’s keep learning, keep improving, and most importantly, keep our databases safe .

Comments (0)

Please to add comments

No comments yet. Be the first to comment!