Automated Binlog Archiving to Object Storage: A Production-Grade Blueprint for MySQL PITR

The operational viability of Point-in-Time Recovery (PITR) in production MySQL environments hinges on the continuous, verifiable, and immutable preservation of binary logs. Relying exclusively on local disk retention introduces unacceptable failure modes: volume exhaustion triggers premature log rotation, underlying storage degradation silently corrupts sequential segments, and ephemeral infrastructure migrations routinely discard unreplicated transaction history. When any of these occur, your recovery point objective (RPO) collapses from seconds to hours — or the recovery becomes impossible because the GTID chain has a hole in it. Modern database reliability engineering therefore requires an automated, idempotent pipeline that streams completed binlog segments to durable object storage while preserving strict GTID continuity. This architecture cleanly decouples real-time transaction logging from long-term retention, enabling recovery windows that span months or years without introducing I/O contention on primary nodes, and it turns raw log files into an auditable, replay-ready recovery asset.

This guide is the top-level reference for the archiving side of the platform. It assumes familiarity with the mechanics documented under MySQL Binary Log Architecture & GTID Fundamentals — event serialization, GTID coordinate spaces, and format selection — and builds the automation, transport, retention, security, and recovery layers on top of them.

Event & Data Model: What the Archiver Actually Observes

An archiving pipeline is only as reliable as its understanding of the server-side objects it tracks. The binary log is not a single file but an ordered sequence of segments named by a monotonically increasing suffix (mysql-bin.000001, mysql-bin.000002, …), indexed by a plaintext catalog file (mysql-bin.index) that lists every active segment in rotation order. The server appends events to the current segment until it crosses max_binlog_size or receives an explicit FLUSH BINARY LOGS, at which point it closes the segment and opens the next. Only closed segments are safe to archive; the active tail is still being written and its final size, checksum, and terminating Rotate event are not yet stable.

The archiver’s authoritative view of segment state comes from three server surfaces:

-- MySQL 8.0.22+
SHOW BINARY LOGS;              -- Log_name, File_size, Encrypted columns
SHOW MASTER STATUS;            -- current File, Position, Executed_Gtid_Set (SHOW BINARY LOG STATUS on 8.4)
SELECT @@GLOBAL.gtid_executed; -- full GTID set executed so far

SHOW BINARY LOGS enumerates every segment the server still retains locally along with its byte size and (on 8.0.14+) its encryption state. The last row is always the active segment; every row above it is closed and eligible for capture. A closed segment’s File_size is usually close to max_binlog_size, but it is not a reliable integrity check: FLUSH BINARY LOGS and server restarts close segments early, and a large transaction can push a segment past the limit because a transaction is never split across files. The Executed_Gtid_Set reported by SHOW MASTER STATUS is cumulative: it covers everything executed up to that moment, not the contents of one file. The exact interval a segment contains comes from the Previous_gtids event at the head of that segment and of its successor, and that interval is the coordinate a later recovery uses to prove the chain is gap-free.
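As a rough illustration of how a per-segment interval can be derived, the sketch below subtracts the Previous_gtids set at the head of one segment from the set at the head of the next. The helper names (parse_gtid_set, subtract, segment_interval) and the placeholder UUID are hypothetical, and the parser assumes untagged GTIDs of the form uuid:first-last.

```python
from __future__ import annotations

Interval = tuple[int, int]  # inclusive (first, last) transaction numbers


def parse_gtid_set(text: str) -> dict[str, list[Interval]]:
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: [(1, 5), (7, 7)], ...}."""
    result: dict[str, list[Interval]] = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        for r in ranges:
            first, _, last = r.partition("-")
            result.setdefault(uuid.lower(), []).append((int(first), int(last or first)))
    for intervals in result.values():
        intervals.sort()
    return result


def subtract(after: list[Interval], before: list[Interval]) -> list[Interval]:
    """Return the parts of `after` not covered by `before` (both sorted)."""
    out: list[Interval] = []
    for first, last in after:
        cursor = first
        for b_first, b_last in before:
            if b_last < cursor or b_first > last:
                continue                      # no overlap with the remainder
            if b_first > cursor:
                out.append((cursor, b_first - 1))
            cursor = max(cursor, b_last + 1)
        if cursor <= last:
            out.append((cursor, last))
    return out


def segment_interval(prev_this: str, prev_next: str) -> dict[str, list[Interval]]:
    """GTIDs written into a segment = Previous_gtids(next) minus Previous_gtids(this)."""
    later, earlier = parse_gtid_set(prev_next), parse_gtid_set(prev_this)
    diff = {u: subtract(ivs, earlier.get(u, [])) for u, ivs in later.items()}
    return {u: ivs for u, ivs in diff.items() if ivs}


# Hypothetical values for two adjacent segments.
UUID = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
print(segment_interval(f"{UUID}:1-90350", f"{UUID}:1-181120"))
```

On a live server the same subtraction is available as the built-in GTID_SUBTRACT() SQL function; doing it client-side keeps the archiver's SQL footprint small.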

A critical modeling decision is that binlog_expire_logs_seconds is a local retention control, not an archiving signal. In MySQL 8.0+ it supersedes the deprecated expire_logs_days and lets the server purge segments older than the configured window. The archiving daemon must treat expiry purely as a deadline it races against — every closed segment must be safely in object storage before the server is entitled to delete it. Modeling this correctly is where naive rsync-style scripts fail: they synchronize whatever files happen to be on disk, so a slow upload plus an aggressive expiry window silently drops segments and leaves a hole in the recovery chain. Setting these thresholds correctly is covered in depth under Binlog Retention Boundaries.

The event format inside each segment also constrains the pipeline. PITR replay is only deterministic when the server logs row images rather than statements, so the archiver should validate that binlog_format = ROW at startup and refuse to run otherwise — the reasoning behind that choice is detailed in ROW vs STATEMENT vs MIXED Formats. Row-based segments are larger and more compressible, which shapes the compression and bandwidth decisions later in the pipeline.
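A minimal sketch of that startup guard follows. It assumes the caller has already read the server variables into a plain dict (for example from SHOW GLOBAL VARIABLES); the names REQUIRED, UnsafeBinlogConfig, and assert_replay_safe are illustrative, not part of any library.

```python
from __future__ import annotations

# Variables the archiver insists on before it will start (illustrative set).
REQUIRED: dict[str, str] = {"binlog_format": "ROW"}


class UnsafeBinlogConfig(RuntimeError):
    """Raised when the server would produce segments unsuitable for PITR replay."""


def assert_replay_safe(variables: dict[str, str]) -> None:
    """Raise if any required variable is missing or has the wrong value."""
    problems = [
        f"{name}={variables.get(name)!r} (need {wanted})"
        for name, wanted in REQUIRED.items()
        if str(variables.get(name, "")).upper() != wanted
    ]
    if problems:
        raise UnsafeBinlogConfig("; ".join(problems))


assert_replay_safe({"binlog_format": "ROW"})      # passes silently
try:
    assert_replay_safe({"binlog_format": "MIXED"})
except UnsafeBinlogConfig as exc:
    print(f"refusing to start: {exc}")
```

Failing closed at boot is cheaper than discovering at recovery time that months of archived segments cannot be replayed deterministically.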

Architecture & Configuration

The reference architecture is a stateful observer running adjacent to (or on) each primary, structured as three decoupled stages: detect closed segments, enqueue them with their GTID metadata, and transfer them durably. Decoupling detection from transfer is not a stylistic choice — it prevents a slow or throttled upload from blocking the loop that watches for new rotations, which is the difference between an archiver that keeps pace under write pressure and one that falls progressively behind until expiry overtakes it.

Server-side parameters

The primary must be configured so that segments rotate predictably, carry GTID metadata, and are never purged before capture:

# my.cnf — MySQL 8.0.22+ / 8.4
[mysqld]
server_id                    = 41
log_bin                      = /var/lib/mysql/mysql-bin
binlog_format                = ROW            # deterministic row images for replay
binlog_row_image             = FULL           # complete before/after images
gtid_mode                    = ON             # GTID coordinates, not file:pos
enforce_gtid_consistency     = ON             # reject statements that cannot be logged GTID-safely
max_binlog_size              = 268435456      # 256 MiB segments — capture-friendly units
binlog_expire_logs_seconds   = 259200         # 3-day local safety net; archive well inside it
binlog_checksum              = CRC32          # per-event integrity, verified on replay
sync_binlog                  = 1              # durable flush before ack

max_binlog_size is a throughput lever as much as a sizing one: smaller segments rotate more often, shortening the window between a transaction committing and its segment becoming archivable, at the cost of more upload operations. For most write-heavy primaries, 128–256 MiB segments balance capture latency against per-object overhead. binlog_expire_logs_seconds should always be several multiples of your worst-case archiving lag so that a transient object-storage outage cannot force the server to purge an un-archived segment. The safe interaction between rotation cadence and archiver wake cycles is the subject of Rotation Scheduling & Cron Automation.
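The "several multiples" rule can be turned into a simple pre-flight check. The sketch below is only arithmetic; the function names and the default minimum multiple of 4 are assumptions chosen for illustration, and the worst-case lag figure has to come from your own measurements.

```python
def expiry_margin(expire_seconds: int, worst_case_lag_seconds: float) -> float:
    """How many worst-case archiving lags fit inside the local expiry window."""
    if worst_case_lag_seconds <= 0:
        raise ValueError("worst-case lag must be positive")
    if expire_seconds == 0:          # 0 disables automatic purging entirely
        return float("inf")
    return expire_seconds / worst_case_lag_seconds


def expiry_is_safe(expire_seconds: int, worst_case_lag_seconds: float,
                   min_multiple: float = 4.0) -> bool:
    """True when the expiry window leaves at least `min_multiple` lags of headroom."""
    return expiry_margin(expire_seconds, worst_case_lag_seconds) >= min_multiple


# 3-day window from the config above against a hypothetical 6-hour worst-case outage.
print(expiry_margin(259_200, 6 * 3600))   # 12.0
print(expiry_is_safe(259_200, 6 * 3600))  # True
```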

Object storage layout

Object storage functions as the immutable ledger of binlog history. Lay it out so that recovery is an index lookup, not a directory scan:

s3://binlogs-prod/<cluster>/<server_uuid>/
    segments/mysql-bin.000041.zst.enc
    segments/mysql-bin.000042.zst.enc
    manifests/manifest.jsonl        # one line per archived segment

Namespacing by server_uuid (not hostname) keeps the archive stable across host replacements and makes multi-primary topologies unambiguous, since the server_uuid is exactly the source identifier embedded in every GTID. Enabling object-lock / write-once-read-many retention on the bucket enforces immutability at the storage layer so that a compromised or buggy agent cannot rewrite history. Abstracting the provider-specific transport — bucket regioning, multipart thresholds, checksum algorithms — behind a single interface is covered in AWS S3 & GCS Sync Pipelines.
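One way to keep that layout consistent is to build every object key through a single helper. The function below is a sketch: segment_key, its suffix default, and the validation pattern are assumptions, with the pattern accepting only names that end in a numeric binlog suffix so index files and stray paths are rejected.

```python
import re

# Accept names like "mysql-bin.000041"; reject index files and path separators.
_SEGMENT_RE = re.compile(r"^[A-Za-z0-9_.-]+\.\d{6,}$")


def segment_key(cluster: str, server_uuid: str, segment: str,
                suffix: str = ".zst.enc") -> str:
    """Return the object key for a segment under <cluster>/<server_uuid>/segments/."""
    if not _SEGMENT_RE.match(segment):
        raise ValueError(f"not a binlog segment name: {segment!r}")
    return f"{cluster}/{server_uuid}/segments/{segment}{suffix}"


# Placeholder cluster name and UUID for illustration only.
print(segment_key("orders", "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa", "mysql-bin.000041"))
```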

Python Automation Layer

The automation layer is a typed, pooled, retried daemon. It persists an idempotent state manifest, typically at /var/lib/mysql-binlog-archiver/state.json, recording, per segment, the filename, the GTID set captured when the segment was detected, a SHA-256 digest of the local file, and the upload outcome. Persisting this across restarts is what makes archiving effectively exactly-once: a crash between upload and confirmation can cause one re-upload of the same bytes, but never a skipped or double-counted segment. On boot the daemon reconciles the manifest against SHOW BINARY LOGS and re-enqueues only segments that are closed, not yet confirmed uploaded, and still present on disk.

# archiver/detector.py — Python 3.10+
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field
from pathlib import Path

import mysql.connector
from mysql.connector.pooling import MySQLConnectionPool

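# Connection settings (host or unix_socket, user, password) are deployment-specific
# and must be passed as additional keyword arguments here.
POOL = MySQLConnectionPool(pool_name="archiver", pool_size=4, autocommit=True)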


@dataclass(slots=True, frozen=True)
class Segment:
    """A closed binlog segment eligible for archiving."""
    name: str
    size: int
    gtid_set: str
    sha256: str


@dataclass(slots=True)
class ArchiveState:
    """Idempotent, restart-safe record of what has been durably stored."""
    path: Path
    uploaded: dict[str, str] = field(default_factory=dict)  # name -> sha256

    @classmethod
    def load(cls, path: str | Path) -> "ArchiveState":
        p = Path(path)
        if p.exists():
            data = json.loads(p.read_text())
            return cls(path=p, uploaded=data.get("uploaded", {}))
        return cls(path=p)

    def confirm(self, seg: Segment) -> None:
        self.uploaded[seg.name] = seg.sha256
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps({"uploaded": self.uploaded}, indent=2))
        tmp.replace(self.path)  # atomic swap — never leave a half-written manifest


def _digest(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(1 << 20):   # walrus: stream in 1 MiB blocks
            h.update(chunk)
    return h.hexdigest()


def closed_segments(binlog_dir: Path, state: ArchiveState) -> list[Segment]:
    """Return closed segments that are not yet confirmed uploaded, in order."""
    conn = POOL.get_connection()
    try:
        cur = conn.cursor(dictionary=True)
        cur.execute("SHOW BINARY LOGS;")              # MySQL 8.0.22+
        rows = cur.fetchall()
        cur.execute("SELECT @@GLOBAL.gtid_executed AS g;")
        gtid_executed = cur.fetchone()["g"]
    finally:
        conn.close()

    pending: list[Segment] = []
    for row in rows[:-1]:                              # drop the active tail segment
        name = row["Log_name"]
        if name in state.uploaded:
            continue
        local = binlog_dir / name
        if not local.exists():
            continue
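        pending.append(
            Segment(
                name=name,
                size=int(row["File_size"]),
                # Cumulative set at detection time: an upper bound, not this
                # segment's own interval (that comes from Previous_gtids events).
                gtid_set=gtid_executed,
                sha256=_digest(local),
            )
        )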
    return pending

The transfer stage runs on asyncio so that a stalled upload never blocks detection. Every network operation is wrapped in tenacity retries with exponential backoff and jitter, and a single worker drains the queue in strict FIFO order, because confirming mysql-bin.000042 in the manifest before mysql-bin.000041 would record a chain with a hole in it if the earlier upload then failed:

# archiver/transfer.py — Python 3.10+
import asyncio
import logging

from tenacity import (
    retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential,
)

from .detector import ArchiveState, Segment

log = logging.getLogger("archiver.transfer")


class TransientUploadError(RuntimeError):
    """Retryable transport failure (5xx, throttling, connection reset)."""


@retry(
    retry=retry_if_exception_type(TransientUploadError),
    wait=wait_random_exponential(multiplier=1, max=60),
    stop=stop_after_attempt(8),
    reraise=True,
)
async def _upload(seg: Segment, store) -> None:
    # store.put performs a multipart upload with a server-verified checksum;
    # a checksum mismatch raises a non-retryable error, which drain() routes to the DLQ.
    await store.put(seg)
    log.info("archived %s (%d bytes) gtid=%s", seg.name, seg.size, seg.gtid_set)


async def drain(queue: asyncio.Queue[Segment], store, state: ArchiveState) -> None:
    """Single ordered worker: preserves segment sequence during replay."""
    while True:
        seg = await queue.get()
        try:
            await _upload(seg, store)
            state.confirm(seg)               # manifest write only after durable ack
        except Exception:                    # noqa: BLE001 — route to dead-letter queue
            log.exception("permanent failure archiving %s", seg.name)
            await store.dead_letter(seg)
        finally:
            queue.task_done()

Both the compression/encryption transform inside store.put and the retry taxonomy behind TransientUploadError are large enough topics to have their own references — see Compression & Encryption Workflows and Error Handling & Retry Logic. The queue plumbing, backpressure, and worker sizing that keep this loop stable under load are the subject of Async Processing & Queue Management.

Operational Boundaries & Retention

An archive without boundaries is a liability: it either grows without limit and inflates storage cost, or it purges too aggressively and destroys recovery capability. The pipeline enforces two independent retention horizons. Locally, binlog_expire_logs_seconds caps disk usage but must never fire before a segment is confirmed in the manifest. The read-only archiver cannot block a purge itself, so it must alert well before an unconfirmed segment listed by SHOW BINARY LOGS reaches the expiry window, giving an operator time to widen the window or clear the upload fault. Remotely, object-storage lifecycle rules govern long-term retention and cold-tiering, and these are the horizon that actually defines your recovery window.

The manifest is the index that makes the archive usable. Each line records the segment name, its GTID interval, byte size, SHA-256, compressed/encrypted object key, and archive timestamp:

{"segment":"mysql-bin.000041","gtid":"3E11FA47-...:1-90350","key":"segments/mysql-bin.000041.zst.enc","sha256":"9f2c...","bytes":268435456,"archived_at":"2026-07-04T09:12:03Z"}
{"segment":"mysql-bin.000042","gtid":"3E11FA47-...:90351-181120","key":"segments/mysql-bin.000042.zst.enc","sha256":"a71b...","bytes":268435456,"archived_at":"2026-07-04T09:19:41Z"}

Because each line carries the GTID interval, a recovery orchestrator can map any target GTID to the exact set of objects it must download in O(log n) rather than scanning the whole bucket. The one non-negotiable cross-reference is with your base backups: the archived chain is only useful if it overlaps the GTID position captured by the most recent physical or logical snapshot. If the oldest retained binlog begins after the newest usable backup ends, there is an unrecoverable gap. Keeping those horizons overlapping is the entire point of Base Backup Integration for PITR, and it must be validated continuously, not assumed.
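The O(log n) lookup can be sketched with a binary search over the manifest intervals. This example assumes the simplest case shown in the sample lines: one source UUID, one contiguous interval per segment, and non-overlapping intervals. ManifestIndex and keys_for are hypothetical names, and the gtid values in the demo are placeholders.

```python
import bisect
import json


class ManifestIndex:
    """In-memory index over manifest.jsonl lines, sorted by GTID interval."""

    def __init__(self, lines):
        rows = []
        for line in lines:
            rec = json.loads(line)
            _uuid, _, span = rec["gtid"].rpartition(":")
            first, _, last = span.partition("-")
            rows.append((int(first), int(last or first), rec["key"]))
        rows.sort()
        self._rows = rows
        self._lasts = [r[1] for r in rows]   # built once; lookups are O(log n)

    def keys_for(self, start: int, target: int) -> list[str]:
        """Object keys covering transactions start..target, in replay order."""
        if start > target:
            raise ValueError("start must not exceed target")
        lo = bisect.bisect_left(self._lasts, start)    # first segment ending at/after start
        hi = bisect.bisect_left(self._lasts, target)   # first segment ending at/after target
        if hi == len(self._rows) or self._rows[lo][0] > start:
            raise LookupError("archive does not cover the requested range")
        return [r[2] for r in self._rows[lo:hi + 1]]


lines = [
    '{"gtid": "src:1-90350", "key": "segments/mysql-bin.000041.zst.enc"}',
    '{"gtid": "src:90351-181120", "key": "segments/mysql-bin.000042.zst.enc"}',
]
index = ManifestIndex(lines)
print(index.keys_for(90000, 100000))
```

The lookup does not prove the intervals in between are contiguous; that is a separate check against the whole manifest.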

A weekly reconciliation job should walk the manifest, confirm every referenced object still exists with a matching checksum, and assert that the GTID intervals form a contiguous, gap-free set from the base-backup position to the live tail. Any discontinuity is a recovery-blocking incident and should page immediately.
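The contiguity assertion at the heart of that job is small. The sketch below assumes intervals have already been extracted from the manifest as inclusive (first, last) pairs for a single source UUID, and that base_position is the highest transaction number contained in the base backup; find_gaps is a hypothetical helper name.

```python
from __future__ import annotations


def find_gaps(intervals: list[tuple[int, int]], base_position: int) -> list[tuple[int, int]]:
    """Return every missing (first, last) range after base_position."""
    gaps: list[tuple[int, int]] = []
    covered = base_position                 # highest txn number covered so far
    for first, last in sorted(intervals):
        if first > covered + 1:             # hole between what we have and this segment
            gaps.append((covered + 1, first - 1))
        covered = max(covered, last)
    return gaps


print(find_gaps([(1, 90350), (90351, 181120)], base_position=80000))  # []
print(find_gaps([(1, 100), (151, 200)], base_position=50))            # [(101, 150)]
```

A non-empty result is the condition that should page: each tuple names the exact transaction range that can no longer be replayed from the archive.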

Security & Compliance Hardening

Binary logs contain the full plaintext of every row change — including PII, credentials, and financial data — so the archive is one of the most sensitive artifacts the platform produces. Hardening spans four planes: identity, transport, at-rest, and audit.

Privilege separation. The archiver reads binlog metadata and files; it must never hold write access to the database. Grant it the minimal set:

-- MySQL 8.0.22+
CREATE USER 'binlog_archiver'@'localhost' IDENTIFIED BY '<generated>';
GRANT REPLICATION CLIENT ON *.* TO 'binlog_archiver'@'localhost';  -- SHOW BINARY LOGS / MASTER STATUS
-- No SUPER, no RELOAD, no FILE, no DDL. Reading segment bytes is a filesystem grant, not a SQL one.

Filesystem access to the binlog directory is granted through a dedicated OS group so the daemon can read closed segments without running as mysql or root. The broader privilege model — roles, network ACLs, and credential scoping for binlog access — is documented in Security & Access Frameworks.

Encryption in transit and at rest. All uploads use TLS 1.2+, and every segment is encrypted client-side before it leaves the host so that plaintext row images never exist in object storage even momentarily. Envelope encryption with a KMS-managed data key is the recommended pattern; the concrete key-rotation and cipher choices live in Compression & Encryption Workflows.

Secret management. Object-storage and KMS credentials must be short-lived and delivered at runtime — an instance role, a Vault dynamic secret, or a workload-identity token — never a static key baked into config or an environment file committed to a repo. The daemon should refuse to start if it detects a long-lived static credential in its environment.
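A startup guard along those lines might look like the following. It is an AWS-flavoured heuristic and an assumption on my part, not something the pipeline above prescribes: it treats an access key in the environment with no accompanying session token, or one with the long-term AKIA prefix, as a static credential. Other providers would need their own checks.

```python
import os
from collections.abc import Mapping


class StaticCredentialError(RuntimeError):
    """Raised when the environment appears to hold a long-lived access key."""


def assert_no_static_credentials(env: Mapping[str, str] = os.environ) -> None:
    """Heuristic: temporary AWS credentials always come with a session token."""
    key_id = env.get("AWS_ACCESS_KEY_ID", "")
    if not key_id:
        return                                   # nothing injected; role-based auth
    if key_id.startswith("AKIA") or not env.get("AWS_SESSION_TOKEN"):
        raise StaticCredentialError(
            "long-lived access key detected in environment; use a role or dynamic secret"
        )


assert_no_static_credentials({})                 # instance role: nothing in env
try:
    assert_no_static_credentials({"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE"})
except StaticCredentialError as exc:
    print(f"refusing to start: {exc}")
```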

Audit hooks. Every archive and every recovery-time fetch emits a structured, tamper-evident log line (who, which object, which GTID interval, what checksum). Combined with bucket object-lock, this produces the audit trail that SOC 2, PCI-DSS, and similar regimes expect for a system that persists transaction history.
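One common way to make log lines tamper-evident is a hash chain, where each record embeds the hash of its predecessor. The sketch below illustrates that idea only; the field names, the all-zero genesis value, and the functions audit_line and verify_chain are assumptions, and a production system would also ship the records to storage the agent cannot rewrite.

```python
from __future__ import annotations

import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first record's "prev" field


def _digest(body: dict) -> str:
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode()).hexdigest()


def audit_line(prev_hash: str, *, actor: str, action: str, key: str,
               gtid: str, sha256: str) -> dict:
    """Build one audit record chained to the previous record's hash."""
    body = {"actor": actor, "action": action, "key": key,
            "gtid": gtid, "sha256": sha256, "prev": prev_hash}
    return {**body, "hash": _digest(body)}


def verify_chain(records: list[dict], genesis: str = GENESIS) -> bool:
    """True when every record links to its predecessor and matches its own hash."""
    prev = genesis
    for rec in records:
        body = {k: v for k, v in rec.items() if k != "hash"}
        if body.get("prev") != prev or _digest(body) != rec.get("hash"):
            return False
        prev = rec["hash"]
    return True


first = audit_line(GENESIS, actor="archiver", action="put",
                   key="segments/mysql-bin.000041.zst.enc", gtid="1-90350", sha256="9f2c")
second = audit_line(first["hash"], actor="restore", action="get",
                    key="segments/mysql-bin.000041.zst.enc", gtid="1-90350", sha256="9f2c")
print(verify_chain([first, second]))
```

Editing or dropping any earlier record changes its hash, which breaks the link stored in every later record.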

Performance & Scale Tuning

The pipeline must keep archiving latency comfortably below the local retention window on your busiest primary. On a write-heavy 8.0 node rotating 256 MiB segments every 40–90 seconds under load, a single ordered worker doing zstd level 3 compression plus AES-GCM encryption plus an 8 MiB-part multipart upload sustains roughly 250–400 MiB/s per host on modern hardware — enough headroom that archiving lag stays in the low single-digit seconds. The knobs that matter, in order of impact:

Compression level. zstd level 1–3 typically shrinks ROW binlogs by 60–80% while staying CPU-cheap; higher levels buy marginal ratio gains at steep CPU cost. Because row images compress well, compression is almost always a net egress-cost win rather than a bottleneck. The trade-off surface is mapped in Compression & Encryption Workflows.
Multipart part size. 8–16 MiB parts balance throughput against retry granularity: smaller parts re-transmit cheaply after a transient failure but add request overhead; larger parts maximize per-connection throughput.
Ordered vs parallel workers. Sequence integrity requires that segments commit to the manifest in order, but the transform stage (compress + encrypt) can be parallelized ahead of an ordered upload barrier. A bounded worker pool feeding a single ordered committer preserves replay ordering while using all available cores.
Queue depth and backpressure. The asyncio.Queue must be bounded so that a downstream stall applies backpressure to detection rather than exhausting memory. Tuning depth, worker count, and backpressure signals is the focus of Async Processing & Queue Management.
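The "parallel transform, ordered commit" pattern from the list above can be sketched with plain asyncio. This is an illustration under stated assumptions: zlib from the standard library stands in for zstd plus encryption, segments are passed as in-memory bytes for brevity, and archive_in_order is a hypothetical function name. The upload and manifest confirmation would replace the marked comment.

```python
from __future__ import annotations

import asyncio
import zlib


async def archive_in_order(segments: list[tuple[str, bytes]],
                           max_parallel: int = 4) -> list[str]:
    """Transform segments concurrently, but commit them strictly in input order."""
    limit = asyncio.Semaphore(max_parallel)      # bounds concurrent CPU-heavy work

    async def transform(data: bytes) -> bytes:
        async with limit:
            # zlib is a stand-in; run off the event loop so detection is never blocked.
            return await asyncio.to_thread(zlib.compress, data, 3)

    tasks = [asyncio.create_task(transform(data)) for _, data in segments]

    committed: list[str] = []
    for (name, _), task in zip(segments, tasks):
        payload = await task                     # barrier: wait for THIS segment first
        # Upload `payload` and confirm `name` in the manifest here.
        committed.append(name)
    return committed


segments = [("mysql-bin.000041", b"a" * 100_000), ("mysql-bin.000042", b"b" * 10)]
print(asyncio.run(archive_in_order(segments)))
```

Later segments may finish transforming first, but because the commit loop awaits tasks in submission order, nothing is confirmed ahead of its predecessor.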

Track archiving lag — the age of the oldest closed-but-unconfirmed segment — as your primary SLI, and alert well before it approaches binlog_expire_logs_seconds. That single metric captures every failure mode that threatens the recovery window.
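Computing that SLI takes only a few lines. The sketch assumes the daemon records a close timestamp per segment (for instance the file's modification time) and keeps the set of confirmed names from the manifest; the function names and the 25% / 50% alert thresholds are illustrative choices, not recommendations from the text above.

```python
from __future__ import annotations


def archiving_lag(closed_at: dict[str, float], confirmed: set[str], now: float) -> float:
    """Age in seconds of the oldest closed-but-unconfirmed segment (0 if none)."""
    pending = [ts for name, ts in closed_at.items() if name not in confirmed]
    return max(0.0, now - min(pending)) if pending else 0.0


def lag_status(lag_seconds: float, expire_seconds: int,
               warn_at: float = 0.25, page_at: float = 0.5) -> str:
    """Classify lag as a fraction of the local expiry window."""
    if expire_seconds <= 0:              # 0 means the server never auto-purges
        return "ok"
    used = lag_seconds / expire_seconds
    if used >= page_at:
        return "page"
    if used >= warn_at:
        return "warn"
    return "ok"


closed = {"mysql-bin.000041": 100.0, "mysql-bin.000042": 200.0}
lag = archiving_lag(closed, confirmed={"mysql-bin.000041"}, now=260.0)
print(lag, lag_status(lag, 259_200))     # 60.0 ok
```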

Recovery Orchestration & Fallback Routing

Archiving exists to serve recovery, and a chain that has never been replayed is a chain you do not actually have. Recovery orchestration starts from a base backup’s GTID position, resolves the ordered set of objects that carry the interval from that position to the target, downloads and decrypts them, verifies each SHA-256 against the manifest, and streams them through mysqlbinlog with a precise stop condition:

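# Replay archived segments up to a target time onto an instance restored from the base backup.
# --start-position=4 reads the first file from its first event; transactions already in the
# target's gtid_executed are skipped when applied.
# MySQL 8.0.22+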
mysqlbinlog --verify-binlog-checksum \
  --start-position=4 \
  --stop-datetime="2026-07-04 09:17:30" \
  mysql-bin.000041 mysql-bin.000042 \
  | mysql --host=recovery-target

Operators rarely replay to the live tail; they target a transaction boundary just before a faulty deployment or an accidental DELETE. Selecting the exact --stop-datetime or --stop-position, and reasoning about the difference between the two, is covered in Timestamp Targeting Strategies.

Three orchestration properties separate a real system from a hopeful one:

Dry-run mode. The orchestrator can resolve the object set, verify every checksum, and diff the GTID intervals for gaps without applying anything — turning “can we recover?” from a faith statement into a scheduled, passing test.
GTID gap handling. Before replay, diff the required interval against the union of manifest intervals. Any missing sub-range is surfaced as an explicit, named gap before anything is applied. MySQL accepts non-contiguous GTIDs, so an undetected gap would otherwise show up only as a row event failing against missing data mid-replay, or as silent divergence.
Graceful degradation. If the archive cannot satisfy a target — a gap, a corrupt object, an unreachable bucket — the orchestrator falls back to the most recent physical backup and reports the resulting RPO honestly instead of silently applying a partial chain. The decision logic and routing for that fallback is documented in Fallback Routing Strategies.
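The three properties above fit together in a dry-run routine that inspects everything and applies nothing. The sketch below is one possible shape, with assumptions labeled: manifest entries are dicts with first, last, key, and sha256 fields, fetch is any callable that returns the decrypted bytes for a key and raises OSError when the bucket is unreachable, and DryRunReport and dry_run are hypothetical names.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DryRunReport:
    gaps: list[tuple[int, int]] = field(default_factory=list)
    corrupt: list[str] = field(default_factory=list)
    unreachable: list[str] = field(default_factory=list)

    @property
    def route(self) -> str:
        """'replay' only when the archive fully satisfies the target."""
        return "fallback" if (self.gaps or self.corrupt or self.unreachable) else "replay"


def dry_run(entries: list[dict], fetch: Callable[[str], bytes],
            start: int, target: int) -> DryRunReport:
    """Check coverage and checksums for transactions start..target; apply nothing."""
    report = DryRunReport()
    covered = start - 1
    for e in sorted(entries, key=lambda e: e["first"]):
        if e["last"] < start or e["first"] > target:
            continue                               # segment not needed for this target
        if e["first"] > covered + 1:
            report.gaps.append((covered + 1, e["first"] - 1))
        covered = max(covered, e["last"])
        try:
            data = fetch(e["key"])
        except OSError:
            report.unreachable.append(e["key"])
            continue
        if hashlib.sha256(data).hexdigest() != e["sha256"]:
            report.corrupt.append(e["key"])
    if covered < target:
        report.gaps.append((covered + 1, target))  # archive ends before the target
    return report


blobs = {"k41": b"segment-41", "k42": b"segment-42"}
entries = [
    {"first": 1, "last": 90350, "key": "k41", "sha256": hashlib.sha256(blobs["k41"]).hexdigest()},
    {"first": 90351, "last": 181120, "key": "k42", "sha256": hashlib.sha256(blobs["k42"]).hexdigest()},
]
print(dry_run(entries, blobs.__getitem__, start=90000, target=100000).route)
```

Scheduling this against a recent target and alerting on any route other than "replay" is what turns recoverability into a passing test rather than an assumption.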

Frequently Asked Questions

Why does gtid_purged mismatch when I restore an archived chain?

gtid_purged mismatches almost always mean the recovery target’s executed set does not line up with where the archived chain begins. If you restore a base backup and then replay binlogs, the target’s gtid_executed must exactly equal the backup’s captured position before replay starts. A mismatch signals either that the base backup and the first archived segment do not overlap (a genuine gap), or that gtid_purged was not set on the restored instance. Set gtid_purged to the backup’s GTID set on the fresh instance before replaying, and validate the overlap with the manifest’s first interval.

Is binlog_expire_logs_seconds a safe trigger for archiving?

No. It is a local-disk retention deadline, not an archiving signal. The archiver must capture and confirm each closed segment well before expiry fires. Treating expiry as a trigger means you only archive files at the moment the server is about to delete them — leaving zero margin for a slow upload or a transient outage. Set the expiry window to several multiples of your worst-case archiving lag and alert if any un-confirmed segment approaches it.

Can I upload binlog segments in parallel to go faster?

You can compress and encrypt in parallel, but segments must commit to the manifest in order. Confirming mysql-bin.000042 before mysql-bin.000041 lets the manifest claim a chain that has a hole in it if the earlier upload then fails, and replay applies row events in commit order. Use a parallel transform stage feeding a single ordered commit barrier: you get multi-core throughput without breaking replay ordering.

What happens if an upload’s checksum does not match?

A checksum mismatch is a non-retryable failure: retrying will not fix a corrupted payload. The worker routes the segment to a dead-letter queue and pages, rather than confirming it in the manifest. The segment stays un-confirmed, so it is neither counted as archived nor eligible to satisfy a recovery. Investigate the source (disk corruption, truncated read, encryption bug) before re-archiving from the local file while it still exists on disk.

Do I need binlog_format=ROW for this to work?

For deterministic PITR, yes. Statement-based logs can replay non-deterministic statements (UUID(), SYSDATE(), or an UPDATE with LIMIT but no ORDER BY) differently on the recovery target, producing drift. The archiver should assert binlog_format = ROW and binlog_row_image = FULL at startup and refuse to run otherwise. The full comparison of the trade-offs is in ROW vs STATEMENT vs MIXED Formats.

How do I prove the archive is actually recoverable?

Run a scheduled dry-run: resolve the object set for a recent target, verify every checksum against the manifest, and diff the GTID intervals for contiguity — all without applying anything. Escalate to a periodic full restore drill into a throwaway instance to measure real RTO. A chain that has never been replayed end-to-end should not be trusted, regardless of how green the upload dashboards look.

Related

Async Processing & Queue Management — bounded queues, backpressure, and worker sizing for the transfer stage.
AWS S3 & GCS Sync Pipelines — provider-abstracted multipart transport and checksum validation.
Compression & Encryption Workflows — zstd tuning and client-side envelope encryption before upload.
Base Backup Integration for PITR — keeping the archived chain overlapping your snapshot’s GTID position.
Timestamp Targeting Strategies — selecting the precise --stop-datetime / --stop-position for surgical recovery.
