AWS S3 & GCS Sync Pipelines for MySQL Binary Log Archiving and PITR Automation

The transport layer is where a binary log archive quietly becomes unrecoverable. Once the Compression & Encryption Workflows stage has sealed a closed segment, the sync pipeline is responsible for one deceptively hard guarantee: every byte MySQL committed lands in durable object storage exactly once, in strict sequence order, with a checksum the recovery run can trust — and it must survive rate limits, DNS blips, and regional endpoint outages without ever confirming a corrupt or missing object. This page defines a provider-abstracted pipeline that routes to both AWS S3 and Google Cloud Storage (GCS) behind a single interface, so a single-provider outage degrades to a second target instead of a stalled binlog_expire_logs_seconds purge and a hole in your Point-in-Time Recovery (PITR) chain. Naive approaches fail because they treat the upload as a fire-and-forget PUT: they ignore the ETag/CRC the provider already computed, retry non-retryable corruption, and let a duplicate delivery overwrite a good object with a truncated one. This is a component of the broader Automated Binlog Archiving to Object Storage pipeline and assumes segments arrive from the Async Processing & Queue Management worker pool in per-instance order.

Core Concept & Prerequisites

The pipeline operates on a strict pull-and-commit model: it never assumes network stability or local disk persistence, and it treats the object store — not local disk — as the system of record only after a checksum has been verified back from the provider. A lightweight state manifest (an embedded SQLite database or an atomically-rewritten JSON file co-located with the datadir) records the last successfully committed binlog index, the remote checksum, and the ingestion timestamp. Restarting after a host crash, OOM kill, or network partition resumes exactly where it left off without duplicating artifacts or skipping a sequence number.
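The manifest described above can be sketched with the standard-library sqlite3 module. This is a minimal illustration under stated assumptions, not the pipeline's actual schema: the table name manifest, its columns, and the helpers open_manifest, last_committed, and commit_segment are all hypothetical names introduced here.

```python
from __future__ import annotations

import sqlite3
import time

# Hypothetical schema: one row per verified segment, keyed by instance and index.
SCHEMA = """
CREATE TABLE IF NOT EXISTS manifest (
    server_uuid  TEXT    NOT NULL,
    binlog_index INTEGER NOT NULL,   -- numeric suffix, e.g. 41 for mysql-bin.000041
    sha256       TEXT    NOT NULL,   -- digest confirmed against the stored object
    ingested_at  REAL    NOT NULL,   -- epoch seconds at commit time
    PRIMARY KEY (server_uuid, binlog_index)
)
"""


def open_manifest(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def last_committed(conn: sqlite3.Connection, server_uuid: str) -> int | None:
    """Highest committed index for this instance, or None on a fresh manifest."""
    row = conn.execute(
        "SELECT MAX(binlog_index) FROM manifest WHERE server_uuid = ?",
        (server_uuid,),
    ).fetchone()
    return row[0]


def commit_segment(conn: sqlite3.Connection, server_uuid: str,
                   binlog_index: int, sha256: str) -> bool:
    """Record a verified segment. Returns False if the row already existed."""
    with conn:  # single transaction: commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO manifest VALUES (?, ?, ?, ?)",
            (server_uuid, binlog_index, sha256, time.time()),
        )
    return cur.rowcount == 1
```

On restart, last_committed gives the resume point, and the primary key makes a repeated commit of the same segment a no-op rather than a duplicate row.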

Ordering is the non-negotiable invariant. Replaying mysql-bin.000042 before mysql-bin.000041 applies events out of commit order, which corrupts the recovery chain and leaves Timestamp Targeting Strategies without a reliable stop point, so the manifest commits segments strictly in sequence even when uploads run concurrently. A gap-free GTID Tracking & Enforcement pipeline upstream is what makes that ordering meaningful: every segment must carry a resolvable gtid_executed range, or ordered transport buys you nothing.
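One way to keep commits in sequence while uploads finish out of order is a small reordering buffer in front of the manifest. The sketch below is an assumption about how such a barrier could look; CommitBarrier and mark_verified are hypothetical names, and a real deployment with concurrent workers would need a lock around mark_verified.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CommitBarrier:
    """Releases verified segments to the manifest strictly in index order."""
    next_index: int                                   # first index not yet committed
    _pending: set[int] = field(default_factory=set)   # verified, waiting on a predecessor

    def mark_verified(self, index: int) -> list[int]:
        """Register a verified upload; return the indexes now safe to commit, in order."""
        if index < self.next_index:
            return []                 # duplicate delivery of an already committed segment
        self._pending.add(index)
        ready: list[int] = []
        # Drain the contiguous run that starts at next_index, and stop at the first gap.
        while self.next_index in self._pending:
            self._pending.remove(self.next_index)
            ready.append(self.next_index)
            self.next_index += 1
        return ready
```

If segment 42 verifies before 41, it is held; once 41 arrives, both are released together in order, so the manifest never records a chain with a hole in it.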

Prerequisites for the pipeline described here:

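MySQL 8.0.22+: the Encrypted column in SHOW BINARY LOGS (present since 8.0.14) flags encrypted segments, and the active tail is identified with SHOW MASTER STATUS on the 8.0 series. SHOW BINARY LOG STATUS is the replacement spelling introduced in MySQL 8.2 and the only one accepted from 8.4, so use the statement that matches the server version.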
GTID mode enabled — gtid_mode=ON and enforce_gtid_consistency=ON, so each object’s manifest row carries the gtid_executed interval the recovery orchestrator diffs for contiguity.
binlog_format=ROW — deterministic replay depends on it; the trade-offs are covered in ROW vs STATEMENT vs MIXED Formats.
Python 3.10+ with boto3 for the S3-compatible transport, google-cloud-storage for GCS, tenacity for backoff/jitter, and mysql-connector-python for pooled reads of the server’s binlog state. Least-privilege upload credentials are scoped per the Security & Access Frameworks guidance.

The synchronization cycle executes four deterministic phases. Discovery enumerates candidate segments with SHOW BINARY LOGS. Validation cross-references SHOW BINARY LOG STATUS (SHOW MASTER STATUS on MySQL 8.0.x) and considers only logs strictly preceding the active file; the active tail is deferred to prevent partial reads and replication stalls. Transport stages, routes, and conditionally uploads each closed segment. Commit updates the manifest atomically only after the provider’s checksum matches the locally computed digest; a failed commit rolls back and reschedules.
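The Discovery and Validation phases reduce to a filter over binary log names. The following sketch assumes the caller has already fetched the log names and the active file name from the server; binlog_index and closed_segments are hypothetical helpers, not part of the module below.

```python
from __future__ import annotations


def binlog_index(name: str) -> int:
    """Numeric suffix of a binary log name, e.g. 'mysql-bin.000041' -> 41."""
    return int(name.rsplit(".", 1)[1])


def closed_segments(all_logs: list[str], active_log: str,
                    last_committed: int | None) -> list[str]:
    """Logs strictly before the active file and strictly after the last commit.

    Sorting is numeric rather than lexical, because the suffix can grow past
    six digits on long-lived servers.
    """
    active = binlog_index(active_log)
    floor = -1 if last_committed is None else last_committed
    eligible = [n for n in all_logs if floor < binlog_index(n) < active]
    return sorted(eligible, key=binlog_index)
```

The active file is excluded by the strict upper bound, so a segment MySQL is still appending to never enters the Transport phase.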

Production-Grade Python Implementation

The module below is complete and runnable. It abstracts provider quirks behind a StorageBackend protocol so the routing logic never branches on vendor; both backends enforce a conditional write (S3 If-None-Match: *, GCS if_generation_match=0) so a redelivered task can never clobber an existing object, and both read the checksum back from the stored object to catch silent corruption before the manifest is committed. A match statement selects the active target, tenacity supplies exponential backoff with jitter, and a simple in-process circuit breaker fails over from the primary cloud to the secondary after repeated failures. Structured JSON logging lets a single segment be traced from discovery to finalization.

#!/usr/bin/env python3
"""Provider-abstracted binlog sync — S3 + GCS with conditional writes.
Targets: MySQL 8.0.22+, Python 3.10+.
Requires: boto3, google-cloud-storage, tenacity, mysql-connector-python.
"""
from __future__ import annotations

import base64
import hashlib
import json
import logging
import time
from dataclasses import dataclass, field
from pathlib import Path
from typing import Protocol

import boto3
from botocore.exceptions import ClientError
from google.api_core.exceptions import GoogleAPICallError, PreconditionFailed
from google.cloud import storage as gcs
from tenacity import (
    retry, stop_after_attempt, wait_exponential_jitter, retry_if_exception_type,
)


# ---- structured logging -------------------------------------------------
class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "event": record.getMessage()}
        if isinstance(record.args, dict):
            payload |= record.args
        return json.dumps(payload)


_handler = logging.StreamHandler()
_handler.setFormatter(JsonFormatter())
logger = logging.getLogger("binlog_sync")
logger.addHandler(_handler)
logger.setLevel(logging.INFO)


class UploadConflict(Exception):
    """Object already exists — treated as an idempotent no-op, never retried."""


class ChecksumMismatch(Exception):
    """Stored object digest != local digest — non-retryable corruption."""


@dataclass(slots=True, frozen=True)
class Segment:
    server_uuid: str
    binlog_name: str        # e.g. mysql-bin.000041
    local_path: Path
    gtid_range: str
    sha256: str
    year: int
    month: int

    @property
    def object_key(self) -> str:
        # Deterministic key: same segment always resolves to the same path.
        return (f"mysql-binlogs/{self.server_uuid}/"
                f"{self.year:04d}/{self.month:02d}/{self.binlog_name}.zst.enc")


class StorageBackend(Protocol):
    name: str
    def put_if_absent(self, seg: Segment) -> None: ...
    def stored_sha256(self, seg: Segment) -> str | None: ...


# ---- AWS S3 -------------------------------------------------------------
class S3Backend:
    name = "s3"

    def __init__(self, bucket: str) -> None:
        self._bucket = bucket
        self._client = boto3.client("s3")

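    def put_if_absent(self, seg: Segment) -> None:
        # put_object rather than upload_file: it takes IfNoneMatch directly and
        # raises ClientError, whereas upload_file wraps failures in
        # S3UploadFailedError, which the handler below would never see.
        # A single PUT is capped at 5 GiB, well above max_binlog_size.
        try:
            with seg.local_path.open("rb") as body:
                self._client.put_object(
                    Bucket=self._bucket, Key=seg.object_key, Body=body,
                    ServerSideEncryption="aws:kms",
                    IfNoneMatch="*",                    # conditional create only
                    Metadata={"sha256": seg.sha256,
                              "gtid_range": seg.gtid_range,
                              "server_uuid": seg.server_uuid},
                )
        except ClientError as exc:
            if exc.response["Error"]["Code"] in ("PreconditionFailed", "412"):
                raise UploadConflict(seg.object_key) from exc
            raise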

    def stored_sha256(self, seg: Segment) -> str | None:
        try:
            head = self._client.head_object(Bucket=self._bucket, Key=seg.object_key)
        except ClientError as exc:
            if exc.response["Error"]["Code"] in ("404", "NoSuchKey"):
                return None
            raise
        return head.get("Metadata", {}).get("sha256")


# ---- Google Cloud Storage ----------------------------------------------
class GCSBackend:
    name = "gcs"

    def __init__(self, bucket: str) -> None:
        self._bucket = gcs.Client().bucket(bucket)

    def put_if_absent(self, seg: Segment) -> None:
        blob = self._bucket.blob(seg.object_key)
        blob.metadata = {"sha256": seg.sha256, "gtid_range": seg.gtid_range,
                         "server_uuid": seg.server_uuid}
        try:
            # if_generation_match=0 => create only, never overwrite.
            blob.upload_from_filename(str(seg.local_path), if_generation_match=0)
        except PreconditionFailed as exc:
            raise UploadConflict(seg.object_key) from exc

    def stored_sha256(self, seg: Segment) -> str | None:
        blob = self._bucket.get_blob(seg.object_key)
        if blob is None:
            return None
        return (blob.metadata or {}).get("sha256")


@dataclass
class CircuitBreaker:
    """Opens after `threshold` consecutive failures; probes again after `cooldown`."""
    threshold: int = 5
    cooldown: float = 60.0
    _failures: int = field(default=0)
    _opened_at: float = field(default=0.0)

    @property
    def is_open(self) -> bool:
        if self._failures < self.threshold:
            return False
        if time.monotonic() - self._opened_at >= self.cooldown:
            self._failures = self.threshold - 1   # half-open: allow one probe
            return False
        return True

    def record_success(self) -> None:
        self._failures = 0

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.threshold:
            self._opened_at = time.monotonic()


@retry(
    retry=retry_if_exception_type((ClientError, GoogleAPICallError)),
    stop=stop_after_attempt(6),
    wait=wait_exponential_jitter(initial=1, max=32),   # backoff + jitter
    reraise=True,
)
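def _upload_and_verify(backend: StorageBackend, seg: Segment) -> None:
    try:
        backend.put_if_absent(seg)
    except UploadConflict:
        # Already present (a redelivery, or a retry after a failed read-back):
        # skip the write but still verify the stored digest below.
        logger.info("idempotent_skip", {"key": seg.object_key})
    remote = backend.stored_sha256(seg)
    if remote != seg.sha256:
        # Corruption is not fixed by retrying: fail loud, do not commit.
        raise ChecksumMismatch(f"{seg.object_key}: {remote} != {seg.sha256}")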


def sync_segment(seg: Segment, primary: StorageBackend, secondary: StorageBackend,
                 breaker: CircuitBreaker, *, dry_run: bool = False) -> str:
    """Route one segment to durable storage. Returns the provider that committed."""
    if dry_run:
        logger.info("dry_run", {"key": seg.object_key, "sha256": seg.sha256,
                                "target": (secondary if breaker.is_open else primary).name})
        return "dry_run"

    target = secondary if breaker.is_open else primary
    match target.name:
        case "s3" | "gcs":
            try:
                _upload_and_verify(target, seg)
            except UploadConflict:
                logger.info("idempotent_skip", {"key": seg.object_key})
            except ChecksumMismatch as exc:
                logger.error("checksum_mismatch", {"key": seg.object_key, "error": str(exc)})
                raise                                   # route to dead-letter upstream
            except (ClientError, GoogleAPICallError) as exc:
                logger.error("provider_failed", {"provider": target.name,
                                                  "key": seg.object_key, "error": str(exc)})
                if target is secondary:
                    raise                               # already failed over; nothing left to try
                breaker.record_failure()                # the breaker tracks primary health only
                # Immediate failover attempt on the secondary.
                _upload_and_verify(secondary, seg)
                logger.info("failover_committed", {"provider": secondary.name,
                                                    "key": seg.object_key})
                return secondary.name
            if target is primary:
                breaker.record_success()                # a secondary success must not close the breaker
            logger.info("committed", {"provider": target.name, "key": seg.object_key})
            return target.name
        case other:                                     # pragma: no cover
            raise ValueError(f"unknown backend: {other}")

Two design choices carry the guarantee. First, the conditional write (IfNoneMatch="*" / if_generation_match=0) makes a redelivered task a provably safe no-op — the provider itself rejects the second create, so the pipeline can retry aggressively without risking an overwrite. Second, _upload_and_verify re-reads the stored digest before sync_segment reports success, so the manifest commit that follows can never point at an object that was silently truncated in flight. The full walkthrough of a single-cloud variant, credential rotation, and multipart tuning lives in Building a Python Script to Sync Binlogs to S3 with Boto3.
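The read-back in the module compares the sha256 value stored in object metadata. A complementary check, sketched here as an assumption rather than as part of the module, is to give the provider the digest so that it validates the received bytes itself: S3's put_object accepts a ChecksumSHA256 parameter holding the base64-encoded SHA-256 digest. The helpers file_sha256_hex and hex_to_s3_checksum are hypothetical names, and the sketch assumes Segment.sha256 is the hex digest of the sealed file being uploaded.

```python
import base64
import hashlib
from pathlib import Path


def file_sha256_hex(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file through SHA-256 so large segments never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def hex_to_s3_checksum(hex_digest: str) -> str:
    """Convert a hex SHA-256 digest to the base64 form S3 expects."""
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")
```

Passing ChecksumSHA256=hex_to_s3_checksum(seg.sha256) alongside IfNoneMatch would make S3 reject an upload whose bytes do not hash to the declared digest, which covers in-flight truncation that a metadata comparison alone cannot see.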

Configuration Reference

These server-side variables govern how fast segments arrive at the sync stage and how large each object is. The transport concurrency must be tuned to the rotation rate they imply.

Variable	Type	Default	Recommended	PITR impact
`binlog_expire_logs_seconds`	integer (s)	`2592000` (30d)	`≥ 259200` (3d)	Local floor: a segment must be uploaded and verified before this window purges it, or it is lost from the recovery chain.
`max_binlog_size`	integer (bytes)	`1073741824` (1 GiB)	`104857600`–`536870912`	Smaller segments rotate more often → more, smaller objects → finer PITR granularity but higher request volume and per-object overhead.
`sync_binlog`	integer	`1`	`1`	`1` guarantees each transaction is crash-safe on disk before it can be uploaded; other values risk syncing a segment MySQL has not durably persisted.
`binlog_transaction_compression`	boolean	`OFF`	`ON`	Compresses events at the source (MySQL 8.0.20+), shrinking upload bytes, egress cost, and stored object size.
`gtid_mode`	enum	`OFF`	`ON`	Supplies the `gtid_executed` range stamped into object metadata and the manifest for gap detection.
`enforce_gtid_consistency`	enum	`OFF`	`ON`	Rejects GTID-unsafe statements at write time, keeping un-replayable events out of archived objects.

-- MySQL 8.0.22+ : rotation + durability settings that feed the sync stage
SET PERSIST binlog_expire_logs_seconds     = 259200;   -- 3-day local floor
SET PERSIST max_binlog_size                = 268435456; -- 256 MiB segments
SET PERSIST sync_binlog                    = 1;
SET PERSIST binlog_transaction_compression = ON;

SET PERSIST writes to mysqld-auto.cnf, so these survive a restart without editing my.cnf and without racing a configuration-management tool. On the object-storage side, apply a lifecycle policy (S3 Intelligent-Tiering, GCS Nearline/Coldline) to transition aged segments and an object-lock/retention policy (S3 Object Lock, GCS Bucket Lock) to enforce Write-Once-Read-Many semantics on the immutable compliance window.
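To make "the rotation rate they imply" concrete, the arithmetic can be sketched as below. The function name rotation_budget and its inputs are assumptions for illustration; the write rate is whatever the server actually sustains, and the result ignores any size reduction from the compression stage.

```python
def rotation_budget(write_bytes_per_s: float, max_binlog_size: int,
                    expire_seconds: int) -> dict[str, float]:
    """Back-of-envelope rotation figures for sizing transport concurrency."""
    seconds_per_segment = max_binlog_size / write_bytes_per_s
    return {
        # How long one segment takes to fill and rotate.
        "seconds_per_segment": seconds_per_segment,
        # Objects the transport must upload and verify per hour.
        "segments_per_hour": 3600 / seconds_per_segment,
        # Segments still on local disk when the expiry window closes.
        "segments_in_local_window": expire_seconds / seconds_per_segment,
    }
```

With a sustained 1 MiB/s of binlog writes and the 256 MiB max_binlog_size set above, a segment closes every 256 seconds, so the transport has to verify about 14 objects per hour and its sustained upload throughput must at least match the write rate to stay ahead of the 3-day floor.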

Validation & Verification Gates

An uploaded object is not “safe” until it has passed the gates a recovery run will depend on:

Checksum verification. The pipeline computes SHA-256 before upload and re-reads the provider’s stored digest via head_object / get_blob (shown above). A mismatch raises ChecksumMismatch and blocks the manifest commit rather than recording a corrupt row.
Conditional-write idempotency. Every create is conditional, so a re-run or a redelivered task resolves to the same deterministic key and is rejected as UploadConflict — proof the object already exists, not a silent overwrite.

GTID set diffing. After a batch drains, prove there are no holes by subtracting the archived union from the server’s executed set:

-- MySQL 8.0+ : an empty result means every committed transaction is archived
SELECT GTID_SUBTRACT(
         @@GLOBAL.gtid_executed,          -- authoritative executed set, live
         '<union of archived gtid_range values from the manifest>'
       ) AS unarchived_gtids;
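The second argument to GTID_SUBTRACT has to be assembled from the manifest. A minimal sketch of that union follows, assuming each gtid_range is an untagged GTID set such as uuid:1-100 (tagged GTIDs from newer MySQL releases are not handled); parse_gtid_set and union_gtid_ranges are hypothetical helper names.

```python
from __future__ import annotations

from collections import defaultdict


def parse_gtid_set(text: str) -> dict[str, list[tuple[int, int]]]:
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: [(start, end), ...]}, ends inclusive."""
    out: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for part in filter(None, (p.strip() for p in text.replace("\n", "").split(","))):
        uuid, *intervals = part.split(":")
        for iv in intervals:
            start, _, end = iv.partition("-")
            out[uuid].append((int(start), int(end or start)))
    return dict(out)


def union_gtid_ranges(ranges: list[str]) -> str:
    """Merge per-segment GTID ranges into one set string, coalescing adjacent spans."""
    merged: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for r in ranges:
        for uuid, ivs in parse_gtid_set(r).items():
            merged[uuid].extend(ivs)
    parts = []
    for uuid in sorted(merged):
        spans: list[list[int]] = []
        for start, end in sorted(merged[uuid]):
            if spans and start <= spans[-1][1] + 1:      # overlapping or adjacent
                spans[-1][1] = max(spans[-1][1], end)
            else:
                spans.append([start, end])
        parts.append(uuid + "".join(f":{s}-{e}" if s != e else f":{s}" for s, e in spans))
    return ",".join(parts)
```

A result with more than one span for the same UUID is already a warning sign before the SQL runs: it means the archived segments themselves do not form a contiguous range.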

Dry-run reconciliation. Running with dry_run=True is the mandatory pre-flight for CI/CD and IAM-policy changes: it resolves each pending segment’s deterministic key and routing target and logs them as a structured record, but transmits no bytes. In the module above the dry_run branch stops at key and target resolution, so catching a misconfigured bucket policy or a missing KMS grant before it becomes a silent gap means pairing it with an explicit permission probe against the bucket. Reconciliation pairs naturally with Base Backup Integration for PITR, which pins the starting GTID position the archived chain must contiguously extend.

Error Handling & Failure Modes

Transient cloud API failures are the common case; exponential backoff with jitter (wait_exponential_jitter) prevents a throttled endpoint from triggering a thundering-herd retry storm, and the circuit breaker fails over to the secondary cloud once a provider is persistently unhealthy. Persistent failures are the dangerous case, because they can silently truncate the recovery chain. Map each to a defined action:

429 Too Many Requests / S3 SlowDown / GCS rateLimitExceeded — provider-side throttling. Action: retry with backoff and jitter; if it persists past the circuit threshold, open the breaker and route to the secondary. Deep provider-throttling patterns are covered in Error Handling & Retry Logic.
412 PreconditionFailed (S3 If-None-Match / GCS if_generation_match=0) — the object already exists. This is not an error: it is the idempotency guarantee firing. Treat as a verified no-op after confirming the stored checksum matches.
Checksum mismatch after upload — non-retryable corruption (truncated read, disk bit-rot, encryption bug). Retrying re-uploads the same bad bytes. Action: route the segment to a dead-letter queue, page, and re-archive from the still-present local file once the source is fixed. Never commit the manifest row.
ERROR 1236 (HY000): Could not find first log file name in binary log index file — the pipeline references a segment MySQL already purged because the sync stage fell behind binlog_expire_logs_seconds. Action: alert immediately (the chain now has a hole), widen the local retention floor, and raise transport concurrency. Retention itself must be governed against replication lag as described in Binlog Retention Boundaries.
ERROR 3546 (HY000): @@GLOBAL.GTID_PURGED cannot be changed (followed by the specific constraint that failed), raised during recovery replay: the base backup’s gtid_purged does not line up with the archived chain’s starting GTID. Action: reconcile the base-backup and archiving retention windows and locate the correct starting object.
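The mapping above can be expressed as a small classifier that the retry layer consults before deciding what to do. This is a sketch under assumptions: Action, classify, and the two lookup sets are hypothetical, the sets are illustrative rather than exhaustive, and the caller is assumed to extract the HTTP status and provider error code from whatever exception its client library raised.

```python
from __future__ import annotations

from enum import Enum


class Action(Enum):
    RETRY = "retry"              # transient: back off with jitter, then try again
    NOOP = "noop"                # object already exists: verify digest, then skip
    DEAD_LETTER = "dead_letter"  # corruption or permanent error: never commit


# Illustrative, not exhaustive.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}
RETRYABLE_CODES = {"SlowDown", "RequestTimeout", "rateLimitExceeded"}


def classify(status: int | None, code: str | None = None, *,
             checksum_mismatch: bool = False) -> Action:
    if checksum_mismatch:
        return Action.DEAD_LETTER    # retrying would re-upload the same bad bytes
    if status == 412 or code == "PreconditionFailed":
        return Action.NOOP           # the conditional write did its job
    if status is None:
        return Action.RETRY          # no HTTP response at all: DNS or connection failure
    if status in RETRYABLE_STATUS or code in RETRYABLE_CODES:
        return Action.RETRY
    return Action.DEAD_LETTER        # e.g. 403 AccessDenied will not heal on its own
```

Treating a missing status as retryable matters because connection-level failures surface without an HTTP response, and they are exactly the DNS and endpoint problems the failover path exists for.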

Observability & Alerting

Treat archiving lag as a first-class SLI: the gap between the newest closed segment and the newest verified manifest row is your real, measured RPO exposure. Export Prometheus metrics for both the queue and each provider:

binlog_sync_uploaded_total{provider} and binlog_sync_failed_total{provider} — per-cloud success/failure counters; a rising failure ratio on one provider is the leading indicator of a failover.
binlog_sync_lag_seconds{server_uuid} — wall-clock age of the oldest un-uploaded closed segment. Alert when this approaches binlog_expire_logs_seconds — that is the moment the recovery chain is about to lose a segment.
binlog_sync_latency_seconds — upload duration histogram; a rising p99 flags provider throttling before it becomes a stall.
binlog_circuit_state{provider} and binlog_checksum_mismatch_total — any open circuit or non-zero mismatch is a page.
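The value behind binlog_sync_lag_seconds can be computed from two inputs the pipeline already has: when each closed segment was rotated and which segments have a verified manifest row. The sketch below assumes those are passed in as plain Python values; sync_lag_seconds, should_page, and the 0.5 paging fraction are hypothetical choices, not values taken from the pipeline.

```python
from __future__ import annotations


def sync_lag_seconds(closed_at: dict[str, float], verified: set[str],
                     now: float) -> float:
    """Age of the oldest closed segment that has no verified manifest row."""
    pending = [ts for name, ts in closed_at.items() if name not in verified]
    return max(0.0, now - min(pending)) if pending else 0.0


def should_page(lag: float, expire_seconds: int, fraction: float = 0.5) -> bool:
    """Page once lag has consumed the given fraction of the local retention window."""
    return lag >= expire_seconds * fraction
```

Alerting on a fraction of binlog_expire_logs_seconds rather than on the full window leaves time to react before the purge removes a segment that was never uploaded.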

Correlate transport metrics with MySQL’s own authoritative position so lag is measured, not guessed:

-- MySQL 8.0+ : current binary log file, position, and gtid_executed in one snapshot
SELECT * FROM performance_schema.log_status\G
-- Join the reported LOCAL binary_log_file against the newest verified manifest
-- row to compute true sync lag rather than trusting the file watcher alone.

Emit structured log fields (server_uuid, binlog_name, object_key, provider, sha256, attempt) as JSON, as the module above does, so a single segment can be followed from discovery through provider commit — turning a lag spike into an actionable root cause instead of a mystery.

Frequently Asked Questions

Why upload to both S3 and GCS instead of relying on one provider's cross-region replication?

Cross-region replication protects against a regional outage within one provider, but not against a provider-wide control-plane incident, a billing suspension, or an IAM misconfiguration that locks you out of the whole account. Dual-cloud routing makes the second provider an independent failure domain: when the circuit breaker opens on the primary, verified objects keep landing in the secondary and the recovery chain never stalls. The cost is roughly double storage on the hot window, which lifecycle tiering on the secondary largely offsets.

How do conditional writes actually prevent a duplicate delivery from corrupting an object?

Both backends create with a precondition — S3 If-None-Match: * and GCS if_generation_match=0 — which the provider evaluates atomically. If the deterministic key already exists, the create is rejected with 412 PreconditionFailed before any bytes replace the stored object. The pipeline catches that as UploadConflict and treats it as a verified no-op. A redelivered or retried task therefore cannot overwrite a good object with a truncated one; the only way an object changes is a first, successful, checksum-verified create.

Can I upload segments in parallel to go faster without breaking PITR ordering?

Yes, but only the transport runs in parallel — the manifest commit does not. Run a worker per server_uuid so different instances upload concurrently, and within one instance let the compress/encrypt/upload stages pipeline, but funnel all commits through a single ordered barrier that advances the manifest strictly in sequence. Confirming mysql-bin.000042 before 000041 records a chain mysqlbinlog cannot replay contiguously, so throughput must never be bought at the cost of commit ordering.

What belongs in the object key versus the object metadata?

The key must be deterministic and derivable from the segment alone — mysql-binlogs/{server_uuid}/{year}/{month}/{binlog_name}.zst.enc — so idempotent re-runs resolve to the same path and conditional writes work. Everything a recovery run needs to filter or verify goes in metadata: the SHA-256 digest, the gtid_range, the server_uuid, and the source timestamps. Keeping the GTID range in metadata (and mirrored in the manifest) lets the orchestrator diff contiguity without downloading and decrypting every object first.

Related

Async Processing & Queue Management — the ordered worker pool that feeds segments into this transport layer.
Compression & Encryption Workflows — the transform that seals each segment before it is uploaded.
Error Handling & Retry Logic — backoff, circuit breakers, and dead-letter routing for provider failures.
Building a Python Script to Sync Binlogs to S3 with Boto3 — a focused single-cloud implementation with credential rotation and multipart tuning.
The Binary Log — MySQL 8.0 Reference Manual — canonical documentation for segment rotation and status surfaces.
