Building a Python Script to Sync Binlogs to S3 with Boto3

The failure signature is always the same: a nightly aws s3 cp /var/lib/mysql/mysql-bin.* s3://bucket/ cron job reports exit code 0 for months, and then a recovery drill discovers that mysql-bin.000318 was uploaded while it was still the active, still-being-written tail — so mysqlbinlog halts mid-replay with ERROR: Error in Log_event::read_log_event(): 'read error' and the point-in-time recovery (PITR) chain dead-ends short of the incident. Naive copies also upload segments out of sequence, overwrite a good object with a truncated multipart retry, and delete the local file before the remote object is verified. This page builds the alternative: a single-purpose Python 3.10+ daemon that uses boto3 to stream, compress, encrypt, checksum, and idempotently commit each closed binary log segment to Amazon S3, advancing a durable cursor only after the object’s integrity is confirmed back from the bucket.

Visual Overview

One idempotent sync cycle: discover closed segments, stream an encrypted create-only upload, round-trip the checksum, and advance the cursor only after the object verifies.

Context & Prerequisites

This procedure is a single-cloud, boto3-focused implementation of the provider-abstracted transport defined in AWS S3 & GCS Sync Pipelines; read that page first for the ordering and dual-cloud failover model this script deliberately simplifies. It assumes a gap-free GTID Tracking & Enforcement pipeline upstream so every segment carries a resolvable gtid_executed interval, and that binlog_format=ROW is already enforced (the deterministic-replay rationale lives in ROW vs STATEMENT vs MIXED Formats). You need MySQL 8.0.22+ (for the Encrypted column in SHOW BINARY LOGS), Python 3.10+, and the boto3, tenacity, and mysql-connector-python packages. Note that SHOW BINARY LOG STATUS only exists from MySQL 8.2.0 onward; on 8.0.x servers the equivalent statement is SHOW MASTER STATUS. The daemon runs under a dedicated non-root OS user with read access to the datadir and least-privilege S3 credentials scoped per Security & Access Frameworks.

Step-by-Step Implementation

1. Harden the server so segments are safe to archive

Before the daemon reads a byte, the server must flush durably and retain segments longer than the maximum sync latency. Premature local purge before the upload is verified is the single most common cause of a broken recovery chain — the retention math is covered in Binlog Retention Boundaries.

-- MySQL 8.0.22+  (run as a privileged admin, then persist in my.cnf)
SET PERSIST sync_binlog = 1;                       -- flush binlog on every commit
SET PERSIST binlog_expire_logs_seconds = 259200;   -- keep 3 days locally as a safety window
SELECT @@GLOBAL.binlog_format, @@GLOBAL.gtid_mode; -- must return ROW, ON

PITR relevance: sync_binlog=1 guarantees a committed transaction survives an OS crash and is therefore present in the segment you archive; the retention window guarantees the file still exists locally when the daemon comes back from a network partition.
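
The retention requirement reduces to simple arithmetic. The sketch below is illustrative only: retention_headroom and its parameters are hypothetical names, and the five-minute interval and 24-hour outage are assumed figures you would replace with your own.

```python
def retention_headroom(expire_seconds: int, sync_interval_seconds: int,
                       worst_outage_seconds: int) -> int:
    """Return how many sync runs still fit after the worst tolerated outage.

    A result of 0 or less means a segment could be purged locally before
    the archiver gets another chance to upload it.
    """
    if sync_interval_seconds <= 0:
        raise ValueError("sync_interval_seconds must be positive")
    remaining = expire_seconds - worst_outage_seconds
    return remaining // sync_interval_seconds


# 3-day retention (259200 s), a run every 5 minutes, an assumed 24-hour outage.
runs_left = retention_headroom(259_200, 300, 86_400)
```

If the result is small or negative, either lengthen binlog_expire_logs_seconds or shorten the sync interval before relying on the archive.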

2. Model the cursor as durable, crash-safe state

A stateless cron run cannot know what it already archived. Track progress in a JSON cursor co-located with the datadir, written atomically (temp file + os.replace) so a crash mid-write can never corrupt it.

# Python 3.10+
from __future__ import annotations
import json, os
from dataclasses import dataclass, asdict
from pathlib import Path

CURSOR_PATH = Path("/var/lib/mysql/.binlog_sync_cursor.json")

@dataclass(slots=True)
class SyncCursor:
    last_synced_file: str = ""     # e.g. "mysql-bin.000317"
    sha256: str = ""               # digest of the compressed object
    last_epoch: float = 0.0        # UTC epoch of last verified commit

    @classmethod
    def load(cls) -> "SyncCursor":
        if CURSOR_PATH.exists():
            return cls(**json.loads(CURSOR_PATH.read_text()))
        return cls()

    def commit(self) -> None:
        tmp = CURSOR_PATH.with_suffix(".tmp")
        with open(tmp, "w") as f:
            f.write(json.dumps(asdict(self)))
            f.flush()
            os.fsync(f.fileno())       # force the bytes to disk before the rename
        os.replace(tmp, CURSOR_PATH)   # atomic on POSIX

PITR relevance: the cursor is the resume point. Because it is only committed after remote verification (step 6), a restart replays at most one segment rather than skipping one — over-delivery is safe, under-delivery is not.

3. Discover only closed segments — never the active tail

Query the server for the ordered index and for the file it is currently writing, then consider strictly the segments before the active one.

# Python 3.10+
import mysql.connector

def list_closed_segments(conn) -> list[str]:
    cur = conn.cursor()
    cur.execute("SHOW BINARY LOGS")               # ordered list of segments
    all_logs = [row[0] for row in cur.fetchall()]
    try:
        cur.execute("SHOW BINARY LOG STATUS")     # MySQL 8.2+ (the only spelling on 8.4)
    except mysql.connector.Error:
        cur.execute("SHOW MASTER STATUS")         # MySQL 8.0.x spelling
    active = cur.fetchone()[0]
    cur.close()
    active_idx = all_logs.index(active)
    return all_logs[:active_idx]                  # everything before the tail is closed

PITR relevance: the active file is still receiving events; uploading it captures a torn half-transaction. Deferring it until the next rotation is what makes each archived object independently replayable.
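
The discovery query returns every closed segment, including ones already archived. One way to narrow that list to the work still outstanding is to slice it at the cursor. This is a sketch under assumptions: pending_segments is a hypothetical helper that is not part of the original script, and it assumes the list is in server order as SHOW BINARY LOGS returns it.

```python
def pending_segments(closed: list[str], last_synced_file: str) -> list[str]:
    """Return the closed segments that come after the cursor, in order."""
    if not last_synced_file:
        return list(closed)                # first run: everything is pending
    try:
        idx = closed.index(last_synced_file)
    except ValueError:
        # The cursor names a segment the server no longer lists (purged
        # locally, or the cursor is stale). Refuse to guess: silently
        # skipping ahead here could leave a gap in the chain.
        raise RuntimeError(
            f"cursor segment {last_synced_file!r} not found in binary log index"
        )
    return closed[idx + 1:]
```

Raising on an unknown cursor is a deliberate choice in this sketch; an operator can then decide whether the missing range is already in the bucket.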

4. Stream-compress in 64 KB chunks with zero temp files

Compress in-flight so a multi-gigabyte segment never materializes on disk or in memory, avoiding I/O contention with the live database.

# Python 3.10+
import zlib

def stream_gzip(path: Path, chunk: int = 65536):
    # 16 + MAX_WBITS selects the gzip wrapper; level 6 balances CPU vs ratio.
    c = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    with open(path, "rb") as f:
        while block := f.read(chunk):
            if out := c.compress(block):
                yield out
    if tail := c.flush():
        yield tail

PITR relevance: bounded memory means the archiver never OOM-kills MySQL during a peak write window, so the segment that captures the incident actually gets uploaded.
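
To see that these zlib settings produce a stream any standard gzip reader accepts, and that the digest used later is taken over the compressed bytes, the following sketch runs the same compressor over in-memory blocks. gzip_chunks and compress_and_digest are illustrative names, and the in-memory input only stands in for a real segment file.

```python
import gzip
import hashlib
import zlib
from typing import Iterable, Iterator


def gzip_chunks(blocks: Iterable[bytes], level: int = 6) -> Iterator[bytes]:
    """Yield gzip-framed output for an iterable of raw byte blocks."""
    c = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    for block in blocks:
        if out := c.compress(block):
            yield out
    if tail := c.flush():
        yield tail


def compress_and_digest(blocks: Iterable[bytes]) -> tuple[bytes, str]:
    """Return the compressed payload and the SHA-256 of those compressed bytes."""
    sha = hashlib.sha256()
    parts = []
    for chunk in gzip_chunks(blocks):
        sha.update(chunk)
        parts.append(chunk)          # joined in memory for the demo only
    return b"".join(parts), sha.hexdigest()


raw = [b"binlog-event " * 1000, b"another block " * 1000]
payload, digest = compress_and_digest(raw)
restored = gzip.decompress(payload)  # a standard gzip reader unpacks the stream
```

Because the digest covers the compressed bytes, it must be compared against the stored object as uploaded, not against the decompressed segment.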

5. Upload with a deterministic key, KMS encryption, and a conditional write

Wrap the generator so boto3 can stream it, tune TransferConfig for stable multipart throughput, enforce aws:kms server-side encryption, and use IfNoneMatch="*" so a redelivered task can never overwrite an existing object. tenacity supplies jittered backoff for 5xx/throttling responses.

# Python 3.10+
import hashlib, io
import boto3
from boto3.s3.transfer import TransferConfig
from botocore.exceptions import ClientError
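from tenacity import retry, wait_random_exponential, stop_after_attempt, retry_if_exception

XFER = TransferConfig(multipart_threshold=8 * 1024 * 1024, max_concurrency=4)

class _Wrapper(io.RawIOBase):           # adapt the generator into a readable file object
    def __init__(self, gen):
        self._gen, self._buf, self._sha = gen, b"", hashlib.sha256()

    def readable(self):
        return True

    def readinto(self, b):
        while not self._buf:
            try:
                self._buf = next(self._gen)
            except StopIteration:
                return 0
        n = min(len(b), len(self._buf))
        b[:n], self._buf = self._buf[:n], self._buf[n:]
        self._sha.update(bytes(b[:n]))
        return n

def _is_transient(exc: BaseException) -> bool:
    # Retry only 5xx/throttling; a 412 (object already exists) or 403 must surface at once.
    if not isinstance(exc, ClientError):
        return False
    status = exc.response.get("ResponseMetadata", {}).get("HTTPStatusCode", 0)
    code = exc.response.get("Error", {}).get("Code", "")
    return status >= 500 or code in {"SlowDown", "Throttling", "RequestTimeout"}

@retry(wait=wait_random_exponential(multiplier=2, max=60),
       stop=stop_after_attempt(6),
       retry=retry_if_exception(_is_transient))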
def upload_segment(s3, bucket: str, key: str, kms_arn: str, path: Path) -> str:
    body = _Wrapper(stream_gzip(path))
    s3.upload_fileobj(
        body, bucket, key,
        ExtraArgs={
            "ServerSideEncryption": "aws:kms",
            "SSEKMSKeyId": kms_arn,
            "IfNoneMatch": "*",                     # create-only: never clobber
        },
        Config=XFER,
    )
    digest = body._sha.hexdigest()
    s3.put_object_tagging(Bucket=bucket, Key=key, Tagging={
        "TagSet": [{"Key": "sha256", "Value": digest}]})
    return digest

Derive the key deterministically from the server UUID, date, and segment name — mysql-binlogs/{server_uuid}/{year}/{month}/{name}.gz — so idempotent re-runs resolve to the same path and the conditional write works. Ordering this transport correctly is what keeps Timestamp Targeting Strategies mathematically possible.
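
A minimal sketch of that key derivation follows. The text does not say which date to use, so this example assumes the segment file's own modification time in UTC; build_object_key is a hypothetical helper and the UUID is a placeholder value.

```python
from datetime import datetime, timezone


def build_object_key(server_uuid: str, segment_name: str, segment_mtime: float) -> str:
    """Build mysql-binlogs/{server_uuid}/{year}/{month}/{name}.gz for one segment.

    segment_mtime is a POSIX timestamp (for example os.stat(path).st_mtime).
    Using the file's own timestamp rather than "now" keeps the key stable
    across retries that straddle a month boundary.
    """
    ts = datetime.fromtimestamp(segment_mtime, tz=timezone.utc)
    return f"mysql-binlogs/{server_uuid}/{ts:%Y}/{ts:%m}/{segment_name}.gz"


# Placeholder UUID and timestamp (2023-11-14 UTC) for illustration.
key = build_object_key("3e11fa47-71ca-11e1-9e33-c80aa9429562",
                       "mysql-bin.000317", 1_700_000_000.0)
```

Deriving the date from the wall clock at upload time would break idempotency: a retry after midnight on the last day of a month would resolve to a different key and bypass the conditional write.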

6. Verify the remote checksum, then commit the cursor

Read the stored tag back with get_object_tagging (head_object reports only a tag count, not tag values) and compare it to the digest computed during upload. Commit the cursor only on a match; on mismatch, halt and alert rather than advance a broken chain.

# Python 3.10+
import time

def verify_and_commit(s3, bucket, key, name, local_digest, cursor: SyncCursor) -> None:
    tags = {t["Key"]: t["Value"]
            for t in s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]}
    remote = tags.get("sha256", "")
    if not remote or remote != local_digest:
        raise RuntimeError(f"checksum mismatch for {key}: {remote!r} != {local_digest!r}")
    cursor.last_synced_file, cursor.sha256 = name, local_digest
    cursor.last_epoch = time.time()    # UTC epoch of this verified commit
    cursor.commit()

PITR relevance: the cursor advance is the atomic “this segment is durably recoverable” fact. Gating it on a round-tripped checksum is what turns S3 — not local disk — into the system of record.

Configuration Snippet & Reference Table

Persist the server side in my.cnf and scope the IAM policy to exactly the actions the daemon performs: writing the object, tagging it, and reading the tag back.

# /etc/mysql/my.cnf  — MySQL 8.0.22+
[mysqld]
log_bin                     = /var/lib/mysql/mysql-bin
binlog_format               = ROW
sync_binlog                 = 1
binlog_expire_logs_seconds  = 259200
binlog_checksum             = CRC32

| Setting | Where | Recommended | Why it matters for PITR |
| --- | --- | --- | --- |
| `multipart_threshold` | `TransferConfig` | `8 MiB` | Below this a single `PUT` is used; above it, multipart parallelism kicks in for large segments. |
| `max_concurrency` | `TransferConfig` | `4` | Caps upload threads so archiving never starves live DB I/O. |
| `IfNoneMatch` | `ExtraArgs` | `"*"` | Create-only write; a duplicate delivery returns `412` instead of clobbering a good object. |
| `ServerSideEncryption` | `ExtraArgs` | `aws:kms` | Ties objects to a KMS key with CloudTrail auditing and IAM boundaries. |
| IAM actions | IAM policy | `s3:PutObject`, `s3:PutObjectTagging`, `s3:GetObjectTagging` | Least privilege; the daemon writes, tags, and verifies, nothing else. Multipart uploads under SSE-KMS also need `kms:GenerateDataKey` and `kms:Decrypt` on the key. |

Verification Checklist

SHOW BINARY LOG STATUS returns a file that is not present in S3 (the active tail is correctly skipped).
Every closed segment before the tail exists in the bucket with a sha256 object tag.
The local cursor’s last_synced_file equals the newest closed segment name.
Downloading an archived object and running gunzip -c obj.gz | mysqlbinlog --verify-binlog-checksum - exits 0.
A forced re-run of the daemon uploads nothing new and returns a 412 PreconditionFailed no-op for already-archived keys.
Killing the daemon mid-upload and restarting resumes at the interrupted segment with no gap and no duplicate cursor advance.
KMS-encrypted objects are readable only by the recovery role: aws s3api head-object reports ServerSideEncryption as aws:kms with the expected key, and a read attempt from any other role is denied.
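
The first three checks can be automated once the segment list, the active file, and the bucket listing are in hand. The following is a sketch, not part of the daemon: archive_state_problems is a hypothetical helper, and it assumes the caller has already stripped the key prefix and .gz suffix from the bucket listing.

```python
def archive_state_problems(all_logs: list[str], active: str,
                           archived: set[str], cursor_file: str) -> list[str]:
    """Return human-readable findings; an empty list means the checks pass.

    all_logs   : segment names in server order (SHOW BINARY LOGS)
    active     : the segment currently being written
    archived   : segment names found in the bucket
    cursor_file: last_synced_file from the local cursor
    """
    problems = []
    closed = all_logs[:all_logs.index(active)]
    if active in archived:
        problems.append(f"active tail {active} was uploaded")
    for name in closed:
        if name not in archived:
            problems.append(f"closed segment {name} is missing from the bucket")
    if closed and cursor_file != closed[-1]:
        problems.append(f"cursor at {cursor_file!r}, expected {closed[-1]!r}")
    return problems
```

The remaining checklist items (replaying a downloaded object, forcing a re-run, killing the daemon mid-upload) need a live environment and are better exercised in a recovery drill.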

Gotchas & Version-Specific Caveats

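SHOW MASTER STATUS is deprecated. SHOW BINARY LOG STATUS was introduced in MySQL 8.2.0, which deprecated the old spelling, and MySQL 8.4 removes SHOW MASTER STATUS entirely. MySQL 8.0.x understands only SHOW MASTER STATUS, so check the server version or fall back on a syntax error rather than hard-coding either statement.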

Multipart ETags are not a whole-object hash. For any object uploaded in parts, the S3 ETag is a hash of the part hashes with a -N suffix, not the MD5 or SHA-256 of the content. Never compare it to a local digest; carry your own sha256 in an object tag (as above) and verify that instead. With newer boto3 checksum behavior you can alternatively request ChecksumAlgorithm="SHA256" and read back x-amz-checksum-sha256, but for multipart uploads that value is also a composite of per-part checksums, so it still cannot be compared directly with a digest of the whole stream.
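
The shape of a multipart ETag is easy to reproduce locally, which makes the mismatch concrete. This sketch follows the widely observed "MD5 of concatenated part MD5s" behavior for objects not encrypted with SSE-KMS; it is an illustration rather than a documented contract, and with SSE-KMS (as used on this page) the ETag is not MD5-derived at all. multipart_etag is a hypothetical helper.

```python
import hashlib


def multipart_etag(data: bytes, part_size: int) -> str:
    """Reproduce the 'hash of part hashes' ETag shape for a multipart upload."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    part_digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return f"{hashlib.md5(part_digests).hexdigest()}-{len(parts)}"


payload = b"x" * (20 * 1024 * 1024)                 # 20 MiB of sample data
etag = multipart_etag(payload, 8 * 1024 * 1024)     # three parts: 8 + 8 + 4 MiB
whole = hashlib.md5(payload).hexdigest()            # digest of the full content
```

The value before the suffix differs from the whole-object digest, and it changes again if the part size changes, so it says nothing stable about the content.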

expire_logs_days is gone. It was deprecated in MySQL 8.0 and removed in later releases, so on 8.4 only binlog_expire_logs_seconds exists and a leftover expire_logs_days line in my.cnf prevents startup.

Compression choice affects CPU, not correctness. The zlib gzip stream shown here is universally available; if archiving competes with a hot OLTP window, consider zstd (a comparable or better ratio at lower CPU) or lz4 (a lower ratio at far lower CPU). The trade-off and streaming-encryption pairing is detailed in Compression & Encryption Workflows.

Sustained load will surface 503 SlowDown. A single flat prefix throttles under high binlog velocity; the date-partitioned key above helps, and the full mitigation (adaptive retries, prefix sharding, backpressure) lives in handling S3 throttling during high-throughput binlog archiving.

Frequently Asked Questions

Should I run this as a cron job or a long-lived daemon?

Use a systemd service or timer, not raw cron. A timer gives you precise execution tracking, automatic restart after an OOM kill, and structured journal logging; the atomic cursor makes each run idempotent either way. Align the timer interval well inside your binlog_expire_logs_seconds window so a missed run still has local segments to catch up on before they are purged.

Why verify a checksum when S3 already returns an ETag on success?

A 200 on upload_fileobj confirms the bytes were received, but for multipart objects the ETag is a hash of part hashes, so it cannot prove the reassembled object matches your source. Round-tripping an independent SHA-256 you stored as an object tag catches silent corruption — a truncated part, a retried partial upload — before the cursor advances, which is the only moment the segment is treated as recoverable.

What happens if the daemon crashes between upload and cursor commit?

On restart it re-selects the same closed segment (the cursor never advanced) and re-attempts the upload. The IfNoneMatch="*" conditional write returns 412 PreconditionFailed for the already-present object, which you treat as a verified no-op, then it re-reads the tag, confirms the checksum, and commits the cursor. Over-delivery is safe; the chain is never left with a gap.
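
The upload function shown earlier does not itself distinguish a 412 from other failures, so the caller has to. One possible shape for that decision is sketched below; classify_upload_error is a hypothetical helper that inspects the response dict botocore attaches to ClientError, and the set of retryable codes is an assumption to adjust for your workload.

```python
def classify_upload_error(response: dict) -> str:
    """Map a botocore-style error response to 'exists', 'retry', or 'fatal'.

    `response` is the dict available as `.response` on a ClientError.
    """
    status = response.get("ResponseMetadata", {}).get("HTTPStatusCode", 0)
    code = response.get("Error", {}).get("Code", "")
    if status == 412 or code == "PreconditionFailed":
        return "exists"     # object already present: skip upload, verify the tag
    if status >= 500 or code in {"SlowDown", "RequestTimeout"}:
        return "retry"      # transient: back off and try again
    return "fatal"          # e.g. 403 AccessDenied: stop and alert
```

On the "exists" path the local digest from the interrupted run is gone, so it has to be recomputed by running the compressor over the segment again before comparing it with the stored tag. This assumes the compressor produces identical bytes for identical input and settings, which is worth confirming if the zlib build changes between runs.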

Related

AWS S3 & GCS Sync Pipelines: the provider-abstracted, dual-cloud parent this single-cloud script specializes.
Handling S3 throttling during high-throughput binlog archiving: adaptive retries, prefix sharding, and backpressure when uploads hit 503 SlowDown.
Using Celery for async binlog upload processing: scale this transport into an ordered worker pool.
Implementing AES-256 encryption for archived binlogs: client-side encryption when KMS SSE alone is insufficient for your compliance framework.
The Binary Log, MySQL 8.0 Reference Manual: canonical documentation for segment rotation and status surfaces.
