Memento (snapshot & restore state)

When to use

  • You need a checkpoint to roll back after a failure (e.g., stream offset/watermark).
  • You want to save/undo user or job configuration changes safely.
  • You must decouple snapshot storage (caretaker) from the object that owns the state.

Avoid when state is tiny and you can just rebuild the object, or a DB transaction already gives atomicity.

Diagram (text)

Originator (IngestCursor)
   ├─ save() → Memento (immutable snapshot)
   └─ restore(memento)

Caretaker (history list, S3, DB) — stores mementos, never inspects internals

Python example (≤40 lines, type-hinted)

from __future__ import annotations
from dataclasses import dataclass
from typing import Optional, List

@dataclass(frozen=True)
class CursorMemento:
    offset: int
    watermark: Optional[str] = None  # e.g., ISO timestamp

@dataclass
class IngestCursor:                  # Originator
    offset: int = 0
    watermark: Optional[str] = None
    def save(self) -> CursorMemento:         # create snapshot
        return CursorMemento(self.offset, self.watermark)
    def restore(self, m: CursorMemento) -> None:  # roll back
        self.offset, self.watermark = m.offset, m.watermark
    def advance(self, n: int, watermark: Optional[str] = None) -> None:
        self.offset += n
        if watermark is not None: self.watermark = watermark

Tiny pytest (cements it)

def test_memento_checkpoint_and_rollback():
    cur = IngestCursor()
    history: List[CursorMemento] = []

    history.append(cur.save())  # 0, None
    cur.advance(100, watermark="2025-11-06T00:00:00Z")
    history.append(cur.save())  # 100, ts
    cur.advance(50)             # then a failure happens…

    cur.restore(history[-1])    # rollback to last good point
    assert (cur.offset, cur.watermark) == (100, "2025-11-06T00:00:00Z")

Trade-offs & pitfalls

  • Pros: Safe undo/rollback; caretaker doesn’t depend on internals; snapshots are immutable.
  • Cons: Snapshots consume memory/storage; can go stale if external systems move on.
  • Pitfalls:
    • Capturing too much state—keep snapshots minimal (offsets, IDs, timestamps).
    • Making mementos mutable—use frozen dataclasses or tuples.
    • Forgetting to persist mementos atomically alongside outputs.

Pythonic alternatives

  • dataclasses.replace/copy.deepcopy for small objects (but prefer explicit snapshots).
  • DB checkpoints (store offset/watermark in a durable table).
  • Filesystem/S3 JSON snapshots with versioning (pair with your run metadata).
  • Transactions: when you can, commit or rollback instead of manual snapshots.

Mini exercise

Add to_json(m: CursorMemento) -> str and from_json(s: str) -> CursorMemento so you can persist snapshots to S3/DB. Write a quick test to round-trip a snapshot and restore from the loaded one.

Checks (quick checklist)

  • Snapshot is immutable and minimal.
  • Originator exposes save() and restore(m) only—no other code touches internals.
  • Caretaker stores snapshots but never reads their fields.
  • Rollback tested after simulated failure.
  • Durable storage & atomicity considered for real pipelines.