Memento (snapshot & restore state)
When to use
- You need a checkpoint to roll back after a failure (e.g., stream offset/watermark).
- You want to save/undo user or job configuration changes safely.
- You must decouple snapshot storage (caretaker) from the object that owns the state.
Avoid when the state is tiny and you can simply rebuild the object, or when a DB transaction already gives you atomic commit/rollback.
Diagram (text)
Originator (IngestCursor)
├─ save() → Memento (immutable snapshot)
└─ restore(memento)
Caretaker (history list, S3, DB) — stores mementos, never inspects internals
Python example (≤40 lines, type-hinted)
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional, List

@dataclass(frozen=True)
class CursorMemento:
    offset: int
    watermark: Optional[str] = None  # e.g., ISO timestamp

@dataclass
class IngestCursor:  # Originator
    offset: int = 0
    watermark: Optional[str] = None

    def save(self) -> CursorMemento:  # create snapshot
        return CursorMemento(self.offset, self.watermark)

    def restore(self, m: CursorMemento) -> None:  # roll back
        self.offset, self.watermark = m.offset, m.watermark

    def advance(self, n: int, watermark: Optional[str] = None) -> None:
        self.offset += n
        if watermark is not None:
            self.watermark = watermark
Tiny pytest (cements it)
def test_memento_checkpoint_and_rollback():
    cur = IngestCursor()
    history: List[CursorMemento] = []
    history.append(cur.save())   # 0, None
    cur.advance(100, watermark="2025-11-06T00:00:00Z")
    history.append(cur.save())   # 100, ts
    cur.advance(50)              # then a failure happens…
    cur.restore(history[-1])     # rollback to last good point
    assert (cur.offset, cur.watermark) == (100, "2025-11-06T00:00:00Z")
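Caretaker sketch (optional)
The test above uses a bare list as the caretaker. A minimal sketch of a dedicated one follows, assuming you want a bounded history. The class name History and the max_size parameter are hypothetical, not part of the example above. It is generic over the memento type, so it cannot read snapshot fields even by accident.

```python
from __future__ import annotations
from typing import Generic, List, Optional, TypeVar

M = TypeVar("M")  # any memento type; the caretaker never looks inside it


class History(Generic[M]):  # hypothetical caretaker
    def __init__(self, max_size: int = 10) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._items: List[M] = []
        self._max_size = max_size

    def push(self, memento: M) -> None:
        self._items.append(memento)
        # Drop the oldest snapshots so memory use stays bounded.
        del self._items[:-self._max_size]

    def last(self) -> Optional[M]:
        # Most recent snapshot, or None if nothing was saved yet.
        return self._items[-1] if self._items else None

    def __len__(self) -> int:
        return len(self._items)
```

Usage would look like history.push(cur.save()) followed by cur.restore(history.last()) once a snapshot exists.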
Trade-offs & pitfalls
- Pros: Safe undo/rollback; caretaker doesn’t depend on internals; snapshots are immutable.
- Cons: Snapshots consume memory/storage; a snapshot can go stale if the external system it points into moves on (e.g., the source no longer retains data at the saved offset).
- Pitfalls:
  - Capturing too much state: keep snapshots minimal (offsets, IDs, timestamps).
  - Making mementos mutable: use frozen dataclasses or tuples.
  - Forgetting to persist mementos atomically alongside outputs.
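Atomic persistence sketch
One way to address the last pitfall is to write outputs and the checkpoint in the same database transaction. The sketch below assumes SQLite via the standard-library sqlite3 module. The table names, column names, and the functions init_db, commit_batch, and load_checkpoint are all hypothetical.

```python
import sqlite3
from typing import Optional, Sequence, Tuple


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one output table plus a single-row checkpoint table.
    conn.execute("CREATE TABLE IF NOT EXISTS output (value TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoint ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), "
        "cur_offset INTEGER NOT NULL, watermark TEXT)"
    )
    conn.commit()


def commit_batch(
    conn: sqlite3.Connection,
    rows: Sequence[str],
    offset: int,
    watermark: Optional[str],
) -> None:
    # `with conn` commits if the body succeeds and rolls back if it raises,
    # so outputs and checkpoint move together or not at all.
    with conn:
        conn.executemany(
            "INSERT INTO output (value) VALUES (?)", [(r,) for r in rows]
        )
        conn.execute(
            "INSERT OR REPLACE INTO checkpoint (id, cur_offset, watermark) "
            "VALUES (1, ?, ?)",
            (offset, watermark),
        )


def load_checkpoint(conn: sqlite3.Connection) -> Optional[Tuple[int, Optional[str]]]:
    row = conn.execute(
        "SELECT cur_offset, watermark FROM checkpoint WHERE id = 1"
    ).fetchone()
    return None if row is None else (row[0], row[1])
```

If a batch fails partway, the checkpoint keeps its previous value and no partial outputs remain, so a restart resumes from the last good offset.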
Pythonic alternatives
- dataclasses.replace / copy.deepcopy for small objects (but prefer explicit snapshots).
- DB checkpoints (store offset/watermark in a durable table).
- Filesystem/S3 JSON snapshots with versioning (pair with your run metadata).
- Transactions: when you can, commit or rollback instead of manual snapshots.
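dataclasses.replace sketch
For small state, an immutable value object can stand in for the originator/memento pair: every change returns a new object, and the previous one serves as the snapshot. A minimal sketch, assuming a hypothetical frozen variant of IngestCursor named CursorState:

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CursorState:  # hypothetical immutable variant of IngestCursor
    offset: int = 0
    watermark: Optional[str] = None

    def advance(self, n: int, watermark: Optional[str] = None) -> "CursorState":
        # Return a new value instead of mutating; the old one is the snapshot.
        return replace(
            self,
            offset=self.offset + n,
            watermark=self.watermark if watermark is None else watermark,
        )


good = CursorState().advance(100, watermark="2025-11-06T00:00:00Z")
attempt = good.advance(50)  # work in progress
# On failure, keep using `good`; nothing needs restoring.
```

The trade-off is that every caller must hold on to the value it considers "last good", which is the bookkeeping a caretaker would otherwise do.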
Mini exercise
Add to_json(m: CursorMemento) -> str and from_json(s: str) -> CursorMemento so you can persist snapshots to S3/DB. Write a quick test to round-trip a snapshot and restore from the loaded one.
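Starting point (compare after trying it yourself)
One possible shape for the two functions, assuming plain JSON via the standard library. CursorMemento is repeated here only so the block runs on its own. The restore step and the round-trip test are left to you.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class CursorMemento:  # repeated from the example so this block is self-contained
    offset: int
    watermark: Optional[str] = None


def to_json(m: CursorMemento) -> str:
    # sort_keys keeps the output stable, which helps when diffing stored snapshots.
    return json.dumps(asdict(m), sort_keys=True)


def from_json(s: str) -> CursorMemento:
    data = json.loads(s)
    # Build explicitly: unexpected keys are ignored, a missing "offset" raises KeyError.
    return CursorMemento(offset=int(data["offset"]), watermark=data.get("watermark"))
```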
Checks (quick checklist)
- Snapshot is immutable and minimal.
- Originator exposes save() and restore(m) only; no other code touches internals.
- Caretaker stores snapshots but never reads their fields.
- Rollback tested after simulated failure.
- Durable storage & atomicity considered for real pipelines.




