When to use
- You have different backends (S3, local disk, GCS) but want one tiny API in your code.
- You want to swap backends without changing your pipeline.
- You want easy tests that don’t need real cloud clients.
Avoid it when you’ll only ever use one backend, or when a single helper function is enough.
Diagram (text)
Pipeline code ──> Storage (our tiny interface)
                     ▲
            ┌────────┴────────┐
        S3Adapter      LocalFSAdapter
      (wraps boto3)    (wraps pathlib)
Step-by-step idea
- Pick the interface you wish your code could talk to (here: Storage.put_bytes).
- Write one adapter per vendor that converts their API to your interface.
- Use only the interface in your pipeline; pass in whichever adapter you want.
- Test with a fake client or temp folder—no cloud needed.
Python example (≤40 lines, type-hinted)
from __future__ import annotations
from pathlib import Path
from typing import Protocol


class Storage(Protocol):
    def put_bytes(self, bucket: str, key: str, data: bytes) -> None: ...


class S3Adapter:
    def __init__(self, client) -> None:  # e.g., boto3.client("s3")
        self._c = client

    def put_bytes(self, bucket: str, key: str, data: bytes) -> None:
        try:
            self._c.put_object(Bucket=bucket, Key=key, Body=data)
        except Exception as e:
            raise IOError(f"S3 put failed: {bucket}/{key}") from e


class LocalFSAdapter:
    def __init__(self, root: Path) -> None:
        self._root = root

    def put_bytes(self, bucket: str, key: str, data: bytes) -> None:
        dest = self._root / bucket / key
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)


def backup_metrics(storage: Storage, date_key: str, data: bytes) -> None:
    storage.put_bytes(bucket="metrics", key=f"{date_key}.json", data=data)
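Wiring happens once, at the edge of the program; the pipeline never learns which backend it got. A sketch continuing the example above (the /tmp path and the commented-out boto3 line are illustrative, not part of the example):

# storage = S3Adapter(boto3.client("s3"))       # production: push to S3
storage = LocalFSAdapter(Path("/tmp/backups"))   # local run: write to disk
backup_metrics(storage, "2025-11-06", b"{}")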
Tiny pytest (cements the idea)
def test_backup_works_with_both(tmp_path):
    class FakeS3:
        def __init__(self): self.store = {}
        def put_object(self, Bucket, Key, Body): self.store[(Bucket, Key)] = Body

    fake = FakeS3()
    s3 = S3Adapter(fake)
    fs = LocalFSAdapter(tmp_path)
    for st in (s3, fs):
        backup_metrics(st, "2025-11-06", b"{}")
    assert ("metrics", "2025-11-06.json") in fake.store
    assert (tmp_path / "metrics" / "2025-11-06.json").exists()
Trade-offs & pitfalls
- Pros: Clean swap of backends; fewer if/elif branches; simple testing; hides vendor quirks.
- Cons: A little indirection; one small class per backend.
- Watch out for:
  - Letting vendor details leak into pipeline code (keep them inside adapters).
  - Inconsistent errors: normalize them into one type (as shown with IOError; see the sketch after this list).
  - “Kitchen-sink” interfaces: keep your interface tiny.
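If you’d rather not reuse the built-in IOError, one option is a single custom exception that every adapter raises. A minimal sketch (the StorageError name is just an illustration, not part of the example above):

class StorageError(IOError):
    """Raised by any adapter when a write fails; pipeline code catches only this."""

class S3Adapter:
    def __init__(self, client) -> None:  # e.g., boto3.client("s3")
        self._c = client

    def put_bytes(self, bucket: str, key: str, data: bytes) -> None:
        try:
            self._c.put_object(Bucket=bucket, Key=key, Body=data)
        except Exception as e:
            # Normalize the vendor-specific error into our own type.
            raise StorageError(f"S3 put failed: {bucket}/{key}") from e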
Pythonic alternatives
- fsspec / smart_open already give a unified file API; use them if they fit (see the sketch after this list).
- Duck typing / Protocols (as above) keep it lightweight.
- Dataclasses for adapter config and simple dependency injection (pass the adapter in).
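For the fsspec route, a rough sketch (assumes fsspec is installed, plus s3fs if you use the s3:// scheme); here the URL prefix, not an adapter class, selects the backend:

import fsspec

def backup_metrics(base_url: str, date_key: str, data: bytes) -> None:
    # base_url chooses the backend: "s3://metrics" or "file:///tmp/metrics".
    with fsspec.open(f"{base_url}/{date_key}.json", "wb") as f:
        f.write(data)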
Mini exercise
Add a GCSAdapter that wraps a client with upload_blob(bucket, key, data). Make it work with backup_metrics without changing that function. Add a tiny test like above.
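One possible shape for that adapter (a sketch against the hypothetical upload_blob client described in the exercise, not the real google-cloud-storage API):

class GCSAdapter:
    def __init__(self, client) -> None:  # any client exposing upload_blob(bucket, key, data)
        self._c = client

    def put_bytes(self, bucket: str, key: str, data: bytes) -> None:
        try:
            self._c.upload_blob(bucket, key, data)
        except Exception as e:
            raise IOError(f"GCS put failed: {bucket}/{key}") from e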
Checks (quick checklist)
- One tiny interface your pipeline calls (put_bytes).
- One adapter per backend; no if backend == ... in pipeline code.
- Errors wrapped into your own exception type.
- Adapters are thin (no business logic).
- Tests pass with fakes/temp dirs (no real cloud).




