When to use

  • You have different backends (S3, local disk, GCS) but want one tiny API in your code.
  • You want to swap backends without changing your pipeline.
  • You want easy tests that don’t need real cloud clients.

Avoid it when you’ll only ever use one backend, or when a single helper function is enough.

Diagram (text)

Pipeline code ──> Storage (our tiny interface)
                     ▲
             ┌───────┴────────┐
           S3Adapter      LocalFSAdapter
           (wraps boto3)  (wraps pathlib)

Step-by-step idea

  1. Pick the interface you wish your code could talk to (here: Storage.put_bytes).
  2. Write one adapter per vendor that converts their API to your interface.
  3. Use only the interface in your pipeline; pass in whichever adapter you want.
  4. Test with a fake client or temp folder—no cloud needed.

Python example (≤40 lines, type-hinted)

from __future__ import annotations
from pathlib import Path
from typing import Protocol

class Storage(Protocol):
    def put_bytes(self, bucket: str, key: str, data: bytes) -> None: ...

class S3Adapter:
    def __init__(self, client) -> None:  # e.g., boto3.client("s3")
        self._c = client
    def put_bytes(self, bucket: str, key: str, data: bytes) -> None:
        try:
            self._c.put_object(Bucket=bucket, Key=key, Body=data)
        except Exception as e:
            raise IOError(f"S3 put failed: {bucket}/{key}") from e

class LocalFSAdapter:
    def __init__(self, root: Path) -> None:
        self._root = root
    def put_bytes(self, bucket: str, key: str, data: bytes) -> None:
        dest = self._root / bucket / key
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)

def backup_metrics(storage: Storage, date_key: str, data: bytes) -> None:
    storage.put_bytes(bucket="metrics", key=f"{date_key}.json", data=data)
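
Wiring this up is plain dependency injection: build whichever adapter you need at the program's edge and pass it in. A minimal sketch, assuming boto3 is installed and credentials are configured; the local path and the use_s3 flag are placeholders:

import boto3  # only needed for the S3 branch

def main(use_s3: bool) -> None:
    # Reuses Storage, S3Adapter, LocalFSAdapter, and backup_metrics from above.
    storage: Storage = (
        S3Adapter(boto3.client("s3")) if use_s3
        else LocalFSAdapter(Path("/tmp/backups"))
    )
    backup_metrics(storage, "2025-11-06", b"{}")

Note that the backend choice happens once, at the composition root; backup_metrics itself never branches on the backend.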

Tiny pytest (cements the idea)

def test_backup_works_with_both(tmp_path):
    class FakeS3:
        def __init__(self): self.store = {}
        def put_object(self, Bucket, Key, Body): self.store[(Bucket, Key)] = Body

    fake = FakeS3()
    s3 = S3Adapter(fake)
    fs = LocalFSAdapter(tmp_path)

    for st in (s3, fs):
        backup_metrics(st, "2025-11-06", b"{}")

    assert ("metrics", "2025-11-06.json") in s3._c.store
    assert (tmp_path / "metrics" / "2025-11-06.json").exists()

Trade-offs & pitfalls

  • Pros: Clean swap of backends; fewer if/elif; simple testing; hides vendor quirks.
  • Cons: A little indirection; one small class per backend.
  • Watch out for:
    • Letting vendor details leak into pipeline code (keep them inside adapters).
    • Inconsistent errors: normalize them (as shown with IOError; see the sketch after this list).
    • “Kitchen-sink” interfaces—keep your interface tiny.
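
To make the error-normalization point concrete, one option is a project-specific exception so callers never catch vendor exceptions directly. A minimal sketch; StorageError is a name introduced here (not part of the example above), shown as a variant of the S3Adapter from earlier:

class StorageError(IOError):
    """Raised by any adapter when a write fails."""

class S3Adapter:
    def __init__(self, client) -> None:  # e.g., boto3.client("s3")
        self._c = client
    def put_bytes(self, bucket: str, key: str, data: bytes) -> None:
        try:
            self._c.put_object(Bucket=bucket, Key=key, Body=data)
        except Exception as e:
            # Callers catch StorageError, never vendor-specific exceptions.
            raise StorageError(f"S3 put failed: {bucket}/{key}") from e

Subclassing IOError keeps any existing except IOError handlers working while giving you a single type to catch.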

Pythonic alternatives

  • fsspec / smart_open already give a unified file API; use them if they fit (see the sketch after this list).
  • Duck typing / Protocols (as above) keep it lightweight.
  • Dataclasses for adapter config and simple dependency injection (pass the adapter in).
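
For comparison, a minimal sketch of the same backup on top of fsspec; it assumes fsspec is installed (plus s3fs if you point it at an s3:// URL), and base_url is a placeholder:

import fsspec

def backup_metrics_fsspec(base_url: str, date_key: str, data: bytes) -> None:
    # base_url might be "file:///tmp/metrics" or "s3://metrics".
    # Depending on backend and version, parent directories may need to exist.
    with fsspec.open(f"{base_url}/{date_key}.json", "wb") as f:
        f.write(data)

If a library like this already covers your backends, the adapter layer above may be unnecessary; write your own only when you need to smooth over vendor quirks the library doesn't.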

Mini exercise

Add a GCSAdapter that wraps a client with upload_blob(bucket, key, data). Make it work with backup_metrics without changing that function. Add a tiny test like above.

Checks (quick checklist)

  • One tiny interface your pipeline calls (put_bytes).
  • One adapter per backend; no if backend == ... in pipeline.
  • Errors wrapped into your own exception type.
  • Adapters are thin (no business logic).
  • Tests pass with fakes/temp dirs (no real cloud).