Flyweight (share heavy, read-only data instead of duplicating it)

When to use

  • You create lots of similar objects that would each hold the same big immutable data (schemas, regexes, dictionaries).
  • You want to cut memory and startup time by reusing that shared data.
  • Example: many PII-masking steps all need the same compiled regexes.

Avoid when the shared object is mutable (one consumer's change leaks to every other consumer) or when you only have a handful of objects (the indirection costs more than it saves).
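To make the cross-talk risk concrete, here is a minimal sketch. The names get_stopwords_mutable and get_stopwords_readonly and the placeholder data are hypothetical, not part of the example further down. It shows a cached dict leaking one caller's change to the next caller, and a read-only view (types.MappingProxyType) rejecting the same write.

```python
from functools import lru_cache
from types import MappingProxyType
from typing import Mapping


@lru_cache(maxsize=None)
def get_stopwords_mutable(lang: str) -> dict[str, bool]:
    # Hypothetical shared data; the cache hands the same dict to every caller.
    return {"the": True, "a": True}


@lru_cache(maxsize=None)
def get_stopwords_readonly(lang: str) -> Mapping[str, bool]:
    # Same data, wrapped in a read-only view.
    return MappingProxyType({"the": True, "a": True})


first = get_stopwords_mutable("en")
first["oops"] = True                 # one consumer mutates the shared object
second = get_stopwords_mutable("en")
leaked = "oops" in second            # the change is visible to the next consumer

ro = get_stopwords_readonly("en")
try:
    ro["oops"] = True  # type: ignore[index]
    blocked = False
except TypeError:
    blocked = True                   # the read-only view rejects item assignment
```

Tuples, frozensets and frozen dataclasses give the same protection for other shapes of data.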

Diagram (text)

PIIMasker ──> PatternPool.get("email") ── returns shared compiled regex
     ▲
 many instances reuse the same flyweight (compiled regex object)
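The diagram names a PatternPool, while the example below uses a cached function instead. As a rough sketch of what an explicit pool could look like (the class layout, the get classmethod and the _sources/_compiled attributes are assumptions for illustration):

```python
from __future__ import annotations

import re
from typing import ClassVar


class PatternPool:
    """Hypothetical flyweight factory: one compiled regex per rule name."""

    _sources: ClassVar[dict[str, str]] = {
        "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    }
    _compiled: ClassVar[dict[str, re.Pattern[str]]] = {}

    @classmethod
    def get(cls, name: str) -> re.Pattern[str]:
        # Compile on first request, then always return the stored object.
        if name not in cls._compiled:
            cls._compiled[name] = re.compile(cls._sources[name])
        return cls._compiled[name]


first = PatternPool.get("email")
second = PatternPool.get("email")    # same object as first
```

This version never evicts, which is fine for a small fixed set of rules; lru_cache adds a size bound for free.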

Python example (≤40 lines, type-hinted)

Concrete case: share compiled regexes across many calls/maskers.

from __future__ import annotations
import re
from functools import lru_cache
from dataclasses import dataclass

PATTERNS: dict[str, str] = {
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "phone": r"\+?\d[\d\- ]{7,}\d",
}

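@lru_cache(maxsize=32)
def get_pattern(name: str) -> re.Pattern[str]:
    # Flyweight factory: compile each named pattern once, then reuse the same object.
    # Unknown rule names raise KeyError.
    return re.compile(PATTERNS[name])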

@dataclass(frozen=True)
class PIIMasker:
    repl: str = "***"
    def mask(self, text: str, rules: list[str]) -> str:
        out = text
        for rule in rules:
            out = get_pattern(rule).sub(self.repl, out)  # shared flyweight
        return out

# usage
masker = PIIMasker()
clean = masker.mask("contact: a@b.com, phone +1-202-555-0101", ["email", "phone"])
# clean == "contact: ***, phone ***"

Tiny pytest (cements it)

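def test_flyweight_shares_compiled_regex():
    a = get_pattern("email")
    b = get_pattern("email")
    assert a is b  # same shared object
    # CPython's re module also caches compiled patterns, so the identity check
    # alone could pass without lru_cache; confirm our cache served the second call.
    assert get_pattern.cache_info().hits >= 1
    m = PIIMasker(repl="[redacted]")
    s = m.mask("a@b.com and +1 202 555 0101", ["email", "phone"])
    assert "[redacted]" in s and "@" not in s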

Trade-offs & pitfalls

  • Pros: Less memory, faster repeated use; consistent behavior; cacheable construction.
  • Cons: Indirection layer; cache size/eviction to think about.
  • Pitfalls:
    • Mutability: flyweights must be read-only; don’t stash state on them.
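    • Cache policy: an unbounded cache can grow without limit; one that is too small thrashes (it keeps evicting and rebuilding the same entries).
    • Scope confusion: the cache lives in a single process; each worker process or container builds its own copy, so nothing is shared across them.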

Pythonic alternatives

  • functools.lru_cache (used above) or cachetools.TTLCache for size/TTL control.
  • The re module’s internal cache of compiled patterns (it exists, but its size and eviction have no public configuration; an explicit cache is clearer).
  • Module-level singletons (constants) for tiny shared data.
  • Weak refs (weakref.WeakValueDictionary) if you want entries to disappear when unused.

Mini exercise

Add a maxsize knob: replace the hardcoded @lru_cache(maxsize=32) with a small factory that returns a cached get_pattern with a configurable size. Write a test that proves evictions occur when the cache is small.
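One possible shape for a solution, offered as a sketch rather than the intended answer (make_get_pattern is a hypothetical name). The eviction test relies on cache_info() instead of object identity, because CPython's re module keeps its own cache and re.compile can return the same object even after lru_cache has evicted it.

```python
from __future__ import annotations

import re
from functools import lru_cache

PATTERNS: dict[str, str] = {
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "phone": r"\+?\d[\d\- ]{7,}\d",
}


def make_get_pattern(maxsize: int | None = 32):
    """Return a get_pattern function backed by its own lru_cache of the given size."""

    @lru_cache(maxsize=maxsize)
    def get_pattern(name: str) -> re.Pattern[str]:
        return re.compile(PATTERNS[name])

    return get_pattern


def test_small_cache_evicts():
    get_pattern = make_get_pattern(maxsize=1)
    get_pattern("email")  # miss: cache now holds "email"
    get_pattern("phone")  # miss: evicts "email"
    get_pattern("email")  # miss again, which shows "email" was evicted
    info = get_pattern.cache_info()
    assert info.misses == 3 and info.hits == 0
    assert info.currsize == 1
```

Each call to make_get_pattern creates an independent cache, so tests can use a tiny size without touching the one used in production code.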

Checks (quick checklist)

  • Shared object is immutable/read-only.
  • A factory/cache returns the shared instance for the same key.
  • Clear eviction/size policy (or accept “process-lifetime” cache).
  • No per-request state stored on the flyweight.
  • Tests prove identity reuse (a is b) and correct behavior.