Flyweight (share heavy, read-only stuff instead of duplicating it)

When to use

You create lots of similar objects that would each hold the same big immutable data (schemas, regexes, dictionaries).
You want to cut memory and startup time by reusing that shared data.
Example: many PII-masking steps all need the same compiled regexes.

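Avoid when the shared object is mutable (a change made through one holder leaks into every other holder) or when you only have a handful of objects, where the pool adds indirection without a measurable saving.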

Diagram (text)

PIIMasker ──> PatternPool.get("email") ── returns shared compiled regex
     ▲
 many instances reuse the same flyweight (compiled regex object)

Python example (≤40 lines, type-hinted)

Concrete case: share compiled regexes across many calls/maskers.

from __future__ import annotations
import re
from functools import lru_cache
from dataclasses import dataclass

PATTERNS: dict[str, str] = {
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "phone": r"\+?\d[\d\- ]{7,}\d",
}

@lru_cache(maxsize=32)
def get_pattern(name: str) -> re.Pattern[str]:
    return re.compile(PATTERNS[name])

@dataclass(frozen=True)
class PIIMasker:
    repl: str = "***"
    def mask(self, text: str, rules: list[str]) -> str:
        out = text
        for rule in rules:
            out = get_pattern(rule).sub(self.repl, out)  # shared flyweight
        return out

# usage
masker = PIIMasker()
clean = masker.mask("contact: a@b.com, phone +1-202-555-0101", ["email", "phone"])
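
The diagram names a PatternPool, while the example above fills that role with a cached function. For readers who prefer an explicit object, here is a minimal class-based sketch. PatternPool, its constructor argument, and the lock are assumptions made for illustration, not part of the example above.

```python
import re
import threading


class PatternPool:
    """Hypothetical pool: compiles each named pattern once, then returns the same object."""

    def __init__(self, sources: dict[str, str]) -> None:
        self._sources = dict(sources)  # name -> regex source, copied defensively
        self._compiled: dict[str, re.Pattern[str]] = {}
        self._lock = threading.Lock()  # guards first-time compilation across threads

    def get(self, name: str) -> re.Pattern[str]:
        with self._lock:
            pattern = self._compiled.get(name)
            if pattern is None:
                # Unknown names raise KeyError, same as the function-based version.
                pattern = re.compile(self._sources[name])
                self._compiled[name] = pattern
            return pattern


pool = PatternPool({"email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"})
masked = pool.get("email").sub("***", "contact: a@b.com")
```

Unlike lru_cache, this pool never evicts, which is reasonable when the set of pattern names is small and fixed.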

Tiny pytest (cements it)

def test_flyweight_shares_compiled_regex():
    a = get_pattern("email")
    b = get_pattern("email")
    assert a is b  # same shared object, not just an equal one
    m = PIIMasker(repl="[redacted]")
    s = m.mask("a@b.com and +1 202 555 0101", ["email", "phone"])
    assert "[redacted]" in s and "@" not in s

Trade-offs & pitfalls

Pros: Lower memory use; expensive construction (such as regex compilation) happens once per key; every caller gets identical behavior.
Cons: An extra layer of indirection; cache size and eviction need a deliberate policy.
Pitfalls:
- Mutability: flyweights must be read-only; don’t stash state on them.
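- Cache policy: an unbounded cache can grow without limit; one that is too small thrashes, evicting and rebuilding the same entries over and over.
- Scope confusion: the cache is per-process; each worker process or container builds its own copy, so the savings do not carry across them.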
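
The mutability pitfall can be guarded against in code. The sketch below wraps a shared dictionary in a read-only view so that accidental writes fail loudly. The COUNTRY_CODES table and the try_to_mutate helper are hypothetical names used only for illustration.

```python
from types import MappingProxyType
from typing import Mapping

# Hypothetical shared lookup table; in practice this would be the large object.
_COUNTRY_CODES: dict[str, str] = {"US": "+1", "DE": "+49"}

# Read-only view: lookups work, item assignment raises TypeError.
COUNTRY_CODES: Mapping[str, str] = MappingProxyType(_COUNTRY_CODES)


def try_to_mutate(shared: Mapping[str, str]) -> bool:
    """Return True if the write was rejected, i.e. the flyweight is protected."""
    try:
        shared["FR"] = "+33"  # type: ignore[index]
    except TypeError:
        return True
    return False


rejected = try_to_mutate(COUNTRY_CODES)
```

The view is shallow: code that holds the underlying dict can still change it, and mutable values inside it are not protected, so keep the raw dict private to the module.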

Pythonic alternatives

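functools.lru_cache (used above) for a size-bounded cache, or cachetools.TTLCache (third-party) when entries should also expire after a time-to-live.
The re module's own internal cache of compiled patterns: it exists, but its size and eviction are implementation details, so an explicit cache is clearer.
Module-level constants for small shared data that is cheap to build at import time.
Weak references (weakref.WeakValueDictionary) if you want entries to disappear once nothing else uses them.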

Mini exercise

Add a maxsize knob: replace the hardcoded @lru_cache(maxsize=32) with a small factory that returns a cached get_pattern with a configurable size. Write a test that proves evictions occur when the cache is small.
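
One possible shape for a solution, offered as a sketch rather than the reference answer. The name make_get_pattern is an assumption. The test checks cache_info() instead of object identity, because re.compile may return the same object from its own internal cache even after lru_cache has evicted the entry.

```python
import re
from functools import lru_cache
from typing import Optional

PATTERNS: dict[str, str] = {
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "phone": r"\+?\d[\d\- ]{7,}\d",
}


def make_get_pattern(maxsize: Optional[int] = 32):
    """Build an independent cached lookup; maxsize=None means unbounded."""

    @lru_cache(maxsize=maxsize)
    def get_pattern(name: str) -> re.Pattern[str]:
        return re.compile(PATTERNS[name])

    return get_pattern


def test_small_cache_evicts() -> None:
    get = make_get_pattern(maxsize=1)
    get("email")  # miss, now cached
    get("phone")  # miss, evicts "email"
    get("email")  # miss again, which proves the eviction
    info = get.cache_info()
    assert (info.hits, info.misses, info.currsize) == (0, 3, 1)
```

Each call to the factory returns a function with its own cache, so tests can use a tiny cache without disturbing the one used in production code.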

Checks (quick checklist)

Shared object is immutable/read-only.
A factory/cache returns the shared instance for the same key.
Clear eviction/size policy (or accept “process-lifetime” cache).
No per-request state stored on the flyweight.
Tests prove identity reuse (a is b) and correct behavior.

Data/ML Engineer Blog
