Plug-in discovery – Data/ML Engineer Blog

Plug-in discovery (entry points)

When to use

You want optional extensions (extra file formats, transforms) shipped as separate packages.
The core app shouldn't need to know every plug-in; it just discovers and loads them.
Teams can add features by installing a package, with no changes to the core code.

Avoid it when a small, fixed set of features is enough; a simple registry dict will do.

Diagram (text)

Core app ──> load_plugins("myapp.transformers") ──> {name: plugin}
                       ▲
            Distribution entry points
            (3rd-party packages expose plugins)
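
The diagram shows the core side only. As a rough sketch of the other side, a third-party package might ship a module like the one below and advertise it under the "myapp.transformers" group in its packaging metadata. The distribution name myapp_reverse, the module path, and the Reverse class are all hypothetical and used only for illustration.

```python
# Hypothetical third-party module: myapp_reverse/plugin.py
#
# The distribution would advertise the class in its pyproject.toml, e.g.:
#
#   [project.entry-points."myapp.transformers"]
#   reverse = "myapp_reverse.plugin:Reverse"
#
# Once the package is installed, the core app can find "reverse" in the
# "myapp.transformers" group without importing myapp_reverse by name.
from typing import Any, Iterable, Iterator


class Reverse:
    """Reverses the string stored under the "name" key of each row."""

    name = "reverse"

    def transform(self, rows: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
        for r in rows:
            yield {**r, "name": str(r.get("name", ""))[::-1]}
```

The entry point value has the form "module:object"; the object is whatever ep.load() returns on the core side.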

Python example (≤40 lines, type-hinted)

from __future__ import annotations
from importlib.metadata import entry_points
from typing import Protocol, Iterable, Dict, Any

class Transformer(Protocol):
    name: str
    def transform(self, rows: Iterable[dict[str, Any]]) -> Iterable[dict[str, Any]]: ...

def load_plugins(group: str = "myapp.transformers") -> Dict[str, Transformer]:
    plugins: Dict[str, Transformer] = {}

    # built-in example
    class Upper:
        name = "upper"
        def transform(self, rows):
            for r in rows:
                yield {**r, "name": str(r.get("name", "")).upper()}
    plugins[Upper.name] = Upper()

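    # third-party via entry points
    # .select() requires Python 3.10+; older versions return a plain dict here
    for ep in entry_points().select(group=group):
        obj = ep.load()                  # imports the module; class or factory
        plugin: Transformer = obj() if callable(obj) else obj
        plugins[plugin.name] = plugin    # last write wins on duplicate names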
    return plugins

def run_pipeline(rows: Iterable[dict[str, Any]], steps: list[str]) -> list[dict[str, Any]]:
    plugins = load_plugins()
    for s in steps:
        rows = plugins[s].transform(rows)
    return list(rows)

Tiny pytest (cements it)

def test_discovers_and_runs_plugins(monkeypatch):
    # Fake an external plugin published via entry points
    class Tag:
        name = "tag"
        def transform(self, rows):
            for r in rows:
                yield {**r, "tag": 1}

    class FakeEP:  # mimics importlib.metadata.EntryPoint
        def load(self):
            return Tag

    class FakeEPS:  # mimics the object returned by entry_points()
        def select(self, group):
            return [FakeEP()]

    # Patch entry_points where load_plugins looks it up. This assumes the test
    # lives in the same module as load_plugins; otherwise patch that module.
    import sys
    this = sys.modules[__name__]
    monkeypatch.setattr(this, "entry_points", lambda: FakeEPS())

    out = run_pipeline([{"name": "a"}], ["upper", "tag"])
    assert out == [{"name": "A", "tag": 1}]

Trade-offs & pitfalls

Pros: Extensible without changing or re-releasing the core app; clean core; third parties can innovate independently.
Cons: Indirection; harder to trace; version/compatibility management needed.
Pitfalls:
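- Untrusted code loading: ep.load() imports and runs code from whatever packages are installed, so allowlist groups/names and sandbox if needed.
- Name clashes: decide whether the last write wins or duplicates raise an error. The example above is last-write-wins, so a third-party plugin can silently replace the built-in "upper".
- Import errors: wrap ep.load() and log/skip bad plugins with clear messages.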
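
One way to address the last two pitfalls is sketched below: a loader that logs and skips entry points that fail to load, and raises on duplicate names instead of overwriting. The function name load_plugins_safely, the DuplicatePluginError class, and the eps parameter (there only so tests can inject fake entry points) are assumptions for this sketch, not part of the example above.

```python
import logging
from importlib.metadata import entry_points
from typing import Any, Iterable, Optional

log = logging.getLogger(__name__)


class DuplicatePluginError(RuntimeError):
    """Raised when two entry points resolve to the same plugin name."""


def load_plugins_safely(
    group: str = "myapp.transformers",
    eps: Optional[Iterable[Any]] = None,  # hypothetical hook for tests
) -> dict[str, Any]:
    if eps is None:
        eps = entry_points().select(group=group)  # Python 3.10+

    plugins: dict[str, Any] = {}
    for ep in eps:
        try:
            obj = ep.load()  # may raise ImportError or anything else
            plugin = obj() if callable(obj) else obj
            name = plugin.name
        except Exception:
            # Isolate the failure: log it with a traceback and move on.
            log.exception("Skipping plugin entry point %r", ep)
            continue
        if name in plugins:
            raise DuplicatePluginError(f"duplicate plugin name: {name!r}")
        plugins[name] = plugin
    return plugins
```

Raising on duplicates is only one policy; a last-write-wins loader with a warning would be equally deterministic as long as the iteration order is fixed.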

Pythonic alternatives

Simple registry: REGISTRY = {"upper": Upper()} for small, static sets.
Module scanning: import every module under myapp.plugins by naming convention, e.g. with pkgutil.iter_modules (no packaging needed).
Config-driven: list dotted paths in YAML/TOML and import them with importlib.
Protocols (as above) keep plug-in contracts duck-typed; pydantic can validate plug-in config.
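
The config-driven alternative can be sketched in a few lines. The "module:attr" path format and the helper names import_object and load_from_paths are assumptions chosen for this sketch; the list of paths would come from whatever YAML/TOML file the app already reads.

```python
import importlib
from typing import Any


def import_object(dotted_path: str) -> Any:
    """Resolve a "package.module:attr" string to the object it names."""
    module_name, sep, attr = dotted_path.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:attr', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def load_from_paths(paths: list[str]) -> dict[str, Any]:
    """Build a {name: plugin} mapping from configured dotted paths."""
    plugins: dict[str, Any] = {}
    for path in paths:
        obj = import_object(path)
        plugin = obj() if callable(obj) else obj  # class or factory
        plugins[plugin.name] = plugin
    return plugins
```

Compared with entry points, this keeps the list of active plug-ins explicit in configuration, at the cost of editing that configuration whenever a plug-in is added.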

Mini exercise

Add safety to load_plugins:

Reject plugins missing required attributes (name, transform).
Add an optional allowlist parameter (set of names); load only those.
Write a test where one bad plugin is skipped and only allowlisted ones run.
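
As a possible starting point for the first step, here is a small helper that reports which required attributes a candidate plugin lacks. The names REQUIRED_ATTRS and missing_attrs are hypothetical; the allowlist and the test are left to you.

```python
from typing import Any

REQUIRED_ATTRS = ("name", "transform")


def missing_attrs(plugin: Any) -> list[str]:
    """Return the required attributes the plugin lacks (empty list if valid)."""
    missing = [attr for attr in REQUIRED_ATTRS if not hasattr(plugin, attr)]
    # A transform attribute that exists but is not callable is also unusable.
    if "transform" not in missing and not callable(plugin.transform):
        missing.append("transform")
    return missing
```

Inside load_plugins, a non-empty result would be a reason to log the plugin and skip it.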

Checks (quick checklist)

Clear contract (Protocol) for plugins.
Discovery isolates failures (bad plugin doesn’t crash the app).
Deterministic resolution for duplicate names.
Configurable allow/deny lists or version checks.
Tests simulate entry point loading and ordering.
