ML training + experiment tracking + serving (all Python libs)
The big picture (how these fit together)
Data → Train (sklearn/xgboost/lightgbm/catboost/torch/tf)
↘ Tune (optuna / ray[tune]) → pick best model
↘ Track (mlflow / wandb / neptune / comet-ml) → metrics, params, artifacts
Artifacts → Serialize (joblib / cloudpickle / dill) → “model.pkl”
Serve → API (fastapi/starlette + pydantic + uvicorn/gunicorn)
or Model servers (bentoml / ray[serve] / mlserver / tritonclient to NVIDIA Triton)
Runtime optimizations → onnxruntime (optional)
Feature engineering → featuretools (optional)
Training / models (what each is best at)
- scikit-learn – Swiss-army knife for classic ML (tabular, small/medium data).
  Strengths: pipelines, preprocessing, cross-val; wide algo coverage. Use when: CPU tabular problems, quick baselines, clean API.
- XGBoost – Gradient boosting, very strong on tabular with careful tuning.
  Strengths: accuracy, handling missing values; CPU/GPU. Notes: can overfit; watch n_estimators, max_depth, eta.
- LightGBM – Faster/lighter boosting (histogram algorithm).
  Strengths: speed on large/tabular, categorical support via integer encoding. Notes: sensitive to num_leaves & min_data_in_leaf.
- CatBoost – Boosting with native categorical handling (no heavy encoding).
  Strengths: great defaults, less feature engineering; handles text-ish features. Notes: GPU helps; watch training time on huge data.
- PyTorch (+ torchvision for images) – Flexible, Pythonic deep learning.
  Strengths: custom architectures, research → prod; strong community. Use when: images, custom nets, you want control.
- TensorFlow/Keras – DL framework with production tooling (TF Serving, TFX).
  Strengths: high-level Keras API, good for production pipelines; XLA. Use when: you want the TF ecosystem.
Quick pick: Tabular → start with LightGBM or XGBoost; Images → PyTorch + torchvision; Need turnkey classic ML → scikit-learn.
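For the tabular quick pick, a minimal LightGBM baseline might look like the sketch below (synthetic data stands in for your real table; swapping in xgboost.XGBClassifier is a near drop-in change):

import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic tabular data as a placeholder for a real table
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Conservative defaults; num_leaves and learning_rate are the usual first knobs to tune
clf = lgb.LGBMClassifier(n_estimators=500, num_leaves=31, learning_rate=0.05, random_state=0)
clf.fit(X_tr, y_tr)
print("ROC-AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))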
Feature tooling
- featuretools – Automated feature engineering on relational/transactional tables.
Use when: you’ve got entity relationships (customers ↔ orders ↔ items) and want fast baseline features. Tip: limit primitives; review leakage.
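A minimal deep-feature-synthesis sketch, assuming the featuretools 1.x EntitySet API and two made-up tables (customers, orders):

import featuretools as ft
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2],
                          "signup": pd.to_datetime(["2023-01-01", "2023-02-01"])})
orders = pd.DataFrame({"order_id": [10, 11, 12],
                       "customer_id": [1, 1, 2],
                       "amount": [30.0, 12.5, 99.0]})

es = ft.EntitySet(id="shop")
es = es.add_dataframe(dataframe_name="customers", dataframe=customers,
                      index="customer_id", time_index="signup")
es = es.add_dataframe(dataframe_name="orders", dataframe=orders, index="order_id")
es = es.add_relationship("customers", "customer_id", "orders", "customer_id")

# Keep the primitive list small: easier to review for leakage
feature_matrix, feature_defs = ft.dfs(entityset=es, target_dataframe_name="customers",
                                      agg_primitives=["sum", "mean", "count"], max_depth=2)
print(feature_matrix.head())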
Experiment tracking / registry
- MLflow – OSS tracking + model registry + artifacts. Runs anywhere (local/S3).
  Good for: teams wanting self-host or lightweight tracking.
- Weights & Biases (wandb) – Hosted tracking, great dashboards/sweeps/artifacts.
  Good for: collaborative experiments, visualizations, minimal setup.
- Neptune (neptune-client) – Hosted tracking with flexible metadata & dashboards.
- Comet (comet-ml) – Hosted tracking similar to W&B; nice experiment mgmt.
Quick pick: Need on-prem/OSS → MLflow. Want SaaS UX & sweeps → W&B (or Neptune/Comet if your team already uses them).
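The logging pattern is similar across the hosted options; here is a minimal wandb sketch (assumes you have run wandb login; the project name and metric values are placeholders):

import wandb

run = wandb.init(project="tabular-baseline", config={"n_estimators": 300, "max_depth": 8})
for epoch in range(3):
    # Log any scalar metrics per step/epoch; placeholder values for illustration
    wandb.log({"epoch": epoch, "roc_auc": 0.90 + 0.01 * epoch})
wandb.finish()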
Tuning (hyperparameters)
- Optuna – Elegant, fast Bayesian + pruners. Tight integrations (sklearn, xgboost, lightgbm, catboost, PyTorch).
  Use when: single machine or modest scale; you want the simplest API that still rocks.
- Ray Tune (ray[tune]) – Distributed hyperparam search on a cluster; many schedulers/algos.
  Use when: you need to parallelize across CPUs/GPUs or many nodes.
Quick pick: Start Optuna; move to Ray Tune when parallelism/distribution matters.
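A minimal Ray Tune sketch, assuming the Ray 2.x Tuner API and a toy objective standing in for a real train/validate step:

from ray import tune

def trainable(config):
    # Toy objective (peak near lr = 0.1); replace with model training + validation score
    score = -(config["lr"] - 0.1) ** 2
    return {"score": score}

tuner = tune.Tuner(
    trainable,
    param_space={"lr": tune.loguniform(1e-4, 1e-1)},
    tune_config=tune.TuneConfig(metric="score", mode="max", num_samples=20),
)
results = tuner.fit()
print(results.get_best_result().config)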
Batch utils (serialization & parallel)
- joblib – Save/load sklearn pipelines fast; simple parallel loops.
  Use for: dump()/load() of models; CPU-bound map with Parallel(n_jobs=...).
- cloudpickle – Serialize dynamic Python objects/functions (more flexible than pickle).
  Use for: sending callables to workers (Ray/Dask), custom objects.
- dill – Even more permissive pickling (e.g., lambdas).
  Use carefully: portability can suffer; prefer joblib/cloudpickle first.
Gotcha: Pickles are not stable across Python/library versions. Pin versions or export to ONNX for portability where possible.
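A minimal joblib sketch covering both uses, persistence and a CPU-bound parallel map (the LogisticRegression model is just a stand-in):

import joblib
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Persist and reload the fitted estimator (same pinned versions on both sides)
joblib.dump(model, "model.pkl")
model = joblib.load("model.pkl")

# Simple CPU-bound parallel map
def square(v):
    return v * v

print(Parallel(n_jobs=4)(delayed(square)(i) for i in range(10)))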
Serving APIs & runtimes
- FastAPI + Pydantic + Uvicorn (Starlette under the hood) – Build typed, fast REST inference services.
  Pattern: fastapi app, pydantic request/response models, uvicorn as ASGI server, gunicorn for multi-worker in prod.
- BentoML – Package and deploy models with batteries included (runners, adapters, OCI images).
  Great for: standardized packaging and multi-model services.
- Ray Serve – Scalable model serving on a Ray cluster (Pythonic APIs, autoscaling).
  Great for: many models, dynamic routing, distributed workloads.
- MLServer – Seldon’s multi-framework model server (supports MLflow, scikit-learn, XGBoost, etc.).
  Great for: standard serving protocols; easy Dockerization.
- Triton Inference Server (use tritonclient) – NVIDIA’s high-perf server for GPU/CPU; ensembles, batching.
  Great for: high-throughput DL, multiple frameworks, GPUs. The client lives in your Python service.
- ONNX Runtime – High-performance runtime for ONNX models (CPU/GPU, quantization).
  Great for: portable, fast inference once you export to ONNX.
Quick pick: Simple service → FastAPI. Need packaging/best-practices → BentoML. Many models / scale-out → Ray Serve. GPU throughput → Triton (+ tritonclient). Want portability → export to ONNX and run with onnxruntime.
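A minimal export-and-run sketch for the ONNX path, assuming skl2onnx for the conversion (the 30-feature breast-cancer model is only illustrative):

import numpy as np
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Convert to ONNX; input shape is [batch, n_features]
onnx_model = convert_sklearn(model, initial_types=[("input", FloatTensorType([None, 30]))])
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

# Run inference with onnxruntime
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
print(sess.run(None, {input_name: X[:1].astype(np.float32)}))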
Two tiny “glue” recipes
1) Train + tune + track (sklearn + Optuna + MLflow) in ~20 lines
import mlflow, optuna
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
X, y = load_breast_cancer(return_X_y=True)
def objective(trial: optuna.Trial) -> float:
    params = dict(
        n_estimators=trial.suggest_int("n_estimators", 100, 600),
        max_depth=trial.suggest_int("max_depth", 3, 20),
        min_samples_leaf=trial.suggest_int("min_samples_leaf", 1, 10),
    )
    with mlflow.start_run():
        clf = RandomForestClassifier(**params, n_jobs=-1, random_state=0)
        score = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
        mlflow.log_params(params)
        mlflow.log_metric("roc_auc", score)
    return 1.0 - score  # Optuna minimizes
study = optuna.create_study()
study.optimize(objective, n_trials=20)
print("best:", study.best_params)
2) Serve the model (FastAPI + Pydantic + Uvicorn)
# save: joblib.dump(model, "model.pkl")
import joblib
from fastapi import FastAPI
from pydantic import BaseModel
class Inp(BaseModel):
    mean_radius: float
    mean_texture: float
    mean_smoothness: float
    # ... add required features
app = FastAPI()
model = joblib.load("model.pkl")
@app.post("/predict")
def predict(x: Inp):
    X = [[x.mean_radius, x.mean_texture, x.mean_smoothness]]
    proba = float(model.predict_proba(X)[0][1])
    return {"score": proba}
# run: uvicorn app:app --host 0.0.0.0 --port 8000
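A quick client-side sanity check of the endpoint above (assumes the service is running locally; the feature values are placeholders, and the payload must carry every feature the model actually expects):

import requests

payload = {"mean_radius": 14.1, "mean_texture": 20.0, "mean_smoothness": 0.095}
resp = requests.post("http://localhost:8000/predict", json=payload, timeout=5)
print(resp.json())  # {"score": <probability>}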
Pragmatic “pick one” starter stack
- Tabular: lightgbm or xgboost → Optuna → MLflow → serialize with joblib → FastAPI + Uvicorn (later: export to ONNX + onnxruntime if you need portability).
- Vision: torch + torchvision → W&B for tracking → serve with BentoML or Ray Serve (GPU) → consider Triton for max throughput.
- General deep learning (TF): tensorflow/keras → W&B/MLflow → TF Serving or BentoML.
Common gotchas & pro tips
- Version pinning: pin Python + libs for training and serving; pickle breakage is real.
- Determinism: set seeds and limit threads for reproducible metrics (OMP_NUM_THREADS, BLAS thread settings); see the sketch after this list.
- Feature parity: lock the same preprocessing at train and serve (sklearn Pipeline, or export preprocessing into ONNX).
- Throughput: prefer batch inference (Triton/MLServer) or async endpoints; profile JSON parsing & validation.
- Observability: log inputs (hashed), latencies, and outputs; add drift checks (evidently/whylogs) later.
- Resource fit: CPU tabular often beats small DL; use GPU when it actually moves the needle.
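The determinism tip as a sketch (thread caps must be set before numpy/BLAS-backed libraries are imported; the torch lines apply only if you use PyTorch):

import os

# Cap math-library threads before importing numpy/sklearn for stable metrics and timings
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import random
import numpy as np

random.seed(0)
np.random.seed(0)

# If using PyTorch:
# import torch
# torch.manual_seed(0)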




