Pandas → QuestDB from Jupyter: the fast path to time-series SQL (ILP + PGWire)
Meta description:
Move time-series data from Pandas to QuestDB in minutes. Learn ILP ingestion, query back to Pandas over PGWire, and use SAMPLE BY/LATEST ON for real analytics.
Why this matters (quick story)
You’ve got a tidy Pandas DataFrame… until it isn’t. Metrics explode, memory caps out, and your “quick” analysis takes minutes. Offload the heavy lifting to QuestDB: ingest straight from Pandas, query with time-series SQL, and pull just the slices you need back into your notebook. It’s the same Python flow—just actually fast.
- Ingest: Pandas → QuestDB via the official Python client (ILP). (py-questdb-client.readthedocs.io)
- Query: Back to Pandas via Postgres wire (PGWire) with SQLAlchemy. (QuestDB)
- Think time-series: `SAMPLE BY`, `LATEST ON`, `ASOF JOIN`. (QuestDB)
Architecture at a glance
```text
Jupyter / Pandas ──(ILP: questdb[dataframe])──▶ QuestDB
       ▲                                           │
       └───────────(PGWire / SQLAlchemy)───────────┘
```
- ILP (HTTP) is the recommended high-throughput ingest path in the official Python client. (py-questdb-client.readthedocs.io)
- PGWire is ideal for querying into Pandas via `pandas.read_sql`. (QuestDB)
Setup (3 steps)
- Run QuestDB (local, Docker, or Cloud). See the docs home to pick your path. (QuestDB)
- Install the client (with DataFrame extras):

  ```shell
  python -m pip install -U "questdb[dataframe]"
  ```

  (The extras add Pandas + PyArrow for zero-copy frame ingest.) (py-questdb-client.readthedocs.io)
- Optionally open the live demo to experiment with SQL. (QuestDB)
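Before ingesting anything, it can help to confirm the server is reachable. A minimal smoke test, assuming a local QuestDB with the default REST port 9000, is to run `SELECT 1;` against the `/exec` HTTP endpoint:

```python
import urllib.parse
import urllib.request

# Build the REST query URL (assumes QuestDB's default HTTP port, 9000)
params = urllib.parse.urlencode({"query": "SELECT 1;"})
url = f"http://localhost:9000/exec?{params}"

try:
    with urllib.request.urlopen(url, timeout=2) as resp:
        print(resp.status)  # 200 means the server is up
except OSError as exc:
    print(f"QuestDB not reachable: {exc}")
```

If this fails, fix connectivity first; the ILP and PGWire examples below assume the same host.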
Ingest a Pandas DataFrame (millions of rows/sec style)
Goal: write a sensor_readings DataFrame into QuestDB with a proper timestamp and symbols.
```python
import pandas as pd
from datetime import timedelta
from questdb.ingress import Sender

# 1) Build a small demo frame
now = pd.Timestamp.utcnow().floor("s")
df = pd.DataFrame({
    "device_id": pd.Categorical([f"d-{i % 3}" for i in range(10_000)]),  # categorical → SYMBOL
    "temp_c": pd.Series(range(10_000)).astype("float64") / 100.0,
    "ts": [now + timedelta(seconds=i) for i in range(10_000)],
})

# 2) Send to QuestDB via ILP/HTTP
conf = "http::addr=localhost:9000;"
with Sender.from_conf(conf) as sender:
    # 'at' is the designated timestamp column; Categorical becomes SYMBOL
    sender.dataframe(df, table_name="sensor_readings", at="ts")
```
- `sender.dataframe` is the simplest/fastest way to push a frame. (py-questdb-client.readthedocs.io)
- Categoricals → SYMBOL (dictionary-encoded string) for better scans and joins. You can also pass `symbols=["device_id"]` explicitly. (py-questdb-client.readthedocs.io)
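The memory win from categoricals is easy to verify locally. A quick sketch with hypothetical device IDs (low-cardinality strings are exactly what SYMBOL is for):

```python
import pandas as pd

# Hypothetical device IDs: 10,000 rows but only 3 distinct values
ids = [f"d-{i % 3}" for i in range(10_000)]

as_str = pd.Series(ids, dtype="object")
as_cat = pd.Series(pd.Categorical(ids))

# Categorical stores 3 unique labels plus small integer codes,
# so it is far smaller than 10,000 separate Python strings.
print(as_str.memory_usage(deep=True))
print(as_cat.memory_usage(deep=True))
```

The same dictionary-encoding idea is what makes SYMBOL columns cheap to filter and join on server-side.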
Timestamp tip: the client accepts `pandas.Timestamp` and `datetime`. Keep timestamps UTC in your pipeline. (QuestDB)
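If your source data carries naive local timestamps, normalize them before ingest. A small sketch using pandas tz handling (the zone name here is just an example):

```python
import pandas as pd

# Naive local timestamps are a common source of off-by-hours bugs.
naive = pd.Series(pd.to_datetime(["2024-01-01 09:00", "2024-01-01 09:05"]))

# Localize to the source zone, then convert to UTC before ingesting.
utc = naive.dt.tz_localize("Europe/Berlin").dt.tz_convert("UTC")
print(utc.iloc[0])  # 2024-01-01 08:00:00+00:00
```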
Query back into Pandas (PGWire + SQLAlchemy)
You’ll reuse your familiar Pandas workflow:
```python
import pandas as pd
from sqlalchemy import create_engine

# Requires the QuestDB SQLAlchemy dialect: pip install questdb-connect
# questdb://user:pass@host:port/dbname (dev defaults: admin/quest @ localhost:8812/qdb)
engine = create_engine("questdb://admin:quest@localhost:8812/qdb")

# Downsample the last 24h into 5-minute buckets
q = """
SELECT
    ts,
    device_id,
    avg(temp_c) AS avg_temp
FROM sensor_readings
WHERE ts >= dateadd('d', -1, now())
SAMPLE BY 5m
"""

with engine.connect() as conn:
    agg = pd.read_sql(q, conn)

agg.head()
```
- The PGWire Python guide shows Pandas + SQLAlchemy integration and connection strings. (QuestDB)
- `SAMPLE BY` is a time-series primitive for downsampling. (QuestDB)
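For intuition, the same downsampling can be reproduced client-side with pandas `resample` on a toy frame (for real tables, doing it server-side with `SAMPLE BY` is the point: less data crosses the wire):

```python
import pandas as pd

# Toy stand-in for sensor_readings: one reading per minute
idx = pd.date_range("2024-01-01", periods=12, freq="1min", tz="UTC")
df = pd.DataFrame({"ts": idx, "temp_c": range(12)})

# Client-side equivalent of `SAMPLE BY 5m` with avg(temp_c)
agg = df.set_index("ts")["temp_c"].resample("5min").mean()
print(agg)
```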
Time-series SQL you’ll use daily
Latest value per device (today):
```sql
SELECT *
FROM sensor_readings
WHERE ts IN today()
LATEST ON ts PARTITION BY device_id;
```
- `LATEST ON` returns one row per key with the most recent timestamp. (QuestDB)
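The pandas analogue, for comparison, is a sort plus a per-group `last()` on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:01",
                          "2024-01-01 00:02", "2024-01-01 00:03"]),
    "device_id": ["d-0", "d-1", "d-0", "d-1"],
    "temp_c": [20.0, 21.0, 22.0, 23.0],
})

# One row per device_id, carrying the most recent reading
latest = df.sort_values("ts").groupby("device_id", as_index=False).last()
print(latest)
```

On a large table, `LATEST ON` does this server-side without pulling the full history into the notebook.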
Match readings with nearest calibration row:
```sql
SELECT r.ts, r.device_id, r.temp_c, c.cal_offset
FROM sensor_readings r
ASOF JOIN calibration c ON device_id TOLERANCE 1m;
```
- `ASOF JOIN` aligns rows by time proximity; add `TOLERANCE` to avoid stale matches. (QuestDB)
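Pandas has a direct client-side counterpart, `merge_asof`, which makes the semantics easy to see on toy data (the `cal_offset` values here are made up):

```python
import pandas as pd

readings = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 00:00:30", "2024-01-01 00:05:00"]),
    "device_id": ["d-0", "d-0"],
    "temp_c": [20.0, 21.0],
})
calibration = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 00:00:00"]),
    "device_id": ["d-0"],
    "cal_offset": [0.5],
})

# Both frames must be sorted by the time key; `by` matches per device,
# `tolerance` mirrors the SQL TOLERANCE 1m clause.
joined = pd.merge_asof(
    readings.sort_values("ts"),
    calibration.sort_values("ts"),
    on="ts",
    by="device_id",
    tolerance=pd.Timedelta("1m"),
)
print(joined["cal_offset"].tolist())  # second reading is >1m stale → NaN
```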
Ingest vs. Query: pick the right tool
| Path | Best for | Notes |
|---|---|---|
| ILP/HTTP (`questdb[dataframe]`) | Bulk/stream ingestion from Pandas | Officially recommended ingest path; fastest. (py-questdb-client.readthedocs.io) |
| PGWire (SQLAlchemy/psycopg/asyncpg) | Querying into Pandas/Data apps | Ideal for reads; use SQL features like SAMPLE BY. (QuestDB) |
| Web Console / Demo | Ad-hoc SQL + learning | Try features quickly without code. (QuestDB) |
Best practices (and gotchas you’ll actually hit)
- Set a designated timestamp when ingesting (`at="ts"`). Many time-series features assume it. (py-questdb-client.readthedocs.io)
- Use SYMBOL for identifiers (`device_id`, `symbol`, etc.). It compresses and speeds up equality filters/joins. Index symbols if you filter on them a lot. (QuestDB)
- Downsample early (`SAMPLE BY`) and pull less into Pandas to keep memory flat. (QuestDB)
- Prefer ILP for writes and PGWire for reads. Mixing them the other way usually costs throughput. (py-questdb-client.readthedocs.io)
- UTC everywhere. QuestDB stores timestamps in UTC; normalize at ingest to avoid painful off-by-time bugs. (QuestDB)
- Learn the extensions. `SAMPLE BY`, `LATEST ON`, and `ASOF JOIN` are high-leverage and idiomatic in QuestDB. (QuestDB)
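For very large frames, one practical pattern is to flush in row chunks rather than one giant call. A sketch of the chunking helper (`iter_chunks` is a hypothetical name; the Sender usage assumes the same local QuestDB as above):

```python
import pandas as pd

def iter_chunks(df: pd.DataFrame, rows_per_chunk: int = 100_000):
    """Yield row-slices of df so each ILP flush stays a manageable size."""
    for start in range(0, len(df), rows_per_chunk):
        yield df.iloc[start:start + rows_per_chunk]

# Usage sketch (assumes a running QuestDB at localhost:9000):
# from questdb.ingress import Sender
# with Sender.from_conf("http::addr=localhost:9000;") as sender:
#     for chunk in iter_chunks(df):
#         sender.dataframe(chunk, table_name="sensor_readings", at="ts")
```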
Full working notebook to try
QuestDB publishes a ready-to-run Jupyter notebook (“Play with QuestDB”) featuring Pandas + SQL usage and a sizeable dataset. It’s the fastest way to get a feel for the flow. (QuestDB Play)
Internal link ideas (official)
- Python ingestion client (install, `Sender`, DataFrame ingest). (QuestDB)
- Pandas integration guide (third-party tools). (QuestDB)
- PGWire + Python (psycopg/asyncpg + Pandas via SQLAlchemy). (QuestDB)
- SQL overview (time-series clauses at a glance). (QuestDB)
- `SAMPLE BY`, `LATEST ON`, `ASOF JOIN` references. (QuestDB)
- Symbol & Index concepts (when and why). (QuestDB)
Summary
If you’re already in Pandas, QuestDB lets you scale the same workflow:
- Write DataFrames directly with ILP (`questdb[dataframe]`).
- Run time-series SQL (`SAMPLE BY`, `LATEST ON`, `ASOF JOIN`) server-side.
- Pull compact results back into Pandas over PGWire.
It’s the quickest way to turn a notebook into a real-time analytics stack—without re-training your team on a new API. (py-questdb-client.readthedocs.io)
Call to action: Spin up QuestDB, install the Python client, and point your next notebook at it. Ship the graph, not the out-of-memory error. (QuestDB)
Image prompt
“A clean, modern data architecture diagram showing Pandas/Jupyter sending time-series data via ILP to a QuestDB node, and querying back to Pandas over PGWire; highlight SAMPLE BY/LATEST ON/ASOF JOIN; minimalistic, high-contrast, 3D isometric style.”
Tags
#NoSQL #QuestDB #Pandas #Python #TimeSeries #DataEngineering #SQL #Analytics #PGWire #ILP