Pandas → QuestDB

Pandas → QuestDB from Jupyter: the fast path to time-series SQL (ILP + PGWire)

Meta description:
Move time-series data from Pandas to QuestDB in minutes. Learn ILP ingestion, query back to Pandas over PGWire, and use SAMPLE BY/LATEST ON for real analytics.


Why this matters (quick story)

You’ve got a tidy Pandas DataFrame… until it isn’t. Metrics explode, memory caps out, and your “quick” analysis takes minutes. Offload the heavy lifting to QuestDB: ingest straight from Pandas, query with time-series SQL, and pull just the slices you need back into your notebook. It’s the same Python flow—just actually fast.

  • Ingest: Pandas → QuestDB via the official Python client (ILP). (py-questdb-client.readthedocs.io)
  • Query: Back to Pandas via Postgres wire (PGWire) with SQLAlchemy. (QuestDB)
  • Think time-series: SAMPLE BY, LATEST ON, ASOF JOIN. (QuestDB)

Architecture at a glance

Jupyter / Pandas  ──(ILP: questdb[dataframe])──▶  QuestDB
       ▲                                         │
       └─────────────(PGWire/SQLAlchemy)─────────┘
  • ILP (HTTP) is the recommended high-throughput ingest path in the official Python client. (py-questdb-client.readthedocs.io)
  • PGWire is ideal for querying into Pandas via pandas.read_sql. (QuestDB)

Setup (3 steps)

  1. Run QuestDB (local, Docker, or Cloud). See the docs home to pick your path. (QuestDB)
  2. Install the client (with DataFrame extras): python -m pip install -U "questdb[dataframe]" (The extras add Pandas + PyArrow for zero-copy frame ingest.) (py-questdb-client.readthedocs.io)
  3. Optionally open the live demo to experiment with SQL. (QuestDB)
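If you pick the Docker route, a minimal local setup looks like this (ports are QuestDB's defaults: 9000 for HTTP/ILP and the web console, 8812 for PGWire):

```shell
# Run QuestDB locally, exposing the HTTP/ILP port (9000)
# and the PGWire port (8812); data persists in a named volume.
docker run -d --name questdb \
  -p 9000:9000 -p 8812:8812 \
  -v questdb_data:/var/lib/questdb \
  questdb/questdb
```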

Ingest a Pandas DataFrame (high-throughput ILP)

Goal: write a sensor_readings DataFrame into QuestDB with a proper timestamp and symbols.

import pandas as pd
from datetime import timedelta
from questdb.ingress import Sender

# 1) Build a small demo frame
now = pd.Timestamp.now(tz="UTC").floor("s")
df = pd.DataFrame({
    "device_id": pd.Categorical([f"d-{i%3}" for i in range(10_000)]),   # categorical → SYMBOL
    "temp_c":    pd.Series(range(10_000)).astype("float64") / 100.0,
    "ts":        [now + timedelta(seconds=i) for i in range(10_000)]
})

# 2) Send to QuestDB via ILP/HTTP
conf = "http::addr=localhost:9000;"
with Sender.from_conf(conf) as sender:
    # 'at' is the designated timestamp column; Categorical becomes SYMBOL
    sender.dataframe(df, table_name="sensor_readings", at="ts")

Timestamp tip: the client accepts pandas.Timestamp and datetime. Keep timestamps UTC in your pipeline. (QuestDB)
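A quick way to enforce that tip: localize naive timestamps to their source timezone, then convert to UTC before ingesting. A minimal sketch (the "Europe/Berlin" source timezone here is an assumption; substitute your own):

```python
import pandas as pd

# Naive timestamps (no timezone info) are easy to mis-ingest.
naive = pd.Series(pd.date_range("2024-01-01 09:00", periods=3, freq="s"))

# Localize to the source timezone, then convert to UTC.
utc = naive.dt.tz_localize("Europe/Berlin").dt.tz_convert("UTC")

print(utc.dt.tz)  # UTC
```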


Query back into Pandas (PGWire + SQLAlchemy)

You’ll reuse your familiar Pandas workflow:

import pandas as pd
from sqlalchemy import create_engine

# The questdb:// dialect comes from the questdb-connect package
# (python -m pip install questdb-connect); dev defaults: admin/quest @ localhost:8812/qdb
engine = create_engine("questdb://admin:quest@localhost:8812/qdb")

# Downsample last 24h by 5 minutes
q = """
SELECT
  ts,
  device_id,
  avg(temp_c) AS avg_temp
FROM sensor_readings
WHERE ts >= dateadd('d', -1, now())
SAMPLE BY 5m
"""
with engine.connect() as conn:
    agg = pd.read_sql(q, conn)

agg.head()
  • The PGWire Python guide shows Pandas + SQLAlchemy integration and connection strings. (QuestDB)
  • SAMPLE BY is a time-series primitive for downsampling. (QuestDB)
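For intuition, SAMPLE BY 5m is roughly the server-side analogue of a pandas resample. This client-side sketch on synthetic data shows the shape of the result the query above hands back, without needing a running server:

```python
import pandas as pd
import numpy as np

# Synthetic readings: one reading per minute for an hour.
idx = pd.date_range("2024-01-01", periods=60, freq="min")
df = pd.DataFrame({"temp_c": np.arange(60, dtype="float64")}, index=idx)

# Client-side equivalent of: SELECT ts, avg(temp_c) ... SAMPLE BY 5m
agg = df.resample("5min").mean()

print(len(agg))  # 12 buckets of 5 minutes each
```

The point of SAMPLE BY is that this aggregation happens server-side, so only the 12-row result crosses the wire into your notebook.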

Time-series SQL you’ll use daily

Latest value per device (today):

SELECT *
FROM sensor_readings
WHERE ts IN today()
LATEST ON ts PARTITION BY device_id;
  • LATEST ON returns one row per key with the most recent timestamp. (QuestDB)
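If you want to sanity-check what LATEST ON should return, the client-side pandas equivalent is a sort plus a last-row-per-group. A sketch on toy data:

```python
import pandas as pd

# Toy readings for three devices (synthetic data).
df = pd.DataFrame({
    "device_id": ["d-0", "d-1", "d-0", "d-1", "d-2"],
    "temp_c":    [20.0, 21.0, 22.5, 19.5, 18.0],
    "ts": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-01 10:01",
        "2024-01-01 10:05", "2024-01-01 10:03",
        "2024-01-01 10:02",
    ]),
})

# Client-side equivalent of LATEST ON ts PARTITION BY device_id:
# keep the most recent row per device.
latest = df.sort_values("ts").groupby("device_id", as_index=False).last()

print(latest)
```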

Match readings with nearest calibration row:

SELECT r.ts, r.device_id, r.temp_c, c.cal_offset
FROM sensor_readings r
ASOF JOIN calibration c ON (device_id) TOLERANCE 1m;
  • ASOF JOIN aligns rows by time proximity; add TOLERANCE to avoid stale matches. (QuestDB)
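pandas has a client-side counterpart, merge_asof, which is handy for checking what the server-side join should produce. A sketch on synthetic data (column names mirror the SQL above):

```python
import pandas as pd

readings = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 10:00:30", "2024-01-01 10:02:00"]),
    "device_id": ["d-0", "d-0"],
    "temp_c": [20.0, 21.0],
})
calibration = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 10:00:00"]),
    "device_id": ["d-0"],
    "cal_offset": [0.5],
})

# Client-side equivalent of ASOF JOIN ... TOLERANCE 1m: for each reading,
# take the most recent earlier calibration row per device, but only if it
# is at most 1 minute old; otherwise leave the match empty (NaN).
joined = pd.merge_asof(
    readings.sort_values("ts"),
    calibration.sort_values("ts"),
    on="ts", by="device_id",
    tolerance=pd.Timedelta("1min"),
    direction="backward",
)
print(joined)
```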

Ingest vs. Query: pick the right tool

| Path | Best for | Notes |
| --- | --- | --- |
| ILP/HTTP (questdb[dataframe]) | Bulk/stream ingestion from Pandas | Officially recommended ingest path; fastest. (py-questdb-client.readthedocs.io) |
| PGWire (SQLAlchemy/psycopg/asyncpg) | Querying into Pandas/data apps | Ideal for reads; use SQL features like SAMPLE BY. (QuestDB) |
| Web Console / Demo | Ad-hoc SQL + learning | Try features quickly without code. (QuestDB) |

Best practices (and gotchas you’ll actually hit)

  • Set a designated timestamp when ingesting (at="ts"). Many time-series features assume it. (py-questdb-client.readthedocs.io)
  • Use SYMBOL for identifiers (device_id, symbol, etc.). It compresses and speeds equality filters/joins. Index symbols if you filter on them a lot. (QuestDB)
  • Downsample early (SAMPLE BY) and pull less into Pandas—keep memory flat. (QuestDB)
  • Prefer ILP for writes, PGWire for reads. Mixing them the other way usually costs throughput. (py-questdb-client.readthedocs.io)
  • UTC everywhere. QuestDB stores timestamps in UTC; normalize at ingest to avoid painful off-by-time bugs. (QuestDB)
  • Learn the extensions. SAMPLE BY, LATEST ON, ASOF JOIN are high-leverage and idiomatic in QuestDB. (QuestDB)
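To see why Categorical (which ingests as SYMBOL) pays off for low-cardinality identifiers, compare client-side memory for plain strings versus categories. This is an illustration of the dictionary-encoding idea; server-side SYMBOL storage differs in detail:

```python
import pandas as pd

# 100k identifiers drawn from only 3 distinct values.
ids = [f"d-{i % 3}" for i in range(100_000)]

as_obj = pd.Series(ids)                    # one Python string per row
as_cat = pd.Series(ids, dtype="category")  # dictionary-encoded codes

obj_mem = as_obj.memory_usage(deep=True)
cat_mem = as_cat.memory_usage(deep=True)
print(f"object: {obj_mem:,} bytes, category: {cat_mem:,} bytes")
```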

Full working notebook to try

QuestDB publishes a ready-to-run Jupyter notebook (“Play with QuestDB”) featuring Pandas + SQL usage and a sizeable dataset. It’s the fastest way to get a feel for the flow. (QuestDB Play)


Internal link ideas (official)

  • Python ingestion client (install, Sender, DataFrame ingest). (QuestDB)
  • Pandas integration guide (third-party tools). (QuestDB)
  • PGWire + Python (psycopg/asyncpg + Pandas via SQLAlchemy). (QuestDB)
  • SQL overview (time-series clauses at a glance). (QuestDB)
  • SAMPLE BY, LATEST ON, ASOF JOIN references. (QuestDB)
  • Symbol & Index concepts (when and why). (QuestDB)

Summary

If you’re already in Pandas, QuestDB lets you scale the same workflow:

  • Write DataFrames directly with ILP (questdb[dataframe]).
  • Run time-series SQL (SAMPLE BY, LATEST ON, ASOF JOIN) server-side.
  • Pull compact results back into Pandas over PGWire.

It’s the quickest way to turn a notebook into a real-time analytics stack—without re-training your team on a new API. (py-questdb-client.readthedocs.io)

Call to action: Spin up QuestDB, install the Python client, and point your next notebook at it. Ship the graph, not the out-of-memory error. (QuestDB)


Image prompt

“A clean, modern data architecture diagram showing Pandas/Jupyter sending time-series data via ILP to a QuestDB node, and querying back to Pandas over PGWire; highlight SAMPLE BY/LATEST ON/ASOF JOIN; minimalistic, high-contrast, 3D isometric style.”


Tags

#QuestDB #Pandas #Python #TimeSeries #DataEngineering #SQL #Analytics #PGWire #ILP
