Pandas → QuestDB

Pandas → QuestDB from Jupyter: the fast path to time-series SQL (ILP + PGWire)

Meta description:
Move time-series data from Pandas to QuestDB in minutes. Learn ILP ingestion, query back to Pandas over PGWire, and use SAMPLE BY/LATEST ON for real analytics.


Why this matters (quick story)

You’ve got a tidy Pandas DataFrame… until it isn’t. Metrics explode, memory caps out, and your “quick” analysis takes minutes. Offload the heavy lifting to QuestDB: ingest straight from Pandas, query with time-series SQL, and pull just the slices you need back into your notebook. It’s the same Python flow—just actually fast.

  • Ingest: Pandas → QuestDB via the official Python client (ILP). (py-questdb-client.readthedocs.io)
  • Query: Back to Pandas via Postgres wire (PGWire) with SQLAlchemy. (QuestDB)
  • Think time-series: SAMPLE BY, LATEST ON, ASOF JOIN. (QuestDB)

Architecture at a glance

Jupyter / Pandas  ──(ILP: questdb[dataframe])──▶  QuestDB
       ▲                                         │
       └─────────────(PGWire/SQLAlchemy)─────────┘
  • ILP (HTTP) is the recommended high-throughput ingest path in the official Python client. (py-questdb-client.readthedocs.io)
  • PGWire is ideal for querying into Pandas via pandas.read_sql. (QuestDB)

Setup (3 steps)

  1. Run QuestDB (local, Docker, or Cloud). See the docs home to pick your path. (QuestDB)
  2. Install the client (with DataFrame extras): python -m pip install -U "questdb[dataframe]" (The extras add Pandas + PyArrow for zero-copy frame ingest.) (py-questdb-client.readthedocs.io)
  3. Optionally open the live demo to experiment with SQL. (QuestDB)
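If you pick the Docker route, a minimal local setup looks like this (ports are QuestDB's defaults: 9000 for HTTP/ILP and the web console, 8812 for PGWire):

```shell
# Run QuestDB locally, exposing the HTTP/ILP port (9000)
# and the PGWire port (8812); data persists in a named volume.
docker run -d --name questdb \
  -p 9000:9000 -p 8812:8812 \
  -v questdb_data:/var/lib/questdb \
  questdb/questdb
```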

Ingest a Pandas DataFrame (high-throughput ILP)

Goal: write a sensor_readings DataFrame into QuestDB with a proper timestamp and symbols.

import pandas as pd
from datetime import timedelta
from questdb.ingress import Sender

# 1) Build a small demo frame
now = pd.Timestamp.now(tz="UTC").floor("s")
df = pd.DataFrame({
    "device_id": pd.Categorical([f"d-{i%3}" for i in range(10_000)]),   # categorical → SYMBOL
    "temp_c":    pd.Series(range(10_000)).astype("float64") / 100.0,
    "ts":        [now + timedelta(seconds=i) for i in range(10_000)]
})

# 2) Send to QuestDB via ILP/HTTP
conf = "http::addr=localhost:9000;"
with Sender.from_conf(conf) as sender:
    # 'at' is the designated timestamp column; Categorical becomes SYMBOL
    sender.dataframe(df, table_name="sensor_readings", at="ts")

Timestamp tip: the client accepts pandas.Timestamp and datetime. Keep timestamps UTC in your pipeline. (QuestDB)
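A quick way to enforce that tip: localize naive timestamps to their source timezone, then convert to UTC before ingesting. A minimal sketch (the "Europe/Berlin" source timezone here is an assumption; substitute your own):

```python
import pandas as pd

# Naive timestamps (no timezone info) are easy to mis-ingest.
naive = pd.Series(pd.date_range("2024-01-01 09:00", periods=3, freq="s"))

# Localize to the source timezone, then convert to UTC.
utc = naive.dt.tz_localize("Europe/Berlin").dt.tz_convert("UTC")

print(utc.dt.tz)  # UTC
```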


Query back into Pandas (PGWire + SQLAlchemy)

You’ll reuse your familiar Pandas workflow:

import pandas as pd
from sqlalchemy import create_engine

# The questdb:// dialect comes from the questdb-connect package
# (python -m pip install questdb-connect); dev defaults: admin/quest @ localhost:8812/qdb
engine = create_engine("questdb://admin:quest@localhost:8812/qdb")

# Downsample last 24h by 5 minutes
q = """
SELECT
  ts,
  device_id,
  avg(temp_c) AS avg_temp
FROM sensor_readings
WHERE ts >= dateadd('d', -1, now())
SAMPLE BY 5m
"""
with engine.connect() as conn:
    agg = pd.read_sql(q, conn)

agg.head()
  • The PGWire Python guide shows Pandas + SQLAlchemy integration and connection strings. (QuestDB)
  • SAMPLE BY is a time-series primitive for downsampling. (QuestDB)
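For intuition, SAMPLE BY 5m is roughly the server-side analogue of a pandas resample. This client-side sketch on synthetic data shows the shape of the result the query above hands back, without needing a running server:

```python
import pandas as pd
import numpy as np

# Synthetic readings: one reading per minute for an hour.
idx = pd.date_range("2024-01-01", periods=60, freq="min")
df = pd.DataFrame({"temp_c": np.arange(60, dtype="float64")}, index=idx)

# Client-side equivalent of: SELECT ts, avg(temp_c) ... SAMPLE BY 5m
agg = df.resample("5min").mean()

print(len(agg))  # 12 buckets of 5 minutes each
```

The point of SAMPLE BY is that this aggregation happens server-side, so only the 12-row result crosses the wire into your notebook.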

Time-series SQL you’ll use daily

Latest value per device (today):

SELECT *
FROM sensor_readings
WHERE ts IN today()
LATEST ON ts PARTITION BY device_id;
  • LATEST ON returns one row per key with the most recent timestamp. (QuestDB)
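If you want to sanity-check what LATEST ON should return, the client-side pandas equivalent is a sort plus a last-row-per-group. A sketch on toy data:

```python
import pandas as pd

# Toy readings for three devices (synthetic data).
df = pd.DataFrame({
    "device_id": ["d-0", "d-1", "d-0", "d-1", "d-2"],
    "temp_c":    [20.0, 21.0, 22.5, 19.5, 18.0],
    "ts": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-01 10:01",
        "2024-01-01 10:05", "2024-01-01 10:03",
        "2024-01-01 10:02",
    ]),
})

# Client-side equivalent of LATEST ON ts PARTITION BY device_id:
# keep the most recent row per device.
latest = df.sort_values("ts").groupby("device_id", as_index=False).last()

print(latest)
```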

Match readings with nearest calibration row:

SELECT r.ts, r.device_id, r.temp_c, c.cal_offset
FROM sensor_readings r
ASOF JOIN calibration c ON (device_id) TOLERANCE 1m;
  • ASOF JOIN aligns rows by time proximity; add TOLERANCE to avoid stale matches. (QuestDB)
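pandas has a client-side counterpart, merge_asof, which is handy for checking what the server-side join should produce. A sketch on synthetic data (column names mirror the SQL above):

```python
import pandas as pd

readings = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 10:00:30", "2024-01-01 10:02:00"]),
    "device_id": ["d-0", "d-0"],
    "temp_c": [20.0, 21.0],
})
calibration = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 10:00:00"]),
    "device_id": ["d-0"],
    "cal_offset": [0.5],
})

# Client-side equivalent of ASOF JOIN ... TOLERANCE 1m: for each reading,
# take the most recent earlier calibration row per device, but only if it
# is at most 1 minute old; otherwise leave the match empty (NaN).
joined = pd.merge_asof(
    readings.sort_values("ts"),
    calibration.sort_values("ts"),
    on="ts", by="device_id",
    tolerance=pd.Timedelta("1min"),
    direction="backward",
)
print(joined)
```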

Ingest vs. Query: pick the right tool

| Path | Best for | Notes |
| --- | --- | --- |
| ILP/HTTP (questdb[dataframe]) | Bulk/stream ingestion from Pandas | Officially recommended ingest path; fastest. (py-questdb-client.readthedocs.io) |
| PGWire (SQLAlchemy/psycopg/asyncpg) | Querying into Pandas/data apps | Ideal for reads; use SQL features like SAMPLE BY. (QuestDB) |
| Web Console / Demo | Ad-hoc SQL + learning | Try features quickly without code. (QuestDB) |

Best practices (and gotchas you’ll actually hit)

  • Set a designated timestamp when ingesting (at="ts"). Many time-series features assume it. (py-questdb-client.readthedocs.io)
  • Use SYMBOL for identifiers (device_id, symbol, etc.). It compresses and speeds equality filters/joins. Index symbols if you filter on them a lot. (QuestDB)
  • Downsample early (SAMPLE BY) and pull less into Pandas—keep memory flat. (QuestDB)
  • Prefer ILP for writes, PGWire for reads. Mixing them the other way usually costs throughput. (py-questdb-client.readthedocs.io)
  • UTC everywhere. QuestDB stores timestamps in UTC; normalize at ingest to avoid painful off-by-time bugs. (QuestDB)
  • Learn the extensions. SAMPLE BY, LATEST ON, ASOF JOIN are high-leverage and idiomatic in QuestDB. (QuestDB)
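To see why Categorical (which ingests as SYMBOL) pays off for low-cardinality identifiers, compare client-side memory for plain strings versus categories. This is an illustration of the dictionary-encoding idea; server-side SYMBOL storage differs in detail:

```python
import pandas as pd

# 100k identifiers drawn from only 3 distinct values.
ids = [f"d-{i % 3}" for i in range(100_000)]

as_obj = pd.Series(ids)                    # one Python string per row
as_cat = pd.Series(ids, dtype="category")  # dictionary-encoded codes

obj_mem = as_obj.memory_usage(deep=True)
cat_mem = as_cat.memory_usage(deep=True)
print(f"object: {obj_mem:,} bytes, category: {cat_mem:,} bytes")
```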

Full working notebook to try

QuestDB publishes a ready-to-run Jupyter notebook (“Play with QuestDB”) featuring Pandas + SQL usage and a sizeable dataset. It’s the fastest way to get a feel for the flow. (QuestDB Play)


Internal link ideas (official)

  • Python ingestion client (install, Sender, DataFrame ingest). (QuestDB)
  • Pandas integration guide (third-party tools). (QuestDB)
  • PGWire + Python (psycopg/asyncpg + Pandas via SQLAlchemy). (QuestDB)
  • SQL overview (time-series clauses at a glance). (QuestDB)
  • SAMPLE BY, LATEST ON, ASOF JOIN references. (QuestDB)
  • Symbol & Index concepts (when and why). (QuestDB)

Summary

If you’re already in Pandas, QuestDB lets you scale the same workflow:

  • Write DataFrames directly with ILP (questdb[dataframe]).
  • Run time-series SQL (SAMPLE BY, LATEST ON, ASOF JOIN) server-side.
  • Pull compact results back into Pandas over PGWire.

It’s the quickest way to turn a notebook into a real-time analytics stack—without re-training your team on a new API. (py-questdb-client.readthedocs.io)

Call to action: Spin up QuestDB, install the Python client, and point your next notebook at it. Ship the graph, not the out-of-memory error. (QuestDB)


Image prompt

“A clean, modern data architecture diagram showing Pandas/Jupyter sending time-series data via ILP to a QuestDB node, and querying back to Pandas over PGWire; highlight SAMPLE BY/LATEST ON/ASOF JOIN; minimalistic, high-contrast, 3D isometric style.”


Tags

#QuestDB #Pandas #Python #TimeSeries #DataEngineering #SQL #Analytics #PGWire #ILP
