Snowflake Cortex AI: Hands-On Review After 3 Months in Production

Three months ago, our team started using Snowflake Cortex AI in production. Not as a proof of concept. Not in a sandbox. Real workloads, real data, real money on the line. We were already deep in the Snowflake ecosystem for our data warehouse, so the pitch was compelling: run LLM inference, vector search, and ML forecasting directly where our data lives, without moving anything to external services. After twelve weeks, I have a much clearer picture of where Snowflake Cortex AI genuinely delivers and where it quietly wastes your credits. This is the honest review I wish I had read before we started.

What Snowflake Cortex AI Actually Offers

Before diving into my experience, let me lay out what Cortex AI is and what it is not. Snowflake has been aggressively bundling AI features under the Cortex umbrella, and the marketing makes it sound like one unified product. In practice, it is four distinct feature groups with different maturity levels, different pricing models, and different levels of usefulness.

  • Cortex LLM Functions - SQL-callable functions like COMPLETE, SUMMARIZE, TRANSLATE, EXTRACT_ANSWER, CLASSIFY_TEXT, and SENTIMENT. These run foundation models (Mistral, Llama, Arctic) directly inside Snowflake.
  • Cortex Search - A managed RAG (Retrieval-Augmented Generation) service. You create a search service on your data, and Snowflake handles embedding, indexing, and retrieval. No external vector database needed.
  • Cortex Fine-Tuning - Fine-tune supported LLMs on your own data without leaving Snowflake. Currently supports Mistral 7B and Llama models.
  • ML Functions - Built-in machine learning for time series forecasting, anomaly detection, and contribution analysis (Contribution Explorer). These are not LLM-based; they are classical ML models that Snowflake trains and runs for you.

Each of these has a different billing mechanism, different limitations, and different levels of production readiness. Let me walk through each one based on what we actually shipped.

Cortex LLM Functions: SQL Meets Large Language Models

This is the headline feature, and I will admit it is genuinely satisfying the first time you run an LLM inference call as a SQL query. No API keys, no HTTP calls, no Python scripts stitching things together. Just SQL.

COMPLETE: The General-Purpose LLM Function

COMPLETE is the most flexible Cortex LLM function. You give it a model name, a prompt, and it returns generated text. We use it to generate product descriptions from structured catalog data for our e-commerce client.

-- Generate product descriptions from structured data
SELECT
    product_id,
    product_name,
    SNOWFLAKE.CORTEX.COMPLETE(
        'mistral-large2',
        CONCAT(
            'Write a concise 2-sentence product description for an e-commerce listing. ',
            'Product: ', product_name, '. ',
            'Category: ', category, '. ',
            'Key features: ', features, '. ',
            'Price range: ', price_tier, '.'
        )
    ) AS generated_description
FROM catalog.products
WHERE description IS NULL
LIMIT 100;

That works. It actually works well. The latency per row is roughly 1.5 to 4 seconds depending on the model and prompt length, which matters enormously when you are processing thousands of rows. We quickly learned to batch these into smaller runs rather than trying to process an entire table at once.
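One detail worth checking before a large run: Snowflake's CONCAT returns NULL if any argument is NULL, so a product with a missing features or price_tier value gets no prompt at all, and therefore no description. Wrapping each column in COALESCE fixes it in SQL. The minimal Python sketch below mirrors the prompt above for pre-flight checks on an exported sample; build_description_prompt and prompts_needing_attention are hypothetical helpers, not part of any Snowflake library.

```python
from typing import List, Optional

PROMPT_PREFIX = "Write a concise 2-sentence product description for an e-commerce listing. "


def build_description_prompt(
    product_name: Optional[str],
    category: Optional[str],
    features: Optional[str],
    price_tier: Optional[str],
) -> Optional[str]:
    """Build the same prompt as the SQL CONCAT, or None where CONCAT would yield NULL."""
    if any(part is None for part in (product_name, category, features, price_tier)):
        # Snowflake's CONCAT returns NULL if any argument is NULL
        return None
    return (
        PROMPT_PREFIX
        + f"Product: {product_name}. "
        + f"Category: {category}. "
        + f"Key features: {features}. "
        + f"Price range: {price_tier}."
    )


def prompts_needing_attention(rows: List[dict]) -> List:
    """Return the product_ids whose prompt would come out NULL."""
    return [
        row["product_id"]
        for row in rows
        if build_description_prompt(
            row.get("product_name"),
            row.get("category"),
            row.get("features"),
            row.get("price_tier"),
        ) is None
    ]
```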

You can also use COMPLETE with structured options for more control:

-- COMPLETE with options: temperature, max_tokens, system prompt
-- With an options argument, COMPLETE returns a JSON string shaped like
-- {"choices": [{"messages": "..."}], "usage": {...}}, so parse it before extracting text
SELECT
    ticket_id,
    PARSE_JSON(
        SNOWFLAKE.CORTEX.COMPLETE(
            'mistral-large2',
            [
                {'role': 'system', 'content': 'You are a support ticket classifier. Respond with only the category name.'},
                {'role': 'user', 'content': description}
            ],
            {'temperature': 0.1, 'max_tokens': 50}
        )
    ):choices[0]:messages::STRING AS category
FROM support.tickets
WHERE auto_category IS NULL;
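Because the category comes back as free text, it is worth validating before writing it anywhere. A minimal Python sketch for post-processing exported responses, assuming the {"choices": [{"messages": ...}]} shape that COMPLETE returns when options are passed, and borrowing the category list from the CLASSIFY_TEXT example later in this post as a hypothetical allow-list:

```python
import json
from typing import Optional

# Hypothetical allow-list, borrowed from the CLASSIFY_TEXT example in this post
ALLOWED_CATEGORIES = frozenset({
    "billing_issue", "technical_bug", "feature_request",
    "account_access", "general_inquiry",
})


def extract_message(raw_response: str) -> str:
    """Return the generated text from a COMPLETE response made with options.

    Assumes the shape {"choices": [{"messages": "..."}], "usage": {...}}.
    """
    payload = json.loads(raw_response)
    return payload["choices"][0]["messages"]


def normalize_category(text: str, allowed=ALLOWED_CATEGORIES) -> Optional[str]:
    """Map output like ' Billing Issue. ' to 'billing_issue'; None if it drifts."""
    cleaned = text.strip().strip(".").strip().lower().replace(" ", "_")
    return cleaned if cleaned in allowed else None
```

Rows that normalize to None are the ones to send for human review rather than silently store.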

SUMMARIZE: Surprisingly Useful

SUMMARIZE takes a text column and produces a summary. We use it on customer feedback and support transcripts. The function is straightforward and the results are solid for most use cases.

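-- Summarize long customer feedback entries
-- Pick the 50 rows first, then call SUMMARIZE once per row and reuse the result
WITH recent_long AS (
    SELECT feedback_id, customer_id, created_at, raw_feedback
    FROM customer.feedback
    WHERE LENGTH(raw_feedback) > 1000
    ORDER BY created_at DESC
    LIMIT 50
),
summarized AS (
    SELECT
        feedback_id, customer_id, created_at,
        LENGTH(raw_feedback) AS original_length,
        SNOWFLAKE.CORTEX.SUMMARIZE(raw_feedback) AS summary
    FROM recent_long
)
SELECT feedback_id, customer_id, original_length, summary, LENGTH(summary) AS summary_length
FROM summarized
ORDER BY created_at DESC;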

One gotcha: SUMMARIZE does not let you control summary length or style. You get what you get. For cases where we needed bullet-point summaries or specific formats, we switched to COMPLETE with a custom prompt. SUMMARIZE is good for the 80% case where you just need a quick digest.

TRANSLATE: Works, With Caveats

Our dataset includes customer reviews in six languages. TRANSLATE handles the common European languages well but struggles with less common language pairs. Here is a straightforward example:

-- Translate customer reviews to English for unified analysis
SELECT
    review_id,
    original_language,
    review_text,
    SNOWFLAKE.CORTEX.TRANSLATE(review_text, original_language, 'en') AS english_text
FROM reviews.international
WHERE original_language != 'en'
  AND translated_at IS NULL;

Quality for French, German, and Spanish to English is comparable to DeepL. Japanese and Korean translations were noticeably worse, with awkward phrasing and occasional meaning drift. For those languages, we still route to an external translation API.
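The split described above can be written down as a small routing rule. A minimal sketch, assuming ISO 639-1 codes in original_language: French, German, and Spanish go through Cortex TRANSLATE, English is skipped, and anything else, including Japanese and Korean, goes to the external API. Defaulting unlisted languages to the external path is a conservative choice made for this sketch, not something Cortex requires; the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Languages this review found acceptable through Cortex TRANSLATE
CORTEX_TRANSLATE_OK = frozenset({"fr", "de", "es"})


def translation_route(source_language: str, target_language: str = "en") -> str:
    """Return 'skip', 'cortex', or 'external' for one review."""
    src = source_language.strip().lower()
    if src == target_language:
        return "skip"  # already in the target language
    if src in CORTEX_TRANSLATE_OK:
        return "cortex"
    return "external"  # e.g. Japanese and Korean


def split_by_route(reviews: Iterable[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Group (review_id, language) pairs by destination."""
    routes: Dict[str, List[int]] = {"skip": [], "cortex": [], "external": []}
    for review_id, lang in reviews:
        routes[translation_route(lang)].append(review_id)
    return routes
```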

CLASSIFY_TEXT and SENTIMENT: Quick Wins

CLASSIFY_TEXT assigns a text to one of your provided categories. SENTIMENT returns a score from -1 to 1. Both are fast and cheap.

-- Classify support tickets into predefined categories
-- CLASSIFY_TEXT returns an object like {"label": "billing_issue"} with no confidence
-- score, so call it once per row and extract the label
SELECT
    ticket_id,
    subject,
    SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
        description,
        ['billing_issue', 'technical_bug', 'feature_request', 'account_access', 'general_inquiry']
    ):label::STRING AS predicted_category
FROM support.tickets
WHERE created_at >= DATEADD('day', -7, CURRENT_TIMESTAMP());

-- Sentiment analysis on product reviews
-- Score each review once, then aggregate (avoids three SENTIMENT calls per row)
WITH scored AS (
    SELECT
        product_id,
        SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment
    FROM reviews.product_reviews
    WHERE review_date >= '2025-12-01'
)
SELECT
    product_id,
    AVG(sentiment) AS avg_sentiment,
    COUNT(*) AS review_count,
    COUNT_IF(sentiment > 0.3) AS positive_count,
    COUNT_IF(sentiment < -0.3) AS negative_count
FROM scored
GROUP BY product_id
ORDER BY avg_sentiment ASC;

SENTIMENT is the function I recommend most for teams just starting with Cortex. It is fast, the results are consistent, and it removes the need for external sentiment analysis libraries entirely. CLASSIFY_TEXT is more variable: it works well with clearly distinct categories but gets confused when categories overlap semantically.
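For spot-checking the aggregation outside Snowflake (say, on an exported sample of scores), the same thresholds translate directly. A minimal sketch; the ±0.3 cutoffs come from the query above and are a judgment call, not a Snowflake default.

```python
from statistics import mean
from typing import Dict, Sequence

POSITIVE_CUTOFF = 0.3   # same thresholds as the COUNT_IF expressions above
NEGATIVE_CUTOFF = -0.3


def bucket(score: float) -> str:
    """Label a SENTIMENT score (range -1 to 1)."""
    if score > POSITIVE_CUTOFF:
        return "positive"
    if score < NEGATIVE_CUTOFF:
        return "negative"
    return "neutral"


def summarize_product(scores: Sequence[float]) -> Dict[str, float]:
    """Per-product aggregate matching the SQL: average, count, positive/negative counts."""
    if not scores:
        raise ValueError("no scores to summarize")
    labels = [bucket(s) for s in scores]
    return {
        "avg_sentiment": mean(scores),
        "review_count": len(scores),
        "positive_count": labels.count("positive"),
        "negative_count": labels.count("negative"),
    }
```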

Cortex Search: RAG Without Leaving Snowflake

Cortex Search is Snowflake's answer to the "we need a vector database" problem. You create a search service over the result of a query, and it handles embedding, indexing, and retrieval; long documents still need to be split into chunks before you index them. For teams that are already in Snowflake and want to build a RAG pipeline without managing Pinecone or Weaviate, this is attractive.

-- Create a Cortex Search service on your knowledge base
-- ON names the text column to search; ATTRIBUTES lists the columns usable in filters
CREATE OR REPLACE CORTEX SEARCH SERVICE knowledge_base.public.knowledge_base_search
  ON content
  ATTRIBUTES category
  WAREHOUSE = cortex_wh_s
  TARGET_LAG = '1 hour'
  AS (
    SELECT
        article_id,
        title,
        content,
        category,
        last_updated
    FROM knowledge_base.public.articles
    WHERE status = 'published'
  );
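Long articles tend to retrieve better when split into smaller pieces before indexing; Snowflake's Cortex Search guidance suggests chunks of roughly 512 tokens or fewer. A minimal word-window chunker, assuming word count as a rough stand-in for tokens (350 words with a 50-word overlap is a conservative guess for English text, not a Snowflake default). The chunk_index field is a hypothetical column you would add to the table the service indexes.

```python
from typing import Dict, List


def chunk_words(text: str, max_words: int = 350, overlap: int = 50) -> List[str]:
    """Split text into overlapping windows of at most max_words words.

    Word count only approximates tokens; the real count depends on the
    embedding model's tokenizer.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


def to_chunk_rows(article_id: str, title: str, content: str) -> List[Dict]:
    """One row per chunk; article_id and title repeat so results stay readable."""
    return [
        {"article_id": article_id, "chunk_index": i, "title": title, "content": chunk}
        for i, chunk in enumerate(chunk_words(content))
    ]
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighbouring chunk, at the cost of indexing a little duplicate text.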

Once the service is created, you query it from Python using the Snowflake Python API (the snowflake.core package) on top of a Snowpark session:

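from snowflake.core import Root
from snowflake.snowpark import Session

# Open a Snowpark session; replace these with your own account, user, role, etc.
connection_parameters = {
    "account": "<your_account>",
    "user": "<your_user>",
    "authenticator": "externalbrowser",
}
session = Session.builder.configs(connection_parameters).create()

# Look up the search service (database KNOWLEDGE_BASE, schema PUBLIC)
root = Root(session)
search_service = (
    root.databases["KNOWLEDGE_BASE"]
    .schemas["PUBLIC"]
    .cortex_search_services["KNOWLEDGE_BASE_SEARCH"]
)

# Run a search query; filter columns must be declared as ATTRIBUTES on the service
results = search_service.search(
    query="How do I reset my API credentials?",
    columns=["article_id", "title", "content"],
    filter={"@eq": {"category": "developer_docs"}},
    limit=5,
)

for result in results.results:
    print(result["title"])
    print(f"  {result['content'][:200]}...")
    print()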
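The filter argument in the search call is a plain dictionary, and compound filters nest the same way. Cortex Search filters support operators such as @eq, @and, and @or (check the Cortex Search documentation for the full list, which also covers range and array operators), and any column you filter on must be declared in the service's ATTRIBUTES. A small set of hypothetical builder functions keeps nested filters readable; the product column in the usage line is illustrative only.

```python
from typing import Any, Dict

Filter = Dict[str, Any]


def eq(column: str, value: Any) -> Filter:
    """Equality filter on a single ATTRIBUTES column."""
    return {"@eq": {column: value}}


def all_of(*filters: Filter) -> Filter:
    """AND together one or more filters; a single filter is returned as-is."""
    if not filters:
        raise ValueError("all_of needs at least one filter")
    return filters[0] if len(filters) == 1 else {"@and": list(filters)}


def any_of(*filters: Filter) -> Filter:
    """OR together one or more filters; a single filter is returned as-is."""
    if not filters:
        raise ValueError("any_of needs at least one filter")
    return filters[0] if len(filters) == 1 else {"@or": list(filters)}


# Usage: developer docs about either the API or the CLI ("product" is hypothetical)
docs_filter = all_of(
    eq("category", "developer_docs"),
    any_of(eq("product", "api"), eq("product", "cli")),
)
```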

We built an internal knowledge base search for our documentation team using Cortex Search. Setup was genuinely faster than our previous approach with pgvector. The TARGET_LAG parameter controls how fresh the index stays, which is a nice touch. Set it to one hour and Snowflake automatically re-indexes when data changes.

The limitations are real, though. You cannot bring your own embeddings, your choice of embedding model is limited to the few that Snowflake hosts, and you cannot access the raw vectors for your own similarity calculations. If you need any of that, you need an external vector store. For straightforward search-and-retrieve workflows, Cortex Search is good enough. For anything more sophisticated, it is not.

Cortex Fine-Tuning: Promising but Early

We fine-tuned a Mistral 7B model on our support ticket classification task. The process is SQL-driven, which is consistent with the Cortex philosophy of keeping everything in SQL.

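-- Prepare training data: FINETUNE expects columns named prompt and completion
CREATE OR REPLACE TABLE ml.training_data AS
SELECT
    description AS prompt,
    verified_category AS completion
FROM support.tickets_labeled
WHERE labeled_by = 'human'
  AND label_confidence >= 0.95
ORDER BY RANDOM()
LIMIT 10000;  -- random sample of 10,000 qualifying rows

-- Launch fine-tuning job. Arguments: operation, name for the new model
-- (illustrative), base model, training query, validation query, options.
-- max_epochs is the tuning option FINETUNE accepts; learning rate is not exposed.
SELECT SNOWFLAKE.CORTEX.FINETUNE(
    'CREATE',
    'support_ticket_classifier',
    'mistral-7b',
    'SELECT prompt, completion FROM ml.training_data',
    'SELECT prompt, completion FROM validation_db.ml.validation_data',
    {'max_epochs': 3}
) AS job_id;

-- Check job status
SELECT SNOWFLAKE.CORTEX.FINETUNE('SHOW') AS jobs;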

The fine-tuned model improved our classification accuracy from 71% with the base model to 89%. That is a meaningful lift. However, training took about 6 hours on our 10,000-example dataset and cost approximately $180 in credits. That is not unreasonable for a one-time training run, but iterating on hyperparameters gets expensive fast.
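The fine-tuning job also needs a validation set that never overlaps the training rows, and how that set is built is not covered above. One common approach, sketched here as an assumption rather than our exact process, is to assign each labeled ticket to train or validation by hashing its id, so reruns always put the same ticket on the same side. The function names and the 10% validation share are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def assign_split(record_id: str, validation_pct: int = 10) -> str:
    """Deterministically place a record in 'train' or 'validation'.

    Hashing the id (rather than sampling randomly) keeps assignments stable
    across reruns, so validation rows never drift into the training set.
    """
    if not 0 < validation_pct < 100:
        raise ValueError("validation_pct must be between 1 and 99")
    digest = hashlib.sha256(record_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return "validation" if bucket < validation_pct else "train"


def split_examples(
    examples: Iterable[Tuple[str, str, str]], validation_pct: int = 10
) -> Dict[str, List[Dict[str, str]]]:
    """Split (ticket_id, prompt, completion) tuples into prompt/completion rows."""
    out: Dict[str, List[Dict[str, str]]] = {"train": [], "validation": []}
    for ticket_id, prompt, completion in examples:
        out[assign_split(ticket_id, validation_pct)].append(
            {"prompt": prompt, "completion": completion}
        )
    return out
```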

The biggest limitation is model selection. You are restricted to the models Snowflake supports for fine-tuning. You cannot fine-tune GPT-4, Claude, or Gemini through Cortex. If you want to fine-tune those, you still need to go through their respective platforms.

ML Functions: The Underrated Gem

While everyone focuses on the LLM features, I think Snowflake's ML Functions are actually the most production-ready part of Cortex AI. These are not LLM-based. They use classical ML algorithms for three specific tasks: time series forecasting, anomaly detection, and contribution analysis. And they are excellent.

Forecasting

We replaced a custom Prophet-based forecasting pipeline with Snowflake's built-in FORECAST function. The setup time went from two weeks of engineering to about thirty minutes.

-- Create a forecasting model for daily revenue
-- (prediction_interval is a forecast-time option, so it is passed to FORECAST below)
CREATE OR REPLACE SNOWFLAKE.ML.FORECAST revenue_forecast(
    INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'analytics.daily_revenue'),
    SERIES_COLNAME => 'business_unit',
    TIMESTAMP_COLNAME => 'date',
    TARGET_COLNAME => 'revenue'
);

-- Generate 30-day forecast
CALL revenue_forecast!FORECAST(
    FORECASTING_PERIODS => 30,
    CONFIG_OBJECT => {'prediction_interval': 0.95}
);

-- Query the results
SELECT
    series,
    ts AS forecast_date,
    forecast,
    lower_bound,
    upper_bound
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
ORDER BY series, ts;

The accuracy was within 3% of our Prophet model for 7-day forecasts and actually slightly better for 30-day forecasts. The model handles multiple time series (we have 12 business units) automatically via the SERIES_COLNAME parameter. No manual loop needed.
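"Within 3%" depends on the error metric. Mean absolute percentage error (MAPE) is a common choice for this kind of head-to-head; we use it here as an assumption, not as a record of exactly how the two models were scored. A minimal sketch that ranks candidate forecasts against held-out actuals (the model names in the test are illustrative):

```python
from typing import Dict, List, Sequence, Tuple


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; zero actuals are skipped."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actuals to score")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


def rank_models(
    actual: Sequence[float], forecasts: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (model_name, MAPE) pairs, best model first."""
    scores = {name: mape(actual, preds) for name, preds in forecasts.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```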

Anomaly Detection

Anomaly detection follows the same pattern. Create a model, train it on historical data, and then use it to flag outliers.

-- Create anomaly detection model on pipeline metrics
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION pipeline_anomaly_detector(
    INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'monitoring.pipeline_metrics'),
    SERIES_COLNAME => 'pipeline_name',
    TIMESTAMP_COLNAME => 'metric_timestamp',
    TARGET_COLNAME => 'rows_processed',
    LABEL_COLNAME => ''  -- unsupervised
);

-- Detect anomalies in the last 24 hours
CALL pipeline_anomaly_detector!DETECT_ANOMALIES(
    INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'monitoring.pipeline_metrics_latest'),
    SERIES_COLNAME => 'pipeline_name',
    TIMESTAMP_COLNAME => 'metric_timestamp',
    TARGET_COLNAME => 'rows_processed',
    CONFIG_OBJECT => {'prediction_interval': 0.99}
);

-- Alert on detected anomalies
SELECT
    series AS pipeline_name,
    ts AS detected_at,
    y AS actual_value,
    forecast AS expected_value,
    CASE WHEN is_anomaly THEN 'ANOMALY' ELSE 'NORMAL' END AS status
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE is_anomaly = TRUE
ORDER BY ts DESC;

We feed the anomaly results into a Snowflake alert that sends Slack notifications. The entire pipeline runs inside Snowflake with zero external infrastructure. That is the real value proposition here: not that the ML is better than what you could build yourself, but that the operational overhead vanishes.
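The alert definition itself is not shown here, but the message body is easy to sketch. Slack incoming webhooks accept a JSON payload with a text field; the hypothetical helper below turns rows shaped like the anomaly query output above into that payload. How the payload reaches Slack (a Snowflake notification integration, an orchestrator, or something else) is left out.

```python
import json
from typing import Dict, List


def anomaly_message(rows: List[Dict]) -> str:
    """Build a Slack incoming-webhook JSON body from anomaly rows.

    Each row carries pipeline_name, detected_at, actual_value and
    expected_value, matching the columns selected in the query above.
    """
    if not rows:
        return json.dumps({"text": "No pipeline anomalies detected."})
    noun = "anomaly" if len(rows) == 1 else "anomalies"
    lines = [f"{len(rows)} pipeline {noun} detected:"]
    for row in rows:
        actual, expected = row["actual_value"], row["expected_value"]
        if expected:
            change = f"{(actual - expected) / expected * 100:+.0f}%"
        else:
            change = "n/a"  # avoid dividing by a zero forecast
        lines.append(
            f"- {row['pipeline_name']} at {row['detected_at']}: "
            f"{actual:,.0f} rows vs ~{expected:,.0f} expected ({change})"
        )
    return json.dumps({"text": "\n".join(lines)})
```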

Contribution Explorer

Contribution Explorer (which Snowflake has since renamed Top Insights) is the least talked about ML function, but it has saved us hours of manual investigation. When a metric changes unexpectedly, it tells you which dimensions contributed most to the change.

-- Why did revenue drop last week?
-- Top Insights expects a BOOLEAN label: FALSE = baseline rows, TRUE = comparison rows
CREATE OR REPLACE VIEW analytics.revenue_change_input AS
SELECT * EXCLUDE (period), period = 'comparison' AS is_comparison
FROM analytics.daily_revenue_detailed;

CREATE OR REPLACE SNOWFLAKE.ML.TOP_INSIGHTS revenue_change_analysis();

CALL revenue_change_analysis!GET_DRIVERS(
    INPUT_DATA => TABLE(analytics.revenue_change_input),
    LABEL_COLNAME => 'is_comparison',
    METRIC_COLNAME => 'revenue'
);

-- View the top contributors to the change
SELECT
    contributor,
    contribution,
    metric_control AS baseline_value,
    metric_test AS comparison_value,
    relative_change
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
ORDER BY ABS(contribution) DESC
LIMIT 10;

Last month, we had an unexplained 18% revenue drop. Contribution Explorer identified in under 60 seconds that it was driven by a single product category in two regions, caused by a pricing configuration error. Manually, that would have taken a data analyst half a day of slicing and dicing dashboards.
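To make "which dimensions contributed most" concrete, here is a deliberately naive version of the idea: compare each segment's metric between the baseline and comparison periods and report its share of the total change. This is simple additive attribution over segments you choose up front, not Snowflake's algorithm, which does considerably more (for example, exploring combinations of dimension values for you). The segment keys in the test are illustrative.

```python
from typing import Dict, Hashable, List, Tuple


def contribution_breakdown(
    baseline: Dict[Hashable, float], comparison: Dict[Hashable, float]
) -> List[Tuple[Hashable, float, float]]:
    """Attribute a metric change to segments, largest absolute change first.

    Returns (segment, change, share_of_total_change). A segment missing from
    one period counts as 0 there. Share is NaN if the total does not change.
    """
    segments = set(baseline) | set(comparison)
    changes = {s: comparison.get(s, 0.0) - baseline.get(s, 0.0) for s in segments}
    total = sum(changes.values())
    rows = [(s, c, c / total if total else float("nan")) for s, c in changes.items()]
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)
```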

Cost Analysis: Cortex Credits vs External APIs

This is the section most people want to read, so let me be direct. Snowflake Cortex AI is not cheap. But the comparison is nuanced.

Cortex LLM functions are billed per token in Snowflake credits, metered separately from your warehouse compute. As of early 2026, the pricing works out roughly like this:

| Operation | Cortex AI cost (approx.) | External alternative (approx.) | Verdict |
|---|---|---|---|
| COMPLETE (Mistral Large, 1K tokens) | ~$0.008 | OpenAI GPT-4o: ~$0.005 | Cortex ~60% more expensive |
| COMPLETE (Llama 3.1 70B, 1K tokens) | ~$0.004 | Together.ai Llama 70B: ~$0.002 | Cortex ~2x more expensive |
| SUMMARIZE (per document) | ~$0.01 | OpenAI with prompt: ~$0.006 | Cortex ~1.7x more expensive |
| SENTIMENT (per text) | ~$0.001 | AWS Comprehend: ~$0.0001 | Cortex ~10x more expensive |
| TRANSLATE (per 1K chars) | ~$0.005 | DeepL API: ~$0.002 | Cortex ~2.5x more expensive |
| Cortex Search (per month, small) | ~$400-800 | Pinecone Starter: ~$70 | Cortex much more expensive |
| ML Forecast (per model training) | ~$2-5 | Prophet on EC2: ~$0.50 | Cortex more expensive, but zero ops |

On pure unit cost, Cortex loses almost every comparison. So why did we keep using it? Three reasons:

  1. No data movement. Sending 50 million customer records to an external API means egress costs, security reviews, and compliance headaches. Keeping everything in Snowflake eliminated about $2,000/month in data transfer costs and a six-week security review process.
  2. No infrastructure. We did not need to build, deploy, or monitor a separate inference service. The engineering time savings were roughly 2 FTE-weeks per quarter.
  3. Governance. Every Cortex call is logged in Snowflake's query history. We can audit exactly who ran what LLM function on which data. Try getting that from a standalone API integration.

Our monthly Cortex spend settled at approximately $3,200. The equivalent workload through external APIs would cost about $1,400 in raw API fees, plus approximately $800 in infrastructure and data transfer, for roughly $2,200 in total. So Cortex is roughly 45% more expensive in total cost of ownership, but with dramatically less operational complexity. Whether that trade-off is worth it depends entirely on your team's capacity and priorities.
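The total-cost comparison is simple arithmetic and worth redoing with your own figures before deciding. A trivial sketch; the helper names are illustrative, and the inputs are the monthly figures from this review.

```python
def tco_premium(cortex_monthly: float, external_api: float, external_infra: float) -> float:
    """Fractional premium of Cortex over the external stack (0.45 means 45% more)."""
    external_total = external_api + external_infra
    if external_total <= 0:
        raise ValueError("external costs must be positive")
    return cortex_monthly / external_total - 1


def unit_cost_ratio(cortex_unit: float, external_unit: float) -> float:
    """How many times the alternative's unit price Cortex charges."""
    return cortex_unit / external_unit


# Figures from this review: $3,200 Cortex vs $1,400 API fees + $800 infra/transfer
premium = tco_premium(3200, 1400, 800)
sentiment_ratio = unit_cost_ratio(0.001, 0.0001)
```

Here premium comes out at about 0.45, i.e. roughly 45% more per month for Cortex.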

Limitations and Gotchas: What Nobody Tells You

Three months of production use surfaced a number of issues that are not in the documentation. Here is what caught us off guard.

Latency Is Unpredictable

COMPLETE calls to the same model with the same prompt length can vary from 1.2 seconds to 9 seconds. There is no SLA on Cortex LLM latency. We had a dashboard that called SENTIMENT on incoming support tickets in near-real-time, and the latency spikes made it unusable. We moved that to a scheduled task that runs every 15 minutes instead.
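Before wiring an LLM function into anything near-real-time, measure the spread yourself rather than trusting an average. The sketch below summarizes latencies you have already recorded (for example, client-side timings or query durations; the collection step is assumed and not shown) and checks the 95th percentile against a budget. The function names are hypothetical.

```python
from statistics import quantiles
from typing import Dict, Sequence


def latency_profile(samples_s: Sequence[float]) -> Dict[str, float]:
    """Summarize observed latencies in seconds as p50, p95 and max."""
    if len(samples_s) < 2:
        raise ValueError("need at least two samples")
    cuts = quantiles(samples_s, n=100, method="inclusive")  # 99 percentile cut points
    return {"p50": cuts[49], "p95": cuts[94], "max": max(samples_s)}


def fits_budget(samples_s: Sequence[float], budget_s: float) -> bool:
    """True if the 95th-percentile latency stays within the budget."""
    return latency_profile(samples_s)["p95"] <= budget_s
```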

Row Limits on LLM Functions

Do not try to run COMPLETE on a million-row table. Snowflake will throttle you aggressively. We found that batches of 500 to 1,000 rows work best. Anything larger and you start getting timeout errors or the query runs for hours. We built a simple batching pattern:

-- Batch processing pattern for Cortex LLM functions
-- Process in chunks of 500 using a control table
CREATE OR REPLACE TABLE processing.cortex_batch_control AS
SELECT
    ROW_NUMBER() OVER (ORDER BY id) AS row_num,
    id,
    text_column
FROM source_table
WHERE needs_processing = TRUE;

-- Process batch N (run in a loop via task or external orchestrator)
SET batch_size = 500;
SET batch_number = 1;  -- increment this

INSERT INTO results_table (id, llm_output)
SELECT
    id,
    SNOWFLAKE.CORTEX.COMPLETE('mistral-large2', text_column) AS llm_output
FROM processing.cortex_batch_control
WHERE row_num BETWEEN (($batch_number - 1) * $batch_size + 1)
                    AND ($batch_number * $batch_size);
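The "increment this" step is the orchestrator's job. A sketch of the loop bookkeeping, assuming the orchestrator knows the row count of the control table: it yields the same row_num bounds as the BETWEEN predicate above, plus the SET statements to send before each INSERT. The helper names are hypothetical, and actually executing the statements (via a task, Airflow, or a connector cursor) is left out.

```python
from typing import Iterator, List, Tuple


def batch_bounds(total_rows: int, batch_size: int = 500) -> Iterator[Tuple[int, int, int]]:
    """Yield (batch_number, first_row, last_row) for each batch.

    Mirrors the SQL predicate
        row_num BETWEEN (batch_number - 1) * batch_size + 1 AND batch_number * batch_size
    with the last batch clipped to total_rows.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = (total_rows + batch_size - 1) // batch_size  # ceiling division
    for n in range(1, batches + 1):
        yield n, (n - 1) * batch_size + 1, min(n * batch_size, total_rows)


def session_variable_statements(batch_number: int, batch_size: int = 500) -> List[str]:
    """SET statements to run before the INSERT for one batch."""
    return [f"SET batch_size = {batch_size};", f"SET batch_number = {batch_number};"]
```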

Model Availability Is Limited

You will not find every frontier model in Cortex. Most of what is on offer is Snowflake Arctic, Mistral models, Llama models, and a few others; Snowflake has been adding some Anthropic and OpenAI models, but which ones you can call depends on your region. For many tasks, Mistral Large is good enough. For complex reasoning or nuanced text generation, the quality gap with GPT-4o or Claude 3.5 is noticeable. We use Cortex for bulk processing tasks where good-enough is fine, and external APIs for high-stakes generations.

Cortex Search Ranking Is a Black Box

Cortex Search does run hybrid retrieval under the hood, combining vector similarity with keyword matching and a semantic reranker, but you get little control over how those signals are weighed. If you need to tune BM25 parameters, blend keyword and vector scores yourself, or plug in your own reranker, which is where many production RAG systems end up, you are still going to need an external solution.

No Streaming Support

The SQL COMPLETE function returns the entire response at once; there is no token-by-token streaming from a SQL query. If you are building a chatbot or any interactive application that benefits from streaming, SQL-based Cortex is not the right choice. We built our customer-facing chatbot using direct API calls to Anthropic and only use Cortex for backend batch processing.

Region Restrictions

Not all Cortex features are available in all Snowflake regions. We are on AWS us-east-1 and had full access, but a colleague's team on Azure West Europe was missing several models and the fine-tuning feature entirely. Check the documentation for your region before committing to a Cortex-dependent architecture.

Cortex AI vs Doing It Yourself: An Honest Comparison

For context, here is what our pre-Cortex architecture looked like for the same workloads:

Snowflake (data) --> Airflow DAG --> Python script --> OpenAI API --> Write results back to Snowflake. Total components: 4 services, 2 API keys, 1 orchestrator, 3 monitoring dashboards, and a partridge in a pear tree.

With Cortex:

Snowflake (data) --> SQL query with Cortex function --> Results in same table. Total components: 1 service, 0 API keys, 0 external orchestrators.

That simplification is real and it matters. Every external service is a point of failure, a secret to rotate, a version to upgrade, and a vendor to negotiate with. Cortex eliminates all of that for workloads that fit within its capabilities.

But the capabilities have boundaries. Here is my decision framework after three months:

| Use case | Use Cortex | Use external |
|---|---|---|
| Bulk sentiment analysis | Yes | |
| Text classification (clear categories) | Yes | |
| Summarization (batch) | Yes | |
| Translation (major European languages) | Yes | |
| Time series forecasting | Absolutely | |
| Anomaly detection | Absolutely | |
| Simple RAG / knowledge base search | Yes, if basic | |
| Customer-facing chatbot | | Yes (need streaming) |
| Complex reasoning tasks | | Yes (GPT-4o/Claude) |
| Hybrid search with tunable ranking | | Yes (Elasticsearch, Vespa) |
| Translation (CJK languages) | | Yes (DeepL) |
| Real-time inference (<500ms) | | Yes (dedicated endpoint) |
| Custom embedding models | | Yes (bring your own) |

The Verdict: Where Snowflake Cortex AI Shines

After three months of production use, my overall assessment of Snowflake Cortex AI is cautiously positive. It is not a replacement for a dedicated ML platform. It is not going to obsolete your OpenAI or Anthropic API keys. But it fills a very specific and genuinely useful niche: bringing AI capabilities to data teams who are already in Snowflake and want to augment their data pipelines without building external infrastructure.

The ML functions (forecasting, anomaly detection, contribution explorer) are the standout. They are production-ready, reasonably priced for the zero-ops convenience, and genuinely save engineering time. I would recommend these to any Snowflake customer without hesitation.

The LLM functions are useful for batch processing where you need good-enough quality on large volumes of data and the operational simplicity outweighs the cost premium. They are not suitable for latency-sensitive or quality-critical applications.

Cortex Search is good for simple internal search applications but lacks the flexibility of dedicated vector databases. Fine-tuning is promising but limited in model selection and expensive to iterate on.

The biggest risk I see is lock-in. Every Cortex function is Snowflake-proprietary SQL. If you build your entire ML pipeline on Cortex and later need to move to Databricks or BigQuery, you are rewriting everything. For some teams, that is an acceptable trade-off. For others, it is a dealbreaker.

My recommendation: start with the ML functions and SENTIMENT. Those deliver the most value with the least downside. Evaluate the LLM functions for batch workloads where data governance matters more than per-token cost. Skip Cortex Search unless your search requirements are genuinely simple. And keep your external API integrations for anything customer-facing or quality-critical.

Snowflake Cortex AI is not the future of ML engineering. But it is a genuinely useful addition to the Snowflake toolkit, and after three months, it has earned a permanent place in our stack for the right workloads.
