pip
pip install Connect Clients Databases CREATE DATABASE DROP DATABASE SHOW DATABASES USE Native Queries Projects CREATE PROJECT DROP PROJECT SHOW PROJECTS Tables & Views CREATE TABLE DROP TABLE CREATE VIEW ALTER VIEW DROP VIEW SQL Reference SELECT INSERT INTO UPDATE DELETE JOIN CASE WHEN CTEs (WITH) Knowledge Bases Overview CREATE KNOWLEDGE_BASE Parameters ALTER KNOWLEDGE_BASE INSERT INTO KB Query / Semantic Search Hybrid Search DESCRIBE KB DROP KNOWLEDGE_BASE Agents Overview CREATE AGENT SELECT FROM AGENT ALTER AGENT DROP / SHOW AGENTS Jobs CREATE JOB LAST keyword DROP JOB SHOW / Query JOBS Functions Standard Functions Variables FROM_ENV() SQL Reference MindsDB Query Engine — Semantic Search via SQL MindsDB lets you build semantic search pipelines entirely in SQL. Connect your data sources, shape them with views and tables, index unstructured content into knowledge bases, and expose everything through a natural-language agent — all without leaving your SQL client. 🗄️ Connect Data ──→ 🧠 Index in KB ──→ 🤖 Query via Agent optional 🔧 Views + 📁 Projects Quickstart Five SQL statements to go from zero to a working semantic search agent: SQL Copy -- 1. Connect your data source CREATE DATABASE my_pg WITH ENGINE = 'postgres', PARAMETERS = {"host":"localhost","port":5432,"user":"user","password":"pass","database":"mydb"}; -- 2. Create a project to organize work CREATE PROJECT search_project; -- 3. Create a knowledge base (semantic index) CREATE KNOWLEDGE_BASE search_project.docs_kb USING embedding_model = {"provider":"openai","model_name":"text-embedding-3-large","api_key":"sk-..."}, content_columns = ['body'], metadata_columns = ['title','category'], id_column = 'doc_id'; -- 4. Index your data INSERT INTO search_project.docs_kb SELECT doc_id, title, category, body FROM my_pg.documents; -- 5. Build the agent CREATE AGENT search_project.my_agent USING model = {"provider":"openai","model_name":"gpt-4o","api_key":"sk-..."}, data = {"knowledge_bases":["search_project.docs_kb"]}, prompt_template = 'docs_kb contains product documentation. Answer user questions using it.'; -- Query it SELECT answer FROM search_project.my_agent WHERE question = 'How do I reset my password?'; Setup # Docker The fastest way to run MindsDB locally. Choose the image that fits your integrations: Image Includes mindsdb/mindsdb:latest mysql, postgresql, snowflake, bigquery, mssql, salesforce mindsdb/mindsdb:huggingface All of the above + HuggingFace BASH Copy docker run --name mindsdb_container \ -e MINDSDB_APIS=http,mysql \ -p 47334:47334 -p 47335:47335 \ mindsdb/mindsdb Access the MindsDB editor at http://127.0.0.1:47334 . The MySQL API is available at port 47335 . 💡 Persist data: add -v $(pwd)/mdb_data:/root/mdb_storage to keep models and configs across restarts. ✓ With auth: pass -e MINDSDB_USERNAME=admin -e MINDSDB_PASSWORD=password to enable login. ⚠ Local databases: use http://host.docker.internal instead of localhost when the target DB runs in another container. Container management BASH Copy docker stop mindsdb_container # stop docker start mindsdb_container # restart (preserves state) docker logs -f mindsdb_container # follow logs Install integration dependencies Once the container is running, install extra integrations directly from the editor ( Settings → Manage Integrations ), or via shell: BASH Copy docker exec mindsdb_container pip install mindsdb[pgvector] # pip install BASH Copy pip install mindsdb # with extras pip install mindsdb[pgvector,openai,postgres] BASH — start Copy python -m mindsdb Editor: http://127.0.0.1:47334 · MySQL API: port 47335 · PostgreSQL API: port 47336 # Connect Clients MindsDB exposes a MySQL-compatible wire protocol. Any MySQL client can connect: Client Host Port User Password MySQL CLI / DBeaver / TablePlus 127.0.0.1 47335 mindsdb (empty) MindsDB Editor http://127.0.0.1:47334 SQLAlchemy mysql+pymysql://mindsdb@127.0.0.1:47335/mindsdb BASH — MySQL CLI Copy mysql -h 127.0.0.1 --port 47335 -u mindsdb -p PYTHON — SQLAlchemy Copy from sqlalchemy import create_engine engine = create_engine("mysql+pymysql://mindsdb@127.0.0.1:47335/mindsdb") with engine.connect() as conn: result = conn.execute("SELECT answer FROM my_agent WHERE question = 'hello'") Databases Databases are connections to external data sources — your Postgres, MySQL, S3, Snowflake, MongoDB, etc. MindsDB never copies your data ; it queries it live each time. # CREATE DATABASE CREATE DATABASE Connect an external data source CREATE DATABASE [IF NOT EXISTS] datasource_name [WITH] [ENGINE [=] engine_name] [,] [PARAMETERS [=] { "key": "value", ... }]; datasource_name Unique identifier for this connection within MindsDB. engine_name The handler to use (e.g. 'postgres' , 'mysql' , 'mongodb' , 'snowflake' , 's3' ). PARAMETERS JSON object of connection parameters specific to the engine (host, port, user, password, database, etc.). Examples PostgreSQL Copy CREATE DATABASE my_postgres WITH ENGINE = 'postgres', PARAMETERS = { "host": "127.0.0.1", "port": 5432, "user": "postgres", "password": "password", "database": "mydb" }; MySQL Copy CREATE DATABASE my_mysql WITH ENGINE = 'mysql', PARAMETERS = { "host": "127.0.0.1", "port": 3306, "user": "root", "password": "password", "database": "mydb" }; MongoDB Copy CREATE DATABASE my_mongo WITH ENGINE = 'mongodb', PARAMETERS = { "host": "mongodb+srv://user:pass@cluster.mongodb.net", "database": "mydb" }; 💡 Query all available data handlers: SELECT * FROM information_schema.handlers WHERE type = 'data'; # DROP DATABASE DROP DATABASE Remove a connected data source DROP DATABASE [IF EXISTS] database_name; ⚠ Dropping a database removes the connection only. No data in the external source is affected. # SHOW DATABASES SQL Copy -- List all databases and projects SHOW DATABASES; SHOW FULL DATABASES; -- Filter to data sources only SHOW FULL DATABASES WHERE type = 'data'; -- Via information_schema SELECT * FROM information_schema.databases; -- List tables in a database SHOW TABLES FROM datasource_name; # USE Switch context to a database or project. Subsequent queries can omit the database prefix. SQL Copy USE datasource_name; -- Now you can query directly SELECT * FROM table_name LIMIT 10; -- Switch to a project USE my_project; # Native Queries Pass database-native syntax through MindsDB without translation. Useful for database-specific functions, MongoDB-QL, Snowflake SQL extensions, etc. SQL — PostgreSQL native query Copy SELECT * FROM my_postgres ( SELECT model, year, price, ROUND(CAST((mpg / 2.3521458) AS numeric), 1) AS kml, COUNT(*) OVER (PARTITION BY model, year) AS units_to_sell FROM demo_data.used_cars ); SQL — MongoDB-QL native query Copy SELECT * FROM my_mongo ( db.products.find({"category": "electronics"}).limit(50) ); SQL — create view from native query Copy CREATE VIEW enriched_cars FROM my_postgres ( SELECT *, ROUND(CAST((mpg / 2.3521458) AS numeric), 1) AS kml FROM demo_data.used_cars ); Projects Projects are namespaces that hold knowledge bases, agents, views, and jobs. They let you organize your AI pipeline by use case (e.g. search_prod , search_staging ). # CREATE PROJECT CREATE PROJECT CREATE PROJECT [IF NOT EXISTS] project_name; SQL Copy CREATE PROJECT search_project; CREATE PROJECT IF NOT EXISTS analytics; # DROP PROJECT DROP PROJECT DROP PROJECT [IF EXISTS] project_name; # SHOW PROJECTS SQL Copy -- List all projects SHOW DATABASES WHERE type = 'project'; SHOW FULL DATABASES WHERE type = 'project'; -- List objects in a project SHOW TABLES FROM project_name; SHOW VIEWS FROM project_name; SHOW KNOWLEDGE_BASES FROM project_name; Tables & Views Use tables and views to shape and materialize data before indexing it into a knowledge base. Views are saved SELECT statements (virtual); tables are materialized results (physical). # CREATE TABLE Create an empty table or materialize a query result into an integration. CREATE TABLE Empty table or filled from query -- Empty table CREATE TABLE integration_name.table_name ( column_name data_type, ... ); -- Filled from query CREATE TABLE integration_name.table_name (SELECT ...); -- Replace if exists CREATE OR REPLACE TABLE integration_name.table_name (SELECT ...); SQL — materialize cleaned data Copy -- Clean and materialize support tickets for KB ingestion CREATE OR REPLACE TABLE my_pg.clean_tickets ( SELECT id, category, TRIM(LOWER(subject)) AS subject, body, created_at FROM my_pg.raw_support_tickets WHERE body IS NOT NULL AND LENGTH(body) > 20 ); # DROP TABLE DROP TABLE DROP TABLE table_name; DROP TABLE files.uploaded_file; # CREATE VIEW A view is a saved SELECT that executes on every access. Perfect for data preparation before sending to a knowledge base. CREATE VIEW CREATE VIEW [IF NOT EXISTS] [project_name.]view_name AS ( SELECT columns FROM integration_name.table_name WHERE ... ); SQL — create filtered view for KB ingestion Copy -- View joining tickets with product metadata for richer indexing CREATE VIEW search_project.enriched_tickets AS ( SELECT t.id, t.body, t.created_at, p.name AS product_name, p.category AS product_category FROM my_pg.support_tickets t JOIN my_pg.products p ON t.product_id = p.id WHERE t.status = 'closed' ); SQL — list views Copy SHOW VIEWS; SHOW FULL VIEWS; # ALTER VIEW ALTER VIEW -- Standard syntax ALTER VIEW view_name [AS] ( SELECT * FROM integration_name.table_name ); -- With explicit source ALTER VIEW view_name FROM integration_name ( SELECT * FROM table_name ); # DROP VIEW DROP VIEW DROP VIEW [IF EXISTS] view_name; SQL Reference # SELECT MindsDB supports standard SELECT with all standard clauses. Queries that reference one integration are pushed down to that engine. Cross-integration queries are executed in MindsDB's DuckDB-backed engine. SQL — basic select Copy SELECT location, MAX(sqft) FROM my_pg.home_rentals GROUP BY location LIMIT 5; SQL — subquery on integration data Copy -- Wrap in subquery when integration doesn't support GROUP BY SELECT type, MAX(bedrooms), LAST(price) FROM my_mongo ( db.house_sales.find().limit(300) ) GROUP BY 1; SQL — UNION ALL Copy SELECT id, content, 'tickets' AS source FROM my_pg.support_tickets UNION ALL SELECT id, body AS content, 'forums' AS source FROM my_pg.forum_posts; # INSERT INTO Insert rows into an integration table from a subquery. The destination table must already exist. INSERT INTO INSERT INTO integration_name.table_name (SELECT ...); SQL Copy -- Archive processed tickets into a separate table INSERT INTO my_pg.archived_tickets ( SELECT * FROM my_pg.support_tickets WHERE resolved_at < '2024-01-01' ); # UPDATE SQL — standard update Copy UPDATE my_pg.support_tickets SET status = 'archived' WHERE resolved_at < '2024-01-01'; SQL — update from select Copy UPDATE my_pg.products ON product_id FROM ( SELECT product_id, new_description AS description FROM my_pg.product_updates ); # DELETE SQL Copy -- Simple delete DELETE FROM my_pg.table_name WHERE column_name = 'value'; -- Delete with subquery DELETE FROM my_pg.support_tickets WHERE id IN ( SELECT id FROM my_pg.resolved_tickets WHERE resolved_at < '2023-01-01' ); # JOIN Standard SQL JOINs work across tables within the same integration or after bridging with a view/subquery. Use JOINs to denormalize and enrich data before indexing. SQL — cross-table join for data prep Copy -- Enrich tickets with user and product info before KB insert SELECT t.id, t.body, u.name AS user_name, u.plan AS user_plan, p.name AS product_name, p.category AS product_category FROM my_pg.support_tickets t JOIN my_pg.users u ON t.user_id = u.id JOIN my_pg.products p ON t.product_id = p.id WHERE t.body IS NOT NULL; SQL — multi-source join via subqueries Copy SELECT pg_data.id, mongo_data.tags FROM (SELECT id, title FROM my_pg.articles) AS pg_data JOIN (SELECT article_id, tags FROM my_mongo.article_tags) AS mongo_data ON pg_data.id = mongo_data.article_id; # CASE WHEN Standard conditional logic in SELECT, WHERE, and other clauses. SQL Copy SELECT id, body, CASE WHEN priority = 1 THEN 'critical' WHEN priority BETWEEN 2 AND 3 THEN 'high' WHEN priority = 4 THEN 'medium' ELSE 'low' END AS priority_label FROM my_pg.support_tickets; # CTEs (WITH) Common Table Expressions create named temporary result sets for modular, readable queries. SQL Copy WITH -- Step 1: Get recent tickets recent AS ( SELECT id, user_id, product_id, body FROM my_pg.support_tickets WHERE created_at > '2024-01-01' ), -- Step 2: Join product info enriched AS ( SELECT r.id, r.body, p.name AS product_name, p.category AS category FROM recent r JOIN my_pg.products p ON r.product_id = p.id ) -- Final: Insert into KB INSERT INTO search_project.tickets_kb SELECT * FROM enriched; Knowledge Bases A knowledge base is the semantic index at the heart of MindsDB's search capabilities. It combines an embedding model , an optional reranking model , and a vector store to enable context-aware retrieval over any data you load into it. 🧠 Knowledge bases match content by meaning , not keywords. "reset credentials" and "forgot password" return the same document even though no words overlap. How it works 1 Create Register the KB with an embedding model, optional reranking model, storage backend, and column mapping. 2 Insert Feed rows from any table or view. Each row is chunked, embedded, and written to the vector store. 3 Query Use WHERE content = '...' for semantic search, metadata columns for filtering, and relevance to threshold results. 4 Connect to Agent Reference the KB in a CREATE AGENT statement — the agent reasons over it automatically. # CREATE KNOWLEDGE_BASE CREATE KNOWLEDGE_BASE CREATE KNOWLEDGE_BASE [project_name.]kb_name USING embedding_model = { "provider": "...", "model_name": "...", "api_key": "..." }, reranking_model = { "provider": "...", "model_name": "...", "api_key": "..." }, storage = vector_db_conn.storage_table, metadata_columns = ['col1', 'col2', ...], content_columns = ['col_a', 'col_b', ...], id_column = 'id_col'; SQL — full example Copy -- First connect PGVector as the vector store CREATE DATABASE my_pgvector WITH ENGINE = 'pgvector', PARAMETERS = { "host": "127.0.0.1", "port": 5432, "database": "postgres", "user": "user", "password": "password", "distance": "cosine" }; -- Create the knowledge base CREATE KNOWLEDGE_BASE search_project.support_kb USING embedding_model = { "provider": "openai", "model_name": "text-embedding-3-large", "api_key": "sk-..." }, reranking_model = { "provider": "openai", "model_name": "gpt-4o", "api_key": "sk-...", "method": "multi-class" }, storage = my_pgvector.support_kb_store, metadata_columns = ['product_name', 'priority', 'created_at'], content_columns = ['subject', 'body'], id_column = 'ticket_id'; # Parameters embedding_model Required. Converts text into vector representations for similarity search. Provider Required params Optional params openai model_name , api_key base_url , api_version openai_azure model_name , api_key , base_url , api_version — google model_name , api_key — bedrock model_name , aws_access_key_id , aws_region_name , aws_secret_access_key aws_session_token snowflake model_name , api_key , account_id — ollama model_name , base_url — 💡 Define default_embedding_model in config.json to avoid specifying it on every CREATE KNOWLEDGE_BASE . reranking_model Optional. Scores result chunks for relevance using an LLM. Supports same providers as embedding_model . Set to false to disable. SQL — disable reranking Copy CREATE KNOWLEDGE_BASE my_kb USING embedding_model = { ... }, reranking_model = false, ...; method Description multi-class (default) Classifies each chunk into 4 relevance levels (0.25 / 0.5 / 0.75 / 1.0). Relevance = weighted sum of class probabilities. binary Relevant / not relevant. Uses log probability of the positive class. storage The vector database to store embeddings in. Connect it first with CREATE DATABASE . ✓ Recommended: PGVector ≥ 0.8.0 for best performance and hybrid search support. 💡 MindsDB Docker Desktop Extension includes a built-in PGVector — storage is optional when using it. metadata_columns Array of column names used as metadata. Metadata enables fast pre-filtering before or alongside semantic search. ⚠ A column cannot be in both metadata_columns and content_columns . content_columns Array of column names whose text gets chunked and embedded. Multiple columns are concatenated. Defaults to a column named content if not specified. id_column Column that uniquely identifies each source row. Optional — defaults to the MD5 hash of content columns. Used for upsert logic when re-inserting data. SQL — auto-generate ID when none exists Copy INSERT INTO my_kb ( SELECT ROW_NUMBER() OVER (ORDER BY created_at) AS id, * FROM my_pg.raw_documents ); # ALTER KNOWLEDGE_BASE Modify an existing KB configuration. The storage backend and embedding model type cannot be changed (would break existing embeddings), but you can rotate API keys, swap reranking models, and update column mappings. ALTER KNOWLEDGE_BASE ALTER KNOWLEDGE_BASE kb_name USING param_name = value, ...; SQL — rotate API key Copy ALTER KNOWLEDGE_BASE support_kb USING embedding_model = { "api_key": "sk-new-key" }; SQL — swap reranking model Copy ALTER KNOWLEDGE_BASE support_kb USING reranking_model = { "provider": "google", "model_name": "gemini-2.0-flash", "api_key": "AIza..." }; SQL — update metadata columns Copy -- Add new fields while keeping existing ones filterable ALTER KNOWLEDGE_BASE support_kb USING metadata_columns = ['product_name', 'priority', 'created_at', 'region']; ⚠ Changing metadata_columns doesn't remove old stored metadata. Only the columns listed in the most recent ALTER can be used in WHERE filters going forward. # INSERT INTO Knowledge Base Feed data into the KB. Rows are chunked, embedded, and stored in the vector database. SQL — insert from table Copy INSERT INTO search_project.support_kb SELECT ticket_id, subject, body, product_name, priority, created_at FROM my_pg.support_tickets; SQL — insert from view (enriched data) Copy INSERT INTO search_project.support_kb SELECT * FROM search_project.enriched_tickets; SQL — incremental insert (new rows only) Copy INSERT INTO search_project.support_kb SELECT ticket_id, subject, body, product_name, priority, created_at FROM my_pg.support_tickets WHERE created_at > ( SELECT MAX(created_at) FROM search_project.support_kb ); Performance options SQL — skip duplicate check for bulk loads Copy -- Faster inserts when you know there are no duplicates INSERT INTO my_kb SELECT * FROM my_pg.documents USING kb_no_upsert = true; 💡 Track insert progress: SELECT * FROM information_schema.queries; # Query Knowledge Base / Semantic Search Query a knowledge base using the content pseudo-column for semantic search, metadata columns for structured filtering, and relevance for scoring. Output columns Column Description id Source row identifier (from id_column ) chunk_id Chunk identifier: <id>:<n>of<total>:<start>to<end> chunk_content The actual text of this chunk metadata JSON blob with all stored metadata fields metadata cols Individual metadata columns (e.g. product_name , priority ) distance Raw vector distance (lower = more similar) relevance Score 0–1 from reranking model (or 1/(1+distance) if no reranker) Semantic search SQL — basic semantic search Copy SELECT * FROM search_project.support_kb WHERE content = 'how do I reset my password' LIMIT 5; SQL — with relevance threshold Copy SELECT id, chunk_content, product_name, relevance FROM search_project.support_kb WHERE content = 'billing issue' AND relevance >= 0.6 LIMIT 10; Metadata filtering SQL — combine semantic + metadata filter Copy -- Only search within a specific product and priority SELECT id, chunk_content, relevance FROM search_project.support_kb WHERE content = 'cannot connect' AND product_name = 'DataSync Pro' AND priority <= 2 AND relevance >= 0.5; SQL — metadata-only filter (no vector search) Copy SELECT * FROM search_project.support_kb WHERE product_name = 'DataSync Pro' AND created_at BETWEEN '2024-01-01' AND '2024-12-31'; Supported filtering operators Type Operators Semantic (content col) = 'query' , LIKE 'query' , NOT LIKE , IN ('q1','q2') , NOT IN , OR , AND (intersection) Metadata = , != , <> , > , < , >= , <= , BETWEEN , LIKE , IN , NOT IN , AND , OR , NOT Exclusion id != x , id NOT IN (SELECT id FROM kb WHERE content = '...') 💡 Default LIMIT is 10. Default relevance threshold is ≥ 0 (no filtering). Specify both to control result count and quality independently. # Hybrid Search Hybrid search combines semantic similarity (vector embeddings) with exact keyword matching (BM25 full-text index). Use it when your queries include specific identifiers, acronyms, product codes, or technical terms that embeddings might miss. ⚠ Hybrid search requires PGVector as the knowledge base storage backend. SQL — enable hybrid search (default alpha) Copy SELECT * FROM search_project.support_kb WHERE content = 'ACME-213 error' AND hybrid_search = true; -- alpha defaults to 0.5 SQL — tune the semantic/keyword balance Copy -- hybrid_search_alpha: 0 = pure keyword, 1 = pure semantic SELECT * FROM search_project.support_kb WHERE content = 'ticket ERR-4421' AND hybrid_search_alpha = 0.3; -- lean toward exact keyword match SQL — disable reranker for hybrid search Copy -- Uses alpha-weighted average of BM25 + embedding scores instead SELECT * FROM search_project.support_kb WHERE content = 'ERR-4421' AND hybrid_search_alpha = 0.2 AND reranking = false; How it works When you trigger hybrid search, both paths run in parallel : Path Method Best for Semantic Embedding vector similarity Conceptual queries, natural language, paraphrases Keyword BM25 full-text index Exact terms, product codes, acronyms, ticket IDs Results from both paths are merged and reranked (via the KB's reranking model if available, or via alpha-weighted averaging if not). ✓ When to use hybrid search: any time users search for specific identifiers, technical terms, model numbers, or internal terminology alongside natural language queries. # DESCRIBE KNOWLEDGE_BASE SQL Copy DESCRIBE KNOWLEDGE_BASE support_kb; -- List all knowledge bases SHOW KNOWLEDGE_BASES; SHOW KNOWLEDGE_BASES WHERE project = 'search_project'; The DESCRIBE output includes: NAME , PROJECT , STORAGE , PARAMS (embedding/reranking config), INSERT_STARTED_AT , INSERT_FINISHED_AT , PROCESSED_ROWS , ERROR . # DROP KNOWLEDGE_BASE DROP KNOWLEDGE_BASE Removes the KB and all stored embeddings DROP KNOWLEDGE_BASE [IF EXISTS] kb_name; DROP KNOWLEDGE_BASE [IF EXISTS] project_name.kb_name; ⚠ This permanently removes all embeddings and metadata from the vector store. Cannot be undone. Agents An agent is the conversational interface over your data. It combines an LLM with access to knowledge bases and database tables, enabling natural language queries over structured and unstructured data alike. This is the final piece of the semantic search pipeline. How agents work 1 Input Processing Builds a real-time data catalog from 5-row samples of each connected object. Extracts the question and structures LLM input. 2 Planning Determines which knowledge bases and tables are relevant. Prepares SQL queries as needed. 3 Exploration Loop Executes queries, collects results, adjusts if needed. Up to 20 queries per request. 4 Synthesis Aggregates results and synthesizes a natural language or structured response. 💡 Performance tip: keep connected objects to ≤ 10. Create views to pre-aggregate and simplify data before connecting to the agent. The clearer your prompt_template , the more accurate the responses. # CREATE AGENT CREATE AGENT CREATE AGENT [project_name.]agent_name USING model = { "provider": "openai", "model_name": "gpt-4o", "api_key": "sk-...", "base_url": "https://...", -- optional "api_version": "2024-02-01" -- optional, required for Azure }, data = { "knowledge_bases": ["project.kb_name", ...], "tables": ["datasource.table_name", ...] }, prompt_template = 'describe your data here', timeout = 30, mode = 'text'; -- 'text' or 'sql' SQL — semantic search agent Copy CREATE AGENT search_project.support_agent USING model = { "provider": "openai", "model_name": "gpt-4o", "api_key": "sk-..." }, data = { "knowledge_bases": ["search_project.support_kb"], "tables": [ "my_pg.products", "my_pg.users" ] }, prompt_template = ' search_project.support_kb contains customer support tickets with fields: - chunk_content: ticket text - product_name: product the ticket is about - priority: 1 (critical) to 4 (low) - created_at: submission date my_pg.products contains product catalog. my_pg.users contains user account data. Answer questions accurately and cite which tickets are relevant. ', timeout = 60; Supported LLM providers Provider Notable models openai gpt-4o, gpt-4.1, gpt-4.1-mini, o3-mini, o4-mini anthropic claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307 google gemini-2.5-pro-preview-03-25, gemini-2.0-flash, gemini-1.5-pro bedrock All AWS Bedrock models (requires aws_region_name, aws_access_key_id, aws_secret_access_key) ollama llama2, mistral, mixtral, codellama, gemma, phi, qwen, and more nvidia_nim meta/llama-3_1-70b-instruct, mistralai/mistral-large, and more writer palmyra-x5, palmyra-x4 Connect all objects from a source at once SQL — wildcard data connection Copy CREATE AGENT my_agent USING model = { ... }, data = { "knowledge_bases": ["search_project.*"], -- all KBs in project "tables": ["my_pg.*"] -- all tables in datasource }, prompt_template = '...'; Use default model from config JSON — config.json Copy { "default_llm": { "provider": "openai", "model_name": "gpt-4o", "api_key": "sk-..." } } When default_llm is set, omit the model parameter from CREATE AGENT . # SELECT FROM AGENT Query the agent with a natural language question. The agent returns either a free-text answer or structured columns depending on how you write the SELECT. SQL — natural language answer Copy SELECT answer FROM search_project.support_agent WHERE question = 'What are the most common issues with DataSync Pro?'; SQL — structured output Copy -- Agent formats its response to match the requested columns SELECT issue_type, ticket_count, example_ticket_id FROM search_project.support_agent WHERE question = 'What are the top 5 issue types for DataSync Pro this month?'; SQL — override params at query time Copy -- Test with a different model without changing the agent definition SELECT answer FROM search_project.support_agent WHERE question = 'Summarize open critical tickets' USING model = { "provider": "anthropic", "model_name": "claude-3-5-sonnet-20241022", "api_key": "sk-ant-..." }; # ALTER AGENT ALTER AGENT Update model, data, or prompt — any combination ALTER AGENT agent_name USING model = { ... }, data = { "knowledge_bases": [...], "tables": [...] }, prompt_template = '...'; SQL — add a new KB to an agent Copy ALTER AGENT search_project.support_agent USING data = { "knowledge_bases": [ "search_project.support_kb", "search_project.docs_kb" -- new KB added ], "tables": ["my_pg.products"] }; SQL — upgrade the model Copy ALTER AGENT search_project.support_agent USING model = { "provider": "openai", "model_name": "gpt-4.1", "api_key": "sk-..." }; # DROP / SHOW AGENTS SQL Copy -- Drop an agent DROP AGENT agent_name; DROP AGENT project_name.agent_name; -- List agents SHOW AGENTS; SHOW AGENTS WHERE project = 'search_project'; SHOW AGENTS WHERE name = 'support_agent'; Jobs Jobs schedule any SQL statement (or sequence of statements) to run automatically — once at a future time, or repeatedly on an interval. Use jobs to keep your knowledge bases up to date as new data arrives. # CREATE JOB CREATE JOB CREATE JOB [IF NOT EXISTS] [project_name.]job_name [AS] ( <statement_1>[; <statement_2>][; ...] ) [START <date>] [END <date>] [EVERY [number] <period>] [IF (<condition_statement>)]; Clause Description START <date> When to begin. Defaults to now if omitted. END <date> When to stop recurring. Omit for indefinite repetition. EVERY [n] <period> Repetition frequency. Omit to run once. Period values: minute , hour , day , week , month . IF (...) Only execute if the condition query returns rows. Date format: 'YYYY-MM-DD HH:MM:SS' or 'YYYY-MM-DD' . Timezone: UTC. SQL — refresh KB every hour Copy CREATE JOB search_project.refresh_support_kb ( INSERT INTO search_project.support_kb SELECT ticket_id, subject, body, product_name, priority, created_at FROM my_pg.support_tickets WHERE created_at > LAST ) EVERY hour; SQL — conditional job (only if new data exists) Copy CREATE JOB search_project.conditional_kb_refresh ( INSERT INTO search_project.support_kb SELECT ticket_id, subject, body, product_name, priority, created_at FROM my_pg.support_tickets WHERE created_at > LAST ) EVERY 30 minutes IF ( SELECT * FROM my_pg.support_tickets WHERE created_at > LAST ); SQL — multi-step job with dynamic table name Copy CREATE JOB search_project.daily_snapshot ( CREATE TABLE my_pg.