What You'll Build

An AI-powered data pipeline development system that uses Continue's AI agent with the dlt MCP to inspect pipeline execution, retrieve schemas, analyze datasets, and debug load errors - all through simple natural language prompts.

Prerequisites

Before starting, complete these setup steps (they apply to all workflow options):
  1. Install Continue CLI:
     npm i -g @continuedev/cli
  2. Install dlt:
     pip install dlt
To use agents in headless mode, you need a Continue API key.
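
To confirm both installs work end to end, you can run a minimal throwaway pipeline. This is a sketch of our own (the pipeline, dataset, and table names are arbitrary); DuckDB is used as the destination because it needs no credentials. If DuckDB isn't installed yet, add it with pip install "dlt[duckdb]".

import dlt

# Arbitrary names for a local smoke test; DuckDB writes a local database file
pipeline = dlt.pipeline(
    pipeline_name="smoke_test",
    destination="duckdb",
    dataset_name="raw",
)

# Load one sample row; run() returns a LoadInfo summary you can print
load_info = pipeline.run([{"id": 1, "name": "alice"}], table_name="users")
print(load_info)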

dlt MCP Workflow Options

🚀 Fastest Path to Success

Skip the manual setup and use our pre-built dlt Assistant agent that includes the dlt MCP and optimized data pipeline workflows for more consistent results. You can remix this agent to customize it for your specific needs.
After ensuring you meet the Prerequisites above, you have two paths to get started: use the pre-built dlt Assistant agent described here, or set up the dlt MCP manually.
To use the pre-built dlt Assistant agent, you need either:
  • Continue CLI Pro Plan with the models add-on, OR
  • Your own API keys added to Continue Hub secrets (same as manual setup)
The agent will automatically detect and use your configuration along with the pre-configured dlt MCP for pipeline operations.

dlt MCP vs dlt+ MCP

Understanding the Difference

dlt MCP is focused on local pipeline development and inspection. It provides tools to:
  • Inspect pipeline execution and load information
  • Retrieve schema metadata from your local pipelines
  • Query dataset records from destination databases
  • Analyze load errors, timings, and file sizes
dlt+ MCP extends these capabilities with cloud-based features for production deployments:
  • Connect to dlt+ Projects and manage deployments
  • Monitor pipeline runs across multiple environments
  • Access centralized logging and observability
  • Collaborate with team members on pipeline development
For local development and getting started, dlt MCP is the right choice. Consider dlt+ MCP when you need production deployment features and team collaboration.

Pipeline Development Recipes

Now you can use natural language prompts to develop and debug your dlt pipelines. The Continue agent automatically calls the appropriate dlt MCP tools.
You can add prompts to your agent’s configuration for easy access in future sessions. Go to your agent in the Continue Hub, click Edit, and add prompts under the Prompts section.
Where to run these workflows:
  • IDE Extensions: Use Continue in VS Code, JetBrains, or other supported IDEs
  • Terminal (TUI mode): Run cn to enter interactive mode, then type your prompts
  • CLI (headless mode): Use cn -p "your prompt" for headless commands
Test in Plan Mode First: Before running pipeline operations that might make changes, test your prompts in plan mode (see the Plan Mode Guide; press Shift+Tab to switch modes in TUI/IDE). This shows you what the agent will do without executing it.
About the --auto flag: The --auto flag enables tools to run continuously without manual confirmation. This is essential for headless mode, where the agent needs to execute multiple tools automatically to complete tasks like pipeline inspection, schema retrieval, and error analysis.

Pipeline Inspection

Inspect Pipeline Execution

Review pipeline execution details including load timing and file sizes.
TUI Mode Prompt:
Inspect my dlt pipeline execution and provide a summary of the load info.
Show me the timing breakdown and file sizes for each table.
Headless Mode Prompt:
cn -p "Inspect my dlt pipeline execution and provide a summary of the load info. Show me the timing breakdown and file sizes for each table." --auto

Schema Management

Retrieve Schema Metadata

Get detailed schema information for your pipeline's tables.
TUI Mode Prompt:
Show me the schema for my users table including all columns,
data types, and any constraints.
Headless Mode Prompt:
cn -p "Show me the schema for my users table including all columns, data types, and any constraints." --auto

Data Exploration

Query Dataset Records

Retrieve and analyze records from your destination database.
TUI Mode Prompt:
Get the last 10 records from my orders table and show me
the distribution of order statuses.
Headless Mode Prompt:
cn -p "Get the last 10 records from my orders table and show me the distribution of order statuses." --auto

Error Debugging

Analyze Load Errors

Investigate and understand pipeline load errors.
TUI Mode Prompt:
Check for any load errors in my last pipeline run. If there are errors,
explain what went wrong and suggest fixes.
Headless Mode Prompt:
cn -p "Check for any load errors in my last pipeline run. If there are errors, explain what went wrong and suggest fixes." --auto

Pipeline Creation

Build New Pipeline

Create a new dlt pipeline from an API or data source.
TUI Mode Prompt:
Help me create a new dlt pipeline that loads data from the
JSONPlaceholder API users endpoint into DuckDB.
Headless Mode Prompt:
cn -p "Help me create a new dlt pipeline that loads data from the JSONPlaceholder API users endpoint into DuckDB." --auto

Schema Evolution

Handle Schema Changes

Review and manage schema evolution in your pipelines.
TUI Mode Prompt:
Check if my pipeline schema has evolved since the last run.
Show me what columns were added or modified.
Headless Mode Prompt:
cn -p "Check if my pipeline schema has evolved since the last run. Show me what columns were added or modified." --auto

Continuous Data Pipelines with GitHub Actions

This example demonstrates a Continuous AI workflow where data pipeline validation runs automatically in your CI/CD pipeline in headless mode using the dlt Assistant agent. Consider remixing this agent to add your organization’s specific validation rules.

Add GitHub Secrets

Navigate to Repository Settings → Secrets and variables → Actions and add:
  • CONTINUE_API_KEY: your Continue API key (read by the workflow below)
The workflow uses the pre-built dlt Assistant agent with --config dlthub/dlt-assistant. This agent comes pre-configured with the dlt MCP and optimized rules for pipeline operations. You can remix this agent to customize the validation rules and prompts for your specific pipeline requirements.

Create Workflow File

This workflow automatically validates your dlt data pipelines on pull requests using the Continue CLI in headless mode. It inspects pipeline schemas, checks for errors, and posts a summary report as a PR comment. The workflow can also be triggered manually via workflow_dispatch. Create .github/workflows/dlt-pipeline-validation.yml in your repository:
name: Data Pipeline Validation with dlt MCP

on:
  pull_request:
    branches: [main]
  workflow_dispatch:

jobs:
  validate-pipeline:
    runs-on: ubuntu-latest
    env:
      CONTINUE_API_KEY: ${{ secrets.CONTINUE_API_KEY }}

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "18"

      - name: Install dlt
        run: |
          pip install dlt
          echo "✅ dlt installed"

      - name: Install Continue CLI
        run: |
          npm install -g @continuedev/cli
          echo "✅ Continue CLI installed"

      - name: Validate Pipeline Schema
        run: |
          echo "🔍 Validating pipeline schema..."
          cn --config dlthub/dlt-assistant \
             -p "Inspect the pipeline schema and verify all required tables
                 and columns are present. Flag any missing or unexpected changes." \
             --auto

      - name: Check Pipeline Health
        run: |
          echo "📊 Checking pipeline health..."
          cn --config dlthub/dlt-assistant \
             -p "Analyze the last pipeline run for errors or warnings.
                 Report any issues that need attention." \
             --auto

      - name: Comment Pipeline Report on PR
        if: always() && github.event_name == 'pull_request'
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          REPORT=$(cn --config dlthub/dlt-assistant \
             -p "Generate a concise summary (200 words or less) of:
                 - Pipeline schemas and row counts
                 - Any load errors or warnings
                 - Performance metrics (timing, file sizes)
                 - Recommended improvements" \
             --auto)

          gh pr comment ${{ github.event.pull_request.number }} --body "$REPORT"
The dlt MCP works with your local pipeline state. Make sure your CI environment has access to the necessary pipeline configuration and credentials.

Pipeline Development Best Practices

Implement automated pipeline quality checks using Continue’s rule system. See the Rules deep dive for authoring tips.

Schema Validation

"Before committing pipeline changes, verify the schema
matches expectations and flag any unexpected modifications."

Error Handling

"When load errors occur, analyze the error details and
suggest specific code fixes to handle the data issues."

Performance Monitoring

"Track pipeline execution times and file sizes. Alert if
performance degrades significantly from baseline."

Data Quality

"After each pipeline run, validate row counts and check for
null values in critical columns."

Troubleshooting

Pipeline Not Found

"Check if there's a dlt pipeline in the current directory.
If not, help me initialize a new pipeline."

Destination Connection Issues

"Verify the destination connection and credentials for my pipeline.
Test the connection and report any issues."

Schema Inference Problems

"Review the data types dlt inferred for my tables and flag any
columns that look mistyped. Suggest hints to correct them."

Verification Steps:
  • dlt MCP is installed via Continue Hub
  • Pipeline directory is accessible
  • Destination database credentials are configured
  • Pipeline has been run at least once
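
To check the last two items programmatically, a quick sketch (placeholder pipeline name):

import dlt

try:
    pipeline = dlt.attach(pipeline_name="my_pipeline")  # placeholder name
    print("working dir:", pipeline.working_dir)
    print("has run at least once:", pipeline.last_trace is not None)
except Exception as exc:
    # dlt raises if no pipeline with this name exists in the local working dir
    print("pipeline not found locally:", exc)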

What You’ve Built

After completing this guide, you have a complete AI-powered data pipeline development system that:
  ✅ Uses natural language — Simple prompts instead of complex pipeline commands
  ✅ Debugs automatically — AI analyzes errors and suggests fixes
  ✅ Runs continuously — Automated validation in CI/CD pipelines
  ✅ Ensures quality — Pipeline checks prevent bad data from shipping

Continuous AI

Your data pipeline workflow now operates at Level 2 Continuous AI - AI handles routine pipeline inspection and debugging with human oversight through review and approval of changes.

Next Steps

  1. Inspect your first pipeline - Try the pipeline inspection prompt on your current project
  2. Debug load errors - Use the error analysis prompt to fix any issues
  3. Set up CI pipeline - Add the GitHub Actions workflow to your repo
  4. Create new pipelines - Use AI to scaffold new data sources
  5. Monitor performance - Track pipeline execution metrics over time

Additional Resources