
    From 18% to 54%: How Few-Shot Prompting Tripled Our Extraction Accuracy

    Your AI can't read technical documents? Neither could ours—until we combined computer vision preprocessing with strategic few-shot examples.

    [Image: Technical document extraction with few-shot prompting]

    The Problem: Your AI Can't Read Technical Documents

    You've tried GPT-4 Vision on your technical drawings. The results? Garbage.

    Standard AI approaches fail on specialized documents because:

    • No domain knowledge: The model doesn't understand switchboard layouts, tier structures, or component conventions
    • No visual anchors: It can't distinguish what's important from what's noise
    • Inconsistent outputs: Every response has a different format, breaking your downstream processing
    • Missed sections: Critical data gets overlooked entirely

    Our client faced exactly this. They needed to extract tier numbers, widths, ventilation status, and component values from electrical switchboard drawings.

    First attempt with zero-shot prompting: 18.2% accuracy.

    That's not a typo. The AI got it right less than 1 in 5 times. Production-ready? Not even close.

    What if you could triple that accuracy without changing models or increasing costs?


    What We Built: Few-Shot + Computer Vision Pipeline

    We combined three techniques that individually help, but together create something powerful:

    The Approach

    Technique         | What It Does                                       | Impact
    CV Preprocessing  | Highlights key areas before the LLM sees the image | Focuses attention
    Few-Shot Examples | Shows the model exactly what success looks like    | Teaches patterns
    Structured Output | Enforces a JSON schema with Pydantic               | Guarantees valid data

    The Results

    Metric               | Zero-Shot | Few-Shot (4 Examples) | Improvement (pts)
    Exact Match Rate     | 18.2%     | 54.5%                 | +36.3
    Field-Level Accuracy | 78.7%     | 92.6%                 | +13.9
    Tier Count Accuracy  | 72.7%     | 100.0%                | +27.3

    Tier count accuracy went from 72.7% to 100%. The zero-shot model fundamentally misunderstood the document structure. With examples, it got it right every single time.


    Step 1: Computer Vision Preprocessing

    Before the LLM ever sees your document, prepare it visually.

    What We Did

    We trained a lightweight detection model (Roboflow) to identify key components:

    • MCMP plates: Yellow overlay + green border
    • Metering units: Blue border outline
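
    For reference, inference against a hosted Roboflow model looks roughly like this; the API key, project name, and version number below are placeholders, not the client's actual model:

    from roboflow import Roboflow
    
    # Load the custom-trained detection model (project name and version are placeholders)
    rf = Roboflow(api_key="YOUR_API_KEY")
    model = rf.workspace().project("switchboard-components").version(1).model
    
    # Detect components in a drawing; returns classes, confidences, and bounding boxes
    detections = model.predict("drawing.png", confidence=40, overlap=30).json()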

    Why This Works

    The LLM receives a "cheat sheet" image. Instead of scanning the entire complex drawing, it knows exactly where to focus.

    Think of it like highlighting a textbook before an exam. The content is the same, but the important parts are marked.

    The Code

    import cv2
    
    # Run the trained detector on the original drawing
    detections = cv_model.predict(original_image)
    
    # Draw highlights on a copy of the original
    highlighted = original_image.copy()
    for detection in detections:
        x1, y1, x2, y2 = detection.bbox
        if detection.class_name == "mcmp_plate":
            # Semi-transparent yellow fill plus a solid green border (colors are BGR)
            overlay = highlighted.copy()
            cv2.rectangle(overlay, (x1, y1), (x2, y2), (0, 255, 255), -1)
            highlighted = cv2.addWeighted(overlay, 0.3, highlighted, 0.7, 0)
            cv2.rectangle(highlighted, (x1, y1), (x2, y2), (0, 255, 0), 3)
        elif detection.class_name == "metering_unit":
            # Solid blue border for metering units
            cv2.rectangle(highlighted, (x1, y1), (x2, y2), (255, 0, 0), 3)
    
    # Now send the highlighted image to the LLM
    

    Step 2: Few-Shot Examples That Actually Work

    The difference between good and great few-shot prompting is example selection.

    Bad Examples

    • All similar complexity
    • Same document type
    • No edge cases

    Good Examples (What We Used)

    • Example 1: Simple layout, minimal components
    • Example 2: Heavy ventilation, complex tier structure
    • Example 3: Mixed components, unusual widths
    • Example 4: Edge case with missing data

    The Conversation Structure

    import json
    from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
    
    messages = [
        # System prompt with domain expertise
        SystemMessage(content="""
            You are an expert at analyzing electrical switchboard drawings.
            Extract: tier_count, widths, ventilation, components.
            Output valid JSON matching the provided schema.
        """),
    
        # Example 1: Simple case
        HumanMessage(content=[
            {"type": "text", "text": "Analyze this switchboard drawing."},
            {"type": "image_url", "image_url": {"url": example_1_url}}
        ]),
        AIMessage(content=json.dumps(example_1_output)),
    
        # Example 2: Complex ventilation
        HumanMessage(content=[
            {"type": "image_url", "image_url": {"url": example_2_url}}
        ]),
        AIMessage(content=json.dumps(example_2_output)),
    
        # Example 3: Mixed components
        HumanMessage(content=[
            {"type": "image_url", "image_url": {"url": example_3_url}}
        ]),
        AIMessage(content=json.dumps(example_3_output)),
    
        # Example 4: Edge case
        HumanMessage(content=[
            {"type": "image_url", "image_url": {"url": example_4_url}}
        ]),
        AIMessage(content=json.dumps(example_4_output)),
    
        # Now the actual task
        HumanMessage(content=[
            {"type": "text", "text": "Analyze this new drawing."},
            {"type": "image_url", "image_url": {"url": actual_image_url}}
        ])
    ]
    

    Key insight: 4 diverse examples beat 20 similar ones. Quality and coverage matter more than quantity.
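
    As the example set grows and changes, it is easier to assemble the conversation programmatically than by hand. Here is a minimal sketch; build_few_shot_messages is our own illustrative helper, not a LangChain API:

    def build_few_shot_messages(system_prompt, examples, target_url):
        # examples: list of (image_url, expected_output_dict) pairs
        messages = [SystemMessage(content=system_prompt)]
        for url, output in examples:
            messages.append(HumanMessage(content=[
                {"type": "image_url", "image_url": {"url": url}}
            ]))
            messages.append(AIMessage(content=json.dumps(output)))
        # The actual task comes last
        messages.append(HumanMessage(content=[
            {"type": "text", "text": "Analyze this new drawing."},
            {"type": "image_url", "image_url": {"url": target_url}}
        ]))
        return messages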


    Step 3: Enforce Structure with Pydantic

    Even with great examples, LLMs can output malformed JSON. We use Pydantic to guarantee valid outputs.

    The Schema

    from pydantic import BaseModel
    from typing import List, Optional
    
    class ComponentSpec(BaseModel):
        type: str
        value: Optional[float]
        unit: str
    
    class TierData(BaseModel):
        tier_number: int
        width_mm: int
        has_ventilation: bool
        components: List[ComponentSpec]
    
    class SwitchboardExtraction(BaseModel):
        tier_count: int
        tiers: List[TierData]
        total_width_mm: int
        extraction_confidence: float
    
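    As a quick sanity check, you can validate a hand-built payload against the schema directly; the values below are invented for illustration:

    # Raises pydantic.ValidationError if any field is missing or mistyped
    sample = {
        "tier_count": 2,
        "tiers": [
            {"tier_number": 1, "width_mm": 600, "has_ventilation": True,
             "components": [{"type": "breaker", "value": 63.0, "unit": "A"}]},
            {"tier_number": 2, "width_mm": 800, "has_ventilation": False,
             "components": []},
        ],
        "total_width_mm": 1400,
        "extraction_confidence": 0.92,
    }
    extraction = SwitchboardExtraction.model_validate(sample)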

    Structured Output with LangChain

    from langchain_google_genai import ChatGoogleGenerativeAI
    
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-pro")
    structured_llm = llm.with_structured_output(SwitchboardExtraction)
    
    result = structured_llm.invoke(messages)
    # result is guaranteed to be a valid SwitchboardExtraction object
    

    No more JSON parsing errors. No more missing fields. No more type mismatches.


    Why This Works: The Psychology of AI Learning

    1. Pattern Recognition Over Instruction Following

    Telling an LLM "extract tier numbers" is vague. Showing it 4 examples of tier extraction teaches the pattern implicitly.

    Analogy: Teaching a child to ride a bike by showing them, not by describing the physics of balance.

    2. Visual Priming Reduces Cognitive Load

    The CV preprocessing acts like selective attention. The model doesn't waste capacity parsing irrelevant diagram elements.

    Analogy: A highlighted textbook vs. a wall of unmarked text.

    3. Structured Output Eliminates Variability

    Without schema enforcement, every response is a surprise. With Pydantic, you know exactly what you're getting.

    Analogy: A tax form vs. a blank page that says "describe your income."


    What You Can Apply Today

    For Any Document Extraction Project

    1. Don't start with the LLM. Use CV to preprocess and highlight key regions first.

    2. Build a diverse example set. Cover edge cases, not just happy paths. 4-6 high-quality examples beat 20 mediocre ones.

    3. Enforce structure. Use Pydantic, Zod, or JSON Schema. Never trust free-form LLM output in production.

    4. Measure properly. Track exact match rate AND field-level accuracy (see the sketch below). They tell different stories.
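
    Here is a minimal sketch of both metrics, assuming predictions and gold labels are plain dicts; the helper names are ours, for illustration:

    def exact_match_rate(predictions, gold_labels):
        # Strict: a document counts only if every field matches exactly
        matches = sum(pred == gold for pred, gold in zip(predictions, gold_labels))
        return matches / len(gold_labels)
    
    def field_level_accuracy(predictions, gold_labels):
        # Partial credit: every correctly extracted field counts
        correct = total = 0
        for pred, gold in zip(predictions, gold_labels):
            for field, expected in gold.items():
                total += 1
                correct += pred.get(field) == expected
        return correct / total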

    When to Use This Approach

    Use Case            | Expected Improvement
    Technical drawings  | 30-40% accuracy boost
    Medical forms       | 20-35% accuracy boost
    Financial documents | 25-40% accuracy boost
    Handwritten forms   | 15-30% accuracy boost

    The Technical Stack

    Component  | Tool                      | Why
    LLM        | Google Gemini 2.5 Pro     | Best multimodal performance for technical docs
    CV Model   | Roboflow (custom trained) | Fast inference, easy annotation
    Framework  | LangChain                 | Structured output support
    Validation | Pydantic v2               | Strict mode, great error messages
    Deployment | FastAPI + Modal           | Serverless, auto-scaling
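
    To give a feel for the deployment layer, here is a hedged sketch of a FastAPI endpoint wiring the three steps together. preprocess_with_cv is a hypothetical wrapper around the Step 1 highlighting, build_few_shot_messages is the helper sketched above, and SYSTEM_PROMPT / FEW_SHOT_EXAMPLES are placeholders; this is not the client's actual service code:

    from fastapi import FastAPI, UploadFile
    
    app = FastAPI()
    
    @app.post("/extract", response_model=SwitchboardExtraction)
    async def extract(file: UploadFile):
        image_bytes = await file.read()
        highlighted_url = preprocess_with_cv(image_bytes)   # Step 1: CV highlighting
        messages = build_few_shot_messages(                 # Step 2: few-shot prompt
            SYSTEM_PROMPT, FEW_SHOT_EXAMPLES, highlighted_url
        )
        return structured_llm.invoke(messages)              # Step 3: schema-validated output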

    Results Summary

    Before                 | After              | Impact
    18.2% exact match      | 54.5% exact match  | 3x improvement
    Inconsistent JSON      | Guaranteed schema  | Zero parsing errors
    Manual review required | Automated pipeline | Hours saved daily
    Prototype quality      | Production ready   | Deployed to client

    This approach transformed an experimental failure into a production system processing hundreds of documents daily.

    Struggling with Document Extraction Accuracy?

    We've boosted accuracy by 30%+ across multiple industries. Let's analyze your documents and show you what's possible.

    Free consultation • Send us sample documents • No commitment