Harness Run · so_extraction 20260518T194103Z  ·  72 runs  ·  Generated 2026-05-18 19:42 UTC

All four models cleared quality benchmarks, with sonnet-4-6 leading on accuracy

72 extraction runs across four Claude models achieved 100% success and ~90% field match, establishing a reliable baseline for structured data workflows.

ACCURACY LEADER
sonnet-4-6
sonnet-4-6 delivered the highest field match rate (92.6%) and the fewest mismatches (1.78 per run). opus-4-6 followed at 90.2%, while the older opus-4-5 lagged at 86.5%.
SPEED CONSISTENCY
~5.0 sec average
Runtime spread stayed tight across all models (4.9 to 5.1 seconds), with no model showing a meaningful speed advantage. This uniform latency simplifies deployment decisions when accuracy alone differentiates the options.
ZERO FAILURES
100% success
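All 72 runs (18 per model) completed without harness or HTTP errors. Success here measures run stability, not answer correctness. Mismatch counts have a standard deviation of 1.85 against an average of 2.26 per run, so individual runs still vary noticeably even though none failed.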

Pranav's Remarks:

  • Why accuracy was low in earlier attempts: mismatches were counted against the actual "production" summary, but I was not giving the models the exact Customer Card information that had been used for each conversation in the production chats.
  • This caused many mismatches in the Customer and Vendor Details fields.
  • Dates were also affected: most of the contracts were generated last year, so the "production" summaries had 2025 dates, while the models generated summaries with 2026 dates.
  • Now that a consistent Customer Card is used for all chats, accuracy has improved significantly.
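
The date issue described in these remarks can be told apart from genuine extraction errors with a small check. The sketch below is an illustration only: the function name `is_year_only_shift` is hypothetical and not part of the harness. It flags a mismatch as a year-only shift when month and day agree and only the year differs.

```python
from datetime import date


def is_year_only_shift(expected: str, actual: str) -> bool:
    """Return True when two ISO dates share month and day but differ in year."""
    try:
        exp = date.fromisoformat(expected)
        act = date.fromisoformat(actual)
    except ValueError:
        # Empty or malformed values are not treated as date shifts.
        return False
    return exp.year != act.year and (exp.month, exp.day) == (act.month, act.day)


# Value pairs taken from the sample mismatches in this report.
print(is_year_only_shift("2025-11-15", "2026-11-15"))  # True
print(is_year_only_shift("2026-03-31", "2026-03-15"))  # False
print(is_year_only_shift("2025-09-29", ""))            # False
```

Counting such shifts separately would show how much of the remaining gap comes from the reference data's year rather than from the models.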
Sec. 01

Results by dataset

The "downloaded" dataset drove all 72 runs, providing the sole test bed for this four-model comparison.

downloaded
89.75%
Field match
72 runs · 2.26 mismatch
Sec. 02

Quality vs. speed

Runtime versus field match rate shows all models clustered near 5 seconds, with sonnet-4-6 pulling ahead on accuracy while opus-4-5 trails by 6 percentage points.

Model frontier
Each bubble is one row in the table below (model × few-shot). Bubble size reflects mismatch load.
Sec. 03

Leaderboard

Sort by field match rate to confirm sonnet-4-6's lead, or by average elapsed time to verify the negligible speed differences.

0 rows
Model FS Runs Avg s Mismatch Field match
Sec. 04

Few-shot sweep

Every model ran with the same 4 explicit few-shot examples, so the rollup below shows a single setting rather than a sweep.

FS 4
89.75%

sonnet-4-6 held its accuracy edge at this setting. Because only FS 4 was tested, this run does not show how accuracy changes with few-shot count.

Sec. 05

What to check next

Prioritize these steps to capitalize on the benchmark results and address remaining extraction gaps.

01
Promote sonnet-4-6 to production
With 92.6% field match and zero failures, sonnet-4-6 delivers the best risk-reward profile for live extraction workflows.
02
Investigate the 7.4% mismatch gap
Analyze the 1.78 mismatches per run to identify whether they stem from ambiguous source data, schema edge cases, or prompt engineering gaps.
03
Expand dataset coverage
A single dataset limits generalization; add 2–3 more corpora with different formats or domains to validate model rankings under diverse conditions.
04
Benchmark cost per extraction
Runtime parity means token usage likely drives economics; measure input/output tokens to confirm whether sonnet-4-6's accuracy justifies any cost premium over sonnet-4-5.
05
Retire opus-4-5 from consideration
Its 86.5% match rate and 2.89 mismatches per run make it the weakest performer; reallocate testing resources to newer models or alternative prompting strategies.
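
For step 02, one possible starting point is to tally which field paths recur across the mismatch samples. The sketch below is a minimal illustration, not harness code: `tally_paths` is a hypothetical helper, and the three rows are a hand-copied subset of the paths listed under "Sample mismatches".

```python
from collections import Counter

# Paths copied from three rows of the "Sample mismatches" table.
sample_rows = [
    ["data[0].items[0].shipment_date", "data[0].items[0].shipping_address",
     "data[0].items[0].loading", "data[0].do_date", "data[0].po_date"],
    ["data[0].items[0].unit_price", "data[0].items[0].pricing_unit",
     "data[0].items[0].shipment_date", "data[0].do_date"],
    ["data[0].items[0].shipping_address", "data[0].billing_address"],
]


def tally_paths(rows):
    """Count how often each field path appears across mismatch samples."""
    counts = Counter()
    for paths in rows:
        counts.update(paths)
    return counts


for path, n in tally_paths(sample_rows).most_common(3):
    print(path, n)
```

Running the same tally over the full sample table, grouped by model, would show whether a few fields account for most of the gap.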
Run configuration (JSON)
{
  "agent": "so_extraction",
  "pipeline": null,
  "models": [
    "sonnet-4-5",
    "sonnet-4-6",
    "opus-4-6",
    "opus-4-5"
  ],
  "datasets": [
    "downloaded"
  ],
  "chat": null,
  "chats_glob": null,
  "bulk": false,
  "runs_per_chat": 2,
  "max_workers": 25,
  "few_shot_explicit": [
    "/Users/tripathipranav/Documents/code/harness_agents/raw_data/chats/multiple_product_multiple_shipment_medium.json",
    "/Users/tripathipranav/Documents/code/harness_agents/raw_data/chats/single_product_multiple_shipment_complex.json",
    "/Users/tripathipranav/Documents/code/harness_agents/raw_data/chats/single_product_single_shipment_medium.json",
    "/Users/tripathipranav/Documents/code/harness_agents/raw_data/chats/updates/update_change_payment_terms.json"
  ],
  "few_shot_walk": [],
  "few_shot_sweep": [],
  "few_shot_pool_argv": [],
  "few_shot_seed": 42,
  "db_few_shot_limit": 0,
  "skip_without_expected": true,
  "results_dir": "/Users/tripathipranav/Documents/code/harness_agents/results/20260518T194103Z",
  "config_file": "configs/agents.json",
  "few_shot_mode": "explicit",
  "few_shot_pool_size": 68,
  "few_shot_default_pool_size": 68,
  "few_shot_pool_override": null,
  "few_shot_variants": [
    {
      "label": "explicit",
      "count": 4,
      "paths": [
        "raw_data/chats/multiple_product_multiple_shipment_medium.json",
        "raw_data/chats/single_product_multiple_shipment_complex.json",
        "raw_data/chats/single_product_single_shipment_medium.json",
        "raw_data/chats/updates/update_change_payment_terms.json"
      ]
    }
  ],
  "allow_self_fewshot": false
}
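
The run totals in the summary can be reproduced from this configuration. The check below is a rough sketch that assumes the dataset holds nine chats with reference outputs; that count is inferred from the 01 to 09 file prefixes in the sample mismatches and is not stated in the configuration itself.

```python
models = ["sonnet-4-5", "sonnet-4-6", "opus-4-6", "opus-4-5"]
runs_per_chat = 2          # from the run configuration
few_shot_variants = 1      # one "explicit" variant with 4 examples
chats_with_expected = 9    # assumption: inferred from file prefixes 01..09

runs_per_model = chats_with_expected * runs_per_chat * few_shot_variants
total_runs = runs_per_model * len(models)

print(runs_per_model)  # 18
print(total_runs)      # 72
```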
How to read these numbers

SUCCESS RATE — Share of runs that finished without a harness or HTTP error. High success means the run was stable; it does not prove the answers matched the reference.

AVG ELAPSED (S) — Average wall time per run in that bucket. Useful for latency comparisons.

AVG MISMATCH / EXPECTED RUN — Average count of fields that differed from the golden JSON when a reference existed. Lower is better.

FIELD MATCH — Fraction of compared fields that matched the golden output across runs in that bucket. Higher is better.
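
As a rough illustration of these four definitions, the sketch below computes them from a list of per-run records. The `RunRecord` shape and the `summarize` function are hypothetical, and the sketch assumes field match is pooled over all compared fields in the bucket rather than averaged per run; the harness may compute it differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical per-run record; field names are illustrative."""
    ok: bool                    # finished without harness or HTTP error
    elapsed_s: float            # wall time in seconds
    compared: Optional[int]     # fields compared, None when no golden JSON
    mismatched: Optional[int]   # fields that differed, None when no golden JSON


def summarize(runs):
    """Compute the four report metrics for one bucket of runs."""
    scored = [r for r in runs if r.ok and r.compared is not None]
    compared = sum(r.compared for r in scored)
    mismatched = sum(r.mismatched for r in scored)
    return {
        "success_rate": sum(r.ok for r in runs) / len(runs),
        "avg_elapsed_s": sum(r.elapsed_s for r in runs) / len(runs),
        "avg_mismatch": mismatched / len(scored) if scored else None,
        "field_match": (compared - mismatched) / compared if compared else None,
    }


# Made-up numbers, for illustration only.
demo = [
    RunRecord(True, 5.0, 20, 2),
    RunRecord(True, 4.8, 20, 0),
    RunRecord(False, 5.2, None, None),
]
print(summarize(demo))
```

Note that a failed run lowers the success rate but is excluded from the mismatch and field match figures, which is why a high success rate alone says nothing about accuracy.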

Sample mismatches (up to 80 rows)
Agent Chat Model FS Mismatches Sample
so_extraction 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json opus-4-5 4 8
[
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2025-11-15",
    "actual": "2026-11-15"
  },
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea",
    "actual": ""
  },
  {
    "path": "data[0].items[0].loading",
    "expected": "23MT/40'FCL",
    "actual": "23 MT / 40' FCL"
  },
  {
    "path": "data[0].do_date",
    "expected": "2025-11-15",
    "actual": "2026-11-15"
  },
  {
    "path": "data[0].po_date",
    "expected": "2025-09-29",
    "actual": ""
  }
]
so_extraction 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json opus-4-5 4 8
[
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2025-11-15",
    "actual": "2026-11-15"
  },
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea",
    "actual": ""
  },
  {
    "path": "data[0].items[0].loading",
    "expected": "23MT/40'FCL",
    "actual": "23 MT / 40' FCL"
  },
  {
    "path": "data[0].do_date",
    "expected": "2025-11-15",
    "actual": "2026-11-15"
  },
  {
    "path": "data[0].po_date",
    "expected": "2025-09-29",
    "actual": ""
  }
]
so_extraction 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json sonnet-4-6 4 7
[
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2025-11-15",
    "actual": "2026-11-15"
  },
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea",
    "actual": ""
  },
  {
    "path": "data[0].do_date",
    "expected": "2025-11-15",
    "actual": "2026-11-15"
  },
  {
    "path": "data[0].po_date",
    "expected": "2025-09-29",
    "actual": ""
  },
  {
    "path": "data[0].po_ref_no",
    "expected": "BP102-2025-1",
    "actual": ""
  }
]
so_extraction 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json sonnet-4-6 4 7
[
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2025-11-15",
    "actual": "2026-11-15"
  },
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea",
    "actual": ""
  },
  {
    "path": "data[0].do_date",
    "expected": "2025-11-15",
    "actual": "2026-11-15"
  },
  {
    "path": "data[0].po_date",
    "expected": "2025-09-29",
    "actual": ""
  },
  {
    "path": "data[0].po_ref_no",
    "expected": "BP102-2025-1",
    "actual": ""
  }
]
so_extraction 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json sonnet-4-5 4 5
[
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2025-11-15",
    "actual": "2026-11-15"
  },
  {
    "path": "data[0].do_date",
    "expected": "2025-11-15",
    "actual": "2026-11-15"
  },
  {
    "path": "data[0].po_date",
    "expected": "2025-09-29",
    "actual": ""
  },
  {
    "path": "data[0].po_ref_no",
    "expected": "BP102-2025-1",
    "actual": ""
  },
  {
    "path": "data[0].billing_address",
    "expected": "FeedBEST Company Limited, Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea",
    "actual": "Leonardo da Vinci"
  }
]
so_extraction 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json sonnet-4-5 4 5
[
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2025-11-15",
    "actual": "2026-11-15"
  },
  {
    "path": "data[0].do_date",
    "expected": "2025-11-15",
    "actual": "2026-11-15"
  },
  {
    "path": "data[0].po_date",
    "expected": "2025-09-29",
    "actual": ""
  },
  {
    "path": "data[0].po_ref_no",
    "expected": "BP102-2025-1",
    "actual": ""
  },
  {
    "path": "data[0].billing_address",
    "expected": "FeedBEST Company Limited, Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea",
    "actual": "Leonardo da Vinci, Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea"
  }
]
so_extraction 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json sonnet-4-6 4 4
[
  {
    "path": "data[0].items[0].unit_price",
    "expected": 3.25,
    "actual": 3250.0
  },
  {
    "path": "data[0].items[0].pricing_unit",
    "expected": "USD/KG",
    "actual": "USD/MT"
  },
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2027-03-01",
    "actual": "2026-03-01"
  },
  {
    "path": "data[0].do_date",
    "expected": "2027-03-01",
    "actual": "2026-03-01"
  }
]
so_extraction 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json sonnet-4-6 4 4
[
  {
    "path": "data[0].items[0].unit_price",
    "expected": 3.25,
    "actual": 3250.0
  },
  {
    "path": "data[0].items[0].pricing_unit",
    "expected": "USD/KG",
    "actual": "USD/MT"
  },
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2027-03-01",
    "actual": "2026-03-01"
  },
  {
    "path": "data[0].do_date",
    "expected": "2027-03-01",
    "actual": "2026-03-01"
  }
]
so_extraction 03__2026-01-30__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json opus-4-5 4 4
[
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2026-03-31",
    "actual": "2026-03-15"
  },
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Japan",
    "actual": ""
  },
  {
    "path": "data[0].items[0].total",
    "expected": null,
    "actual": 32025.0
  },
  {
    "path": "data[0].do_date",
    "expected": "2026-03-31",
    "actual": "2026-03-15"
  }
]
so_extraction 03__2026-01-30__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json opus-4-6 4 4
[
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2026-03-31",
    "actual": "2026-03-15"
  },
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Japan",
    "actual": ""
  },
  {
    "path": "data[0].items[0].total",
    "expected": null,
    "actual": 32025.0
  },
  {
    "path": "data[0].do_date",
    "expected": "2026-03-31",
    "actual": "2026-03-15"
  }
]
so_extraction 03__2026-01-30__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json opus-4-5 4 4
[
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2026-03-31",
    "actual": "2026-03-15"
  },
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Japan",
    "actual": ""
  },
  {
    "path": "data[0].items[0].total",
    "expected": null,
    "actual": 32025.0
  },
  {
    "path": "data[0].do_date",
    "expected": "2026-03-31",
    "actual": "2026-03-15"
  }
]
so_extraction 03__2026-01-30__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json opus-4-6 4 4
[
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2026-03-31",
    "actual": "2026-03-15"
  },
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Japan",
    "actual": ""
  },
  {
    "path": "data[0].items[0].total",
    "expected": null,
    "actual": 32025.0
  },
  {
    "path": "data[0].do_date",
    "expected": "2026-03-31",
    "actual": "2026-03-15"
  }
]
so_extraction 04__2026-01-29__120363408498669191_g_us__4b9c2faa-94dd-4236-abcc-398807051f21.json sonnet-4-5 4 4
[
  {
    "path": "data[0].items[0].unit_price",
    "expected": 3250.0,
    "actual": 3.25
  },
  {
    "path": "data[0].items[0].pricing_unit",
    "expected": "USD/MT",
    "actual": "USD/KG"
  },
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Jakarta",
    "actual": ""
  },
  {
    "path": "data[0].billing_address",
    "expected": "",
    "actual": "Leonardo da Vinci"
  }
]
so_extraction 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json opus-4-5 4 3
[
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2027-03-01",
    "actual": "2026-03-01"
  },
  {
    "path": "data[0].do_date",
    "expected": "2027-03-01",
    "actual": "2026-03-01"
  },
  {
    "path": "data[0].po_ref_no",
    "expected": "",
    "actual": "OPO 260012/EC"
  }
]
so_extraction 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json opus-4-5 4 3
[
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2027-03-01",
    "actual": "2026-03-31"
  },
  {
    "path": "data[0].do_date",
    "expected": "2027-03-01",
    "actual": "2026-03-31"
  },
  {
    "path": "data[0].po_ref_no",
    "expected": "",
    "actual": "OPO 260012/EC"
  }
]
so_extraction 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json sonnet-4-5 4 3
[
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2027-03-01",
    "actual": "2026-03-31"
  },
  {
    "path": "data[0].do_date",
    "expected": "2027-03-01",
    "actual": "2026-03-31"
  },
  {
    "path": "data[0].billing_address",
    "expected": "",
    "actual": "Leonardo da Vinci"
  }
]
so_extraction 03__2026-01-30__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json sonnet-4-5 4 3
[
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Japan",
    "actual": ""
  },
  {
    "path": "data[0].items[0].total",
    "expected": null,
    "actual": 32025.0
  },
  {
    "path": "data[0].billing_address",
    "expected": "",
    "actual": "Leonardo da Vinci"
  }
]
so_extraction 03__2026-01-30__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json sonnet-4-5 4 3
[
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Japan",
    "actual": ""
  },
  {
    "path": "data[0].items[0].total",
    "expected": null,
    "actual": 32025.0
  },
  {
    "path": "data[0].billing_address",
    "expected": "",
    "actual": "Leonardo da Vinci"
  }
]
so_extraction 04__2026-01-29__120363408498669191_g_us__4b9c2faa-94dd-4236-abcc-398807051f21.json sonnet-4-5 4 3
[
  {
    "path": "data[0].items[0].unit_price",
    "expected": 3250.0,
    "actual": 3.25
  },
  {
    "path": "data[0].items[0].pricing_unit",
    "expected": "USD/MT",
    "actual": "USD/KG"
  },
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Jakarta",
    "actual": ""
  }
]
so_extraction 04__2026-01-29__120363408498669191_g_us__4b9c2faa-94dd-4236-abcc-398807051f21.json opus-4-5 4 3
[
  {
    "path": "data[0].items[0].unit_price",
    "expected": 3250.0,
    "actual": 3.25
  },
  {
    "path": "data[0].items[0].pricing_unit",
    "expected": "USD/MT",
    "actual": "USD/KG"
  },
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Jakarta",
    "actual": ""
  }
]
so_extraction 04__2026-01-29__120363408498669191_g_us__4b9c2faa-94dd-4236-abcc-398807051f21.json opus-4-5 4 3
[
  {
    "path": "data[0].items[0].unit_price",
    "expected": 3250.0,
    "actual": 3.25
  },
  {
    "path": "data[0].items[0].pricing_unit",
    "expected": "USD/MT",
    "actual": "USD/KG"
  },
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Jakarta",
    "actual": ""
  }
]
so_extraction 04__2026-01-29__120363408498669191_g_us__4b9c2faa-94dd-4236-abcc-398807051f21.json opus-4-6 4 3
[
  {
    "path": "data[0].items[0].unit_price",
    "expected": 3250.0,
    "actual": 3.25
  },
  {
    "path": "data[0].items[0].pricing_unit",
    "expected": "USD/MT",
    "actual": "USD/KG"
  },
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Jakarta",
    "actual": ""
  }
]
so_extraction 04__2026-01-29__120363408498669191_g_us__4b9c2faa-94dd-4236-abcc-398807051f21.json opus-4-6 4 3
[
  {
    "path": "data[0].items[0].unit_price",
    "expected": 3250.0,
    "actual": 3.25
  },
  {
    "path": "data[0].items[0].pricing_unit",
    "expected": "USD/MT",
    "actual": "USD/KG"
  },
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Jakarta",
    "actual": ""
  }
]
so_extraction 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json sonnet-4-5 4 3
[
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Nhava Sheva",
    "actual": ""
  },
  {
    "path": "data[0].items[0].loading",
    "expected": "",
    "actual": "13MT/20'FCL"
  },
  {
    "path": "data[0].billing_address",
    "expected": "",
    "actual": "Leonardo da Vinci"
  }
]
so_extraction 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json opus-4-6 4 3
[
  {
    "path": "data[0].items[0].loading",
    "expected": "23MT/40'FCL",
    "actual": "23 MT / 40' FCL"
  },
  {
    "path": "data[0].vendor_name",
    "expected": "Van Beethoven",
    "actual": "AG Lipids Pte Ltd"
  },
  {
    "path": "data[0].payment_date",
    "expected": "",
    "actual": "Net 14 Days"
  }
]
so_extraction 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json opus-4-5 4 3
[
  {
    "path": "data[0].items[0].loading",
    "expected": "23MT/40'FCL",
    "actual": "23 MT / 40' FCL"
  },
  {
    "path": "data[0].vendor_name",
    "expected": "Van Beethoven",
    "actual": "AG Lipids Pte Ltd"
  },
  {
    "path": "data[0].payment_date",
    "expected": "",
    "actual": "Net 14 Days"
  }
]
so_extraction 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json opus-4-5 4 3
[
  {
    "path": "data[0].items[0].loading",
    "expected": "23MT/40'FCL",
    "actual": "23 MT / 40' FCL"
  },
  {
    "path": "data[0].vendor_name",
    "expected": "Van Beethoven",
    "actual": "AG Lipids Pte Ltd"
  },
  {
    "path": "data[0].payment_date",
    "expected": "",
    "actual": "Net 14 Days"
  }
]
so_extraction 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json opus-4-6 4 3
[
  {
    "path": "data[0].items[0].loading",
    "expected": "23MT/40'FCL",
    "actual": "23 MT / 40' FCL"
  },
  {
    "path": "data[0].vendor_name",
    "expected": "Van Beethoven",
    "actual": "AG Lipids Pte Ltd"
  },
  {
    "path": "data[0].payment_date",
    "expected": "",
    "actual": "Net 14 Days"
  }
]
so_extraction 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json opus-4-6 4 3
[
  {
    "path": "data[0].items[0].loading",
    "expected": "23MT/40'FCL",
    "actual": "23 MT / 40' FCL"
  },
  {
    "path": "data[0].vendor_name",
    "expected": "Van Beethoven",
    "actual": "AG Lipids Pte Ltd"
  },
  {
    "path": "data[0].payment_date",
    "expected": "",
    "actual": "Net 14 Days"
  }
]
so_extraction 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json opus-4-6 4 3
[
  {
    "path": "data[0].items[0].loading",
    "expected": "23MT/40'FCL",
    "actual": "23 MT / 40' FCL"
  },
  {
    "path": "data[0].vendor_name",
    "expected": "Van Beethoven",
    "actual": "AG Lipids Pte Ltd"
  },
  {
    "path": "data[0].payment_date",
    "expected": "",
    "actual": "Net 14 Days"
  }
]
so_extraction 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json sonnet-4-5 4 2
[
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2027-03-01",
    "actual": "2026-03-31"
  },
  {
    "path": "data[0].do_date",
    "expected": "2027-03-01",
    "actual": "2026-03-31"
  }
]
so_extraction 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json opus-4-6 4 2
[
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2027-03-01",
    "actual": "2026-03-31"
  },
  {
    "path": "data[0].do_date",
    "expected": "2027-03-01",
    "actual": "2026-03-31"
  }
]
so_extraction 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json opus-4-6 4 2
[
  {
    "path": "data[0].items[0].shipment_date",
    "expected": "2027-03-01",
    "actual": "2026-03-31"
  },
  {
    "path": "data[0].do_date",
    "expected": "2027-03-01",
    "actual": "2026-03-31"
  }
]
so_extraction 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json sonnet-4-6 4 2
[
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Nhava Sheva",
    "actual": ""
  },
  {
    "path": "data[0].billing_address",
    "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803",
    "actual": ""
  }
]
so_extraction 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json sonnet-4-6 4 2
[
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Nhava Sheva",
    "actual": ""
  },
  {
    "path": "data[0].billing_address",
    "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803",
    "actual": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist – Satara, Maharashtra – 412803"
  }
]
so_extraction 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json sonnet-4-5 4 2
[
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Nhava Sheva",
    "actual": ""
  },
  {
    "path": "data[0].billing_address",
    "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803",
    "actual": "Leonardo da Vinci"
  }
]
so_extraction 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json opus-4-6 4 2
[
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Nhava Sheva",
    "actual": ""
  },
  {
    "path": "data[0].billing_address",
    "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803",
    "actual": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist – Satara, Maharashtra – 412803"
  }
]
so_extraction 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json opus-4-5 4 2
[
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Nhava Sheva",
    "actual": ""
  },
  {
    "path": "data[0].billing_address",
    "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803",
    "actual": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist – Satara, Maharashtra – 412803"
  }
]
so_extraction 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json opus-4-6 4 2
[
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Nhava Sheva",
    "actual": ""
  },
  {
    "path": "data[0].billing_address",
    "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803",
    "actual": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist – Satara, Maharashtra – 412803"
  }
]
so_extraction 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json opus-4-5 4 2
[
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Nhava Sheva",
    "actual": ""
  },
  {
    "path": "data[0].billing_address",
    "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803",
    "actual": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist – Satara, Maharashtra – 412803"
  }
]
so_extraction 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json sonnet-4-5 4 2
[
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Nhava Sheva",
    "actual": ""
  },
  {
    "path": "data[0].billing_address",
    "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803",
    "actual": "Leonardo da Vinci"
  }
]
so_extraction 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json sonnet-4-5 4 2
[
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Nhava Sheva",
    "actual": ""
  },
  {
    "path": "data[0].items[0].loading",
    "expected": "",
    "actual": "13MT/20'FCL"
  }
]
so_extraction 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json opus-4-5 4 2
[
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Nhava Sheva",
    "actual": ""
  },
  {
    "path": "data[0].items[0].loading",
    "expected": "",
    "actual": "13MT/20'FCL"
  }
]
so_extraction 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json opus-4-5 4 2
[
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Nhava Sheva",
    "actual": ""
  },
  {
    "path": "data[0].items[0].loading",
    "expected": "",
    "actual": "13MT/20'FCL"
  }
]
so_extraction 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json sonnet-4-6 4 2
[
  {
    "path": "data[0].items[0].loading",
    "expected": "23MT/40'FCL",
    "actual": "23 MT / 40' FCL"
  },
  {
    "path": "data[0].payment_date",
    "expected": "",
    "actual": "Net 14 Days"
  }
]
so_extraction 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json opus-4-6 4 1
[
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Nhava Sheva",
    "actual": ""
  }
]
so_extraction 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json sonnet-4-6 4 1
[
  {
    "path": "data[0].items[0].loading",
    "expected": "",
    "actual": "13MT/20'FCL"
  }
]
so_extraction 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json sonnet-4-6 4 1
[
  {
    "path": "data[0].items[0].loading",
    "expected": "",
    "actual": "13MT/20'FCL"
  }
]
so_extraction 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json opus-4-6 4 1
[
  {
    "path": "data[0].items[0].shipping_address",
    "expected": "Nhava Sheva",
    "actual": ""
  }
]
so_extraction 06__2026-01-06__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json sonnet-4-6 4 1
[
  {
    "path": "data[0].items[0].total",
    "expected": 6300.0,
    "actual": null
  }
]
so_extraction 06__2026-01-06__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json sonnet-4-6 4 1
[
  {
    "path": "data[0].items[0].total",
    "expected": 6300.0,
    "actual": null
  }
]
so_extraction 07__2025-12-23__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json sonnet-4-5 4 1
[
  {
    "path": "data",
    "expected_len": 1,
    "actual_len": 0
  }
]
so_extraction 07__2025-12-23__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json sonnet-4-5 4 1
[
  {
    "path": "data",
    "expected_len": 1,
    "actual_len": 0
  }
]
so_extraction 07__2025-12-23__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json opus-4-6 4 1
[
  {
    "path": "data",
    "expected_len": 1,
    "actual_len": 0
  }
]
so_extraction 07__2025-12-23__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json opus-4-6 4 1
[
  {
    "path": "data",
    "expected_len": 1,
    "actual_len": 0
  }
]
so_extraction 07__2025-12-23__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json opus-4-5 4 1
[
  {
    "path": "data",
    "expected_len": 1,
    "actual_len": 0
  }
]
so_extraction 07__2025-12-23__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json opus-4-5 4 1
[
  {
    "path": "data",
    "expected_len": 1,
    "actual_len": 0
  }
]
so_extraction 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json sonnet-4-5 4 1
[
  {
    "path": "data[0].billing_address",
    "expected": "FeedBEST Company Limited, Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea",
    "actual": "Leonardo da Vinci"
  }
]
so_extraction 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json sonnet-4-5 4 1
[
  {
    "path": "data[0].billing_address",
    "expected": "FeedBEST Company Limited, Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea",
    "actual": "Leonardo da Vinci"
  }
]
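
Some of the samples above differ only in formatting or units, for example "23MT/40'FCL" versus "23 MT / 40' FCL", or 3250.0 USD/MT versus 3.25 USD/KG. The sketch below shows one way such cases could be separated from substantive errors. Both helper functions are hypothetical, and the conversion assumes MT means a metric ton of 1000 kg; whether the harness should count these as matches is a separate decision.

```python
import math
import re

KG_PER_MT = 1000.0  # assumption: MT is a metric ton


def same_ignoring_whitespace(expected: str, actual: str) -> bool:
    """Compare two strings after removing all whitespace."""
    def strip(s: str) -> str:
        return re.sub(r"\s+", "", s)
    return strip(expected) == strip(actual)


def price_per_kg(unit_price: float, pricing_unit: str) -> float:
    """Convert a price to USD/KG; only the two units seen in the samples are handled."""
    if pricing_unit == "USD/KG":
        return unit_price
    if pricing_unit == "USD/MT":
        return unit_price / KG_PER_MT
    raise ValueError(f"unhandled pricing unit: {pricing_unit}")


# Value pairs taken from the sample mismatches above.
print(same_ignoring_whitespace("23MT/40'FCL", "23 MT / 40' FCL"))
print(math.isclose(price_per_kg(3250.0, "USD/MT"), price_per_kg(3.25, "USD/KG")))
```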