Ten models processed prompts from the `downloaded` dataset in under 13 seconds per run on average, with field match rates ranging from 80% to 92%.
This benchmark evaluated the structured-extraction performance of the `so_extraction` agent on a single dataset of chats with expected (golden) field outputs.
The quality-speed frontier shows clear clustering: the fastest models (the sonnet and opus families) complete runs in under 7 seconds, while the top-accuracy models give up some speed for higher field match.
All ten models achieved a 100% success rate over 9 runs each, using a single explicit few-shot configuration of four example chats (the FS column).
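One common way to define a quality-speed frontier is as a Pareto frontier. A model stays on the frontier if no other model is at least as fast and at least as accurate while being strictly better on one of the two. The sketch below assumes per-model summaries of average seconds and field match. The `ModelSummary` type and `pareto_frontier` function are hypothetical, and the model names and numbers are placeholders, not results from this benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSummary:
    name: str
    avg_seconds: float  # average wall time per run; lower is better
    field_match: float  # fraction of matched fields in [0, 1]; higher is better


def pareto_frontier(models: list[ModelSummary]) -> list[ModelSummary]:
    """Return the models that no other model dominates on (speed, field match)."""
    frontier = []
    for m in models:
        dominated = any(
            o.avg_seconds <= m.avg_seconds
            and o.field_match >= m.field_match
            and (o.avg_seconds < m.avg_seconds or o.field_match > m.field_match)
            for o in models
        )
        if not dominated:
            frontier.append(m)
    # Fastest first, so the frontier reads left to right on a speed/quality plot.
    return sorted(frontier, key=lambda m: m.avg_seconds)


# Hypothetical placeholder numbers, not results from this run.
models = [
    ModelSummary("model_a", 6.0, 0.84),
    ModelSummary("model_b", 9.0, 0.92),
    ModelSummary("model_c", 12.0, 0.88),
    ModelSummary("model_d", 7.0, 0.80),
]
for m in pareto_frontier(models):
    print(m.name, m.avg_seconds, m.field_match)
```

Here `model_c` drops out because `model_b` is both faster and more accurate. `model_d` drops out because `model_a` beats it on both axes.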
| Model | FS | Runs | Avg elapsed (s) | Avg mismatches | Field match |
|---|---|---|---|---|---|
All runs used the same few-shot configuration (four explicit example chats), so this benchmark cannot show whether any configuration has a systematic advantage.
The reported mismatch standard deviation of 2.1 most likely reflects variation across models and chats, not **few-shot sensitivity**. Judging how much prompt engineering affects these models' extraction would require comparing more than one few-shot variant.
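Measuring few-shot sensitivity means comparing the same models across at least two few-shot variants. The sketch below assumes each run can be reduced to a `(variant_label, mismatch_count)` pair. The function name and the `zero_shot` label are hypothetical. The spread statistic (population standard deviation of per-variant mean mismatches) is one reasonable choice, not the report's definition.

```python
from collections import defaultdict
from statistics import mean, pstdev


def few_shot_sensitivity(runs):
    """Spread of mean mismatches across few-shot variants.

    runs: iterable of (variant_label, mismatch_count) pairs.
    Returns the population std dev of per-variant means, or None when
    fewer than two variants were run (sensitivity is then not measurable).
    """
    by_variant = defaultdict(list)
    for label, mismatches in runs:
        by_variant[label].append(mismatches)
    if len(by_variant) < 2:
        return None
    return pstdev([mean(counts) for counts in by_variant.values()])


# A single "explicit" variant, as in this run: sensitivity is undefined.
single = [("explicit", 3), ("explicit", 5), ("explicit", 2)]
print(few_shot_sensitivity(single))  # None

# Hypothetical two-variant sweep with placeholder numbers.
sweep = [("explicit", 3), ("explicit", 5), ("zero_shot", 6), ("zero_shot", 8)]
print(few_shot_sensitivity(sweep))  # 1.5
```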
Focus efforts on closing the field-match gap between the top (92%) and bottom (80%) performers while preserving the latency advantage of the fastest models.
```json
{
  "agent": "so_extraction",
  "pipeline": null,
  "models": [
    "sonnet-4-5",
    "sonnet-4-6",
    "opus-4-5",
    "opus-4-6",
    "openai:4.1",
    "openai:5.2",
    "openai:5-mini",
    "openai:5.4",
    "gemini:gemini-2.5-pro",
    "gemini:gemini-2.5-flash"
  ],
  "datasets": [
    "downloaded"
  ],
  "chat": null,
  "chats_glob": null,
  "bulk": false,
  "runs_per_chat": 1,
  "max_workers": 20,
  "few_shot_explicit": [
    "/Users/tripathipranav/Documents/code/harness_agents/raw_data/chats/multiple_product_multiple_shipment_medium.json",
    "/Users/tripathipranav/Documents/code/harness_agents/raw_data/chats/single_product_multiple_shipment_complex.json",
    "/Users/tripathipranav/Documents/code/harness_agents/raw_data/chats/single_product_single_shipment_medium.json",
    "/Users/tripathipranav/Documents/code/harness_agents/raw_data/chats/updates/update_change_payment_terms.json"
  ],
  "few_shot_walk": [],
  "few_shot_sweep": [],
  "few_shot_pool_argv": [],
  "few_shot_seed": 42,
  "db_few_shot_limit": 0,
  "skip_without_expected": true,
  "results_dir": "/Users/tripathipranav/Documents/code/harness_agents/results/20260518T192830Z",
  "config_file": "configs/agents.json",
  "few_shot_mode": "explicit",
  "few_shot_pool_size": 68,
  "few_shot_default_pool_size": 68,
  "few_shot_pool_override": null,
  "few_shot_variants": [
    {
      "label": "explicit",
      "count": 4,
      "paths": [
        "raw_data/chats/multiple_product_multiple_shipment_medium.json",
        "raw_data/chats/single_product_multiple_shipment_complex.json",
        "raw_data/chats/single_product_single_shipment_medium.json",
        "raw_data/chats/updates/update_change_payment_terms.json"
      ]
    }
  ],
  "allow_self_fewshot": false
}
```

- **SUCCESS RATE**: Share of runs that finished without a harness or HTTP error. High success means the run was stable; it does not prove the answers matched the reference.
- **AVG ELAPSED (S)**: Average wall time per run in that bucket. Useful for latency comparisons.
- **AVG MISMATCH / EXPECTED RUN**: Average count of fields that differed from the golden JSON when a reference existed. Lower is better.
- **FIELD MATCH**: Fraction of compared fields that matched the golden output across runs in that bucket. Higher is better.
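A short sketch can illustrate the mismatch and field-match definitions. It makes three assumptions. First, each golden JSON is flattened into dotted paths like `data[0].items[0].unit_price`, the format used in the table below. Second, leaves are compared with exact equality. Third, field match is pooled across all runs in a bucket. Whether the harness pools or averages per-run fractions is not stated. The helper names `flatten`, `compare` and `bucket_metrics` are hypothetical, not the harness's API.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts/lists into paths like 'data[0].items[0].unit_price'."""
    if isinstance(value, dict):
        out = {}
        for key, child in value.items():
            out.update(flatten(child, f"{prefix}.{key}" if prefix else key))
        return out
    if isinstance(value, list):
        out = {}
        for i, child in enumerate(value):
            out.update(flatten(child, f"{prefix}[{i}]"))
        return out
    return {prefix: value}  # leaf value (str, number, bool, None)


def compare(expected, actual):
    """Diff one run against its golden JSON.

    Returns (mismatches, compared_count). Only paths present in the golden
    output are compared, using exact equality, so "USD/kg" != "USD/KG".
    """
    exp, act = flatten(expected), flatten(actual)
    mismatches = [
        {"path": path, "expected": value, "actual": act.get(path)}
        for path, value in exp.items()
        if act.get(path) != value
    ]
    return mismatches, len(exp)


def bucket_metrics(results):
    """Aggregate (mismatch_count, compared_count) pairs for the runs in one bucket.

    Returns (average mismatches per run, pooled field-match fraction).
    """
    total_mismatches = sum(m for m, _ in results)
    total_compared = sum(c for _, c in results)
    return total_mismatches / len(results), 1 - total_mismatches / total_compared


# Hand-made example shaped like the mismatch rows below.
expected = {"data": [{"po_ref_no": "4520000944",
                      "items": [{"unit_price": 3.05, "pricing_unit": "USD/KG"}]}]}
actual = {"data": [{"po_ref_no": "",
                    "items": [{"unit_price": 3.05, "pricing_unit": "USD/kg"}]}]}

mismatches, compared = compare(expected, actual)
for m in mismatches:
    print(m)
print(bucket_metrics([(len(mismatches), compared), (0, 3)]))
```

Exact string equality counts case and spacing variants (such as `USD/kg` or `23 MT / 40’ FCL`) as mismatches. Many rows in the table below differ only in this way.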
| Agent | Chat | Model | FS | Mismatches | Sample |
|---|---|---|---|---|---|
| so_extraction | 03__2026-01-30__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json | openai:5.4 | 4 | 12 | `[{"path": "data[0].items[0].unit_price", "expected": 3.05, "actual": null}, {"path": "data[0].items[0].pricing_unit", "expected": "USD/KG", "actual": ""}, {"path": "data[0].items[0].ship_term", "expected": "EXW", "actual": ""}, {"path": "data[0].items[0].delivery_terms", "expected": "EXW", "actual": ""}, {"path": "data[0].items[0].shipment_date", "expected": "2026-03-31", "actual": ""}]` |
| so_extraction | 03__2026-01-30__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json | openai:5-mini | 4 | 8 | `[{"path": "data[0].items[0].delivery_terms", "expected": "EXW", "actual": "EXW Japan"}, {"path": "data[0].items[0].shipment_date", "expected": "2026-03-31", "actual": "2026-03-15"}, {"path": "data[0].items[0].total", "expected": null, "actual": 32025.0}, {"path": "data[0].do_date", "expected": "2026-03-31", "actual": "2026-03-15"}, {"path": "data[0].po_ref_no", "expected": "4520000944", "actual": ""}]` |
| so_extraction | 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | opus-4-5 | 4 | 8 | `[{"path": "data[0].items[0].shipment_date", "expected": "2025-11-15", "actual": "2026-11-15"}, {"path": "data[0].items[0].shipping_address", "expected": "Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea", "actual": ""}, {"path": "data[0].items[0].loading", "expected": "23MT/40'FCL", "actual": "23 MT / 40' FCL"}, {"path": "data[0].do_date", "expected": "2025-11-15", "actual": "2026-11-15"}, {"path": "data[0].po_date", "expected": "2025-09-29", "actual": ""}]` |
| so_extraction | 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | opus-4-5 | 4 | 8 | `[{"path": "data[0].items[0].shipment_date", "expected": "2025-11-15", "actual": "2026-11-15"}, {"path": "data[0].items[0].shipping_address", "expected": "Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea", "actual": ""}, {"path": "data[0].items[0].loading", "expected": "23MT/40'FCL", "actual": "23 MT / 40' FCL"}, {"path": "data[0].do_date", "expected": "2025-11-15", "actual": "2026-11-15"}, {"path": "data[0].po_date", "expected": "2025-09-29", "actual": ""}]` |
| so_extraction | 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | openai:5-mini | 4 | 7 | `[{"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD"}, {"path": "data[0].items[0].delivery_terms", "expected": "CIF Busan", "actual": ""}, {"path": "data[0].items[0].loading", "expected": "23MT/40'FCL", "actual": "23 MT / 40’ FCL"}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": "AG Lipids Pte Ltd"}, {"path": "data[0].payment_date", "expected": "", "actual": "Net 14 Days"}]` |
| so_extraction | 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | openai:5-mini | 4 | 7 | `[{"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD"}, {"path": "data[0].items[0].delivery_terms", "expected": "CIF Busan", "actual": "CIF"}, {"path": "data[0].items[0].loading", "expected": "23MT/40'FCL", "actual": "23 MT / 40’ FCL"}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": "AG Lipids Pte Ltd"}, {"path": "data[0].payment_date", "expected": "", "actual": "Net 14 Days"}]` |
| so_extraction | 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json | openai:5-mini | 4 | 6 | `[{"path": "data[0].items[0].pricing_unit", "expected": "USD/KG", "actual": "USD/kg"}, {"path": "data[0].items[0].delivery_terms", "expected": "EXW", "actual": "EXW cargo ready by March 2026"}, {"path": "data[0].items[0].shipment_date", "expected": "2027-03-01", "actual": ""}, {"path": "data[0].do_date", "expected": "2027-03-01", "actual": ""}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": ""}]` |
| so_extraction | 04__2026-01-29__120363408498669191_g_us__4b9c2faa-94dd-4236-abcc-398807051f21.json | openai:5.4 | 4 | 6 | `[{"path": "data[0].items[0].unit_price", "expected": 3250.0, "actual": 3.25}, {"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD/KG"}, {"path": "data[0].items[0].shipping_address", "expected": "Jakarta", "actual": ""}, {"path": "data[0].items[0].total", "expected": 29250.0, "actual": null}, {"path": "data[0].po_ref_no", "expected": "PO-IMP-BIB-2601-017", "actual": ""}]` |
| so_extraction | 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | openai:5.2 | 4 | 6 | `[{"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD"}, {"path": "data[0].items[0].packing", "expected": "25kg printed paper bag", "actual": "25kg printed paper bags"}, {"path": "data[0].items[0].loading", "expected": "23MT/40'FCL", "actual": "23 MT / 40’ FCL"}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": ""}, {"path": "data[0].billing_address", "expected": "FeedBEST Company Limited, Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea", "actual": ""}]` |
| so_extraction | 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json | openai:5.2 | 4 | 5 | `[{"path": "data[0].items[0].shipment_date", "expected": "2027-03-01", "actual": "2026-03-31"}, {"path": "data[0].items[0].total", "expected": 5850.0, "actual": 5.8500000000000005}, {"path": "data[0].do_date", "expected": "2027-03-01", "actual": "2026-03-31"}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": ""}, {"path": "data[0].shipping_method", "expected": "Collection Against OPO 260012/EC", "actual": ""}]` |
| so_extraction | 03__2026-01-30__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json | gemini:gemini-2.5-flash | 4 | 5 | `[{"path": "data[0].items[0].pricing_unit", "expected": "USD/KG", "actual": "USD/kg"}, {"path": "data[0].items[0].shipment_date", "expected": "2026-03-31", "actual": "2026-03-15"}, {"path": "data[0].items[0].shipping_address", "expected": "Japan", "actual": ""}, {"path": "data[0].items[0].total", "expected": null, "actual": 32025.0}, {"path": "data[0].do_date", "expected": "2026-03-31", "actual": "2026-03-15"}]` |
| so_extraction | 03__2026-01-30__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json | openai:5.2 | 4 | 5 | `[{"path": "data[0].items[0].shipment_date", "expected": "2026-03-31", "actual": "2026-03-15"}, {"path": "data[0].items[0].total", "expected": null, "actual": 32.025}, {"path": "data[0].do_date", "expected": "2026-03-31", "actual": "2026-03-15"}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": ""}, {"path": "data[0].shipping_method", "expected": "Collection", "actual": "Collection against PO 4520000944"}]` |
| so_extraction | 04__2026-01-29__120363408498669191_g_us__4b9c2faa-94dd-4236-abcc-398807051f21.json | openai:5-mini | 4 | 5 | `[{"path": "data[0].items[0].unit_price", "expected": 3250.0, "actual": 3.25}, {"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD/KG"}, {"path": "data[0].items[0].shipping_address", "expected": "Jakarta", "actual": ""}, {"path": "data[0].po_ref_no", "expected": "PO-IMP-BIB-2601-017", "actual": ""}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": ""}]` |
| so_extraction | 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | sonnet-4-6 | 4 | 5 | `[{"path": "data[0].items[0].shipping_address", "expected": "Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea", "actual": ""}, {"path": "data[0].po_date", "expected": "2025-09-29", "actual": ""}, {"path": "data[0].po_ref_no", "expected": "BP102-2025-1", "actual": ""}, {"path": "data[0].billing_address", "expected": "FeedBEST Company Limited, Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea", "actual": ""}, {"path": "data[0].shipping_address", "expected": "Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea", "actual": ""}]` |
| so_extraction | 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | sonnet-4-5 | 4 | 5 | `[{"path": "data[0].items[0].shipment_date", "expected": "2025-11-15", "actual": "2026-11-15"}, {"path": "data[0].do_date", "expected": "2025-11-15", "actual": "2026-11-15"}, {"path": "data[0].po_date", "expected": "2025-09-29", "actual": ""}, {"path": "data[0].po_ref_no", "expected": "BP102-2025-1", "actual": ""}, {"path": "data[0].billing_address", "expected": "FeedBEST Company Limited, Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea", "actual": "Leonardo da Vinci, Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea"}]` |
| so_extraction | 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | openai:5.4 | 4 | 5 | `[{"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD"}, {"path": "data[0].items[0].loading", "expected": "23MT/40'FCL", "actual": "23 MT / 40’ FCL"}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": "AG Lipids Pte Ltd"}, {"path": "data[0].billing_address", "expected": "FeedBEST Company Limited, Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea", "actual": ""}, {"path": "data[0].shipping_address", "expected": "Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea", "actual": ""}]` |
| so_extraction | 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | gemini:gemini-2.5-flash | 4 | 5 | `[{"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD"}, {"path": "data[0].items[0].loading", "expected": "23MT/40'FCL", "actual": "23 MT / 40\tFCL"}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": "AG Lipids Pte Ltd"}, {"path": "data[0].payment_date", "expected": "", "actual": "Net 14 Days"}, {"path": "data[0].delivery_terms", "expected": "CIF Busan", "actual": "CIF"}]` |
| so_extraction | 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | sonnet-4-5 | 4 | 5 | `[{"path": "data[0].items[0].shipment_date", "expected": "2025-11-15", "actual": "2026-11-15"}, {"path": "data[0].do_date", "expected": "2025-11-15", "actual": "2026-11-15"}, {"path": "data[0].po_date", "expected": "2025-09-29", "actual": ""}, {"path": "data[0].po_ref_no", "expected": "BP102-2025-1", "actual": ""}, {"path": "data[0].billing_address", "expected": "FeedBEST Company Limited, Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea", "actual": "Leonardo da Vinci, Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea"}]` |
| so_extraction | 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | openai:5.2 | 4 | 5 | `[{"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD"}, {"path": "data[0].items[0].loading", "expected": "23MT/40'FCL", "actual": "23 MT / 40’ FCL"}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": "AG Lipids Pte Ltd"}, {"path": "data[0].payment_date", "expected": "", "actual": "Net 14 Days"}, {"path": "data[0].shipping_method", "expected": "", "actual": "Unknown"}]` |
| so_extraction | 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json | sonnet-4-6 | 4 | 4 | `[{"path": "data[0].items[0].unit_price", "expected": 3.25, "actual": 3250.0}, {"path": "data[0].items[0].pricing_unit", "expected": "USD/KG", "actual": "USD/MT"}, {"path": "data[0].items[0].shipment_date", "expected": "2027-03-01", "actual": "2026-03-01"}, {"path": "data[0].do_date", "expected": "2027-03-01", "actual": "2026-03-01"}]` |
| so_extraction | 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json | openai:5.4 | 4 | 4 | `[{"path": "data[0].items[0].shipment_date", "expected": "2027-03-01", "actual": "2026-03-31"}, {"path": "data[0].items[0].total", "expected": 5850.0, "actual": null}, {"path": "data[0].do_date", "expected": "2027-03-01", "actual": "2026-03-31"}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": ""}]` |
| so_extraction | 03__2026-01-30__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json | opus-4-5 | 4 | 4 | `[{"path": "data[0].items[0].shipment_date", "expected": "2026-03-31", "actual": "2026-03-15"}, {"path": "data[0].items[0].shipping_address", "expected": "Japan", "actual": ""}, {"path": "data[0].items[0].total", "expected": null, "actual": 32025.0}, {"path": "data[0].do_date", "expected": "2026-03-31", "actual": "2026-03-15"}]` |
| so_extraction | 03__2026-01-30__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json | opus-4-6 | 4 | 4 | `[{"path": "data[0].items[0].shipment_date", "expected": "2026-03-31", "actual": "2026-03-15"}, {"path": "data[0].items[0].shipping_address", "expected": "Japan", "actual": ""}, {"path": "data[0].items[0].total", "expected": null, "actual": 32025.0}, {"path": "data[0].do_date", "expected": "2026-03-31", "actual": "2026-03-15"}]` |
| so_extraction | 04__2026-01-29__120363408498669191_g_us__4b9c2faa-94dd-4236-abcc-398807051f21.json | openai:5.2 | 4 | 4 | `[{"path": "data[0].items[0].unit_price", "expected": 3250.0, "actual": 3.25}, {"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD/KG"}, {"path": "data[0].items[0].shipping_address", "expected": "Jakarta", "actual": ""}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": ""}]` |
| so_extraction | 06__2026-01-06__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json | openai:5-mini | 4 | 4 | `[{"path": "data[0].items[0].pricing_unit", "expected": "USD/KG", "actual": "USD/kg"}, {"path": "data[0].items[0].delivery_terms", "expected": "EXW", "actual": ""}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": ""}, {"path": "data[0].delivery_terms", "expected": "EXW", "actual": ""}]` |
| so_extraction | 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | gemini:gemini-2.5-pro | 4 | 4 | `[{"path": "data[0].items[0].loading", "expected": "23MT/40'FCL", "actual": "23 MT / 40’ FCL"}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": "AG Lipids Pte Ltd"}, {"path": "data[0].payment_date", "expected": "", "actual": "Net 14 Days"}, {"path": "data[0].shipping_method", "expected": "", "actual": "Unknown"}]` |
| so_extraction | 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | openai:4.1 | 4 | 4 | `[{"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD"}, {"path": "data[0].items[0].loading", "expected": "23MT/40'FCL", "actual": "23 MT / 40’ FCL"}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": "AG Lipids Pte Ltd"}, {"path": "data[0].payment_date", "expected": "", "actual": "Net 14 Days"}]` |
| so_extraction | 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | openai:5.4 | 4 | 4 | `[{"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD"}, {"path": "data[0].items[0].loading", "expected": "23MT/40'FCL", "actual": "23 MT / 40’ FCL"}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": "AG Lipids Pte Ltd"}, {"path": "data[0].payment_date", "expected": "", "actual": "Net 14 Days"}]` |
| so_extraction | 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | gemini:gemini-2.5-flash | 4 | 4 | `[{"path": "data[0].items[0].loading", "expected": "23MT/40'FCL", "actual": "23 MT / 40\nThe Shipment Terms can have only these values \"EXW\", \"FOB\", \"CIF\", \"DDP\" (find approriate value for the Shipment Terms from chat messages). If not found it should be "}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": "AG Lipids Pte Ltd"}, {"path": "data[0].payment_date", "expected": "", "actual": "Net 14 Days"}, {"path": "data[0].delivery_terms", "expected": "CIF Busan", "actual": "CIF"}]` |
| so_extraction | 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json | sonnet-4-5 | 4 | 3 | `[{"path": "data[0].items[0].shipment_date", "expected": "2027-03-01", "actual": "2026-03-31"}, {"path": "data[0].do_date", "expected": "2027-03-01", "actual": "2026-03-31"}, {"path": "data[0].billing_address", "expected": "", "actual": "Leonardo da Vinci"}]` |
| so_extraction | 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json | opus-4-5 | 4 | 3 | `[{"path": "data[0].items[0].shipment_date", "expected": "2027-03-01", "actual": "2026-03-31"}, {"path": "data[0].do_date", "expected": "2027-03-01", "actual": "2026-03-31"}, {"path": "data[0].po_ref_no", "expected": "", "actual": "OPO 260012/EC"}]` |
| so_extraction | 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json | gemini:gemini-2.5-pro | 4 | 3 | `[{"path": "data[0].items[0].shipment_date", "expected": "2027-03-01", "actual": "2026-03-31"}, {"path": "data[0].do_date", "expected": "2027-03-01", "actual": "2026-03-31"}, {"path": "data[0].po_ref_no", "expected": "", "actual": "OPO 260012/EC"}]` |
| so_extraction | 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | gemini:gemini-2.5-pro | 4 | 3 | `[{"path": "data[0].items[0].shipping_address", "expected": "Nhava Sheva", "actual": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist – Satara, Maharashtra – 412803"}, {"path": "data[0].billing_address", "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803", "actual": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist – Satara, Maharashtra – 412803"}, {"path": "data[0].shipping_address", "expected": "", "actual": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist – Satara, Maharashtra – 412803"}]` |
| so_extraction | 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | openai:5.4 | 4 | 3 | `[{"path": "data[0].items[0].shipping_address", "expected": "Nhava Sheva", "actual": ""}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": ""}, {"path": "data[0].billing_address", "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803", "actual": ""}]` |
| so_extraction | 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | openai:5.2 | 4 | 3 | `[{"path": "data[0].items[0].shipping_address", "expected": "Nhava Sheva", "actual": ""}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": ""}, {"path": "data[0].billing_address", "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803", "actual": ""}]` |
| so_extraction | 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | openai:5-mini | 4 | 3 | `[{"path": "data[0].items[0].shipping_address", "expected": "Nhava Sheva", "actual": ""}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": ""}, {"path": "data[0].billing_address", "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803", "actual": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist \t6 Satara, Maharashtra \t6 412803"}]` |
| so_extraction | 03__2026-01-30__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json | sonnet-4-5 | 4 | 3 | `[{"path": "data[0].items[0].shipping_address", "expected": "Japan", "actual": ""}, {"path": "data[0].items[0].total", "expected": null, "actual": 32025.0}, {"path": "data[0].billing_address", "expected": "", "actual": "Leonardo da Vinci"}]` |
| so_extraction | 03__2026-01-30__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json | openai:4.1 | 4 | 3 | `[{"path": "data[0].items[0].shipping_address", "expected": "Japan", "actual": ""}, {"path": "data[0].items[0].total", "expected": null, "actual": 32025.0}, {"path": "data[0].shipping_method", "expected": "Collection", "actual": ""}]` |
| so_extraction | 04__2026-01-29__120363408498669191_g_us__4b9c2faa-94dd-4236-abcc-398807051f21.json | sonnet-4-5 | 4 | 3 | `[{"path": "data[0].items[0].unit_price", "expected": 3250.0, "actual": 3.25}, {"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD/KG"}, {"path": "data[0].items[0].shipping_address", "expected": "Jakarta", "actual": ""}]` |
| so_extraction | 04__2026-01-29__120363408498669191_g_us__4b9c2faa-94dd-4236-abcc-398807051f21.json | opus-4-6 | 4 | 3 | `[{"path": "data[0].items[0].unit_price", "expected": 3250.0, "actual": 3.25}, {"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD/KG"}, {"path": "data[0].items[0].shipping_address", "expected": "Jakarta", "actual": ""}]` |
| so_extraction | 04__2026-01-29__120363408498669191_g_us__4b9c2faa-94dd-4236-abcc-398807051f21.json | opus-4-5 | 4 | 3 | `[{"path": "data[0].items[0].unit_price", "expected": 3250.0, "actual": 3.25}, {"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD/KG"}, {"path": "data[0].items[0].shipping_address", "expected": "Jakarta", "actual": ""}]` |
| so_extraction | 04__2026-01-29__120363408498669191_g_us__4b9c2faa-94dd-4236-abcc-398807051f21.json | gemini:gemini-2.5-flash | 4 | 3 | `[{"path": "data[0].items[0].unit_price", "expected": 3250.0, "actual": 3.25}, {"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD/KG"}, {"path": "data[0].delivery_terms", "expected": "CIF Jakarta", "actual": "CIF"}]` |
| so_extraction | 04__2026-01-29__120363408498669191_g_us__4b9c2faa-94dd-4236-abcc-398807051f21.json | openai:4.1 | 4 | 3 | `[{"path": "data[0].items[0].unit_price", "expected": 3250.0, "actual": 3.25}, {"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD/KG"}, {"path": "data[0].items[0].shipping_address", "expected": "Jakarta", "actual": ""}]` |
| so_extraction | 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | sonnet-4-5 | 4 | 3 | `[{"path": "data[0].items[0].shipping_address", "expected": "Nhava Sheva", "actual": ""}, {"path": "data[0].items[0].loading", "expected": "", "actual": "13MT/20'FCL"}, {"path": "data[0].billing_address", "expected": "", "actual": "Leonardo da Vinci, "}]` |
| so_extraction | 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | openai:5-mini | 4 | 3 | `[{"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": ""}, {"path": "data[0].items[0].shipping_address", "expected": "Nhava Sheva", "actual": ""}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": ""}]` |
| so_extraction | 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | openai:4.1 | 4 | 3 | `[{"path": "data[0].items[0].loading", "expected": "23MT/40'FCL", "actual": "23 MT / 40’ FCL"}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": "AG Lipids Pte Ltd"}, {"path": "data[0].payment_date", "expected": "", "actual": "Net 14 Days"}]` |
| so_extraction | 08__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | opus-4-6 | 4 | 3 | `[{"path": "data[0].items[0].loading", "expected": "23MT/40'FCL", "actual": "23 MT / 40' FCL"}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": "AG Lipids Pte Ltd"}, {"path": "data[0].payment_date", "expected": "", "actual": "Net 14 Days"}]` |
| so_extraction | 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | sonnet-4-6 | 4 | 3 | `[{"path": "data[0].items[0].shipping_address", "expected": "Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea", "actual": ""}, {"path": "data[0].billing_address", "expected": "FeedBEST Company Limited, Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea", "actual": ""}, {"path": "data[0].shipping_address", "expected": "Factory 354-58 Mojeon-1 Sobuk-gu Republic of Korea", "actual": ""}]` |
| so_extraction | 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | opus-4-6 | 4 | 3 | `[{"path": "data[0].items[0].loading", "expected": "23MT/40'FCL", "actual": "23 MT / 40' FCL"}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": "AG Lipids Pte Ltd"}, {"path": "data[0].payment_date", "expected": "", "actual": "Net 14 Days"}]` |
| so_extraction | 09__2025-09-29__120363403592950429_g_us__d586d853-694c-42f9-93be-bc7ba5b2110c.json | gemini:gemini-2.5-pro | 4 | 3 | `[{"path": "data[0].items[0].loading", "expected": "23MT/40'FCL", "actual": "23 MT / 40’ FCL"}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": "AG Lipids Pte Ltd"}, {"path": "data[0].payment_date", "expected": "", "actual": "Net 14 Days"}]` |
| so_extraction | 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json | opus-4-6 | 4 | 2 | `[{"path": "data[0].items[0].shipment_date", "expected": "2027-03-01", "actual": "2026-03-31"}, {"path": "data[0].do_date", "expected": "2027-03-01", "actual": "2026-03-31"}]` |
| so_extraction | 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json | gemini:gemini-2.5-flash | 4 | 2 | `[{"path": "data[0].items[0].shipment_date", "expected": "2027-03-01", "actual": ""}, {"path": "data[0].do_date", "expected": "2027-03-01", "actual": ""}]` |
| so_extraction | 01__2026-02-24__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json | openai:4.1 | 4 | 2 | `[{"path": "data[0].items[0].shipment_date", "expected": "2027-03-01", "actual": "2026-03-31"}, {"path": "data[0].do_date", "expected": "2027-03-01", "actual": "2026-03-31"}]` |
| so_extraction | 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | sonnet-4-6 | 4 | 2 | `[{"path": "data[0].items[0].shipping_address", "expected": "Nhava Sheva", "actual": ""}, {"path": "data[0].billing_address", "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803", "actual": ""}]` |
| so_extraction | 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | sonnet-4-5 | 4 | 2 | `[{"path": "data[0].items[0].shipping_address", "expected": "Nhava Sheva", "actual": ""}, {"path": "data[0].billing_address", "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803", "actual": "Leonardo da Vinci"}]` |
| so_extraction | 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | opus-4-5 | 4 | 2 | `[{"path": "data[0].items[0].shipping_address", "expected": "Nhava Sheva", "actual": ""}, {"path": "data[0].billing_address", "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803", "actual": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist – Satara, Maharashtra – 412803"}]` |
| so_extraction | 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | opus-4-6 | 4 | 2 | `[{"path": "data[0].items[0].shipping_address", "expected": "Nhava Sheva", "actual": ""}, {"path": "data[0].billing_address", "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803", "actual": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist – Satara, Maharashtra – 412803"}]` |
| so_extraction | 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | gemini:gemini-2.5-flash | 4 | 2 | `[{"path": "data[0].delivery_terms", "expected": "CIF Nhava Sheva", "actual": "CIF"}, {"path": "data[0].billing_address", "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803", "actual": ""}]` |
| so_extraction | 02__2026-02-09__120363426578757754_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | openai:4.1 | 4 | 2 | `[{"path": "data[0].items[0].shipping_address", "expected": "Nhava Sheva", "actual": ""}, {"path": "data[0].billing_address", "expected": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist Satara, Maharashtra - 412803", "actual": "GIIAVA (India) Pvt. Ltd., Plot No. C3, Wai MIDC, Dist – Satara, Maharashtra – 412803"}]` |
| so_extraction | 03__2026-01-30__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json | sonnet-4-6 | 4 | 2 | `[{"path": "data[0].items[0].shipment_date", "expected": "2026-03-31", "actual": "2026-03-15"}, {"path": "data[0].do_date", "expected": "2026-03-31", "actual": "2026-03-15"}]` |
| so_extraction | 04__2026-01-29__120363408498669191_g_us__4b9c2faa-94dd-4236-abcc-398807051f21.json | gemini:gemini-2.5-pro | 4 | 2 | `[{"path": "data[0].items[0].unit_price", "expected": 3250.0, "actual": 3.25}, {"path": "data[0].items[0].pricing_unit", "expected": "USD/MT", "actual": "USD/KG"}]` |
| so_extraction | 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | opus-4-5 | 4 | 2 | `[{"path": "data[0].items[0].shipping_address", "expected": "Nhava Sheva", "actual": ""}, {"path": "data[0].items[0].loading", "expected": "", "actual": "13MT/20'FCL"}]` |
| so_extraction | 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | openai:5.4 | 4 | 2 | `[{"path": "data[0].items[0].shipping_address", "expected": "Nhava Sheva", "actual": ""}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": ""}]` |
| so_extraction | 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | openai:5.2 | 4 | 2 | `[{"path": "data[0].items[0].shipping_address", "expected": "Nhava Sheva", "actual": ""}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": ""}]` |
| so_extraction | 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | gemini:gemini-2.5-flash | 4 | 2 | `[{"path": "data[0].delivery_terms", "expected": "CIF Nhava Sheva", "actual": "CIF"}, {"path": "data[0].billing_address", "expected": "", "actual": "Leonardo da Vinci"}]` |
| so_extraction | 06__2026-01-06__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json | openai:5.4 | 4 | 2 | `[{"path": "data[0].items[0].total", "expected": 6300.0, "actual": null}, {"path": "data[0].vendor_name", "expected": "Van Beethoven", "actual": ""}]` |
| so_extraction | 06__2026-01-06__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json | sonnet-4-6 | 4 | 2 | `[{"path": "data[0].items[0].unit_price", "expected": 3.5, "actual": 3500.0}, {"path": "data[0].items[0].pricing_unit", "expected": "USD/KG", "actual": "USD/MT"}]` |
| so_extraction | 06__2026-01-06__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json | openai:5.2 | 4 | 2 | [
{
"path": "data[0].items[0].total",
"expected": 6300.0,
"actual": 6.3
},
{
"path": "data[0].vendor_name",
"expected": "Van Beethoven",
"actual": ""
}
] |
| so_extraction | 03__2026-01-30__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json | gemini:gemini-2.5-pro | 4 | 1 | [
{
"path": "data[0].items[0].total",
"expected": null,
"actual": 32025.0
}
] |
| so_extraction | 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | sonnet-4-6 | 4 | 1 | [
{
"path": "data[0].items[0].loading",
"expected": "",
"actual": "13MT/20'FCL"
}
] |
| so_extraction | 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | openai:4.1 | 4 | 1 | [
{
"path": "data[0].items[0].shipping_address",
"expected": "Nhava Sheva",
"actual": ""
}
] |
| so_extraction | 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | opus-4-6 | 4 | 1 | [
{
"path": "data[0].items[0].shipping_address",
"expected": "Nhava Sheva",
"actual": ""
}
] |
| so_extraction | 05__2026-01-20__120363407382355715_g_us__12a4f3a7-d506-4d32-ae06-3f76508c6abd.json | gemini:gemini-2.5-pro | 4 | 1 | [
{
"path": "data[0].items[0].loading",
"expected": "",
"actual": "13MT/20'FCL"
}
] |
| so_extraction | 06__2026-01-06__120363421131250401_g_us__e05574ec-b110-4554-9fc3-3abb4f9011a8.json | gemini:gemini-2.5-flash | 4 | 1 | [
{
"path": "data[0].items[0].pricing_unit",
"expected": "USD/KG",
"actual": "USD/kg"
}
] |
| so_extraction | 07__2025-12-23__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json | openai:4.1 | 4 | 1 | [
{
"path": "data[0].vendor_name",
"expected": "Van Beethoven",
"actual": ""
}
] |
| so_extraction | 07__2025-12-23__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json | sonnet-4-5 | 4 | 1 | [
{
"path": "data",
"expected_len": 1,
"actual_len": 0
}
] |
| so_extraction | 07__2025-12-23__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json | opus-4-5 | 4 | 1 | [
{
"path": "data",
"expected_len": 1,
"actual_len": 0
}
] |
| so_extraction | 07__2025-12-23__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json | openai:5.4 | 4 | 1 | [
{
"path": "data[0].vendor_name",
"expected": "Van Beethoven",
"actual": ""
}
] |
| so_extraction | 07__2025-12-23__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json | opus-4-6 | 4 | 1 | [
{
"path": "data",
"expected_len": 1,
"actual_len": 0
}
] |
| so_extraction | 07__2025-12-23__120363403074656566_g_us__8f477a8f-2a60-4e0a-bf0e-8cc3cdf1dc9f.json | openai:5.2 | 4 | 1 | [
{
"path": "data[0].vendor_name",
"expected": "Van Beethoven",
"actual": ""
}
] |