Paid Terminal-Bench project
This project proposes a Terminal-Bench task focused on financial operations and spreadsheet automation rather than traditional software engineering. The agent is given a collection of Excel workbooks holding transaction records, account summaries, and reconciliation reports, into which multiple inconsistencies were introduced during data processing.
The objective is to identify and repair discrepancies between transaction-level data and account-level summaries using command-line tools and available spreadsheet-processing libraries. The agent must analyze workbook contents, detect missing or duplicated entries, correct formula errors, update derived calculations, and ensure that final balances match across all reports.
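As a rough illustration of the core reconciliation step, the sketch below flags duplicated transaction IDs and compares summed transaction amounts against reported account balances. It works on plain in-memory records. The field names (`txn_id`, `account`, `amount`) and both function names are hypothetical, and loading the rows from the actual workbooks (for example with a spreadsheet library) is left out.

```python
from collections import Counter, defaultdict
from decimal import Decimal


def find_duplicates(transactions):
    """Return the transaction IDs that appear more than once, sorted."""
    counts = Counter(t["txn_id"] for t in transactions)
    return sorted(txn_id for txn_id, n in counts.items() if n > 1)


def account_discrepancies(transactions, summaries):
    """Compare per-account transaction totals with reported balances.

    Returns {account: (computed, reported)} for accounts that disagree.
    Each transaction ID is counted once, so duplicates do not inflate totals.
    """
    totals = defaultdict(Decimal)  # Decimal() is zero, so missing keys start at 0
    seen = set()
    for t in transactions:
        if t["txn_id"] in seen:
            continue
        seen.add(t["txn_id"])
        # Amounts are parsed as Decimal to avoid binary floating-point drift.
        totals[t["account"]] += Decimal(t["amount"])

    mismatches = {}
    for account, reported in summaries.items():
        computed = totals.get(account, Decimal("0"))
        if computed != Decimal(reported):
            mismatches[account] = (computed, Decimal(reported))
    return mismatches
```

Amounts are kept as strings and parsed with `Decimal` so that currency totals compare exactly. A real task instance would also need rules for missing entries and formula repairs, which this sketch does not attempt.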
The task is designed to reflect realistic accounting and finance workflows that organizations routinely perform. Success is verified programmatically through independent tests that validate balance consistency, formula correctness, reconciliation accuracy, and preservation of non-target workbook content.
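One way a test might check preservation of non-target content is to snapshot cell values before and after the agent runs and report any change outside an allowed set. The sketch below assumes snapshots are dictionaries keyed by `(sheet, cell)` tuples. The function name, the snapshot format, and the idea of an explicit allow-list are illustrative assumptions, not the project's actual test design.

```python
def unexpected_changes(before, after, allowed):
    """Return the (sheet, cell) keys whose value changed outside `allowed`.

    `before` and `after` map (sheet, cell) -> value. A cell that was added
    or removed counts as changed, because .get() returns None for it.
    """
    changed = set()
    for key in before.keys() | after.keys():
        if before.get(key) != after.get(key) and key not in allowed:
            changed.add(key)
    return sorted(changed)
```

An empty result means every modification fell inside the cells the task expected the agent to repair.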
Unlike common coding benchmarks, this task emphasizes data integrity, spreadsheet manipulation, financial reasoning, and careful handling of structured business documents. The challenge requires the agent to inspect existing files, understand relationships between multiple worksheets, apply targeted corrections, and produce a fully reconciled set of outputs without modifying verification artifacts.
The environment is fully self-contained and offline, with all required datasets included inside the task package. Automated tests verify that reconciliation rules are satisfied while ensuring that agents cannot bypass validation through hardcoded outputs or direct modification of test files.
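The proposal does not say how modification of test files would be detected. One common approach, sketched here under that assumption, is to record SHA-256 digests of the verification artifacts before the agent runs and compare them afterwards. The function names and the manifest format are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under `root` (as a relative POSIX path) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def modified_files(root, manifest):
    """Return paths that were changed, added, or removed since `manifest`."""
    current = build_manifest(root)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

A harness could build the manifest at setup time, store it outside the agent's working directory, and fail the run if `modified_files` returns anything. Guarding against hardcoded outputs is a separate concern that this sketch does not address.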