Competition Learnings Applied
This product is built on concrete learnings from NM i AI 2026 — Norway's National AI Championship. We built a working Tripletex agent (ranked 33rd of 369) and studied 22 competing teams' implementations. Every architectural decision traces back to evidence.
Architecture Decisions From Competition
1. LLM Agent with Function-Calling > Hardcoded Handlers
Evidence: 8 of 12 teams with working code used LLM agent loops. KreativKI (664 lines, Gemini function-calling) outperformed our 3800-line handler approach.
Applied: The Accounting API uses function-calling agent loop. The LLM decides which Tripletex/Fiken API calls to make, sees errors, and recovers. No hardcoded handler per task type.
Impact: ~800 lines instead of ~3800. Automatically handles new task types without code changes.
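A function-calling agent loop of this shape can be sketched as follows. The model interface, tool names, and message format below are illustrative stand-ins, not the actual Accounting API surface:

```python
# Minimal function-calling agent loop (sketch). The model object and the
# tool names here are hypothetical stand-ins for the real provider tools.
import json

def run_agent(model, tools, task, max_turns=10):
    """Let the LLM pick tools until it signals completion."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = model(messages)            # returns a tool call or final text
        if reply["type"] == "final":
            return reply["content"]
        tool = tools[reply["name"]]
        try:
            result = tool(**reply["args"])
        except Exception as exc:           # feed errors back so the LLM can recover
            result = {"error": str(exc)}
        messages.append({"role": "tool", "name": reply["name"],
                         "content": json.dumps(result)})
    raise RuntimeError("agent did not finish within max_turns")

# Stub model that calls one tool, then finishes — just to show the flow.
def stub_model(messages):
    if any(m["role"] == "tool" for m in messages):
        return {"type": "final", "content": "done"}
    return {"type": "call", "name": "create_invoice", "args": {"amount": 100}}

result = run_agent(stub_model, {"create_invoice": lambda amount: {"id": 1}},
                   "book invoice")
```

The key property is that tool errors go back into the conversation instead of aborting the task, which is what makes new task types work without per-type handlers.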
2. Conversation Layer Separate From Execution
Evidence: NikolaiPoverud (68.98, highest documented score) used pre-fetch + agent loop. hansjto used a two-pass approach (Opus executes, Sonnet verifies). Oddonline used a "think" tool before executing.
Applied: OpenClaw handles conversation (multi-turn, context, clarification). Accounting API handles execution (stateless, one task). Clean separation.
3. Pre-Fetch Common Data During LLM Call
Evidence: We implemented this during the competition — saved 3-5 seconds per complex task. NikolaiPoverud used ThreadPoolExecutor with 10 workers.
Applied: On each request, the Accounting API pre-fetches accounts, departments, VAT types, divisions in parallel while the LLM processes. Results cached per request.
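The parallel pre-fetch can be sketched with `ThreadPoolExecutor`; the fetcher lambdas below are stubs standing in for real Tripletex/Fiken lookups:

```python
# Parallel pre-fetch sketch. The fetch functions are stand-ins for real
# Tripletex/Fiken lookups (accounts, departments, VAT types, divisions).
from concurrent.futures import ThreadPoolExecutor

def prefetch(fetchers):
    """Run all lookups in parallel; return a name -> result dict."""
    with ThreadPoolExecutor(max_workers=10) as pool:
        futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
        return {name: fut.result() for name, fut in futures.items()}

cache = prefetch({
    "accounts":    lambda: ["3000", "5000"],   # stub data
    "departments": lambda: ["D1"],
    "vat_types":   lambda: {"25% output": 3},
})
```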
4. Auto-Fix API Errors Between Turns
Evidence: NikolaiPoverud auto-fixed 422 errors by removing invalid fields and retrying. alexrajo used OpenAPI preflight validation to prevent errors. Oddonline blocked endpoints after 3 failures.
Applied: The Accounting API's agent loop includes error recovery: strip invalid fields, retry with corrections, block repeated failures. The LLM also sees the error and adjusts.
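The strip-and-retry idea can be sketched like this. The error-message format and the fake endpoint are assumptions for illustration, not the actual Tripletex response shape:

```python
# Error-recovery sketch: on a 422 naming an invalid field, strip it and retry.
# The error format here is an assumption, not the real Tripletex response.
import re

def post_with_autofix(post, payload, max_retries=3):
    for _ in range(max_retries):
        status, body = post(payload)
        if status < 400:
            return body
        match = re.search(r"field '(\w+)'", body)
        if status == 422 and match and match.group(1) in payload:
            payload = {k: v for k, v in payload.items() if k != match.group(1)}
            continue                      # retry without the offending field
        raise RuntimeError(f"unrecoverable error {status}: {body}")
    raise RuntimeError("retries exhausted")

# Fake endpoint that rejects voucherDate, then accepts the cleaned payload.
def fake_post(payload):
    if "voucherDate" in payload:
        return 422, "field 'voucherDate' does not exist"
    return 200, {"id": 42}

result = post_with_autofix(fake_post, {"amount": 100, "voucherDate": "2026-01-01"})
```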
Tripletex API Learnings
Supplier Invoice — What Actually Works
Our experience: amountCurrency on POST body → 500 from proxy. voucherDate → 422.
hansjto's approach: Use /incomingInvoice?sendTo=ledger instead of /supplierInvoice.
NikolaiPoverud's approach: 2-posting with vatType (net on amount, gross on amountGross). Auto-fix net/gross math errors.
Applied: The TripletexProvider tries multiple approaches:
1. /incomingInvoice?sendTo=ledger (hansjto's method)
2. /supplierInvoice with inline voucher postings
3. Raw voucher fallback
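The multi-approach fallback above amounts to a simple chain: try each strategy in order, return the first success, and surface all errors if everything fails. The strategy functions here are placeholders for the three approaches:

```python
# Fallback-chain sketch: try each booking strategy in order until one works.
# The strategy functions are placeholders, not the real provider methods.
def book_supplier_invoice(invoice, strategies):
    errors = []
    for name, strategy in strategies:
        try:
            return name, strategy(invoice)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all strategies failed: " + "; ".join(errors))

def incoming_invoice(inv):           # stub: pretend the proxy rejects this
    raise RuntimeError("500 from proxy")

def supplier_invoice_inline(inv):    # stub: this one succeeds
    return {"voucherId": 7}

used, result = book_supplier_invoice(
    {"amount": 100},
    [("incomingInvoice", incoming_invoice),
     ("supplierInvoice", supplier_invoice_inline)],
)
```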
Payroll — Skip the Salary API
Our experience: /salary/transaction always fails with "Arbeidsforholdet ikke knyttet mot virksomhet" (employment not linked to division).
NikolaiPoverud (68.98): System prompt says "STOP after voucher. Do NOT try /salary/transaction — it always fails." Uses manual voucher: debit 5000/5001, credit 2930.
Applied: TripletexProvider uses manual voucher for payroll. More reliable than the salary API.
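A manual payroll voucher in the debit-5000 / credit-2930 pattern can be sketched as below. Field names are simplified for illustration and are not the exact Tripletex schema; a real voucher would also carry withholding tax and employer contributions:

```python
# Manual payroll voucher sketch: debit the salary expense account (5000),
# credit salary payable (2930). Simplified fields, not the real schema.
def payroll_voucher(gross_salary, date):
    return {
        "date": date,
        "postings": [
            {"account": "5000", "amount": gross_salary},    # debit: salary expense
            {"account": "2930", "amount": -gross_salary},   # credit: salary payable
        ],
    }

voucher = payroll_voucher(50000, "2026-01-31")
```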
Bank Reconciliation — No Reconciliation Endpoint
Our experience: We built POST /bank/reconciliation + PUT /bank/reconciliation/match/:suggest. Scored 2/10.
All other teams: None use the reconciliation endpoint. They parse CSV and register individual payments.
Applied: Parse bank statement, match transactions to invoices, register payments individually.
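The match-and-register step can be sketched as an exact-amount matcher; real matching would also use references and fuzzy dates:

```python
# Matching sketch: pair bank statement lines with open invoices by amount.
# A production matcher would also use KID references and date tolerance.
def match_payments(statement_lines, open_invoices):
    matches, unmatched = [], []
    remaining = list(open_invoices)
    for line in statement_lines:
        hit = next((inv for inv in remaining if inv["amount"] == line["amount"]), None)
        if hit:
            remaining.remove(hit)        # each invoice matches at most once
            matches.append((line, hit))
        else:
            unmatched.append(line)
    return matches, unmatched

matches, unmatched = match_payments(
    [{"amount": 1250, "text": "KID 001"}, {"amount": 99, "text": "fee"}],
    [{"id": 1, "amount": 1250}],
)
```

Matched pairs would then be registered as individual payments; unmatched lines go to review.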
Year-End Closing — Separate Vouchers Per Asset
Our experience: One combined depreciation voucher → scored 4.5/10. Separate per asset → improvement.
hansjto: GET /balanceSheet BEFORE postings. Calculate tax from actual balances + new amounts.
Applied: One voucher per depreciation asset. Read actual P&L from ledger for tax calculation.
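Generating one voucher per asset is a simple loop; the account numbers here (6000 expense, 1209 accumulated depreciation) are illustrative, not confirmed mappings:

```python
# One-voucher-per-asset sketch: debit depreciation expense, credit
# accumulated depreciation. Account numbers are illustrative.
def depreciation_vouchers(assets, date):
    return [
        {
            "date": date,
            "description": f"Depreciation: {a['name']}",
            "postings": [
                {"account": "6000", "amount": a["depreciation"]},
                {"account": "1209", "amount": -a["depreciation"]},
            ],
        }
        for a in assets
    ]

vouchers = depreciation_vouchers(
    [{"name": "Server", "depreciation": 2000},
     {"name": "Van", "depreciation": 5000}],
    "2026-12-31",
)
```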
Travel Expense Per Diem
NikolaiPoverud: Does NOT set rateCategory. Just sets rate, count, location.
hansjto: Key trick: rateType.id = rateCategory.id (same value). count = overnight stays, NOT days.
Applied: Per diem with explicit rate from prompt. Count = overnight stays.
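The per diem payload, combining both tricks (mirrored rate IDs, count as overnight stays), might look like the sketch below. Field names follow the descriptions above but are illustrative, not the verified schema:

```python
# Per diem sketch: rateType.id mirrors rateCategory.id (hansjto's trick),
# and count is overnight stays, not days. Field names are illustrative.
def per_diem(rate_category_id, rate, overnight_stays, location):
    return {
        "rateType": {"id": rate_category_id},      # same id as rateCategory
        "rateCategory": {"id": rate_category_id},
        "rate": rate,                              # explicit rate from prompt
        "count": overnight_stays,                  # overnight stays, NOT days
        "location": location,
    }

entry = per_diem(rate_category_id=12, rate=940, overnight_stays=2, location="Oslo")
```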
Stable IDs Across Sandboxes
Confirmed by natti1399 and Oddonline: VAT type IDs are stable:
| VAT | ID |
|---|---|
| 25% output | 3 |
| 15% output | 31 |
| 12% output | 33 |
| 0% output | 5 |
| 25% input | 1 |
Applied: Hardcoded in TripletexProvider. Saves API lookup per request.
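The static config is just a lookup table mirroring the IDs above:

```python
# Static VAT config sketch mirroring the table above — IDs confirmed stable
# across sandboxes, so no API lookup is needed per request.
VAT_TYPE_IDS = {
    ("output", 25): 3,
    ("output", 15): 31,
    ("output", 12): 33,
    ("output", 0): 5,
    ("input", 25): 1,
}

def vat_type_id(direction, percent):
    return VAT_TYPE_IDS[(direction, percent)]
```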
Fields That Break Things
From our testing and other teams:
| Field | Endpoint | Problem |
|---|---|---|
| amountCurrency | POST /supplierInvoice | 500 from proxy |
| voucherDate | POST /supplierInvoice | 422 "field doesn't exist" |
| isCustomer | POST /customer | readOnly |
| isSupplier | POST /supplier | readOnly |
| amount | POST /supplierInvoice | readOnly |
| departmentNumber | various | Often causes validation errors |
Applied: Provider strips known-bad fields before sending. OpenAPI preflight validation (from alexrajo's approach).
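The stripping step can be sketched as a blocklist filter. The blocklist below comes from the table; a fuller implementation would derive readOnly fields from the OpenAPI schema (alexrajo's approach) rather than hardcoding them:

```python
# Preflight sketch: strip readOnly / known-bad fields before sending.
# Blocklist taken from the table above; a full version would parse the
# OpenAPI schema for readOnly fields instead of hardcoding.
KNOWN_BAD = {
    "POST /supplierInvoice": {"amountCurrency", "voucherDate", "amount"},
    "POST /customer": {"isCustomer"},
    "POST /supplier": {"isSupplier"},
}

def preflight(endpoint, payload):
    bad = KNOWN_BAD.get(endpoint, set())
    return {k: v for k, v in payload.items() if k not in bad}

clean = preflight("POST /supplierInvoice", {"amount": 100, "supplier": {"id": 3}})
```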
LLM Strategy Learnings
Model Choice
| Model | Teams Using | Strengths |
|---|---|---|
| Gemini 2.5 Flash | KreativKI, alexrajo, natti1399 | Fast (~2s), cheap, native function-calling |
| Gemini 2.5 Pro | maksdunajski, sth1712 | Better reasoning, slower |
| Claude Opus 4.6 | NikolaiPoverud, hansjto | Best reasoning, code execution |
| Claude Haiku | us, kleiven | Fast, cheap, good enough for parsing |
Applied: Gemini 2.5 Flash for conversation (fast, cheap). Claude for complex tasks if needed (fallback).
System Prompt Design
Common across all teams:
- Full API reference embedded (~5K chars)
- Hardcoded stable IDs (VAT types, currency)
- "Fresh sandbox" instruction
- "Minimize write calls" instruction
- Norwegian accounting conventions
Oddonline's innovation: Per-task prompt injection — only include the relevant task pattern, not all patterns. Keeps context focused.
Applied: Base system prompt + task-specific recipes injected dynamically.
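Dynamic recipe injection can be sketched as simple prompt assembly; the recipe texts below are placeholders:

```python
# Per-task prompt injection sketch (Oddonline's idea): append only the
# recipe for the detected task type, not every pattern. Texts are placeholders.
BASE_PROMPT = "You are an accounting agent. Sandbox is fresh. Minimize write calls."
RECIPES = {
    "supplier_invoice": "Use /incomingInvoice?sendTo=ledger first.",
    "payroll": "Book a manual voucher; do NOT call /salary/transaction.",
}

def build_prompt(task_type):
    recipe = RECIPES.get(task_type, "")
    return BASE_PROMPT + ("\n\n" + recipe if recipe else "")

prompt = build_prompt("payroll")
```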
Error Prevention
alexrajo: OpenAPI preflight validation — parse Tripletex schema, strip readOnly fields, validate required fields BEFORE hitting the API.
Oddonline: Invalid field blocklist in prompt — explicitly tell the LLM which field names DON'T exist.
Applied: Both approaches. Preflight validation in provider. Blocklist in system prompt.
Testing Learnings
Save Every Request
Our approach: Saved all 294 competition requests to disk. Audited every one against regex parser. Found 25 misclassifications.
Applied: Every request to the Accounting API is logged with full input/output. Replay testing against saved requests.
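Replay testing against saved requests can be sketched as below; the file layout and record fields are illustrative, not the product's actual log format:

```python
# Replay sketch: re-run saved requests through the current classifier and
# diff against the recorded result. File layout is illustrative.
import json
import pathlib
import tempfile

def replay(saved_dir, classify):
    """Return (filename, expected, got) for every mismatch."""
    mismatches = []
    for path in sorted(pathlib.Path(saved_dir).glob("*.json")):
        record = json.loads(path.read_text())
        got = classify(record["input"])
        if got != record["expected"]:
            mismatches.append((path.name, record["expected"], got))
    return mismatches

# Self-contained demo: one saved request on disk, one naive classifier.
with tempfile.TemporaryDirectory() as d:
    (pathlib.Path(d) / "req1.json").write_text(json.dumps(
        {"input": "book supplier invoice 1250 NOK", "expected": "supplier_invoice"}))
    bad = replay(d, lambda text: "supplier_invoice" if "invoice" in text else "other")
```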
Test Before Deploy
Our mistake: Deployed --max-tokens CLI flag that broke ALL LLM calls. Took a submission to discover.
Applied: CLI/SDK validation test in CI. Integration test against real Tripletex sandbox before deploy.
Mutation Fuzzer (Pling-AS)
Innovation: When a task scores low, auto-generate mutation candidates (different VAT IDs, remove fields) and resubmit.
Applied: Not directly, but the auto-fix between agent turns achieves a similar effect dynamically.
Speed Learnings
Token TTL Is Not a Timer
Our discovery: a 237-second task still scored 7.5/10, so the token TTL is either longer than 300s or not time-based at all.
Applied: No artificial timeout. Let tasks run to completion.
Reduce API Round-Trips
Our implementation: Account cache per request saved 10-20 duplicate lookups.
NikolaiPoverud: Pre-fetch with 10 parallel workers.
Applied: Pre-fetch + cache. Common accounts, departments, VAT types loaded once.
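The per-request cache is a memoized lookup; the fetch function below is a stand-in for a real API call:

```python
# Per-request cache sketch: repeated account lookups within one task hit
# the API only once. The fetch function is a stand-in for a real call.
class RequestCache:
    def __init__(self, fetch):
        self._fetch = fetch
        self._store = {}
        self.misses = 0          # counts actual API calls

    def get(self, key):
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._fetch(key)
        return self._store[key]

cache = RequestCache(lambda number: {"number": number})
for _ in range(5):
    cache.get("3000")            # five lookups, one API call
```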
CLI Subprocess Is Slow
Our measurement: CLI = ~4.5s (0.9s overhead + 3.7s API). Direct SDK = ~2-3s.
Applied: Always use SDK directly. Never CLI subprocess.
What No Team Solved
Even the best teams had these unsolved:
- Bank reconciliation entity — no team uses /bank/reconciliation. All just register payments.
- Receipt expense — consistently low scores across teams. PDF OCR + correct account mapping is hard.
- Year-end tax with empty sandbox — if no pre-existing income, tax = 0. Hard to know if sandbox has data.
Applied: These are known limitations documented for human accountant review.
Summary: Competition → Product
| Competition Learning | Product Feature |
|---|---|
| Agent > handlers | Function-calling agent loop |
| Conversation needs context | OpenClaw with structured facts |
| Pre-fetch saves time | Parallel data loading |
| Auto-fix 422s | Error recovery between turns |
| /incomingInvoice for SI | Multi-strategy provider |
| Skip salary API | Manual voucher for payroll |
| Hardcode VAT IDs | Static config, not API lookup |
| Save all requests | Full audit trail + replay testing |
| Test before deploy | CI with sandbox integration tests |
| 22 teams studied | Best practices from each applied |