Data-OnCall
First place. 4-agent incident response team for data quality bugs: fine-tuned Llama 3.1 + LoRA for NL→GraphQL, end-to-end in 54.8s at ~2¢/run.
First place, solo build.
A 4-agent system that takes a data quality alert from DataHub and produces a complete incident response: triage, root-cause investigation against lineage, fix proposal, and human-readable writeup. Each agent owns one stage and hands off via structured artifacts.
- Coordinator (Kimi-K2-Thinking) — long-horizon planner with visible reasoning traces
- Detective (Llama 3.1 8B + LoRA) — lineage tracer via NL→GraphQL
- Reality-Checker (same fine-tune, different system prompt)
- Fixer (MiniMax-M2.5) — writes the postmortem back to the catalog via Python SDK
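The hand-off via structured artifacts could be sketched roughly as below. This is a minimal illustration, not the project's actual code: the `Artifact` fields, the stage names, `make_stub`, and `run_pipeline` are all hypothetical, and the stub stages stand in for the real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Artifact:
    """Structured hand-off passed from one stage to the next (hypothetical shape)."""
    stage: str
    summary: str
    evidence: Dict[str, object] = field(default_factory=dict)


# A stage reads every artifact produced so far and returns exactly one new artifact.
Stage = Callable[[List[Artifact]], Artifact]


def make_stub(name: str) -> Stage:
    """Build a placeholder stage; a real stage would call a model here."""
    def stage(history: List[Artifact]) -> Artifact:
        return Artifact(
            stage=name,
            summary=f"{name} derived from {history[-1].stage}",
            evidence={"inputs": [a.stage for a in history]},
        )
    return stage


def run_pipeline(alert: str, stages: Dict[str, Stage]) -> List[Artifact]:
    """Run the stages in insertion order, appending each artifact to the history."""
    history = [Artifact(stage="alert", summary=alert)]
    for name, stage in stages.items():
        artifact = stage(history)
        if artifact.stage != name:
            raise ValueError(f"stage {name!r} returned an artifact for {artifact.stage!r}")
        history.append(artifact)
    return history


# Stage names follow the four steps described above; the alert text is made up.
STAGE_NAMES = ["triage", "root_cause", "fix_proposal", "writeup"]
history = run_pipeline(
    "hypothetical alert: null rate spike on a products table",
    {name: make_stub(name) for name in STAGE_NAMES},
)
for artifact in history:
    print(artifact.stage, "->", artifact.summary)
```

Keeping the full history, rather than only the previous artifact, lets a later stage cite evidence gathered at any earlier step.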
I fine-tuned the LoRA adapter myself on 300 synthetic NL→GraphQL pairs targeting a narrow set of DataHub query patterns. Validation loss dropped 34% over 3 epochs, decreasing monotonically with no sign of overfitting.
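For a sense of what one synthetic pair might look like, here is a template-based sketch. The prompt wording, table names, `make_pair`, and the `prompt`/`completion` JSONL layout are assumptions for illustration, not the actual training data. The query uses DataHub's `searchAcrossLineage` GraphQL field; the exact input fields should be checked against the DataHub version in use.

```python
import json

# Hypothetical query template: upstream lineage lookup for one dataset URN.
LINEAGE_TEMPLATE = """query {
  searchAcrossLineage(input: {urn: "%s", direction: UPSTREAM, query: "*", start: 0, count: 10}) {
    searchResults { degree entity { urn type } }
  }
}"""


def dataset_urn(table: str, platform: str = "postgres", env: str = "PROD") -> str:
    """Format a DataHub dataset URN for the given platform, table, and environment."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{table},{env})"


def make_pair(table: str) -> dict:
    """Pair a natural-language question with the GraphQL query that answers it."""
    return {
        "prompt": f"Which datasets feed into {table}?",
        "completion": LINEAGE_TEMPLATE % dataset_urn(table),
    }


# Illustrative table names only; each pair becomes one JSONL line.
pairs = [make_pair(t) for t in ("shop.public.orders", "shop.public.products")]
jsonl = "\n".join(json.dumps(p) for p in pairs)
print(jsonl)
```

Templating a handful of query shapes over many table names is one cheap way to reach a few hundred pairs while keeping the target patterns narrow.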
A full run takes 54.8 seconds end-to-end at ~2¢. It found three planted bugs, each down to the exact row count: 5,632 truncated seller IDs, 7,955 deleted customers, 988 NULL categories.
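Row counts like these can be confirmed with plain SQL once the suspect tables are known. The sketch below uses an in-memory SQLite database with a toy schema; the table and column names, the expected ID length, and the reading of "deleted customers" as orders that reference missing customer rows are all assumptions, not details from the project.

```python
import sqlite3

EXPECTED_SELLER_ID_LEN = 8  # hypothetical full length of a seller ID

# Each check maps a name to (SQL returning a single count, query parameters).
CHECKS = {
    "truncated_seller_ids": (
        "SELECT COUNT(*) FROM orders WHERE LENGTH(seller_id) < ?",
        (EXPECTED_SELLER_ID_LEN,),
    ),
    "orphaned_customer_refs": (
        "SELECT COUNT(*) FROM orders o "
        "LEFT JOIN customers c ON c.customer_id = o.customer_id "
        "WHERE c.customer_id IS NULL",
        (),
    ),
    "null_categories": (
        "SELECT COUNT(*) FROM products WHERE category IS NULL",
        (),
    ),
}


def run_checks(conn: sqlite3.Connection) -> dict:
    """Run every check and return the number of offending rows per check."""
    return {
        name: conn.execute(sql, params).fetchone()[0]
        for name, (sql, params) in CHECKS.items()
    }


# Toy data with one or two planted defects per check.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER, seller_id TEXT, customer_id TEXT);
    CREATE TABLE products (product_id INTEGER, category TEXT);
""")
conn.executemany("INSERT INTO customers VALUES (?)", [("C1",), ("C2",)])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "S0000001", "C1"), (2, "S00", "C2"), (3, "S0000003", "C9"), (4, "S1", "C1")],
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?)", [(1, "toys"), (2, None), (3, None)]
)

counts = run_checks(conn)
print(counts)
```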
The motivating problem was real: Elias (my personal agent on OpenClaw) had been telling guests that one of my properties had a hot tub. It doesn’t. The hallucination had propagated across LanceDB / Postgres / Qdrant / Gemini embeddings, and I had no traceability. DataHub gave me the metadata catalog + lineage graph I needed to root-cause it.
Solve your own problem — it’s almost always someone else’s problem too.