Client overview
The client is a leading multinational technology company based in China. It is developing an AI coding agent designed to solve long-horizon software engineering tasks across multiple programming languages and complex codebases.
Business Challenges
- Large-scale, high-quality trajectory data is required to ensure stable learning and strong generalization.
- Tasks must be diverse and progressively harder, aligned with the model roadmap (difficulty ramp-up).
- Near-perfect step accuracy is mandatory and enforced through strict QC, because one incorrect step can invalidate an entire trajectory.
- Broad coverage is required across programming languages and request types (bug fixes, environment issues, feature development, etc.).
- Scaling the work requires developers with years of hands-on development and project maintenance experience.
- A stable, consistent benchmark and scoring approach is needed to track progress over time and ensure training stays aligned with evaluation targets.
Project Details
Long-Horizon Trajectory Collection & Recording, QA Framework Design & Execution, Task & Scenario Authoring, Multi-Language Code Coverage, AI Agent Evaluation & Progress Tracking

Solutions
1. Standardized production pipeline: Task creation → execution → revision → automated QA → human QA to increase throughput while reducing errors.
2. Roadmap-aligned task generation: Fast refresh cycles, difficulty control, and phased domain expansion across training waves.
3. End-to-end QA with clear rubrics: Multi-layer checks (peer review, spot checks, gold-standard tasks) plus metrics such as pass rate, edit distance, time-to-fix, and self-recovery.
4. Scalable delivery via outsourcing: Rapid onboarding, periodic performance measurement, and skill-tiering to optimize cost vs. quality.
5. Trajectory quality control: Reasoning trace validation, tool-use verification, and reproducibility checks.
6. Evaluation-ready datasets: Standardized formats compatible with agent benchmarks.
7. Continuous improvement loop: Evaluation results drive task refresh and QA updates.
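
As an illustration of two of the QA metrics named above, the following is a minimal sketch, not the project's actual tooling. It assumes each trajectory carries a list of per-step QA verdicts, treats a trajectory as valid only if every step passed (mirroring the rule that one incorrect step invalidates the whole trajectory), and computes edit distance as plain Levenshtein distance between two sequences. The `Trajectory` class, its fields, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Trajectory:
    """Hypothetical record: one task attempt with per-step QA verdicts."""
    task_id: str
    step_ok: List[bool]  # True if the step passed QA (assumed representation)


def is_valid(traj: Trajectory) -> bool:
    # A single failed step invalidates the trajectory; an empty one is not valid.
    return bool(traj.step_ok) and all(traj.step_ok)


def pass_rate(trajectories: Sequence[Trajectory]) -> float:
    # Fraction of trajectories in which every step passed QA.
    if not trajectories:
        return 0.0
    return sum(is_valid(t) for t in trajectories) / len(trajectories)


def edit_distance(a: Sequence, b: Sequence) -> int:
    # Levenshtein distance between two sequences (for example, the lines of a
    # submitted patch and a reviewer-corrected patch), using a rolling row.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]
```

Under these assumptions, a batch with one fully correct trajectory and one trajectory containing a single failed step yields a pass rate of 0.5, regardless of how many steps the failed trajectory got right.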







