Simulated App Usage Recording for Smarter AI Training

Client overview

Our client is a research group at a leading U.S. technology university. They are developing an AI Agent in the form of a Computer Use Agent (CUA), capable of interacting with and operating across platforms on macOS, Windows, and Linux. At the current stage, the agent can analyze requests, translate them into goals, and execute the corresponding tasks.

Business Challenges

Requires a very large, high-quality trajectory dataset for stable and generalizable CUA learning.
Struggles are common with complex, stateful UI components (e.g., select boxes, editable fields, modals).
Tasks must stay diverse and increase in complexity as the model evolves.
Near-perfect step accuracy and strict QC are needed—one wrong step can invalidate the entire outcome.
Must cover macOS, Windows, and Linux to avoid OS-specific failures due to UI/behavior differences.
Needs scalable, high-skill resourcing while keeping cost efficient.
Requires reliable benchmarks to track progress and align training with evaluation criteria.

Project Detail

Trajectory Collection & Recording, Task Authoring & Scenario Design, Multi-OS Generalization, OSWorld-based Environment & Evaluation Harness, Benchmark Construction & Protocol, AI Evaluation, Modified Thinking, Data Governance

Solutions

1.Operate a standardized recording + revision pipeline with experienced operators to deliver high throughput.

2.Proactively design scenarios targeting known failure modes and hard UI interactions to improve coverage where models break.

3.Maintain a flexible task-generation pipeline aligned with the model roadmap for fast task refresh and difficulty ramp-up.

4.Implement end-to-end QA with clear rubrics, multi-layer checks (peer review, spot checks, gold standards), and metrics tracking (success rate, lead time, self-recovery).

5.Standardize cross-OS guidelines and execute tasks across all three platforms to reflect deployment conditions and reduce OS-specific errors.

6.Scale capacity through a proven outsourcing model with rapid training, performance measurement, and competitive cost.

7.Build benchmark task suites that balance difficulty and discriminability, reflect real-world workflows, and provide actionable signals for iteration.

Client overview

Business Challenges

Project Detail

Solutions

Related Posts

LTS GLOBAL DIGITAL SERVICES

ABOUT US

CONTACT US

SOLUTIONS

OUR INDUSTRIES

RESOURCES

ABOUT US

CAREERS

Client overview

Business Challenges

Project Detail​

Solutions

Related Posts

2D Segmentation for Component Tagging

LiDAR Annotation for Autonomous Driving

Large-Scale Gaze Data Collection for Hands-Free AI Systems

LTS GLOBAL DIGITAL SERVICES

ABOUT US

CONTACT US

SOLUTIONS

OUR INDUSTRIES

RESOURCES

ABOUT US

CAREERS

Project Detail