Robotics data infrastructure
Real-World Data Engine for Physical AI.
Training-ready manipulation datasets: overhead + egocentric video paired with real-world actions and movements.
Built with teams training robot manipulation and VLA policies
Why now
Real manipulation data is the bottleneck.
Foundation models for robotics are scaling fast. Data collection infrastructure is not keeping up.
Insufficient real-world data
Sim-to-real transfer remains brittle. Models need high-volume, high-fidelity manipulation footage from actual work cells to generalize.
Collection is slow and expensive
Standing up capture rigs, hiring operators, and managing consent/privacy takes months. Most teams skip it and suffer in evaluation.
No standard for quality or format
Every lab ships data differently. Inconsistent frame rates, unlabeled failures, and missing calibration data break training pipelines.
What we deliver
End-to-end data infrastructure for robot learning.
Synchronized capture
Overhead + egocentric video streams synchronized with gripper state and end-effector trajectory signals. Sub-frame alignment across all modalities.
QA + versioning
Every dataset drop includes train/val splits, per-episode metadata, camera calibration files, failure labels, and retry counts. Immutable version IDs.
Pipeline-ready drops
Structured exports with a clear dataset schema. Compatible with common dataloaders. No custom parsing required.
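As a sketch of what "no custom parsing" means in practice, a drop could be consumed with a few lines of standard-library Python. The `frames.jsonl` layout (one JSON frame per line, grouped by `trajectory_id`) is a hypothetical illustration, not the guaranteed delivery format.

```python
import json
from collections import defaultdict

def load_drop(path):
    """Load a drop stored as one JSON frame per line and group frames
    into episodes keyed by trajectory_id. Illustrative sketch only."""
    episodes = defaultdict(list)
    with open(path) as f:
        for line in f:
            frame = json.loads(line)
            episodes[frame["trajectory_id"]].append(frame)
    # Order frames within each episode by capture time.
    for frames in episodes.values():
        frames.sort(key=lambda fr: fr["timestamp"])
    return dict(episodes)
```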
How it works
Three steps from facility to training data.
Deploy capture kits
We ship calibrated camera rigs and signal taps to your facility. No custom hardware integration. Plug into existing work cells.
Collect in real workflows
Operators run normal tasks while we record. Overhead, egocentric, and gripper telemetry are captured simultaneously and time-aligned.
QA + ship dataset drops
We validate every episode, label successes and failures, generate train/val splits, and deliver versioned dataset drops ready for your pipeline.
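The split-and-version step above could be sketched as follows. Hash-based bucketing and a digest-derived version ID are one reasonable design (assignments stay stable as episodes are appended, and any change in membership produces a new ID); this is an assumption about how such tooling might work, not the shipped pipeline.

```python
import hashlib
import json

def split_and_version(episode_ids, val_fraction=0.1):
    """Deterministically assign episodes to train/val and derive an
    immutable version ID from the sorted membership. Illustrative sketch."""
    splits = {"train": [], "val": []}
    for ep in episode_ids:
        # Hashing the episode ID keeps each assignment stable across drops.
        h = int(hashlib.sha256(ep.encode()).hexdigest(), 16)
        bucket = "val" if (h % 1000) < val_fraction * 1000 else "train"
        splits[bucket].append(ep)
    # Any addition, removal, or rename changes the digest, so the ID
    # uniquely identifies this exact dataset membership.
    version_id = hashlib.sha256(
        json.dumps(sorted(episode_ids)).encode()
    ).hexdigest()[:12]
    return splits, version_id
```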
Dataset spec
Structured, documented, ready to load.
Every drop ships with a schema manifest. Here is an example of the fields included per episode frame.
| Field | Type | Description |
|---|---|---|
| timestamp | int64 | Microseconds since the Unix epoch |
| camera_overhead_id | string | Unique stream ID for overhead camera |
| camera_ego_id | string | Unique stream ID for egocentric camera |
| gripper_state | float32[6] | Joint angles + open/close scalar |
| trajectory_id | string | Episode-level trajectory identifier |
| task_label | string | Semantic task name (e.g. pick_place_box) |
| success | bool | Whether the episode succeeded |
| retries | uint8 | Number of retry attempts |
| site_id | string | Facility / work-cell identifier |
| annotator_qa_flag | bool | Human QA pass/fail flag |
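A frame could be checked against the table above with a sketch like the following. The field names mirror the schema, but the validator itself is a hypothetical illustration, not part of the delivered tooling.

```python
# Expected Python-side type for each per-frame field in the schema table.
SCHEMA = {
    "timestamp": int,
    "camera_overhead_id": str,
    "camera_ego_id": str,
    "gripper_state": list,   # float32[6] in the spec
    "trajectory_id": str,
    "task_label": str,
    "success": bool,
    "retries": int,
    "site_id": str,
    "annotator_qa_flag": bool,
}

def validate_frame(frame):
    """Return a list of schema violations; an empty list means the frame passes."""
    errors = []
    for field, expected in SCHEMA.items():
        if field not in frame:
            errors.append(f"missing field: {field}")
        elif not isinstance(frame[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    # The spec fixes gripper_state at six values.
    state = frame.get("gripper_state")
    if isinstance(state, list) and len(state) != 6:
        errors.append("gripper_state: expected 6 values")
    return errors
```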
Sample JSON
{
"timestamp": 1706140800000000,
"camera_overhead_id": "cam_oh_03a",
"camera_ego_id": "cam_ego_03a",
"gripper_state": [0.12, -0.45, 1.02, 0.33, -0.78, 0.85],
"trajectory_id": "traj_2024_01_25_0042",
"task_label": "pick_place_box",
"success": true,
"retries": 0,
"site_id": "wh_east_cell_07",
"annotator_qa_flag": true
}
Use cases
Built for real-world robot learning.
Humanoids
Dexterous manipulation data for humanoid policy training.
Warehouse picking
High-volume pick-and-place data from live fulfillment cells.
Packaging / kitting
Multi-object assembly and packaging task episodes.
Returns / sortation
Sorting and inspection workflows with failure labeling.
Teleop training
Paired teleoperation demonstrations for imitation learning.
Eval sets
Curated held-out evaluation datasets with ground truth.
Security + privacy
Your data, handled responsibly.
Informed consent collected from all facility participants before capture begins.
Facility permissions and data processing agreements signed before equipment deployment.
Face and badge redaction applied automatically; manual review available on request.
Role-based access controls and encrypted storage for all raw and processed data.
Full audit trail for data access, export, and deletion. SOC 2 readiness in progress.
FAQ
Frequently asked questions
Get in touch
Request access
Tell us about your project. We respond within 48 hours.