uCO3D is a large-scale 3D vision dataset and toolkit centered on turn-table videos of everyday objects drawn from the LVIS taxonomy. It provides about 170,000 full object-centric videos rather than still frames, along with per-video annotations including object masks, calibrated camera poses, and three types of point clouds. Each sequence also ships with a precomputed 3D Gaussian Splat reconstruction, enabling fast, differentiable rendering workflows and experiments with implicit or point-based models. The repository includes automated downloaders with checksum verification, fine-grained controls to fetch only selected modalities or super-categories, and a lightweight Python API for loading frames, geometry, and splats on demand. Metadata is indexed in SQLite for quick queries at scale, and helper builders handle alignment, undistortion, frame extraction from videos, and cropping around the object.
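To illustrate what checksum verification of a downloaded archive involves, here is a minimal sketch using only the Python standard library. It is not the repository's downloader: the hash algorithm (SHA-256) and the function names `sha256_of_file` and `verify_download` are assumptions made for illustration, and the real tool may use a different digest and interface.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Hypothetical helper: True if the file's digest matches the expected hex string."""
    # SHA-256 is an assumption here; swap in whichever digest the checksum list uses.
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Reading in fixed-size chunks matters for large video archives, since loading a whole file into memory just to hash it would be wasteful.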
Features
- ~170k turn-table videos across ~1000 LVIS categories grouped into 50 super-categories
- Rich per-sequence annotations: segmentation masks, camera poses, and three point cloud types
- Precomputed 3D Gaussian Splat reconstructions for every sequence
- Download by modality and/or super-category to control storage footprint
- Python API with frame-level loading from videos, builders for alignment and cropping, and utilities for splat rendering
- SQLite metadata and subset list files for scalable queries, train/val splits, and custom set creation
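
As a rough illustration of why a SQLite index helps with selecting sequences at scale, the sketch below builds a tiny in-memory database and filters it by super-category. The table name `sequences`, its columns, the sample rows, and both function names are hypothetical placeholders, not the actual uCO3D schema or API; consult the repository for the real layout.

```python
import sqlite3


def build_demo_index():
    """Create an in-memory database with a HYPOTHETICAL metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences ("
        "sequence_name TEXT PRIMARY KEY, "
        "category TEXT NOT NULL, "
        "super_category TEXT NOT NULL)"
    )
    # Placeholder rows for illustration only; these are not real dataset entries.
    conn.executemany(
        "INSERT INTO sequences VALUES (?, ?, ?)",
        [
            ("seq_0001", "bicycle", "vehicles"),
            ("seq_0002", "mug", "kitchenware"),
            ("seq_0003", "car", "vehicles"),
        ],
    )
    conn.commit()
    return conn


def sequences_in_super_category(conn, super_category):
    """Return the sorted sequence names belonging to one super-category."""
    rows = conn.execute(
        "SELECT sequence_name FROM sequences "
        "WHERE super_category = ? ORDER BY sequence_name",
        (super_category,),  # parameterized query, so no string formatting of SQL
    )
    return [name for (name,) in rows]
```

With the demo rows above, `sequences_in_super_category(build_demo_index(), "vehicles")` returns `["seq_0001", "seq_0003"]`. The same pattern, a parameterized `SELECT` over an indexed metadata table, is what lets a custom subset be assembled without scanning the videos themselves.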