Human Archive Is Building the Multimodal Dataset the Robotics Industry Needs
Human Archive, a Winter 2026 Y Combinator company, is tackling one of the least glamorous and most consequential bottlenecks in modern robotics: data. The startup, founded in 2026 by Raj Patel, Rushil Agarwal, Samay Maini, and Shloke Patel, specializes in training data for AI robotics, positioning itself as the multimodal data provider for frontier labs and general-purpose robotics companies training the foundation models that will operate humanoid and embodied systems in the physical world.
The wager is a familiar one in AI history, translated into atoms. Large language models reached their current capabilities only once internet-scale text became tractable to collect, clean, and train on. Robotics has no equivalent corpus. The physical world does not organize itself into conveniently scrapable archives, and the data that does exist tends to be narrow – single-task, single-environment, and rarely aligned across modalities like vision, proprioception, force, and language. Human Archive is betting that whoever solves data collection for embodied intelligence will quietly enable the next generation of robotic capability, in the same way that web-scale text unlocked modern language models.
The company’s data pipeline starts with custom hardware deployed across residential and manufacturing settings, capturing aligned multimodal data that reflects the way humans actually move, manipulate objects, and navigate spaces. The team has thought carefully about where biomimicry ends and its practical application to humanoid systems begins – a distinction that matters because the long-term customer is not a simulation benchmark but a robot that will share environments with people. The raw data then flows through internal QA, anonymization, and annotation pipelines, producing diverse, high-fidelity datasets that are ready for training.
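To make "aligned multimodal data" concrete, here is a minimal Python sketch of one common way to align sensor streams: pairing each camera frame with the nearest-in-time sample from every other modality, and dropping frames where any modality is too far away. This is an illustration only, not Human Archive's method. The function names, the stream names (`force`, `joint_angle`), the sample values, and the `max_gap` threshold are all hypothetical.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Return the index of the value in sorted `timestamps` closest to `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbor is closer; ties go to the earlier sample.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align_streams(frame_times, streams, max_gap):
    """Pair each frame time with the nearest sample from every stream.

    `streams` maps a modality name to (sorted_timestamps, values).
    A frame is dropped if any modality has no sample within `max_gap` seconds.
    """
    aligned = []
    for t in frame_times:
        record = {"t": t}
        for name, (times, values) in streams.items():
            j = nearest_index(times, t)
            if abs(times[j] - t) > max_gap:
                record = None  # this modality is missing near time t
                break
            record[name] = values[j]
        if record is not None:
            aligned.append(record)
    return aligned


# Hypothetical data: three camera frames, a force sensor, a joint-angle stream.
frame_times = [0.00, 0.10, 0.20]
streams = {
    "force": ([0.00, 0.05, 0.10, 0.15], [1.0, 1.2, 1.4, 1.6]),
    "joint_angle": ([0.01, 0.11], [30.0, 31.0]),
}
aligned = align_streams(frame_times, streams, max_gap=0.03)
for record in aligned:
    print(record)
```

In this toy example the third frame is discarded because no force sample falls within the allowed gap. A production pipeline would more likely rely on hardware-synchronized clocks and interpolation rather than nearest-sample matching.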
That combination of custom capture hardware, disciplined post-processing, and deliberate attention to privacy is what makes the offering unusually hard to replicate. Frontier robotics labs need data that is labeled, aligned, anonymized, and diverse enough to support generalization. Most will not build that pipeline themselves because doing so requires hardware engineering, human operations across multiple geographies, and the legal and privacy rigor to collect data in real residential and industrial environments without generating liability. Human Archive is building the full stack specifically so its customers do not have to.
The founding team’s commitment is written into the story itself. The four co-founders dropped out of Stanford and Berkeley and moved to Asia to collect what they aim to make the world’s largest annotated multimodal dataset. That is an unusually physical bet for a YC company, and it signals the founders’ confidence that this is a historic inflection point for embodied intelligence. The company is hiring broadly, with a heavy emphasis on hardware and operations: open roles include Head of Engineering, Industrial Designer, Operations Manager, QA and Hardware Test Engineer, Embedded and Electrical Engineer, Firmware Engineer, Mechanical Engineer for Wearable Systems, and even a Videographer – a hiring pattern that reads less like a software startup and more like a small manufacturing and field-operations company.