TwelveLabs Raises $100 Million to Build Video Superintelligence With AWS as Its Cloud Backbone

By widely cited estimates, more than 90 percent of the world's data is video, and almost none of it is truly searchable. Surveillance footage, sports archives, enterprise training libraries, broadcast content, and government operational video sit largely untouched because the tools to index, understand, and query it at scale have not existed at a practical cost. TwelveLabs was founded to change that, and the company has now raised $100 million in a Series B round to push its vision of what it calls video superintelligence further into production.
The round was co‑led by NEA and NAVER Ventures, the latter marking its first investment since launching its Silicon Valley venture arm in mid‑2025. Amazon participated as a major backer alongside returning investors Radical Ventures, Korea Investment Partners, and Index Ventures. Quadrille Capital and Red Bull Ventures joined as new investors in this round. The raise pushes TwelveLabs' total funding past $207 million, following a $50 million Series A in 2024 co‑led by NEA and NVIDIA's NVentures.
The AWS Partnership Behind the Headline Number
The funding itself is notable, but the structural detail likely to have longer‑term implications is the formalisation of AWS as TwelveLabs' preferred cloud provider. Amazon's participation in this round is not a passive financial bet. The two companies have signed a multiyear commitment that ties TwelveLabs' infrastructure roadmap directly to AWS, with video inference workloads optimised specifically for Amazon's Trainium AI chips. Crucially, new TwelveLabs foundation models will launch on AWS before becoming available on other platforms.
The company's models have been available on Amazon Bedrock for more than a year, but this round turns that commercial relationship into a structural one. Jason Bennett, VP and Global Head of Startups and Venture Capital at AWS, described TwelveLabs as a company that has been expanding the limits of what AI can perceive and reason about since its earliest days, and framed the Bedrock track record as evidence that the partnership rests on genuine customer value rather than strategic optics.
Who Built TwelveLabs and Why
TwelveLabs was founded in 2021 by Jae Lee, Dave Chung, Aiden Lee, Soyoung Lee, and SJ Kim. All five founders met while serving together in South Korea's military cyber operations command, and Jae Lee previously worked as a lead data scientist at South Korea's Ministry of National Defence before assembling the founding team. The company is headquartered in San Francisco with a significant presence in Seoul, and it has grown from roughly 58 employees a year ago to approximately 178 as of June 2026, the kind of headcount expansion that typically precedes a major commercial push.
Jae Lee has described the founding thesis in terms that cut against the prevailing assumptions of the AI industry, arguing that the substrate of machine intelligence is recorded reality in motion rather than language. In his framing, language is downstream of understanding, and video is the data that understanding ultimately has to answer to. That conviction has steered the company away from the generative video race that has consumed much of the market's attention and toward the harder, less visible problem of making existing video comprehensible to machines.
What the Technology Actually Does
Most general‑purpose multimodal models handle video by sampling a handful of frames and processing them alongside a transcript. This approach discards much of the information between the sampled frames, struggles to capture motion, temporal relationships, and audio context, and forces the model to start its analysis from scratch every time a new query is submitted. TwelveLabs was built to work differently.
The company's architecture is built around two core models. Marengo 3.0, released late last year, is described as the company's most powerful video embedding model. According to TwelveLabs, it processes the sound, speech, and on‑screen motion across a video's full timeline, converting raw footage into a semantic layer that machines can search and reason over. Pegasus 1.5, the more recent release, works alongside Marengo to convert video content into structured data, identifying scene boundaries, named entities, temporal segments, and summaries. Together, the two models enable a system that understands a piece of video once, stores the result as structured memory, and then answers subsequent queries by reasoning over that stored understanding rather than re‑analysing the footage each time.
This architecture is what TwelveLabs is now calling a Video Cognition System, and the Series B will fund the continued development of that full stack as the company moves beyond foundation models into a more complete intelligence layer for video.
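To make the "understand once, query many times" idea concrete, here is a minimal Python sketch. It is not TwelveLabs' API: the `VideoMemory` class, its `index_video` and `search` methods, and the bag‑of‑words `embed` function are all hypothetical. The toy embedding stands in for a model such as Marengo, and the pre‑written segment descriptions stand in for Pegasus‑style structured output. The point is structural: analysis happens once per video at indexing time, and every later query is answered by comparing against the stored segments.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One temporal segment of a video, produced once at indexing time."""
    video_id: str
    start_s: float
    end_s: float
    description: str  # hypothetical stand-in for model-generated structured output
    vector: dict = field(default_factory=dict)


def embed(text: str) -> dict:
    """Toy L2-normalised bag-of-words vector; a hypothetical stand-in for a real embedding model."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a: dict, b: dict) -> float:
    # Both vectors are already L2-normalised, so the dot product equals the cosine similarity.
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


class VideoMemory:
    """Index each video once, then answer later queries from stored segments only."""

    def __init__(self):
        self.segments: list[Segment] = []
        self.analysis_calls = 0  # counts per-video indexing passes; search() never increments it

    def index_video(self, video_id, analysed_segments):
        # In a real system, the expensive model pass over the footage would happen here, once.
        self.analysis_calls += 1
        for start, end, description in analysed_segments:
            self.segments.append(Segment(video_id, start, end, description, embed(description)))

    def search(self, query: str, top_k: int = 1):
        # Queries touch only the stored vectors; no footage is re-analysed.
        q = embed(query)
        ranked = sorted(self.segments, key=lambda s: cosine(q, s.vector), reverse=True)
        return ranked[:top_k]


if __name__ == "__main__":
    memory = VideoMemory()
    memory.index_video("match_01", [
        (0.0, 12.5, "kickoff crowd cheering in stadium"),
        (12.5, 40.0, "striker scores a goal from a corner kick"),
    ])
    memory.index_video("briefing_07", [
        (0.0, 30.0, "speaker presents quarterly safety training slides"),
    ])
    best = memory.search("goal from corner kick")[0]
    print(best.video_id, best.start_s, best.end_s)  # match_01 12.5 40.0
    print(memory.analysis_calls)                     # 2
```

A production system would use learned multimodal embeddings and an approximate nearest‑neighbour index rather than a linear scan, but the division of labour between a one‑time indexing pass and cheap repeated retrieval would be the same.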
Markets and Traction
TwelveLabs has established its deepest commercial footprint in media and entertainment, where the ability to search and analyse large video archives has immediate practical value for content licensing, sports highlight generation, and broadcast archive monetisation. But the company has also moved into the public sector, working with governments to apply video intelligence to mission‑critical operational workflows, a category that includes security, surveillance analysis, and incident response.
Additional verticals generating demand include advertising, sports analytics, automotive, and enterprise security. The breadth of that list reflects how common video data problems are across industries, even if the specific use cases look very different from one sector to the next.
Alongside the Series B, TwelveLabs launched a closed beta of Rodeo, an AI‑powered video creation tool. The addition of a creation capability alongside the company's existing understanding and analysis platform represents a meaningful expansion of scope, and it will test whether TwelveLabs can compete across the full lifecycle of video AI rather than owning only the intelligence and retrieval layer.
Where the Money Goes
The $100 million will fund research and development, expanded hiring, and geographic growth. The company plans to open new offices in New York and London alongside its existing San Francisco and Seoul bases. New York and London are two of the largest concentrations of media, finance, and enterprise buyers likely to pay for production‑scale video intelligence, which makes the expansion more commercial than symbolic.
The competitive context matters here. Runway reached a $5.3 billion valuation in February 2026 after a $315 million round, building toward world models that simulate full 3D physical environments. OpenAI's Sora and Google's Veo have dominated the generative video narrative. TwelveLabs is making a deliberate choice to sit outside that framing, arguing that the more durable business is extracting intelligence from the world's existing video archive rather than generating new video from prompts. Whether that distinction holds as the leading generative models add stronger understanding and retrieval capabilities is the strategic question that will define how TwelveLabs' next few years play out.
NEA partner Tiffany Luck, who backed the company from its earliest stages, described TwelveLabs as purpose‑built to turn millions of hours of footage into intelligence that compounds over time, adding that as video understanding moves from novel capability to essential infrastructure, the company is positioned to define what that infrastructure looks like. That framing, intelligence that compounds from existing video archives rather than being generated on demand, is the thesis the $100 million is now being asked to prove.