Case Study
•
a16z Case Study
How a Top 5 VC by AUM is building an internal deal sourcing tool with Crustdata

Company
Top 5 VC Firm by AUM
use case
Proprietary Founder Sourcing
company overview
This firm is one of the most prominent venture capital firms in the world. With an internal data science and product team dedicated to building proprietary sourcing infrastructure, they've moved beyond off-the-shelf VC platforms to gain a competitive edge in identifying founders before anyone else does.
scale
~200,000 people updated weekly
Company overlay
The Challenge
Impossible to Codify a Unique Investment Thesis
Data Quality & Deduplication Challenges
Freshness & Latency Requirements
The Solution
Raw Data Infrastructure to Build On
Real-Time Automated Enrichment
Reliable Unique Identifiers That Solve Deduplication
Infrastructure Built for Scale
Results & Benefits
Thesis-Driven Sourcing Without Platform Constraints
First-Mover Advantage on Founder Discovery
Engineering Focus Shifted from Data Cleaning to Product Building
The Challenge
The firm was seeing the same founders as everyone else, at the same time. Their out-of-the-box sourcing tools couldn't surface founders who fit their thesis before competitors did, resulting in lower allocation in rounds, missing high ROI investments and losing the ability to build relationships before competitive processes started.
Their data and tech teams set out to build a proprietary founder monitoring system, but existing data infrastructure couldn't support the vision.
Impossible to Codify a Unique Investment Thesis
Standard VC platforms decide which founders surface based on generic criteria and every firm subscribed to the same platform runs the same playbook.
The firm's thesis involves signals and combinations that no off-the-shelf platform exposes as filters
They wanted to weight founder signals differently by sector — a founder with previous entrepreneurial experience matters more in B2B SaaS while a deep technical background or PhD matters more in deep-tech
The sourcing edge they wanted couldn't exist inside a shared platform. If competitors can subscribe to the same alerts, they lose their edge
They wanted to be alerted when companies started hiring ex-founders in leadership positions, a pattern they have seen over the years, precedes rapid growth
Existing platforms optimized for later-stage companies, not early signals
Data Quality & Deduplication Challenges
Working with multiple data providers meant constant resolution of duplicate records and inconsistent identifiers.
"There's been an extensive process of finding the edge cases and implementing fairly intensive post-processing before we can actually use them in our knowledge graph."
No reliable unique identifier across providers
Engineering resources drained by data cleaning instead of product building
Freshness & Latency Requirements
The firm's internal tools needed to provide results quickly. But most data providers relied on human-in-the-loop processes that introduced unpredictable latency.
Other "real-time" providers actually used manual data collection behind the scenes
Latency issues eroded trust in the data and slowed workflows
No infrastructure for true just-in-time enrichment at scale
The Solution
Crustdata provided the data infrastructure layer for their proprietary founder monitoring system — a live knowledge graph that updates in real-time without human intervention.
Raw Data Infrastructure to Build On
Off-the-shelf VC platforms mean shared edge — competitors using the same platform can replicate the same filters, signals, and alert logic
Proprietary data sources like internal documents and partner notes can't be uploaded to a third-party platform but can be integrated into an in-house system built on raw data APIs
Internal tooling compounds over time — every new signal, scoring model, and feedback loop from partners makes the system harder to replicate. A SaaS platform's improvements benefit all subscribers equally; internal tooling only benefits the firm
Real-Time Automated Enrichment
Unlike other providers they had evaluated, Crustdata's architecture is fully automated.
"Other providers were relying on basically human-in-the-loop. There is a person involved in the collection of that metadata. Crustdata is automated end-to-end. That's a different approach than I've seen generally followed by other folks in the space."
API request triggers live crawlers to fetch information from the web instantly
Dedicated crawler resources available for high-priority requests
Reliable Unique Identifiers That Solve Deduplication
Crustdata's knowledge graph anchors every entity to a stable unique identifier and maps all other data sources to this anchor.
Datapoints from multiple external sources mapped to core identifiers
Identifiers stay consistent, and no engineering time is wasted on deduplication pipelines
Infrastructure Built for Scale
The firm processes hundreds of millions of records and updates approximately 200,000 people weekly. Crustdata's infrastructure supports this volume without breaking.
Real-Time API: Person enrichment, company jobs, posts, and reactions on demand
Profile Watcher: Automated monitoring of founded job changes, social posts, and early signals
Low Latency: Sub-2-second response times for enrichment
