NEW

Watcher API Is live! Real-time signals on key people & accounts

Track promotions, job changes, and funding rounds the moment they happen

Data

Delivery Methods

Solutions

Sign in

Case Study

a16z Case Study

How a Top 5 VC by AUM is building an internal deal sourcing tool with Crustdata

Company

Top 5 VC Firm by AUM

use case

Proprietary Founder Sourcing

company overview

This firm is one of the most prominent venture capital firms in the world. With an internal data science and product team dedicated to building proprietary sourcing infrastructure, they've moved beyond off-the-shelf VC platforms to gain a competitive edge in identifying founders before anyone else does.

scale

~200,000 people updated weekly

Company overlay

The Challenge

Impossible to Codify a Unique Investment Thesis

Data Quality & Deduplication Challenges

Freshness & Latency Requirements

The Solution

Raw Data Infrastructure to Build On

Real-Time Automated Enrichment

Reliable Unique Identifiers That Solve Deduplication

Infrastructure Built for Scale

Results & Benefits

Thesis-Driven Sourcing Without Platform Constraints

First-Mover Advantage on Founder Discovery

Engineering Focus Shifted from Data Cleaning to Product Building

The Challenge

The firm was seeing the same founders as everyone else, at the same time. Their out-of-the-box sourcing tools couldn't surface founders who fit their thesis before competitors did, resulting in lower allocation in rounds, missing high ROI investments and losing the ability to build relationships before competitive processes started.

Their data and tech teams set out to build a proprietary founder monitoring system, but existing data infrastructure couldn't support the vision.

Impossible to Codify a Unique Investment Thesis

Standard VC platforms decide which founders surface based on generic criteria and every firm subscribed to the same platform runs the same playbook.

  • The firm's thesis involves signals and combinations that no off-the-shelf platform exposes as filters

    • They wanted to weight founder signals differently by sector — a founder with previous entrepreneurial experience matters more in B2B SaaS while a deep technical background or PhD matters more in deep-tech

    • The sourcing edge they wanted couldn't exist inside a shared platform. If competitors can subscribe to the same alerts, they lose their edge

    • They wanted to be alerted when companies started hiring ex-founders in leadership positions, a pattern they have seen over the years, precedes rapid growth

  • Existing platforms optimized for later-stage companies, not early signals

Data Quality & Deduplication Challenges

Working with multiple data providers meant constant resolution of duplicate records and inconsistent identifiers.

"There's been an extensive process of finding the edge cases and implementing fairly intensive post-processing before we can actually use them in our knowledge graph."

  • No reliable unique identifier across providers

  • Engineering resources drained by data cleaning instead of product building

Freshness & Latency Requirements

The firm's internal tools needed to provide results quickly. But most data providers relied on human-in-the-loop processes that introduced unpredictable latency.

  • Other "real-time" providers actually used manual data collection behind the scenes

  • Latency issues eroded trust in the data and slowed workflows

  • No infrastructure for true just-in-time enrichment at scale

The Solution

Crustdata provided the data infrastructure layer for their proprietary founder monitoring system — a live knowledge graph that updates in real-time without human intervention.

Raw Data Infrastructure to Build On

  • Off-the-shelf VC platforms mean shared edge — competitors using the same platform can replicate the same filters, signals, and alert logic

  • Proprietary data sources like internal documents and partner notes can't be uploaded to a third-party platform but can be integrated into an in-house system built on raw data APIs

  • Internal tooling compounds over time — every new signal, scoring model, and feedback loop from partners makes the system harder to replicate. A SaaS platform's improvements benefit all subscribers equally; internal tooling only benefits the firm

Real-Time Automated Enrichment

Unlike other providers they had evaluated, Crustdata's architecture is fully automated.

"Other providers were relying on basically human-in-the-loop. There is a person involved in the collection of that metadata. Crustdata is automated end-to-end. That's a different approach than I've seen generally followed by other folks in the space."

  • API request triggers live crawlers to fetch information from the web instantly

  • Dedicated crawler resources available for high-priority requests

Reliable Unique Identifiers That Solve Deduplication

Crustdata's knowledge graph anchors every entity to a stable unique identifier and maps all other data sources to this anchor.

  • Datapoints from multiple external sources mapped to core identifiers

  • Identifiers stay consistent, and no engineering time is wasted on deduplication pipelines

Infrastructure Built for Scale

The firm processes hundreds of millions of records and updates approximately 200,000 people weekly. Crustdata's infrastructure supports this volume without breaking.

  • Real-Time API: Person enrichment, company jobs, posts, and reactions on demand

  • Profile Watcher: Automated monitoring of founded job changes, social posts, and early signals

  • Low Latency: Sub-2-second response times for enrichment