Why Exa Doesn't Work for Structured People Search

Exa's People Search API returns structured fields, but the search behind them isn't reliable at scale. Here's why, and what to use instead.

Published: May 4, 2026
Written by: Nithish
Reviewed by: Manmohit Grewal
Read time: 7 minutes


Exa's People Search API returns structured fields for name, title, company, and location. For teams building products that need people data via API, the interface is clean and early test results look right. The problems emerge at scale, when natural language queries produce increasingly unreliable results that require manual review, supplemental data sources, and deduplication logic that shouldn't be necessary.

A venture fund we spoke with ran 500 leads through Exa for deal sourcing and found roughly one that was useful. Other teams building on Exa's people search described a similar progression. The first few results looked promising, then the quality dropped sharply as volume increased. Duplicates, irrelevant web content, and profiles that didn't match the search criteria became the norm rather than the exception.

The root cause is how Exa translates natural language queries into search operations over unstructured web content. Without structured filters to constrain results, precision drops as query complexity increases.

Exa Returns Structured Fields, but the Search Behind Them Isn't Reliable

Exa Websets extracts structured data from web pages. When you query for "VP of Engineering at Series B fintech companies," you get back fields like name, title, company, and email. The output looks like a database query response.

The difference is where those fields come from. A structured people database resolves a person's identity across multiple verified sources, tying a profile to a company because employment records, LinkedIn data, and company filings agree on the relationship. Exa's people search runs embeddings over unstructured web content, which means a person can get matched to a company because both appeared in the same article or blog post.

Say you query for "Head of Data Science at Series B fintech companies." A structured database returns people who currently hold that title at companies verified to be Series B fintech. Exa's embedding-based search returns web pages where those terms co-occur, such as a blog post interviewing someone who used to be Head of Data Science, a conference talk where a fintech founder mentioned their data team, or a job listing for a role that hasn't been filled yet. All of these match the embedding and produce structured-looking output fields, but none of them are the person you were looking for.

We heard versions of this from every team that had put Exa's people search into production. The venture fund mentioned above found that most results were articles and web pages that mentioned target profiles in passing, not verified people records. A team building an engineer-ranking product reported that about 30% of results needed careful manual review, and the rest were, in their words, "absolute nonsense." The data looked structured in the API response, but the underlying matches were unreliable.

Duplicates compound the problem. When search pulls from unstructured web sources, the same person can appear multiple times from different pages. One CRE sales platform reported "a lot of duplicates" that turned signal extraction into a manual deduplication exercise. At the scale Exa Websets operates, with a cap of 100 results per search, roughly half the fetched profiles can be redundant after deduplication. That means paying for results you will discard.
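The deduplication burden can be sketched in a few lines. The record shapes below are illustrative, not Exa's actual response format, and real pipelines need fuzzier matching (nicknames, company aliases), but even the naive version shows how many fetched results get thrown away:

```python
def dedupe_people(results):
    """Collapse duplicate person records from web-sourced search results.

    Keys each record on a normalized (name, company) pair and keeps the
    first occurrence; later duplicates are discarded.
    """
    seen = {}
    for record in results:
        key = (record["name"].strip().lower(),
               record["company"].strip().lower())
        seen.setdefault(key, record)
    return list(seen.values())

# Hypothetical raw results: the same person surfaced from two different pages.
raw = [
    {"name": "Jane Doe", "company": "Acme Fintech", "source": "linkedin"},
    {"name": "jane doe", "company": "Acme Fintech ", "source": "blog post"},
    {"name": "John Roe", "company": "Acme Fintech", "source": "news article"},
]
deduped = dedupe_people(raw)  # three fetched results, two distinct people
```

Every record dropped here was still a paid result, which is the tax the CRE platform was describing.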

Entity resolution solves this. A people data API that matches a person to a company because verified records from 10+ sources confirm the relationship, rather than because both names appeared on the same web page, produces results you can trust without manual review. Crustdata's People Discovery API resolves identities across LinkedIn profiles, company registrations, funding records, and employment data so that each result represents a distinct, verified person.

The More Specific Your Search Gets, the Worse Your Results Get

Most search tools improve with specificity, but Exa's people search does the opposite.

Exa takes your natural language query and runs it against web content using semantic embeddings. There are no structured filters for title, seniority, company stage, or geography. You describe what you want in prose, and Exa returns web pages whose content is semantically similar to your description. The output looks like a filtered database response, but nothing was actually filtered.

A product team building AI-powered purchase-intent prediction described the experience with Exa: "you can kind of describe what you want in natural language, but you can't really filter too much. So I end up with a lot of results that I don't really need."

Natural language descriptions can't express precise constraints

When you search for "VP of Engineering at Series B fintech companies in the Northeast," you're expressing five constraints: title seniority, title function, company funding stage, company industry, and geography. A structured filter API evaluates each of these independently and returns only the intersection. Exa's embedding-based search treats the entire description as a single semantic input and returns pages that feel similar to the overall meaning. A news article about fintech funding in the Northeast that mentions a VP of Engineering will match the embedding even though it contains no actionable people record.

The gap widens with each additional qualifier. Every constraint you add to a natural language description increases the semantic surface area instead of narrowing the result set. More concepts in the query means more web pages that partially overlap with some subset of those concepts, so the search returns more loosely related content rather than less.
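The difference between intersection and similarity can be shown concretely. A minimal sketch, with illustrative field names rather than any real API schema: each constraint is evaluated independently, and a candidate must pass all of them.

```python
def matches(person, filters):
    """A structured query is an intersection: every constraint must hold."""
    return all(person.get(field) == value for field, value in filters.items())

# The five constraints in "VP of Engineering at Series B fintech
# companies in the Northeast", as independent filters.
filters = {
    "title_seniority": "vp",
    "title_function": "engineering",
    "company_stage": "series_b",
    "company_industry": "fintech",
    "region": "northeast_us",
}

candidates = [
    # An actual VP of Engineering at a Series B fintech in the Northeast.
    {"title_seniority": "vp", "title_function": "engineering",
     "company_stage": "series_b", "company_industry": "fintech",
     "region": "northeast_us"},
    # A journalist covering Northeast fintech funding: semantically
    # similar to the query, but fails the intersection outright.
    {"title_seniority": None, "title_function": "journalism",
     "company_stage": None, "company_industry": "fintech",
     "region": "northeast_us"},
]
hits = [p for p in candidates if matches(p, filters)]
```

An embedding comparison would score both candidates as close to the query; the intersection admits only the first.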

Title and role variations can't be expressed without filters

A VP of Operations and a Director of Operations occupy the same seniority band but use different title strings. A hardware engineer and an electrical engineer often do the same work. Without structured seniority-level or function filters, there is no way to tell Exa "these titles are equivalent for my search." You either run separate searches for each title variation and merge the results yourself, or accept that your search misses the people who match your actual criteria but use different words for the same role.
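What "these titles are equivalent for my search" looks like as an explicit filter, sketched below. The filter shape is an illustrative assumption, not a documented API parameter:

```python
# One seniority band, two title strings, expressed as a single OR filter.
title_filter = {"any_of": ["vp of operations", "director of operations"]}

def title_matches(person_title, title_filter):
    # Normalize casing and whitespace so the same role isn't split
    # into two mismatched searches.
    return person_title.strip().lower() in title_filter["any_of"]
```

Without a parameter like this, the OR logic moves into your own code: one search per title variant, plus a merge step.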

An LLM alone can't fix this without structured filters

Even Claude or GPT cannot reliably compose people searches from natural language unless structured filters exist underneath. An LLM can understand that "senior GCP infrastructure engineer at a robotics company" should filter by current title containing variations of infrastructure, platform, or cloud engineer, by skills including Google Cloud Platform, by company industry including robotics and automation, and by seniority level at or above senior.

But the LLM needs those filters to exist as composable API parameters. Without them, the LLM generates a natural language prompt that gets matched against web content embeddings, reproducing the same problem. Teams building Claude Code skills or MCP server integrations for people search need a filter-first API for the LLM to target.

Crustdata's People Discovery API exposes 60+ filters with nested boolean logic, including current title, past title, company, seniority level, function, geography, skills, education, and job changes. When an LLM maps a natural language query to these filters, the context stays intact because the filters are semantic constraints, not keywords. "Senior GCP infrastructure engineer at a robotics company in the Midwest" becomes an intersection of five explicit filters rather than a single embedding query against the open web.

Every Exa Workflow Becomes a Three-Tool Pipeline

Even teams that accept Exa's search quality end up bolting on additional tools because the API does not include verified contact data, does not guarantee data freshness, and caps results at 100 per search.

Contact enrichment is an add-on, not part of the core search

Exa Websets can enrich results with email addresses, but this runs as a separate enrichment step on top of the search, costing additional credits and processing time per result. The emails are sourced through web research rather than a verified contact database, so accuracy depends on what Exa's crawlers can find on the open web for each person.

A product team building on Exa for people discovery ran into this. They could find relevant profiles through Exa's search, but the enriched contact data was inconsistent enough that they still needed a separate provider for reliable email addresses. The search itself was already returning too many irrelevant results, and the results that were relevant still required validation from a second source before the team could use them for outreach. They estimated they were spending as much engineering time on the contact data pipeline as they had on the original Exa integration.

No freshness guarantees

Exa's data freshness depends on its web crawl frequency, with no SLA on how recently a profile was verified or updated. If the last crawl of a person's web presence happened three weeks ago, the API returns three-week-old data without indicating its age. A person who changed jobs, got promoted, or moved companies in that window still shows up under their old title and old employer.

One sales platform we spoke with found both Exa and their supplemental data sources returning outdated information, describing the experience as "more like a dev" because they spent time prompting and re-prompting the search to get reasonable results rather than getting clean data back from a single query.

For workflows where data recency matters, such as job changes, promotions, or new hires, teams add CoreSignal or a live scraping layer on top. Each additional data source adds integration complexity, deduplication logic, and cost.

The deduplication tax

When you combine results from Exa, a contact enrichment tool, and a freshness layer, overlapping records need to be reconciled. Exa might return a profile, PDL might return a slightly different version of the same person with a different title, and CoreSignal might show yet another variation with a more recent job change. Deciding which version is correct, merging the fields, and deduplicating across sources requires custom logic that scales linearly with result volume.
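The merge layer those teams maintain reduces to logic like the following. Record shapes, source names, and the recency heuristic are all illustrative; real reconciliation is messier, which is exactly the maintenance cost:

```python
from datetime import date

# Three versions of the same person from a three-vendor pipeline,
# each with a different title and verification date (hypothetical data).
records = [
    {"source": "exa",        "title": "Head of Data", "verified": date(2026, 3, 1)},
    {"source": "pdl",        "title": "Data Lead",    "verified": date(2026, 2, 10)},
    {"source": "coresignal", "title": "VP of Data",   "verified": date(2026, 4, 20)},
]

def reconcile(versions):
    # One plausible policy: trust the most recently verified record
    # when sources conflict on a field.
    return max(versions, key=lambda r: r["verified"])

current = reconcile(records)
```

Every field you care about needs a conflict policy like this one, and the policy has to be revisited whenever a vendor changes its data.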

One recruiting platform supplementing Exa with CoreSignal for LinkedIn profile matching was effectively running two parallel search pipelines and merging the output. The engineering effort to maintain that merge layer was ongoing, not a one-time integration.

This three-tool pattern, where Exa handles discovery, PDL or Apollo handles contact enrichment, and CoreSignal or a live scraper handles freshness, appeared across the teams we spoke with that had started on Exa. What started as a single API integration had become a multi-vendor pipeline with three billing relationships, three failure modes, and deduplication logic gluing them together.

A single people data API that returns structured profiles with contact data and freshness guarantees in one response eliminates the second and third tools entirely. Crustdata's People Enrichment API includes verified business emails, 90+ datapoints per profile, and real-time enrichment from live sources in a single response.

What to Look for in a People Data API

The problems above are not unique to Exa. Any people search tool that relies on NL-to-keyword translation, web extraction without entity resolution, or search-only output without enrichment will produce similar failure modes at scale. Before integrating a people data provider, ask five questions that separate APIs that work in production from APIs that require workarounds.

Does search use structured filters or keyword matching?

Keyword-based search tools convert your query into tags and return anything that matches any of those tags. Structured filter APIs let you compose constraints that intersect, evaluating current title AND company AND geography AND seniority level together as a single query. What matters just as much as the number of filters is whether you can nest and combine them freely. A query like "VP or Director of Engineering at Series B-C companies in healthcare, hired in the last 6 months" requires OR logic across titles, AND logic across company stage and industry, and a date filter on job start, all in the same request. Crustdata's People Discovery API supports 60+ filters with nested boolean logic that lets you mix and match any combination of these constraints, so that query runs as a precise filter intersection rather than a keyword expansion.
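A nested boolean query of that shape can be sketched as a small tree plus an evaluator. Field and operator names here are illustrative, not any vendor's exact schema:

```python
# "VP or Director of Engineering at Series B-C companies in healthcare,
# hired in the last 6 months" as a nested boolean tree.
query = {
    "and": [
        {"or": [
            {"field": "current_title", "contains": "vp of engineering"},
            {"field": "current_title", "contains": "director of engineering"},
        ]},
        {"field": "company_stage", "any_of": ["series_b", "series_c"]},
        {"field": "company_industry", "any_of": ["healthcare"]},
        {"field": "months_in_role", "lte": 6},
    ]
}

def evaluate(node, person):
    """Recursively evaluate a boolean filter tree against one record."""
    if "and" in node:
        return all(evaluate(child, person) for child in node["and"])
    if "or" in node:
        return any(evaluate(child, person) for child in node["or"])
    value = person.get(node["field"])
    if "contains" in node:
        return node["contains"] in (value or "").lower()
    if "any_of" in node:
        return value in node["any_of"]
    if "lte" in node:
        return value is not None and value <= node["lte"]
    return False

match = {
    "current_title": "Director of Engineering",
    "company_stage": "series_c",
    "company_industry": "healthcare",
    "months_in_role": 3,
}
```

A keyword expansion of the same query would flatten this tree into independent terms and lose the AND/OR structure entirely.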

Is the data entity-resolved or web-extracted?

This is the question that predicts whether you will spend time on deduplication. Web-extracted data ties a person to a company because both appeared on the same web page. Entity-resolved data from multiple verified sources ties a person to a company because the relationship is confirmed across those sources. Ask the provider how many sources contribute to a single person record and how conflicts between sources get resolved.

Does it include contact data?

If you need to reach the people you find, the search API should return verified business emails and phone numbers in the same response. Bolting on a separate contact enrichment provider adds cost, latency, and a deduplication step between two record sets that should never have been separate. A people data API that includes contact enrichment in its core response means one integration, one billing model, and no record-matching layer to maintain.

What are the freshness guarantees?

"How fresh is the data?" is the wrong question. Ask how recently a specific profile was verified. Crawl-based indexes reflect whatever the last crawl captured, which could be weeks old with no indication of how old the data is. Real-time enrichment, like Crustdata's people enrichment, queries live sources when you make the request, so the response reflects the person's profile as it exists now rather than when some crawler last visited their web presence.

Can your LLM compose queries against it?

If you are building an AI-powered workflow, your LLM needs to translate user intent into API calls. Structured filters with documented parameter names give the LLM explicit constraints to compose against. When a user asks for "VPs of Engineering at healthcare companies in the Southeast," the LLM can map that to specific filter parameters for title, industry, and region. Free-text search endpoints force the LLM to craft prompts that will be keyword-split by the search tool, reproducing the same precision problems. Crustdata's filter-first API is designed to work with Claude, GPT, and other LLMs through the MCP server or direct API integration, giving the LLM a structured target rather than a free-text input that will lose context in translation.

Where This Leaves You

Exa's API is well-designed and its web search works for general content retrieval. The problems described here are specific to structured people search, where precision, entity resolution, and contact data determine whether the API output is production-ready or requires manual intervention. If your people search workflow involves multiple vendors, manual deduplication, or result quality that degrades with specificity, the architecture underneath the search is the bottleneck.

Try Crustdata's People Search API to see how structured filters handle the queries that keyword-based search cannot, or book a demo to walk through your specific use case with the team.
