How to Resolve User Identity from Sparse Signup Data

The waterfall logic, coverage rates by input type, and confidence-based routing that PLG teams need to turn a personal email into an actionable company record in under two seconds.

Published

May 10, 2026

Written by

Manmohit Grewal

Reviewed by

Abhilash Chowdhary

Most B2B products that accept self-serve signups collect an email, maybe a name, and nothing else. On a call, the growth team at a B2B payments platform processing 3,000 daily merchant signups described the same gap: they capture only an email and a company name, and more than half of their high-value merchants never appear in their enrichment provider's database.

That sparse record feeds a pipeline where sales cannot tell a solo developer from a Fortune 500 procurement lead. This guide covers how to build the enrichment waterfall, what coverage to expect by input type, and how to score and route signups based on match confidence.

What Identity Resolution Means at Signup

Most content on identity resolution describes a marketing concept, covering topics like stitching anonymous website visits into a unified customer profile, building identity graphs across devices, or matching ad impressions to conversions. Those are real problems, but none of them apply when someone fills out your signup form.

Signup identity resolution is narrower. You have a form submission with one to three fields. You need a company record back in under two seconds. The trigger is a user arriving, the latency budget is measured in single-digit seconds, and the output has to be good enough to drive an immediate routing decision, whether that means sending the lead to sales, to self-serve onboarding, or to a manual review queue.

The distinction matters because the constraints are different from other enrichment patterns. CRM enrichment runs nightly on a known set of records, and outbound prospecting enriches lists you chose in advance, while signup enrichment has to resolve an arbitrary email or domain, fill in missing fields, score the result, and write the output back while the user is still on the page.

Why Sparse Input Breaks Standard Enrichment

Enrichment providers are built to work with work emails. Hand them janemarley@starlabs.com, for example, and they can resolve the domain, match the person, and return a full company record with headcount, funding stage, and technology stack. Hand them jane.m@gmail.com and the same provider returns nothing.

Published benchmarks on a 3,200-record independent test show personal email enrichment match rates between 12% and 28%, while work email enrichment typically lands between 80% and 98%. That is roughly a three-to-fourfold drop in match rate when switching from domain-based input to a personal address.

PLG products hit this gap harder than any other B2B motion because they optimize signup forms for conversion. Asking for a work email kills conversion rates. Every field added to a signup form reduces conversions, so most PLG products capture only a name and an email, and the email is personal more often than not.

Coverage Rates by Input Type

The table below shows enrichment coverage across different input combinations. The first two rows are based on published independent benchmarks. Rows three through five are directional estimates based on what PLG teams we spoke with reported from their own enrichment pipelines:

Input Available	Person Match Rate	Company Match Rate	Notes
Work email only	50-75%	90-98% (domain lookup)	Company resolution is near-certain from domain. Person match depends on database coverage of smaller companies.
Personal email only	12-28%	5-15%	Worst case. No domain signal. Provider must cross-reference against people databases.
Name + personal email	30-45%	20-35%	Name disambiguates common emails. Match rate improves with uncommon names.
Name + company (free text)	55-75%	60-80%	Company field enables fuzzy company match, then person lookup within company.
Name + personal email + role or company	50-70%	45-65%	Combined fields lift coverage, but free-text company is noisy (abbreviations, misspellings).

The downstream cost of low coverage is significant. An Outfunnel analysis of 8,300 B2B leads found that leads with enriched firmographic data converted at over 40%, while leads without enrichment data converted at 6-7%. When 60% of your signups arrive with personal emails and your single-provider enrichment resolves 20% of them, you are leaving roughly half of your total signup volume unscored, unrouted, and invisible to sales.
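The "roughly half" figure is simple arithmetic, and it is worth rerunning with your own numbers. The sketch below is illustrative only: the function name is hypothetical, and it counts only the personal-email gap, ignoring any work-email signups that also fail to resolve.

```python
def unresolved_share(personal_email_share: float, personal_match_rate: float) -> float:
    """Fraction of total signups left unenriched because they arrived with a
    personal email that the provider could not resolve.

    Assumption (illustration only): work-email misses are not counted, so the
    result isolates the personal-email gap.
    """
    for value in (personal_email_share, personal_match_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("shares and rates must be between 0 and 1")
    return personal_email_share * (1.0 - personal_match_rate)


# 60% of signups use a personal email; the provider resolves 20% of those.
share = unresolved_share(0.60, 0.20)
print(f"{share:.0%} of total signups remain unresolved")
# prints: 48% of total signups remain unresolved
```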

How to Build a Signup Enrichment Waterfall

The standard advice for improving enrichment coverage is "add more providers." That is correct but incomplete. A signup waterfall differs from an outbound waterfall in three ways: it runs in real time (sub-two-second target for in-app decisions), it handles personal emails as the primary input (not work emails), and it must produce a routing decision, not only a filled CRM record. The architecture has to account for all three.

Step 1: Classify the input before calling any API

Not every signup needs the same enrichment path. Classify the email type first:

Work email (domain differs from gmail.com, outlook.com, yahoo.com, and other freemail providers): Route to domain-based company enrichment. This is the fast path with the highest match rate.
Personal email with a name: Route to reverse email lookup with name as a disambiguation signal.
Personal email, no name: Route to email-only reverse lookup. Expect the lowest match rate, and queue for deferred enrichment.
SSO or social login (GitHub, Google Workspace, Twitter/X): Extract the social profile identifier. If GitHub, the profile often includes employer and location. If Google Workspace, the domain may be a company domain even though the user signed in via OAuth.

This classification step is a few lines of code, but it saves API credits by routing each signup to the enrichment path with the highest expected yield for its input type.
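A minimal sketch of that classifier, in Python, might look like the following. The enum, the function name, and the `sso_provider` argument are hypothetical, and the freemail set contains only the three domains named above; a production list would be much longer.

```python
from enum import Enum
from typing import Optional

# Illustrative subset only: the three freemail domains named in the text.
FREEMAIL_DOMAINS = frozenset({"gmail.com", "outlook.com", "yahoo.com"})


class InputKind(Enum):
    WORK_EMAIL = "work_email"
    PERSONAL_EMAIL_WITH_NAME = "personal_email_with_name"
    PERSONAL_EMAIL_ONLY = "personal_email_only"
    SOCIAL_LOGIN = "social_login"


def classify_signup(email: str,
                    name: Optional[str] = None,
                    sso_provider: Optional[str] = None) -> InputKind:
    """Pick the enrichment path for a signup before any API call is made."""
    domain = email.strip().lower().rpartition("@")[2]
    if "@" not in email or not domain:
        raise ValueError(f"not an email address: {email!r}")
    # A company domain wins even for OAuth signups (for example Google Workspace).
    if domain not in FREEMAIL_DOMAINS:
        return InputKind.WORK_EMAIL
    # Freemail address, but a social profile may still carry an employer.
    if sso_provider:
        return InputKind.SOCIAL_LOGIN
    if name and name.strip():
        return InputKind.PERSONAL_EMAIL_WITH_NAME
    return InputKind.PERSONAL_EMAIL_ONLY


print(classify_signup("janemarley@starlabs.com"))
print(classify_signup("jane.m@gmail.com", name="Jane Marley"))
```

Checking the domain before the SSO flag is a design choice, not a requirement: it sends Google Workspace logins on a company domain down the fast domain-based path.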

Step 2: Run first-pass enrichment

For work emails, call the company enrichment endpoint with the domain, then the people enrichment endpoint with the email:

# Company enrichment from work email domain
curl -X GET "https://api.crustdata.com/screener/company?company_domain=anysphere.dev&fields=headcount,funding_and_investment" \
 -H "Authorization: Token $AUTH_TOKEN"

# People enrichment from email
curl -X GET "https://api.crustdata.com/screener/person?business_email=florian@anysphere.dev" \
 -H "Authorization: Token $AUTH_TOKEN"

For personal emails with a name, call the people enrichment endpoint with first name, last name, and email. The API cross-references these fields against its people database to find the matching professional profile:

curl -X GET "https://api.crustdata.com/screener/person?first_name=Florian&last_name=Martens&personal_email=florian.m@gmail.com" \
 -H "Authorization: Token $AUTH_TOKEN"

If this returns a match with a current employer, you now have a company domain. Run the company enrichment call on that domain to fill in firmographics.
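If you are calling these endpoints from application code rather than curl, the request construction can be sketched as below. The endpoint paths, query parameters, and `Token` header simply mirror the curl calls above; the helper names are hypothetical, and you should confirm parameters against the current API reference before relying on them.

```python
from typing import Dict, Optional, Sequence
from urllib.parse import urlencode

BASE_URL = "https://api.crustdata.com/screener"


def auth_headers(token: str) -> Dict[str, str]:
    """Authorization header in the same form as the curl examples."""
    return {"Authorization": f"Token {token}"}


def company_enrichment_url(
    domain: str,
    fields: Sequence[str] = ("headcount", "funding_and_investment"),
) -> str:
    """URL for domain-based company enrichment."""
    query = urlencode({"company_domain": domain, "fields": ",".join(fields)}, safe=",")
    return f"{BASE_URL}/company?{query}"


def person_enrichment_url(
    email: str,
    is_work_email: bool,
    first_name: Optional[str] = None,
    last_name: Optional[str] = None,
) -> str:
    """URL for people enrichment; names are sent only when provided."""
    params: Dict[str, str] = {}
    if first_name:
        params["first_name"] = first_name
    if last_name:
        params["last_name"] = last_name
    params["business_email" if is_work_email else "personal_email"] = email
    return f"{BASE_URL}/person?{urlencode(params, safe='@')}"


print(company_enrichment_url("anysphere.dev"))
print(person_enrichment_url("florian.m@gmail.com", False, "Florian", "Martens"))
```

Building URLs separately from sending them keeps this step easy to unit test, and lets you plug in whichever HTTP client and timeout policy fits your latency budget.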

MCP path: If you are building with Claude Code or another MCP-compatible agent runtime, you can configure the Crustdata MCP server and call these same enrichment endpoints as tool calls without writing HTTP client code. A Claude Code agent with Crustdata's MCP server configured can run the full classify-enrich-score pipeline in a single conversation turn.

Step 3: Run second-pass enrichment on partial matches

First-pass enrichment often returns a person with no current employer, or a company with no headcount data. Before routing, run a second pass:

If you have a person but no company, check the person's most recent employer from their work history. Enrich that company.
If you have a company but no person, use the company's decision-maker data (returned by the company enrichment endpoint with the decision_makers field) to find the person.
If you have a company domain but firmographic fields are thin (no headcount, no funding data), pass the domain to the company enrichment endpoint with enrich_realtime=true to trigger a live web lookup instead of a database lookup.
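The three rules above reduce to a small planning function. This is a sketch under assumptions: the `FirstPassResult` fields and the action labels are hypothetical placeholders for whatever your first-pass response actually contains, and "thin" is interpreted here as headcount or funding being absent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FirstPassResult:
    # Hypothetical summary of a first-pass response, not a real API schema.
    person_found: bool = False
    company_domain: Optional[str] = None
    most_recent_employer_domain: Optional[str] = None
    headcount: Optional[int] = None
    funding_stage: Optional[str] = None


def plan_second_pass(result: FirstPassResult) -> List[str]:
    """Return the follow-up lookups worth running before routing."""
    actions: List[str] = []
    if result.person_found and not result.company_domain:
        # Person but no company: fall back to the latest employer in work history.
        if result.most_recent_employer_domain:
            actions.append("enrich_most_recent_employer")
    elif result.company_domain and not result.person_found:
        # Company but no person: request the decision_makers field.
        actions.append("lookup_person_via_decision_makers")
    if result.company_domain and (result.headcount is None or result.funding_stage is None):
        # Thin firmographics: retry the company call with enrich_realtime=true.
        actions.append("company_enrich_realtime")
    return actions


partial = FirstPassResult(person_found=True, company_domain="anysphere.dev")
print(plan_second_pass(partial))
# prints: ['company_enrich_realtime']
```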

Step 4: Web search fallback for emerging companies

Some companies are too new, too small, or too niche to appear in any enrichment database. A fintech growth team told us that mainstream enrichment providers covered roughly 40% of their signups because many were emerging startups not yet indexed anywhere. For this long tail, a web search API call using the company name (if provided) or the person's name and email domain can surface a company website, a press mention, or a social profile that gives you enough signal to route the lead.

Web search should run only as a fallback when the first two passes return nothing, and only for the subset of signups where the email domain suggests a real company rather than freemail. It adds 5-10 seconds of latency, which exceeds the two-second in-app budget, so run it asynchronously and apply the result after the initial routing decision. It also returns unstructured results that require parsing.
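The gating rules for the fallback can be sketched as follows. All names and constants are hypothetical; the freemail set lists only the domains named earlier, and the query format is an assumption about what a generic web search API would accept.

```python
from typing import Optional

FREEMAIL_DOMAINS = frozenset({"gmail.com", "outlook.com", "yahoo.com"})  # illustrative subset
IN_APP_BUDGET_SECONDS = 2.0          # target for in-app routing decisions
WEB_SEARCH_MIN_LATENCY_SECONDS = 5.0  # low end of the 5-10 second range


def email_domain(email: str) -> str:
    return email.strip().lower().rpartition("@")[2]


def should_run_web_search(earlier_passes_matched: bool, email: str) -> bool:
    """Fallback only: nothing matched so far and the domain is not freemail."""
    domain = email_domain(email)
    return (
        not earlier_passes_matched
        and "@" in email
        and bool(domain)
        and domain not in FREEMAIL_DOMAINS
    )


def must_run_in_background(expected_latency: float = WEB_SEARCH_MIN_LATENCY_SECONDS) -> bool:
    """True when the lookup cannot fit inside the in-app latency budget."""
    return expected_latency > IN_APP_BUDGET_SECONDS


def build_search_query(email: str,
                       company_name: Optional[str] = None,
                       person_name: Optional[str] = None) -> str:
    """Prefer the company name; otherwise combine person name and email domain."""
    if company_name and company_name.strip():
        return company_name.strip()
    parts = [person_name.strip() if person_name else "", email_domain(email)]
    return " ".join(p for p in parts if p)


print(should_run_web_search(False, "ana@tinyfintech.io"))
print(build_search_query("ana@tinyfintech.io", person_name="Ana Ruiz"))
```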

How to Score and Route Based on Match Confidence

The enrichment response tells you what it found, but it leaves you to determine how much to trust the result. A response that returns a company name but no headcount, no funding data, and no verified person match is a weaker signal than a response with a full firmographic profile and a confirmed person record.

Build a confidence tier and route accordingly:

Confidence Tier	What the enrichment returned	Routing action
Tier 1: Full match	Company identified with headcount, funding, and industry. Person confirmed at company with title and seniority.	Score against ICP. If score passes threshold, route to AE or BDR within minutes.
Tier 2: Partial match	Company identified but firmographic data is thin (no headcount or funding). Person matched but current employer unconfirmed.	Route to self-serve onboarding. Queue for re-enrichment after 7 days (company data may update). Flag for sales if product engagement signals fire.
Tier 3: Zero match	No company resolved. Only the person's name and email are known.	Route to product-led onboarding with no sales touch. Re-enrich after first product action reveals more context (workspace name, team invite domain, billing email).

A GTM engineer rebuilding lead scoring at a B2B infrastructure company described the data dependency plainly: "Funding info is quite frequently missing or wrong. And that's one of our bigger indicators in terms of if we're interested in talking to a company." When a single field is missing, the score changes. Routing on a score built from incomplete data creates false negatives (enterprise accounts routed to self-serve) and false positives (solo developers routed to AEs).

The confidence tier avoids this by separating data completeness from ICP fit. A Tier 2 lead might be a perfect ICP match if the missing headcount data were filled in. Routing it to self-serve with a re-enrichment trigger preserves the opportunity without wasting sales time on a lead that might also be a solo hobbyist.
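One way to keep completeness and fit separate in code is to compute the tier first and consult the ICP score only for Tier 1. The sketch below is hypothetical throughout: the `Enriched` fields, the 0.7 threshold, and the route labels are placeholders, and sending a Tier 1 lead that misses the threshold to self-serve is an assumption the table does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Enriched:
    # Hypothetical summary of the merged enrichment result.
    company_name: Optional[str] = None
    headcount: Optional[int] = None
    funding: Optional[str] = None
    industry: Optional[str] = None
    person_confirmed_at_company: bool = False
    title: Optional[str] = None
    seniority: Optional[str] = None


def confidence_tier(r: Enriched) -> int:
    """1 = full match, 2 = partial match, 3 = zero match (data completeness only)."""
    if r.company_name is None:
        return 3
    full_firmographics = all(v is not None for v in (r.headcount, r.funding, r.industry))
    full_person = (
        r.person_confirmed_at_company and r.title is not None and r.seniority is not None
    )
    return 1 if full_firmographics and full_person else 2


def route(r: Enriched, icp_score: float, icp_threshold: float = 0.7) -> str:
    """ICP fit is consulted only when the data is complete enough to trust it."""
    tier = confidence_tier(r)
    if tier == 1:
        return "sales" if icp_score >= icp_threshold else "self_serve"
    if tier == 2:
        return "self_serve_reenrich_in_7_days"
    return "product_led_no_sales_touch"


thin = Enriched(company_name="Acme", industry="Fintech", person_confirmed_at_company=True)
print(confidence_tier(thin), route(thin, icp_score=0.9))
# prints: 2 self_serve_reenrich_in_7_days
```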

What to Do When Enrichment Returns Nothing

Zero-match signups fall into two categories, and each requires a different response.

Emerging companies not yet indexed. A CIAM platform running a startup program told us they were stitching together ten or more data providers and still could not identify high-potential startups at the point of free-tier signup because the companies had not been indexed by any provider yet. For these, deferred enrichment is the right approach. Store the signup, let the user engage with the product, and re-enrich after 30 days. Emerging companies add employees, publish websites, and appear in funding databases over time.

Personal emails with no company signal. Some users sign up with throwaway or purely personal emails and provide no other identifying information. For these, progressive profiling works better than upfront enrichment. After the user hits a value moment in the product (creates a workspace, invites a teammate, integrates a tool), ask for a company name or work email in context. The ask feels natural when it is tied to a product action the user just completed.
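The two responses can be expressed as one small decision function. This is a sketch with hypothetical names: `has_company_signal` stands for whatever evidence you have that a real but unindexed company exists (a company name, a non-freemail domain), and the event labels stand in for your own product analytics events.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional

REENRICH_AFTER_DAYS = 30
# Hypothetical event names for the value moments described above.
VALUE_MOMENTS = frozenset({"workspace_created", "teammate_invited", "tool_integrated"})


@dataclass
class ZeroMatchPlan:
    reenrich_on: Optional[date]   # when to retry enrichment, if at all
    ask_for_company_now: bool     # whether to show the in-context prompt


def plan_zero_match(signup_date: date,
                    has_company_signal: bool,
                    product_events: Iterable[str]) -> ZeroMatchPlan:
    """Choose deferred enrichment or progressive profiling for a zero-match signup."""
    if has_company_signal:
        # Likely an emerging company: wait for providers to index it.
        return ZeroMatchPlan(signup_date + timedelta(days=REENRICH_AFTER_DAYS), False)
    # No company signal: ask only once the user has reached a value moment.
    hit_value_moment = any(event in VALUE_MOMENTS for event in product_events)
    return ZeroMatchPlan(None, hit_value_moment)


print(plan_zero_match(date(2026, 5, 10), True, []))
print(plan_zero_match(date(2026, 5, 10), False, ["teammate_invited"]))
```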

Start Resolving Signup Identity Today

PLG teams that convert free users into pipeline resolve identity at signup rather than waiting for a human to research each lead. The architecture is a classification step, a multi-pass enrichment waterfall, a confidence tier, and a routing rule, and none of it requires a CDP, an identity graph vendor, or a six-month implementation.

Sign up for Crustdata's free tier (100 credits included) to test the People Enrichment and Company Enrichment APIs against your own signup data. For high-volume signup enrichment (10,000+ signups per month), book a demo to walk through waterfall architecture and pricing at scale.

Manmohit covers real-time data infrastructure and intelligence layers for sales, recruiting, and investment platforms. At Crustdata, he leads engineering, transforming live data changes like job moves, hiring spikes, funding events and more into structured, reliable APIs that product teams can use to build automated workflows.