ATS Integration: How to Enrich Your Own Candidate Database via API

How to fill, govern, and refresh candidate records in your own ATS from an enrichment API, including the schema flags and retention lifecycle that keep you compliant.

Published

Jun 12, 2026

Written by

Manmohit Grewal

Reviewed by

Nithish

The five recruiting-software teams we spoke with all built their own applicant tracking system, and all of them hit the same three problems. They had to fill each candidate record with real data, keep it fresh as people moved jobs, and avoid paying to enrich the same person twice. None of them wanted a vendor's enrichment feature, and none of them were wiring into someone else's ATS. They were calling an enrichment API straight into a database they owned.

That is the version of "ATS integration" this guide is about. It means filling and governing a candidate record in your own system from a candidate enrichment API, which is a different job from connecting to Greenhouse or Lever or switching on a vendor's enrichment feature. The generic mechanics of backfilling a database, choosing write strategies, and setting up watchers are covered in other guides, and this one links to them. Here the focus is the part that is specific to a candidate record, including the compliance design that a recruiting product needs and a sales CRM does not. You can follow along on Crustdata's free tier, which includes 100 credits.

What to fill on a candidate record, and what to confirm

A candidate record is only as useful as the data on it. One team described how their tool reads the open job post, then uses it to fill the candidate's fields from the API. The fields worth filling at intake are the ones a recruiter scores against: skills, employment history, education, the headline, and location. A person enrichment call turns a name or a profile URL into exactly that structured record.

curl -X POST 'https://api.crustdata.com/person/search' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "filters": [
      {"field": "experience.employment_details.current.title", "type": "(.)", "value": "Insurance Advisor"},
      {"field": "location", "type": "in", "value": ["Netherlands"]}
    ],
    "fields": ["basic_profile", "professional_network", "education"],
    "limit": 25
  }'

Each profile comes back with the employment history, the normalized title, education, and the identity fields, which map cleanly onto the columns you already have. The fields array lets you request only the sections you store, so you are not paying to hydrate data you will throw away. The mapping is usually one to one:

Enrichment response	Candidate record column	Auto-fill or confirm
`experience.employment_details`	work_history	auto-fill
`basic_profile.normalized_title`	current_title	auto-fill
`education.schools`	education	auto-fill
`location`	location	auto-fill
headline / summary	headline	auto-fill
profile photo, and any sensitive attribute	photo	confirm

Store the raw enrichment payload alongside the mapped columns. When a field you care about gets added later, you can backfill from what you already pulled instead of paying for a second call.

One field deserves a second thought before you auto-fill it. A profile photo carries bias weight in hiring, so several teams treat it and other sensitive attributes as something a recruiter sees and confirms rather than something the system writes silently. The rule of thumb is to auto-fill the factual career data, and to surface anything that could shape a decision for a human to accept. For the full catalog of fields an enrichment API can return, see our breakdown of what a candidate enrichment API should give you.
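A minimal sketch of that split, assuming the response is a nested JSON object whose paths match the mapping table. The key names `headline` and `profile_photo` are placeholders for illustration, not confirmed response fields, and `map_profile` is a hypothetical helper. It fills the factual columns, holds the photo for a recruiter to accept, and keeps the raw payload for later backfills.

```python
from typing import Any


def get_path(payload: dict[str, Any], dotted: str) -> Any:
    """Walk a dotted path like 'education.schools'; return None if any key is missing."""
    node: Any = payload
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Response path -> candidate column, following the mapping table.
AUTO_FILL = {
    "experience.employment_details": "work_history",
    "basic_profile.normalized_title": "current_title",
    "education.schools": "education",
    "location": "location",
    "headline": "headline",  # assumed key name for the headline / summary field
}

# Fields a recruiter must accept before they are written to the record.
CONFIRM = {"profile_photo": "photo"}  # assumed key name for the photo field


def map_profile(payload: dict[str, Any]) -> dict[str, Any]:
    """Split one enrichment payload into auto-filled columns and pending confirmations."""
    columns = {col: get_path(payload, path) for path, col in AUTO_FILL.items()}
    pending = {
        col: get_path(payload, path)
        for path, col in CONFIRM.items()
        if get_path(payload, path) is not None
    }
    # Keep the raw payload so new columns can be backfilled without a second call.
    return {"columns": columns, "pending_confirmation": pending, "raw_payload": payload}
```

Missing paths come back as None rather than raising, so a sparse profile still produces a writable row.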

The write path: enrich on a human-confirmed add, and the cost it saves

The cheapest enrichment is the one you never run. In a recruiting product you do not enrich every search result, because most of them never become candidates. You enrich at the moment a recruiter confirms a candidate into a pipeline. That single decision about when to write is also the decision that controls your bill.

So the write path looks like this. Run discovery against the in-database search, which is cheap at 0.03 credits per result, and show the recruiter a list. When they click to add someone, enrich that one person, store the record, and flag it as sourced through your tool. Fetch contact data only later, the moment a recruiter actually wants to reach out, because email and phone are billed separately and you should pay for them only when you use them.
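To see why the timing of the write controls the bill, here is a rough cost comparison. The 0.03 credits per search result comes from the paragraph above. The enrichment price is not stated in this guide, so `ENRICH_PRICE` is a placeholder you should replace with the figure for your plan, and the 200 results and 10 confirmed adds are made-up volumes.

```python
def discovery_cost(results_shown: int, search_credits_per_result: float = 0.03) -> float:
    """Credits spent showing search results to the recruiter."""
    return results_shown * search_credits_per_result


def total_cost(results_shown: int, enriched: int, enrich_credits_per_profile: float) -> float:
    """Discovery plus enrichment for the subset that actually gets enriched."""
    return discovery_cost(results_shown) + enriched * enrich_credits_per_profile


# Placeholder price for illustration only; it is NOT a published rate.
ENRICH_PRICE = 1.0

# Hypothetical week: 200 results shown, 10 confirmed into a pipeline.
enrich_everything = total_cost(200, enriched=200, enrich_credits_per_profile=ENRICH_PRICE)
enrich_on_confirm = total_cost(200, enriched=10, enrich_credits_per_profile=ENRICH_PRICE)

print(f"enrich every result:     {enrich_everything:.2f} credits")
print(f"enrich on confirmed add: {enrich_on_confirm:.2f} credits")
```

Whatever the real price is, the enrichment term scales with confirmed adds instead of with results shown, which is the whole saving.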

The teams that ran this at scale added one more thing. One of them put it plainly: they built their own candidate database on the backend so they would not spend tokens twice, and so a repeated search returns the same already-enriched results. The mechanism is the stable identifier. Every record carries a crustdata_person_id, so before you enrich, you check whether you already hold that id and skip the call if you do.

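def enrich_on_add(person_id, db):
    """Enrich a candidate once, at the moment a recruiter confirms the add."""
    # Reuse the stored record if it was enriched in the last 30 days.
    existing = db.get_candidate(person_id)
    if existing and existing.enriched_within_days(30):
        return existing  # already fresh, no second charge

    # `crustdata` and `db` stand for your own API client and data layer.
    profile = crustdata.person_enrich(person_id)   # billed once
    return db.upsert_candidate(person_id, profile, ai_sourced=True)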

Reserve the more expensive real-time profile for the shortlist that needs the freshest data, and never re-enrich a record you are about to delete under your retention rules. That last point is where cost and compliance meet, which is the next section.
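If you want to try the cache-by-id check without an API key, the sketch below runs the same enrich_on_add logic against an in-memory store and a fake fetcher. `Candidate`, `InMemoryDB`, and `fake_fetch` are stand-ins invented for this example, and the fetcher is passed in as a function so you can swap in your real client later.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass
class Candidate:
    person_id: str
    profile: dict
    enriched_at: datetime
    ai_sourced: bool = True

    def enriched_within_days(self, days: int, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        return now - self.enriched_at <= timedelta(days=days)


class InMemoryDB:
    """Stand-in for your candidate table, keyed by the stable person id."""

    def __init__(self) -> None:
        self._rows: dict[str, Candidate] = {}

    def get_candidate(self, person_id: str) -> Optional[Candidate]:
        return self._rows.get(person_id)

    def upsert_candidate(self, person_id: str, profile: dict, ai_sourced: bool) -> Candidate:
        row = Candidate(person_id, profile, datetime.now(timezone.utc), ai_sourced)
        self._rows[person_id] = row
        return row


def enrich_on_add(person_id: str, db: InMemoryDB,
                  fetch_profile: Callable[[str], dict]) -> Candidate:
    existing = db.get_candidate(person_id)
    if existing and existing.enriched_within_days(30):
        return existing  # cache hit: no API call, no charge
    profile = fetch_profile(person_id)  # the only billed step
    return db.upsert_candidate(person_id, profile, ai_sourced=True)


# Count calls to a fake fetcher to show the second add is served from the cache.
calls: list[str] = []


def fake_fetch(person_id: str) -> dict:
    calls.append(person_id)
    return {"crustdata_person_id": person_id, "headline": "example"}


db = InMemoryDB()
enrich_on_add("p-123", db, fake_fetch)
enrich_on_add("p-123", db, fake_fetch)
print(len(calls))  # 1
```

The 30-day window is the one from the snippet above; a purge check against your retention timestamps could sit in front of the fetch in the same way.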

Governing it: compliance by design

This is the part no vendor page and no AI-written article can reproduce, because it only exists once a real team builds it under real rules. One team operating in the Netherlands described how European law turned into columns in their database. Recruitment AI is treated as high-risk under the EU AI Act, and a candidate has the right under GDPR Article 22 not to be subject to a decision based solely on automated processing. For a builder, that translates into three concrete things. None of this is legal advice, so check your own counsel, but the engineering pattern is clear.

Prove a human made the call: the AI can find and rank candidates, but a person has to confirm each one into the pipeline, and you have to be able to show it. That is two columns, confirmed_by_user_id and confirmed_at, written when the recruiter clicks. They are your evidence that a human, not the model, made the decision.

Store the AI's output as a score the recruiter reads: the team had a hard rule that the AI may not say a candidate is not good enough. It can surface and it can score, but the negative decision belongs to a human. So you keep the model's verdict in a match_score column, a number the recruiter reads, and you never let it write a rejected status. A score is a suggestion, and a status is a decision, and keeping them apart is what keeps you out of solely-automated territory.

Run retention as a scheduled delete job: you cannot hold candidate data forever. The team's rule was that a sourced person who is not contacted within a few weeks is purged, and contact data for someone you did reach is time-boxed to about a year. In the schema that is a sourced_at and a contacted_at timestamp, plus a scheduled job that deletes anything past those windows. Retention stops being a promise and becomes a cron job.
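As a rough sketch of how those three rules could land in a schema, here is a SQLite table with the columns named above and a purge function a scheduler could call. The windows of 28 and 365 days are assumptions standing in for "a few weeks" and "about a year", the `email` column is a placeholder for whatever contact data you store, and clearing the contact column rather than deleting the row is one possible reading of the time-box, not the team's confirmed design.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE candidates (
    crustdata_person_id  TEXT PRIMARY KEY,
    ai_sourced           INTEGER NOT NULL DEFAULT 1,
    match_score          REAL,   -- AI suggestion, read by the recruiter
    confirmed_by_user_id TEXT,   -- who confirmed the add
    confirmed_at         TEXT,   -- when they confirmed it (ISO 8601, UTC)
    sourced_at           TEXT NOT NULL,
    contacted_at         TEXT,
    email                TEXT    -- placeholder contact column
);
"""

# Hypothetical windows; set these from your own retention policy.
UNCONTACTED_DAYS = 28
CONTACT_DATA_DAYS = 365


def iso(dt: datetime) -> str:
    """Fixed-width UTC timestamp so string comparison matches time order."""
    return dt.astimezone(timezone.utc).isoformat(timespec="seconds")


def purge_expired(conn: sqlite3.Connection, now: datetime) -> tuple[int, int]:
    """Delete stale uncontacted rows, then clear contact data past its window."""
    uncontacted_cutoff = iso(now - timedelta(days=UNCONTACTED_DAYS))
    contact_cutoff = iso(now - timedelta(days=CONTACT_DATA_DAYS))
    deleted = conn.execute(
        "DELETE FROM candidates WHERE contacted_at IS NULL AND sourced_at < ?",
        (uncontacted_cutoff,),
    ).rowcount
    cleared = conn.execute(
        "UPDATE candidates SET email = NULL "
        "WHERE email IS NOT NULL AND contacted_at < ?",
        (contact_cutoff,),
    ).rowcount
    conn.commit()
    return deleted, cleared
```

Calling `purge_expired(conn, datetime.now(timezone.utc))` from a daily cron entry or task scheduler is enough; the function returns the counts so the job can log what it removed.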

These flags also feed the explainability the law asks for. When you can show who confirmed a candidate, when, and what score the AI gave, you can answer a candidate's request to understand how they were handled.
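One way to keep score and status apart in application code is to give the model a write path that can only touch match_score, and to require a named recruiter for every status change. The sketch below is illustrative: the status values, the `CandidateRecord` class, and the function names are assumptions, not part of any team's actual system. The `explain` helper shows how the stored flags could answer a candidate's request.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CandidateRecord:
    crustdata_person_id: str
    match_score: Optional[float] = None        # written by the model
    status: str = "suggested"                  # changed only by a human action
    confirmed_by_user_id: Optional[str] = None
    confirmed_at: Optional[datetime] = None


def record_ai_score(rec: CandidateRecord, score: float) -> None:
    """The model's only write path: it sets the score and nothing else."""
    rec.match_score = score


def set_status(rec: CandidateRecord, status: str, user_id: Optional[str]) -> None:
    """Status changes require a named recruiter; there is no system caller."""
    if not user_id:
        raise PermissionError("status changes must be made by a recruiter")
    rec.status = status
    if status == "in_pipeline":
        # Evidence that a person, not the model, confirmed the add.
        rec.confirmed_by_user_id = user_id
        rec.confirmed_at = datetime.now(timezone.utc)


def explain(rec: CandidateRecord) -> dict:
    """Summarize how a candidate was handled, from the stored flags."""
    return {
        "ai_score": rec.match_score,
        "status": rec.status,
        "confirmed_by": rec.confirmed_by_user_id,
        "confirmed_at": rec.confirmed_at.isoformat() if rec.confirmed_at else None,
    }
```

Because `record_ai_score` never touches `status`, a low score on its own cannot reject anyone; a recruiter has to call `set_status` with their own id.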

Keeping the record fresh

A candidate database goes out of date the moment you build it, because people change jobs and locations. The teams that cared about this did not re-pull every record on a schedule, which is slow and expensive. They watched for the change. A watcher on your existing talent tells you when someone shifts roles or signals they are open, and only then do you refresh that record and re-match them to a live vacancy. One team wanted exactly this so recruiters would see a change on a candidate they already had, instead of forgetting that person and starting a fresh search.

The point is to update on the delta instead of starting over. Refresh by last-verified date rather than on a blind timer, so an active candidate in an open pipeline gets checked often while a dormant record sits untouched until something moves. The watcher and webhook mechanics are their own topic, and our guide to the best candidate enrichment APIs covers where monitoring fits in the stack.
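A small sketch of that refresh rule, assuming your watcher sets a flag on the record when it reports a change. The `TrackedCandidate` fields and the seven-day re-check interval are hypothetical choices for illustration; the watcher and webhook payloads themselves are out of scope here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TrackedCandidate:
    crustdata_person_id: str
    last_verified_at: datetime
    in_open_pipeline: bool
    change_signal: bool = False  # set when a watcher reports a job change or open-to-work flag


# Hypothetical re-check interval for candidates in an open pipeline.
ACTIVE_RECHECK = timedelta(days=7)


def needs_refresh(c: TrackedCandidate, now: datetime) -> bool:
    """Refresh on a watcher signal, or when an active record's last check is stale."""
    if c.change_signal:
        return True
    if c.in_open_pipeline:
        return now - c.last_verified_at > ACTIVE_RECHECK
    return False  # dormant and unchanged: leave it alone


def refresh_queue(candidates: list[TrackedCandidate], now: datetime) -> list[str]:
    """Ids to re-enrich, oldest verification first."""
    due = [c for c in candidates if needs_refresh(c, now)]
    due.sort(key=lambda c: c.last_verified_at)
    return [c.crustdata_person_id for c in due]
```

A dormant record with no signal never enters the queue, no matter how old its last check is, so spend stays tied to records a recruiter is working or that have actually changed.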

Conclusion

An ATS is only as good as the data inside it, and the teams getting this right treat enrichment as an architecture rather than a feature. Three things carry most of the value:

Enrich at the confirmed add, not on every search result, and cache by crustdata_person_id so you never pay twice.
Govern the record with real flags: a confirmed_by_user_id, a match_score that is a score and not a status, and a retention job driven by two timestamps.
Refresh on the delta. Watch existing talent and re-enrich only what actually changed.

This week, map your candidate schema to the enrichment response and decide which fields auto-fill and which a human confirms. This month, add the ai_sourced and confirmation flags and the retention job, so compliance is built in rather than bolted on later. You can do the same for the company records your recruiters work against, and the enrichment API guide shows how the company side fits. Crustdata is the data layer for teams building their own recruiting product, and you can wire it in on the free tier with 100 credits or book a demo to walk through the architecture.

Frequently asked questions

What fields can I fill on a candidate record from an enrichment API? Skills, employment history, education, normalized title, headline, and location come back as structured data from a person enrichment call. Contact data like email and phone is fetched separately and billed per data point, so you request it only when a recruiter is ready to reach out.

How is enrichment different from an ATS integration? An ATS integration usually means connecting to an external system like Greenhouse or Lever to read and write records. Enrichment means filling the records in your own database with fresh external data. This guide is about the second one, building enrichment into an ATS you own.

Do I need Greenhouse or Lever, or can I enrich my own database? You can enrich your own database directly. Each of the teams we spoke with called the enrichment API straight into a database they built, with no third-party ATS in the path. You only need a webhook bridge if you are enriching records that live in someone else's system.

How do I stay compliant when enriching candidates automatically in the EU? Keep a human in the loop on every add, store the AI's output as a score rather than a decision, and run a retention job that purges data on a schedule. Recruitment AI is high-risk under the EU AI Act, and GDPR Article 22 gives candidates the right not to be judged by a solely automated process. Confirm the specifics with your own counsel.

How do I avoid paying to enrich the same person twice? Cache on the stable identifier. Store the crustdata_person_id on each record, check for it before you enrich, and skip the call if you already hold a recent copy. Use cheap in-database search for discovery and reserve real-time enrichment for the shortlist.

When should I refresh a candidate record? Refresh by activity, not on a fixed timer. A candidate sitting in an open pipeline is worth checking often, while a dormant record can wait until a signal moves, such as a job change or an open-to-work flag. Watching for that change is cheaper than re-pulling every profile on a schedule, and it keeps your spend tied to records a recruiter is actually working.

Manmohit covers real-time data infrastructure and intelligence layers for sales, recruiting, and investment platforms. At Crustdata, he leads engineering, transforming live data changes like job moves, hiring spikes, funding events and more into structured, reliable APIs that product teams can use to build automated workflows.