c1v-id

Identity resolution for AI applications

AI agents that interact with customers, CRMs, or any system of record face a critical decision point: is this person already in our system, or should we create a new record? Because both input and existing data are often messy, agents can confuse customer records, pollute data with duplicates, or deliver poor customer experiences.

c1v-id is an open-source identity resolution library that sits between the agent and the system of record and answers identity queries in milliseconds. It uses probabilistic record linkage with blocking strategies (roughly O(n) comparisons instead of the naive O(n²)), weighted multi-field scoring, and transitive clustering. It is designed as a drop-in component for LangChain agents, n8n workflows, and RAG pipelines, has zero ML dependencies, and supports configurable survivorship rules.

Use Cases

AI Agents: Check if a customer exists before creating a new record
CRM Deduplication: Merge duplicate contacts from multiple sources
Lead Routing: Match incoming leads to existing opportunities
Customer Support: Find customer context across fragmented records
Data Migration: Deduplicate when merging systems

vs. Enterprise CDPs (Segment, mParticle)

	c1v-id	Enterprise CDP
Cost	Free	$100K+/year
Data Location	Your infrastructure	Their cloud
Customization	Full control	Limited
Integration	Any Python app	Vendor lock-in

Enterprise CDPs solve identity as part of a larger platform. c1v-id gives you just the identity resolution piece to embed anywhere.

Core Concepts

Concept	What It Does	Why It Matters
Normalization	Cleans emails, phones, names	`[email protected]` → `[email protected]`
Blocking	Groups likely matches	Reduces pairwise comparisons from O(n²) to ~O(n)
Scoring	Calculates similarity	Weighted fuzzy matching across fields
Clustering	Groups transitive matches	If A≈B and B≈C, then A, B, and C end up in the same cluster
Golden Records	Merges duplicates	Best value wins per survivorship rules

Installation

pip install c1v-id

Quick Start

Resolve duplicates in 10 lines of Python:

from c1v_id import IdentityResolver

resolver = IdentityResolver()

records = [
    {"email": "[email protected]", "name": "John Doe", "phone": "555-1234"},
    {"email": "[email protected]", "name": "J. Doe", "phone": "555-1234"},
    {"email": "[email protected]", "name": "Jane Smith"},
]

golden = resolver.resolve(records)
print(f"Input: {len(records)} records → Output: {len(golden)} golden records")
# Input: 3 records → Output: 2 golden records

Match Two Records

result = resolver.match(
    {"email": "[email protected]", "name": "John"},
    {"email": "[email protected]", "name": "Johnny"}
)

print(result.score)       # 0.97
print(result.decision)    # 'auto_merge'
print(result.matched_on)  # ['email', 'name']
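
To make the idea of weighted multi-field scoring concrete, here is a minimal standard-library sketch. It is not c1v-id's implementation: `weighted_score`, `field_similarity`, the `WEIGHTS` values, the `field_floor` cutoff, and the sample addresses are all hypothetical, and `difflib` stands in for whatever fuzzy matcher the library uses. It assumes fields missing on either side are skipped and the remaining weights are renormalized.

```python
from difflib import SequenceMatcher

# Hypothetical weights for illustration only (not c1v-id defaults).
WEIGHTS = {"email": 0.6, "phone": 0.3, "name": 0.1}


def field_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] using difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_score(left: dict, right: dict, weights=WEIGHTS, field_floor=0.75):
    """Return (score, matched_on) for two records.

    Assumption: a field missing on either side is skipped and the
    remaining weights are renormalized so the score stays in [0, 1].
    """
    total = 0.0
    weight_sum = 0.0
    matched_on = []
    for field, weight in weights.items():
        if not left.get(field) or not right.get(field):
            continue
        sim = field_similarity(left[field], right[field])
        total += weight * sim
        weight_sum += weight
        if sim >= field_floor:
            matched_on.append(field)
    score = total / weight_sum if weight_sum else 0.0
    return score, matched_on


# Made-up sample records: identical email, similar name, no phone.
score, matched_on = weighted_score(
    {"email": "jdoe@example.com", "name": "John"},
    {"email": "jdoe@example.com", "name": "Johnny"},
)
print(round(score, 2))  # 0.97
print(matched_on)       # ['email', 'name']
```

Renormalizing over the fields that are present keeps a record pair from being penalized just because one source never collected a phone number. Whether the library does the same is not stated here.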

Find Matches in Existing Data

incoming = {"email": "[email protected]", "name": "John"}
existing = [
    {"id": "1", "email": "[email protected]", "name": "John Doe"},
    {"id": "2", "email": "[email protected]", "name": "Jane Doe"},
]

matches = resolver.find_matches(incoming, existing)
# Returns best matches sorted by score

Custom Configuration

from c1v_id import IdentityResolver, ResolverConfig, Thresholds, Weights

config = ResolverConfig(
    thresholds=Thresholds(auto_merge=0.95, needs_review=0.8),
    weights=Weights(email=0.6, phone=0.3, name=0.1, address=0.0),
)

resolver = IdentityResolver(config=config)
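
The two thresholds above presumably split scores into three bands. The sketch below shows one plausible reading, using only plain Python. The `decide` and `next_action` helpers, the `needs_review` and `no_match` labels, and the inclusive comparisons are assumptions made for illustration; only the `auto_merge` label appears in the examples above.

```python
def decide(score: float, auto_merge: float = 0.95, needs_review: float = 0.8) -> str:
    """Map a similarity score to a decision band (assumed inclusive bounds)."""
    if score >= auto_merge:
        return "auto_merge"
    if score >= needs_review:
        return "needs_review"
    return "no_match"  # hypothetical label for scores below both thresholds


def next_action(decision: str) -> str:
    """What an agent might do with each decision (illustrative only)."""
    return {
        "auto_merge": "update the existing record",
        "needs_review": "queue the pair for human review",
        "no_match": "create a new record",
    }[decision]


for s in (0.97, 0.86, 0.40):
    print(s, decide(s), "->", next_action(decide(s)))
```

Raising `auto_merge` trades fewer wrong merges for more manual review. Lowering `needs_review` catches more borderline duplicates at the cost of a longer review queue.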

Why c1v-id?

vs. Splink

	c1v-id	Splink
Hello World	10 lines	50+ lines
Target	AI builders	Data analysts
Setup	`pip install`	Spark/DuckDB config
ML Required	No	Optional
Use Case	Real-time matching	Batch analytics

Splink is powerful for large-scale data linkage projects with dedicated analysts. c1v-id is for developers who need identity resolution as a feature, not a project.

vs. dedupe

	c1v-id	dedupe
Maintenance	Active	Stale (2+ years)
Dependencies	3 (pandas, rapidfuzz, pyyaml)	10+
Learning Curve	Minimal	Requires training data
API Style	`resolve(records)`	Iterative labeling

dedupe requires interactive labeling to train a model. c1v-id works out of the box with sensible defaults.

Low-Level API

For custom pipelines, use the building blocks directly:

Normalization

from c1v_id import norm_email, norm_phone, norm_name

norm_email("[email protected]")  # '[email protected]'
norm_phone("(555) 123-4567")          # '5551234567'
norm_name("  JOHN   DOE  ")           # 'john doe'
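
For readers who want a feel for what this kind of normalization involves, here is a rough standard-library sketch that reproduces the phone and name outputs shown above. The `sketch_norm_phone` and `sketch_norm_name` helpers are hypothetical stand-ins, not the library's code, and they ignore cases such as country codes or name punctuation. Email rules are left out because they are not spelled out here.

```python
import re


def sketch_norm_phone(raw: str) -> str:
    """Keep digits only (assumed behavior; no country-code handling)."""
    return re.sub(r"\D", "", raw)


def sketch_norm_name(raw: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace."""
    return " ".join(raw.lower().split())


print(sketch_norm_phone("(555) 123-4567"))  # 5551234567
print(sketch_norm_name("  JOHN   DOE  "))   # john doe
```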

Blocking

from c1v_id import email_domain_last4, phone_last7, make_blocks

email_domain_last4("[email protected]")  # 'gmail.com|john'
phone_last7("555-123-4567")           # '1234567'

# df is presumably a pandas DataFrame of normalized records (not defined in this snippet)
blocks = make_blocks(df, ["email_domain_last4", "phone_last7"])
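
To see why blocking cuts the number of comparisons, the following standalone sketch groups made-up records by the last seven phone digits and only pairs records inside the same block. `phone_last7_key` is a hypothetical re-creation of the idea behind `phone_last7`, not the library function, and the sample data is invented.

```python
from collections import defaultdict
from itertools import combinations


def phone_last7_key(phone: str) -> str:
    """Blocking key: last seven digits of the phone number."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    return digits[-7:]


records = [
    {"id": 1, "phone": "555-123-4567"},
    {"id": 2, "phone": "(555) 123-4567"},
    {"id": 3, "phone": "555-987-6543"},
    {"id": 4, "phone": "1-555-987-6543"},
    {"id": 5, "phone": "555-000-1111"},
]

# Group record ids by blocking key.
blocks = defaultdict(list)
for rec in records:
    blocks[phone_last7_key(rec["phone"])].append(rec["id"])

# Only records sharing a block become candidate pairs.
candidate_pairs = [
    pair
    for ids in blocks.values()
    for pair in combinations(ids, 2)
]

naive_pairs = len(records) * (len(records) - 1) // 2
print(naive_pairs)      # 10
print(candidate_pairs)  # [(1, 2), (3, 4)]
```

With five records, the naive approach scores 10 pairs while blocking leaves 2. The saving grows with the dataset as long as blocks stay small, which is why several complementary keys are usually combined.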

Clustering

from c1v_id import UnionFind

uf = UnionFind([1, 2, 3, 4, 5])
uf.union(1, 2)
uf.union(2, 3)
uf.find(1) == uf.find(3)  # True (transitive)
uf.get_clusters()         # {1: [1, 2, 3], 4: [4], 5: [5]}
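
For reference, a disjoint-set structure like the one used above can be sketched in a few lines. `TinyUnionFind` is a hypothetical illustration, not the library's `UnionFind`. It uses path halving and, as a simplifying assumption, always keeps the smaller id as the cluster root, so items must be mutually comparable.

```python
class TinyUnionFind:
    """Minimal disjoint-set sketch for transitive clustering."""

    def __init__(self, items):
        # Every item starts as the root of its own cluster.
        self.parent = {item: item for item in items}

    def find(self, item):
        # Walk to the root, shortening the path as we go (path halving).
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Keep the smaller id as root so cluster labels are predictable.
            low, high = sorted((root_a, root_b))
            self.parent[high] = low

    def clusters(self):
        groups = {}
        for item in self.parent:
            groups.setdefault(self.find(item), []).append(item)
        return groups


tiny = TinyUnionFind([1, 2, 3, 4, 5])
tiny.union(1, 2)
tiny.union(2, 3)
print(tiny.find(1) == tiny.find(3))  # True
print(tiny.clusters())               # {1: [1, 2, 3], 4: [4], 5: [5]}
```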

Golden Records

from c1v_id import build_golden_records, SurvivorshipRule

rules = {
    "email": SurvivorshipRule.MOST_RECENT,
    "address": SurvivorshipRule.LONGEST,
    "first": SurvivorshipRule.FIRST_NON_NULL,
}

# df and clusters come from the earlier normalization and clustering steps (not shown here)
golden = build_golden_records(df, clusters, rules, source_priority=["crm", "web"])
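
The rule names suggest per-field strategies for picking the surviving value. The sketch below shows one plausible interpretation on a single made-up cluster, using plain Python. The helper functions, the `updated` timestamp field, and the sample rows are assumptions, and `source_priority` is not modeled.

```python
from datetime import date

# One cluster of duplicate rows (invented data).
cluster = [
    {"email": "old@example.com", "address": "1 Main St",
     "first": None, "updated": date(2023, 1, 5)},
    {"email": "new@example.com", "address": "1 Main Street, Apt 4",
     "first": "John", "updated": date(2024, 6, 1)},
]


def most_recent(rows, field, ts_field="updated"):
    """Value from the row with the latest timestamp (hypothetical ts_field)."""
    candidates = [r for r in rows if r.get(field) is not None]
    return max(candidates, key=lambda r: r[ts_field])[field] if candidates else None


def longest(rows, field):
    """Longest non-null value, on the theory that it carries the most detail."""
    values = [r[field] for r in rows if r.get(field) is not None]
    return max(values, key=len) if values else None


def first_non_null(rows, field):
    """First non-null value in row order."""
    return next((r[field] for r in rows if r.get(field) is not None), None)


sketch_rules = {"email": most_recent, "address": longest, "first": first_non_null}
golden_row = {field: rule(cluster, field) for field, rule in sketch_rules.items()}
print(golden_row)
# {'email': 'new@example.com', 'address': '1 Main Street, Apt 4', 'first': 'John'}
```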

License

MIT
