PII Detection
CID222 automatically detects and protects Personally Identifiable Information (PII) and Protected Health Information (PHI) across 15+ entity types with configurable handling policies.
Supported Entity Types
Personal Identifiers
| Entity | Description | Detection Method |
|---|---|---|
| PERSON | Full names, including titles | NER |
| EMAIL | Email addresses | Regex + validation |
| PHONE | Phone numbers (international formats) | Regex + libphonenumber |
| SSN | Social Security Numbers (US) | Regex + checksum |
| DATE_OF_BIRTH | Birth dates | NER + context |
| PASSPORT | Passport numbers | Regex by country |
| DRIVER_LICENSE | Driver's license numbers | Regex by state/country |
Financial Data
| Entity | Description | Detection Method |
|---|---|---|
| CREDIT_CARD | Credit/debit card numbers | Regex + Luhn |
| IBAN | International bank accounts | Regex + checksum |
| BANK_ACCOUNT | Domestic bank accounts | Regex + context |
| TAX_ID | Tax identification numbers | Regex by country |
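
The Luhn check listed for CREDIT_CARD can be sketched as follows. This is a minimal illustration, not CID222's implementation: the function name `luhn_valid` is hypothetical, and it assumes that spaces and hyphens are separators to be stripped and that candidates are 12 to 19 digits long.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    # Assumption: spaces and hyphens are separators and can be ignored.
    stripped = candidate.replace(" ", "").replace("-", "")
    if not (stripped.isascii() and stripped.isdigit()):
        return False
    # Assumption: plausible card numbers are 12 to 19 digits long.
    if not 12 <= len(stripped) <= 19:
        return False

    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, char in enumerate(reversed(stripped)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

A regex match that fails this check is most likely a random digit sequence rather than a card number, which is why pairing the two steps reduces false positives.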
Location Data
| Entity | Description | Detection Method |
|---|---|---|
| LOCATION | Physical addresses | NER + patterns |
| IP_ADDRESS | IPv4 and IPv6 addresses | Regex |
| GPS_COORDINATES | Latitude/longitude | Regex + validation |
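
As a rough sketch of what validating IP_ADDRESS and GPS_COORDINATES candidates might involve: the example below swaps in Python's standard `ipaddress` module in place of a regex for IP addresses, and applies a range check after a simplified coordinate regex. The names `is_ip_address`, `find_coordinates`, and `COORD_RE` are hypothetical and are not part of CID222.

```python
import ipaddress
import re

# Hypothetical, simplified pattern for "lat, lon" pairs in decimal degrees.
COORD_RE = re.compile(r"(-?\d{1,3}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")


def is_ip_address(candidate: str) -> bool:
    """Return True if `candidate` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(candidate)
        return True
    except ValueError:
        return False


def find_coordinates(text: str) -> list[tuple[float, float]]:
    """Return (lat, lon) pairs whose values fall within valid ranges."""
    results = []
    for match in COORD_RE.finditer(text):
        lat, lon = float(match.group(1)), float(match.group(2))
        # Validation step: discard number pairs that cannot be coordinates.
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            results.append((lat, lon))
    return results
```

The range check is what separates a coordinate pair from an arbitrary pair of decimals such as measurements or prices.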
Health Data (PHI)
| Entity | Description | Detection Method |
|---|---|---|
| MEDICAL_ID | Medical record numbers | Regex + context |
| HEALTH_CONDITION | Diseases, diagnoses | NER + medical NER |
| MEDICATION | Drug names, dosages | NER + drug database |
Detection Accuracy
CID222 combines pattern matching, checksum validation, and NER to detect entities. Precision, recall, and F1 score by entity type:
| Entity Type | Precision | Recall | F1 Score |
|---|---|---|---|
| | 99.5% | 99.8% | 99.6% |
| CREDIT_CARD | 99.2% | 98.9% | 99.0% |
| PHONE | 97.8% | 96.5% | 97.1% |
| PERSON | 94.2% | 92.8% | 93.5% |
| LOCATION | 91.5% | 89.3% | 90.4% |
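
For readers checking the table, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the precision and recall values of the named rows; the function name `f1_score` and the `ROWS` dictionary are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the table, expressed as fractions.
ROWS = {
    "CREDIT_CARD": (0.992, 0.989),
    "PHONE": (0.978, 0.965),
    "PERSON": (0.942, 0.928),
    "LOCATION": (0.915, 0.893),
}

for entity_type, (precision, recall) in ROWS.items():
    print(f"{entity_type}: {f1_score(precision, recall):.1%}")
```

Because the harmonic mean is pulled toward the lower of the two values, an entity type with strong precision but weak recall still scores a low F1.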
Masking Formats
Detected PII is replaced with type-specific placeholders:
Masking Examples
Original: "Contact john.smith@company.com or call 555-123-4567"
Masked: "Contact [EMAIL] or call [PHONE]"

Original: "My SSN is 123-45-6789 and credit card is 4111-1111-1111-1111"
Masked: "My SSN is [SSN] and credit card is [CREDIT_CARD]"

Original: "Send to John Smith at 123 Main St, New York"
Masked: "Send to [PERSON] at [LOCATION]"
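
The placeholder substitution shown above can be approximated for the regex-detectable types with a few lines of Python. This is a sketch under stated assumptions, not CID222's detection logic: the patterns are deliberately simplified and hypothetical, validation steps such as Luhn are omitted, and NER-based types like PERSON and LOCATION are out of scope.

```python
import re

# Hypothetical, simplified patterns; a production detector would add validation.
# Order matters: more specific patterns run before more general ones.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("CREDIT_CARD", re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
]


def mask(text: str) -> str:
    """Replace each matched span with a type-specific placeholder."""
    for entity_type, pattern in PATTERNS:
        text = pattern.sub(f"[{entity_type}]", text)
    return text
```

Calling `mask` on the first two original strings above yields the corresponding masked strings.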
Custom Patterns
Add custom regex patterns for organization-specific identifiers:
Custom Pattern Example
{
  "name": "Employee ID",
  "entity_type": "EMPLOYEE_ID",
  "pattern": "EMP-[0-9]{6}",
  "action": "MASK",
  "confidence": 0.95
}
Test custom patterns in the dashboard's Filter Testing tool before deploying to production.
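
To sanity-check a pattern locally before trying it in the dashboard, something like the following sketch may help. It assumes that a `MASK` action replaces each match with the bracketed `entity_type`, by analogy with the built-in placeholders; the helper `apply_custom_pattern` is hypothetical, and the `confidence` field is not interpreted here.

```python
import json
import re

# The custom pattern definition from the example above.
CONFIG = json.loads("""
{
  "name": "Employee ID",
  "entity_type": "EMPLOYEE_ID",
  "pattern": "EMP-[0-9]{6}",
  "action": "MASK",
  "confidence": 0.95
}
""")


def apply_custom_pattern(text: str, config: dict) -> str:
    """Mask matches of config["pattern"] with a [ENTITY_TYPE] placeholder."""
    if config["action"] != "MASK":
        return text  # other actions are outside the scope of this sketch
    regex = re.compile(config["pattern"])
    # Assumption: the placeholder is the entity type in square brackets.
    return regex.sub(f"[{config['entity_type']}]", text)
```

Because the pattern has no anchors, `EMP-1234567` would match on its first six digits. Adding `\b` at both ends of the pattern avoids that if it is not the intended behavior.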
Context Awareness
CID222's detection is context-aware to reduce false positives:
- Number sequences: short numbers such as "Call 911" won't trigger phone detection
- Code contexts: IP addresses in code comments may be allowed
- Example data: placeholder addresses such as "example@example.com" can be exempted
- Business context: company names are distinguished from person names
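
Two of these rules can be illustrated as a post-detection filter. The sketch below is an assumption about how such exemptions might be expressed, not a description of CID222 internals: the `Detection` type, the exempt-domain set, and the minimum digit count are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    entity_type: str
    value: str


# Hypothetical exemption rules, chosen for illustration only.
EXEMPT_EMAIL_DOMAINS = {"example.com", "example.org", "example.net"}
MIN_PHONE_DIGITS = 7


def is_false_positive(detection: Detection) -> bool:
    """Return True if a detection should be dropped based on simple context rules."""
    if detection.entity_type == "EMAIL":
        # Example data: addresses on reserved example domains are exempt.
        domain = detection.value.rsplit("@", 1)[-1].lower()
        return domain in EXEMPT_EMAIL_DOMAINS
    if detection.entity_type == "PHONE":
        # Number sequences: very short numbers such as "911" are not phone PII.
        digit_count = sum(ch.isdigit() for ch in detection.value)
        return digit_count < MIN_PHONE_DIGITS
    return False


def filter_detections(detections: list[Detection]) -> list[Detection]:
    """Keep only detections that survive the context rules."""
    return [d for d in detections if not is_false_positive(d)]
```

Running the filter after detection rather than inside each pattern keeps the exemption rules in one place, which makes them easier to review and adjust.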