Content Detection
CID222 automatically detects sensitive content in all requests. This page documents the detection results format and available endpoints for querying detection logs.
Detection Result Format
Every chat completion includes a detections field showing what was found and how it was handled:
{
  "detections": {
    "input": [
      {
        "entity_type": "EMAIL",
        "text": "john@example.com",
        "start": 25,
        "end": 41,
        "confidence": 0.99,
        "action": "MASK",
        "masked_text": "[EMAIL]"
      }
    ],
    "output": []
  }
}
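As a rough illustration of how the offsets in a detection might be used, the sketch below reapplies MASK detections to the original input text. It assumes that start and end are zero-based character offsets with an exclusive end (which matches the 16-character email spanning 25 to 41 above). The helper name apply_masks and the sample sentence are hypothetical and not part of the CID222 API.

```python
def apply_masks(text, detections):
    """Return text with every MASK detection replaced by its masked_text.

    Assumption: start/end are zero-based character offsets, end exclusive.
    """
    masked = text
    # Work from the rightmost span to the left so earlier offsets stay valid.
    for det in sorted(detections, key=lambda d: d["start"], reverse=True):
        if det["action"] == "MASK":
            masked = masked[:det["start"]] + det["masked_text"] + masked[det["end"]:]
    return masked


# Hypothetical input text, written so the email sits at offsets 25 to 41.
original = "Send the report to me at john@example.com today."

# Same shape as the documented detections field.
response = {
    "detections": {
        "input": [
            {
                "entity_type": "EMAIL",
                "text": "john@example.com",
                "start": 25,
                "end": 41,
                "confidence": 0.99,
                "action": "MASK",
                "masked_text": "[EMAIL]",
            }
        ],
        "output": [],
    }
}

masked = apply_masks(original, response["detections"]["input"])
print(masked)
```

Processing spans from right to left avoids recomputing offsets when a placeholder is shorter or longer than the text it replaces.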
Supported Entity Types
| Type | Description | Examples |
|---|---|---|
| PERSON | Person names | John Smith, Dr. Jane Doe |
| EMAIL | Email addresses | user@domain.com |
| PHONE | Phone numbers | +1-555-123-4567 |
| SSN | Social Security Numbers | 123-45-6789 |
| CREDIT_CARD | Credit card numbers | 4111-1111-1111-1111 |
| IBAN | Bank account numbers | DE89370400440532013000 |
| IP_ADDRESS | IP addresses | 192.168.1.1, 2001:db8::1 |
| LOCATION | Physical addresses | 123 Main St, New York |
| DATE_OF_BIRTH | Birth dates | 01/15/1990 |
| MEDICAL_ID | Medical record numbers | MRN-12345678 |
Detection Actions
| Action | Description |
|---|---|
| MASK | Replace the detected text with a placeholder (e.g., [EMAIL]). The request proceeds with the masked content. |
| REJECT | Block the entire request. Returns a 403 error. |
| FLAG | Log the detection but allow the request to proceed unchanged. |
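The three actions can be summarized in a small client-side helper. This is only a sketch: the function name request_outcome is hypothetical, and the precedence it applies when one request carries several detections (REJECT over MASK over FLAG) is an assumption, not documented CID222 behavior.

```python
def request_outcome(detections):
    """Summarize what happened to a request, given its list of detections.

    Assumed precedence (not confirmed by the documentation):
    any REJECT blocks the request, otherwise any MASK means the content
    was altered, otherwise the request went through unchanged.
    """
    actions = {det["action"] for det in detections}
    if "REJECT" in actions:
        return "rejected"   # whole request blocked (403)
    if "MASK" in actions:
        return "masked"     # request proceeded with placeholders
    return "unchanged"      # only FLAG detections, or none at all


print(request_outcome([{"action": "FLAG"}, {"action": "MASK"}]))
```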
Safety Categories
Beyond PII, CID222 detects harmful content categories:
| Category | Description |
|---|---|
| TOXIC_CONTENT | Profanity, abuse, harassment |
| HATE_SPEECH | Discriminatory content targeting protected groups |
| SEXUAL_CONTENT | Adult or explicit material |
| VIOLENCE | Violent content or threats |
| JAILBREAK | Prompt injection attempts |
Query Detection Logs
GET /admin/detections
This endpoint requires admin permissions.
| Parameter | Type | Description |
|---|---|---|
| entity_type | string | Filter by entity type |
| action | string | Filter by action taken |
| start_date | string | Filter from date (ISO 8601) |
| end_date | string | Filter to date (ISO 8601) |
Query Detections
curl -X GET "https://api.cid222.ai/admin/detections?entity_type=EMAIL&action=MASK" \
  -H "Authorization: Bearer YOUR_API_KEY"
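The same query could be prepared from Python using only the standard library. The sketch below builds the request without sending it. The URL, header, and the four filter names come from this page, while the helper name build_detections_request is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.cid222.ai/admin/detections"
ALLOWED_FILTERS = ("entity_type", "action", "start_date", "end_date")


def build_detections_request(api_key, **filters):
    """Build (but do not send) a GET request for the detection log."""
    unknown = set(filters) - set(ALLOWED_FILTERS)
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    # Keep the documented parameter order and skip filters that were not set.
    params = {k: filters[k] for k in ALLOWED_FILTERS if filters.get(k) is not None}
    url = BASE_URL + ("?" + urlencode(params) if params else "")
    return Request(url, headers={"Authorization": f"Bearer {api_key}"}, method="GET")


req = build_detections_request("YOUR_API_KEY", entity_type="EMAIL", action="MASK")
print(req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it. The response body of this endpoint is not described on this page, so the sketch stops before parsing it.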
Detection Statistics
GET /admin/detections/stats
Get aggregate statistics on detections:
Response
{
  "total_detections": 15420,
  "by_entity_type": {
    "EMAIL": 5230,
    "PHONE": 3150,
    "PERSON": 4890,
    "CREDIT_CARD": 2150
  },
  "by_action": {
    "MASK": 14200,
    "REJECT": 820,
    "FLAG": 400
  },
  "period": {
    "start": "2024-01-01T00:00:00Z",
    "end": "2024-01-31T23:59:59Z"
  }
}
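One way to read such a response is to turn the raw counts into percentages. The sketch below does this for the sample response above. The helper name action_shares is hypothetical, and the check that the per-action counts add up to total_detections holds for this sample but is not a documented guarantee.

```python
stats = {
    "total_detections": 15420,
    "by_entity_type": {"EMAIL": 5230, "PHONE": 3150, "PERSON": 4890, "CREDIT_CARD": 2150},
    "by_action": {"MASK": 14200, "REJECT": 820, "FLAG": 400},
    "period": {"start": "2024-01-01T00:00:00Z", "end": "2024-01-31T23:59:59Z"},
}


def action_shares(stats):
    """Return each action's share of all detections, in percent (1 decimal)."""
    total = stats["total_detections"]
    if total == 0:
        return {}
    return {action: round(100 * count / total, 1)
            for action, count in stats["by_action"].items()}


shares = action_shares(stats)
# In the sample, both breakdowns add up to the reported total.
counts_match = sum(stats["by_action"].values()) == stats["total_detections"]
print(shares, counts_match)
```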
Confidence Scores
Each detection includes a confidence score between 0 and 1:
- > 0.9 — High confidence, action applied automatically
- 0.7 - 0.9 — Medium confidence, may require review
- < 0.7 — Low confidence, typically flagged only
Confidence thresholds are configurable per filter. See Content Safety Pipeline for details.
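The bands listed above can be expressed as a small classifier. This is a sketch only: the function name confidence_band and its high and low parameters are hypothetical, and the default values simply mirror the bands on this page rather than any configured filter threshold.

```python
def confidence_band(score, high=0.9, low=0.7):
    """Map a confidence score in [0, 1] to "high", "medium", or "low".

    Defaults follow the documented bands: > 0.9 high, 0.7 to 0.9 medium,
    < 0.7 low. Actual thresholds are configurable per filter.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if score > high:
        return "high"
    if score >= low:
        return "medium"
    return "low"


print(confidence_band(0.99))
```

With these defaults, the boundary values 0.9 and 0.7 both fall in the medium band, since the high and low bands are defined with strict inequalities.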