Content Detection

CID222 automatically detects sensitive content in all requests. This page documents the detection results format and available endpoints for querying detection logs.

Detection Result Format

Every chat completion response includes a `detections` field listing what was found in the input and the output, and how each match was handled:

{
  "detections": {
    "input": [
      {
        "entity_type": "EMAIL",
        "text": "john@example.com",
        "start": 25,
        "end": 41,
        "confidence": 0.99,
        "action": "MASK",
        "masked_text": "[EMAIL]"
      }
    ],
    "output": []
  }
}
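As an illustration of how the `start`, `end`, and `masked_text` fields relate, the following minimal sketch rebuilds the masked text on the client side. It assumes that offsets are character positions in the original input and that `end` is exclusive; the sample above is consistent with this (41 - 25 equals the length of the matched text), but the source does not state it. The function name `apply_masks` and the sample prompt are hypothetical.

```python
from typing import Any


def apply_masks(text: str, detections: list[dict[str, Any]]) -> str:
    """Replace the span of every MASK detection with its `masked_text`."""
    masked = text
    # Process spans from right to left so that a replacement never shifts
    # the offsets of spans that are still waiting to be processed.
    for d in sorted(detections, key=lambda d: d["start"], reverse=True):
        if d["action"] != "MASK":
            continue
        # Assumption: offsets index characters of the original text and
        # `end` is exclusive. Fail loudly if that does not hold.
        if text[d["start"]:d["end"]] != d["text"]:
            raise ValueError(f"offsets do not match text for {d['entity_type']}")
        masked = masked[:d["start"]] + d["masked_text"] + masked[d["end"]:]
    return masked


# Hypothetical prompt whose email address starts at character 25.
prompt = "Please email the file to john@example.com today"
response = {
    "detections": {
        "input": [
            {
                "entity_type": "EMAIL",
                "text": "john@example.com",
                "start": 25,
                "end": 41,
                "confidence": 0.99,
                "action": "MASK",
                "masked_text": "[EMAIL]",
            }
        ],
        "output": [],
    }
}

masked_prompt = apply_masks(prompt, response["detections"]["input"])
print(masked_prompt)
```

The sketch assumes detections do not overlap; overlapping spans would need to be merged before replacement.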

Supported Entity Types

Type	Description	Examples
`PERSON`	Person names	John Smith, Dr. Jane Doe
`EMAIL`	Email addresses	user@domain.com
`PHONE`	Phone numbers	+1-555-123-4567
`SSN`	Social Security Numbers	123-45-6789
`CREDIT_CARD`	Credit card numbers	4111-1111-1111-1111
`IBAN`	International Bank Account Numbers	DE89370400440532013000
`IP_ADDRESS`	IP addresses	192.168.1.1, 2001:db8::1
`LOCATION`	Physical addresses	123 Main St, New York
`DATE_OF_BIRTH`	Birth dates	01/15/1990
`MEDICAL_ID`	Medical record numbers	MRN-12345678

Detection Actions

Action	Description
`MASK`	Replace the match with a placeholder (e.g., [EMAIL]). The request proceeds with the masked content.
`REJECT`	Block the entire request. Returns a 403 error.
`FLAG`	Log the detection but allow the request to proceed unchanged.
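The table above can be summarised in code. The sketch below is only a client-side model of the three actions; the enum and helper names are hypothetical and not part of the CID222 API.

```python
from enum import Enum


class DetectionAction(Enum):
    """The three action values that appear in the `action` field."""
    MASK = "MASK"
    REJECT = "REJECT"
    FLAG = "FLAG"


def request_proceeds(action: DetectionAction) -> bool:
    # Only REJECT blocks the request (the API answers with a 403 error).
    return action is not DetectionAction.REJECT


def content_is_altered(action: DetectionAction) -> bool:
    # Only MASK changes the content; FLAG logs and passes it through unchanged.
    return action is DetectionAction.MASK


for action in DetectionAction:
    print(action.value, request_proceeds(action), content_is_altered(action))
```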

Safety Categories

Beyond PII, CID222 detects harmful content categories:

Category	Description
`TOXIC_CONTENT`	Profanity, abuse, harassment
`HATE_SPEECH`	Discriminatory content targeting protected groups
`SEXUAL_CONTENT`	Adult or explicit material
`VIOLENCE`	Violent content or threats
`JAILBREAK`	Prompt injection attempts

Query Detection Logs

GET /admin/detections

This endpoint requires admin permissions.

Parameter	Type	Description
`entity_type`	string	Filter by entity type
`action`	string	Filter by action taken
`start_date`	string	Filter from date (ISO 8601)
`end_date`	string	Filter to date (ISO 8601)

Query Detections

curl -X GET "https://api.cid222.ai/admin/detections?entity_type=EMAIL&action=MASK" \
  -H "Authorization: Bearer YOUR_API_KEY"
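A rough Python equivalent of the request above, using only the standard library, might look like the following. The helper names are hypothetical, and because the response schema of this endpoint is not documented here, the sketch only assumes that the body is JSON and returns it as parsed.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.cid222.ai"


def build_detections_url(**filters: str) -> str:
    """Build the /admin/detections URL from the documented query parameters."""
    # urlencode percent-encodes values, e.g. the colons in ISO 8601 timestamps.
    query = urllib.parse.urlencode(filters)
    url = f"{BASE_URL}/admin/detections"
    return f"{url}?{query}" if query else url


def fetch_detections(api_key: str, **filters: str):
    """Send the GET request with a bearer token and return the parsed body."""
    request = urllib.request.Request(
        build_detections_url(**filters),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        # Assumption: the endpoint returns a JSON body.
        return json.load(response)


url = build_detections_url(
    entity_type="EMAIL",
    action="MASK",
    start_date="2024-01-01T00:00:00Z",
    end_date="2024-01-31T23:59:59Z",
)
print(url)
```

Calling `fetch_detections("YOUR_API_KEY", entity_type="EMAIL", action="MASK")` would issue the same request as the curl command; a key without admin permissions should be expected to fail.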

Detection Statistics

GET /admin/detections/stats

Returns aggregate statistics on detections, broken down by entity type and by action, for the reporting period.

Response

{
  "total_detections": 15420,
  "by_entity_type": {
    "EMAIL": 5230,
    "PHONE": 3150,
    "PERSON": 4890,
    "CREDIT_CARD": 2150
  },
  "by_action": {
    "MASK": 14200,
    "REJECT": 820,
    "FLAG": 400
  },
  "period": {
    "start": "2024-01-01T00:00:00Z",
    "end": "2024-01-31T23:59:59Z"
  }
}
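One way to work with this response is to turn the raw counts into percentages. The sketch below uses the sample payload above; the function name `action_shares` is hypothetical. In the sample, both breakdowns add up to `total_detections`, though the source does not guarantee that this always holds.

```python
def action_shares(stats: dict) -> dict[str, float]:
    """Return each action's share of all detections, in percent."""
    total = stats["total_detections"]
    if total == 0:
        # Avoid division by zero for an empty reporting period.
        return {action: 0.0 for action in stats["by_action"]}
    return {
        action: round(100 * count / total, 1)
        for action, count in stats["by_action"].items()
    }


stats = {
    "total_detections": 15420,
    "by_entity_type": {"EMAIL": 5230, "PHONE": 3150, "PERSON": 4890, "CREDIT_CARD": 2150},
    "by_action": {"MASK": 14200, "REJECT": 820, "FLAG": 400},
    "period": {"start": "2024-01-01T00:00:00Z", "end": "2024-01-31T23:59:59Z"},
}

shares = action_shares(stats)
print(shares)
```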

Confidence Scores

Each detection includes a confidence score between 0 and 1:

Above 0.9: High confidence, action applied automatically
0.7 to 0.9: Medium confidence, may require review
Below 0.7: Low confidence, typically flagged only

Confidence thresholds are configurable per filter. See Content Safety Pipeline for details.
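The bands listed above can be expressed as a small classifier. This is a sketch only: the function name is hypothetical, the defaults mirror the values on this page, and because thresholds are configurable per filter they are passed as parameters. Scores of exactly 0.9 and 0.7 are treated as medium, which is one reading of the ranges above.

```python
def confidence_band(score: float, high: float = 0.9, medium: float = 0.7) -> str:
    """Map a confidence score to the band it falls in."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence score must be between 0 and 1")
    if score > high:
        return "high"      # action applied automatically
    if score >= medium:
        return "medium"    # may require review
    return "low"           # typically flagged only


for score in (0.99, 0.9, 0.7, 0.42):
    print(score, confidence_band(score))
```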