Definition: Identifies spans of personally identifiable information (PII) in a sample, covering both its input and its output. The current model detects the following precisely defined categories:

  • Account Information: account numbers, BIC and IBAN.

  • Address: must contain at least a street name and number, and may contain extra elements such as city, zip code, state, etc.

  • Credit Card: credit card number, CVV and expiration date.

  • Date of Birth: must contain a day, month and year.

  • Email.

  • Name: must contain first and last name (and optionally middle name).

  • Network Information: IPv4, IPv6 and MAC addresses.

  • Personal Identification: personal IDs not covered by the other categories, in particular PIN, IMEI, VIN, VRM and driver's license numbers.

  • Password.

  • Phone Number.

  • Social Security Number (SSN).

  • Username.
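Some categories, such as Network Information, are defined purely by format. The sketch below illustrates what that category covers using a rule-based check built on Python's standard `ipaddress` and `re` modules. It is not how the metric works: the metric uses a language model, and the helper name `network_info_kind` is hypothetical. Context-dependent categories such as Name or Address cannot be captured this way.

```python
import ipaddress
import re
from typing import Optional

# MAC address: six hex pairs, separated consistently by ':' or '-'.
_MAC_RE = re.compile(r"[0-9A-Fa-f]{2}([:-])(?:[0-9A-Fa-f]{2}\1){4}[0-9A-Fa-f]{2}")


def network_info_kind(token: str) -> Optional[str]:
    """Classify one token as 'IPv4', 'IPv6', 'MAC', or None.

    Hypothetical, rule-based illustration only; the PII metric itself
    relies on a trained model rather than on rules like these.
    """
    try:
        addr = ipaddress.ip_address(token)  # raises ValueError if not an IP
    except ValueError:
        addr = None
    if addr is not None:
        return "IPv4" if addr.version == 4 else "IPv6"
    if _MAC_RE.fullmatch(token):
        return "MAC"
    return None


for t in ["192.168.0.1", "2001:db8::1", "00:1A:2B:3C:4D:5E", "hello"]:
    print(t, "->", network_info_kind(t))
```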

Calculation: We use a Small Language Model (SLM) trained on proprietary datasets.

Usefulness: Automatically identify PII occurrences in any part of the workflow (user input, chains, model output, etc.) and respond accordingly, for example by implementing guardrails or other preventive measures.
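As an illustration of such a guardrail, the sketch below assumes detections are available as character-offset spans carrying a category and a confidence score. The `PiiSpan` structure, the `redact` and `should_block` helpers, and the 0.5 threshold are all hypothetical, not the product's actual output format or API. Spans at or above the threshold are replaced with a category placeholder, and a text can be blocked outright if it contains a category you never want to pass through.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class PiiSpan:
    """Hypothetical shape of one detection (not the product's real format)."""
    start: int         # inclusive character offset
    end: int           # exclusive character offset
    category: str      # e.g. "Email", "Phone Number"
    confidence: float  # model confidence in [0, 1]


def redact(text: str, spans: List[PiiSpan], threshold: float = 0.5) -> str:
    """Replace each span with confidence >= threshold by a placeholder like [EMAIL]."""
    kept = sorted((s for s in spans if s.confidence >= threshold), key=lambda s: s.start)
    out, cursor = [], 0
    for s in kept:
        if s.start < cursor:  # skip spans overlapping one already redacted
            continue
        out.append(text[cursor:s.start])
        out.append("[" + s.category.upper().replace(" ", "_") + "]")
        cursor = s.end
    out.append(text[cursor:])
    return "".join(out)


def should_block(spans: Iterable[PiiSpan], blocked: Set[str], threshold: float = 0.5) -> bool:
    """True if any confident detection falls in a blocked category."""
    return any(s.category in blocked and s.confidence >= threshold for s in spans)


text = "Contact jane@example.com or call 555-0100."
spans = [PiiSpan(8, 24, "Email", 0.99), PiiSpan(33, 41, "Phone Number", 0.4)]
print(redact(text, spans))  # only the confident Email span is redacted
print(should_block(spans, {"Password", "Credit Card"}))
```

Keeping the threshold and the blocked categories as parameters lets the same detections drive different policies, such as redacting in logs while blocking responses that contain passwords.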

Explainability: To highlight which parts of the text were detected as PII, click the icon next to the PII metric value. The detected PII type and the model's confidence are then shown on the input or output text.