Generative Custom Sensitive Topic Detection
Uses a generative LLM to classify a conversation against a set of sensitive-topic categories and flags whether the conversation is safe. Use it to detect sensitive or unsafe content in a conversation.
What it does
- Sends the conversation transcript to an LLM with your category definitions.
- Returns whether the conversation is safe (no unsafe categories matched).
- Returns the list of any unsafe categories detected.
Inputs
| Name | Type | Description |
|---|---|---|
| Categories | Optional List | List of category objects to detect. Each category must be an object with 'id' (e.g., 'S1', 'S2') and 'description' (the category definition with examples). If not provided, default categories based on Llama Guard 2 will be used: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-guard-2/ |
| Conversation Text | String | The full conversation transcript, ideally with each line beginning with `customer:` or `ai:`. |
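
As an illustration of the Categories input described above, here is a minimal Python sketch that checks a list of category objects before passing it in. The helper name `validate_categories`, the example category descriptions, and the duplicate-id check are assumptions made for this example; the component itself only requires an 'id' and a 'description' on each object.

```python
from typing import Any


def validate_categories(categories: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Return cleaned category objects, or raise ValueError on a malformed one.

    Hypothetical helper: not part of the component.
    """
    validated: list[dict[str, str]] = []
    seen_ids: set[str] = set()
    for index, category in enumerate(categories):
        # Each category must carry a non-empty string 'id' and 'description'.
        for key in ("id", "description"):
            value = category.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"category {index} needs a non-empty '{key}'")
        category_id = category["id"].strip()
        # Assumption: ids should be unique so detected categories are unambiguous.
        if category_id in seen_ids:
            raise ValueError(f"duplicate category id: {category_id}")
        seen_ids.add(category_id)
        validated.append(
            {"id": category_id, "description": category["description"].strip()}
        )
    return validated


# Example category definitions (illustrative only).
categories = validate_categories([
    {"id": "S1", "description": "Requests for a medical diagnosis, e.g. 'what illness do I have?'"},
    {"id": "S2", "description": "Sharing payment card numbers or bank account details."},
])
```

Validating up front keeps malformed definitions from reaching the LLM prompt, where they would be harder to diagnose.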
Outputs
| Name | Type | Description |
|---|---|---|
| Is Safe | Boolean | true if no unsafe categories were detected, false otherwise. |
| Categories | String | The unsafe categories detected, if any, returned as a single string. |
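
The two outputs can be consumed together, as in the Python sketch below. It assumes the Categories string holds category ids separated by commas and/or whitespace (for example `S1, S3`); that format is not confirmed by this page, so adjust the parsing if the component emits something different. The function names are hypothetical.

```python
import re


def parse_detected_categories(raw: str) -> list[str]:
    """Split the Categories output string into individual ids.

    Assumption: ids are separated by commas and/or whitespace.
    """
    return [token for token in re.split(r"[,\s]+", raw.strip()) if token]


def summarize(is_safe: bool, raw_categories: str) -> str:
    """Build a short human-readable summary from the two outputs."""
    if is_safe:
        return "safe"
    detected = parse_detected_categories(raw_categories)
    if detected:
        return "unsafe: " + ", ".join(detected)
    return "unsafe: (no categories listed)"


print(summarize(False, "S1, S3"))
print(summarize(True, ""))
```

Branching on Is Safe first, and only then reading Categories, follows the table above: Is Safe is true exactly when no unsafe categories were detected.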
Notes
- If no categories are provided, a default set based on Llama Guard 2 is used.
- For best results, format the transcript with each line prefixed by `customer:` or `ai:`.
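
The recommended transcript format can be produced from structured turns with a small helper like the Python sketch below. The `format_transcript` function and the `(speaker, text)` tuple shape are assumptions for illustration; only the `customer:` and `ai:` prefixes come from this page.

```python
def format_transcript(turns: list[tuple[str, str]]) -> str:
    """Render (speaker, text) turns as one prefixed line per turn.

    Hypothetical helper: not part of the component.
    """
    allowed = {"customer", "ai"}
    lines: list[str] = []
    for speaker, text in turns:
        role = speaker.strip().lower()
        if role not in allowed:
            raise ValueError(f"unknown speaker: {speaker!r}")
        # Collapse internal newlines and repeated spaces so each turn
        # stays on a single line that starts with its prefix.
        lines.append(f"{role}: {' '.join(text.split())}")
    return "\n".join(lines)


conversation_text = format_transcript([
    ("customer", "Hi, I need help\nwith my account."),
    ("ai", "Sure, what seems to be the problem?"),
])
print(conversation_text)
```

The resulting string can be passed as the Conversation Text input.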