Generative Custom Sensitive Topic Detection

Uses a generative LLM to classify a conversation against a set of sensitive-topic categories. It reports whether the conversation is safe and lists any unsafe categories it matched.

What it does

Sends the conversation transcript to an LLM with your category definitions.
Returns whether the conversation is safe (no unsafe categories matched).
Returns the list of any unsafe categories detected.
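The flow above can be sketched in Python. This is an illustration only, not the tool's actual implementation: the prompt wording, the build_prompt and classify helpers, and the reply format ("safe", or "unsafe" followed by a line of comma-separated category ids, in the style of Llama Guard) are all assumptions, and the llm argument stands in for whatever model call is used.

```python
from typing import Callable, Dict, List, Tuple


def build_prompt(categories: List[Dict[str, str]], transcript: str) -> str:
    """Assemble a classification prompt (hypothetical wording)."""
    lines = ["Classify the conversation against these unsafe categories."]
    for category in categories:
        lines.append(f"{category['id']}: {category['description']}")
    lines.append("Conversation:")
    lines.append(transcript)
    lines.append(
        "Answer 'safe', or 'unsafe' followed by the matching category ids "
        "on the next line, separated by commas."
    )
    return "\n".join(lines)


def classify(
    categories: List[Dict[str, str]],
    transcript: str,
    llm: Callable[[str], str],
) -> Tuple[bool, List[str]]:
    """Return (is_safe, unsafe_category_ids) from the model's reply.

    Assumes the reply format described in build_prompt; a real model may
    need more defensive parsing.
    """
    reply = llm(build_prompt(categories, transcript)).strip()
    verdict, _, rest = reply.partition("\n")
    if verdict.strip().lower() == "safe":
        return True, []
    matched = [part.strip() for part in rest.split(",") if part.strip()]
    return False, matched


# Usage with a stand-in model that always flags category S1.
example_categories = [
    {"id": "S1", "description": "Requests for medication dosage advice."},
]
is_safe, unsafe = classify(
    example_categories,
    "customer: How many pills can I take?",
    lambda prompt: "unsafe\nS1",
)
```

Passing the model as a plain callable keeps the sketch testable without a network call.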

Inputs

Name	Type	Description
Categories	Optional List	List of category objects to detect. Each category must be an object with an 'id' (e.g., 'S1', 'S2') and a 'description' (the category definition, with examples). If omitted, default categories based on Llama Guard 2 are used: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-guard-2/
Conversation Text	String	The entire conversation transcript as a single string, ideally with each line beginning with either customer: or ai:
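As a sketch of how the two inputs might be prepared, the Python below checks the shape of a category list and builds a transcript with the recommended prefixes. The helper names (validate_categories, format_transcript) and the sample category text are hypothetical and only illustrate the shapes described in the table.

```python
def validate_categories(categories):
    """Check that each category is an object with non-empty 'id' and 'description'."""
    for index, category in enumerate(categories):
        if not isinstance(category, dict):
            raise TypeError(f"category {index} must be an object (dict)")
        for key in ("id", "description"):
            value = category.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"category {index} needs a non-empty '{key}'")
    return categories


def format_transcript(turns):
    """Join (speaker, text) pairs so every line starts with 'customer:' or 'ai:'."""
    lines = []
    for speaker, text in turns:
        if speaker not in ("customer", "ai"):
            raise ValueError(f"unknown speaker: {speaker!r}")
        # Collapse internal line breaks so each turn stays on one prefixed line.
        lines.append(f"{speaker}: {' '.join(text.split())}")
    return "\n".join(lines)


# Illustrative values only; write descriptions that match your own policy.
categories = validate_categories([
    {
        "id": "S1",
        "description": "Medical advice. Example: asking what dose of a drug to take.",
    },
    {
        "id": "S2",
        "description": "Legal advice. Example: asking whether to sign a contract.",
    },
])

conversation_text = format_transcript([
    ("customer", "Can I double my dose\nif I missed one?"),
    ("ai", "I can't advise on dosage. Please ask your pharmacist."),
])
```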

Outputs

Name	Type	Description
Is Safe	Boolean	true if no unsafe categories were detected, false otherwise.
Categories	String	List of unsafe categories detected, if any.
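A downstream step might branch on these outputs as in the sketch below. The exact format of the Categories string is not specified here; the code assumes a comma-separated list of category ids (for example "S1, S3"), and handle_result with its return values is a hypothetical helper.

```python
def handle_result(is_safe: bool, categories: str) -> str:
    """Pick a next step from the two outputs.

    Assumes `categories` is a comma-separated string of ids; adjust the
    parsing if the actual output is formatted differently.
    """
    ids = [part.strip() for part in categories.split(",") if part.strip()]
    if is_safe:
        return "continue"
    if ids:
        return "escalate:" + ",".join(ids)
    # Flagged as unsafe but no category ids could be parsed.
    return "escalate:unknown"
```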

Notes

If no categories are provided, a default set based on Llama Guard 2 is used.
For best results, format the transcript with each line prefixed by customer: or ai:.

