Generative Custom Sensitive Topic Detection

Uses a generative LLM to classify a conversation against a set of sensitive-topic categories and reports whether the conversation is safe. Use it to detect sensitive or unsafe content in a conversation.

What it does

  • Sends the conversation transcript to an LLM with your category definitions.
  • Returns whether the conversation is safe (no unsafe categories matched).
  • Returns the list of any unsafe categories detected.
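
The steps above can be pictured with a minimal sketch. The component's actual prompt wording and the model's reply format are not documented here, so both are assumptions: the prompt text is hypothetical, and the reply is assumed to follow a Llama Guard-style shape ("safe", or "unsafe" followed by a line of comma-separated category ids). The names build_prompt and parse_reply are illustrative only, and no LLM call is made.

```python
from typing import Dict, List, Tuple


def build_prompt(categories: List[Dict[str, str]], conversation_text: str) -> str:
    """Assemble a classification prompt. The wording is hypothetical."""
    category_lines = "\n".join(
        f"{category['id']}: {category['description']}" for category in categories
    )
    return (
        "Classify the conversation against the unsafe content categories.\n\n"
        f"Categories:\n{category_lines}\n\n"
        f"Conversation:\n{conversation_text}\n\n"
        "Answer 'safe', or 'unsafe' followed by a line of "
        "comma-separated category ids."
    )


def parse_reply(reply: str) -> Tuple[bool, str]:
    """Map an assumed Llama Guard-style reply to (is_safe, categories)."""
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty reply from the model")
    verdict = lines[0].lower()
    if verdict == "safe":
        return True, ""
    if verdict == "unsafe":
        # Assumption: the second line, if present, lists the matched ids.
        return False, lines[1] if len(lines) > 1 else ""
    raise ValueError(f"unrecognized verdict: {lines[0]!r}")
```

Raising on an empty or unrecognized reply, rather than defaulting to safe, is a deliberate choice in this sketch: a malformed model response should not silently pass a conversation.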

Inputs

Name | Type | Description
Categories | Optional List | List of category objects to detect. Each category must be an object with 'id' (e.g., 'S1', 'S2') and 'description' (the category definition with examples). If not provided, default categories based on Llama Guard 2 will be used: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-guard-2/
Conversation Text | String | String containing entire transcript of conversation, ideally where each line begins with either customer: or ai:
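
As an illustration of the two inputs, the sketch below prepares a custom category list and a transcript in the shape described above. The example categories, the helper names validate_categories and format_transcript, and the representation of turns as (speaker, text) pairs are all assumptions made for this example, not part of the component.

```python
from typing import Dict, List, Tuple

# Illustrative custom categories; the ids and descriptions are made up.
CATEGORIES: List[Dict[str, str]] = [
    {
        "id": "S1",
        "description": "Medical advice, e.g. requests for a diagnosis or a dosage.",
    },
    {
        "id": "S2",
        "description": "Legal advice, e.g. whether to sign a specific contract.",
    },
]


def validate_categories(categories: List[Dict[str, str]]) -> None:
    """Check that every category has a non-empty string 'id' and 'description'."""
    for index, category in enumerate(categories):
        if not isinstance(category, dict):
            raise TypeError(f"category {index} is not an object")
        for key in ("id", "description"):
            value = category.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"category {index} has no usable '{key}'")


def format_transcript(turns: List[Tuple[str, str]]) -> str:
    """Render turns so that each line begins with 'customer:' or 'ai:'."""
    lines = []
    for speaker, text in turns:
        if speaker not in ("customer", "ai"):
            raise ValueError(f"unexpected speaker: {speaker!r}")
        # Collapse whitespace and line breaks inside a turn so that every
        # output line starts with a speaker prefix.
        lines.append(f"{speaker}: {' '.join(text.split())}")
    return "\n".join(lines)
```

For example, format_transcript([("customer", "Can I double my dose?"), ("ai", "Please ask your doctor.")]) produces a two-line string that can be passed as Conversation Text.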

Outputs

Name | Type | Description
Is Safe | Boolean | true if no unsafe categories were detected, false otherwise.
Categories | String | List of unsafe categories detected, if any.
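
A downstream step might consume the two outputs as sketched below. The Categories output is typed as a String, but its exact layout is not specified here, so this example assumes comma-separated ids such as "S1,S3"; adjust the split if the real format differs. The helper names and the definitions mapping are hypothetical.

```python
from typing import Dict, List


def split_category_ids(categories_output: str) -> List[str]:
    """Split the Categories string into ids (assumes comma separation)."""
    return [part.strip() for part in categories_output.split(",") if part.strip()]


def summarize(is_safe: bool, categories_output: str, definitions: Dict[str, str]) -> str:
    """Build a short, human-readable summary of the detection result."""
    if is_safe:
        return "safe"
    labels = [
        f"{category_id} ({definitions.get(category_id, 'unknown category')})"
        for category_id in split_category_ids(categories_output)
    ]
    return "unsafe: " + ", ".join(labels)
```

Here definitions maps each category id to a short label, which would typically be derived from the same category list supplied as input.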

Notes

  • If no categories are provided, a default set based on Llama Guard 2 is used.
  • For best results, format the transcript with each line prefixed by customer: or ai:.