Regex

Extract entities using a regex pattern.

AI_ENTITY_EXTRACTION

Extract entities from text using a regular expression (regex) pattern.

Overview

The Regex feature extracts specific entities from a text string by applying a regular expression pattern. It is well suited to identifying structured data such as product references, email addresses, or codes within unstructured text.

Inputs

Name	ID	Description
Text	`text`	The input text from which entities are extracted.
Regex Pattern	`regex`	The regular expression pattern to search for.

Outputs

Name	ID	Description
Success	`success`	Boolean indicating whether the extraction was successful.
Entity	`match`	The extracted entity matching the regex pattern.

Notes

Ensure the regex pattern is properly formatted and tested for accuracy.
Action ID for this operation: b620331b-db42-4d53-82e2-caaf54c52ecc.
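
The inputs and outputs above can be approximated locally to test a pattern before using it in the action. The sketch below is an assumption, not the action's actual implementation: the function name `extract_entity` is hypothetical, and it assumes that the first match is returned and that an invalid pattern counts as a failed extraction, neither of which is stated on this page.

```python
import re

def extract_entity(text: str, regex: str) -> dict:
    """Hypothetical stand-in for the action: mirrors the `success` and `match` outputs."""
    try:
        found = re.search(regex, text)
    except re.error:
        # Assumption: an invalid pattern is reported as an unsuccessful extraction.
        return {"success": False, "match": None}
    if found is None:
        return {"success": False, "match": None}
    # Assumption: only the first match in the text is returned.
    return {"success": True, "match": found.group(0)}

result = extract_entity("Order REF-20481 shipped today", r"REF-\d+")
print(result)  # {'success': True, 'match': 'REF-20481'}
```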

Introduction to Regular Expressions (Regex)

Regular expressions (regex or regexp) are sequences of characters that define a search pattern. They are commonly used for string matching, searching, and text manipulation in programming, text processing, and data validation.

Basic Syntax

Literal Characters

Literal characters match themselves exactly. For example:

cat matches the string "cat".

Metacharacters

Metacharacters have special meanings in regex:

Character	Description
`.`	Matches any single character except newline
`^`	Matches the start of a string
`$`	Matches the end of a string
`*`	Matches 0 or more occurrences of the preceding element
`+`	Matches 1 or more occurrences of the preceding element
`?`	Matches 0 or 1 occurrence of the preceding element
`{}`	Matches a specified number of occurrences
`[]`	Matches any character in the set
`|`	Acts as an OR operator
`()`	Groups patterns and captures matches
`\`	Escapes metacharacters
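
A few of the metacharacters above, illustrated with Python's standard `re` module. The sample strings are made up for illustration, and other regex engines may differ in small details.

```python
import re

# `.` matches any single character except a newline.
dot_matches = re.findall(r"c.t", "cat cot c\nt")
print(dot_matches)  # ['cat', 'cot']

# `|` acts as OR between alternatives.
either = re.findall(r"cat|dog", "cat, dog, bird")
print(either)  # ['cat', 'dog']

# `\` escapes a metacharacter so that it matches literally.
escaped = re.findall(r"3\.14", "3.14 and 3x14")
print(escaped)  # ['3.14']

# `()` groups part of the pattern and captures what it matched.
count = re.search(r"(\d+) items", "12 items").group(1)
print(count)  # 12
```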

Character Classes

Character classes define a set of characters to match:

Pattern	Description
`[abc]`	Matches 'a', 'b', or 'c'
`[^abc]`	Matches any character except 'a', 'b', or 'c'
`[a-z]`	Matches any lowercase letter
`[A-Z]`	Matches any uppercase letter
`[0-9]`	Matches any digit
`\d`	Matches any digit (same as `[0-9]`)
`\D`	Matches any non-digit character
`\w`	Matches any word character (alphanumeric + `_`)
`\W`	Matches any non-word character
`\s`	Matches any whitespace character
`\S`	Matches any non-whitespace character
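
The character classes above can be tried out with a short Python sketch; the sample string is invented for illustration. Note that in Python `\d` and `\w` also match non-ASCII digits and letters unless the `re.ASCII` flag is set, so the equivalence with `[0-9]` depends on the engine and its flags.

```python
import re

sample = "Order A-42, qty 7_units"

digits = re.findall(r"\d", sample)
print(digits)  # ['4', '2', '7']

words = re.findall(r"\w+", sample)
print(words)  # ['Order', 'A', '42', 'qty', '7_units']

uppercase = re.findall(r"[A-Z]", sample)
print(uppercase)  # ['O', 'A']

# A negated set: anything that is not a word character or whitespace.
punctuation = re.findall(r"[^\w\s]", sample)
print(punctuation)  # ['-', ',']
```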

Quantifiers

Quantifiers define how many instances of a character or group to match:

Pattern	Description
`*`	Matches 0 or more occurrences
`+`	Matches 1 or more occurrences
`?`	Matches 0 or 1 occurrence
`{n}`	Matches exactly n occurrences
`{n,}`	Matches n or more occurrences
`{n,m}`	Matches between n and m occurrences
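
To make the difference between the quantifiers concrete, here is a small Python sketch on made-up input. The `\b` word boundaries are added so that the counted forms only match whole numbers.

```python
import re

star = re.findall(r"ab*", "a ab abbb")
print(star)  # ['a', 'ab', 'abbb']

plus = re.findall(r"ab+", "a ab abbb")
print(plus)  # ['ab', 'abbb']

optional = re.findall(r"colou?r", "color colour")
print(optional)  # ['color', 'colour']

numbers = "7 42 123 4567"
exactly_three = re.findall(r"\b\d{3}\b", numbers)
print(exactly_three)  # ['123']

two_or_more = re.findall(r"\b\d{2,}\b", numbers)
print(two_or_more)  # ['42', '123', '4567']

two_to_three = re.findall(r"\b\d{2,3}\b", numbers)
print(two_to_three)  # ['42', '123']
```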

Anchors

Anchors are used to match positions within a string:

Pattern	Description
`^`	Matches the start of a string
`$`	Matches the end of a string
`\b`	Matches a word boundary
`\B`	Matches a non-word boundary
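
Anchors match positions rather than characters, which is easiest to see by example. A minimal Python sketch with invented sample strings:

```python
import re

starts_with_cat = bool(re.search(r"^cat", "catalog"))
print(starts_with_cat)  # True

ends_with_cat = bool(re.search(r"cat$", "bobcat"))
print(ends_with_cat)  # True

# `\b` requires a word boundary on each side, so only standalone "cat" matches.
whole_words = re.findall(r"\bcat\b", "cat catalog bobcat cat.")
print(whole_words)  # ['cat', 'cat']

# `\B` requires the opposite: "cat" must sit inside a longer word.
inside_words = re.findall(r"\Bcat\B", "concatenate cat")
print(inside_words)  # ['cat']
```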

Special Groups

Capturing Groups

Parentheses () are used to create capturing groups:

(abc) captures the sequence "abc".

Non-Capturing Groups

Use (?: ) to group without capturing:

(?:abc) matches "abc" but does not store the match.
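
The difference between the two group types shows up in what the match object stores. A short Python sketch, using made-up input strings:

```python
import re

# Two capturing groups: each stores its own part of the match.
date = re.search(r"(\d{4})-(\d{2})", "Invoice dated 2024-03")
print(date.group(0))  # 2024-03
print(date.group(1))  # 2024
print(date.group(2))  # 03

# (?:Mr|Ms) groups the alternatives without storing them,
# so the surname is the only captured group.
contact = re.search(r"(?:Mr|Ms)\. (\w+)", "Contact Ms. Rivera today")
print(contact.groups())  # ('Rivera',)
```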

Lookaheads and Lookbehinds

Positive Lookahead: (?=...) succeeds only if the pattern matches immediately after the current position.
Negative Lookahead: (?!...) succeeds only if the pattern does not match immediately after the current position.
Positive Lookbehind: (?<=...) succeeds only if the pattern matches immediately before the current position.
Negative Lookbehind: (?<!...) succeeds only if the pattern does not match immediately before the current position.
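
Lookarounds check the surrounding text without including it in the match. The Python sketch below uses invented sample strings. The `\b` boundaries in the negative cases are a deliberate choice: without them the engine can backtrack to a shorter run of digits and report a partial number.

```python
import re

weights = "5 kg, 10 lb, 20 kg"

# Positive lookahead: numbers followed by " kg" (the unit is not part of the match).
in_kg = re.findall(r"\d+(?= kg)", weights)
print(in_kg)  # ['5', '20']

# Negative lookahead: whole numbers not followed by " kg".
not_in_kg = re.findall(r"\b\d+\b(?! kg)", weights)
print(not_in_kg)  # ['10']

prices = "cost $30 or 40 EUR"

# Positive lookbehind: numbers preceded by "$".
dollars = re.findall(r"(?<=\$)\d+", prices)
print(dollars)  # ['30']

# Negative lookbehind: whole numbers not preceded by "$".
not_dollars = re.findall(r"(?<!\$)\b\d+", prices)
print(not_dollars)  # ['40']
```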

Examples

Matching an Email Address

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Explanation:

^[a-zA-Z0-9._%+-]+: Matches the username part.
@[a-zA-Z0-9.-]+: Matches the domain name.
\.[a-zA-Z]{2,}$: Matches the top-level domain (e.g., .com, .org).
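
Because of the `^` and `$` anchors, the pattern above validates a whole string rather than finding an address inside a longer text. The Python sketch below shows both uses. Dropping the anchors for extraction is a suggestion of this example, not something the pattern above prescribes, and the addresses are made up.

```python
import re

EMAIL_FULL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

valid = bool(EMAIL_FULL.match("jane.doe@example.com"))
print(valid)  # True

missing_tld = bool(EMAIL_FULL.match("jane.doe@example"))
print(missing_tld)  # False

# Same pattern without the anchors, for pulling an address out of free text.
EMAIL_IN_TEXT = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
found = EMAIL_IN_TEXT.search("Write to jane.doe@example.com today")
print(found.group(0))  # jane.doe@example.com
```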

Validating a Phone Number

^\+?[1-9]\d{1,14}$

Explanation:

^\+?: Matches an optional plus sign.
[1-9]: Ensures the number doesn't start with 0.
\d{1,14}$: Matches 1 to 14 more digits up to the end of the string, for 2 to 15 digits in total.
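
A quick way to check which inputs this pattern accepts is to run it against a few candidates. The numbers below are made up for illustration. Note that the pattern allows no spaces, dashes, or parentheses, so formatted numbers would need to be normalized first.

```python
import re

PHONE = re.compile(r"^\+?[1-9]\d{1,14}$")

candidates = ["+14155552671", "4155552671", "0123456", "+1 415 555 2671", "7"]
accepted = [number for number in candidates if PHONE.match(number)]
print(accepted)  # ['+14155552671', '4155552671']
```

The last three candidates are rejected because they start with 0, contain spaces, or have only one digit.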

Finding Duplicates

\b(\w+)\b(?=.*\b\1\b)

Explanation:

\b(\w+)\b: Captures a word.
(?=.*\b\1\b): Ensures the word appears again later.
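
Applied with Python's `re.findall`, the pattern returns each word that occurs again later in the text. The sample sentence is made up. Since `.` does not match a newline by default, repeats are only detected within the same line unless the `re.DOTALL` flag is set.

```python
import re

text = "the quick fox saw the other fox"

# findall returns the captured group (the word) for every match.
repeated = re.findall(r"\b(\w+)\b(?=.*\b\1\b)", text)
print(repeated)  # ['the', 'fox']
```

The second "the" is not reported because "other" only contains "the" inside a longer word, which the `\b` boundaries exclude.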

Tools for Testing Regex

Regex101 (https://regex101.com)
RegExr (https://regexr.com)
Debuggex (https://www.debuggex.com)

Conclusion

Regular expressions are powerful tools for text processing and validation. They take practice to master, but once understood they can greatly simplify complex string operations.
