Regex
Extract entities using a regex pattern.
AI_ENTITY_EXTRACTION
Extract entities from text using a regular expression (regex) pattern.
Overview
The Regex feature enables the extraction of specific entities from a text string by applying a regular expression pattern. This functionality is ideal for identifying structured data like product references, email addresses, or codes within unstructured text.
Inputs
Name | ID | Description |
---|---|---|
Text | text | The input text from which entities are extracted. |
Regex Pattern | regex | The regular expression pattern to search for. |
Outputs
Name | ID | Description |
---|---|---|
Success | success | Boolean indicating whether the extraction was successful. |
Entity | match | The extracted entity matching the regex pattern. |
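The input/output contract above can be sketched in Python. This is only an illustration, not the platform's implementation: the names `extract_entity` and `ExtractionResult` are hypothetical, and returning the first match and treating an invalid pattern as a failed extraction are both assumptions.

```python
import re
from typing import NamedTuple, Optional


class ExtractionResult(NamedTuple):
    success: bool          # mirrors the "success" output
    match: Optional[str]   # mirrors the "match" output


def extract_entity(text: str, regex: str) -> ExtractionResult:
    """Return the first substring of `text` that matches `regex`."""
    try:
        found = re.search(regex, text)
    except re.error:
        # Assumption: an invalid pattern counts as a failed extraction.
        return ExtractionResult(False, None)
    if found is None:
        return ExtractionResult(False, None)
    return ExtractionResult(True, found.group(0))


result = extract_entity("Order REF-20431 shipped today", r"REF-\d+")
print(result.success, result.match)
```

With the sample text, the sketch reports a successful extraction of `REF-20431`; a text without a reference would yield `success` set to false and no match.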
Notes
- Ensure the regex pattern is properly formatted and tested for accuracy.
- Action ID for this operation: `b620331b-db42-4d53-82e2-caaf54c52ecc`.
Introduction to Regular Expressions (Regex)
Regular expressions (regex or regexp) are sequences of characters that define a search pattern. They are commonly used for string matching, searching, and text manipulation in programming, text processing, and data validation.
Basic Syntax
Literal Characters
Literal characters match themselves exactly. For example:
`cat` matches the string "cat".
Metacharacters
Metacharacters have special meanings in regex:
Character | Description |
---|---|
. | Matches any single character except newline |
^ | Matches the start of a string |
$ | Matches the end of a string |
* | Matches 0 or more occurrences of the preceding element |
+ | Matches 1 or more occurrences of the preceding element |
? | Matches 0 or 1 occurrence of the preceding element |
{} | Matches a specified number of occurrences |
[] | Matches any character in the set |
`\|` | Acts as an OR operator (alternation) |
() | Groups patterns and captures matches |
\ | Escapes metacharacters |
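A few of the metacharacters from the table can be tried out with Python's built-in `re` module. The sample strings are made up for illustration, and other regex engines may differ in details.

```python
import re

# "." matches any single character except a newline.
dot = re.findall(r"c.t", "cat cot c\nt")

# "|" is alternation; "[]" matches one character from the set.
either = re.findall(r"cat|dog", "cat, dog, bird")
in_set = re.findall(r"gr[ae]y", "gray grey groy")

# "\" escapes a metacharacter so it is matched literally.
literal_dot = re.findall(r"\.", "v1.2.3")

# "()" groups a pattern and captures what it matched.
captured = re.search(r"(\d+) items", "Cart: 12 items").group(1)

print(dot, either, in_set, literal_dot, captured)
```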
Character Classes
Character classes define a set of characters to match:
Pattern | Description |
---|---|
[abc] | Matches 'a', 'b', or 'c' |
[^abc] | Matches any character except 'a', 'b', or 'c' |
[a-z] | Matches any lowercase letter |
[A-Z] | Matches any uppercase letter |
[0-9] | Matches any digit |
\d | Matches any digit (same as [0-9]) |
\D | Matches any non-digit character |
\w | Matches any word character (alphanumeric plus _) |
\W | Matches any non-word character |
\s | Matches any whitespace character |
\S | Matches any non-whitespace character |
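As a quick illustration of the character classes above, the following sketch runs several of them against one made-up sample string using Python's `re` module. Note that in Python 3, `\d` and `\w` also match non-ASCII digits and letters unless the `re.ASCII` flag is set.

```python
import re

sample = "Order A-17 shipped to Bob_42 on 2024-05-01"

digits = re.findall(r"\d+", sample)        # runs of digits
words = re.findall(r"\w+", sample)         # runs of letters, digits, and "_"
uppercase = re.findall(r"[A-Z]", sample)   # single uppercase letters
chunks = re.findall(r"\S+", sample)        # runs of non-whitespace characters

# A negated set: delete everything that is not a lowercase vowel.
vowels_only = re.sub(r"[^aeiou]", "", "regex pattern")

print(digits)
print(words)
print(uppercase)
print(chunks)
print(vowels_only)
```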
Quantifiers
Quantifiers define how many instances of a character or group to match:
Pattern | Description |
---|---|
* | Matches 0 or more occurrences |
+ | Matches 1 or more occurrences |
? | Matches 0 or 1 occurrence |
{n} | Matches exactly n occurrences |
{n,} | Matches n or more occurrences |
{n,m} | Matches between n and m occurrences |
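To see how the quantifiers differ, this small sketch tests each one against the same list of candidate strings. The helper `full_matches` is a hypothetical name introduced for the example; it keeps only the candidates that a pattern matches in full.

```python
import re

candidates = ["ct", "cat", "caat", "caaat"]


def full_matches(pattern: str) -> list[str]:
    """Return the candidates that the pattern matches from start to end."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


print(full_matches(r"ca*t"))      # zero or more "a"
print(full_matches(r"ca+t"))      # one or more "a"
print(full_matches(r"ca?t"))      # zero or one "a"
print(full_matches(r"ca{2}t"))    # exactly two "a"
print(full_matches(r"ca{2,}t"))   # two or more "a"
print(full_matches(r"ca{1,2}t"))  # one or two "a"
```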
Anchors
Anchors are used to match positions within a string:
Pattern | Description |
---|---|
^ | Matches the start of a string |
$ | Matches the end of a string |
\b | Matches a word boundary |
\B | Matches a non-word boundary |
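Anchors match positions rather than characters, which is easiest to see by looking at where a pattern matches. The sketch below records the start index of each match in a made-up string; the helper `starts` is a hypothetical name used only here.

```python
import re

text = "cat concat catalog cat"


def starts(pattern: str) -> list[int]:
    """Return the start index of every match of the pattern in `text`."""
    return [m.start() for m in re.finditer(pattern, text)]


whole_word = starts(r"\bcat\b")   # "cat" as a standalone word
inside_word = starts(r"\Bcat")    # "cat" that does not begin a word
at_start = starts(r"^cat")        # "cat" at the start of the string
at_end = starts(r"cat$")          # "cat" at the end of the string

print(whole_word, inside_word, at_start, at_end)
```

Here the only match for `\Bcat` is the one inside "concat", because every other "cat" begins at a word boundary.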
Special Groups
Capturing Groups
Parentheses `()` are used to create capturing groups: `(abc)` captures the sequence "abc".
Non-Capturing Groups
Use `(?:...)` to group without capturing: `(?:abc)` matches "abc" but does not store the match.
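The difference between the two group types shows up in what the match object stores. A minimal sketch with Python's `re` module, using a made-up invoice string:

```python
import re

text = "Invoice 2024-05"

# Both parenthesized parts are captured.
capturing = re.search(r"(\d{4})-(\d{2})", text)

# The year is grouped but not stored; only the month is captured.
non_capturing = re.search(r"(?:\d{4})-(\d{2})", text)

print(capturing.group(0), capturing.groups())
print(non_capturing.group(0), non_capturing.groups())
```

Both patterns match the same text overall; only the list of stored groups differs.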
Lookaheads and Lookbehinds
- Positive Lookahead: `(?=...)` ensures that the following pattern matches.
- Negative Lookahead: `(?!...)` ensures that the following pattern does not match.
- Positive Lookbehind: `(?<=...)` ensures that the preceding pattern matches.
- Negative Lookbehind: `(?<!...)` ensures that the preceding pattern does not match.
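The four lookaround forms can be compared on a single made-up price list. This sketch uses Python's `re` module; lookarounds check the surrounding text without including it in the match.

```python
import re

prices = "apple $30, pear 45 EUR, plum $7"

# Positive lookbehind: digits that come right after a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", prices)

# Negative lookbehind: digits not preceded by "$" or by another digit.
no_dollar = re.findall(r"(?<![\$\d])\d+", prices)

# Positive lookahead: digits that are followed by " EUR".
before_eur = re.findall(r"\d+(?= EUR)", prices)

# Negative lookahead: digits not followed by " EUR" or by another digit.
not_eur = re.findall(r"\d+(?! EUR|\d)", prices)

print(after_dollar, no_dollar, before_eur, not_eur)
```

The extra `\d` inside the negative lookarounds keeps the engine from matching only part of a number, for example the "0" of "30" or the "4" of "45".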
Examples
Matching an Email Address
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Explanation:
- `^[a-zA-Z0-9._%+-]+`: Matches the username part.
- `@[a-zA-Z0-9.-]+`: Matches the domain name.
- `\.[a-zA-Z]{2,}$`: Matches the top-level domain (e.g., .com, .org).
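As a rough check of the email pattern, the sketch below applies it with Python's `re` module. Because of the `^` and `$` anchors, the pattern only accepts a string that is an email address in its entirety; to pull addresses out of longer text, the second pattern drops the anchors. The names `EMAIL`, `EMBEDDED_EMAIL`, and `is_email` are hypothetical, and the pattern is a simplification that does not cover every valid address.

```python
import re

# The pattern from the example above: validates a whole string.
EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

# Same pattern without anchors: finds addresses inside longer text.
EMBEDDED_EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def is_email(candidate: str) -> bool:
    """Return True when the whole string looks like an email address."""
    return EMAIL.search(candidate) is not None


print(is_email("jane.doe@example.com"))
print(is_email("jane@example"))
print(EMBEDDED_EMAIL.findall("Contact jane.doe@example.com or bob@test.org."))
```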
Validating a Phone Number
^\+?[1-9]\d{1,14}$
Explanation:
- `^\+?`: Matches an optional plus sign.
- `[1-9]`: Ensures the number doesn't start with 0.
- `\d{1,14}$`: Matches 1 to 14 digits, up to the end of the string.
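The phone pattern can be exercised the same way. It accepts between 2 and 15 digits in total with an optional leading plus sign, and it rejects spaces, hyphens, and other separators. The names `PHONE` and `is_phone` and the sample numbers are made up for this sketch.

```python
import re

PHONE = re.compile(r"^\+?[1-9]\d{1,14}$")


def is_phone(candidate: str) -> bool:
    """Return True when the whole string matches the phone pattern."""
    return PHONE.search(candidate) is not None


samples = ["+14155552671", "0123456789", "+1", "415-555-2671", "+1234567890123456"]
for number in samples:
    print(number, is_phone(number))
```

Only the first sample passes: the others start with 0, are too short, contain hyphens, or exceed 15 digits.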
Finding Duplicates
\b(\w+)\b(?=.*\b\1\b)
Explanation:
- `\b(\w+)\b`: Captures a word.
- `(?=.*\b\1\b)`: Ensures the word appears again later.
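A short sketch of the duplicate finder with Python's `re` module, using a made-up sentence. Each match is an occurrence of a word that appears again later, so a word repeated three times is reported twice; collecting the results in a set gives the distinct repeated words. Since `.` does not match a newline by default, the search only looks ahead within the same line.

```python
import re

DUPLICATE = re.compile(r"\b(\w+)\b(?=.*\b\1\b)")

text = "the cat saw the dog and the other cat"

# findall returns the captured word for every match.
occurrences = DUPLICATE.findall(text)
repeated_words = sorted(set(occurrences))

print(occurrences)
print(repeated_words)
```

The word boundaries in the lookahead prevent "the" from being counted as repeated inside "other".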
Tools for Testing Regex
- Regex101 (https://regex101.com)
- RegExr (https://regexr.com)
- Debuggex (https://www.debuggex.com)
Conclusion
Regular expressions are powerful tools for text processing and validation. They take practice to master, but once understood they can greatly simplify complex string operations.