REGEXP_SUBSTR#

Returns a single substring occurrence from string_expression that matches a regular-expression pattern_expression. If no match exists, the function returns NULL.
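
As a minimal sketch of the behavior described above, the queries below call the function on string literals. They assume a dialect that allows SELECT without a FROM clause; the literals and column aliases are made up for illustration.

```sql
-- A match exists: the first run of digits is returned.
SELECT REGEXP_SUBSTR('Invoice 4417 due', '[0-9]+') AS first_number;   -- expected: 4417

-- No match exists: the function returns NULL, so COALESCE can supply a fallback.
SELECT COALESCE(REGEXP_SUBSTR('Invoice pending', '[0-9]+'), 'none') AS first_number;   -- expected: none
```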

Syntax#

REGEXP_SUBSTR
(
    string_expression,
    pattern_expression [ , start [ , occurrence [ , flags [ , group ] ] ] ]
)

Arguments#

string_expression

A character string expression.

Can be a constant, variable, or column of character string.

Data types: char, nchar, varchar, or nvarchar.

pattern_expression

Regular expression pattern to match. Usually a text literal.

Data types: char, nchar, varchar, or nvarchar. pattern_expression supports a maximum character length of 8,000 bytes.

start

Specifies the position in string_expression at which the search starts. Optional. The type is int or bigint.

Numbering is 1-based: the first character in the expression is position 1, and the value must be >= 1. If start is less than 1, the function returns an error. If start is greater than the length of string_expression, the function returns NULL. The default is 1.
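
The effect of start can be sketched with a literal input. This assumes a dialect that allows SELECT without a FROM clause; the sample string is made up, and the expected results follow from the rules stated above.

```sql
-- Positions: a=1 b=2 c=3 1=4 2=5 3=6 d=7 e=8 f=9 4=10 5=11 6=12
SELECT REGEXP_SUBSTR('abc123def456', '[0-9]+', 1)  AS from_1;    -- expected: 123
SELECT REGEXP_SUBSTR('abc123def456', '[0-9]+', 5)  AS from_5;    -- expected: 23
SELECT REGEXP_SUBSTR('abc123def456', '[0-9]+', 7)  AS from_7;    -- expected: 456

-- start is beyond the end of the 12-character string, so the result is NULL.
SELECT REGEXP_SUBSTR('abc123def456', '[0-9]+', 20) AS past_end;  -- expected: NULL
```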

occurrence

A positive integer expression that specifies which occurrence of the pattern within the source string to return. The default is 1. The search for the first occurrence begins at the start position (by default, the first character of string_expression). For a positive integer n, the function finds the nth occurrence by resuming the search at the first character following the previous occurrence of pattern_expression, and so forth.
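
A short sketch of how occurrence selects among successive matches, assuming a dialect that allows SELECT without a FROM clause. The sample sentence is hypothetical.

```sql
-- Matches of '[a-z]+' in order: 'one', 'two', 'three'
SELECT REGEXP_SUBSTR('one two three', '[a-z]+', 1, 1) AS first_word;   -- expected: one
SELECT REGEXP_SUBSTR('one two three', '[a-z]+', 1, 2) AS second_word;  -- expected: two
SELECT REGEXP_SUBSTR('one two three', '[a-z]+', 1, 3) AS third_word;   -- expected: three

-- There is no fourth occurrence, so the function returns NULL.
SELECT REGEXP_SUBSTR('one two three', '[a-z]+', 1, 4) AS fourth_word;  -- expected: NULL
```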

flags

One or more characters that specify the modifiers used for searching for matches. Type is varchar or char, with a maximum of 30 characters.

For example, ims. The default is c. An empty string ('') is treated as the default value ('c'). Supply c or any combination of the supported flags. If flags contains contradictory characters, the last one is used.

For example, if you specify ic, the last flag (c) wins and matching is case-sensitive.

Supported flags

| Flag | Meaning |
| --- | --- |
| i | Case-insensitive matching (default: off). |
| m | Multi-line mode; ^ and $ match line boundaries as well as text boundaries (default: off). |
| s | Dot-all mode; . can match a newline (\n) (default: off). |
| c | Case-sensitive matching (default: on). |
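
The flag rules can be illustrated with a few literal calls. This is a sketch, assuming a dialect that allows SELECT without a FROM clause and in which CONCAT and CHAR(10) (a line feed) are available; the sample strings are made up.

```sql
-- Default flag is c (case-sensitive): 'world' does not match 'World'.
SELECT REGEXP_SUBSTR('Hello World', 'world') AS default_flags;              -- expected: NULL

-- i turns on case-insensitive matching.
SELECT REGEXP_SUBSTR('Hello World', 'world', 1, 1, 'i') AS with_i;          -- expected: World

-- Contradictory flags: the last one (c) is used, so matching is case-sensitive again.
SELECT REGEXP_SUBSTR('Hello World', 'world', 1, 1, 'ic') AS with_ic;        -- expected: NULL

-- m lets ^ match at the start of the second line as well as the start of the text.
SELECT REGEXP_SUBSTR(CONCAT('first', CHAR(10), 'second'), '^s[a-z]+', 1, 1, 'm') AS with_m;  -- expected: second
```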

group

Specifies which capture group (subexpression) of pattern_expression determines the substring of string_expression to return. A capture group is a fragment of the pattern enclosed in parentheses; capture groups can be nested.

Capture groups are numbered in the order in which their left parentheses appear. The data type of group is int. The value must be greater than or equal to 0 and must not exceed the number of capture groups in pattern_expression. The default is 0, which returns the substring that matches the entire pattern.
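
To make the group numbering concrete, the sketch below applies one pattern with three capture groups to a made-up identifier. It assumes a dialect that allows SELECT without a FROM clause.

```sql
-- Pattern groups, by left parenthesis: 1 = ([a-z]+), 2 = ([0-9]{4}), 3 = ([0-9]+)
SELECT REGEXP_SUBSTR('order-2024-0042', '([a-z]+)-([0-9]{4})-([0-9]+)', 1, 1, 'c', 0) AS whole_match;  -- expected: order-2024-0042
SELECT REGEXP_SUBSTR('order-2024-0042', '([a-z]+)-([0-9]{4})-([0-9]+)', 1, 1, 'c', 1) AS prefix;       -- expected: order
SELECT REGEXP_SUBSTR('order-2024-0042', '([a-z]+)-([0-9]{4})-([0-9]+)', 1, 1, 'c', 2) AS order_year;   -- expected: 2024
SELECT REGEXP_SUBSTR('order-2024-0042', '([a-z]+)-([0-9]{4})-([0-9]+)', 1, 1, 'c', 3) AS sequence_no;  -- expected: 0042
```

Because group is the last positional argument, start, occurrence, and flags have to be supplied explicitly (here with their default values 1, 1, and 'c') in order to reach it.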

Return types#

Returns a character string, or NULL if no match is found.

Examples#

Extract the domain name from an email address.

SELECT REGEXP_SUBSTR(EMAIL, '@(.+)$', 1, 1, 'i', 1) AS DOMAIN
  FROM CUSTOMERS;

Find the first word in a sentence that starts with a vowel.

SELECT REGEXP_SUBSTR(COMMENT, '\b[aeiou]\w*', 1, 1, 'i') AS WORD
  FROM FEEDBACK;

Get the last four digits of a credit card number.

SELECT REGEXP_SUBSTR(CARD_NUMBER, '\d{4}$') AS LAST_FOUR
  FROM PAYMENTS;

See also#