String Processing
String Processing
RegexMatch
Usage
The function is used to fetch matched contents from text with given regular expression.
Name: REGEXMATCH
Input Series: Only support a single input series. The data type is TEXT.
Parameter:
regex
: The regular expression to match in the text. All grammars supported by Java are acceptable,
for example,\d+\.\d+\.\d+\.\d+
is expected to match any IPv4 addresses.group
: The wanted group index in the matched result.
Reference to java.util.regex, group 0 is the whole pattern and
the next ones are numbered with the appearance order of left parentheses.
For example, the groups inA(B(CD))
are: 0-A(B(CD))
, 1-B(CD)
, 2-CD
.
Output Series: Output a single series. The type is TEXT.
Note: Those points with null values or not matched with the given pattern will not return any results.
Examples
Input series:
+-----------------------------+-------------------------------+
| Time| root.test.d1.s1|
+-----------------------------+-------------------------------+
|2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]|
|2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]|
|2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]|
|2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]|
|2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]|
+-----------------------------+-------------------------------+
SQL for query:
select regexmatch(s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0") from root.test.d1
Output series:
+-----------------------------+----------------------------------------------------------------------+
| Time|regexmatch(root.test.d1.s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0")|
+-----------------------------+----------------------------------------------------------------------+
|2021-01-01T00:00:01.000+08:00| 192.168.0.1|
|2021-01-01T00:00:02.000+08:00| 192.168.0.24|
|2021-01-01T00:00:03.000+08:00| 192.168.0.2|
|2021-01-01T00:00:04.000+08:00| 192.168.0.5|
|2021-01-01T00:00:05.000+08:00| 192.168.0.124|
+-----------------------------+----------------------------------------------------------------------+
RegexReplace
Usage
The function is used to replace the specific regular expression matches with given string.
Name: REGEXREPLACE
Input Series: Only support a single input series. The data type is TEXT.
Parameter:
regex
: The target regular expression to be replaced. All grammars supported by Java are acceptable.replace
: The string to be put on and back reference notes in Java is also supported,
for example, '$1' refers to group 1 in theregex
which will be filled with corresponding matched results.limit
: The number of matches to be replaced which should be an integer no less than -1,
default to -1 which means all matches will be replaced.offset
: The number of matches to be skipped, which means the firstoffset
matches will not be replaced, default to 0.reverse
: Whether to count all the matches reversely, default to 'false'.
Output Series: Output a single series. The type is TEXT.
Examples
Input series:
+-----------------------------+-------------------------------+
| Time| root.test.d1.s1|
+-----------------------------+-------------------------------+
|2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]|
|2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]|
|2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]|
|2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]|
|2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]|
+-----------------------------+-------------------------------+
SQL for query:
select regexreplace(s1, "regex"="192\.168\.0\.(\d+)", "replace"="cluster-$1", "limit"="1") from root.test.d1
Output series:
+-----------------------------+-----------------------------------------------------------+
| Time|regexreplace(root.test.d1.s1, "regex"="192\.168\.0\.(\d+)",|
| | "replace"="cluster-$1", "limit"="1")|
+-----------------------------+-----------------------------------------------------------+
|2021-01-01T00:00:01.000+08:00| [cluster-1] [SUCCESS]|
|2021-01-01T00:00:02.000+08:00| [cluster-24] [SUCCESS]|
|2021-01-01T00:00:03.000+08:00| [cluster-2] [FAIL]|
|2021-01-01T00:00:04.000+08:00| [cluster-5] [SUCCESS]|
|2021-01-01T00:00:05.000+08:00| [cluster-124] [SUCCESS]|
+-----------------------------+-----------------------------------------------------------+
RegexSplit
Usage
The function is used to split text with given regular expression and return specific element.
Name: REGEXSPLIT
Input Series: Only support a single input series. The data type is TEXT.
Parameter:
regex
: The regular expression used to split the text.
All grammars supported by Java are acceptable, for example,['"]
is expected to match'
and"
.index
: The wanted index of elements in the split result.
It should be an integer no less than -1, default to -1 which means the length of the result array is returned
and any non-negative integer is used to fetch the text of the specific index starting from 0.
Output Series: Output a single series. The type is INT32 when index
is -1 and TEXT when it's an valid index.
Note: When index
is out of the range of the result array, for example 0,1,2
split with ,
and index
is set to 3,
no result are returned for that record.
Examples
Input series:
+-----------------------------+---------------+
| Time|root.test.d1.s1|
+-----------------------------+---------------+
|2021-01-01T00:00:01.000+08:00| A,B,A+,B-|
|2021-01-01T00:00:02.000+08:00| A,A+,A,B+|
|2021-01-01T00:00:03.000+08:00| B+,B,B|
|2021-01-01T00:00:04.000+08:00| A+,A,A+,A|
|2021-01-01T00:00:05.000+08:00| A,B-,B,B|
+-----------------------------+---------------+
SQL for query:
select regexsplit(s1, "regex"=",", "index"="-1") from root.test.d1
Output series:
+-----------------------------+------------------------------------------------------+
| Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="-1")|
+-----------------------------+------------------------------------------------------+
|2021-01-01T00:00:01.000+08:00| 4|
|2021-01-01T00:00:02.000+08:00| 4|
|2021-01-01T00:00:03.000+08:00| 3|
|2021-01-01T00:00:04.000+08:00| 4|
|2021-01-01T00:00:05.000+08:00| 4|
+-----------------------------+------------------------------------------------------+
Another SQL for query:
SQL for query:
select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1
Output series:
+-----------------------------+-----------------------------------------------------+
| Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="3")|
+-----------------------------+-----------------------------------------------------+
|2021-01-01T00:00:01.000+08:00| B-|
|2021-01-01T00:00:02.000+08:00| B+|
|2021-01-01T00:00:04.000+08:00| A|
|2021-01-01T00:00:05.000+08:00| B|
+-----------------------------+-----------------------------------------------------+
StrReplace
Usage
The function is used to replace the specific substring with given string.
Name: STRREPLACE
Input Series: Only support a single input series. The data type is TEXT.
Parameter:
target
: The target substring to be replaced.replace
: The string to be put on.limit
: The number of matches to be replaced which should be an integer no less than -1,
default to -1 which means all matches will be replaced.offset
: The number of matches to be skipped, which means the firstoffset
matches will not be replaced, default to 0.reverse
: Whether to count all the matches reversely, default to 'false'.
Output Series: Output a single series. The type is TEXT.
Examples
Input series:
+-----------------------------+---------------+
| Time|root.test.d1.s1|
+-----------------------------+---------------+
|2021-01-01T00:00:01.000+08:00| A,B,A+,B-|
|2021-01-01T00:00:02.000+08:00| A,A+,A,B+|
|2021-01-01T00:00:03.000+08:00| B+,B,B|
|2021-01-01T00:00:04.000+08:00| A+,A,A+,A|
|2021-01-01T00:00:05.000+08:00| A,B-,B,B|
+-----------------------------+---------------+
SQL for query:
select strreplace(s1, "target"=",", "replace"="/", "limit"="2") from root.test.d1
Output series:
+-----------------------------+-----------------------------------------+
| Time|strreplace(root.test.d1.s1, "target"=",",|
| | "replace"="/", "limit"="2")|
+-----------------------------+-----------------------------------------+
|2021-01-01T00:00:01.000+08:00| A/B/A+,B-|
|2021-01-01T00:00:02.000+08:00| A/A+/A,B+|
|2021-01-01T00:00:03.000+08:00| B+/B/B|
|2021-01-01T00:00:04.000+08:00| A+/A/A+,A|
|2021-01-01T00:00:05.000+08:00| A/B-/B,B|
+-----------------------------+-----------------------------------------+
Another SQL for query:
select strreplace(s1, "target"=",", "replace"="/", "limit"="1", "offset"="1", "reverse"="true") from root.test.d1
Output series:
+-----------------------------+-----------------------------------------------------+
| Time|strreplace(root.test.d1.s1, "target"=",", "replace"= |
| | "|", "limit"="1", "offset"="1", "reverse"="true")|
+-----------------------------+-----------------------------------------------------+
|2021-01-01T00:00:01.000+08:00| A,B/A+,B-|
|2021-01-01T00:00:02.000+08:00| A,A+/A,B+|
|2021-01-01T00:00:03.000+08:00| B+/B,B|
|2021-01-01T00:00:04.000+08:00| A+,A/A+,A|
|2021-01-01T00:00:05.000+08:00| A,B-/B,B|
+-----------------------------+-----------------------------------------------------+