String Processing
String Processing
STRING_CONTAINS
Function introduction
This function checks whether the substring s
exists in the string
Function name: STRING_CONTAINS
Input sequence: Only a single input sequence is supported, the type is TEXT.
parameter:
s
: The string to search for.
Output Sequence: Output a single sequence, the type is BOOLEAN.
Usage example
select s1, string_contains(s1, 's'='warn') from root.sg1.d4;
+-----------------------------+--------------+-------------------------------------------+
| Time|root.sg1.d4.s1|string_contains(root.sg1.d4.s1, "s"="warn")|
+-----------------------------+--------------+-------------------------------------------+
|1970-01-01T08:00:00.001+08:00| warn:-8721| true|
|1970-01-01T08:00:00.002+08:00| error:-37229| false|
|1970-01-01T08:00:00.003+08:00| warn:1731| true|
+-----------------------------+--------------+-------------------------------------------+
Total line number = 3
It costs 0.007s
STRING_MATCHES
Function introduction
This function judges whether a string can be matched by the regular expression regex
.
Function name: STRING_MATCHES
Input sequence: Only a single input sequence is supported, the type is TEXT.
parameter:
regex
: Java standard library-style regular expressions.
Output Sequence: Output a single sequence, the type is BOOLEAN.
Usage example
select s1, string_matches(s1, 'regex'='[^\\s]+37229') from root.sg1.d4;
+-----------------------------+--------------+------------------------------------------------------+
| Time|root.sg1.d4.s1|string_matches(root.sg1.d4.s1, "regex"="[^\\s]+37229")|
+-----------------------------+--------------+------------------------------------------------------+
|1970-01-01T08:00:00.001+08:00| warn:-8721| false|
|1970-01-01T08:00:00.002+08:00| error:-37229| true|
|1970-01-01T08:00:00.003+08:00| warn:1731| false|
+-----------------------------+--------------+------------------------------------------------------+
Total line number = 3
It costs 0.007s
Length
Usage
The function is used to get the length of input series.
Name: LENGTH
Input Series: Only support a single input series. The data type is TEXT.
Output Series: Output a single series. The type is INT32.
Note: Returns NULL if input is NULL.
Examples
Input series:
+-----------------------------+--------------+
| Time|root.sg1.d1.s1|
+-----------------------------+--------------+
|1970-01-01T08:00:00.001+08:00| 1test1|
|1970-01-01T08:00:00.002+08:00| 22test22|
+-----------------------------+--------------+
SQL for query:
select s1, length(s1) from root.sg1.d1
Output series:
+-----------------------------+--------------+----------------------+
| Time|root.sg1.d1.s1|length(root.sg1.d1.s1)|
+-----------------------------+--------------+----------------------+
|1970-01-01T08:00:00.001+08:00| 1test1| 6|
|1970-01-01T08:00:00.002+08:00| 22test22| 8|
+-----------------------------+--------------+----------------------+
Locate
Usage
The function is used to get the position of the first occurrence of substring target
in input series. Returns -1 if there are no target
in input.
Name: LOCATE
Input Series: Only support a single input series. The data type is TEXT.
Parameter:
target
: The substring to be located.reverse
: Indicates whether reverse locate is required. The default value isfalse
, means left-to-right locate.
Output Series: Output a single series. The type is INT32.
Note: The index begins from 0.
Examples
Input series:
+-----------------------------+--------------+
| Time|root.sg1.d1.s1|
+-----------------------------+--------------+
|1970-01-01T08:00:00.001+08:00| 1test1|
|1970-01-01T08:00:00.002+08:00| 22test22|
+-----------------------------+--------------+
SQL for query:
select s1, locate(s1, "target"="1") from root.sg1.d1
Output series:
+-----------------------------+--------------+------------------------------------+
| Time|root.sg1.d1.s1|locate(root.sg1.d1.s1, "target"="1")|
+-----------------------------+--------------+------------------------------------+
|1970-01-01T08:00:00.001+08:00| 1test1| 0|
|1970-01-01T08:00:00.002+08:00| 22test22| -1|
+-----------------------------+--------------+------------------------------------+
Another SQL for query:
select s1, locate(s1, "target"="1", "reverse"="true") from root.sg1.d1
Output series:
+-----------------------------+--------------+------------------------------------------------------+
| Time|root.sg1.d1.s1|locate(root.sg1.d1.s1, "target"="1", "reverse"="true")|
+-----------------------------+--------------+------------------------------------------------------+
|1970-01-01T08:00:00.001+08:00| 1test1| 5|
|1970-01-01T08:00:00.002+08:00| 22test22| -1|
+-----------------------------+--------------+------------------------------------------------------+
StartsWith
Usage
The function is used to check whether input series starts with the specified prefix.
Name: STARTSWITH
Input Series: Only support a single input series. The data type is TEXT.
Parameter:
target
: The prefix to be checked.
Output Series: Output a single series. The type is BOOLEAN.
Note: Returns NULL if input is NULL.
Examples
Input series:
+-----------------------------+--------------+
| Time|root.sg1.d1.s1|
+-----------------------------+--------------+
|1970-01-01T08:00:00.001+08:00| 1test1|
|1970-01-01T08:00:00.002+08:00| 22test22|
+-----------------------------+--------------+
SQL for query:
select s1, startswith(s1, "target"="1") from root.sg1.d1
Output series:
+-----------------------------+--------------+----------------------------------------+
| Time|root.sg1.d1.s1|startswith(root.sg1.d1.s1, "target"="1")|
+-----------------------------+--------------+----------------------------------------+
|1970-01-01T08:00:00.001+08:00| 1test1| true|
|1970-01-01T08:00:00.002+08:00| 22test22| false|
+-----------------------------+--------------+----------------------------------------+
EndsWith
Usage
The function is used to check whether input series ends with the specified suffix.
Name: ENDSWITH
Input Series: Only support a single input series. The data type is TEXT.
Parameter:
target
: The suffix to be checked.
Output Series: Output a single series. The type is BOOLEAN.
Note: Returns NULL if input is NULL.
Examples
Input series:
+-----------------------------+--------------+
| Time|root.sg1.d1.s1|
+-----------------------------+--------------+
|1970-01-01T08:00:00.001+08:00| 1test1|
|1970-01-01T08:00:00.002+08:00| 22test22|
+-----------------------------+--------------+
SQL for query:
select s1, endswith(s1, "target"="1") from root.sg1.d1
Output series:
+-----------------------------+--------------+--------------------------------------+
| Time|root.sg1.d1.s1|endswith(root.sg1.d1.s1, "target"="1")|
+-----------------------------+--------------+--------------------------------------+
|1970-01-01T08:00:00.001+08:00| 1test1| true|
|1970-01-01T08:00:00.002+08:00| 22test22| false|
+-----------------------------+--------------+--------------------------------------+
Concat
Usage
The function is used to concat input series and target strings.
Name: CONCAT
Input Series: At least one input series. The data type is TEXT.
Parameter:
targets
: A series of K-V, key needs to start withtarget
and be not duplicated, value is the string you want to concat.series_behind
: Indicates whether series behind targets. The default value isfalse
.
Output Series: Output a single series. The type is TEXT.
Note:
- If value of input series is NULL, it will be skipped.
- We can only concat input series and
targets
separately.concat(s1, "target1"="IoT", s2, "target2"="DB")
andconcat(s1, s2, "target1"="IoT", "target2"="DB")
gives the same result.
Examples
Input series:
+-----------------------------+--------------+--------------+
| Time|root.sg1.d1.s1|root.sg1.d1.s2|
+-----------------------------+--------------+--------------+
|1970-01-01T08:00:00.001+08:00| 1test1| null|
|1970-01-01T08:00:00.002+08:00| 22test22| 2222test|
+-----------------------------+--------------+--------------+
SQL for query:
select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB") from root.sg1.d1
Output series:
+-----------------------------+--------------+--------------+-----------------------------------------------------------------------+
| Time|root.sg1.d1.s1|root.sg1.d1.s2|concat(root.sg1.d1.s1, root.sg1.d1.s2, "target1"="IoT", "target2"="DB")|
+-----------------------------+--------------+--------------+-----------------------------------------------------------------------+
|1970-01-01T08:00:00.001+08:00| 1test1| null| 1test1IoTDB|
|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| 22test222222testIoTDB|
+-----------------------------+--------------+--------------+-----------------------------------------------------------------------+
Another SQL for query:
select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB", "series_behind"="true") from root.sg1.d1
Output series:
+-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+
| Time|root.sg1.d1.s1|root.sg1.d1.s2|concat(root.sg1.d1.s1, root.sg1.d1.s2, "target1"="IoT", "target2"="DB", "series_behind"="true")|
+-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+
|1970-01-01T08:00:00.001+08:00| 1test1| null| IoTDB1test1|
|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| IoTDB22test222222test|
+-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+
Substr
Usage
The function is used to get the substring start
to end - 1
.
Name: SUBSTR
Input Series: Only support a single input series. The data type is TEXT.
Parameter:
start
: Indicates the start position of substring.end
: Indicates the end position of substring.
Output Series: Output a single series. The type is TEXT.
Note: Returns NULL if input is NULL.
Examples
Input series:
+-----------------------------+--------------+
| Time|root.sg1.d1.s1|
+-----------------------------+--------------+
|1970-01-01T08:00:00.001+08:00| 1test1|
|1970-01-01T08:00:00.002+08:00| 22test22|
+-----------------------------+--------------+
SQL for query:
select s1, substr(s1, "start"="0", "end"="2") from root.sg1.d1
Output series:
+-----------------------------+--------------+----------------------------------------------+
| Time|root.sg1.d1.s1|substr(root.sg1.d1.s1, "start"="0", "end"="2")|
+-----------------------------+--------------+----------------------------------------------+
|1970-01-01T08:00:00.001+08:00| 1test1| 1t|
|1970-01-01T08:00:00.002+08:00| 22test22| 22|
+-----------------------------+--------------+----------------------------------------------+
Upper
Usage
The function is used to get the string of input series with all characters changed to uppercase.
Name: UPPER
Input Series: Only support a single input series. The data type is TEXT.
Output Series: Output a single series. The type is TEXT.
Note: Returns NULL if input is NULL.
Examples
Input series:
+-----------------------------+--------------+
| Time|root.sg1.d1.s1|
+-----------------------------+--------------+
|1970-01-01T08:00:00.001+08:00| 1test1|
|1970-01-01T08:00:00.002+08:00| 22test22|
+-----------------------------+--------------+
SQL for query:
select s1, upper(s1) from root.sg1.d1
Output series:
+-----------------------------+--------------+---------------------+
| Time|root.sg1.d1.s1|upper(root.sg1.d1.s1)|
+-----------------------------+--------------+---------------------+
|1970-01-01T08:00:00.001+08:00| 1test1| 1TEST1|
|1970-01-01T08:00:00.002+08:00| 22test22| 22TEST22|
+-----------------------------+--------------+---------------------+
Lower
Usage
The function is used to get the string of input series with all characters changed to lowercase.
Name: LOWER
Input Series: Only support a single input series. The data type is TEXT.
Output Series: Output a single series. The type is TEXT.
Note: Returns NULL if input is NULL.
Examples
Input series:
+-----------------------------+--------------+
| Time|root.sg1.d1.s1|
+-----------------------------+--------------+
|1970-01-01T08:00:00.001+08:00| 1TEST1|
|1970-01-01T08:00:00.002+08:00| 22TEST22|
+-----------------------------+--------------+
SQL for query:
select s1, lower(s1) from root.sg1.d1
Output series:
+-----------------------------+--------------+---------------------+
| Time|root.sg1.d1.s1|lower(root.sg1.d1.s1)|
+-----------------------------+--------------+---------------------+
|1970-01-01T08:00:00.001+08:00| 1TEST1| 1test1|
|1970-01-01T08:00:00.002+08:00| 22TEST22| 22test22|
+-----------------------------+--------------+---------------------+
Trim
Usage
The function is used to get the string whose value is same to input series, with all leading and trailing space removed.
Name: TRIM
Input Series: Only support a single input series. The data type is TEXT.
Output Series: Output a single series. The type is TEXT.
Note: Returns NULL if input is NULL.
Examples
Input series:
+-----------------------------+--------------+
| Time|root.sg1.d1.s3|
+-----------------------------+--------------+
|1970-01-01T08:00:00.002+08:00| 3querytest3|
|1970-01-01T08:00:00.003+08:00| 3querytest3 |
+-----------------------------+--------------+
SQL for query:
select s3, trim(s3) from root.sg1.d1
Output series:
+-----------------------------+--------------+--------------------+
| Time|root.sg1.d1.s3|trim(root.sg1.d1.s3)|
+-----------------------------+--------------+--------------------+
|1970-01-01T08:00:00.002+08:00| 3querytest3| 3querytest3|
|1970-01-01T08:00:00.003+08:00| 3querytest3 | 3querytest3|
+-----------------------------+--------------+--------------------+
StrCmp
Usage
The function is used to get the compare result of two input series. Returns 0
if series value are the same, a negative integer
if value of series1 is smaller than series2,
a positive integer
if value of series1 is more than series2.
Name: StrCmp
Input Series: Support two input series. Data types are all the TEXT.
Output Series: Output a single series. The type is INT32.
Note: Returns NULL either series value is NULL.
Examples
Input series:
+-----------------------------+--------------+--------------+
| Time|root.sg1.d1.s1|root.sg1.d1.s2|
+-----------------------------+--------------+--------------+
|1970-01-01T08:00:00.001+08:00| 1test1| null|
|1970-01-01T08:00:00.002+08:00| 22test22| 2222test|
+-----------------------------+--------------+--------------+
SQL for query:
select s1, s2, strcmp(s1, s2) from root.sg1.d1
Output series:
+-----------------------------+--------------+--------------+--------------------------------------+
| Time|root.sg1.d1.s1|root.sg1.d1.s2|strcmp(root.sg1.d1.s1, root.sg1.d1.s2)|
+-----------------------------+--------------+--------------+--------------------------------------+
|1970-01-01T08:00:00.001+08:00| 1test1| null| null|
|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| 66|
+-----------------------------+--------------+--------------+--------------------------------------+
StrReplace
Usage
This is not a built-in function and can only be used after registering the library-udf. The function is used to replace the specific substring with given string.
Name: STRREPLACE
Input Series: Only support a single input series. The data type is TEXT.
Parameter:
target
: The target substring to be replaced.replace
: The string to be put on.limit
: The number of matches to be replaced which should be an integer no less than -1,
default to -1 which means all matches will be replaced.offset
: The number of matches to be skipped, which means the firstoffset
matches will not be replaced, default to 0.reverse
: Whether to count all the matches reversely, default to 'false'.
Output Series: Output a single series. The type is TEXT.
Examples
Input series:
+-----------------------------+---------------+
| Time|root.test.d1.s1|
+-----------------------------+---------------+
|2021-01-01T00:00:01.000+08:00| A,B,A+,B-|
|2021-01-01T00:00:02.000+08:00| A,A+,A,B+|
|2021-01-01T00:00:03.000+08:00| B+,B,B|
|2021-01-01T00:00:04.000+08:00| A+,A,A+,A|
|2021-01-01T00:00:05.000+08:00| A,B-,B,B|
+-----------------------------+---------------+
SQL for query:
select strreplace(s1, "target"=",", "replace"="/", "limit"="2") from root.test.d1
Output series:
+-----------------------------+-----------------------------------------+
| Time|strreplace(root.test.d1.s1, "target"=",",|
| | "replace"="/", "limit"="2")|
+-----------------------------+-----------------------------------------+
|2021-01-01T00:00:01.000+08:00| A/B/A+,B-|
|2021-01-01T00:00:02.000+08:00| A/A+/A,B+|
|2021-01-01T00:00:03.000+08:00| B+/B/B|
|2021-01-01T00:00:04.000+08:00| A+/A/A+,A|
|2021-01-01T00:00:05.000+08:00| A/B-/B,B|
+-----------------------------+-----------------------------------------+
Another SQL for query:
select strreplace(s1, "target"=",", "replace"="/", "limit"="1", "offset"="1", "reverse"="true") from root.test.d1
Output series:
+-----------------------------+-----------------------------------------------------+
| Time|strreplace(root.test.d1.s1, "target"=",", "replace"= |
| | "|", "limit"="1", "offset"="1", "reverse"="true")|
+-----------------------------+-----------------------------------------------------+
|2021-01-01T00:00:01.000+08:00| A,B/A+,B-|
|2021-01-01T00:00:02.000+08:00| A,A+/A,B+|
|2021-01-01T00:00:03.000+08:00| B+/B,B|
|2021-01-01T00:00:04.000+08:00| A+,A/A+,A|
|2021-01-01T00:00:05.000+08:00| A,B-/B,B|
+-----------------------------+-----------------------------------------------------+
RegexMatch
Usage
This is not a built-in function and can only be used after registering the library-udf. The function is used to fetch matched contents from text with given regular expression.
Name: REGEXMATCH
Input Series: Only support a single input series. The data type is TEXT.
Parameter:
regex
: The regular expression to match in the text. All grammars supported by Java are acceptable,
for example,\d+\.\d+\.\d+\.\d+
is expected to match any IPv4 addresses.group
: The wanted group index in the matched result.
Reference to java.util.regex, group 0 is the whole pattern and
the next ones are numbered with the appearance order of left parentheses.
For example, the groups inA(B(CD))
are: 0-A(B(CD))
, 1-B(CD)
, 2-CD
.
Output Series: Output a single series. The type is TEXT.
Note: Those points with null values or not matched with the given pattern will not return any results.
Examples
Input series:
+-----------------------------+-------------------------------+
| Time| root.test.d1.s1|
+-----------------------------+-------------------------------+
|2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]|
|2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]|
|2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]|
|2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]|
|2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]|
+-----------------------------+-------------------------------+
SQL for query:
select regexmatch(s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0") from root.test.d1
Output series:
+-----------------------------+----------------------------------------------------------------------+
| Time|regexmatch(root.test.d1.s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0")|
+-----------------------------+----------------------------------------------------------------------+
|2021-01-01T00:00:01.000+08:00| 192.168.0.1|
|2021-01-01T00:00:02.000+08:00| 192.168.0.24|
|2021-01-01T00:00:03.000+08:00| 192.168.0.2|
|2021-01-01T00:00:04.000+08:00| 192.168.0.5|
|2021-01-01T00:00:05.000+08:00| 192.168.0.124|
+-----------------------------+----------------------------------------------------------------------+
RegexReplace
Usage
This is not a built-in function and can only be used after registering the library-udf. The function is used to replace the specific regular expression matches with given string.
Name: REGEXREPLACE
Input Series: Only support a single input series. The data type is TEXT.
Parameter:
regex
: The target regular expression to be replaced. All grammars supported by Java are acceptable.replace
: The string to be put on and back reference notes in Java is also supported,
for example, '$1' refers to group 1 in theregex
which will be filled with corresponding matched results.limit
: The number of matches to be replaced which should be an integer no less than -1,
default to -1 which means all matches will be replaced.offset
: The number of matches to be skipped, which means the firstoffset
matches will not be replaced, default to 0.reverse
: Whether to count all the matches reversely, default to 'false'.
Output Series: Output a single series. The type is TEXT.
Examples
Input series:
+-----------------------------+-------------------------------+
| Time| root.test.d1.s1|
+-----------------------------+-------------------------------+
|2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]|
|2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]|
|2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]|
|2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]|
|2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]|
+-----------------------------+-------------------------------+
SQL for query:
select regexreplace(s1, "regex"="192\.168\.0\.(\d+)", "replace"="cluster-$1", "limit"="1") from root.test.d1
Output series:
+-----------------------------+-----------------------------------------------------------+
| Time|regexreplace(root.test.d1.s1, "regex"="192\.168\.0\.(\d+)",|
| | "replace"="cluster-$1", "limit"="1")|
+-----------------------------+-----------------------------------------------------------+
|2021-01-01T00:00:01.000+08:00| [cluster-1] [SUCCESS]|
|2021-01-01T00:00:02.000+08:00| [cluster-24] [SUCCESS]|
|2021-01-01T00:00:03.000+08:00| [cluster-2] [FAIL]|
|2021-01-01T00:00:04.000+08:00| [cluster-5] [SUCCESS]|
|2021-01-01T00:00:05.000+08:00| [cluster-124] [SUCCESS]|
+-----------------------------+-----------------------------------------------------------+
RegexSplit
Usage
This is not a built-in function and can only be used after registering the library-udf. The function is used to split text with given regular expression and return specific element.
Name: REGEXSPLIT
Input Series: Only support a single input series. The data type is TEXT.
Parameter:
regex
: The regular expression used to split the text.
All grammars supported by Java are acceptable, for example,['"]
is expected to match'
and"
.index
: The wanted index of elements in the split result.
It should be an integer no less than -1, default to -1 which means the length of the result array is returned
and any non-negative integer is used to fetch the text of the specific index starting from 0.
Output Series: Output a single series. The type is INT32 when index
is -1 and TEXT when it's an valid index.
Note: When index
is out of the range of the result array, for example 0,1,2
split with ,
and index
is set to 3,
no result are returned for that record.
Examples
Input series:
+-----------------------------+---------------+
| Time|root.test.d1.s1|
+-----------------------------+---------------+
|2021-01-01T00:00:01.000+08:00| A,B,A+,B-|
|2021-01-01T00:00:02.000+08:00| A,A+,A,B+|
|2021-01-01T00:00:03.000+08:00| B+,B,B|
|2021-01-01T00:00:04.000+08:00| A+,A,A+,A|
|2021-01-01T00:00:05.000+08:00| A,B-,B,B|
+-----------------------------+---------------+
SQL for query:
select regexsplit(s1, "regex"=",", "index"="-1") from root.test.d1
Output series:
+-----------------------------+------------------------------------------------------+
| Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="-1")|
+-----------------------------+------------------------------------------------------+
|2021-01-01T00:00:01.000+08:00| 4|
|2021-01-01T00:00:02.000+08:00| 4|
|2021-01-01T00:00:03.000+08:00| 3|
|2021-01-01T00:00:04.000+08:00| 4|
|2021-01-01T00:00:05.000+08:00| 4|
+-----------------------------+------------------------------------------------------+
Another SQL for query:
SQL for query:
select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1
Output series:
+-----------------------------+-----------------------------------------------------+
| Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="3")|
+-----------------------------+-----------------------------------------------------+
|2021-01-01T00:00:01.000+08:00| B-|
|2021-01-01T00:00:02.000+08:00| B+|
|2021-01-01T00:00:04.000+08:00| A|
|2021-01-01T00:00:05.000+08:00| B|
+-----------------------------+-----------------------------------------------------+