Database Programming

TRIGGER

1. Instructions

The trigger provides a mechanism for listening to changes in time series data. With user-defined logic, tasks such as alerting and data forwarding can be conducted.

The trigger is implemented based on the reflection mechanism. Users can monitor data changes by implementing the Java interfaces. IoTDB allows users to dynamically register and drop triggers without restarting the server.

This document explains how to define and manage triggers.

Pattern for Listening

A single trigger can be used to listen for data changes in a time series that match a specific pattern. For example, a trigger can listen for the data changes of time series root.sg.a, or time series that match the pattern root.sg.*. When you register a trigger, you can specify the path pattern that the trigger listens on through an SQL statement.
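
To make pattern matching concrete, here is a minimal sketch of how a series path could be tested against a path pattern. It is not IoTDB's implementation: the class name PathPatternSketch is hypothetical, and the wildcard semantics are an assumption (that * matches exactly one path level and ** matches one or more levels).

```java
// Hypothetical helper, not part of the IoTDB API.
final class PathPatternSketch {

  /** Returns true if the dot-separated path matches the dot-separated pattern. */
  static boolean matches(String pattern, String path) {
    return match(pattern.split("\\."), 0, path.split("\\."), 0);
  }

  private static boolean match(String[] p, int i, String[] s, int j) {
    if (i == p.length) {
      return j == s.length; // pattern exhausted: the path must be exhausted too
    }
    if (p[i].equals("**")) {
      // Assumption: ** consumes at least one level, then any number more.
      for (int k = j + 1; k <= s.length; k++) {
        if (match(p, i + 1, s, k)) {
          return true;
        }
      }
      return false;
    }
    if (j == s.length) {
      return false; // path exhausted but pattern is not
    }
    // Assumption: * matches exactly one level; other nodes must be equal.
    if (p[i].equals("*") || p[i].equals(s[j])) {
      return match(p, i + 1, s, j + 1);
    }
    return false;
  }
}
```

Under these assumptions, PathPatternSketch.matches("root.sg.*", "root.sg.a") is true, while matching "root.sg.*" against "root.sg.d1.s1" is false because * covers only one level.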

Trigger Type

There are currently two types of triggers, and you can specify the type through an SQL statement when registering a trigger:

  • Stateful triggers: The execution logic of this type of trigger may depend on data from multiple insertion statements. The framework aggregates the data written by different nodes into the same trigger instance so that context information is retained. This type of trigger is usually used for sampling or for statistical aggregation of data over a period of time. Only one node in the cluster holds an instance of a stateful trigger.
  • Stateless triggers: The execution logic of the trigger depends only on the current input data. The framework does not need to aggregate the data of different nodes into the same trigger instance. This type of trigger is usually used for single-row calculations and anomaly detection. Each node in the cluster holds an instance of a stateless trigger.
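
The difference between the two types can be illustrated with plain Java, independent of the trigger API. The sketch below is only an illustration: the class and method names are hypothetical, and a batch of inserted values is modelled as a double[] rather than a Tablet.

```java
// Hypothetical illustration of stateless versus stateful logic.
final class TriggerStateSketch {

  /** Stateless logic: the result depends only on the batch passed in. */
  static int countAbove(double[] batch, double threshold) {
    int count = 0;
    for (double v : batch) {
      if (v > threshold) {
        count++;
      }
    }
    return count;
  }

  /** Stateful logic: a running mean is only correct if one instance sees every batch. */
  static final class RunningMean {
    private long count;
    private double sum;

    /** Adds a batch and returns the mean of all values seen so far. */
    synchronized double add(double[] batch) {
      for (double v : batch) {
        sum += v;
        count++;
      }
      return count == 0 ? 0.0 : sum / count;
    }
  }
}
```

countAbove can run independently on every node, which fits a stateless trigger. RunningMean keeps context between calls, which is why logic of this kind needs a single shared instance, as a stateful trigger provides.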

Trigger Event

There are currently two trigger events for the trigger, and other trigger events will be expanded in the future. When you register a trigger, you can specify the trigger event through an SQL statement:

  • BEFORE INSERT: Fires before the data is persisted. Please note that currently the trigger does not support data cleaning and will not change the data to be persisted itself.
  • AFTER INSERT: Fires after the data is persisted.

2. How to Implement a Trigger

You need to implement the trigger by writing a Java class, for which the dependency shown below is required. If you use Maven, you can search for it directly in the Maven repository.

Dependency

<dependency>
  <groupId>org.apache.iotdb</groupId>
  <artifactId>iotdb-server</artifactId>
  <version>1.0.0</version>
  <scope>provided</scope>
</dependency>

Note that the dependency version should match the target server version.

Interface Description

To implement a trigger, you need to implement the org.apache.iotdb.trigger.api.Trigger interface.

import org.apache.iotdb.trigger.api.enums.FailureStrategy;
import org.apache.iotdb.tsfile.write.record.Tablet;

public interface Trigger {

  /**
   * This method is mainly used to validate {@link TriggerAttributes} before calling {@link
   * Trigger#onCreate(TriggerAttributes)}.
   *
   * @param attributes TriggerAttributes
   * @throws Exception e
   */
  default void validate(TriggerAttributes attributes) throws Exception {}

  /**
   * This method will be called when creating a trigger after validation.
   *
   * @param attributes TriggerAttributes
   * @throws Exception e
   */
  default void onCreate(TriggerAttributes attributes) throws Exception {}

  /**
   * This method will be called when dropping a trigger.
   *
   * @throws Exception e
   */
  default void onDrop() throws Exception {}

  /**
   * When restarting a DataNode, Triggers that have been registered will be restored and this method
   * will be called during the process of restoring.
   *
   * @throws Exception e
   */
  default void restore() throws Exception {}

  /**
   * Overrides this method to set the expected FailureStrategy, {@link FailureStrategy#OPTIMISTIC}
   * is the default strategy.
   *
   * @return {@link FailureStrategy}
   */
  default FailureStrategy getFailureStrategy() {
    return FailureStrategy.OPTIMISTIC;
  }

  /**
   * @param tablet see {@link Tablet} for detailed information of data structure. Data that is
   *     inserted will be constructed as a Tablet and you can define process logic with {@link
   *     Tablet}.
   * @return true if successfully fired
   * @throws Exception e
   */
  default boolean fire(Tablet tablet) throws Exception {
    return true;
  }
}

This interface provides two groups of methods: lifecycle methods and data-change listening methods. None of them is mandatory, because each has a default implementation. A method that you do not override keeps its default behavior, so a trigger that overrides none of them will not respond to data changes. You can implement only the methods you need.

Descriptions of the interfaces are as follows.

| Interface | Description |
| --- | --- |
| default void validate(TriggerAttributes attributes) throws Exception {} | When you create a trigger using the CREATE TRIGGER statement, you can specify the parameters that the trigger needs, and this interface is used to verify that the parameters are correct. |
| default void onCreate(TriggerAttributes attributes) throws Exception {} | This interface is called when you create a trigger using the CREATE TRIGGER statement. During the lifetime of each trigger instance, it is called only once. It is mainly used to parse custom attributes in the SQL statement (using TriggerAttributes) and to create or acquire resources, such as establishing external connections or opening files. |
| default void onDrop() throws Exception {} | This interface is called when you drop a trigger using the DROP TRIGGER statement. During the lifetime of each trigger instance, it is called only once. It is mainly used to release resources and to persist the results of trigger calculations. |
| default void restore() throws Exception {} | When the DataNode is restarted, the cluster restores the trigger instances registered on that DataNode, and this interface is called once for each stateful trigger during the process. If the DataNode holding a stateful trigger instance goes down, the cluster restores the instance on another available DataNode and calls this interface once in the process. This interface can be used to customize recovery logic. |

Listening Interface

  /**
   * @param tablet see {@link Tablet} for detailed information of data structure. Data that is
   *     inserted will be constructed as a Tablet and you can define process logic with {@link
   *     Tablet}.
   * @return true if successfully fired
   * @throws Exception e
   */
  default boolean fire(Tablet tablet) throws Exception {
    return true;
  }

When the data changes, the trigger is fired with a Tablet as the unit of operation. You can obtain the metadata and data of the corresponding series through the Tablet, and then perform the corresponding trigger operation. If the fire process succeeds, the return value should be true. If the interface returns false or throws an exception, the fire process is considered to have failed, and the framework acts according to the strategy returned by the listening strategy interface.

When performing an INSERT operation, for each time series in it, we detect whether any trigger listens on a matching path pattern, and then assemble the time series data that matches the path pattern of the same trigger into a new Tablet for that trigger's fire interface. This can be understood as the following transformation:

Map<PartialPath, List<Trigger>> pathToTriggerListMap => Map<Trigger, Tablet>

Note that currently we do not make any guarantees about the order in which triggers fire.

Here is an example:

Suppose there are three triggers, and the trigger event of the triggers are all BEFORE INSERT:

  • Trigger1 listens on root.sg.*
  • Trigger2 listens on root.sg.a
  • Trigger3 listens on root.sg.b

Insertion statement:

insert into root.sg(time, a, b) values (1, 1, 1);

The time series root.sg.a matches Trigger1 and Trigger2, and the sequence root.sg.b matches Trigger1 and Trigger3, then:

  • The data of root.sg.a and root.sg.b will be assembled into a new tablet1, and Trigger1.fire(tablet1) will be executed at the corresponding Trigger Event.
  • The data of root.sg.a will be assembled into a new tablet2, and Trigger2.fire(tablet2) will be executed at the corresponding Trigger Event.
  • The data of root.sg.b will be assembled into a new tablet3, and Trigger3.fire(tablet3) will be executed at the corresponding Trigger Event.
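
The grouping in this example can be sketched in a few lines of Java. This is only an illustration of the mapping from series to triggers, not IoTDB's code: the class name is hypothetical, triggers are represented by their names, and the matcher is simplified to patterns in which * stands for exactly one level.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of grouping inserted series by the triggers that listen on them.
final class TriggerGroupingSketch {

  /** Simplified matcher: '*' stands for exactly one path level; '**' is not handled. */
  static boolean matches(String pattern, String path) {
    String[] p = pattern.split("\\.");
    String[] s = path.split("\\.");
    if (p.length != s.length) {
      return false;
    }
    for (int i = 0; i < p.length; i++) {
      if (!p[i].equals("*") && !p[i].equals(s[i])) {
        return false;
      }
    }
    return true;
  }

  /**
   * @param triggerPatterns trigger name to the path pattern it listens on
   * @param insertedSeries full paths of the series written by one INSERT
   * @return trigger name to the series whose data would form that trigger's tablet
   */
  static Map<String, List<String>> group(
      Map<String, String> triggerPatterns, List<String> insertedSeries) {
    Map<String, List<String>> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : triggerPatterns.entrySet()) {
      for (String series : insertedSeries) {
        if (matches(e.getValue(), series)) {
          result.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(series);
        }
      }
    }
    return result;
  }
}
```

With the three triggers and the insertion above, group assigns root.sg.a and root.sg.b to Trigger1, root.sg.a to Trigger2, and root.sg.b to Trigger3.
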

Listening Strategy Interface

When the trigger fails to fire, we take action according to the strategy returned by the listening strategy interface. You set the strategy by overriding getFailureStrategy to return a value of org.apache.iotdb.trigger.api.enums.FailureStrategy. There are currently two strategies, optimistic and pessimistic:

  • Optimistic strategy: A trigger that fails to fire affects neither the firing of subsequent triggers nor the writing process. We do no additional processing on the series involved in the failure; we only log the failure and finally inform the user that the data insertion succeeded but the trigger fire part failed.
  • Pessimistic strategy: A trigger that fails to fire stops all subsequent processing, that is, no further triggers are fired. If the trigger event of the trigger is BEFORE INSERT, the insertion is not performed and an insertion failure is returned directly.

  /**
   * Overrides this method to set the expected FailureStrategy, {@link FailureStrategy#OPTIMISTIC}
   * is the default strategy.
   *
   * @return {@link FailureStrategy}
   */
  default FailureStrategy getFailureStrategy() {
    return FailureStrategy.OPTIMISTIC;
  }
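
As a rough illustration of how the two strategies differ for BEFORE INSERT triggers, the sketch below walks a list of fire results and decides whether the insertion proceeds. It is a simplified model built only from the descriptions above: every class and method name is hypothetical, and the framework's real control flow is not shown in this document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of optimistic versus pessimistic failure handling.
final class FailureStrategySketch {
  enum Strategy { OPTIMISTIC, PESSIMISTIC }

  /** Stand-in for one registered trigger and the outcome of its fire call. */
  static final class Firing {
    final String name;
    final Strategy strategy;
    final boolean succeeds;

    Firing(String name, Strategy strategy, boolean succeeds) {
      this.name = name;
      this.strategy = strategy;
      this.succeeds = succeeds;
    }
  }

  /** Result of firing all BEFORE INSERT triggers for one insertion. */
  static final class Outcome {
    final boolean insertProceeds;
    final List<String> failedTriggers;

    Outcome(boolean insertProceeds, List<String> failedTriggers) {
      this.insertProceeds = insertProceeds;
      this.failedTriggers = failedTriggers;
    }
  }

  static Outcome fireBeforeInsert(List<Firing> firings) {
    List<String> failed = new ArrayList<>();
    for (Firing f : firings) {
      if (f.succeeds) {
        continue;
      }
      failed.add(f.name);
      if (f.strategy == Strategy.PESSIMISTIC) {
        // Pessimistic: stop firing further triggers and reject the insertion.
        return new Outcome(false, failed);
      }
      // Optimistic: record the failure and carry on with the next trigger.
    }
    return new Outcome(true, failed);
  }
}
```

In this model an optimistic failure still lets the insertion proceed and is only reported, while a pessimistic failure ends the loop and blocks the insertion.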

Example

If you use Maven, you can refer to our sample project trigger-example.

Here is the code from one of the sample projects:

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.iotdb.trigger;

import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerConfiguration;
import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerEvent;
import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerHandler;
import org.apache.iotdb.trigger.api.Trigger;
import org.apache.iotdb.trigger.api.TriggerAttributes;
import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType;
import org.apache.iotdb.tsfile.write.record.Tablet;
import org.apache.iotdb.tsfile.write.schema.MeasurementSchema;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.util.HashMap;
import java.util.List;

public class ClusterAlertingExample implements Trigger {
  private static final Logger LOGGER = LoggerFactory.getLogger(ClusterAlertingExample.class);

  private final AlertManagerHandler alertManagerHandler = new AlertManagerHandler();

  private final AlertManagerConfiguration alertManagerConfiguration =
      new AlertManagerConfiguration("http://127.0.0.1:9093/api/v2/alerts");

  private String alertname;

  private final HashMap<String, String> labels = new HashMap<>();

  private final HashMap<String, String> annotations = new HashMap<>();

  @Override
  public void onCreate(TriggerAttributes attributes) throws Exception {
    alertname = "alert_test";

    labels.put("series", "root.ln.wf01.wt01.temperature");
    labels.put("value", "");
    labels.put("severity", "");

    annotations.put("summary", "high temperature");
    annotations.put("description", "{{.alertname}}: {{.series}} is {{.value}}");

    alertManagerHandler.open(alertManagerConfiguration);
  }

  @Override
  public void onDrop() throws IOException {
    alertManagerHandler.close();
  }

  @Override
  public boolean fire(Tablet tablet) throws Exception {
    List<MeasurementSchema> measurementSchemaList = tablet.getSchemas();
    for (int i = 0, n = measurementSchemaList.size(); i < n; i++) {
      if (measurementSchemaList.get(i).getType().equals(TSDataType.DOUBLE)) {
        // for example, we only deal with the columns of Double type
        double[] values = (double[]) tablet.values[i];
        for (double value : values) {
          if (value > 100.0) {
            LOGGER.info("trigger value > 100");
            labels.put("value", String.valueOf(value));
            labels.put("severity", "critical");
            AlertManagerEvent alertManagerEvent =
                new AlertManagerEvent(alertname, labels, annotations);
            alertManagerHandler.onEvent(alertManagerEvent);
          } else if (value > 50.0) {
            LOGGER.info("trigger value > 50");
            labels.put("value", String.valueOf(value));
            labels.put("severity", "warning");
            AlertManagerEvent alertManagerEvent =
                new AlertManagerEvent(alertname, labels, annotations);
            alertManagerHandler.onEvent(alertManagerEvent);
          }
        }
      }
    }
    return true;
  }
}

3. Trigger Management

You can create and drop a trigger through an SQL statement, and you can also query all registered triggers through an SQL statement.

We recommend that you stop insertion while creating triggers.

Create Trigger

Triggers can be registered on arbitrary path patterns. The time series registered with the trigger will be listened to by the trigger. When there is data change on the series, the corresponding fire method in the trigger will be called.

Registering a trigger can be done as follows:

  1. Implement a Trigger class as described in the chapter How to Implement a Trigger. Assume that the full class name is org.apache.iotdb.trigger.ClusterAlertingExample.
  2. Package the project into a JAR package.
  3. Register the trigger with an SQL statement. During creation, the validate and onCreate interfaces of the trigger are each called only once. For details, please refer to the chapter How to Implement a Trigger.

The complete SQL syntax is as follows:

// Create Trigger
createTrigger
    : CREATE triggerType TRIGGER triggerName=identifier triggerEventClause ON pathPattern AS className=STRING_LITERAL uriClause? triggerAttributeClause?
    ;

triggerType
    : STATELESS | STATEFUL
    ;

triggerEventClause
    : (BEFORE | AFTER) INSERT
    ;
        
uriClause
    : USING URI uri
    ;

uri
    : STRING_LITERAL
    ;
    
triggerAttributeClause
    : WITH LR_BRACKET triggerAttribute (COMMA triggerAttribute)* RR_BRACKET
    ;

triggerAttribute
    : key=attributeKey operator_eq value=attributeValue
    ;

Below is the explanation for the SQL syntax:

  • triggerName: The trigger ID. It is globally unique, case-sensitive, and used to distinguish different triggers.
  • triggerType: The trigger type, either STATELESS or STATEFUL.
  • triggerEventClause: When the trigger fires. BEFORE INSERT and AFTER INSERT are currently supported.
  • pathPattern: The path pattern the trigger listens on. It can contain the wildcards * and **.
  • className: The class name of the Trigger class.
  • uriClause (jarLocation): Optional. When this option is not specified, we assume by default that the DBA has placed the JAR package required to create the trigger in the trigger_root_dir directory (configuration item, default is IOTDB_HOME/ext/trigger) of each DataNode. When this option is specified, we download the file resource corresponding to the URI and distribute it to the trigger_root_dir/install directory of each DataNode.
  • triggerAttributeClause: Optional. It specifies the parameters that need to be set when the trigger instance is created.

Here is an example SQL statement to help you understand:

CREATE STATELESS TRIGGER triggerTest
BEFORE INSERT
ON root.sg.**
AS 'org.apache.iotdb.trigger.ClusterAlertingExample'
USING URI '/jar/ClusterAlertingExample.jar'
WITH (
    "name" = "trigger",
    "limit" = "100"
)

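The above SQL statement creates a trigger named triggerTest:

  • The trigger is stateless (STATELESS).
  • It fires before insertion (BEFORE INSERT).
  • It listens on the path pattern root.sg.**.
  • The trigger class is org.apache.iotdb.trigger.ClusterAlertingExample.
  • The JAR package URI is /jar/ClusterAlertingExample.jar.
  • Two attributes, name and limit, are passed in when the trigger instance is created.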
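
Attributes such as name and limit are what a validate implementation would typically check. The sketch below shows one possible check, using a plain Map<String, String> as a stand-in because the TriggerAttributes API is not described in this document. The class name, the method name, and the rule that limit must be a positive integer are all assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical validation logic; a Map stands in for TriggerAttributes.
final class AttributeValidationSketch {

  /** Checks that "name" is present and returns "limit" parsed as a positive integer. */
  static int validateAndParseLimit(Map<String, String> attributes) {
    if (!attributes.containsKey("name")) {
      throw new IllegalArgumentException("missing attribute: name");
    }
    String raw = attributes.get("limit");
    if (raw == null) {
      throw new IllegalArgumentException("missing attribute: limit");
    }
    int limit;
    try {
      limit = Integer.parseInt(raw);
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("limit must be an integer: " + raw, e);
    }
    if (limit <= 0) {
      // Assumed rule for this example only.
      throw new IllegalArgumentException("limit must be positive: " + limit);
    }
    return limit;
  }
}
```

Failing fast in validation like this means a mistyped attribute is rejected when CREATE TRIGGER runs, rather than surfacing later inside fire.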

Drop Trigger

The trigger can be dropped by specifying the trigger ID. During the process of dropping the trigger, the onDrop interface of the trigger will be called only once.

The SQL syntax is:

// Drop Trigger
dropTrigger
  : DROP TRIGGER triggerName=identifier
;

Here is an example statement:

DROP TRIGGER triggerTest1

The above statement will drop the trigger with ID triggerTest1.

Show Trigger

You can query information about triggers that exist in the cluster through an SQL statement.

The SQL syntax is as follows:

SHOW TRIGGERS

The result set format of this statement is as follows:

| TriggerName | Event | Type | State | PathPattern | ClassName | NodeId |
| --- | --- | --- | --- | --- | --- | --- |
| triggerTest1 | BEFORE_INSERT / AFTER_INSERT | STATELESS / STATEFUL | INACTIVE / ACTIVE / DROPPING / TRANSFERRING | root.** | org.apache.iotdb.trigger.TriggerExample | ALL(STATELESS) / DATA_NODE_ID(STATEFUL) |

Trigger State

During the process of creating and dropping triggers in the cluster, we maintain the states of the triggers. The following is a description of these states:

| State | Description | Is it recommended to insert data? |
| --- | --- | --- |
| INACTIVE | The intermediate state while executing CREATE TRIGGER. The cluster has just recorded the trigger information on the ConfigNode, and the trigger has not been activated on any DataNode. | NO |
| ACTIVE | The state after CREATE TRIGGER has executed successfully. The trigger is available on all DataNodes in the cluster. | YES |
| DROPPING | The intermediate state while executing DROP TRIGGER. The cluster is in the process of dropping the trigger. | NO |
| TRANSFERRING | The cluster is migrating the location of this trigger instance. | NO |
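
A client that reads the State column of SHOW TRIGGERS could encode the table above as an enum and use it to decide whether to start writing. This is a sketch only: the enum and its methods are hypothetical and not part of any IoTDB client library.

```java
import java.util.Locale;

// Hypothetical client-side mirror of the trigger state table.
enum TriggerStateGuide {
  INACTIVE(false),
  ACTIVE(true),
  DROPPING(false),
  TRANSFERRING(false);

  private final boolean insertRecommended;

  TriggerStateGuide(boolean insertRecommended) {
    this.insertRecommended = insertRecommended;
  }

  boolean isInsertRecommended() {
    return insertRecommended;
  }

  /** Parses the text of the State column; throws IllegalArgumentException if unknown. */
  static boolean insertRecommended(String stateColumn) {
    return valueOf(stateColumn.trim().toUpperCase(Locale.ROOT)).isInsertRecommended();
  }
}
```

For example, TriggerStateGuide.insertRecommended("ACTIVE") is true, while the same call with "DROPPING" is false.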

4. Notes

  • The trigger takes effect from the time of registration and does not process existing historical data. That is, only insertion requests that occur after the trigger is successfully registered are listened to by the trigger.
  • The fire process of a trigger is currently synchronous, so you need to ensure that the trigger is efficient; otherwise write performance may be greatly affected. You also need to guarantee the concurrency safety of triggers yourself.
  • Please do not register too many triggers in the cluster, because the trigger information is fully stored in the ConfigNode and a copy of it is kept on every DataNode.
  • It is recommended to stop writing when registering triggers. Registering a trigger is not an atomic operation: there is an intermediate state in which some nodes in the cluster have registered the trigger and others have not yet done so. To avoid write requests being listened to by the trigger on some nodes but not on others, we recommend not performing writes while registering triggers.
  • When the node holding the stateful trigger instance goes down, we will try to restore the corresponding instance on another node. During the recovery process, we will call the restore interface of the trigger class once.
  • The trigger JAR package has a size limit: it must be smaller than min(config_node_ratis_log_appender_buffer_size_max, 2G), where config_node_ratis_log_appender_buffer_size_max is a configuration item. For its specific meaning, please refer to the IoTDB configuration item description.
  • It is better not to have classes with the same full class name but different implementations in different JAR packages. For example, suppose trigger1 and trigger2 correspond to the resources trigger1.jar and trigger2.jar respectively. If both JAR packages contain an org.apache.iotdb.trigger.example.AlertListener class, then when CREATE TRIGGER uses this class, the system will randomly load the class from one of the JAR packages, which will eventually lead to inconsistent trigger behavior and other issues.
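
The JAR size rule in the notes above is simple arithmetic and can be checked before a trigger is registered. The sketch below assumes that both sizes are expressed in bytes and that 2G means 2 GiB; the class and method names are hypothetical.

```java
// Hypothetical pre-check for the trigger JAR size limit.
final class JarSizeLimitSketch {
  /** Assumption: "2G" in the note means 2 GiB. */
  static final long TWO_GIB = 2L * 1024 * 1024 * 1024;

  /** Returns true if the JAR is strictly smaller than min(bufferSizeMax, 2G). */
  static boolean withinLimit(long jarSizeBytes, long appenderBufferSizeMaxBytes) {
    return jarSizeBytes < Math.min(appenderBufferSizeMaxBytes, TWO_GIB);
  }
}
```

Here appenderBufferSizeMaxBytes stands for the configured value of config_node_ratis_log_appender_buffer_size_max.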

5. Configuration Parameters

| Parameter | Meaning |
| --- | --- |
| trigger_lib_dir | Directory in which the trigger JAR packages are saved |
| stateful_trigger_retry_num_when_not_found | How many times we retry to find an instance of a stateful trigger on DataNodes when it is not found |

CONTINUOUS QUERY (CQ)

1. Introduction

Continuous queries (CQ) are queries that run automatically and periodically on real-time data and store the query results in other specified time series.

Users can implement sliding window streaming computing through continuous query, such as calculating the hourly average temperature of a sequence and writing it into a new sequence. Users can customize the RESAMPLE clause to create different sliding windows, which can achieve a certain degree of tolerance for out-of-order data.

2. Syntax

CREATE (CONTINUOUS QUERY | CQ) <cq_id> 
[RESAMPLE 
  [EVERY <every_interval>] 
  [BOUNDARY <execution_boundary_time>]
  [RANGE <start_time_offset>[, end_time_offset]] 
]
[TIMEOUT POLICY BLOCKED|DISCARD]
BEGIN
  SELECT CLAUSE
    INTO CLAUSE
    FROM CLAUSE
    [WHERE CLAUSE]
    [GROUP BY(<group_by_interval>[, <sliding_step>]) [, level = <level>]]
    [HAVING CLAUSE]
    [FILL {PREVIOUS | LINEAR | constant}]
    [LIMIT rowLimit OFFSET rowOffset]
    [ALIGN BY DEVICE]
END

Note:

  1. If the WHERE clause contains any time filter, IoTDB will throw an error, because IoTDB automatically generates a time range for the query each time it is executed.
  2. The GROUP BY TIME clause differs from that of an ordinary query: it omits the first parameter, the display window [start_time, end_time), again because IoTDB automatically generates a time range for the query each time it is executed.
  3. If the query has no GROUP BY TIME clause, the EVERY clause is required; otherwise IoTDB will throw an error.

Descriptions of parameters in CQ syntax

  • <cq_id> specifies the globally unique id of CQ.
  • <every_interval> specifies the query execution time interval. Supported units are ns, us, ms, s, m, h, d and w. Its value should not be lower than the minimum threshold configured by the user, continuous_query_min_every_interval. This parameter is optional; it defaults to group_by_interval in the GROUP BY clause.
  • <start_time_offset> specifies the start time of each query execution as now()-<start_time_offset>. Supported units are ns, us, ms, s, m, h, d and w. This parameter is optional; it defaults to every_interval in the RESAMPLE clause.
  • <end_time_offset> specifies the end time of each query execution as now()-<end_time_offset>. Supported units are ns, us, ms, s, m, h, d and w. This parameter is optional; it defaults to 0.
  • <execution_boundary_time> is a date that represents the execution time of a certain CQ task.
    • <execution_boundary_time> can be earlier than, equal to, or later than the current time.
    • This parameter is optional. If not specified, it is equal to BOUNDARY 0.
    • The start time of the first time window is <execution_boundary_time> - <start_time_offset>.
    • The end time of the first time window is <execution_boundary_time> - <end_time_offset>.
    • The time range of the i-th window (i >= 1) is [<execution_boundary_time> - <start_time_offset> + (i - 1) * <every_interval>, <execution_boundary_time> - <end_time_offset> + (i - 1) * <every_interval>).
    • If the current time is earlier than or equal to execution_boundary_time, then the first execution moment of the continuous query is execution_boundary_time.
    • If the current time is later than execution_boundary_time, then the first execution moment of the continuous query is the first execution_boundary_time + i * <every_interval> that is later than or equal to the current time.
  • <every_interval>, <start_time_offset> and <group_by_interval> should all be greater than 0.
  • The value of <group_by_interval> should be less than or equal to the value of <start_time_offset>, otherwise the system will throw an error.
  • Users should specify the appropriate <start_time_offset> and <every_interval> according to actual needs.
    • If <start_time_offset> is greater than <every_interval>, there will be partial data overlap between consecutive query windows.
    • If <start_time_offset> is less than <every_interval>, there may be uncovered data between consecutive query windows.
  • <start_time_offset> should be larger than <end_time_offset>, otherwise the system will throw an error.

[Figures omitted: four diagrams of the query windows, captioned "<start_time_offset> == <every_interval>", "<start_time_offset> > <every_interval>", "<start_time_offset> < <every_interval>", and "<every_interval> is not zero".]
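
The window formulas in the parameter list above can be written out directly. The following is a minimal sketch, not IoTDB's scheduler: the class and method names are hypothetical, and all times and intervals are plain long values in one consistent unit (for example milliseconds).

```java
// Hypothetical helper that evaluates the CQ window formulas.
final class CqWindowSketch {

  /** Returns {start, end} of the i-th window (i >= 1); the end is exclusive. */
  static long[] window(long boundary, long startOffset, long endOffset, long every, long i) {
    if (i < 1 || every <= 0 || startOffset <= endOffset) {
      throw new IllegalArgumentException("need i >= 1, every > 0, startOffset > endOffset");
    }
    long shift = (i - 1) * every;
    return new long[] {boundary - startOffset + shift, boundary - endOffset + shift};
  }

  /** First execution moment: the boundary itself, or the first boundary + k * every >= now. */
  static long firstExecution(long boundary, long every, long now) {
    if (now <= boundary) {
      return boundary;
    }
    // Ceiling division of (now - boundary) by every; both values are positive here.
    long steps = Math.floorDiv(now - boundary + every - 1, every);
    return boundary + steps * every;
  }
}
```

With boundary 0, every 20 and start offset 40, consecutive windows such as [-40, 0) and [-20, 20) overlap, which is the case where <start_time_offset> is greater than <every_interval>.
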
  • TIMEOUT POLICY specifies how we deal with a CQ task whose previous time interval execution has not finished when the next execution time is reached. The default value is BLOCKED.
    • BLOCKED means that we will block and wait to do the current cq execution task until the previous time interval cq task finishes. If using BLOCKED policy, all the time intervals will be executed, but it may be behind the latest time interval.
    • DISCARD means that we just discard the current cq execution task and wait for the next execution time and do the next time interval cq task. If using DISCARD policy, some time intervals won't be executed when the execution time of one cq task is longer than the <every_interval>. However, once a cq task is executed, it will use the latest time interval, so it can catch up at the sacrifice of some time intervals being discarded.
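
The effect of the two timeout policies can be seen in a small simulation. This is a simplified model under stated assumptions, not IoTDB's scheduler: executions are scheduled at 0, every, 2 * every and so on, durations[i] is how long the task for interval i would run, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the BLOCKED and DISCARD timeout policies.
final class TimeoutPolicySketch {
  enum Policy { BLOCKED, DISCARD }

  /** Returns the indices of the time intervals that actually get executed. */
  static List<Integer> executedIntervals(Policy policy, long every, long[] durations) {
    List<Integer> executed = new ArrayList<>();
    long busyUntil = 0;
    for (int i = 0; i < durations.length; i++) {
      long scheduled = i * every;
      if (scheduled < busyUntil && policy == Policy.DISCARD) {
        continue; // previous task still running: this interval is dropped
      }
      // Under BLOCKED the task waits until the previous one has finished.
      long start = Math.max(scheduled, busyUntil);
      busyUntil = start + durations[i];
      executed.add(i);
    }
    return executed;
  }
}
```

With every = 10 and durations {25, 5, 5, 5}, BLOCKED executes all four intervals, only later than scheduled, whereas DISCARD executes intervals 0 and 3 and drops the two that came due while the first task was still running.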

3. Examples of CQ

The examples below use the following sample data. It's a real time data stream and we can assume that the data arrives on time.

+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
|                         Time|root.ln.wf02.wt02.temperature|root.ln.wf02.wt01.temperature|root.ln.wf01.wt02.temperature|root.ln.wf01.wt01.temperature|
+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
|2021-05-11T22:18:14.598+08:00|                        121.0|                         72.0|                        183.0|                        115.0|
|2021-05-11T22:18:19.941+08:00|                          0.0|                         68.0|                         68.0|                        103.0|
|2021-05-11T22:18:24.949+08:00|                        122.0|                         45.0|                         11.0|                         14.0|
|2021-05-11T22:18:29.967+08:00|                         47.0|                         14.0|                         59.0|                        181.0|
|2021-05-11T22:18:34.979+08:00|                        182.0|                        113.0|                         29.0|                        180.0|
|2021-05-11T22:18:39.990+08:00|                         42.0|                         11.0|                         52.0|                         19.0|
|2021-05-11T22:18:44.995+08:00|                         78.0|                         38.0|                        123.0|                         52.0|
|2021-05-11T22:18:49.999+08:00|                        137.0|                        172.0|                        135.0|                        193.0|
|2021-05-11T22:18:55.003+08:00|                         16.0|                        124.0|                        183.0|                         18.0|
+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+

Configuring execution intervals

Use an EVERY interval in the RESAMPLE clause to specify the CQ's execution interval. If it is not specified, the default value is equal to group_by_interval.

CREATE CONTINUOUS QUERY cq1
RESAMPLE EVERY 20s
BEGIN
SELECT max_value(temperature)
  INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max)
  FROM root.ln.*.*
  GROUP BY(10s)
END

cq1 calculates the 10-second maximum of the temperature sensor under the root.ln prefix path and stores the results in the temperature_max sensor, using the same prefix path as the corresponding sensor.

cq1 executes at 20-second intervals, the same interval as the EVERY interval. Every 20 seconds, cq1 runs a single query that covers the time range for the current time bucket, that is, the 20-second time bucket that intersects with now().

Supposing that the current time is 2021-05-11T22:18:40.000+08:00, you can see annotated log output about cq1 running on the DataNode if you set the log level to DEBUG:

At **2021-05-11T22:18:40.000+08:00**, `cq1` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`.
`cq1` generate 2 lines:
>
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|                         Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|2021-05-11T22:18:20.000+08:00|                            122.0|                             45.0|                             59.0|                            181.0|
|2021-05-11T22:18:30.000+08:00|                            182.0|                            113.0|                             52.0|                            180.0|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
>
At **2021-05-11T22:19:00.000+08:00**, `cq1` executes a query within the time range `[2021-05-11T22:18:40, 2021-05-11T22:19:00)`.
`cq1` generate 2 lines:
>
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|                         Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|2021-05-11T22:18:40.000+08:00|                            137.0|                            172.0|                            135.0|                            193.0|
|2021-05-11T22:18:50.000+08:00|                             16.0|                            124.0|                            183.0|                             18.0|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
>

cq1 does not process data earlier than the start of its first time window, 2021-05-11T22:18:20.000+08:00, so the results are:

> SELECT temperature_max from root.ln.*.*;
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|                         Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|2021-05-11T22:18:20.000+08:00|                            122.0|                             45.0|                             59.0|                            181.0|
|2021-05-11T22:18:30.000+08:00|                            182.0|                            113.0|                             52.0|                            180.0|
|2021-05-11T22:18:40.000+08:00|                            137.0|                            172.0|                            135.0|                            193.0|
|2021-05-11T22:18:50.000+08:00|                             16.0|                            124.0|                            183.0|                             18.0|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+

Configuring time range for resampling

Use start_time_offset in the RANGE clause to specify the start time of the CQ’s time range. If it is not specified, it defaults to the EVERY interval.

CREATE CONTINUOUS QUERY cq2
RESAMPLE RANGE 40s
BEGIN
  SELECT max_value(temperature)
  INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max)
  FROM root.ln.*.*
  GROUP BY(10s)
END

cq2 calculates the 10-second maximum of the temperature sensor under the root.ln prefix path and stores the results in the temperature_max sensor under the same prefix path as the corresponding sensor.

cq2 executes at 10-second intervals, the same interval as the group_by_interval. Every 10 seconds, cq2 runs a single query that covers the time range between now() minus the start_time_offset and now(), that is, from 40 seconds before now() to now().

Suppose the current time is 2021-05-11T22:18:40.000+08:00. With the log level set to DEBUG, the DataNode log shows cq2 running (annotated below):

At **2021-05-11T22:18:40.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:40)`.
`cq2` generates 4 lines:
>
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|                         Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|2021-05-11T22:18:00.000+08:00|                             NULL|                             NULL|                             NULL|                             NULL|
|2021-05-11T22:18:10.000+08:00|                            121.0|                             72.0|                            183.0|                            115.0|
|2021-05-11T22:18:20.000+08:00|                            122.0|                             45.0|                             59.0|                            181.0|
|2021-05-11T22:18:30.000+08:00|                            182.0|                            113.0|                             52.0|                            180.0|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
>
At **2021-05-11T22:18:50.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:10, 2021-05-11T22:18:50)`.
`cq2` generates 4 lines:
>
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|                         Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|2021-05-11T22:18:10.000+08:00|                            121.0|                             72.0|                            183.0|                            115.0|
|2021-05-11T22:18:20.000+08:00|                            122.0|                             45.0|                             59.0|                            181.0|
|2021-05-11T22:18:30.000+08:00|                            182.0|                            113.0|                             52.0|                            180.0|
|2021-05-11T22:18:40.000+08:00|                            137.0|                            172.0|                            135.0|                            193.0|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
>
At **2021-05-11T22:19:00.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:19:00)`.
`cq2` generates 4 lines:
>
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|                         Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|2021-05-11T22:18:20.000+08:00|                            122.0|                             45.0|                             59.0|                            181.0|
|2021-05-11T22:18:30.000+08:00|                            182.0|                            113.0|                             52.0|                            180.0|
|2021-05-11T22:18:40.000+08:00|                            137.0|                            172.0|                            135.0|                            193.0|
|2021-05-11T22:18:50.000+08:00|                             16.0|                            124.0|                            183.0|                             18.0|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
>

cq2 does not write lines whose values are all null. Note that cq2 also computes the results for the same time interval several times, because consecutive time ranges overlap. Here are the results:

> SELECT temperature_max from root.ln.*.*;
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|                         Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|2021-05-11T22:18:10.000+08:00|                            121.0|                             72.0|                            183.0|                            115.0|
|2021-05-11T22:18:20.000+08:00|                            122.0|                             45.0|                             59.0|                            181.0|
|2021-05-11T22:18:30.000+08:00|                            182.0|                            113.0|                             52.0|                            180.0|
|2021-05-11T22:18:40.000+08:00|                            137.0|                            172.0|                            135.0|                            193.0|
|2021-05-11T22:18:50.000+08:00|                             16.0|                            124.0|                            183.0|                             18.0|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+

Configuring execution intervals and CQ time ranges

Use an EVERY interval and a RANGE interval in the RESAMPLE clause to specify the CQ’s execution interval and the length of the CQ’s time range. Use fill() to change the value reported for time intervals with no data.

CREATE CONTINUOUS QUERY cq3
RESAMPLE EVERY 20s RANGE 40s
BEGIN
  SELECT max_value(temperature)
  INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max)
  FROM root.ln.*.*
  GROUP BY(10s)
  FILL(100.0)
END

cq3 calculates the 10-second maximum of the temperature sensor under the root.ln prefix path and stores the results in the temperature_max sensor under the same prefix path as the corresponding sensor. Where possible, it writes the value 100.0 for time intervals with no results.

cq3 executes at 20-second intervals, the same interval as the EVERY interval. Every 20 seconds, cq3 runs a single query that covers the time range between now() minus the start_time_offset and now(), that is, from 40 seconds before now() to now().

Suppose the current time is 2021-05-11T22:18:40.000+08:00. With the log level set to DEBUG, the DataNode log shows cq3 running (annotated below):

At **2021-05-11T22:18:40.000+08:00**, `cq3` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:40)`.
`cq3` generates 4 lines:
>
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|                         Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|2021-05-11T22:18:00.000+08:00|                            100.0|                            100.0|                            100.0|                            100.0|
|2021-05-11T22:18:10.000+08:00|                            121.0|                             72.0|                            183.0|                            115.0|
|2021-05-11T22:18:20.000+08:00|                            122.0|                             45.0|                             59.0|                            181.0|
|2021-05-11T22:18:30.000+08:00|                            182.0|                            113.0|                             52.0|                            180.0|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
>
At **2021-05-11T22:19:00.000+08:00**, `cq3` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:19:00)`.
`cq3` generates 4 lines:
>
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|                         Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|2021-05-11T22:18:20.000+08:00|                            122.0|                             45.0|                             59.0|                            181.0|
|2021-05-11T22:18:30.000+08:00|                            182.0|                            113.0|                             52.0|                            180.0|
|2021-05-11T22:18:40.000+08:00|                            137.0|                            172.0|                            135.0|                            193.0|
|2021-05-11T22:18:50.000+08:00|                             16.0|                            124.0|                            183.0|                             18.0|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
>

Note that cq3 computes the results for some time intervals more than once. Here are the results:

> SELECT temperature_max from root.ln.*.*;
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|                         Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|2021-05-11T22:18:00.000+08:00|                            100.0|                            100.0|                            100.0|                            100.0|
|2021-05-11T22:18:10.000+08:00|                            121.0|                             72.0|                            183.0|                            115.0|
|2021-05-11T22:18:20.000+08:00|                            122.0|                             45.0|                             59.0|                            181.0|
|2021-05-11T22:18:30.000+08:00|                            182.0|                            113.0|                             52.0|                            180.0|
|2021-05-11T22:18:40.000+08:00|                            137.0|                            172.0|                            135.0|                            193.0|
|2021-05-11T22:18:50.000+08:00|                             16.0|                            124.0|                            183.0|                             18.0|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+

Configuring end_time_offset for CQ time range

Use an EVERY interval and a RANGE interval in the RESAMPLE clause to specify the CQ’s execution interval and the length of the CQ’s time range. Give the RANGE clause a second value, end_time_offset, to move the end of the time range earlier than now(). Use fill() to change the value reported for time intervals with no data.

CREATE CONTINUOUS QUERY cq4
RESAMPLE EVERY 20s RANGE 40s, 20s
BEGIN
  SELECT max_value(temperature)
  INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max)
  FROM root.ln.*.*
  GROUP BY(10s)
  FILL(100.0)
END

cq4 calculates the 10-second maximum of the temperature sensor under the root.ln prefix path and stores the results in the temperature_max sensor under the same prefix path as the corresponding sensor. Where possible, it writes the value 100.0 for time intervals with no results.

cq4 executes at 20-second intervals, the same interval as the EVERY interval. Every 20 seconds, cq4 runs a single query that covers the time range between now() minus the start_time_offset and now() minus the end_time_offset, that is, from 40 seconds before now() to 20 seconds before now().
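
The window arithmetic described here can be sketched in plain Java. This is only an illustration, not IoTDB code: the class `CqWindow` and its methods are hypothetical names, and the sketch assumes the query window is the half-open range [now() - start_time_offset, now() - end_time_offset), as the annotated log output suggests.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical helper (not part of IoTDB): models the half-open query window
// [now - startTimeOffset, now - endTimeOffset) of a single CQ execution.
public class CqWindow {
  public final LocalDateTime start; // inclusive
  public final LocalDateTime end;   // exclusive

  public CqWindow(LocalDateTime start, LocalDateTime end) {
    this.start = start;
    this.end = end;
  }

  // Builds the window for an execution that happens at `now`.
  public static CqWindow at(
      LocalDateTime now, Duration startTimeOffset, Duration endTimeOffset) {
    return new CqWindow(now.minus(startTimeOffset), now.minus(endTimeOffset));
  }

  // Number of whole GROUP BY buckets of the given length inside the window.
  public long bucketCount(Duration groupByInterval) {
    return Duration.between(start, end).toMillis() / groupByInterval.toMillis();
  }

  public static void main(String[] args) {
    LocalDateTime now = LocalDateTime.parse("2021-05-11T22:18:40");
    // cq4: RANGE 40s, 20s with GROUP BY(10s)
    CqWindow w = CqWindow.at(now, Duration.ofSeconds(40), Duration.ofSeconds(20));
    System.out.println("start=" + w.start + " end=" + w.end
        + " buckets=" + w.bucketCount(Duration.ofSeconds(10)));
  }
}
```

Passing `Duration.ZERO` as the end offset models a CQ such as cq2 or cq3, whose time range ends at now().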

Suppose the current time is 2021-05-11T22:18:40.000+08:00. With the log level set to DEBUG, the DataNode log shows cq4 running (annotated below):

At **2021-05-11T22:18:40.000+08:00**, `cq4` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:20)`.
`cq4` generates 2 lines:
>
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|                         Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|2021-05-11T22:18:00.000+08:00|                            100.0|                            100.0|                            100.0|                            100.0|
|2021-05-11T22:18:10.000+08:00|                            121.0|                             72.0|                            183.0|                            115.0|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
>
At **2021-05-11T22:19:00.000+08:00**, `cq4` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`.
`cq4` generates 2 lines:
>
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|                         Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|2021-05-11T22:18:20.000+08:00|                            122.0|                             45.0|                             59.0|                            181.0|
|2021-05-11T22:18:30.000+08:00|                            182.0|                            113.0|                             52.0|                            180.0|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
>

Note that cq4 computes the results for each time interval only once, after a delay of 20 seconds. Here are the results:

> SELECT temperature_max from root.ln.*.*;
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|                         Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
|2021-05-11T22:18:00.000+08:00|                            100.0|                            100.0|                            100.0|                            100.0|
|2021-05-11T22:18:10.000+08:00|                            121.0|                             72.0|                            183.0|                            115.0|
|2021-05-11T22:18:20.000+08:00|                            122.0|                             45.0|                             59.0|                            181.0|
|2021-05-11T22:18:30.000+08:00|                            182.0|                            113.0|                             52.0|                            180.0|
+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
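
To see why cq4 computes each interval once while cq2 and cq3 compute some intervals repeatedly, the small simulation below counts how often each GROUP BY bucket falls inside a query window. It is a sketch under stated assumptions, not IoTDB code: `CqRecompute` and `simulate` are hypothetical names, times are plain seconds relative to 22:18:00, and each window is assumed to be [now - start_time_offset, now - end_time_offset).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation (not IoTDB code): counts how many executions cover
// each GROUP BY bucket when a CQ runs every `every` seconds over the window
// [now - startOffset, now - endOffset). All values are in seconds.
public class CqRecompute {
  public static Map<Long, Integer> simulate(
      long every, long startOffset, long endOffset, long groupBy, long firstRun, int runs) {
    Map<Long, Integer> timesComputed = new TreeMap<>();
    for (int i = 0; i < runs; i++) {
      long now = firstRun + i * every;
      // Walk the buckets that lie completely inside this execution's window.
      for (long bucket = now - startOffset;
          bucket + groupBy <= now - endOffset;
          bucket += groupBy) {
        timesComputed.merge(bucket, 1, Integer::sum);
      }
    }
    return timesComputed;
  }

  public static void main(String[] args) {
    // The first run is at 22:18:40, that is, t = 40.
    System.out.println("cq2: " + simulate(10, 40, 0, 10, 40, 8));
    System.out.println("cq3: " + simulate(20, 40, 0, 10, 40, 4));
    System.out.println("cq4: " + simulate(20, 40, 20, 10, 40, 4));
  }
}
```

In this model, a bucket is covered by (start_time_offset - end_time_offset) / EVERY executions once the CQ has been running long enough. Buckets close to the first or last simulated run are covered fewer times.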

CQ without group by clause

Use an EVERY interval in the RESAMPLE clause to specify the CQ’s execution interval and the length of the CQ’s time range.

CREATE CONTINUOUS QUERY cq5
RESAMPLE EVERY 20s
BEGIN
  SELECT temperature + 1
  INTO root.precalculated_sg.::(temperature)
  FROM root.ln.*.*
  align by device
END

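cq5 calculates temperature + 1 for the sensors under the root.ln prefix path and stores the results in the root.precalculated_sg database. Within that database, each result series keeps the remaining path of its source sensor (for example, root.ln.wf02.wt02 maps to root.precalculated_sg.wf02.wt02).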

cq5 executes at 20-second intervals, the same interval as the EVERY interval. Every 20 seconds, cq5 runs a single query that covers the time range for the current time bucket, that is, the 20-second time bucket that intersects with now().

Suppose the current time is 2021-05-11T22:18:40.000+08:00. With the log level set to DEBUG, the DataNode log shows cq5 running (annotated below):

At **2021-05-11T22:18:40.000+08:00**, `cq5` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`.
`cq5` generates 16 lines:
>
+-----------------------------+-------------------------------+-----------+
|                         Time|                         Device|temperature|
+-----------------------------+-------------------------------+-----------+
|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt02|      123.0| 
|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt02|       48.0|
|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt02|      183.0|
|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt02|       45.0|
|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt01|       46.0| 
|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt01|       15.0|
|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt01|      114.0|
|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt01|       12.0|
|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt02|       12.0| 
|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt02|       60.0|
|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt02|       30.0|
|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt02|       53.0|
|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt01|       15.0| 
|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt01|      182.0|
|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt01|      181.0|
|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt01|       20.0|
+-----------------------------+-------------------------------+-----------+
>
At **2021-05-11T22:19:00.000+08:00**, `cq5` executes a query within the time range `[2021-05-11T22:18:40, 2021-05-11T22:19:00)`.
`cq5` generates 12 lines:
>
+-----------------------------+-------------------------------+-----------+
|                         Time|                         Device|temperature|
+-----------------------------+-------------------------------+-----------+
|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt02|       79.0| 
|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt02|      138.0|
|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt02|       17.0|
|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt01|       39.0| 
|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt01|      173.0|
|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt01|      125.0|
|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt02|      124.0| 
|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt02|      136.0|
|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt02|      184.0|
|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt01|       53.0| 
|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt01|      194.0|
|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt01|       19.0|
+-----------------------------+-------------------------------+-----------+
>

cq5 does not process data earlier than the start of its first time window, 2021-05-11T22:18:20.000+08:00, so the results are:

> SELECT temperature from root.precalculated_sg.*.* align by device;
+-----------------------------+-------------------------------+-----------+
|                         Time|                         Device|temperature|
+-----------------------------+-------------------------------+-----------+
|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt02|      123.0| 
|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt02|       48.0|
|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt02|      183.0|
|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt02|       45.0|
|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt02|       79.0| 
|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt02|      138.0|
|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt02|       17.0|
|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt01|       46.0| 
|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt01|       15.0|
|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt01|      114.0|
|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt01|       12.0|
|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt01|       39.0| 
|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt01|      173.0|
|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt01|      125.0|
|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt02|       12.0| 
|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt02|       60.0|
|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt02|       30.0|
|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt02|       53.0|
|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt02|      124.0| 
|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt02|      136.0|
|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt02|      184.0|
|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt01|       15.0| 
|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt01|      182.0|
|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt01|      181.0|
|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt01|       20.0|
|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt01|       53.0| 
|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt01|      194.0|
|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt01|       19.0|
+-----------------------------+-------------------------------+-----------+

4. CQ Management

Listing continuous queries

List every CQ on the IoTDB Cluster with:

SHOW (CONTINUOUS QUERIES | CQS) 

The results of SHOW (CONTINUOUS QUERIES | CQS) are ordered by cq_id.

Examples
SHOW CONTINUOUS QUERIES;

The result is:

| cq_id | query | state |
|---|---|---|
| s1_count_cq | CREATE CQ s1_count_cq BEGIN SELECT count(s1) INTO root.sg_count.d.count_s1 FROM root.sg.d GROUP BY(30m) END | active |

Dropping continuous queries

Drop a CQ with a specific cq_id:

DROP (CONTINUOUS QUERY | CQ) <cq_id>

DROP CQ returns an empty result.

Examples

Drop the CQ named s1_count_cq:

DROP CONTINUOUS QUERY s1_count_cq;

Altering continuous queries

CQs can't be altered once they're created. To change a CQ, you must DROP it and CREATE it again with the updated settings.

5. CQ Use Cases

Downsampling and Data Retention

Use CQs together with a TTL set on the database to mitigate storage concerns. Combining CQs and TTL automatically downsamples high-precision data to a lower precision and removes the dispensable high-precision data from the database.

Pre-calculating expensive queries

Shorten query runtimes by pre-calculating expensive queries with CQs. Use a CQ to automatically downsample commonly-queried, high precision data to a lower precision. Queries on lower precision data require fewer resources and return faster.

Pre-calculate queries for your preferred graphing tool to accelerate the population of graphs and dashboards.

Substituting for sub-query

IoTDB does not support sub-queries. You can get the same functionality by creating a CQ that plays the role of the sub-query and stores its result in other time series. Querying those time series is then equivalent to running the nested sub-query.

Example

IoTDB does not accept the following query with a nested sub query. The query calculates the average number of non-null values of s1 at 30 minute intervals:

SELECT avg(count_s1) from (select count(s1) as count_s1 from root.sg.d group by([0, now()), 30m));

To get the same results:

1. Create a CQ

This step performs the nested sub-query in the FROM clause of the query above. The following CQ automatically calculates the number of non-null values of s1 at 30-minute intervals and writes those counts into the new root.sg_count.d.count_s1 time series.

CREATE CQ s1_count_cq 
BEGIN 
    SELECT count(s1)  
        INTO root.sg_count.d(count_s1)
        FROM root.sg.d
        GROUP BY(30m)
END

2. Query the CQ results

This step performs the avg([...]) part of the outer query above.

Query the data in the time series root.sg_count.d.count_s1 to calculate its average:

SELECT avg(count_s1) from root.sg_count.d;
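
The arithmetic behind the two steps can be illustrated in plain Java. This is a sketch only, not IoTDB code: `TwoStepAggregation` and its methods are hypothetical names, every timestamp is assumed to carry a non-null s1 value, and buckets without any point are simply absent here (how IoTDB reports empty intervals is not modeled).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration (not IoTDB code) of the two steps:
// step 1 counts points per 30-minute bucket, step 2 averages those counts.
public class TwoStepAggregation {
  static final long BUCKET_MS = 30 * 60 * 1000L;

  // Step 1: the per-bucket counts, keyed by bucket start time, which play
  // the role of the values the CQ stores in count_s1.
  public static Map<Long, Long> countPerBucket(long[] timestampsMs) {
    Map<Long, Long> counts = new TreeMap<>();
    for (long t : timestampsMs) {
      counts.merge(t - (t % BUCKET_MS), 1L, Long::sum);
    }
    return counts;
  }

  // Step 2: the average of the stored counts, which plays the role of
  // SELECT avg(count_s1).
  public static double averageCount(Map<Long, Long> counts) {
    return counts.values().stream()
        .mapToLong(Long::longValue)
        .average()
        .orElse(Double.NaN);
  }

  public static void main(String[] args) {
    // Three points in the first bucket, one point in the second bucket.
    long[] ts = {0L, 60_000L, 120_000L, BUCKET_MS + 1_000L};
    System.out.println(averageCount(countPerBucket(ts)));
  }
}
```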

6. System Parameter Configuration

| Name | Description | Data Type | Default Value |
|---|---|---|---|
| continuous_query_submit_thread | The number of threads in the scheduled thread pool that submit continuous query tasks periodically | int32 | 2 |
| continuous_query_min_every_interval_in_ms | The minimum value of the continuous query execution time interval | duration | 1000 |

USER-DEFINED FUNCTION (UDF)

IoTDB provides a variety of built-in functions to meet your computing needs, and you can also create user defined functions to meet more computing needs.

This document describes how to write, register and use a UDF.

UDF Types

In IoTDB, you can develop two types of UDF:

| UDF Class | Description |
|---|---|
| UDTF (User Defined Timeseries Generating Function) | This type of function can take multiple time series as input, and output one time series, which can have any number of data points. |
| UDAF (User Defined Aggregation Function) | Custom aggregation function. This type of function can take one time series as input, and output one aggregated data point for each group based on the GROUP BY type. |

UDF Development Dependencies

If you use Maven, you can search for the development dependencies listed below in the Maven repository. Note that you must select the dependency version that matches the target IoTDB server version.

<dependency>
  <groupId>org.apache.iotdb</groupId>
  <artifactId>udf-api</artifactId>
  <version>1.0.0</version>
  <scope>provided</scope>
</dependency>

UDTF(User Defined Timeseries Generating Function)

To write a UDTF, you need to implement the org.apache.iotdb.udf.api.UDTF interface, and implement at least the beforeStart method and one transform method.

The following table shows all the interfaces available for user implementation.

| Interface definition | Description | Required to Implement |
|---|---|---|
| void validate(UDFParameterValidator validator) throws Exception | This method is mainly used to validate UDFParameters. It is executed before beforeStart(UDFParameters, UDTFConfigurations) is called. | Optional |
| void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception | The initialization method, which runs user-defined initialization logic before a UDTF processes the input data. Every time a user executes a UDTF query, the framework constructs a new UDF instance and calls beforeStart. | Required |
| void transform(Row row, PointCollector collector) throws Exception | This method is called by the framework. It is called when you choose the RowByRowAccessStrategy strategy (set in beforeStart) to consume raw data. Input data is passed in by Row, and the transformation result should be output by PointCollector. You need to call the data collection method provided by collector to determine the output data. | Required to implement at least one transform method |
| void transform(RowWindow rowWindow, PointCollector collector) throws Exception | This method is called by the framework. It is called when you choose the SlidingSizeWindowAccessStrategy or SlidingTimeWindowAccessStrategy strategy (set in beforeStart) to consume raw data. Input data is passed in by RowWindow, and the transformation result should be output by PointCollector. You need to call the data collection method provided by collector to determine the output data. | Required to implement at least one transform method |
| void terminate(PointCollector collector) throws Exception | This method is called by the framework after all transform calls have been executed. In a single UDF query, it is called exactly once. You need to call the data collection method provided by collector to determine the output data. | Optional |
| void beforeDestroy() | This method is called by the framework after the last input data is processed, and is called only once in the life cycle of each UDF instance. | Optional |

In the life cycle of a UDTF instance, the calling sequence of each method is as follows:

  1. void validate(UDFParameterValidator validator) throws Exception
  2. void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception
  3. void transform(Row row, PointCollector collector) throws Exception or void transform(RowWindow rowWindow, PointCollector collector) throws Exception
  4. void terminate(PointCollector collector) throws Exception
  5. void beforeDestroy()

Note that every time the framework executes a UDTF query, a new UDF instance will be constructed. When the query ends, the corresponding instance will be destroyed. Therefore, the internal data of the instances in different UDTF queries (even in the same SQL statement) are isolated. You can maintain some state data in the UDTF without considering the influence of concurrency and other factors.
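
The calling sequence and per-query isolation described above can be mimicked in plain Java. This is only a sketch and does not use the IoTDB API: MiniUdtf, MiniMax and LifecycleDemo are hypothetical stand-ins, chosen so that the example compiles without the udf-api dependency.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the UDTF life cycle; this is NOT the IoTDB interface.
interface MiniUdtf {
  default void validate() {}
  void beforeStart();
  void transform(long time, int value, List<String> out);
  default void terminate(List<String> out) {}
  default void beforeDestroy() {}
}

// Keeps a running maximum in instance fields and emits it once in terminate().
final class MiniMax implements MiniUdtf {
  private Long time; // null until the first row arrives
  private int value;

  @Override
  public void beforeStart() {
    time = null;
  }

  @Override
  public void transform(long t, int v, List<String> out) {
    if (time == null || value < v) {
      time = t;
      value = v;
    }
  }

  @Override
  public void terminate(List<String> out) {
    if (time != null) {
      out.add(time + "=" + value);
    }
  }
}

final class LifecycleDemo {
  // Drives one instance through the documented order; callers pass a fresh instance per query.
  static List<String> run(MiniUdtf udtf, long[] times, int[] values) {
    List<String> out = new ArrayList<>();
    udtf.validate();
    udtf.beforeStart();
    for (int i = 0; i < times.length; i++) {
      udtf.transform(times[i], values[i], out);
    }
    udtf.terminate(out);
    udtf.beforeDestroy();
    return out;
  }

  public static void main(String[] args) {
    System.out.println(run(new MiniMax(), new long[] {1, 2, 3}, new int[] {5, 9, 7}));
  }
}
```

Because each call to run receives its own instance, the fields of MiniMax never leak between runs, which is the property the note above relies on.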

The usage of each interface will be described in detail below.

void validate(UDFParameterValidator validator) throws Exception

The validate method is used to validate the parameters entered by the user.

In this method, you can limit the number and types of input time series, check the attributes of user input, or perform any custom verification.

Please refer to the Javadoc for the usage of UDFParameterValidator.

void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception

This method is mainly used to customize UDTF. In this method, the user can do the following things:

  1. Use UDFParameters to get the time series paths and parse key-value pair attributes entered by the user.
  2. Set the strategy to access the raw data and set the output data type in UDTFConfigurations.
  3. Create resources, such as establishing external connections, opening files, etc.

UDFParameters

UDFParameters is used to parse UDF parameters in SQL statements (the part in parentheses after the UDF function name in SQL). The input parameters have two parts. The first part is the time series that the UDF needs to process (their paths and data types), and the second part is the key-value pair attributes for customization. Only the second part can be empty.

Example:

SELECT UDF(s1, s2, 'key1'='iotdb', 'key2'='123.45') FROM root.sg.d;

Usage:

void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception {
  String stringValue = parameters.getString("key1"); // iotdb
  Float floatValue = parameters.getFloat("key2"); // 123.45
  Double doubleValue = parameters.getDouble("key3"); // null
  int intValue = parameters.getIntOrDefault("key4", 678); // 678
  // do something
  
  // configurations
  // ...
}

UDTFConfigurations

You must use UDTFConfigurations to specify the strategy used by UDF to access raw data and the type of output sequence.

Usage:

void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception {
  // parameters
  // ...
  
  // configurations
  configurations
    .setAccessStrategy(new RowByRowAccessStrategy())
    .setOutputDataType(Type.INT32);
}

The setAccessStrategy method is used to set the UDF's strategy for accessing the raw data, and the setOutputDataType method is used to set the data type of the output sequence.

setAccessStrategy

Note that the raw data access strategy you set here determines which transform method the framework will call, so implement the transform method that corresponds to that strategy. You can also decide dynamically which strategy to set, based on the attribute parameters parsed by UDFParameters. For this reason, one UDF is allowed to implement both transform methods.

The following are the strategies you can set:

Access strategy | Description | The transform Method to Call
RowByRowAccessStrategy | Process raw data row by row. The framework calls the transform method once for each row of raw data input. When the UDF has only one input sequence, a row of input is one data point in the input sequence. When the UDF has multiple input sequences, a row of input is a result record of the raw query (aligned by time) on these input sequences. (A row may contain columns with a null value, but never only null columns.) | void transform(Row row, PointCollector collector) throws Exception
SlidingTimeWindowAccessStrategy | Process a batch of data in a fixed time interval each time. We call the container of a data batch a window. The framework calls the transform method once for each raw data input window. There may be multiple rows of data in a window, and each row is a result record of the raw query (aligned by time) on these input sequences. (A row may contain columns with a null value, but never only null columns.) | void transform(RowWindow rowWindow, PointCollector collector) throws Exception
SlidingSizeWindowAccessStrategy | The raw data is processed batch by batch, and each batch contains a fixed number of raw data rows (except the last batch). We call the container of a data batch a window. The framework calls the transform method once for each raw data input window. There may be multiple rows of data in a window, and each row is a result record of the raw query (aligned by time) on these input sequences. (A row may contain columns with a null value, but never only null columns.) | void transform(RowWindow rowWindow, PointCollector collector) throws Exception
SessionTimeWindowAccessStrategy | The raw data is processed batch by batch. We call the container of a data batch a window. The time interval between two adjacent windows is greater than or equal to the sessionGap given by the user. The framework calls the transform method once for each raw data input window. There may be multiple rows of data in a window, and each row is a result record of the raw query (aligned by time) on these input sequences. (A row may contain columns with a null value, but never only null columns.) | void transform(RowWindow rowWindow, PointCollector collector) throws Exception
StateWindowAccessStrategy | The raw data is processed batch by batch. We call the container of a data batch a window. In a state window, for text or boolean data, the value of every point in the window equals the value of the first point in the window; for numerical data, the distance between the value of every point in the window and the value of the first point in the window is less than the threshold delta given by the user. The framework calls the transform method once for each raw data input window. There may be multiple rows of data in a window. Currently, the state window is supported for only one measurement, that is, one column of data. | void transform(RowWindow rowWindow, PointCollector collector) throws Exception

RowByRowAccessStrategy: The construction of RowByRowAccessStrategy does not require any parameters.

The SlidingTimeWindowAccessStrategy is shown schematically below.

SlidingTimeWindowAccessStrategy: SlidingTimeWindowAccessStrategy has several constructors, which accept three kinds of parameters:

  • Parameter 1: The display window on the time axis
  • Parameter 2: Time interval for dividing the time axis (should be positive)
  • Parameter 3: Time sliding step (not required to be greater than or equal to the time interval, but must be a positive number)

The display window parameters are optional. If they are not provided, the beginning of the display window is set to the minimum timestamp of the query result set, and the end of the display window is set to the maximum timestamp of the query result set.

The sliding step parameter is also optional. If the parameter is not provided, the sliding step will be set to the same as the time interval for dividing the time axis.

The relationship between the three types of parameters can be seen in the figure below. Please see the Javadoc for more details.

Note that the actual time interval of some of the last time windows may be less than the specified time interval parameter. In addition, there may be cases where the number of data rows in some time windows is 0. In these cases, the framework will also call the transform method for the empty windows.
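
The way the display window, time interval and sliding step interact can be sketched in plain Java. The code below is not the IoTDB implementation: TimeWindowSketch is a hypothetical helper, and the half-open [start, end) bounds are an assumption of this sketch rather than something stated above.

```java
import java.util.ArrayList;
import java.util.List;

final class TimeWindowSketch {
  // Returns {start, end} pairs covering the display window [displayBegin, displayEnd).
  // Assumption: windows are half-open and the last ones are clipped at displayEnd.
  static List<long[]> windows(long displayBegin, long displayEnd, long interval, long step) {
    if (interval <= 0 || step <= 0) {
      throw new IllegalArgumentException("interval and step must be positive");
    }
    List<long[]> result = new ArrayList<>();
    for (long start = displayBegin; start < displayEnd; start += step) {
      long end = Math.min(start + interval, displayEnd); // trailing windows may be shorter
      result.add(new long[] {start, end});
    }
    return result;
  }

  public static void main(String[] args) {
    // Interval 4 with step 3: consecutive windows overlap by one time unit.
    for (long[] w : windows(0, 10, 4, 3)) {
      System.out.println("[" + w[0] + ", " + w[1] + ")");
    }
  }
}
```

A window is produced for every step position whether or not any rows fall inside it, which corresponds to the empty windows mentioned above.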

The SlidingSizeWindowAccessStrategy is shown schematically below.

SlidingSizeWindowAccessStrategy: SlidingSizeWindowAccessStrategy has several constructors, which accept two kinds of parameters:

  • Parameter 1: Window size. This parameter specifies the number of data rows contained in a data processing window. Note that the last windows may contain fewer data rows than specified.
  • Parameter 2: Sliding step. This parameter means the number of rows between the first point of the next window and the first point of the current window. (This parameter is not required to be greater than or equal to the window size, but must be a positive number)

The sliding step parameter is optional. If the parameter is not provided, the sliding step will be set to the same as the window size.
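
As a rough illustration of window size and sliding step, the sketch below computes which row indices end up in each window. SizeWindowSketch is a hypothetical helper written for this page, not part of the IoTDB API.

```java
import java.util.ArrayList;
import java.util.List;

final class SizeWindowSketch {
  // Returns {firstRow, lastRowExclusive} index pairs over rowCount rows.
  static List<int[]> windows(int rowCount, int windowSize, int slidingStep) {
    if (windowSize <= 0 || slidingStep <= 0) {
      throw new IllegalArgumentException("windowSize and slidingStep must be positive");
    }
    List<int[]> result = new ArrayList<>();
    // slidingStep is the distance between the first rows of adjacent windows.
    for (int first = 0; first < rowCount; first += slidingStep) {
      result.add(new int[] {first, Math.min(first + windowSize, rowCount)});
    }
    return result;
  }

  public static void main(String[] args) {
    // Step smaller than the window size: adjacent windows share rows.
    for (int[] w : windows(5, 3, 2)) {
      System.out.println("rows " + w[0] + " to " + (w[1] - 1));
    }
  }
}
```

With a step equal to the window size the windows tile the input without overlap, which matches the default described above.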

The SessionTimeWindowAccessStrategy is shown schematically below. Data points whose time intervals are less than or equal to the given minimum time interval sessionGap are assigned to one group.

SessionTimeWindowAccessStrategy: SessionTimeWindowAccessStrategy has several constructors, which accept two kinds of parameters:

  • Parameter 1: The display window on the time axis.
  • Parameter 2: The minimum time interval sessionGap of two adjacent windows.
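
The grouping rule for session windows can be sketched as follows. SessionWindowSketch is a hypothetical helper, and treating a gap exactly equal to sessionGap as part of the same session is an assumption taken from the sentence above about intervals "less than or equal to" sessionGap.

```java
import java.util.ArrayList;
import java.util.List;

final class SessionWindowSketch {
  // Splits ascending timestamps into sessions. A new session starts when the gap
  // to the previous timestamp is strictly greater than sessionGap (assumption).
  static List<List<Long>> sessions(long[] times, long sessionGap) {
    List<List<Long>> result = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    for (int i = 0; i < times.length; i++) {
      if (i > 0 && times[i] - times[i - 1] > sessionGap) {
        result.add(current);
        current = new ArrayList<>();
      }
      current.add(times[i]);
    }
    if (!current.isEmpty()) {
      result.add(current);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(sessions(new long[] {1, 2, 3, 10, 11, 30}, 5));
  }
}
```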

The StateWindowAccessStrategy is shown schematically below. For numerical data, if the state difference is less than or equal to the given threshold delta, the data is assigned to one group.

StateWindowAccessStrategy has four constructors.

  • Constructor 1: For numerical data, there are 3 parameters: the start time and end time of the display window on the time axis, and the threshold delta for the allowable change within a single window.
  • Constructor 2: For text data and boolean data, there are 2 parameters: the start time and end time of the display window on the time axis. For both data types, the data within a single window is the same, so there is no need to provide an allowable change threshold.
  • Constructor 3: For numerical data, there is 1 parameter: the threshold delta for the allowable change within a single window. The start time of the display window is defined as the smallest timestamp in the entire query result set, and the end time of the display window is defined as the largest timestamp in the entire query result set.
  • Constructor 4: For text data and boolean data, you can provide no parameters. The start and end timestamps are determined as explained in Constructor 3.

StateWindowAccessStrategy can only take one column as input for now.

Please see the Javadoc for more details.
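
For numerical data, the state window rule can be illustrated in plain Java. StateWindowSketch is a hypothetical helper and does not use the IoTDB API. It compares every value with the first value of the current window and, as an assumption, keeps values whose distance is less than or equal to delta in the same window.

```java
import java.util.ArrayList;
import java.util.List;

final class StateWindowSketch {
  // A value joins the current window while its distance to the window's FIRST
  // value is at most delta (assumption); otherwise it opens a new window.
  static List<List<Double>> windows(double[] values, double delta) {
    List<List<Double>> result = new ArrayList<>();
    List<Double> current = new ArrayList<>();
    for (double v : values) {
      if (!current.isEmpty() && Math.abs(v - current.get(0)) > delta) {
        result.add(current);
        current = new ArrayList<>();
      }
      current.add(v);
    }
    if (!current.isEmpty()) {
      result.add(current);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(windows(new double[] {1.0, 1.5, 2.0, 5.0, 5.5, 9.0}, 1.0));
  }
}
```

Comparing against the first value rather than the previous one means that a slow drift eventually opens a new window, even if no single step exceeds delta.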

setOutputDataType

Note that the type of output sequence you set here determines the type of data that the PointCollector can actually receive in the transform method. The relationship between the output data type set in setOutputDataType and the actual data output type that PointCollector can receive is as follows:

Output Data Type Set in setOutputDataType | Data Type that PointCollector Can Receive
INT32 | int
INT64 | long
FLOAT | float
DOUBLE | double
BOOLEAN | boolean
TEXT | java.lang.String and org.apache.iotdb.udf.api.type.Binary

The type of output time series of a UDTF is determined at runtime, which means that a UDTF can dynamically determine the type of output time series according to the type of input time series.
Here is a simple example:

void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception {
  // do something
  // ...
  
  configurations
    .setAccessStrategy(new RowByRowAccessStrategy())
    .setOutputDataType(parameters.getDataType(0));
}

void transform(Row row, PointCollector collector) throws Exception

You need to implement this method when you specify the strategy of UDF to read the original data as RowByRowAccessStrategy.

This method processes the raw data one row at a time. The raw data is input from Row and output by PointCollector. You can output any number of data points in one transform method call. It should be noted that the type of output data points must be the same as you set in the beforeStart method, and the timestamps of output data points must be strictly monotonically increasing.

The following is a complete UDF example that implements the void transform(Row row, PointCollector collector) throws Exception method. It is an adder that receives two columns of time series as input. When two data points in a row are not null, this UDF will output the algebraic sum of these two data points.

import org.apache.iotdb.udf.api.UDTF;
import org.apache.iotdb.udf.api.access.Row;
import org.apache.iotdb.udf.api.collector.PointCollector;
import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations;
import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters;
import org.apache.iotdb.udf.api.customizer.strategy.RowByRowAccessStrategy;
import org.apache.iotdb.udf.api.type.Type;

public class Adder implements UDTF {

  @Override
  public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) {
    configurations
        .setOutputDataType(Type.INT64)
        .setAccessStrategy(new RowByRowAccessStrategy());
  }

  @Override
  public void transform(Row row, PointCollector collector) throws Exception {
    if (row.isNull(0) || row.isNull(1)) {
      return;
    }
    collector.putLong(row.getTime(), row.getLong(0) + row.getLong(1));
  }
}

void transform(RowWindow rowWindow, PointCollector collector) throws Exception

You need to implement this method when you set a window-based strategy for reading the original data: SlidingTimeWindowAccessStrategy, SlidingSizeWindowAccessStrategy, SessionTimeWindowAccessStrategy or StateWindowAccessStrategy.

This method processes one batch of data per call, for example a fixed number of rows or a fixed time interval. We call the container of such a batch a window. The raw data is input from RowWindow and output by PointCollector. RowWindow gives you access to a batch of Row objects through a set of interfaces for random access and iterative access. You can output any number of data points in one transform method call. Note that the type of the output data points must be the same as the type you set in the beforeStart method, and the timestamps of the output data points must be strictly monotonically increasing.

Below is a complete UDF example that implements the void transform(RowWindow rowWindow, PointCollector collector) throws Exception method. It is a counter that receives any number of time series as input, and its function is to count and output the number of data rows in each time window within a specified time range.

import java.io.IOException;
import org.apache.iotdb.udf.api.UDTF;
import org.apache.iotdb.udf.api.access.Row;
import org.apache.iotdb.udf.api.access.RowWindow;
import org.apache.iotdb.udf.api.collector.PointCollector;
import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations;
import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters;
import org.apache.iotdb.udf.api.customizer.strategy.SlidingTimeWindowAccessStrategy;
import org.apache.iotdb.udf.api.type.Type;

public class Counter implements UDTF {

  @Override
  public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) {
    configurations
        .setOutputDataType(Type.INT32)
        .setAccessStrategy(new SlidingTimeWindowAccessStrategy(
            parameters.getLong("time_interval"),
            parameters.getLong("sliding_step"),
            parameters.getLong("display_window_begin"),
            parameters.getLong("display_window_end")));
  }

  @Override
  public void transform(RowWindow rowWindow, PointCollector collector) throws Exception {
    if (rowWindow.windowSize() != 0) {
      collector.putInt(rowWindow.windowStartTime(), rowWindow.windowSize());
    }
  }
}

void terminate(PointCollector collector) throws Exception

In some scenarios, a UDF needs to traverse all the original data to calculate the final output data points. The terminate interface provides support for those scenarios.

This method is called after all transform calls are executed and before the beforeDestroy method is executed. You can implement the transform method to perform pure data processing (without outputting any data points), and implement the terminate method to output the processing results.

The processing results need to be output by the PointCollector. You can output any number of data points in one terminate method call. It should be noted that the type of output data points must be the same as you set in the beforeStart method, and the timestamps of output data points must be strictly monotonically increasing.

Below is a complete UDF example that implements the void terminate(PointCollector collector) throws Exception method. It takes one time series whose data type is INT32 as input, and outputs the maximum value point of the series.

import java.io.IOException;
import org.apache.iotdb.udf.api.UDTF;
import org.apache.iotdb.udf.api.access.Row;
import org.apache.iotdb.udf.api.collector.PointCollector;
import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations;
import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters;
import org.apache.iotdb.udf.api.customizer.strategy.RowByRowAccessStrategy;
import org.apache.iotdb.udf.api.type.Type;

public class Max implements UDTF {

  private Long time;
  private int value;

  @Override
  public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) {
    configurations
        .setOutputDataType(Type.INT32)
        .setAccessStrategy(new RowByRowAccessStrategy());
  }

  @Override
  public void transform(Row row, PointCollector collector) {
    if (row.isNull(0)) {
      return;
    }
    int candidateValue = row.getInt(0);
    if (time == null || value < candidateValue) {
      time = row.getTime();
      value = candidateValue;
    }
  }

  @Override
  public void terminate(PointCollector collector) throws IOException {
    if (time != null) {
      collector.putInt(time, value);
    }
  }
}

void beforeDestroy()

The method for terminating a UDF.

This method is called by the framework. For a UDF instance, beforeDestroy will be called after the last record is processed. In the entire life cycle of the instance, beforeDestroy will only be called once.

UDAF (User Defined Aggregation Function)

A complete definition of UDAF involves two classes, State and UDAF.

State Class

To write your own State, you need to implement the org.apache.iotdb.udf.api.State interface.

The following table shows all the interfaces available for user implementation.

Interface Definition | Description | Required to Implement
void reset() | Resets the State object to its initial state. In this method you need to fill in the initial values of the fields in the State class, as if you were writing a constructor. | Required
byte[] serialize() | Serializes State to binary data. This method is used for IoTDB internal State passing. Note that the order of serialization must be consistent with the deserialization method below. | Required
void deserialize(byte[] bytes) | Deserializes binary data to State. This method is used for IoTDB internal State passing. Note that the order of deserialization must be consistent with the serialization method above. | Required

The following section describes the usage of each interface in detail.

void reset()

This method resets the State to its initial state. In this method you need to fill in the initial values of the fields in the State object. For optimization reasons, IoTDB reuses State objects internally as much as possible rather than creating a new State for each group, which would introduce unnecessary overhead. When a State has finished updating the data in a group, this method is called to reset it to the initial state so that it can process the next group.

In the case of State for averaging (aka avg), for example, you would need the sum of the data, sum, and the number of entries in the data, count, and initialize both to 0 in the reset() method.

class AvgState implements State {
  double sum;

  long count;

  @Override
  public void reset() {
    sum = 0;
    count = 0;
  }
  
  // other methods
}

byte[] serialize() / void deserialize(byte[] bytes)

These methods serialize the State into binary data, and deserialize the State from the binary data. IoTDB, as a distributed database, involves passing data among different nodes, so you need to write these two methods to enable the passing of the State among different nodes. Note that the order of serialization and deserialization must be consistent.

In the case of State for averaging (aka avg), for example, you can convert the content of State to a byte[] array and read it back from a byte[] array in any way you want. The following shows the code for serialization and deserialization using java.nio.ByteBuffer:

@Override
public byte[] serialize() {
  ByteBuffer buffer = ByteBuffer.allocate(Double.BYTES + Long.BYTES);
  buffer.putDouble(sum);
  buffer.putLong(count);

  return buffer.array();
}

@Override
public void deserialize(byte[] bytes) {
  ByteBuffer buffer = ByteBuffer.wrap(bytes);
  sum = buffer.getDouble();
  count = buffer.getLong();
}
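
A quick way to check that both methods agree on field order is a round trip. The sketch below uses a stand-alone copy of the averaging state; AvgStateSketch is a hypothetical name and deliberately does not implement org.apache.iotdb.udf.api.State, so that it compiles without the IoTDB dependency.

```java
import java.nio.ByteBuffer;

// Stand-alone copy of the averaging state, for illustration only.
final class AvgStateSketch {
  double sum;
  long count;

  byte[] serialize() {
    ByteBuffer buffer = ByteBuffer.allocate(Double.BYTES + Long.BYTES);
    buffer.putDouble(sum); // written first
    buffer.putLong(count); // written second
    return buffer.array();
  }

  void deserialize(byte[] bytes) {
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    sum = buffer.getDouble(); // read first, matching the write order
    count = buffer.getLong(); // read second
  }

  public static void main(String[] args) {
    AvgStateSketch source = new AvgStateSketch();
    source.sum = 12.5;
    source.count = 4;

    AvgStateSketch copy = new AvgStateSketch();
    copy.deserialize(source.serialize());
    System.out.println(copy.sum + " / " + copy.count);
  }
}
```

If the two reads in deserialize were swapped, the copy would silently hold garbage values, which is why the ordering rule matters.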

UDAF Classes

To write a UDAF, you need to implement the org.apache.iotdb.udf.api.UDAF interface.

The following table shows all the interfaces available for user implementation.

Interface definition | Description | Required to Implement
void validate(UDFParameterValidator validator) throws Exception | Validates UDFParameters. It is executed before beforeStart(UDFParameters, UDAFConfigurations) is called. | Optional
void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception | The initialization method, which runs user-defined initialization behavior before a UDAF processes the input data. Unlike UDTF, the configuration object is of type UDAFConfigurations. | Required
State createState() | Creates a State object. Usually you just call the default constructor and modify the default initial values as needed. | Required
void addInput(State state, Column[] columns, BitMap bitMap) | Updates the State object in batch according to the incoming data Column[]. Note that column[0] always represents the time column. In addition, BitMap represents the data that has been filtered out earlier; when writing this method, you need to check manually whether the corresponding data has been filtered out. | Required
void combineState(State state, State rhs) | Merges the rhs state into state. In a distributed scenario, the same set of data may be distributed on different nodes. IoTDB generates a State object for the partial data on each node, and then calls this method to merge them into the complete State. | Required
void outputFinal(State state, ResultValue resultValue) | Computes the final aggregated result based on the data in State. Note that according to the semantics of aggregation, only one value can be output per group. | Required
void beforeDestroy() | Called by the framework after the last input data is processed. It is called only once in the life cycle of each UDF instance. | Optional

In the life cycle of a UDAF instance, the calling sequence of each method is as follows:

  1. State createState()
  2. void validate(UDFParameterValidator validator) throws Exception
  3. void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception
  4. void addInput(State state, Column[] columns, BitMap bitMap)
  5. void combineState(State state, State rhs)
  6. void outputFinal(State state, ResultValue resultValue)
  7. void beforeDestroy()

Similar to UDTF, every time the framework executes a UDAF query, a new UDF instance will be constructed. When the query ends, the corresponding instance will be destroyed. Therefore, the internal data of the instances in different UDAF queries (even in the same SQL statement) are isolated. You can maintain some state data in the UDAF without considering the influence of concurrency and other factors.

The usage of each interface will be described in detail below.

void validate(UDFParameterValidator validator) throws Exception

Same as UDTF, the validate method is used to validate the parameters entered by the user.

In this method, you can limit the number and types of input time series, check the attributes of user input, or perform any custom verification.

void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception

The beforeStart method does the same things as in UDTF:

  1. Use UDFParameters to get the time series paths and parse key-value pair attributes entered by the user.
  2. Set the output data type in UDAFConfigurations.
  3. Create resources, such as establishing external connections, opening files, etc.

The role of the UDFParameters type can be seen above.

UDAFConfigurations

The difference from UDTF is that UDAF uses UDAFConfigurations as the type of configuration object.

Currently, this class only supports setting the type of output data.

void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception {
  // parameters
  // ...

  // configurations
  configurations
    .setOutputDataType(Type.INT32);
}

The relationship between the output type set in setOutputDataType and the type of data output that ResultValue can actually receive is as follows:

Output Type Set in setOutputDataType | Output Type that ResultValue Can Receive
INT32 | int
INT64 | long
FLOAT | float
DOUBLE | double
BOOLEAN | boolean
TEXT | org.apache.iotdb.udf.api.type.Binary

The output type of the UDAF is determined at runtime. You can dynamically determine the output sequence type based on the input type.

Here is a simple example:

void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception {
  // do something
  // ...
  
  configurations
    .setOutputDataType(parameters.getDataType(0));
}

State createState()

This method creates and initializes a State object for UDAF. Due to the limitations of the Java language, you can only call the default constructor for the State class. The default constructor assigns a default initial value to all the fields in the class, and if that initial value does not meet your requirements, you need to initialize them manually within this method.

The following is an example that includes manual initialization. Suppose you want to implement an aggregate function that multiplies all numbers in the group. The initial State value should then be 1, but the default constructor initializes it to 0, so you need to initialize State manually after calling the default constructor:

public State createState() {
  MultiplyState state = new MultiplyState();
  state.result = 1;
  return state;
}
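
The MultiplyState class itself is not shown above. A minimal sketch of what it could look like follows; MultiplyStateSketch is a hypothetical stand-alone class (a real one would implement org.apache.iotdb.udf.api.State), and the single double field is an assumption.

```java
import java.nio.ByteBuffer;

// Hypothetical state for a "product" aggregate, kept free of IoTDB types.
final class MultiplyStateSketch {
  double result; // Java's default value is 0.0, which would zero out every product

  void reset() {
    result = 1; // multiplicative identity
  }

  byte[] serialize() {
    return ByteBuffer.allocate(Double.BYTES).putDouble(result).array();
  }

  void deserialize(byte[] bytes) {
    result = ByteBuffer.wrap(bytes).getDouble();
  }

  // Mirrors createState(): construct with the default constructor, then initialize.
  static MultiplyStateSketch createState() {
    MultiplyStateSketch state = new MultiplyStateSketch();
    state.reset();
    return state;
  }

  // Small driver that multiplies all values of one group.
  static double product(double[] values) {
    MultiplyStateSketch state = createState();
    for (double v : values) {
      state.result *= v;
    }
    return state.result;
  }

  public static void main(String[] args) {
    System.out.println(product(new double[] {2, 3, 4}));
  }
}
```

Calling reset() from createState() is one way to keep the initial value in a single place, so that a freshly created State and a reused State start from the same value.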

void addInput(State state, Column[] columns, BitMap bitMap)

This method updates the State object with the raw input data. For performance reasons, and to align with the IoTDB vectorized query engine, the raw input data is no longer a single data point but an array of columns, Column[]. Note that the first column (i.e. column[0]) is always the time column, so you can also perform different operations in the UDAF depending on the time.

Since the input is a set of columns rather than a single data point, you need to skip the filtered-out rows yourself, which is why the third parameter, BitMap, exists. It identifies which rows in these columns have been filtered out, so that the filtered data never reaches your aggregation logic.

Here's an example of addInput() that counts the number of items (aka count). It shows how you can use BitMap to ignore data that has been filtered out. Note that due to the limitations of the Java language, you need to explicitly cast the State object from the type defined in the interface to your custom State type at the beginning of the method; otherwise you won't be able to use its fields.

public void addInput(State state, Column[] column, BitMap bitMap) {
  CountState countState = (CountState) state;

  int count = column[0].getPositionCount();
  for (int i = 0; i < count; i++) {
    if (bitMap != null && !bitMap.isMarked(i)) {
      continue;
    }
    if (!column[1].isNull(i)) {
      countState.count++;
    }
  }
}
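
The filtering logic can be tried out without IoTDB by replacing Column and BitMap with plain arrays. In the sketch below, CountSketch is a hypothetical helper: a null entry in values models a null cell, a false entry in kept models a row that isMarked would report as not marked, and a null kept array models the absence of a BitMap.

```java
final class CountSketch {
  // Counts non-null cells among the rows that were not filtered out.
  static long count(Integer[] values, boolean[] kept) {
    long count = 0;
    for (int i = 0; i < values.length; i++) {
      if (kept != null && !kept[i]) {
        continue; // row was filtered out earlier, skip it
      }
      if (values[i] != null) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    Integer[] values = {1, null, 3, 4};
    boolean[] kept = {true, true, false, true};
    System.out.println(count(values, kept));
    System.out.println(count(values, null));
  }
}
```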

void combineState(State state, State rhs)

This method combines two States, or more precisely, updates the first State object with the second State object. IoTDB is a distributed database, and the data of the same group may be distributed on different nodes. For performance reasons, IoTDB will first aggregate some of the data on each node into State, and then merge the States on different nodes that belong to the same group, which is what combineState does.

Here's an example of combineState() for averaging (aka avg). Similar to addInput, you need to do an explicit type conversion for the two States at the beginning. Also note that you are updating the value of the first State with the contents of the second State.

public void combineState(State state, State rhs) {
  AvgState avgState = (AvgState) state;
  AvgState avgRhs = (AvgState) rhs;

  avgState.count += avgRhs.count;
  avgState.sum += avgRhs.sum;
}
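
The reason the averaging state carries sum and count, instead of a ready-made average, becomes visible when partial states are merged. The following sketch is plain Java with hypothetical names (AvgCombineSketch, Partial) and does not use the IoTDB API.

```java
import java.util.OptionalDouble;

final class AvgCombineSketch {
  // Hypothetical partial state, as one node might hold it for its share of a group.
  static final class Partial {
    double sum;
    long count;
  }

  static Partial aggregate(double... values) {
    Partial p = new Partial();
    for (double v : values) {
      p.sum += v;
      p.count++;
    }
    return p;
  }

  // Folds rhs into state; rhs itself is left untouched.
  static void combine(Partial state, Partial rhs) {
    state.sum += rhs.sum;
    state.count += rhs.count;
  }

  static OptionalDouble average(Partial p) {
    return p.count == 0 ? OptionalDouble.empty() : OptionalDouble.of(p.sum / p.count);
  }

  public static void main(String[] args) {
    Partial nodeA = aggregate(1, 2, 3); // partial average 2.0
    Partial nodeB = aggregate(10);      // partial average 10.0
    combine(nodeA, nodeB);
    // Merging sums and counts yields the average of all four values,
    // which differs from averaging the two partial averages.
    System.out.println(average(nodeA));
  }
}
```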

void outputFinal(State state, ResultValue resultValue)

This method calculates the final result from State. You need to access the various fields in State, derive the final result, and set the final result into the ResultValue object. IoTDB internally calls this method once at the end for each group. Note that according to the semantics of aggregation, the final result can only be one value.

Here is another outputFinal example for averaging (aka avg). In addition to the forced type conversion at the beginning, you will also see a specific use of the ResultValue object, where the final result is set by setXXX (where XXX is the type name).

public void outputFinal(State state, ResultValue resultValue) {
  AvgState avgState = (AvgState) state;

  if (avgState.count != 0) {
    resultValue.setDouble(avgState.sum / avgState.count);
  } else {
    resultValue.setNull();
  }
}

void beforeDestroy()

The method for terminating a UDF.

This method is called by the framework. For a UDF instance, beforeDestroy will be called after the last record is processed. In the entire life cycle of the instance, beforeDestroy will only be called once.

Maven Project Example

If you use Maven, you can build your own UDF project referring to our udf-example module. You can find the project here.

UDF Registration

The process of registering a UDF in IoTDB is as follows:

  1. Implement a complete UDF class, assuming the full class name of this class is org.apache.iotdb.udf.UDTFExample.
  2. Package your project into a JAR. If you use Maven to manage your project, you can refer to the Maven project example above.
  3. Make preparations for registration according to the registration mode. For details, see the following example.
  4. Register the UDF with the following SQL statement:
CREATE FUNCTION <UDF-NAME> AS <UDF-CLASS-FULL-PATHNAME> (USING URI URI-STRING)?

Example: to register a UDF named example, choose either of the following two registration methods.

No URI

Prepare:
When registering this way, place the JAR in the directory iotdb-server-1.0.0-all-bin/ext/udf beforehand (the directory is configurable).
Note: in a cluster, the JAR must be placed in this directory on every DataNode.

SQL:

CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample'

Using URI

Prepare:
When registering this way, upload the JAR to the URI server beforehand and make sure the IoTDB instance that executes the registration statement can access the URI server.
Note: you do not need to place the JAR manually; IoTDB downloads the JAR and syncs it.

SQL:

CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' USING URI 'http://jar/example.jar'
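
Both forms follow the same grammar and differ only in the optional USING URI clause. As a rough sketch, a client-side helper could assemble either statement from its parts. CreateFunctionSql and build are hypothetical names, the helper only produces text and never contacts IoTDB, and it assumes trusted inputs because it does no quoting or escaping.

```java
// Hypothetical helper; it only builds SQL text and does not talk to IoTDB.
class CreateFunctionSql {
  // A null uri selects the "No URI" form; otherwise the USING URI clause is appended.
  static String build(String name, String className, String uri) {
    StringBuilder sql = new StringBuilder("CREATE FUNCTION ")
        .append(name)
        .append(" AS '").append(className).append("'");
    if (uri != null) {
      sql.append(" USING URI '").append(uri).append("'");
    }
    return sql.toString();
  }

  public static void main(String[] args) {
    String cls = "org.apache.iotdb.udf.UDTFExample";
    System.out.println(build("example", cls, null));
    System.out.println(build("example", cls, "http://jar/example.jar"));
  }
}
```

The two printed lines match the two SQL examples above. The resulting string can be sent through whichever client you already use to run SQL against IoTDB.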

Note

Since UDF instances are dynamically loaded through reflection technology, you do not need to restart the server during the UDF registration process.

UDF function names are not case-sensitive.

Please ensure that the function name given to the UDF is different from all built-in function names. A UDF with the same name as a built-in function cannot be registered.

We recommend that you do not use classes that have the same full class name but different logic in different JAR packages. For example, suppose two UDFs (UDAF or UDTF), udf1 and udf2, are packaged in udf1.jar and udf2.jar respectively, and both JAR packages contain the class org.apache.iotdb.udf.ExampleUDTF. If you use both UDFs in the same SQL statement, the system will load one of the two classes at random, which may cause inconsistent UDF execution behavior.
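
One way to catch such a collision before registering is to compare the class lists of the JARs. The sketch below is a hypothetical pre-registration check and not part of IoTDB. It assumes you have already collected the fully qualified class names of each JAR (for example from the output of `jar tf`), and that each name appears at most once per JAR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical pre-registration check; not part of IoTDB.
class DuplicateClassCheck {
  // Input:  JAR name -> fully qualified class names it contains.
  // Output: class name -> JARs containing it, only for names found in more than one JAR.
  static Map<String, List<String>> findDuplicates(Map<String, List<String>> classesByJar) {
    Map<String, List<String>> jarsByClass = new TreeMap<>();
    for (Map.Entry<String, List<String>> entry : classesByJar.entrySet()) {
      for (String className : entry.getValue()) {
        jarsByClass.computeIfAbsent(className, k -> new ArrayList<>()).add(entry.getKey());
      }
    }
    // Keep only the class names that are shared by at least two JARs.
    jarsByClass.values().removeIf(jars -> jars.size() < 2);
    return jarsByClass;
  }

  public static void main(String[] args) {
    Map<String, List<String>> classesByJar = new TreeMap<>();
    classesByJar.put("udf1.jar", Arrays.asList("org.apache.iotdb.udf.ExampleUDTF"));
    classesByJar.put("udf2.jar", Arrays.asList("org.apache.iotdb.udf.ExampleUDTF"));
    System.out.println(findDuplicates(classesByJar));
  }
}
```

An empty result means no class name is shared. A non-empty result lists each conflicting class together with the JARs that contain it, so one of them can be renamed or repackaged first.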

UDF Deregistration

The SQL syntax for deregistering a UDF is as follows:

DROP FUNCTION <UDF-NAME>

Here is an example:

DROP FUNCTION example

UDF Queries

The usage of UDF is similar to that of built-in aggregation functions.

Basic SQL syntax support

  • Support SLIMIT / SOFFSET
  • Support LIMIT / OFFSET
  • Support queries with time filters
  • Support queries with value filters

Queries with * in SELECT Clauses

Assume that there are 2 time series (root.sg.d1.s1 and root.sg.d1.s2) in the system.

  • SELECT example(*) from root.sg.d1

Then the result set will include the results of example(root.sg.d1.s1) and example(root.sg.d1.s2).

  • SELECT example(s1, *) from root.sg.d1

Then the result set will include the results of example(root.sg.d1.s1, root.sg.d1.s1) and example(root.sg.d1.s1, root.sg.d1.s2).

  • SELECT example(*, *) from root.sg.d1

Then the result set will include the results of example(root.sg.d1.s1, root.sg.d1.s1), example(root.sg.d1.s2, root.sg.d1.s1), example(root.sg.d1.s1, root.sg.d1.s2) and example(root.sg.d1.s2, root.sg.d1.s2).
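
The three cases above behave like a Cartesian product: every * is replaced by each matching time series, independently of the other arguments. The sketch below models just that counting rule. WildcardExpansionSketch and expand are hypothetical names, and the sketch makes no claim about how IoTDB plans the query or orders the result columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: models the expansion as a Cartesian product.
// It does not claim to reproduce IoTDB's planner or its result ordering.
class WildcardExpansionSketch {
  static List<String> expand(String function, List<String> args, List<String> seriesInDevice) {
    List<List<String>> combos = new ArrayList<>();
    combos.add(new ArrayList<>()); // start with one empty argument list
    for (String arg : args) {
      // "*" stands for every series under the FROM prefix; anything else is taken literally.
      List<String> candidates =
          arg.equals("*") ? seriesInDevice : Collections.singletonList(arg);
      List<List<String>> next = new ArrayList<>();
      for (List<String> partial : combos) {
        for (String candidate : candidates) {
          List<String> extended = new ArrayList<>(partial);
          extended.add(candidate);
          next.add(extended);
        }
      }
      combos = next;
    }
    List<String> calls = new ArrayList<>();
    for (List<String> combo : combos) {
      calls.add(function + "(" + String.join(", ", combo) + ")");
    }
    return calls;
  }

  public static void main(String[] args) {
    List<String> series = Arrays.asList("root.sg.d1.s1", "root.sg.d1.s2");
    for (String call : expand("example", Arrays.asList("*", "*"), series)) {
      System.out.println(call);
    }
  }
}
```

With 2 series, one * yields 2 calls and two * yield 4. In general, n series and k wildcards yield n^k calls, so wildcards in multi-argument UDFs can grow the result set quickly.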

Queries with Key-value Attributes in UDF Parameters

You can pass any number of key-value pair parameters to the UDF when constructing a UDF query. The key and value in the key-value pair need to be enclosed in single or double quotes. Note that key-value pair parameters can only be passed in after all time series have been passed in. Here is a set of examples:

SELECT example(s1, 'key1'='value1', 'key2'='value2'), example(*, 'key3'='value3') FROM root.sg.d1;
SELECT example(s1, s2, 'key1'='value1', 'key2'='value2') FROM root.sg.d1;
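
If such queries are generated by a program, the ordering rule (all time series first, then the quoted key-value pairs) is easy to enforce by construction. The following is a minimal sketch with hypothetical names (UdfCallBuilder, call). It only builds query text, performs no escaping, and assumes keys and values contain no quote characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that only builds query text; it performs no escaping.
class UdfCallBuilder {
  static String call(String function, List<String> series, Map<String, String> attributes) {
    List<String> parts = new ArrayList<>(series); // time series always come first
    for (Map.Entry<String, String> attribute : attributes.entrySet()) {
      // Both the key and the value are enclosed in quotes.
      parts.add("'" + attribute.getKey() + "'='" + attribute.getValue() + "'");
    }
    return function + "(" + String.join(", ", parts) + ")";
  }

  public static void main(String[] args) {
    Map<String, String> attributes = new LinkedHashMap<>(); // keeps insertion order
    attributes.put("key1", "value1");
    attributes.put("key2", "value2");
    System.out.println(
        "SELECT " + call("example", Arrays.asList("s1", "s2"), attributes) + " FROM root.sg.d1;");
  }
}
```

The printed statement is the second example query above. Because the attributes are appended only after the series list has been copied, a series can never end up behind a key-value pair.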

Nested Queries

SELECT s1, s2, example(s1, s2) FROM root.sg.d1;
SELECT *, example(*) FROM root.sg.d1 DISABLE ALIGN;
SELECT s1 * example(* / s1 + s2) FROM root.sg.d1;
SELECT s1, s2, s1 + example(s1, s2), s1 - example(s1 + example(s1, s2) / s2) FROM root.sg.d1;

Show All Registered UDFs

SHOW FUNCTIONS

User Permission Management

There is one type of user permission related to UDFs: USE_UDF

  • Only users with this permission are allowed to register UDFs
  • Only users with this permission are allowed to deregister UDFs
  • Only users with this permission are allowed to use UDFs for queries

For more information about user permissions, please refer to Account Management Statements.

Configurable Properties

You can use udf_lib_dir to configure the UDF library directory.
When running a query that uses a UDF, IoTDB may report that there is insufficient memory. You can resolve the issue by configuring udf_initial_byte_array_length_for_memory_control, udf_memory_budget_in_mb and udf_reader_transformer_collector_memory_proportion in iotdb-datanode.properties and restarting the server.

Contribute UDF

This part mainly introduces how external users can contribute their own UDFs to the IoTDB community.

Prerequisites

  1. UDFs must be universal.

    "Universal" here means that the UDF can be widely used in certain scenarios. In other words, the UDF must have reuse value and may be directly used by other users in the community.

    If you are not sure whether the UDF you want to contribute is universal, you can send an email to dev@iotdb.apache.org or create an issue to initiate a discussion.

  2. The UDF you are going to contribute has been well tested and can run normally in the production environment.

What you need to prepare

  1. UDF source code
  2. Test cases
  3. Instructions

UDF Source Code

  1. Create the UDF main class and related classes in iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/udf/builtin or in its subfolders.
  2. Register your UDF in iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/udf/builtin/BuiltinTimeSeriesGeneratingFunction.java.

Test Cases

At a minimum, you need to write integration tests for the UDF.

You can add a test class in integration-test/src/test/java/org/apache/iotdb/db/it/udf.

Instructions

The instructions need to include: the name and the function of the UDF, the attribute parameters that must be provided when the UDF is executed, the applicable scenarios, and the usage examples, etc.

The instructions should be added in docs/UserGuide/Operation Manual/DML Data Manipulation Language.md.

Submit a PR

When you have prepared the UDF source code, test cases, and instructions, you are ready to submit a Pull Request (PR) on GitHub. You can refer to our code contribution guide to submit a PR: Pull Request Guide.

Known Implementations

Built-in UDF

  1. Aggregate Functions, such as SUM. For details and examples, see the document Aggregate Functions.
  2. Arithmetic Functions, such as SIN. For details and examples, see the document Arithmetic Operators and Functions.
  3. Comparison Functions, such as ON_OFF. For details and examples, see the document Comparison Operators and Functions.
  4. String Processing Functions, such as STRING_CONTAINS. For details and examples, see the document String Processing.
  5. Data Type Conversion Function, such as CAST. For details and examples, see the document Data Type Conversion Function.
  6. Constant Timeseries Generating Functions, such as CONST. For details and examples, see the document Constant Timeseries Generating Functions.
  7. Selector Functions, such as TOP_K. For details and examples, see the document Selector Functions.
  8. Continuous Interval Functions, such as ZERO_DURATION. For details and examples, see the document Continuous Interval Functions.
  9. Variation Trend Calculation Functions, such as TIME_DIFFERENCE. For details and examples, see the document Variation Trend Calculation Functions.
  10. Sample Functions, such as M4. For details and examples, see the document Sample Functions.
  11. Change Points Function, such as CHANGE_POINTS. For details and examples, see the document Time-Series.

Data Quality Function Library

About

For applications based on time series data, data quality is vital. The UDF Library is a set of IoTDB User Defined Functions (UDFs) for data quality, including data profiling, data quality evaluation and data repairing. It effectively meets the demand for data quality in the industrial field.

Quick Start

The functions in this function library are not built-in functions, and must be loaded into the system before use.

  1. Download the JAR with all dependencies and the script for registering the UDFs.
  2. Copy the JAR package to ext\udf under the IoTDB system directory (in a cluster, put the JAR in this directory on every DataNode).
  3. Run sbin\start-server.bat (for Windows) or sbin\start-server.sh (for Linux or macOS) to start the IoTDB server.
  4. Copy the script to the IoTDB system directory (the root directory, at the same level as sbin), modify the parameters in the script if needed, and run it to register the UDFs.

Implemented Functions

  1. Data Quality related functions, such as Completeness. For details and examples, see the document Data-Quality.
  2. Data Profiling related functions, such as ACF. For details and examples, see the document Data-Profiling.
  3. Anomaly Detection related functions, such as IQR. For details and examples, see the document Anomaly-Detection.
  4. Frequency Domain Analysis related functions, such as Conv. For details and examples, see the document Frequency-Domain.
  5. Data Matching related functions, such as DTW. For details and examples, see the document Data-Matching.
  6. Data Repairing related functions, such as TimestampRepair. For details and examples, see the document Data-Repairing.
  7. Series Discovery related functions, such as ConsecutiveSequences. For details and examples, see the document Series-Discovery.
  8. Machine Learning related functions, such as AR. For details and examples, see the document Machine-Learning.

Q&A

Q1: How do I modify a registered UDF?

A1: Assume that the name of the UDF is example and the full class name is org.apache.iotdb.udf.ExampleUDTF, which is introduced by example.jar.

  1. Unload the registered function by executing DROP FUNCTION example.
  2. Delete example.jar under iotdb-server-1.0.0-all-bin/ext/udf.
  3. Modify the logic in org.apache.iotdb.udf.ExampleUDTF and repackage it. The name of the JAR package can still be example.jar.
  4. Upload the new JAR package to iotdb-server-1.0.0-all-bin/ext/udf.
  5. Load the new UDF by executing CREATE FUNCTION example AS "org.apache.iotdb.udf.ExampleUDTF".

Copyright © 2024 The Apache Software Foundation.
Apache IoTDB, IoTDB, Apache, the Apache feather logo, and the Apache IoTDB project logo are either registered trademarks or trademarks of The Apache Software Foundation in all countries
