
Data Region

Background

A database is specified explicitly by the user.
Use the "CREATE DATABASE" statement to create a database.
Each database has a corresponding StorageGroupProcessor.

To ensure eventual consistency, an insert lock (an exclusive lock) is used to synchronize insert requests within each database.
As a result, the server-side parallelism of data ingestion is equal to the number of databases.
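
The locking scheme described above can be sketched roughly as follows. This is a minimal illustration, not IoTDB's implementation: the `DatabaseWriter` class, the `writers` map, and the `ingest` function are hypothetical names, and an in-memory list stands in for the real storage engine.

```python
import threading


class DatabaseWriter:
    """Hypothetical stand-in for the per-database processor."""

    def __init__(self):
        # One exclusive insert lock per database.
        self.insert_lock = threading.Lock()
        self.rows = []

    def insert(self, row):
        # Inserts into the same database are serialized by the lock.
        with self.insert_lock:
            self.rows.append(row)


# Assumed example databases; inserts into different databases do not contend.
writers = {"root.sg1": DatabaseWriter(), "root.sg2": DatabaseWriter()}


def ingest(database, row):
    writers[database].insert(row)
```

Because every insert into one database must acquire the same lock, at most one insert per database makes progress at any moment, no matter how many client threads call `ingest`.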

Problem

From the background, we can infer that the parallelism of data ingestion in IoTDB is min(number of clients, server-side parallelism), which equals min(number of clients, number of databases).

A database usually corresponds to a real-world entity such as a factory, a location, or a country.
The number of databases may therefore be small, which makes the parallelism of data ingestion in IoTDB insufficient. We cannot escape this dilemma even if we start hundreds of clients for ingestion.
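
As a rough illustration of this bottleneck (a sketch, not IoTDB code): because each database serializes its inserts, the effective ingestion parallelism is bounded by the smaller of the client count and the database count. The function name `effective_parallelism` is hypothetical.

```python
def effective_parallelism(num_clients: int, num_databases: int) -> int:
    """Upper bound on concurrent inserts when each database holds one insert lock."""
    return min(num_clients, num_databases)


# Adding clients beyond the number of databases does not raise the bound.
few_clients = effective_parallelism(num_clients=2, num_databases=3)
many_clients = effective_parallelism(num_clients=300, num_databases=3)
```

With 3 databases, going from 2 clients to 300 clients only raises the bound from 2 to 3.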

Solution

Our idea is to group devices into buckets and change the granularity of synchronization from the database level to the device-bucket level.

In detail, we use a hash to group devices into buckets called data regions.
For example, a device named "root.sg.d" (assuming its database is "root.sg") belongs to the data region "root.sg.[hash("root.sg.d") mod num_of_data_region]".
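
The mapping above can be sketched as follows. This is only an illustration under stated assumptions: the function name `data_region_for` is hypothetical, and `zlib.crc32` is swapped in as a deterministic hash because Python's built-in `hash` for strings varies between processes. The hash function IoTDB actually uses is not specified here.

```python
import zlib


def data_region_for(device: str, database: str, num_of_data_region: int) -> str:
    """Return the data region name for a device: <database>.<hash(device) mod n>."""
    if num_of_data_region <= 0:
        raise ValueError("num_of_data_region must be positive")
    # Assumption: CRC32 as a stable stand-in for the real hash function.
    bucket = zlib.crc32(device.encode("utf-8")) % num_of_data_region
    return f"{database}.{bucket}"


region = data_region_for("root.sg.d", "root.sg", num_of_data_region=4)
```

The same device always maps to the same data region, so inserts for one device stay ordered under a single lock, while devices in different regions can be written concurrently.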

Usage

To use data regions, set the following configuration item:

data_region_num

The recommended value is [data region number] = [CPU core number] / [user-defined database number].
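
The recommendation can be worked through with a small helper. This is a sketch: the function name `recommended_data_region_num` is hypothetical, and rounding down with a floor of 1 is an assumption, since the formula above does not say how to handle a non-integer result.

```python
import os


def recommended_data_region_num(cpu_cores: int, num_databases: int) -> int:
    """Apply [CPU core number] / [user-defined database number]."""
    if cpu_cores <= 0 or num_databases <= 0:
        raise ValueError("both arguments must be positive")
    # Assumption: round down, but never go below one data region per database.
    return max(1, cpu_cores // num_databases)


# Example: use the local core count (falls back to 1 if it cannot be determined).
local_cores = os.cpu_count() or 1
suggested = recommended_data_region_num(local_cores, num_databases=2)
```

For instance, a 16-core server with 4 user-defined databases gives 16 / 4 = 4 data regions per database.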

For more information, you can refer to this page.
