WO2021091489A1 - Method and apparatus for storing time series data, and server and storage medium thereof - Google Patents

Method and apparatus for storing time series data, and server and storage medium thereof

Info

Publication number
WO2021091489A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
data
array
query
stored
Prior art date
Application number
PCT/SG2020/050634
Other languages
French (fr)
Inventor
Deyun Wu
Liyi WANG
Original Assignee
Envision Digital International Pte. Ltd.
Shanghai Envision Digital Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Envision Digital International Pte. Ltd. and Shanghai Envision Digital Co., Ltd.
Publication of WO2021091489A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G06F16/2477 Temporal data queries

Definitions

  • Embodiments of the present disclosure relate to the field of databases, and in particular to a method and apparatus for storing time series data, and a server and a storage medium thereof.
  • Time series data refers to a data series recorded in chronological order by the same indicator.
  • a time series database is a specialized database for storing and managing the time series data.
  • each storage policy includes multiple storage areas for storing data within different time ranges.
  • Data files are periodically compressed by log files in the storage areas.
  • a system finds the hit area according to an instruction, decompresses the relevant data file, merges the required data in chronological order, and returns a final result.
  • Embodiments of the present disclosure provide a method and apparatus for storing time series data, a server, and a storage medium.
  • the technical solutions are as follows.
  • embodiments of the present disclosure provide a method for storing time series data.
  • the method includes:
  • embodiments of the present disclosure provide an apparatus for storing time series data.
  • the apparatus includes:
  • an acquiring module configured to acquire data to be stored collected by a target device, wherein the data to be stored is time series data collected by the target device at a preset time interval;
  • a first determining module configured to determine a device logic primary key corresponding to the target device according to a device identifier of the target device and a data type of the data to be stored;
  • a second determining module configured to determine a target logic group and a target database according to the device logic primary key and a timestamp corresponding to the data to be stored, wherein the target database belongs to the target logic group, and the target logic group includes a plurality of databases;
  • a storing module configured to write the data to be stored into a target array in the target database of the target logic group.
  • embodiments of the present disclosure provide a server.
  • the server includes a processor and a memory. At least one instruction, at least one program and a code set or an instruction set are stored in the memory and loaded and executed by the processor to implement the method for storing time series data defined in the above aspect.
  • embodiments of the present disclosure provide a computer-readable storage medium. At least one instruction, at least one program and a code set or an instruction set are stored in the computer-readable storage medium and loaded and executed by a processor to implement the method for storing time series data defined in the above aspect.
  • the device logic primary key of the target device is determined by acquiring the data type of the data to be stored collected at equal time intervals and the device identifier of the target device that collects the data.
  • the target logic group and the target database are determined according to the device logic primary key and the timestamp of the data to be stored.
  • the data to be stored is written into the target array in the target database.
  • FIG. 1 is a schematic diagram of an implementing environment in accordance with one exemplary embodiment
  • FIG. 2 is a flowchart of a method for storing time series data in accordance with one exemplary embodiment
  • FIG. 3 is a flowchart of a method for storing time series data in accordance with another exemplary embodiment
  • FIG. 4 is a schematic diagram of a logical grouping mode in accordance with one exemplary embodiment
  • FIG. 5 is a flowchart of data query in a method for storing time series data in accordance with another exemplary embodiment
  • FIG. 6 is a structural block diagram of an apparatus for storing time series data in accordance with one exemplary embodiment.
  • FIG. 7 is a schematic structural diagram of a database server in accordance with one exemplary embodiment.
  • time series database software such as OpenTSDB and InfluxDB
  • Each storage policy includes a plurality of storage areas for storing data within a certain time range.
  • Each storage area includes a memory cache area, a log file, and one or more data files.
  • Different compression policies are adopted for different data types.
  • the data files are periodically compressed by the log files according to the compression policies.
  • a database finds all the hit areas according to a reading logic, and determines whether it is read directly from the memory cache area or the data files. If it needs to be read from the data files, the relevant data file is decompressed.
  • the data is located with an index in the data file.
  • a database system merges the data returned by all the hit areas in chronological order. Finally, a query result is returned.
  • an embodiment of the present disclosure provides a method for storing data.
  • Referring to FIG. 1, a schematic diagram of an implementing environment in accordance with one exemplary embodiment of the present disclosure is illustrated.
  • the implementing environment includes a collecting device 101, a database (DB) server 102 and a querying terminal 103.
  • the collecting device 101 is an apparatus with a data collecting function. Data acquired by the collecting device 101 is time series data with equal time intervals.
  • the collecting device 101 may be a new energy apparatus with sensors such as an anemometer, a temperature and humidity detector and a photovoltaic sensor; for example, it may be a wind-driven generator or a photovoltaic panel. As shown in FIG. 1, the collecting device 101 is a wind-driven generator with an anemometer.
  • the collecting device 101 and the database server 102 are connected by a wired or wireless network.
  • the database server 102 is a storage device for storing data collected by the collecting device 101, and may be one server, a server cluster consisting of a plurality of servers or a cloud server.
  • the database server 102 acquires the data sent by the collecting device 101, compresses and stores the acquired data in a corresponding database, and decompresses the data and sends the decompressed data to the querying terminal 103 during data query.
  • the data stored in the database server 102 is the time series data.
  • the database server 102 and the querying terminal 103 are connected by a wired or wireless network.
  • the querying terminal 103 is a device with a data query function.
  • the querying terminal 103 sends a query instruction including a query condition to the database server 102, the database server 102 queries the corresponding time series data according to the query condition and feeds the queried time series data back to the querying terminal 103.
  • the querying terminal 103 displays the received time series data in the form of a diagram.
  • the querying terminal 103 may be a personal computer, a smart phone, a tablet PC or the like. As shown in FIG. 1, the querying terminal 103 is a personal computer.
  • Referring to FIG. 2, a flowchart of a method for storing time series data in accordance with one exemplary embodiment of the present disclosure is illustrated. This embodiment is described using the scenario where the method is applied to a database server as an example. The method includes the following steps.
  • In step 201, data to be stored collected by a target device is acquired.
  • the data to be stored is time series data collected by the target device at a preset time interval.
  • the database server acquires the data to be stored, which is the time series data, namely, a data series recorded in chronological order with the same data type.
  • the adjacent data has the same time interval.
  • the database server acquires a wind speed value collected by a wind-driven generator via an anemometer.
  • the two adjacent wind speed values have the same time interval.
  • the anemometer collects the wind speed every ten minutes.
  • a device logic primary key corresponding to the target device is determined according to a device identifier of the target device and a data type of the data to be stored.
  • the device logic primary key corresponding to the target device needs to be determined according to the device identifier of the target device and the data type of the data to be stored.
  • the database server has a main data table, which includes the device logic primary key, as well as the device identifier, the time interval and the data type corresponding to the device logic primary key. The device logic primary key is searched in the main data table according to the device identifier and the data type of the data to be stored when the database server acquires the data to be stored.
  • the main data table is as shown in Table 1.
  • a target logic group and a target database are determined according to the device logic primary key and a timestamp corresponding to the data to be stored.
  • the target database belongs to the target logic group.
  • the target logic group includes a plurality of databases.
  • the database server adopts a distributed storage solution. That is, the data is stored in different databases, and when the data volume grows, it is only necessary to add a logic group constituted by a plurality of databases, without decompressing the data files in which history data is stored or migrating the history data to the new databases.
  • the same logic group has a plurality of databases.
  • the database server determines the target logic group of the data to be stored according to the device logic primary key and the timestamp and stores the data to be stored in the plurality of databases of the target logic group.
  • In step 204, the data to be stored is written into a target array in the target database of the target logic group.
  • a database of each logic group adopts an array with a specified length to store the data.
  • the database server writes the data to be stored in the target array of the target database to complete data storage after determining the target logic group and the target database.
  • each logic group has three databases. If the database server determines that the current data to be stored corresponds to a No. 2 logic group, the data to be stored is divided into three portions according to the timestamp of the data to be stored. The three portions are respectively stored in the three databases of the No. 2 logic group. Besides, each portion is written into a target array of the respective database.
  • the device logic primary key of the target device is determined by acquiring the data type of the data to be stored collected at equal time intervals and the device identifier of the target device that collects the data.
  • the target logic group and the target database are determined according to the device logic primary key and the timestamp of the data to be stored.
  • the data to be stored is written into the target array in the target database.
  • Referring to FIG. 3, a flowchart of a method for storing time series data in accordance with another exemplary embodiment of the present disclosure is illustrated. This embodiment is described using the scenario where the method is applied to a database server as an example.
  • the method for storing time series data includes the following steps.
  • In step 301, data to be stored collected by a target device is acquired.
  • the data to be stored is time series data collected by the target device at a preset time interval.
  • For the implementation of step 301, reference may be made to step 201; the details are not repeated in this embodiment.
  • a device x001 collects a wind speed every 10 minutes from 00:00:00 on August 10, 2019 to 11:20:00 on August 11, 2019.
  • A total of 201 data are collected and sent to the database server in the form of a data table, as shown in Table 2.
  • a device logic primary key corresponding to the target device is determined according to a device identifier of the target device and a data type of the data to be stored.
  • For the implementation of step 302, reference may be made to step 202; the details are not repeated in this embodiment.
  • the database server obtains that the corresponding device logic primary key ID1 is 0 by searching the main data table according to the data type, wind speed, and the device identifier, x001. Relevant information of the main data table is as shown in Table 3.
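  • The lookup described above can be sketched as a keyed search. This is a minimal illustration only: the in-memory dictionary, the names `MAIN_DATA_TABLE`, `MainTableRow` and `find_logic_key`, and the key spellings `"x001"` and `"wind_speed"` are hypothetical stand-ins for the main data table, whose actual layout (Table 3) is not reproduced here.

```python
from typing import NamedTuple


class MainTableRow(NamedTuple):
    logic_key: int    # device logic primary key (ID1)
    interval_s: int   # preset time interval, in seconds


# Hypothetical in-memory stand-in for the main data table,
# keyed by (device identifier, data type).
MAIN_DATA_TABLE = {
    ("x001", "wind_speed"): MainTableRow(logic_key=0, interval_s=600),
}


def find_logic_key(device_id: str, data_type: str) -> MainTableRow:
    """Look up the main-table row for a device identifier and data type."""
    try:
        return MAIN_DATA_TABLE[(device_id, data_type)]
    except KeyError:
        raise LookupError(
            f"no main-table row for {device_id!r}/{data_type!r}"
        ) from None


row = find_logic_key("x001", "wind_speed")
print(row.logic_key, row.interval_s)  # 0 600
```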
  • In step 303, a time index corresponding to the data to be stored is determined according to a timestamp.
  • the database server obtains the timestamp of each data to be stored while acquiring the data to be stored and determines the corresponding time index according to the timestamp.
  • step 303 may include the following sub-steps.
  • the database server calculates a first time difference between a timestamp of the first data to be stored and the initial timestamp and a second time difference between a timestamp of the last data to be stored and the initial timestamp, such that a range of the time index corresponding to the data to be stored may be conveniently calculated according to the first time difference and the second time difference.
  • a collecting device with a device identifier of x001 collects a wind speed of a certain place every ten minutes from 00:00:00 on August 10, 2019 to 11:20:00 on August 11, 2019.
  • the database server calculates, in seconds, that the time difference between the timestamp of the data to be stored and the initial timestamp is 1565395200 seconds.
  • the time index is determined according to the time difference, the preset time interval and an array length of a target array.
  • the time index is obtained by a rounding calculation.
  • the database server determines the corresponding database by calculating the time index of the data to be stored, and stores the data in the corresponding database in chronological order by effectively using the characteristic of equal time interval. Since it is unnecessary to store the timestamp, data storage resources of the database server are saved.
  • the time index of the 201 data to be stored shown in Table 2 ranges from 26089 to 26091.
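  • The time index calculation can be reproduced with a short sketch. Three values below are assumptions rather than statements from the text: the initial timestamp is taken to be the Unix epoch (which matches the time difference of 1565395200 seconds for 00:00:00 on August 10, 2019 in UTC), the array length is taken to be 100 (which matches the 100 data written to the array for time index 26090 in this embodiment), and the rounding calculation is taken to be rounding down.

```python
from datetime import datetime, timezone

INITIAL_TIMESTAMP = 0   # assumed: Unix epoch, in seconds
INTERVAL_S = 600        # preset time interval: 10 minutes
ARRAY_LENGTH = 100      # assumed array length of the target array


def time_index(timestamp_s: int) -> int:
    """Round the time difference down to a whole number of arrays."""
    time_diff = timestamp_s - INITIAL_TIMESTAMP
    return time_diff // (INTERVAL_S * ARRAY_LENGTH)


first = int(datetime(2019, 8, 10, tzinfo=timezone.utc).timestamp())
last = first + 200 * INTERVAL_S   # 201 data: 200 intervals after the first

print(first)              # 1565395200
print(time_index(first))  # 26089
print(time_index(last))   # 26091
```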
  • In step 304, an array identifier of the target array is determined according to the time index and the device logic primary key.
  • the database server has a time index table. After working out the time index of the data to be stored, the database server determines the array identifier of the target array in the time index table according to the time index and the device logic primary key.
  • the array identifier is an integer.
  • the time index table is as shown in Table 4.
  • the database server finds in the time index table that the array identifier ID2 of the corresponding target array is 0. Relevant information of the time index table is as shown in Table 5.
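  • A minimal sketch of the time index table follows. The class `TimeIndexTable` is hypothetical, and the rule of handing out the next unused integer to a new (ID1, time index) pair is an assumption chosen so that the result agrees with the ID2 values 0 to 2 of this embodiment; the text does not state how array identifiers are allocated.

```python
class TimeIndexTable:
    """Hypothetical stand-in for the time index table."""

    def __init__(self) -> None:
        # Maps (device logic primary key, time index) to array identifier.
        self._rows = {}

    def array_id(self, logic_key: int, time_index: int) -> int:
        """Return ID2 for (ID1, time index); allocate the next integer if new."""
        key = (logic_key, time_index)
        if key not in self._rows:
            self._rows[key] = len(self._rows)  # assumed allocation rule
        return self._rows[key]


table = TimeIndexTable()
ids = [table.array_id(0, t) for t in (26089, 26090, 26091)]
print(ids)  # [0, 1, 2]
```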
  • In step 305, a target logic group and a target database are determined according to the array identifier.
  • the database server has a data storage table. After determining the array identifier of the data to be stored, the database server searches the target array corresponding to the target database of the target logic group in the data storage table according to the array identifier.
  • the data storage table is as shown in the following table.
  • step 305 may include the following sub-steps.
  • the target logic group is determined according to a numerical interval to which the array identifier belongs. Different logic groups correspond to different numerical intervals.
  • the database server classifies the logic groups according to the numerical interval of the array identifier. For example, every n arrays constitute one logic group, a plurality of databases is distributed in each logic group, and the n arrays are uniformly distributed in a plurality of databases. After determining the target logic group of the data to be stored, the database server stores the data to be stored in a plurality of databases of the target logic group.
  • FIG. 4 illustrates a logic grouping mode of the database server. Classification is based on the numerical interval of ID2, and every 2,000 arrays constitute one logic group. When the data grows beyond the capacity of all the current logic groups, only one more logic group needs to be added horizontally, such that the excess data is stored in the new logic group without migrating relevant history data.
  • the value of ID2 of the data to be stored collected by the device x001 ranges from 0 to 2.
  • all 201 data are stored in the first logic group.
  • the target database is determined according to the array identifier and the number of the databases of the target logic group.
  • the target database is obtained by a Hash modulo calculation.
  • a logic group includes a plurality of databases.
  • After the database server determines the target logic group, the specific target database needs to be located.
  • the database server performs a Hash modulo calculation on the value of the array identifier according to the number of the databases of the target logic group to determine the target database.
  • the database server classifies 2,000 arrays as one logic group. Each logic group includes three databases. The database server determines the target database where the target array is located after performing the Hash modulo calculation on ID2 by 3.
  • the data to be stored collected by the device x001 is stored in the three target arrays of the first logic group, whose array identifiers are respectively 0, 1 and 2.
  • According to the Hash modulo calculation, it is determined that the data whose ID2 is 0 is stored in DB0, the data whose ID2 is 1 is stored in DB1 and the data whose ID2 is 2 is stored in DB2.
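  • The two sub-steps of step 305 can be sketched as follows. The function `locate` is hypothetical, logic groups are numbered from zero here for convenience, and the hash function is not specified in the text, so the array identifier itself is used as the hash value; with that assumption the result agrees with the DB0, DB1 and DB2 placement above.

```python
from typing import Tuple

ARRAYS_PER_GROUP = 2000   # every 2,000 arrays constitute one logic group
DBS_PER_GROUP = 3         # each logic group includes three databases


def locate(array_id: int) -> Tuple[int, int]:
    """Map an array identifier (ID2) to (logic group number, database number)."""
    group = array_id // ARRAYS_PER_GROUP   # numerical interval of ID2
    db = array_id % DBS_PER_GROUP          # assumed: identity hash, then modulo
    return group, db


for array_id in (0, 1, 2, 2000):
    print(array_id, locate(array_id))
# 0 (0, 0)
# 1 (0, 1)
# 2 (0, 2)
# 2000 (1, 2)
```

  • Because the group is derived only from the numerical interval of ID2, identifiers beyond the current intervals fall into a new group without changing where existing arrays are located.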
  • In step 306, it is determined whether the target array exists.
  • the database server needs to determine whether the target array exists in the target database. For two different cases, it is necessary to adopt different methods for data storage.
  • In step 307, if the target array exists, the target array is decompressed from the target database, the data to be stored is written into the decompressed target array, and the decompressed target array is compressed.
  • the database server judges whether the target array exists or not after determining the target array. If the target array exists, the target array is decompressed first. Then, the data to be stored is written into the decompressed target array, and the target array is compressed.
  • the database server adopts an XORs compression algorithm to compress the array, and its compression efficiency ranges from 70% to 95%.
  • step 307 may include the following sub-steps.
  • a data start of the data to be stored in the target array is determined according to the timestamp.
  • When the target array exists, it indicates that data is already stored in some of the data storage positions of the target array. Thus, in order to prevent the data stored later from overwriting the stored data, the database server needs to calculate the data start of the data to be stored in the target array.
  • the database server determines a storage location using a time characteristic of the data to be stored. First, a modulo calculation is performed on the time difference between the timestamp of the data to be stored and the initial timestamp, the preset time interval and the array length of the target array to obtain the number of data that the target array may still store.
  • That number of data from the data to be stored is written into the vacant positions of the target array. For example, when the target array may store n data, and n is less than or equal to the array length, the first n data in the data to be stored is stored in the target array, and the other data is stored in other target arrays.
  • the data to be stored is written into the decompressed target array from the data start.
  • After working out the data start of the data to be stored, the database server writes the portion of the data to be stored that belongs in the target array into the corresponding location of the target array.
  • a collecting device with a device identifier of x001 collects a wind speed of a certain place every ten minutes from 00:00:00 on August 10, 2019 to 11:20:00 on August 11, 2019. There are 201 data to be stored in total. After working out the time index 26089 and the corresponding array identifier, the database server determines by a modulo calculation that the target array may currently store only 8 data.
  • the database server writes the 8 wind speeds from 00:00:00 on August 10, 2019 to 01:10:00 on August 10, 2019 into the target array corresponding to the time index 26089, writes the 100 data starting from 01:20:00 on August 10, 2019 into the target array corresponding to the time index 26090, and writes the remaining 93 data into the first 93 positions of the target array corresponding to the time index 26091.
  • the database server detects that the target array corresponding to the time index 26089 exists, and stores the data in the form of a table.
  • a storage table of the data to be stored collected by the device x001 is as shown in Table 7.
  • the first 8 pieces of data collected by x001 are written into the last 8 positions of the target array whose ID2 is 0 in the target logic group, and the array is compressed and stored in DB0.
  • NaN represents a portion of an array where data is not stored.
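  • The data start and the split of the 201 data across three target arrays can be checked with a short sketch. The function `split_by_array` is hypothetical, and the constants repeat the assumptions used for the time index (initial timestamp at the Unix epoch, array length of 100, rounding down); the wind speed values are placeholders.

```python
INITIAL_TIMESTAMP = 0   # assumed: Unix epoch, in seconds
INTERVAL_S = 600        # preset time interval: 10 minutes
ARRAY_LENGTH = 100      # assumed array length


def split_by_array(first_ts: int, values: list) -> dict:
    """Split equally spaced values into {time index: (data start, chunk)}."""
    # Global slot number of the first value, counted from the initial timestamp.
    slot = (first_ts - INITIAL_TIMESTAMP) // INTERVAL_S
    chunks = {}
    i = 0
    while i < len(values):
        t_index, start = divmod(slot + i, ARRAY_LENGTH)
        room = ARRAY_LENGTH - start          # vacant positions from the data start
        chunks[t_index] = (start, values[i:i + room])
        i += room
    return chunks


values = [float(n) for n in range(201)]      # placeholder wind speeds
for t_index, (start, chunk) in split_by_array(1565395200, values).items():
    print(t_index, start, len(chunk))
# 26089 92 8
# 26090 0 100
# 26091 0 93
```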
  • In step 308, if the target array does not exist, the target array is created in the target database, the data to be stored is written into the target array, and the target array is compressed.
  • When the database server detects that the target array does not exist at present, it needs to create the target array in the target database, to write a corresponding portion of the data to be stored into the target array and to compress the target array using an XORs algorithm to complete data storage.
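  • The two write paths (target array exists, target array does not exist) can be sketched together. This is an illustration under stated assumptions, not the disclosed implementation: a plain dictionary stands in for a database, `zlib` stands in for the XOR-based compression named in the text, the array length of 100 is assumed, and `write_values` and `read_array` are hypothetical helper names.

```python
import math
import struct
import zlib

ARRAY_LENGTH = 100                 # assumed array length
_FMT = f"<{ARRAY_LENGTH}d"         # 100 little-endian float64 values


def read_array(db: dict, array_id: int) -> list:
    """Decompress and unpack the stored array."""
    return list(struct.unpack(_FMT, zlib.decompress(db[array_id])))


def write_values(db: dict, array_id: int, start: int, values: list) -> None:
    """Write values into array `array_id` of `db`, beginning at position `start`."""
    if start < 0 or start + len(values) > ARRAY_LENGTH:
        raise ValueError("values do not fit in the target array")
    if array_id in db:
        # Target array exists: decompress it first.
        array = read_array(db, array_id)
    else:
        # Target array does not exist: create it with every position NaN.
        array = [math.nan] * ARRAY_LENGTH
    array[start:start + len(values)] = values
    # zlib is a stand-in for the XOR-based compression.
    db[array_id] = zlib.compress(struct.pack(_FMT, *array))


db0 = {}
write_values(db0, 0, 0, [1.0] * 92)    # earlier data already in the array
write_values(db0, 0, 92, [5.5] * 8)    # existing array: last 8 positions
stored = read_array(db0, 0)
print(stored[91], stored[92], stored[99])  # 1.0 5.5 5.5
```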
  • When the database server detects, after storing the first 8 data collected by x001 in the target array corresponding to the time index 26089, that the target arrays corresponding to the time indexes 26090 and 26091 do not exist, the target arrays whose ID2 is 1 and 2 are respectively created in DB1 and DB2 of the target logic group.
  • the 9th to 108th data collected by x001 are written into a table in which ID2 is 1, as shown in Table 8.
  • the remaining 93 data are written into a table in which ID2 is 2, as shown in Table 9.
  • NaN represents positions in array 2 in which data is to be stored during the subsequent data storage process.
  • the data to be stored is stored in the plurality of databases by distributed storage.
  • the target array and the data start in the target database are calculated according to the timestamp and the time interval of the data to be stored.
  • Each location represents data of corresponding time.
  • storage of the timestamp is avoided, saving the data storage resources.
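  • Because each position corresponds to a fixed time, the timestamp of any stored value can be recomputed instead of stored. A minimal sketch, assuming an initial timestamp at the Unix epoch and an array length of 100 (neither is stated explicitly in the text); `slot_timestamp` is a hypothetical helper.

```python
def slot_timestamp(time_index: int, position: int,
                   interval_s: int = 600, array_length: int = 100,
                   initial_ts: int = 0) -> int:
    """Recover the timestamp (in seconds) of one array position."""
    return initial_ts + (time_index * array_length + position) * interval_s


print(slot_timestamp(26089, 92))  # 1565395200
```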
  • the storage capacity required by the database may be reduced by up to 100 times.
  • Referring to FIG. 5, a flowchart of data query in a method for storing time series data in accordance with another exemplary embodiment is illustrated. This embodiment is described using the scenario where the method is applied to a database server as an example.
  • the method for storing time series data includes the following steps.
  • a query instruction is received.
  • the query instruction includes a query timestamp, a query device identifier and a query data type.
  • the database server receives the query instruction from a client.
  • a user inputs the query timestamp, the query device identifier and the query data type into the client.
  • a query device logic primary key corresponding to a target device is determined according to the query device identifier and the query data type.
  • the database server searches for the corresponding query device logic primary key ID1 in a main data table according to the query device identifier and the query data type.
  • the database server finds ID1 corresponding to x001 and the wind speed in the main data table and determines ID1 as the query device logic primary key.
  • In step 503, a query logic group and a query database are determined according to the query device logic primary key and the query timestamp, and a query array in the query database is decompressed.
  • the database server determines a time index corresponding to query data according to the query timestamp.
  • a preset time interval of the query data is determined by the query device logic primary key first. Then, the time index is determined according to a time difference between the query timestamp and an initial timestamp, the preset time interval and an array length of the query array.
  • An array identifier of the query array is determined according to the time index and the query device logic primary key. Thus, a query logic group and a query database are determined. The query array corresponding to the query database is decompressed.
  • a wind speed in a certain place from 00:00:00 on August 10, 2019 to 11:20:00 on August 11, 2019 is queried.
  • ID1 and the preset time interval of 10 minutes are determined according to the query device identifier and the query data type, wind speed.
  • a time difference between a timestamp of data to be queried and the initial timestamp is 1565395200 seconds through calculation.
  • the corresponding query array is determined.
  • In step 504, query data is extracted from the decompressed query array according to the query timestamp.
  • the database server first determines a start location and an end location of the query data in the query array according to the query timestamp, then extracts the query data between the two locations and returns the extracted query data to the client. After that, the query array is compressed again.
  • the database server determines that the data to be queried is stored in three databases of the first logic group according to the query timestamp of 00:00:00 on August 10, 2019.
  • the array identifiers of the query arrays are 0, 1 and 2.
  • a modulo calculation is performed on the time index corresponding to the query array to determine that the data to be queried is the last 8 positions of query array 0, all 100 positions of query array 1, and the first 93 positions of query array 2.
  • the database server extracts the data in the corresponding location after decompressing the three query arrays and merges the extracted data in chronological order. The merged data is returned to the query terminal.
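  • The extraction and merge can be sketched as follows. The function `query_range` is hypothetical; the arrays are keyed by time index and assumed to be already decompressed, and the constants repeat the earlier assumptions (initial timestamp at the Unix epoch, array length of 100). The placeholder values 1.0, 2.0 and 3.0 only mark which array each value came from.

```python
import math

INITIAL_TIMESTAMP = 0   # assumed: Unix epoch, in seconds
INTERVAL_S = 600        # preset time interval: 10 minutes
ARRAY_LENGTH = 100      # assumed array length


def query_range(arrays: dict, start_ts: int, end_ts: int) -> list:
    """Return the values for [start_ts, end_ts] in chronological order.

    `arrays` maps a time index to an already decompressed array.
    """
    first_slot = (start_ts - INITIAL_TIMESTAMP) // INTERVAL_S
    last_slot = (end_ts - INITIAL_TIMESTAMP) // INTERVAL_S
    result = []
    for t_index in range(first_slot // ARRAY_LENGTH,
                         last_slot // ARRAY_LENGTH + 1):
        base = t_index * ARRAY_LENGTH
        lo = max(first_slot - base, 0)                 # start location
        hi = min(last_slot - base, ARRAY_LENGTH - 1)   # end location
        result.extend(arrays[t_index][lo:hi + 1])
    return result


arrays = {
    26089: [math.nan] * 92 + [1.0] * 8,
    26090: [2.0] * 100,
    26091: [3.0] * 93 + [math.nan] * 7,
}
start = 1565395200
merged = query_range(arrays, start, start + 200 * INTERVAL_S)
print(len(merged))  # 201
```

  • Iterating over the time indexes in ascending order yields the merged result in chronological order without a separate sort.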
  • the data to be queried is accurately located in the query array by the query timestamp and the query device logic primary key.
  • the database only needs to decompress the relevant arrays rather than a large data block file. Therefore, the speed is high and fewer system resources are consumed.
  • When the volume of the queried data is relatively large or the data queried by a plurality of query terminals is relevant, since the data is stored in the plurality of databases in a distributed manner, hot spots caused by frequent visits to a certain database are avoided.
  • FIG. 6 is a structural block diagram of an apparatus for storing time series data in accordance with one exemplary embodiment of the present application.
  • the apparatus may be disposed on the database server as described in the above embodiment.
  • the apparatus for storing time series data includes:
  • an acquiring module 601 configured to acquire data to be stored collected by a target device, wherein the data to be stored is time series data collected by the target device at a preset time interval;
  • a first determining module 602 configured to determine a device logic primary key corresponding to the target device according to a device identifier of the target device and a data type of the data to be stored;
  • a second determining module 603 configured to determine a target logic group and a target database according to the device logic primary key and a timestamp corresponding to the data to be stored, wherein the target database belongs to the target logic group, and the target logic group includes a plurality of databases;
  • a storing module 604 configured to write the data to be stored into a target array in the target database of the target logic group.
  • the second determining module 603 includes:
  • a first determining unit configured to determine a time index corresponding to the data to be stored according to the timestamp
  • a second determining unit configured to determine an array identifier of the target array according to the time index and the device logic primary key
  • a third determining unit configured to determine the target logic group and the target database according to the array identifier.
  • the first determining unit is further configured to:
  • calculate a time difference between the timestamp and an initial timestamp; and
  • determine the time index according to the time difference, the preset time interval and an array length of the target array, wherein the time index is obtained by a rounding calculation.
  • the array identifier is an integer.
  • the third determining unit is further configured to:
  • determine the target logic group according to a numerical interval to which the array identifier belongs, wherein different logic groups correspond to different numerical intervals; and
  • determine the target database according to the array identifier and the number of the databases of the target logic group, wherein the target database is obtained by a Hash modulo calculation.
  • the storing module 604 includes:
  • a decompressing unit configured to, if the target array exists, decompress the target array from the target database, write the data to be stored into the decompressed target array, and compress the decompressed target array;
  • a creating unit configured to, if the target array does not exist, create the target array in the target database, write the data to be stored into the target array, and compress the target array.
  • the decompressing unit is further configured to:
  • the apparatus for storing time series data further includes:
  • a receiving module configured to receive a query instruction, wherein the query instruction includes a query timestamp, a query device identifier and a query data type;
  • a third determining module configured to determine a query device logic primary key corresponding to the target device according to the query device identifier and the query data type;
  • a fourth determining module configured to determine a query logic group and a query database according to the query device logic primary key and the query timestamp, and decompress a query array in the query database;
  • an extracting module configured to extract query data from the decompressed query array according to the query timestamp.
  • FIG. 7 is a schematic structural diagram of a server according to one exemplary embodiment of the present disclosure.
  • the server 700 includes a central processing unit (CPU) 701, a system memory 704 including a random access memory (RAM) 702 and a read-only memory (ROM) 703, and a system bus 705 connecting the system memory 704 and the central processing unit 701.
  • the server 700 further includes a basic input/output system (I/O system) 706 which helps transmit information between various components within the server, and a high-capacity storage device 707 for storing an operating system 713, an application 714 and other program modules 715.
  • the basic input/output system 706 includes a display 708 for displaying information and an input device 709, such as a mouse and a keyboard, for inputting information by the user. Both the display 708 and the input device 709 are connected to the central processing unit 701 through an input/output controller 710 connected to the system bus 705.
  • the basic input/output system 706 may also include the input/output controller 710 for receiving and processing input from a plurality of other devices, such as the keyboard, the mouse, or an electronic stylus. Similarly, the input/output controller 710 further provides output to the display, a printer or other types of output devices.
  • the high-capacity storage device 707 is connected to the central processing unit 701 through a high-capacity storage controller (not shown) connected to the system bus 705.
  • the high-capacity storage device 707 and a server-readable medium associated therewith provide non-volatile storage for the server 700. That is, the high-capacity storage device 707 may include the server-readable medium (not shown), such as a hard disk or a compact disc read-only memory (CD-ROM) drive.
  • the server-readable medium may include a server storage medium and a communication medium.
  • the server storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as a server-readable instruction, a data structure, a program module or other data.
  • the server storage medium includes a RAM, an ROM, an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a flash memory or other solid-state storage technologies; a CD-ROM, DVD or other optical storage; and a tape cartridge, a magnetic tape, a disk storage or other magnetic storage devices.
  • the server storage medium is not limited to above.
  • the memory stores one or more programs.
  • the one or more programs are configured to be executed by the one or more CPUs 701.
  • the one or more programs include instructions for performing the method for storing time series data.
  • the CPU 701 runs the one or more programs to perform the methods according to the above method embodiments.
  • the server 700 may also be run by a remote server connected via a network, such as the Internet. That is, the server 700 may be connected to the network 712 through a network interface unit 711 connected to the system bus 705, or may be connected to other types of networks or remote server systems (not shown) with the network interface unit 711.
  • the memory further includes one or more programs stored therein, and the one or more programs include the steps performed by the database server in the methods according to the embodiments of the present disclosure.
  • An embodiment of the present disclosure further provides a computer-readable storage medium storing at least one instruction which is loaded and executed by the processor to perform the method for storing time series data according to the above embodiments.
  • the functions described in the embodiments of the present disclosure may be implemented in hardware, software, firmware, or any combination thereof. If the functions are implemented in the software, they may be stored in a computer-readable medium or transmitted as one or more instructions or codes on a computer-readable medium.
  • the computer-readable medium includes a computer storage medium and a communication medium, wherein the communication medium includes any medium that facilitates transfer of a computer program from one place to another, and the storage medium may be any available medium that may be accessed by a general-purpose or special-purpose computer.

Abstract

Embodiments of the present disclosure provide a method and apparatus for storing time series data, and a server and a storage medium thereof, belonging to the field of databases. The method for storing time series data includes: acquiring data to be stored collected by a target device; determining a device logic primary key corresponding to the target device according to a device identifier of the target device and a data type of the data to be stored; determining a target logic group and a target database according to the device logic primary key and a timestamp corresponding to the data to be stored, wherein the target database belongs to the target logic group, and the target logic group comprises a plurality of databases; and writing the data to be stored into a target array in the target database of the target logic group. In the embodiments of the present disclosure, in case of increase of data, horizontal capacity expansion may be accomplished so long as one or more logic groups are added without migrating history data, such that the horizontal capacity expansion is simple in process. In addition, since the data is distributed and stored in the plurality of databases of the target logic group, hot spots caused by a centralized operation on the single databases are avoided.

Description

METHOD AND APPARATUS FOR STORING TIME SERIES DATA, AND SERVER AND STORAGE MEDIUM THEREOF
TECHNICAL FIELD
[0001] Embodiments of the present disclosure relate to the field of databases, and in particular to a method and apparatus for storing time series data, and a server and a storage medium thereof.
BACKGROUND
[0002] Time series data refers to a data series recorded in chronological order for the same indicator. A time series database is a specialized database for storing and managing the time series data.
[0003] In the related art, the time series database adopts different storage policies for different data types. Generally, each storage policy includes multiple storage areas for storing data within different time ranges. Data files are periodically compressed by log files in the storage areas. During reading of the data, a system finds the hit area according to an instruction, decompresses the relevant data file, merges the required data in chronological order, and returns a final result.
[0004] However, when a traditional time series database is adopted, only a storage compression policy of the data is considered. When the data is increased, it is necessary to create a new database and migrate relevant history data to the new database, which causes a heavy workload in horizontal capacity expansion and wastes resources.
SUMMARY
[0005] Embodiments of the present disclosure provide a method and apparatus for storing time series data, a server, and a storage medium. The technical solutions are as follows.
[0006] In one aspect, embodiments of the present disclosure provide a method for storing time series data. The method includes:
[0007] acquiring data to be stored collected by a target device, wherein the data to be stored is time series data collected by the target device at a preset time interval;
[0008] determining a device logic primary key corresponding to the target device according to a device identifier of the target device and a data type of the data to be stored;
[0009] determining a target logic group and a target database according to the device logic primary key and a timestamp corresponding to the data to be stored, wherein the target database belongs to the target logic group, and the target logic group includes a plurality of databases; and
[0010] writing the data to be stored into a target array in the target database of the target logic group.
[0011] In another aspect, embodiments of the present disclosure provide an apparatus for storing time series data. The apparatus includes:
[0012] an acquiring module, configured to acquire data to be stored collected by a target device, wherein the data to be stored is time series data collected by the target device at a preset time interval;
[0013] a first determining module, configured to determine a device logic primary key corresponding to the target device according to a device identifier of the target device and a data type of the data to be stored;
[0014] a second determining module, configured to determine a target logic group and a target database according to the device logic primary key and a timestamp corresponding to the data to be stored, wherein the target database belongs to the target logic group, and the target logic group includes a plurality of databases; and
[0015] a storing module, configured to write the data to be stored into a target array in the target database of the target logic group.
[0016] In yet another aspect, embodiments of the present disclosure provide a server. The server includes a processor and a memory. At least one instruction, at least one program and a code set or an instruction set are stored in the memory and loaded and executed by the processor to implement the method for storing time series data defined in the above aspect.
[0017] In yet another aspect, embodiments of the present disclosure provide a computer-readable storage medium. At least one instruction, at least one program and a code set or an instruction set are stored in the computer-readable storage medium and loaded and executed by a processor to implement the method for storing time series data defined in the above aspect.
[0018] The technical solutions according to the embodiments of the present disclosure have at least the following beneficial effects.
[0019] The device logic primary key of the target device is determined by acquiring the data type of the data to be stored collected at equal time intervals and the device identifier of the target device that collects the data. The target logic group and the target database are determined according to the device logic primary key and the timestamp of the data to be stored. The data to be stored is written into the target array in the target database. By adoption of the method according to the embodiments of the present disclosure, in case of increase of data, horizontal capacity expansion may be accomplished so long as one or more logic groups are added without migrating history data, such that the horizontal capacity expansion is simple in process. In addition, since the data is distributed and stored in the plurality of databases of the target logic group, hot spots caused by a centralized operation on the single databases are avoided.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] FIG. 1 is a schematic diagram of an implementing environment in accordance with one exemplary embodiment;
[0021] FIG. 2 is a flowchart of a method for storing time series data in accordance with one exemplary embodiment;
[0022] FIG. 3 is a flowchart of a method for storing time series data in accordance with another exemplary embodiment;
[0023] FIG. 4 is a schematic diagram of a logical grouping mode in accordance with one exemplary embodiment;
[0024] FIG. 5 is a flowchart of data query in a method for storing time series data in accordance with another exemplary embodiment;
[0025] FIG. 6 is a structural block diagram of an apparatus for storing time series data in accordance with one exemplary embodiment; and
[0026] FIG. 7 is a schematic structural diagram of a database server in accordance with one exemplary embodiment.
DETAILED DESCRIPTION
[0027] The present disclosure is described in further detail hereinafter with reference to the accompanying drawings, to present the objectives, technical solutions, and advantages of the present disclosure more clearly.
[0028] It is to be understood that the term "plurality" herein refers to two or more. "And/or" herein describes an association between the associated objects and indicates three possible relationships. For example, "A and/or B" may mean that A exists alone, A and B exist concurrently, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
[0029] In the related art, time series database software, such as OpenTSDB and InfluxDB, is generally used for storage of time series data. Different storage policies are set for data with different storage time limits. Each storage policy includes a plurality of storage areas for storing data within a certain time range. Each storage area includes a memory cache area, a log file, and one or more data files. Different compression policies are adopted for different data types. The data files are periodically compressed by the log files according to the compression policies. During reading of the data, a database finds all the hit areas according to a reading logic, and determines whether the data is read directly from the memory cache area or from the data files. If the data needs to be read from the data files, the relevant data file is decompressed. The data is located with an index in the data file. After reading is completed, a database system merges the data returned by all the hit areas in chronological order. Finally, a query result is returned.
[0030] However, the data storage mode in the related art has drawbacks for time series data acquired at equal time intervals in some scenarios, for example, when the local wind speed is observed every 15 min in a meteorological station or an average value is calculated every 5 min by a photovoltaic sensor. In such scenarios, an unnecessary timestamp is additionally stored, resulting in waste of database resources. In addition, the compression mode based on large file blocks only considers the data compression policy, and a large data block needs to be decompressed during reading of the data at a single time point, leading to high consumption of system resources. Moreover, when a database needs to be added in case of increase of the data volume, relevant history data needs to be migrated to the new database, which makes horizontal capacity expansion relatively difficult.
[0031] In order to solve the above-mentioned problems, for the time series data with equal time intervals, an embodiment of the present disclosure provides a method for storing data. Referring to FIG. 1, a schematic diagram of an implementing environment in accordance with one exemplary embodiment of the present disclosure is illustrated. The implementing environment includes a collecting device 101, a database (DB) server 102 and a querying terminal 103.
[0032] The collecting device 101 is an apparatus with a data collecting function. Data acquired by the collecting device 101 is time series data with equal time intervals. The collecting device 101 may be a new energy apparatus with such sensors as an anemometer, a temperature and humidity detector and a photovoltaic sensor, and for example, it may be a wind-driven generator or a photovoltaic panel. As shown in FIG. 1, the collecting device 101 is a wind-driven generator with an anemometer.
[0033] The collecting device 101 and the database server 102 are connected by a wired or wireless network.
[0034] The database server 102 is a storage device for storing data collected by the collecting device 101, and may be one server, a server cluster consisting of a plurality of servers or a cloud server. Optionally, the database server 102 acquires the data sent by the collecting device 101, compresses and stores the acquired data in a corresponding database, and decompresses the data and sends the decompressed data to the querying terminal 103 during data query. In this embodiment, the data stored in the database server 102 is the time series data.
[0035] The database server 102 and the querying terminal 103 are connected by a wired or wireless network.
[0036] The querying terminal 103 is a device with a data query function. In a possible application scenario, the querying terminal 103 sends a query instruction including a query condition to the database server 102, the database server 102 queries the corresponding time series data according to the query condition and feeds the queried time series data back to the querying terminal 103. The querying terminal 103 displays the received time series data in the form of a diagram. The querying terminal 103 may be a personal computer, a smart phone, a tablet PC or the like. As shown in FIG. 1, the querying terminal 103 is a personal computer.
[0037] Referring to FIG. 2, a flowchart of a method for storing time series data in accordance with one exemplary embodiment of the present disclosure is illustrated. This embodiment is described using the scenario where the method is applied to a database server as an example. The method includes the following steps.
[0038] In step 201, data to be stored collected by a target device is acquired. The data to be stored is time series data collected by the target device at a preset time interval.
[0039] The database server acquires the data to be stored, which is the time series data, namely, a data series recorded in chronological order with the same data type. Adjacent data points are separated by the same time interval.
[0040] Exemplarily, when the method for storing data is used for recording a wind speed within a period of time, the database server acquires wind speed values collected by a wind-driven generator via an anemometer. Any two adjacent wind speed values are separated by the same time interval. For example, the anemometer collects the wind speed every ten minutes.
[0041] In step 202, a device logic primary key corresponding to the target device is determined according to a device identifier of the target device and a data type of the data to be stored.
[0042] In practice, one device may collect various types of data, for example, when an anemometer and a temperature and humidity detector are simultaneously disposed on the target device. Therefore, the device logic primary key corresponding to the target device needs to be determined according to both the device identifier of the target device and the data type of the data to be stored.
[0043] In a possible implementation, the database server has a main data table, which includes the device logic primary key, as well as the device identifier, the time interval and the data type corresponding to the device logic primary key. When the database server acquires the data to be stored, the device logic primary key is looked up in the main data table according to the device identifier and the data type of the data to be stored.
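As a minimal, non-limiting sketch of the lookup described above, the main data table can be pictured as a mapping from a (device identifier, data type) pair to a row holding the device logic primary key and the preset time interval. The names `MainRow`, `MAIN_TABLE` and `device_logic_key`, as well as the table entries, are illustrative assumptions and are not taken from the disclosure.

```python
from typing import NamedTuple


class MainRow(NamedTuple):
    logic_key: int   # device logic primary key (ID1)
    interval_s: int  # preset time interval, in seconds


# Hypothetical main data table keyed by (device identifier, data type).
# One physical device may appear several times, once per data type.
MAIN_TABLE = {
    ("x001", "wind_speed"): MainRow(logic_key=0, interval_s=600),
    ("x001", "temperature"): MainRow(logic_key=1, interval_s=600),
}


def device_logic_key(device_id: str, data_type: str) -> int:
    """Return the device logic primary key for one (device, data type) pair."""
    try:
        return MAIN_TABLE[(device_id, data_type)].logic_key
    except KeyError:
        raise KeyError(f"no main data table entry for {device_id}/{data_type}") from None
```

Keying on the pair rather than on the device identifier alone is what lets two sensors on the same device map to two independent data series.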
[0044] Exemplarily, the main data table is as shown in Table 1.
Figure imgf000008_0001
Table 1
[0045] In step 203, a target logic group and a target database are determined according to the device logic primary key and a timestamp corresponding to the data to be stored. The target database belongs to the target logic group. The target logic group includes a plurality of databases.
[0046] In order to avoid hot spots and to simplify migration and capacity expansion of the data, the database server adopts a distributed storage solution. That is, the data is stored in different databases, and in case of increase of the data, the only requirement is to add a logic group constituted by a plurality of databases, without decompressing a data file where history data is stored and migrating the history data to the new databases.
[0047] In a possible implementation, the same logic group has a plurality of databases. The database server determines the target logic group of the data to be stored according to the device logic primary key and the timestamp and stores the data to be stored in the plurality of databases of the target logic group.
[0048] In step 204, the data to be stored is written into a target array in the target database of the target logic group.
[0049] In a possible implementation, a database of each logic group adopts an array with a specified length to store the data. Correspondingly, the database server writes the data to be stored in the target array of the target database to complete data storage after determining the target logic group and the target database.
[0050] Exemplarily, each logic group has three databases. If the database server determines that the current data to be stored corresponds to a No. 2 logic group, the data to be stored is divided into three portions according to the timestamp of the data to be stored. The three portions are respectively stored in the three databases of the No. 2 logic group. Besides, each portion is written into a target array of the respective database.
[0051] In summary, in this embodiment, the device logic primary key of the target device is determined by acquiring the data type of the data to be stored collected at equal time intervals and the device identifier of the target device that collects the data. The target logic group and the target database are determined according to the device logic primary key and the timestamp of the data to be stored. The data to be stored is written into the target array in the target database. By adoption of the method, in case of increase of the data, horizontal capacity expansion may be accomplished so long as one or more logic groups are added, without migrating the history data, such that the horizontal capacity expansion is simple in process. In addition, since the data is stored in the plurality of databases of the target logic group in a distributed manner, hot spots caused by centralized operations on a single database are avoided. Moreover, storing the data in the form of arrays may reduce the number of stored data entries, reduce the number of times the database server reads and writes the database, and improve the data processing efficiency.
[0052] Referring to FIG. 3, a flowchart of a method for storing time series data in accordance with another exemplary embodiment of the present disclosure is illustrated. This embodiment is described using the scenario where the method is applied to a database server as an example. The method for storing time series data includes the following steps.
[0053] In step 301, data to be stored collected by a target device is acquired. The data to be stored is time series data collected by the target device at a preset time interval.
[0054] For the implementation of step 301, reference may be made to step 201, which is not repeated in this embodiment.
[0055] Exemplarily, a device x001 collects a wind speed every 10 minutes from 00:00:00 on August 10, 2019 to 11:20:00 on August 11, 2019. A total of 201 data points are collected and sent to the database server in the form of a data table as shown in Table 2.
Figure imgf000009_0001
Table 2
[0056] In step 302, a device logic primary key corresponding to the target device is determined according to a device identifier of the target device and a data type of the data to be stored.
[0057] For the implementation of step 302, reference may be made to step 202, which is not repeated in this embodiment.
[0058] Exemplarily, for the 201 data points to be stored collected by the device x001, the database server obtains that the corresponding device logic primary key ID1 is 0 by searching in a main data table according to the data type, wind speed, and the device identifier, x001. Relevant information of the main data table is as shown in Table 3.
Figure imgf000010_0001
Table 3
[0059] In step 303, a time index corresponding to the data to be stored is determined according to a timestamp.
[0060] The database server obtains the timestamp of each data to be stored while acquiring the data to be stored and determines the corresponding time index according to the timestamp. In a possible implementation, step 303 may include the following sub-steps.
[0061] First, a time difference between the timestamp and an initial timestamp is calculated.
[0062] In a possible implementation, after acquiring a plurality of data to be stored within a certain period of time, the database server calculates a first time difference between a timestamp of the first data to be stored and the initial timestamp and a second time difference between a timestamp of the last data to be stored and the initial timestamp, such that a range of the time index corresponding to the data to be stored may be conveniently calculated according to the first time difference and the second time difference.
[0063] Exemplarily, by taking calculation of the first time difference as an example, a collecting device with a device identifier of x001 collects a wind speed of a certain place every ten minutes from 00:00:00 on August 10, 2019 to 11:20:00 on August 11, 2019. A total of 201 data points need to be stored. By taking the initial timestamp as 00:00:00 on January 1, 1979, the database server works out, in units of seconds, that the time difference between the timestamp of the data to be stored and the initial timestamp is 1565395200 seconds.
[0064] Second, the time index is determined according to the time difference, the preset time interval and an array length of a target array. The time index is obtained by a rounding calculation.
[0065] In a possible implementation, the database server determines the corresponding database by calculating the time index of the data to be stored, and stores the data in the corresponding database in chronological order by effectively using the characteristic of equal time intervals. Since it is unnecessary to store the timestamp, data storage resources of the database server are saved.
[0066] Exemplarily, the database server sets the array length of an array for storing the data in the database to 100, and obtains, by a rounding calculation according to the time difference of 1565395200 seconds, the preset time interval of 10 minutes and the array length of 100, that the time index of the data to be stored is 1565395200/(10*60)/100=26089 (that is, 26089.92 rounded down). Correspondingly, the time index of the 201 data points to be stored shown in Table 2 ranges from 26089 to 26091.
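The arithmetic of this example can be checked with a short sketch. It assumes that the "rounding calculation" is a round-down (floor) operation, which is what the quoted result of 26089 implies, and it takes the quoted time difference of 1565395200 seconds as given. The function name `time_index` and the constant names are illustrative.

```python
def time_index(time_diff_s: int, interval_s: int, array_length: int) -> int:
    """Number of elapsed sampling intervals divided by the array length, rounded down."""
    if time_diff_s < 0 or interval_s <= 0 or array_length <= 0:
        raise ValueError("time difference must be >= 0; interval and length must be > 0")
    return time_diff_s // interval_s // array_length


INTERVAL_S = 10 * 60   # preset time interval of 10 minutes
ARRAY_LENGTH = 100     # array length used in the example

first_diff = 1565395200                  # time difference of the first data point
last_diff = first_diff + 200 * INTERVAL_S  # the 201st data point is 200 intervals later

first_index = time_index(first_diff, INTERVAL_S, ARRAY_LENGTH)
last_index = time_index(last_diff, INTERVAL_S, ARRAY_LENGTH)
print(first_index, last_index)  # 26089 26091
```

Because every data point of one series in one array shares the same time index, only the index and the position inside the array are needed to recover the timestamp.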
[0067] In step 304, an array identifier of the target array is determined according to the time index and the device logic primary key.
[0068] In a possible implementation, the database server has a time index table. After working out the time index of the data to be stored, the database server determines the array identifier of the target array in the time index table according to the time index and the device logic primary key. The array identifier is an integer. The time index table is as shown in Table 4.
Figure imgf000011_0001
Table 4
[0069] Exemplarily, for the data to be stored with the time index of 26089 and the device logic primary key of 0, the database server finds in the time index table that the array identifier ID2 of the corresponding target array is 0. Relevant information of the time index table is as shown in Table 5.
Figure imgf000011_0002
Table 5
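The time index table can be sketched as a mapping from a (time index, device logic primary key) pair to an integer array identifier. The disclosure does not state how new identifiers are allocated; the sketch below assumes, purely for illustration, that they are handed out sequentially starting from 0. The class name `TimeIndexTable` and its method are hypothetical.

```python
class TimeIndexTable:
    """Hypothetical in-memory stand-in for the time index table."""

    def __init__(self) -> None:
        self._ids: dict[tuple[int, int], int] = {}

    def array_id(self, time_idx: int, logic_key: int) -> int:
        """Return the array identifier (ID2) for a (time index, ID1) pair.

        Assumption: an unseen pair receives the next unused integer.
        """
        key = (time_idx, logic_key)
        if key not in self._ids:
            self._ids[key] = len(self._ids)
        return self._ids[key]


table = TimeIndexTable()
ids = [table.array_id(t, 0) for t in (26089, 26090, 26091)]
print(ids)  # [0, 1, 2]
```

Under that assumption, the three time indexes of the example series map to the array identifiers 0, 1 and 2, and a repeated lookup of an existing pair returns the identifier already assigned.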
[0070] In step 305, a target logic group and a target database are determined according to the array identifier.
[0071] In a possible implementation, the database server has a data storage table. After determining the array identifier of the data to be stored, the database server searches the data storage table, according to the array identifier, for the target logic group and the target database corresponding to the target array. Exemplarily, the data storage table is as shown in the following table.
Figure imgf000011_0003
Figure imgf000012_0001
Table 6
[0072] In a possible implementation, step 305 may include the following sub-steps.
[0073] First, the target logic group is determined according to a numerical interval to which the array identifier belongs. Different logic groups correspond to different numerical intervals.
[0074] In a possible implementation, the database server classifies the logic groups according to the numerical interval of the array identifier. For example, every n arrays constitute one logic group, a plurality of databases is distributed in each logic group, and the n arrays are uniformly distributed in a plurality of databases. After determining the target logic group of the data to be stored, the database server stores the data to be stored in a plurality of databases of the target logic group.
[0075] FIG. 4 illustrates a logic grouping mode of the database server. Classification is based on the numerical interval of ID2, and every 2,000 arrays constitute one logic group. When the data increases beyond the capacity of all the current logic groups, only one logic group needs to be added horizontally, such that the excess data is stored in the new logic group without migrating relevant history data.
[0076] Exemplarily, the value of ID2 of the data to be stored collected by the device x001 ranges from 0 to 2. Thus, all 201 data points are stored in the first logic group.
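The interval-based grouping can be sketched as an integer division of the array identifier by the number of arrays per logic group. Numbering the logic groups from 0 is an assumption made for the sketch, as are the names `ARRAYS_PER_GROUP` and `logic_group`.

```python
ARRAYS_PER_GROUP = 2000  # every 2,000 arrays constitute one logic group


def logic_group(array_id: int, arrays_per_group: int = ARRAYS_PER_GROUP) -> int:
    """Return the zero-based logic group whose numerical interval contains array_id."""
    if array_id < 0 or arrays_per_group <= 0:
        raise ValueError("array_id must be >= 0 and arrays_per_group must be > 0")
    return array_id // arrays_per_group


# Array identifiers 0..1999 fall into group 0, 2000..3999 into group 1, and so on.
print([logic_group(i) for i in (0, 2, 1999, 2000)])  # [0, 0, 0, 1]
```

Since the group depends only on the identifier, arrays that were already placed never change group when a new logic group is appended, which is why no history data has to be migrated.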
[0077] Second, the target database is determined according to the array identifier and the number of the databases of the target logic group. The target database is obtained by a Hash modulo calculation.
[0078] In order to avoid hot spots, a logic group includes a plurality of databases. When the database server determines the target logic group, the specific target database needs to be located. In a possible implementation, the database server performs a Hash modulo calculation on the value of the array identifier according to the number of the databases of the target logic group to determine the target database.
[0079] As shown in FIG. 4, the database server classifies 2,000 arrays as one logic group. Each logic group includes three databases. The database server determines the target database where the target array is located after performing the Hash modulo calculation on ID2 by 3.
[0080] Exemplarily, the data to be stored collected by the device x001 is stored in the three target arrays of the first logic group, whose array identifiers are respectively 0, 1 and 2. Through the Hash modulo calculation, it is determined that the data whose ID2 is 0 is stored in DB0, the data whose ID2 is 1 is stored in DB1 and the data whose ID2 is 2 is stored in DB2.
[0081] In step 306, it is determined whether the target array exists or not.
[0082] After determining the target database, the database server needs to determine whether the target array exists in the target database. For two different cases, it is necessary to adopt different methods for data storage.
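The Hash modulo calculation of this second sub-step may be sketched as below. The sketch assumes that the hash of an integer array identifier is the integer itself and that databases within a logic group are named DB0, DB1, and so on; both are assumptions for illustration, and `target_database` is a hypothetical name.

```python
def target_database(array_id: int, databases_in_group: int = 3) -> str:
    """Pick the database inside the target logic group for a given array identifier.

    Assumption: the identifier is used directly as its own hash value, which
    matches the example where ID2 values 0, 1 and 2 map to DB0, DB1 and DB2.
    """
    if databases_in_group <= 0:
        raise ValueError("a logic group needs at least one database")
    return f"DB{array_id % databases_in_group}"


# The three target arrays of the example are spread over three databases.
print([target_database(i) for i in (0, 1, 2)])  # ['DB0', 'DB1', 'DB2']
```

Consecutive array identifiers rotate through the databases of the group, which is how the scheme spreads writes and reads and avoids a single hot database.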
[0083] In step 307, if the target array exists, the target array is decompressed from the target database, the data to be stored is written into the decompressed target array, and the decompressed target array is compressed.
[0084] The database server judges whether the target array exists or not after determining the target array. If the target array exists, the target array is decompressed first. Then, the data to be stored is written into the decompressed target array, and the target array is compressed. In a possible implementation, the database server adopts an XORs compression algorithm to compress the array, and its compression efficiency ranges from 70% to 95%.
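The disclosure names an XORs compression algorithm without giving its details. The following sketch therefore shows only the XOR step on which such schemes are commonly built, namely replacing each value by the XOR of its 64-bit pattern with that of the preceding value; the bit-packing that would actually shrink the output is omitted, and all function names are hypothetical. It should not be read as the algorithm used by the database server.

```python
import struct


def float_to_bits(value: float) -> int:
    """Reinterpret a float as its 64-bit IEEE 754 bit pattern."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def bits_to_float(bits: int) -> float:
    """Inverse of float_to_bits."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def xor_encode(values):
    """Replace each value with the XOR of its bits and the previous value's bits."""
    encoded, previous = [], 0
    for value in values:
        bits = float_to_bits(value)
        encoded.append(bits ^ previous)
        previous = bits
    return encoded


def xor_decode(encoded):
    """Undo xor_encode by XOR-ing each delta with the previously recovered bits."""
    values, previous = [], 0
    for delta in encoded:
        bits = delta ^ previous
        values.append(bits_to_float(bits))
        previous = bits
    return values


wind_speeds = [5.2, 5.2, 5.2, 5.3, 5.3]
encoded = xor_encode(wind_speeds)
print(encoded)                             # repeated readings encode as 0
print(xor_decode(encoded) == wind_speeds)  # the transform is lossless
```

Slowly varying sensor readings produce many zero or near-zero words after the XOR step, which is what makes a subsequent variable-length encoding effective.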
[0085] In a possible implementation, step 307 may include the following sub-steps.
[0086] First, a data start of the data to be stored in the target array is determined according to the timestamp.
[0087] When the target array exists, it indicates that the data is stored in part of data storage bits of the target array. Thus, in order to prevent the data stored later from affecting the stored data, the database server needs to calculate the data start of the data to be stored in the target array.
[0088] In a possible implementation, the database server determines a storage location using a time characteristic of the data to be stored. First, a modulo calculation is performed on the time difference between the timestamp of the data to be stored and the initial timestamp, the preset time interval and the array length of the target array to obtain the number of pieces of data that the target array may still store. That number of pieces of the data to be stored is written into the vacant bits of the target array. For example, when the target array may store n pieces of data, and n is less than or equal to the array length, the first n pieces of the data to be stored are stored in the target array, and the other data is stored in other target arrays.
[0089] Exemplarily, the database server sets the array length to be 100 and performs the modulo calculation 1565395200/(10*60)%100=92 on the data to be stored, whose time difference is 1565395200 seconds and the preset time interval is 10 minutes. It indicates that the target array corresponding to the current time index may only store 8 pieces of data. Since 0 and 99 are respectively at the beginning and the end of an array, it is determined that data start of the data to be stored is 92.
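The modulo calculation of paragraph [0089] may be reproduced as follows. The sketch uses integer division, which is an assumption consistent with the worked figures; `data_start` is a hypothetical name.

```python
def data_start(time_diff_seconds: int, interval_seconds: int, array_length: int) -> int:
    """Position inside the target array at which the first new value is written."""
    return (time_diff_seconds // interval_seconds) % array_length


ARRAY_LENGTH = 100
start = data_start(1565395200, 10 * 60, ARRAY_LENGTH)
vacant = ARRAY_LENGTH - start  # positions start..99 are still free

print(start)   # 92
print(vacant)  # 8
```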
[0090] Second, the data to be stored is written into the decompressed target array from the data start.
[0091] After working out the data start of the data to be stored, the database server writes the portion of the data to be stored that belongs in the target array into the corresponding location of the target array.
[0092] Exemplarily, a collecting device with a device identifier of x001 collects a wind speed of a certain place every ten minutes from 00:00:00 on August 10, 2019 to 11:20:00 on August 11, 2019. There are 201 pieces of data to be stored in total. After working out the time index 26089 and the corresponding array identifier, the database server works out by a modulo calculation that the target array may only store 8 pieces of data at present. Then, the database server writes the 8 wind speeds from 00:00:00 on August 10, 2019 to 01:10:00 on August 10, 2019 into the target array corresponding to the time index 26089, writes 100 pieces of data from 01:20:00 on August 10, 2019 into the target array corresponding to the time index 26090, and writes the remaining 93 pieces of data into the first 93 bits of the target array corresponding to the time index 26091. The database server detects that the target array corresponding to the time index 26089 exists, and stores the data in the form of a table. A storage table of the data to be stored collected by the device x001 is as shown in Table 7. The first 8 pieces of data collected by x001 are written into the last 8 bits of the target array whose ID2 is 0 of the target logic group, and the array is compressed and stored in DB0.
Figure imgf000014_0001
Table 7
[0093] NaN represents a portion of an array where data is not stored.
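The way a batch of 201 values spills from one target array into the next may be sketched as follows. The function name `split_across_arrays` and the tuple layout are hypothetical, and the sketch assumes that consecutive target arrays have consecutive time indexes, as in the example above.

```python
def split_across_arrays(sample_count: int, first_time_index: int,
                        start_offset: int, array_length: int = 100):
    """Return (time_index, offset, count) for every target array a batch touches."""
    segments = []
    time_index, offset, remaining = first_time_index, start_offset, sample_count
    while remaining > 0:
        # Fill the current array up to its end, or until the batch is exhausted.
        count = min(array_length - offset, remaining)
        segments.append((time_index, offset, count))
        remaining -= count
        time_index += 1  # continue in the next array ...
        offset = 0       # ... starting at its first position
    return segments


print(split_across_arrays(201, 26089, 92))
# [(26089, 92, 8), (26090, 0, 100), (26091, 0, 93)]
```

The three segments correspond to the last 8 positions of the first array, the whole of the second array and the first 93 positions of the third array.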
[0094] In step 308, if the target array does not exist, the target array is created in the target database, the data to be stored is written into the target array, and the target array is compressed.
[0095] When the database server detects that the target array does not exist at present, it needs to create the target array in the target database, to write a corresponding portion of the data to be stored into the target array and to compress the target array using an XORs algorithm to complete data storage.
[0096] Exemplarily, when the database server detects, after storing the first 8 pieces of data collected by x001 in the target array corresponding to the time index 26089, that the target arrays corresponding to the time indexes 26090 and 26091 do not exist, the target arrays whose ID2 is 1 and 2 are respectively created in DB1 and DB2 of the target logic group. The 9th to 108th pieces of data collected by x001 are written into a table in which ID2 is 1, as shown in Table 8. The remaining 93 pieces of data are written into a table in which ID2 is 2, as shown in Table 9.
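Steps 307 and 308 together amount to a read-modify-write on a compressed array. The sketch below illustrates that flow under stated assumptions: a Python dictionary stands in for the target database, `zlib` is used as a stand-in compressor in place of the XORs algorithm named in the text, unfilled positions are initialised to NaN, and all names are hypothetical.

```python
import math
import struct
import zlib

ARRAY_LENGTH = 100


def pack(values):
    """Serialise and compress an array (zlib is a stand-in, not the XORs algorithm)."""
    return zlib.compress(struct.pack(f">{len(values)}d", *values))


def unpack(blob):
    """Decompress and deserialise an array produced by pack()."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f">{len(raw) // 8}d", raw))


def write_samples(database: dict, array_id: int, offset: int, samples: list) -> None:
    """Write samples into the target array, creating the array if it does not exist."""
    if offset + len(samples) > ARRAY_LENGTH:
        raise ValueError("samples do not fit in the target array")
    blob = database.get(array_id)
    if blob is None:
        values = [math.nan] * ARRAY_LENGTH  # step 308: create an empty array
    else:
        values = unpack(blob)               # step 307: decompress the existing array
    values[offset:offset + len(samples)] = samples
    database[array_id] = pack(values)       # compress again before storing


db0 = {}
write_samples(db0, 0, 92, [4.1, 4.3, 4.0, 3.9, 4.2, 4.4, 4.1, 4.0])
stored = unpack(db0[0])
print(stored[92:])
print(math.isnan(stored[0]))
```

Only the array being written is decompressed and recompressed, so the cost of a write is bounded by the array length rather than by the size of the database.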
Figure imgf000014_0002
Figure imgf000015_0001
Table 8
Figure imgf000015_0002
Table 9
[0097] NaN represents the portion of array 2 into which data is to be stored during the subsequent data storage process.
[0098] In this embodiment, the data to be stored is stored in the plurality of databases by distributed storage. The target array and the data start in the target database are calculated according to the timestamp and the time interval of the data to be stored. Each location in the array represents the data of a corresponding time. Thus, the timestamp itself need not be stored, which saves data storage resources. In addition, since each array holds 100 pieces of data, the storage required in the database may be reduced by up to a factor of 100.
[0099] Referring to FIG. 5, a flowchart of data query in a method for storing time series data in accordance with another exemplary embodiment is illustrated. This embodiment is explained by taking, as an example, the case in which the method is applied to a database server. The method for storing time series data includes the following steps.
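The reason no timestamp needs to be stored is that the time of each value can be recomputed from its position. A minimal sketch of that inverse calculation follows, assuming the same 10-minute interval and array length of 100 used in the examples; `slot_time_diff` is a hypothetical name and returns the time difference from the initial timestamp in seconds.

```python
def slot_time_diff(time_index: int, offset: int,
                   interval_seconds: int = 600, array_length: int = 100) -> int:
    """Seconds between the initial timestamp and the value stored at (time_index, offset)."""
    return (time_index * array_length + offset) * interval_seconds


# Position 92 of the array with time index 26089 is the first value of the example.
print(slot_time_diff(26089, 92))  # 1565395200

# The next position is exactly one preset time interval later.
print(slot_time_diff(26089, 93) - slot_time_diff(26089, 92))  # 600
```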
[00100] In step 501, a query instruction is received. The query instruction includes a query timestamp, a query device identifier and a query data type.
[00101] The database server receives the query instruction from a client. In a possible implementation, during data query, a user inputs the query timestamp, the query device identifier and the query data type into the client.
[00102] Exemplarily, to query a wind speed collected by a device x001 from 00:00:00 on August 10, 2019 to 11:20:00 on August 11, 2019, the user inputs a corresponding starting query timestamp of 00:00:00 on August 10, 2019 and a closing query timestamp of 11:20:00 on August 11, 2019, the query device identifier, x001, and the query data type, wind speed, into a data query interface of a query terminal. The query instruction including the above-mentioned query conditions is sent by the query terminal to the database server.
[00103] In step 502, a query device logic primary key corresponding to a target device is determined according to the query device identifier and the query data type.
[00104] The database server searches for a corresponding query device logic primary key ID1 in a main data table according to the query device identifier and the query data type.
[00105] Exemplarily, the database server finds ID1 corresponding to x001 and the wind speed in the main data table and determines ID1 as the query device logic primary key.
[00106] In step 503, a query logic group and a query database are determined according to the query device logic primary key and the query timestamp, and a query array in the query database is decompressed.
[00107] The database server determines a time index corresponding to query data according to the query timestamp. In a possible implementation, a preset time interval of the query data is determined by the query device logic primary key first. Then, the time index is determined according to a time difference between the query timestamp and an initial timestamp, the preset time interval and an array length of the query array. An array identifier of the query array is determined according to the time index and the query device logic primary key. Thus, a query logic group and a query database are determined. The query array corresponding to the query database is decompressed.
[00108] Exemplarily, a wind speed in a certain place from 00:00:00 on August 10, 2019 to 11:20:00 on August 11, 2019 is queried. There are 201 pieces of data in total. ID1 and the preset time interval of 10 minutes are determined according to the query device identifier and the query data type, wind speed. A time difference between a timestamp of data to be queried and the initial timestamp is 1565395200 seconds through calculation. The time index is 1565395200/(10*60)/100=26089 after a rounding calculation. Thus, the corresponding query array is determined.
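The time index and the position within the query array can be obtained in one step, as the sketch below shows. It assumes that the rounding calculation rounds down, which agrees with the value 26089 above; the name `locate` is hypothetical, and the position of the last value is derived here from the stated count of 201 pieces of data rather than from the closing query timestamp.

```python
def locate(time_diff_seconds: int, interval_seconds: int = 600, array_length: int = 100):
    """Return (time_index, offset) for a time difference from the initial timestamp."""
    slot = time_diff_seconds // interval_seconds
    return divmod(slot, array_length)


first = locate(1565395200)
# Assumption: the 201st value lies 200 preset intervals after the first one.
last = locate(1565395200 + 200 * 600)

print(first)  # (26089, 92)
print(last)   # (26091, 92)
```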
[00109] In step 504, query data is extracted from the decompressed query array according to the query timestamp.
[00110] Since the array length is 100, and the location of the query data in the array is not fixed, the database server first needs to determine a start location and an end location of the query data in the query array according to the query timestamp, then extracts the query data between the two locations, and returns the extracted query data to the client. After that, the query array is compressed again.
[00111] Exemplarily, the database server determines that the data to be queried is stored in three databases of the first logic group according to the query timestamp of 00:00:00 on August 10, 2019. The array identifiers of the query arrays are 0, 1 and 2. A modulo calculation is performed on the time index corresponding to the query array to determine that the data to be queried is the last 8 bits of the query array 0, all 100 bits of the query array 1 and the first 93 bits of the query array 2. The database server extracts the data in the corresponding locations after decompressing the three query arrays and merges the extracted data in chronological order. The merged data is returned to the query terminal.
[00112] In this embodiment, the data to be queried is accurately located in the query array by the query timestamp and the query device logic primary key. The database only needs to decompress the relevant arrays rather than a large data block file. Therefore, the query is fast and consumes fewer system resources. When the volume of the queried data is relatively large or the data queried by a plurality of query terminals is relevant, since the data is stored in the plurality of databases in a distributed manner, hot spots caused by frequent visits to a certain database are avoided.
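The extraction and merging of query data across several query arrays may be sketched as follows. The sketch assumes that the arrays have already been decompressed into Python lists keyed by time index; `extract_range` and the sample values are hypothetical.

```python
def extract_range(arrays: dict, first_index: int, first_offset: int,
                  last_index: int, last_offset: int) -> list:
    """Collect values from (first_index, first_offset) to (last_index, last_offset), inclusive.

    arrays maps a time index to its decompressed list of values.
    """
    merged = []
    for index in range(first_index, last_index + 1):
        values = arrays[index]
        lo = first_offset if index == first_index else 0
        hi = last_offset + 1 if index == last_index else len(values)
        merged.extend(values[lo:hi])  # time indexes ascend, so order is chronological
    return merged


# Three decompressed query arrays filled with placeholder values 0..299.
arrays = {
    26089: list(range(0, 100)),
    26090: list(range(100, 200)),
    26091: list(range(200, 300)),
}
result = extract_range(arrays, 26089, 92, 26091, 92)
print(len(result))            # 201
print(result[0], result[-1])  # 92 292
```

Only the three arrays that overlap the query range are touched, which reflects the statement that the database need not decompress a large data block file.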
[00113] FIG. 6 is a structural block diagram of an apparatus for storing time series data in accordance with one exemplary embodiment of the present application. The apparatus may be disposed on the database server as described in the above embodiment. As shown in FIG. 6, the apparatus for storing time series data includes:
[00114] an acquiring module 601, configured to acquire data to be stored collected by a target device, wherein the data to be stored is time series data collected by the target device at a preset time interval;
[00115] a first determining module 602, configured to determine a device logic primary key corresponding to the target device according to a device identifier of the target device and a data type of the data to be stored;
[00116] a second determining module 603, configured to determine a target logic group and a target database according to the device logic primary key and a timestamp corresponding to the data to be stored, wherein the target database belongs to the target logic group, and the target logic group includes a plurality of databases; and
[00117] a storing module 604, configured to write the data to be stored into a target array in the target database of the target logic group.
[00118] Optionally, the second determining module 603 includes:
[00119] a first determining unit, configured to determine a time index corresponding to the data to be stored according to the timestamp;
[00120] a second determining unit, configured to determine an array identifier of the target array according to the time index and the device logic primary key; and
[00121] a third determining unit, configured to determine the target logic group and the target database according to the array identifier.
[00122] Optionally, the first determining unit is further configured to:
[00123] calculate a time difference between the timestamp and an initial timestamp; and
[00124] determine the time index according to the time difference, the preset time interval and an array length of the target array, wherein the time index is obtained by a rounding calculation.
[00125] Optionally, the array identifier is an integer. The third determining unit is further configured to:
[00126] determine the target logic group according to a numerical interval to which the array identifier belongs, wherein different logic groups correspond to different numerical intervals; and
[00127] determine the target database according to the array identifier and the number of the databases of the target logic group, wherein the target database is obtained by a Hash modulo calculation.
[00128] Optionally, the storing module 604 includes:
[00129] a decompressing unit, configured to, if the target array exists, decompress the target array from the target database, write the data to be stored into the decompressed target array, and compress the decompressed target array; and
[00130] a creating unit, configured to, if the target array does not exist, create the target array in the target database, write the data to be stored into the target array, and compress the target array.
[00131] Optionally, the decompressing unit is further configured to:
[00132] determine a data start of the data to be stored in the target array according to the timestamp; and
[00133] write the data to be stored into the decompressed target array from the data start.
[00134] Optionally, the apparatus for storing time series data further includes:
[00135] a receiving module, configured to receive a query instruction, wherein the query instruction includes a query timestamp, a query device identifier and a query data type;
[00136] a third determining module, configured to determine a query device logic primary key corresponding to the target device according to the query device identifier and the query data type;
[00137] a fourth determining module, configured to determine a query logic group and a query database according to the query device logic primary key and the query timestamp, and decompress a query array in the query database; and
[00138] an extracting module, configured to extract query data from the decompressed query array according to the query timestamp.
[00139] It should be noted that the apparatus for storing time series data according to the above embodiments only takes division of all the functional modules as an example for explanation. In practice, the above functions may be implemented by the different functional modules as required. That is, the internal structure of the device is divided into different functional modules to finish all or part of the functions described above. In addition, the apparatus for storing time series data according to the above embodiments is based on the same inventive concept as the method for storing time series data according to the above embodiment. For the specific implementation process of the device, reference may be made to the method embodiment, which is not repeated herein.
[00140] FIG. 7 is a schematic structural diagram of a server according to one exemplary embodiment of the present disclosure. Specifically, the server 700 includes a central processing unit (CPU) 701, a system memory 704 including a random access memory (RAM) 702 and a read-only memory (ROM) 703, and a system bus 705 connecting the system memory 704 and the central processing unit 701. The server 700 further includes a basic input/output system (I/O system) 706 which helps transmit information between various components within the server, and a high-capacity storage device 707 for storing an operating system 713, an application 714 and other program modules 715.
[00141] The basic input/output system 706 includes a display 708 for displaying information and an input device 709, such as a mouse and a keyboard, for inputting information by the user. Both the display 708 and the input device 709 are connected to the central processing unit 701 through an input/output controller 710 connected to the system bus 705. The basic input/output system 706 may also include the input/output controller 710 for receiving and processing input from a plurality of other devices, such as the keyboard, the mouse, or an electronic stylus. Similarly, the input/output controller 710 further provides output to the display, a printer or other types of output devices.
[00142] The high-capacity storage device 707 is connected to the central processing unit 701 through a high-capacity storage controller (not shown) connected to the system bus 705. The high-capacity storage device 707 and a server-readable medium associated therewith provide non-volatile storage for the server 700. That is, the high-capacity storage device 707 may include the server-readable medium (not shown), such as a hard disk or a compact disc read-only memory (CD-ROM) drive.
[00143] Without loss of generality, the server-readable medium may include a server storage medium and a communication medium. The server storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as a server-readable instruction, a data structure, a program module or other data. The server storage medium includes a RAM, an ROM, an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a flash memory or other solid-state storage technologies; a CD-ROM, DVD or other optical storage; and a tape cartridge, a magnetic tape, a disk storage or other magnetic storage devices. Nevertheless, it may be known by a person skilled in the art that the server storage medium is not limited to above. The above system memory 704 and the high-capacity storage device 707 may be collectively referred to as the memory.
[00144] The memory stores one or more programs. The one or more programs are configured to be executed by the one or more CPUs 701. The one or more programs include instructions for performing the method for storing time series data. The CPU 701 runs the one or more programs to perform the methods according to the above method embodiments.
[00145] According to the various embodiments of the present disclosure, the server 700 may also be run by a remote server connected to it via a network, such as the Internet. That is, the server 700 may be connected to the network 712 through a network interface unit 711 connected to the system bus 705, or may be connected to other types of networks or remote server systems (not shown) with the network interface unit 711.
[00146] The memory further includes one or more programs stored therein, and the one or more programs include the steps performed by the database server in the methods according to the embodiments of the present disclosure.
[00147] An embodiment of the present disclosure further provides a computer-readable storage medium storing at least one instruction which is loaded and executed by the processor to perform the method for storing time series data according to the above embodiments.
[00148] A person skilled in the art shall appreciate that in one or more examples described above, the functions described in the embodiments of the present disclosure may be implemented in hardware, software, firmware, or any combination thereof. If the functions are implemented in the software, they may be stored in a computer-readable medium or transmitted as one or more instructions or codes on a computer-readable medium. The computer-readable medium includes a computer storage medium and a communication medium, wherein the communication medium includes any medium that facilitates transfer of a computer program from one place to another, and the storage medium may be any available medium that may be accessed by a general-purpose or special-purpose computer.
[00149] Described above are only exemplary embodiments of the present disclosure, and are not intended to limit the present disclosure. Various variations and modifications may be made to the present disclosure for those skilled in the art. Any modifications, equivalent substitutions, or improvements made within the spirit and principles of the present disclosure should be included within the scope of the present disclosure.

Claims

CLAIMS What is claimed is:
1. A method for storing time series data, comprising: acquiring data to be stored collected by a target device, wherein the data to be stored is time series data collected by the target device at a preset time interval; determining a device logic primary key corresponding to the target device according to a device identifier of the target device and a data type of the data to be stored; determining a target logic group and a target database according to the device logic primary key and a timestamp corresponding to the data to be stored, wherein the target database belongs to the target logic group, and the target logic group comprises a plurality of databases; and writing the data to be stored into a target array in the target database of the target logic group.
2. The method according to claim 1, wherein said determining a target logic group and a target database according to the device logic primary key and a timestamp corresponding to the data to be stored comprises: determining a time index corresponding to the data to be stored according to the timestamp; determining an array identifier of the target array according to the time index and the device logic primary key; and determining the target logic group and the target database according to the array identifier.
3. The method according to claim 2, wherein the determining a time index corresponding to the data to be stored according to the timestamp comprises: calculating a time difference between the timestamp and an initial timestamp; and determining the time index according to the time difference, the preset time interval and an array length of the target array, wherein the time index is obtained by a rounding calculation.
4. The method according to claim 2, wherein the array identifier is an integer, and said determining the target logic group and the target database according to the array identifier comprises: determining the target logic group according to a numerical interval to which the array identifier belongs, wherein different logic groups correspond to different numerical intervals; and determining the target database according to the array identifier and the number of the databases of the target logic group, wherein the target database is obtained by a Hash modulo calculation.
5. The method according to any one of claims 1 to 4, wherein the writing the data to be stored into a target array in the target database of the target logic group comprises: if the target array exists, decompressing the target array from the target database, writing the data to be stored into the decompressed target array, and compressing the decompressed target array; and if the target array does not exist, creating the target array in the target database, writing the data to be stored into the target array, and compressing the target array.
6. The method according to claim 5, wherein the writing the data to be stored into the decompressed target array comprises: determining a data start of the data to be stored in the target array according to the timestamp; and writing the data to be stored into the decompressed target array from the data start.
7. The method according to any one of claims 1 to 4, upon the writing the data to be stored into a target array in the target database of the target logic group, the method further comprises: receiving a query instruction, wherein the query instruction comprises a query timestamp, a query device identifier and a query data type; determining a query device logic primary key corresponding to the target device according to the query device identifier and the query data type; determining a query logic group and a query database according to the query device logic primary key and the query timestamp, and decompressing a query array in the query database; and extracting query data from the decompressed query array according to the query timestamp.
8. An apparatus for storing time series data, comprising: an acquiring module, configured to acquire data to be stored collected by a target device, wherein the data to be stored is time series data collected by the target device at a preset time interval; a first determining module, configured to determine a device logic primary key corresponding to the target device according to a device identifier of the target device and a data type of the data to be stored; a second determining module, configured to determine a target logic group and a target database according to the device logic primary key and a timestamp corresponding to the data to be stored, wherein the target database belongs to the target logic group, and the target logic group comprises a plurality of databases; and a storing module, configured to write the data to be stored into a target array in the target database of the target logic group.
9. The apparatus according to claim 8, wherein the second determining module comprises: a first determining unit, configured to determine a time index corresponding to the data to be stored according to the timestamp; a second determining unit, configured to determine an array identifier of the target array according to the time index and the device logic primary key; and a third determining unit, configured to determine the target logic group and the target database according to the array identifier.
10. The apparatus according to claim 9, wherein the first determining unit is further configured to: calculate a time difference between the timestamp and an initial timestamp; and determine the time index according to the time difference, the preset time interval and an array length of the target array, wherein the time index is obtained by a rounding calculation.
11. The apparatus according to claim 9, wherein the array identifier is an integer, and the third determining unit is further configured to: determine the target logic group according to a numerical interval to which the array identifier belongs, wherein different logic groups correspond to different numerical intervals; and determine the target database according to the array identifier and the number of the databases of the target logic group, wherein the target database is obtained by a Hash modulo calculation.
12. The apparatus according to any one of claims 8 to 11, wherein the storing module comprises: a decompressing unit, configured to, if the target array exists, decompress the target array from the target database, write the data to be stored into the decompressed target array, and compress the decompressed target array; and a creating unit, configured to, if the target array does not exist, create the target array in the target database, write the data to be stored into the target array, and compress the target array.
13. The apparatus according to claim 12, wherein the decompressing unit is further configured to: determine a data start of the data to be stored in the target array according to the timestamp; and write the data to be stored into the decompressed target array from the data start.
14. The apparatus according to any one of claims 8 to 11, further comprising: a receiving module, configured to receive a query instruction, wherein the query instruction comprises a query timestamp, a query device identifier and a query data type; a third determining module, configured to determine a query device logic primary key corresponding to the target device according to the query device identifier and the query data type; a fourth determining module, configured to determine a query logic group and a query database according to the query device logic primary key and the query timestamp, and decompress a query array in the query database; and an extracting module, configured to extract query data from the decompressed query array according to the query timestamp.
15. A server, comprising a processor and a memory, wherein at least one instruction, at least one program and a code set or an instruction set are stored in the memory and loaded and executed by the processor to perform the method for storing time series data as defined in any one of claims 1 to 7.
16. A computer-readable storage medium, wherein at least one instruction, at least one program and a code set or an instruction set are stored in the computer-readable storage medium and loaded and executed by a processor to perform the method for storing time series data as defined in any one of claims 1 to 7.
PCT/SG2020/050634 2019-11-05 2020-11-04 Method and apparatus for storing time series data, and server and storage medium thereof WO2021091489A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911073197.0A CN111125089B (en) 2019-11-05 2019-11-05 Time sequence data storage method, device, server and storage medium
CN201911073197.0 2019-11-05

Publications (1)

Publication Number Publication Date
WO2021091489A1 true WO2021091489A1 (en) 2021-05-14

Family

ID=70495552

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2020/050634 WO2021091489A1 (en) 2019-11-05 2020-11-04 Method and apparatus for storing time series data, and server and storage medium thereof

Country Status (2)

Country Link
CN (1) CN111125089B (en)
WO (1) WO2021091489A1 (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111581220A (en) * 2020-05-28 2020-08-25 泰康保险集团股份有限公司 Storage and retrieval method, device, equipment and storage medium for time series data
CN111723075B (en) * 2020-06-11 2023-05-30 阳光新能源开发股份有限公司 Method, system and medium for constructing, searching and storing data of real-time database
CN111767276B (en) * 2020-06-29 2024-03-15 北京百度网讯科技有限公司 Data storage method, device, electronic equipment and storage medium
CN112000619A (en) * 2020-08-21 2020-11-27 杭州安恒信息技术股份有限公司 Time sequence data storage method, device, equipment and readable storage medium
CN112084147A (en) * 2020-09-10 2020-12-15 珠海美佳音科技有限公司 Data storage method, data acquisition recorder and electronic equipment
CN112199419A (en) * 2020-10-09 2021-01-08 深圳市欢太科技有限公司 Distributed time sequence database, storage method, equipment and storage medium
CN112445795A (en) * 2020-10-22 2021-03-05 浙江蓝卓工业互联网信息技术有限公司 Distributed storage capacity expansion method and data query method for time sequence database
CN112269670B (en) * 2020-10-30 2023-08-25 重庆紫光华山智安科技有限公司 Data warehouse-in method, device, system and storage medium
CN112434015B (en) * 2020-12-08 2022-08-19 新华三大数据技术有限公司 Data storage method and device, electronic equipment and medium
CN112612793B (en) * 2020-12-25 2022-11-15 恒生电子股份有限公司 Resource query method, device, node equipment and storage medium
CN112579834B (en) * 2021-02-22 2021-09-03 北京工业大数据创新中心有限公司 Industrial equipment data storage method and system
CN113010484A (en) * 2021-03-12 2021-06-22 维沃移动通信有限公司 Log file management method and device
CN113342284B (en) * 2021-06-30 2023-02-28 招商局金融科技有限公司 Time sequence data storage method and device, computer equipment and storage medium
CN113297278B (en) * 2021-07-26 2022-03-18 阿里云计算有限公司 Time sequence database, data processing method, storage device and computer program product
CN114205372A (en) * 2021-12-08 2022-03-18 南方电网深圳数字电网研究院有限公司 Data storage method and device of Internet of things
CN114844911A (en) * 2022-04-20 2022-08-02 网易(杭州)网络有限公司 Data storage method and device, electronic equipment and computer readable storage medium
CN114969171B (en) * 2022-07-22 2023-05-12 电科云(北京)科技有限公司 Space-time consistent data display and playback method, device, equipment and storage medium
CN116069870A (en) * 2023-04-06 2023-05-05 深圳开鸿数字产业发展有限公司 Data storage method, device, equipment and medium based on distributed system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170177646A1 (en) * 2012-10-31 2017-06-22 International Business Machines Corporation Processing time series data from multiple sensors
US20180004812A1 (en) * 2016-06-30 2018-01-04 Referentia Systems, Inc. Time series data query engine
US20190108265A1 (en) * 2017-10-10 2019-04-11 Servicenow, Inc. Visualizing time metric database
US20190286440A1 (en) * 2017-11-16 2019-09-19 Sas Institute Inc. Scalable cloud-based time series analysis

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9152672B2 (en) * 2012-12-17 2015-10-06 General Electric Company Method for storage, querying, and analysis of time series data
US10007690B2 (en) * 2014-09-26 2018-06-26 International Business Machines Corporation Data ingestion stager for time series database
CN104731896B (en) * 2015-03-18 2018-11-09 北京百度网讯科技有限公司 Data processing method and system
CN107807969A (en) * 2017-10-18 2018-03-16 上海华电电力发展有限公司 New time series data storage method for power plant
CN108256088A (en) * 2018-01-23 2018-07-06 清华大学 Storage method and system for time series data based on a key-value database
CN108399263B (en) * 2018-03-15 2022-03-01 北京大众益康科技有限公司 Time sequence data storage and query method and storage and processing platform
CN110134723A (en) * 2019-05-22 2019-08-16 网易(杭州)网络有限公司 Method and database for storing data
CN110287199B (en) * 2019-07-01 2021-11-16 联想(北京)有限公司 Database processing method and electronic equipment

Also Published As

Publication number Publication date
CN111125089A (en) 2020-05-08
CN111125089B (en) 2023-09-26

Similar Documents

Publication Publication Date Title
WO2021091489A1 (en) Method and apparatus for storing time series data, and server and storage medium thereof
JP7279266B2 (en) Methods and apparatus for storing and querying time series data, and their servers and storage media
CN110019004B (en) Data processing method, device and system
CN104317800A (en) Hybrid storage system and method for mass intelligent power utilization data
US11636083B2 (en) Data processing method and apparatus, storage medium and electronic device
US20150095381A1 (en) Method and apparatus for managing time series database
CN111339103B (en) Data exchange method and system based on full-quantity fragmentation and incremental log analysis
CN110795499B (en) Cluster data synchronization method, device, equipment and storage medium based on big data
CN111046034A (en) Method and system for managing memory data and maintaining data in memory
CN105303456A (en) Method for processing monitoring data of electric power transmission equipment
CN111586091B (en) Edge computing gateway system for realizing computing power assembly
CN104239377A (en) Platform-crossing data retrieval method and device
CN110851474A (en) Data query method, database middleware, data query device and storage medium
CN111400393A (en) Data processing method and device based on multi-application platform and storage medium
CN113297269A (en) Data query method and device
CN116842012A (en) Method, device, equipment and storage medium for storing Redis cluster in fragments
CN116126238A (en) Data storage method, system, device and nonvolatile storage medium
CN115269519A (en) Log detection method and device and electronic equipment
CN116628042A (en) Data processing method, device, equipment and medium
CN113885803A (en) Data storage method and device, electronic equipment and storage medium
CN113342813A (en) Key value data processing method and device, computer equipment and readable storage medium
CN111782588A (en) File reading method, device, equipment and medium
CN112667149A (en) Data heat sensing method, device, equipment and medium
CN111104416A (en) Distributed electric power data management system
CN112988736B (en) Mass data quality checking method and system

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20883917

Country of ref document: EP

Kind code of ref document: A1

DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (PCT application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20883917

Country of ref document: EP

Kind code of ref document: A1