WO2020159397A1 - Method and computerized device for processing numeric time series data - Google Patents

Method and computerized device for processing numeric time series data

Info

Publication number
WO2020159397A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
index
certain
segment
time series
Prior art date
Application number
PCT/RU2019/000055
Other languages
French (fr)
Inventor
Yury Vladimirovich KUZNETCOV
Denis NASONOV
Alexander Fleksandrovich VISHERATIN
Ksenia Dmitrievna MUKHINA
Gali Ketema MBOGO
Original Assignee
Siemens Aktiengesellschaft
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens Aktiengesellschaft
Priority to PCT/RU2019/000055
Publication of WO2020159397A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures

Definitions

  • the present invention relates to the field of industrial Big Data applications and, particularly, to a method and a computerized device for processing numeric time series data, in particular for performing a range search in numeric time series data acquired in an industrial facility.
  • Industrial facilities such as power plants are equipped with sensors supplying readings such as pressure or temperature.
  • the readings are stored as time series for later analysis.
  • the amount of data stored for an industrial facility is approaching the tera- or petabyte level.
  • Such numeric time series data is typically stored in data warehouses or in a cloud.
  • An expert user may need to identify a time interval during which a given sensor supplied readings within a given amplitude range.
  • a linear scan through all data to identify the applicable time intervals is too costly in terms of data traffic and CPU traffic, and takes too long to be of practical use.
  • Known fast search mechanisms such as Google® search are adapted to search in alphanumeric data and do not work well with numeric time series data.
  • a method and computerized device capable of performing an amplitude range search within terabytes of numeric time series data with a sub-second response time is not known.
  • PCT/RU2018/000373 discloses a method and device for performing an index-based amplitude range search within numeric time series data.
  • a method for processing numeric time series data from a number of sensors by a computerized device including a local storage unit comprises:
  • in step h), it is the index, and not the numeric time series, that is processed to determine the range that corresponds to the data request.
  • the above steps g), h) and i) may therefore also be collectively referred to as "performing an index-based range search".
  • steps a) to f) may be referred to as "building an index adapted for an index-based range search".
  • even if the numeric time series is stored in the external storage, such as a data warehouse or a cloud, costly data traffic to the data warehouse or to the cloud may be avoided because the index is stored in the local storage unit. Therefore, the cost of responding to the data request may be advantageously reduced.
  • the method may be a computer-implemented method.
  • the method may be carried out using the computerized device which may include said local storage unit and one or more processing units, such as one or more CPUs.
  • the local storage unit may include a hard disk, solid state disk, RAID storage, and the like.
  • the index may be a numeric index.
  • the numeric index may be adapted to provide a response to a search request including an amplitude range criterion and/or a time range criterion.
  • the index may comprise information necessary and sufficient to determine the range for which the numeric time series is known to match the range criterion with a predetermined precision. Therefore, advantageously, step h) may be carried out with the predetermined precision without having to process the numeric time series.
  • the index may favorably require less storage space than the numeric time series itself. Thereby, processing the index in step h) may require less time than processing the numeric time series.
  • the index may thus be regarded as a compressed representation of the numeric time series.
  • the compressed index may be advantageously stored in the local storage unit rather than in the external storage, like a cloud or data warehouse.
  • a reading may be a value originally provided by one of the sensors installed in an industrial facility, such as a temperature, a pressure, a power output or load, etc.
  • the sensor is a gas turbine sensor and the time series includes pairs of sensed pressure values and associated time stamps.
  • the time stamp may have a milliseconds precision and may be represented by a 64-bit value, e.g. int64.
  • the value may be a real number, e.g. representing the physical pressure parameter, represented by a 32-bit float number, e.g. float32. Also int8, int16, int32, int64, float32 and float64 may be used.
  • the numeric time series may be a non-equidistant numeric time series or an equidistant numeric time series.
  • the numeric time series may be acquired by directly receiving the plurality of readings from the sensors installed in an industrial facility and storing, in the time series, a respective time of reception of each reading as the timestamp associated with the reading (value).
  • the acquired time series is split into data blocks and data segments.
  • the acquired time series is divided into said data blocks by time.
  • the data blocks are grouped into said data segments which may be limited by time duration, and thus have a limited number of data blocks.
  • the steps a) to d) are executed repeatedly for different portions of the numeric time series; and the processing step e) to derive the index includes creating the index upon a first execution of the steps a) to d) and includes updating the index upon each subsequent execution of the steps a) to d).
  • in step d), the data segments including said data blocks are stored into the external storage which may be a cloud or cloud system, for example.
  • Storing said time series data in the form of said data segments in a cloud storage service provides a cost-efficient solution.
  • high availability and scalability of storing said data segments is achieved by respective qualities of the backing cloud storage service.
  • the time series may be acquired portion-by-portion, and an amount of local storage space at the computerized device required for providing the index for each of the sensors may be reduced.
  • easy and efficient updates may be possible when further portions/ further readings are added to the numeric time series over time.
  • such updates may be advantageously performed without repeating acquisition of the already acquired portions, thereby reducing an amount of data to be transferred for each update.
  • in step f), the index for each sensor is advantageously stored in the local storage unit, such as a hard disk of a local workstation, even in a case where the actual numeric time series comprises Big Data that may only be stored in the external storage, like a data warehouse or a cloud.
  • the data request may be received through an input unit, such as a keyboard, connected to a computerized device carrying out the method.
  • the data request may be received via network from another computerized device by a service endpoint.
  • the data request may include a logical expression formed by one or more range criteria and, particularly, one or more logical operators.
  • the method may advantageously support complex and sophisticated search requests.
  • a logical operator may be a Boolean operator such as logical "AND", logical "OR" or logical "NOT".
  • complex search requests correlating different readings from different numeric time series may be made possible, such as "temperature greater than 500 degrees Celsius AND power lower than 300 Megawatts".
  • processing the index may refer to accessing the index based on the range criterion.
  • the range criterion, or an upper or lower boundary comprised therein, may be used in a keyword-like manner to gain fast access to portions of the index containing information about a matching range.
  • the determined range may be output by displaying a human readable representation of the determined range on a display device.
  • the determined range may also be output by transmitting a digital representation of the determined range via a wired or wireless network.
  • determining a matching range and outputting the matching range may also comprise determining a plurality of matching ranges and outputting the plurality of matching ranges.
  • the index or indices stored in the local storage unit provides fast random-access which allows searching and extracting the data even without accessing the external storage.
  • a fast value range search is provided over a relatively small number of data segments and data blocks in the index locally stored at the local storage unit.
  • fast time series data extraction is provided using information, in particular the metadata, about data segments and data blocks to read only specific parts of the data files instead of the whole files.
  • in step b), the readings and the timestamps are converted into the respective data blocks in a differential manner.
  • in step b), differences of a current reading and a preceding reading and differences of a current timestamp and a preceding timestamp are stored into the respective data block. Therefore, memory space is advantageously saved in the computerized device.
  • the data blocks are compressed using a certain compression scheme.
  • the compression scheme is based on or corresponds to the Zstandard (see reference [10]).
  • the block information of the block index of a certain one of the data blocks includes: a minimum value of all the readings contained in the certain data block,
  • the segment information of a segment index of a certain one of the data segments includes:
  • the start time of the certain data segment corresponds to the oldest one of the timestamps contained in the data blocks of the certain data segment.
  • the steps a) to f) are executed for each of the sensors.
  • the data request is a data search request including a sensor identification identifying the certain sensor and an amplitude range criterion specifying an amplitude range within which readings of the numeric time series are required to fall in order to match the amplitude range criterion.
  • An amplitude range criterion may be a criterion that specifies a range (value range, amplitude range) within which readings are required to fall for the time series to match the amplitude range criterion. Examples of the amplitude range criterion are criteria such as "between 3000 and 4000 rotations per minute” or "more than 300 megawatts", "less than 300 degrees Celsius”. In other words, the amplitude range criterion may specify at least one of a lower and an upper boundary for readings that match the criterion.
  • the data search request may constitute a digital representation of the amplitude range criterion.
  • the expression "the numeric time series is known to match the amplitude range criterion" for the time range determined in step h) (which is also referred to as "matching time range" hereinbelow) may refer to the fact that, based on the index, it is known that the numeric time series comprises at least one reading (also referred to as "matching reading" hereinbelow) that is within the amplitude range specified by the amplitude range criterion and is associated with a time that is within the matching time range.
  • the time range determined in step h) is a time range that includes a time for which the numeric time series is known to include at least one reading within an amplitude range specified by the amplitude range criterion, and excludes a time for which the numeric time series is known not to include any reading within the specified amplitude range.
  • the method includes:
  • the determined data segments are those data segments that are determined to satisfy the amplitude range criterion.
  • the identified data blocks are those data blocks of the determined data segments that are identified to satisfy the amplitude range criterion.
  • the data request is a data extraction request for extracting data
  • the data extraction request includes a sensor identification identifying the certain sensor and a time range criterion
  • the time range criterion includes a start time indicating a time of an oldest reading to be extracted, an end time indicating a time of a newest reading to be extracted and a resolution information indicating a time resolution for the data to be extracted.
  • in a first case, if the resolution information of the data extraction request indicates a data density lower than or equal to that indicated by the data size indicators of the data blocks referenced by the segment indices of the index allocated to the certain sensor, only the local storage unit is accessed for answering the data extraction request, and in a second case, if the resolution information of the data extraction request indicates a higher density than that indicated by the data size indicators of the data blocks referenced by the segment indices of the index allocated to the certain sensor, the external storage is accessed for answering the data extraction request.
  • the data size indicators of the data blocks are referenced by the arrays of the data segments which are themselves referenced by the segment indices of the index of the certain sensor.
  • the minimum values and the maximum values of the data blocks referenced by the segment indices of the index allocated to the certain sensor are interpolated for providing extracted data, wherein the response including the extracted data is output.
  • the index stored in the local storage unit is used for providing extraction metadata including a data file name of a data file stored in the external storage, an offset in this data file and a data length of the data to be extracted, wherein the external storage is accessed using the provided extraction metadata for providing extracted data, wherein the response including the extracted data is output.
  • the method further includes discarding the acquired time series at the computerized device.
  • since the acquired time series may not be used at the computerized device, it may be favorably discarded after the building of the index in steps a) to f) has been completed, so as to reduce the amount of storage space required at the computerized device.
  • Any embodiment of the first aspect may be combined with any embodiment of the first aspect to obtain another embodiment of the first aspect.
  • a computer program product comprises a program code for executing the above-described method for performing a range search in numeric time series data, when run on at least one computer.
  • a computer program product such as a computer program means, may be embodied as a memory card, USB stick, CD-ROM, DVD or as a file which may be downloaded from a server in a network.
  • a file may be provided by transferring the file comprising the computer program product from a wireless communication network.
  • a computerized device for processing numeric time series data from a number of sensors.
  • the computerized device has a local storage unit and further: a first entity for acquiring, from each of the sensors, a numeric time series including a plurality of readings and associated timestamps,
  • each of the data segments includes a plurality of the data blocks
  • a fifth entity for processing, for each of said sensors, the acquired numeric time series to derive an index allocated to the sensor, the derived index including segment indices and referenced block indices, wherein, for each of the data blocks, one block index including block information suitable for an index-based range search is created, and, for each of the data segments, one segment index including segment information suitable for the index-based range search is created,
  • a seventh entity for receiving a data request including a range criterion for a certain sensor of the sensors
  • an eighth entity for processing the index allocated to the certain sensor to determine a range for which the numeric time series is known to match the range criterion (in the local storage unit and/or in the external storage), and
  • a ninth entity for outputting a response using the determined range in response to the data request.
  • the computerized device of the present invention may be implemented to carry out the method of the present invention.
  • the respective entity, e.g. the first to ninth entity, may be implemented in hardware and/or in software. If said entity is implemented in hardware, it may be embodied as a device, e.g. as a computer or as a processor or as a part of a system, e.g. a computer system. If said entity is implemented in software it may be embodied as a computer program product, as a function, as a routine, as a program code or as an executable object.
  • a system comprising a number of sensors installed in an industrial facility and a computerized device of the third aspect for processing numeric time series data from the sensors.
  • the system may provide a data search function and a data extraction function.
  • the computerized device of said system may include a search engine for providing the data search function and an extraction engine for providing the data extraction function.
  • the system may include a service endpoint for communicating with a number of clients or requesting entities transmitting a data request to the system and awaiting a response to said data request.
  • the search engine may be adapted to scan the index stored in the local storage unit of the computerized device for determining the data segments to be extracted. Moreover, the search engine may calculate the start time and the end time for each of the identified data blocks contained in the identified data segments. This calculation is based on the timestamps included in the respective data blocks. Then, the search engine may create a response including a time range based on the calculated start times and the calculated end times of all the identified data blocks. The search engine may forward the created response to the service endpoint which transmits said response to the requesting client.
  • the extraction engine may selectively access the local storage unit or the local storage unit and the external storage. Further details are described below.
  • the system may use a master node including said local storage unit storing the indices and a number of slave units for answering and processing the data requests.
  • the master node or one of the slave nodes may access the external storage.
  • the external storage may be accessed by the extraction engine for answering the data extraction request.
  • the index stored in local storage of the master unit may be used for providing extraction metadata.
  • Said extraction metadata may be directed to a slave node which is adapted to access the external storage using the extraction metadata.
  • the extraction metadata may include the name of a data file stored in the external storage, an offset in this data file and a data length of the data to be extracted.
  • Fig. 1 shows a flow chart illustrating steps of a method according to an exemplary embodiment;
  • Fig. 2 shows a block diagram of an embodiment of a computerized device according to the exemplary embodiment;
  • Fig. 3 shows a diagram illustrating one example of a numeric time series;
  • Fig. 4 illustrates an example for data blocks converted from the acquired time series illustrated in Fig. 3;
  • Fig. 5 illustrates an example for data segments combined from the data blocks illustrated in Fig. 4;
  • Fig. 6 shows a schematic example of block information of a block index for a data block illustrated in Fig. 4;
  • Fig. 7 shows a schematic example of segment information of a segment index for a data segment illustrated in Fig. 5;
  • Fig. 8 shows a schematic example of an index for a certain sensor including segment indices of Fig. 7 and referenced block indices of Fig. 6;
  • Fig. 9 shows a block diagram of an embodiment of a system providing both functions, i.e. a data search function and a data extraction function.
  • FIG. 1 shows a flow chart illustrating steps of a method
  • Fig. 2 shows a block diagram of a computerized device 1 according to an exemplary embodiment. Reference will now be made to Fig. 1 and Fig. 2.
  • the computerized device 1 of Fig. 2 is connected to an industrial facility 5 in which a number of sensors 2 is arranged. Without loss of generality, Fig. 2 shows one sensor 2. Further, the computerized device 1 of Fig. 2 is coupled to an external storage 4, for example to a cloud or to a cloud service.
  • the computerized device 1 includes a local storage unit 3 for storing data, a first entity 10, a second entity 20, a third entity 30, a fourth entity 40, a fifth entity 50, a sixth entity 60, a seventh entity 70, an eighth entity 80 and a ninth entity 90.
  • Said entities 10 - 90 are adapted to execute the method steps S10 - S90 of Fig. 1.
  • in step S10, a numeric time series S including a plurality of readings v and associated timestamps TS is acquired from each of the sensors 2.
  • This step S10 may be executed by said first entity 10 of Fig. 2.
  • Fig. 3 shows a diagram illustrating one example of a numeric time series S.
  • the x-axis of Fig. 3 shows the time t with the time intervals Δt.
  • the y-axis of Fig. 3 shows an amplitude, in the example of Fig. 3 a temperature in the industrial facility 5.
  • in step S20, the acquired time series S is converted into data blocks DB having a certain binary format.
  • Fig. 4 shows an example for data blocks DB converted from the acquired time series S illustrated in Fig. 3.
  • Each of the N data blocks DB 1 - DB N has a plurality of readings v and associated timestamps TS.
  • the readings v and timestamps TS are converted into the respective data block DB in a differential manner.
  • the data blocks DB may be compressed using a certain compression scheme.
  • in step S30, the data blocks DB are combined into data segments DS.
  • Fig. 5 illustrates an example for data segments DS combined from the data blocks DB illustrated in Fig. 4.
  • each of the data segments DS includes a plurality of said data blocks DB.
  • in step S40, the data segments DS are stored into the external storage 4.
  • the fourth entity 40 may be adapted to provide said storing of the data segments DS into said external storage 4.
  • in step S50, the acquired numeric time series S is processed, for each of said sensors 2, to derive an index I allocated to the sensor 2.
  • the derived index I includes segment indices SI and referenced block indices BI, referenced by said segment indices SI.
  • one block index BI including block information suitable for an index-based range search is created, and, for each of the data segments DS, one segment index SI including segment information suitable for the index-based range search is created.
  • Fig. 6 shows a schematic example of block information of a block index BI for a data block DB as illustrated in Fig. 4.
  • Fig. 7 shows a schematic example of segment information of a segment index SI for a data segment DS as illustrated in Fig. 5 and, furthermore, Fig. 8 shows a schematic example of an index I for a certain sensor 2 including segment indices SI of Fig. 7 and referenced block indices BI of Fig. 6.
  • the block information of the block index BI for a certain data block DB includes a minimum value MIN of all the readings v contained in the certain data block DB, a maximum value MAX of all the readings v contained in the certain data block DB, a data size indicator SZ indicating a data size of the certain data block DB and a number NUM of elements indicating a number of the readings v contained in the certain data block DB.
  • the segment information of a segment index SI of a certain data segment DS includes a segment identification GID for identifying the certain data segment DS, an array A(BI) containing all the block indices BI of the data blocks DB contained in the certain data segment DS, a start time ST of the certain segment DS, a minimum value MIN of all the readings v contained in the data blocks DB of the certain data segment DS and a maximum value MAX of all the readings v contained in the data blocks DB of the certain data segment DS.
  • in step S60, the created index I for the certain sensor is stored in the local storage unit 3 of the computerized device 1. This step S60 may be executed by the sixth entity 60 of
  • a data request DR for example a data search request or a data extraction request, including a range criterion for a certain sensor of the sensors 2 is received.
  • the seventh entity 70 may receive said data request DR and forward it to the eighth entity 80.
  • in step S80, the eighth entity 80 may process the index I allocated to the certain sensor 2 by accessing said local storage unit 3 to determine a range for which the numeric time series S is known to match the range criterion. As discussed in detail below, said processing of step S80 may include first accessing the local storage unit 3 and then accessing said external storage 4.
  • in step S90, a response R is output in response to the data request DR. Said response R is created using said determined range.
  • Fig. 9 shows a system 100 providing both functions, i.e. data search and data extraction.
  • the system 100 of Fig. 9 includes a computerized device 1.
  • Said computerized device 1 provides all the functionality as described with reference to Fig. 2.
  • said computerized device 1 of Fig. 9 is adapted to create an index I for each of the sensors 2 and to store said created indices I in its local storage unit 3. This functionality is illustrated by circle 1 in Fig. 9.
  • the computerized device 1 comprises a search engine 7 and an extraction engine 8.
  • the system 100 includes a service endpoint 6 which is connectable to a client 9, for example a laptop or a PC.
  • a client 9 sends a data request DR, i.e. a data search request or a data extraction request, to the system 100 and awaits a response R to the data request DR.
  • the functionality for communicating with the client 9 is provided by said service endpoint 6 of the system 100 of Fig. 9.
  • the data request DR is a data search request
  • said data search request includes a sensor identification identifying the certain sensor 2 for which the data has to be extracted and an amplitude range criterion specifying an amplitude range within which readings v of the numeric time series S are required to fall in order to match the amplitude range criterion.
  • the data request is a data search request
  • the service endpoint 6 forwards the received data search request to the search engine 7.
  • the search engine 7 is adapted to scan the index I allocated to the certain sensor 2 and stored in the local storage unit 3 for determining the data segments DS satisfying the amplitude range criterion. Further, data blocks DB are identified which satisfy the amplitude range criterion in the determined data segments DS. Moreover, the search engine 7 calculates a start time and an end time for each of the identified data blocks DB based on the timestamps TS included in the respective data block DB (see Fig. 4).
  • the search engine 7 creates a response R including a time range based on the calculated start times and the calculated end times of all the identified data blocks DB.
  • the search engine 7 forwards the created response R to the service endpoint 6 which transmits said response R to the requesting client 9.
  • the receiving service endpoint 6 forwards the received data extraction request DR to the extraction engine 8.
  • the data extraction request DR includes a sensor identification identifying the certain sensor 2 and a time range criterion.
  • the time range criterion includes a start time indicating a time of an oldest reading v to be extracted, an end time indicating a time of a newest reading v to be extracted and a resolution information indicating a time resolution for the data to be extracted.
  • the extraction engine 8 interpolates the minimum values MIN and the maximum values MAX of the data blocks DB referenced by the segment indices SI of the index I allocated to said certain sensor for providing extracted data.
  • the extracted data are part of the response R which is transmitted to the requesting client 9.
  • the external storage 4 is accessed by the extraction engine 8 for answering the data extraction request. This is shown by circle 3 in Fig. 9.
  • the index I stored in the local storage 3 is used by the extraction engine 8 for providing extraction metadata.
  • Said extraction metadata may include the name of a data file stored in the external storage 4, an offset in this data file and a data length of the data to be extracted. Then, the extraction engine 8 accesses the external storage 4 using the extraction metadata for providing extracted data. The extracted data provided by the external storage 4 is then transmitted to the requesting client 9.
  • thermo readings any kind of readings, such as pressure readings, power readings and any other analog or discrete readings, signals and the like.
  • A(BI): array of block indices; A(SI): array of segment indices; BI: block index
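
The block and segment indices described above (MIN, MAX, SZ and NUM per data block; GID, A(BI), ST, MIN and MAX per data segment) and the two-level scan performed for a data search request can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: all class and function names are hypothetical, the fields `first_ts` and `last_ts` are added so that the sketch can report time ranges without opening a data block (the source computes them from the timestamps in the block), and the size estimate assumes uncompressed int64 timestamps and float32 values.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Reading = Tuple[int, float]  # (timestamp in milliseconds, value)


@dataclass
class BlockIndex:
    min_value: float  # MIN of all readings in the block
    max_value: float  # MAX of all readings in the block
    size_bytes: int   # SZ (assumed: 8-byte timestamp + 4-byte value per reading)
    num: int          # NUM, number of readings in the block
    first_ts: int     # assumption: cached oldest timestamp of the block
    last_ts: int      # assumption: cached newest timestamp of the block


@dataclass
class SegmentIndex:
    gid: int          # GID, segment identification
    start_time: int   # ST, oldest timestamp in the segment
    min_value: float  # MIN over all blocks of the segment
    max_value: float  # MAX over all blocks of the segment
    blocks: List[BlockIndex] = field(default_factory=list)  # A(BI)


def build_index(readings: List[Reading], block_len: int,
                blocks_per_segment: int) -> List[SegmentIndex]:
    """Split time-ordered readings into blocks, group blocks into segments."""
    blocks = []
    for i in range(0, len(readings), block_len):
        chunk = readings[i:i + block_len]
        values = [v for _, v in chunk]
        blocks.append(BlockIndex(min(values), max(values), 12 * len(chunk),
                                 len(chunk), chunk[0][0], chunk[-1][0]))
    segments = []
    for gid, j in enumerate(range(0, len(blocks), blocks_per_segment)):
        group = blocks[j:j + blocks_per_segment]
        segments.append(SegmentIndex(gid, group[0].first_ts,
                                     min(b.min_value for b in group),
                                     max(b.max_value for b in group),
                                     group))
    return segments


def search(index: List[SegmentIndex], low: float,
           high: float) -> List[Tuple[int, int]]:
    """Return (start, end) times of blocks whose [MIN, MAX] overlaps [low, high]."""
    hits = []
    for seg in index:
        if seg.max_value < low or seg.min_value > high:
            continue  # the whole segment is skipped without touching its blocks
        for b in seg.blocks:
            if b.max_value >= low and b.min_value <= high:
                hits.append((b.first_ts, b.last_ts))
    return hits
```

In this sketch the scan touches only the index, never the stored readings. Note that an overlap of a block's [MIN, MAX] interval with the requested range marks the block as a candidate; it does not by itself prove that a reading lies inside the range, which is one way to read the "predetermined precision" mentioned in the text.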

Abstract

The method comprises: acquiring a numeric time series, converting the acquired time series into data blocks, combining the data blocks into data segments, storing the data segments into an external storage, processing the acquired numeric time series to derive an index allocated to the sensor, the derived index including segment indices and referenced block indices, wherein, for each of the data blocks, one block index is created, and, for each of the data segments, one segment index is created, storing the index in the local storage unit, receiving a data request including a range criterion, processing the index allocated to the certain sensor to determine a range for which the numeric time series is known to match the range criterion in the local storage unit and/or in the external storage, and outputting a response using the determined range in response to the data request.
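
The choice between answering from the local storage unit and accessing the external storage, as described for data extraction requests, might be sketched in Python as below. This is a hedged illustration only: the names `BlockEntry` and `plan_extraction` are hypothetical, the comparison of the requested resolution with the time span of a block is an assumed stand-in for the density comparison based on the data size indicators, and returning one (time, MIN, MAX) point per block is a simplification of the interpolation mentioned in the source.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BlockEntry:
    start_ms: int     # oldest timestamp covered by the block
    end_ms: int       # newest timestamp covered by the block
    min_value: float  # MIN from the block index
    max_value: float  # MAX from the block index
    file_name: str    # hypothetical: data file in the external storage
    offset: int       # byte offset of the block in that file
    length: int       # byte length of the block


def plan_extraction(blocks: List[BlockEntry], start_ms: int, end_ms: int,
                    resolution_ms: int) -> Tuple[str, list]:
    """Decide whether the local index suffices for an extraction request."""
    selected = [b for b in blocks
                if b.end_ms >= start_ms and b.start_ms <= end_ms]
    # First case: the requested resolution is no finer than one point per block,
    # so the MIN/MAX values held in the local index are enough.
    if all(resolution_ms >= b.end_ms - b.start_ms for b in selected):
        points = [((b.start_ms + b.end_ms) // 2, b.min_value, b.max_value)
                  for b in selected]
        return "local", points
    # Second case: finer resolution is requested, so extraction metadata
    # (file name, offset, length) is produced for reading the external storage.
    metadata = [(b.file_name, b.offset, b.length) for b in selected]
    return "external", metadata
```

With this split, a coarse overview request is served entirely from the index, while a detailed request reads only the byte ranges named in the metadata instead of whole data files.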

Description

METHOD AND COMPUTERIZED DEVICE FOR PROCESSING NUMERIC TIME SERIES DATA
The present invention relates to the field of industrial Big Data applications and, particularly, to a method and a computerized device for processing numeric time series data, in particular for performing a range search in numeric time series data acquired in an industrial facility.
Industrial facilities such as power plants are equipped with sensors supplying readings such as pressure or temperature. The readings are stored as time series for later analysis. The amount of data stored for an industrial facility is approaching the tera- or petabyte level. Such numeric time series data is typically stored in data warehouses or in a cloud.
There is a need to analyze the numeric time series data for fault diagnosis, operation monitoring, predictive maintenance and similar purposes. An expert user may need to identify a time interval during which a given sensor supplied readings within a given amplitude range.
A linear scan through all data to identify the applicable time intervals is too costly in terms of data traffic and CPU load, and takes too long to be of practical use. Known fast search mechanisms such as Google® search are adapted to search in alphanumeric data and do not work well with numeric time series data. A method and computerized device capable of performing an amplitude range search within terabytes of numeric time series data with a sub-second response time is not known.
PCT/RU2018/000373 discloses a method and device for performing an index-based amplitude range search within numeric time series data.
Moreover, conventional methods and devices which may be used for processing numeric time series data are described in references [1] to [11].
It is one object of the present invention to enhance the processing of numeric time series data.
According to a first aspect, a method for processing numeric time series data from a number of sensors by a computerized device including a local storage unit is proposed. The method comprises:
a) acquiring, from each of the sensors, a numeric time series including a plurality of readings and associated timestamps,
b) converting the acquired time series into data blocks having a certain binary format,
c) combining the data blocks into data segments, wherein each of the data segments includes a plurality of the data blocks,
d) storing the data segments into an external storage being external to the computerized device,
e) processing, for each of said sensors, the acquired numeric time series to derive an index allocated to the sensor, the derived index including segment indices and referenced block indices, wherein, for each of the data blocks, one block index including block information suitable for an index-based range search is created, and, for each of the data segments, one segment index including segment information suitable for the index-based range search is created,
f) storing the index in the local storage unit,
g) receiving a data request including a range criterion for a certain sensor of the sensors,
h) processing the index allocated to the certain sensor to determine a range for which the numeric time series is known to match the range criterion in the local storage unit and/or in the external storage, and
i) outputting a response using the determined range in response to the data request.
It is noted that in step h), it is the index, and not the numeric time series, that is processed to determine the range that corresponds to the data request. The above steps g), h) and i) may therefore also be collectively referred to as "performing an index-based range search". Likewise, steps a) to f) may be referred to as "building an index adapted for an index-based range search".
By performing an index-based range search by processing the index rather than performing a direct or linear range search by processing the numeric time series, it may be favorably possible to significantly reduce the processing time for determining the range in response to the data request.
Further, if the numeric time series is stored in the external storage, such as a data warehouse or a cloud, costly data traffic to the data warehouse or to the cloud may be avoided because the index is stored in the local storage unit. Therefore, the cost of responding to the data request may be advantageously reduced.
Specifically, the method may be a computer-implemented method. In particular, the method may be carried out using the computerized device which may include said local storage unit and one or more processing units, such as one or more CPUs. The local storage unit may include a hard disk, solid state disk, RAID storage, and the like.
The index may be a numeric index. The numeric index may be adapted to provide a response to a search request including an amplitude range criterion and/or a time range criterion. Advantageously, the index may comprise information necessary and sufficient to determine the range for which the numeric time series is known to match the range criterion with a predetermined precision. Therefore, advantageously, step h) may be carried out with the predetermined precision without having to process the numeric time series.
The index may favorably require less storage space than the numeric time series itself. Thereby, processing the index in step h) may require less time than processing the numeric time series. The index may thus be regarded as a compressed representation of the numeric time series. The compressed index may be advantageously stored in the local storage unit rather than in the external storage, such as a cloud or data warehouse.
A reading may be a value originally provided by one of the sensors installed in an industrial facility, such as a temperature, a pressure, a power output or load, etc. For example, the sensor is a gas turbine sensor and the time series includes pairs of sensed pressure values and associated timestamps. The timestamp may have millisecond precision and may be represented by a 64-bit value, e.g. int64. The value may be a real number, e.g. representing the physical pressure parameter, represented by a 32-bit float number, e.g. float32. Also int8, int16, int32, int64, float32 and float64 may be used.
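As a minimal sketch of such a representation, one reading with its timestamp can be packed into 12 bytes using Python's `struct` module. The little-endian byte order and the field order (timestamp first, then value) are assumptions made for illustration; the source does not specify the exact binary layout.

```python
import struct

# Assumed layout (not given in the source): little-endian int64 timestamp
# in milliseconds, followed by a float32 reading.
RECORD = struct.Struct("<qf")

def encode_record(timestamp_ms: int, value: float) -> bytes:
    """Pack one (timestamp, reading) pair into 12 bytes."""
    return RECORD.pack(timestamp_ms, value)

def decode_record(buf: bytes) -> tuple:
    """Unpack 12 bytes back into a (timestamp, reading) pair."""
    return RECORD.unpack(buf)

raw = encode_record(1_546_300_800_000, 101.5)
```

Note that a float32 field stores readings with roughly seven significant decimal digits, so values that are not exactly representable are rounded when packed.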
The numeric time series may be a non-equidistant numeric time series or an equidistant numeric time series.
In step a), the numeric time series may be acquired by directly receiving the plurality of readings from the sensors installed in an industrial facility and storing, in the time series, a respective time of reception of each reading as the timestamp associated with the reading (value).
In steps b) and c), the acquired time series is split into data blocks and data segments. In particular, the acquired time series is divided into said data blocks by time. Then, the data blocks are grouped into said data segments which may be limited by time duration, and thus have a limited number of data blocks.
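The splitting by time can be sketched as follows. This is an illustrative sketch only: the fixed block and segment durations, the function names and the sample data are assumptions, and the input is assumed to be sorted by timestamp.

```python
from itertools import groupby

def split_into_blocks(series, block_ms):
    """Group time-sorted (timestamp_ms, value) pairs into blocks of block_ms each."""
    return [list(group)
            for _, group in groupby(series, key=lambda r: r[0] // block_ms)]

def group_into_segments(blocks, segment_ms):
    """Group blocks into segments by the time window of their first timestamp."""
    return [list(group)
            for _, group in groupby(blocks, key=lambda b: b[0][0] // segment_ms)]

# Hypothetical sample series: (timestamp in ms, reading)
series = [(0, 1.0), (400, 1.2), (1000, 1.1), (1900, 0.9), (2100, 1.4), (4000, 2.0)]
blocks = split_into_blocks(series, block_ms=1000)
segments = group_into_segments(blocks, segment_ms=2000)
```

Because each segment covers a bounded time window, the number of blocks per segment is bounded as well, which matches the limitation by time duration described above.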
In particular, the steps a) to d) are executed repeatedly for different portions of the numeric time series; and the processing step e) to derive the index includes creating the index upon a first execution of the steps a) to d) and includes updating the index upon each subsequent execution of the steps a) to d).
In particular, in step d), the data segments including said data blocks are stored into the external storage, which may be a cloud or cloud system, for example. Storing said time series data in the form of said data segments in a cloud storage service provides a cost-efficient solution. Moreover, high availability and scalability of storing said data segments are achieved by the respective qualities of the backing cloud storage service.
Thereby, the time series may be acquired portion-by-portion, and an amount of local storage space at the computerized device required for providing the index for each of the sensors may be reduced. Likewise, easy and efficient updates may be possible when further portions or further readings are added to the numeric time series over time. In particular, such updates may be advantageously performed without repeating acquisition of the already acquired portions, thereby reducing an amount of data to be transferred for each update.
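A portion-by-portion update can be illustrated with a simplified index that only tracks value bounds. The dictionary layout and the function name `update_index` are hypothetical and serve only to show that a new portion can be folded in without revisiting earlier portions.

```python
def update_index(index, portion):
    """Add one entry for a new portion and widen the overall bounds.

    portion: non-empty list of (timestamp_ms, value) pairs.
    """
    values = [v for _, v in portion]
    entry = {"min": min(values), "max": max(values), "num": len(values)}
    index["blocks"].append(entry)
    index["min"] = min(index["min"], entry["min"])
    index["max"] = max(index["max"], entry["max"])
    return index

# The index is created empty on first use and updated on each later portion.
index = {"blocks": [], "min": float("inf"), "max": float("-inf")}
update_index(index, [(0, 1.0), (10, 3.0)])
update_index(index, [(20, 0.5), (30, 2.0)])
```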
In step f) , the index for each sensor is advantageously stored in the local storage unit, such as a hard disk of a local workstation, even in a case where the actual numeric time series comprises Big Data that may only be stored in the external storage, like a data warehouse or a cloud.
In step g), the data request may be received through an input unit, such as a keyboard, connected to a computerized device carrying out the method. Alternatively, the data request may be received via network from another computerized device by a service endpoint. The data request may include a logical expression formed by one or more range criteria and, particularly, one or more logical operators. Thus, the method may advantageously support complex and sophisticated search requests.
A logical operator may be a Boolean operator such as logical "AND", logical "OR" or logical "NOT". For example, complex search requests correlating different readings from different numeric time series may be made possible, such as "temperature greater than 500 degrees Celsius AND power lower than 300 megawatts".
In step h), processing the index may refer to accessing the index based on the range criterion. For example, the range criterion, or an upper or lower boundary comprised therein, may be used in a keyword-like manner to gain fast access to portions of the index containing information about a matching range.
In step i), the determined range may be output by displaying a human readable representation of the determined range on a display device. The determined range may also be output by transmitting a digital representation of the determined range via a wired or wireless network.
It is noted that determining a matching range and outputting the matching range may also comprise determining a plurality of matching ranges and outputting the plurality of matching ranges.
In particular, the index or indices stored in the local storage unit provide fast random access, which allows searching and extracting the data even without accessing the external storage. In particular, a fast value range search is provided over a relatively small number of data segments and data blocks in the index locally stored at the local storage unit. Further, fast time series data extraction is provided by using information, in particular the metadata, about data segments and data blocks to read only specific parts of the data files instead of the whole files.
According to an embodiment, in step b), the readings and the timestamps are converted into the respective data blocks in a differential manner.
In particular, in step b), differences between a current reading and a preceding reading and differences between a current timestamp and a preceding timestamp are stored into the respective data block. Therefore, memory space is advantageously saved.
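The differential conversion can be sketched as a plain delta encoding. Keeping the first value as an absolute anchor is an assumption made here so that the sequence can be reconstructed; the source does not state how the first element of a block is stored.

```python
from itertools import accumulate

def delta_encode(values):
    """Keep the first value, then store each difference to its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild the original sequence by cumulative summation."""
    return list(accumulate(deltas))

timestamps = [1000, 1010, 1020, 1035]
encoded = delta_encode(timestamps)
```

For slowly changing integer quantities such as millisecond timestamps, the differences are small numbers that compress well. For floating-point readings, the round trip through differences may introduce rounding error, so a real implementation would need to handle that case explicitly.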
According to a further embodiment, after step b) and before step c) , the data blocks are compressed using a certain compression scheme.
In particular, the compression scheme is based on or corresponds to Zstandard (see reference [10]).
According to a further embodiment, the block information of the block index of a certain one of the data blocks includes:
a minimum value of all the readings contained in the certain data block,
a maximum value of all the readings contained in the certain data block,
a data size indicator indicating a data size of the certain data block, and
an element number indicating a number of the readings contained in the certain data block.
According to a further embodiment, the segment information of a segment index of a certain one of the data segments includes:
a segment identification for identifying the certain data segment,
an array containing all the block indices of the data blocks contained in the certain data segment,
a start time of the certain data segment,
a minimum value of all the readings contained in the data blocks of the certain data segment, and
a maximum value of all the readings contained in the data blocks of the certain data segment.
In particular, the start time of the certain data segment corresponds to the oldest one of the timestamps contained in the data blocks of the certain data segment.
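The block information and segment information listed above can be sketched as two small data structures. The class and field names, and the assumption that the data size is 12 uncompressed bytes per reading, are hypothetical choices for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class BlockIndex:
    min_value: float    # minimum of all readings in the block
    max_value: float    # maximum of all readings in the block
    size_bytes: int     # data size indicator
    num_readings: int   # element number

@dataclass
class SegmentIndex:
    segment_id: str     # segment identification
    start_time: int     # oldest timestamp in the segment
    blocks: list = field(default_factory=list)  # array of block indices
    min_value: float = float("inf")
    max_value: float = float("-inf")

def build_segment_index(segment_id, blocks, bytes_per_reading=12):
    """blocks: list of non-empty blocks, each a list of (timestamp_ms, value)."""
    block_indices = []
    for block in blocks:
        values = [v for _, v in block]
        block_indices.append(BlockIndex(
            min_value=min(values),
            max_value=max(values),
            size_bytes=len(block) * bytes_per_reading,  # assumed uncompressed size
            num_readings=len(block),
        ))
    return SegmentIndex(
        segment_id=segment_id,
        start_time=min(ts for block in blocks for ts, _ in block),
        blocks=block_indices,
        min_value=min(bi.min_value for bi in block_indices),
        max_value=max(bi.max_value for bi in block_indices),
    )

si = build_segment_index(
    "seg-0", [[(0, 1.0), (400, 1.2)], [(1000, 1.1), (1900, 0.9)]])
```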
According to a further embodiment, the steps a) to f) are executed for each of the sensors.
According to a further embodiment, the data request is a data search request including a sensor identification identifying the certain sensor and an amplitude range criterion specifying an amplitude range within which readings of the numeric time series are required to fall in order to match the amplitude range criterion. An amplitude range criterion may be a criterion that specifies a range (value range, amplitude range) within which readings are required to fall for the time series to match the amplitude range criterion. Examples of the amplitude range criterion are criteria such as "between 3000 and 4000 rotations per minute", "more than 300 megawatts" or "less than 300 degrees Celsius". In other words, the amplitude range criterion may specify at least one of a lower and an upper boundary for readings that match the criterion.
The data search request may constitute a digital representation of the amplitude range criterion.
The fact that "the numeric time series is known to match the amplitude range criterion" for the time range determined in step h) (which is also referred to as "matching time range" hereinbelow) may refer to the fact that, based on the index, it is known that the numeric time series comprises at least one reading (also referred to as "matching reading" hereinbelow) that is within the amplitude range specified by the amplitude range criterion and is associated with a time that is within the matching time range.
In particular, the time range determined in step h) is a time range that includes a time for which the numeric time series is known to include at least one reading within an amplitude range specified by the amplitude range criterion, and excludes a time for which the numeric time series is known not to include any reading within the specified amplitude range.
According to a further embodiment, the method includes:
scanning the index allocated to the certain sensor and stored in the local storage unit for determining the data segments satisfying the amplitude range criterion,
identifying data blocks satisfying the amplitude range criterion in the determined data segments,
calculating a start time and an end time for each of the identified data blocks based on the timestamps included in the respective data block, and
outputting the response including a time range based on the calculated start times and the calculated end times of all the identified data blocks.
The determined data segments are those data segments that are determined to satisfy the amplitude range criterion. Moreover, the identified data blocks are those data blocks of the determined data segments that are identified to satisfy the amplitude range criterion.
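The two-level scan described above can be sketched as follows. The dictionary-based index layout is hypothetical, and the start and end time of each block are assumed to be available next to its index entry, whereas the source derives them from the timestamps of the data block. A minimum/maximum overlap test is used as the matching condition; for a one-sided criterion such as "more than 300" this test is exact, while for a two-sided criterion it identifies blocks that may contain matching readings.

```python
def overlaps(lo, hi, min_value, max_value):
    """True if [min_value, max_value] intersects the requested range [lo, hi]."""
    return min_value <= hi and max_value >= lo

def search(index, lo, hi):
    """Return (start, end) time ranges of the blocks passing the overlap test."""
    ranges = []
    for seg in index:
        if not overlaps(lo, hi, seg["min"], seg["max"]):
            continue  # the whole segment is skipped without looking at its blocks
        for blk in seg["blocks"]:
            if overlaps(lo, hi, blk["min"], blk["max"]):
                ranges.append((blk["start"], blk["end"]))
    return ranges

# Hypothetical index for one sensor with two segments
index = [
    {"min": 10, "max": 50, "blocks": [
        {"min": 10, "max": 20, "start": 0, "end": 999},
        {"min": 30, "max": 50, "start": 1000, "end": 1999},
    ]},
    {"min": 60, "max": 90, "blocks": [
        {"min": 60, "max": 90, "start": 2000, "end": 2999},
    ]},
]
hits = search(index, 40, 70)
```

Checking the segment bounds first means that segments outside the requested range are rejected with a single comparison, which is the reason for keeping minimum and maximum values at both levels.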
According to a further embodiment, the data request is a data extraction request for extracting data, wherein the data extraction request includes a sensor identification identifying the certain sensor and a time range criterion, wherein the time range criterion includes a start time indicating a time of an oldest reading to be extracted, an end time indicating a time of a newest reading to be extracted and a resolution information indicating a time resolution for the data to be extracted.
According to a further embodiment, in a first case, if the resolution information of the data extraction request indicates a data density lower than or equal to that indicated by the data size indicators of the data blocks referenced by the segment indices of the index allocated to the certain sensor, only the local storage unit is accessed for answering the data extraction request, and in a second case, if the resolution information of the data extraction request indicates a data density higher than that indicated by the data size indicators of the data blocks referenced by the segment indices of the index allocated to the certain sensor, the external storage is accessed for answering the data extraction request.
In particular, the data size indicators of the data blocks are referenced by the arrays of the data segments which are themselves referenced by the segment indices of the index of the certain sensor.
According to a further embodiment, in the first case, the minimum values and the maximum values of the data blocks referenced by the segment indices of the index allocated to the certain sensor are interpolated for providing extracted data, wherein the response including the extracted data is output.
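The source does not state which interpolation is used. One possible reading, shown here purely as an assumption, is a linear interpolation of the per-block minimum and maximum values between block midpoint times, evaluated on the requested time grid and held constant outside the first and last midpoint.

```python
from bisect import bisect_right

def lerp(x, x0, y0, x1, y1):
    """Linear interpolation between (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def interpolate_envelope(blocks, start, end, step):
    """blocks: (mid_time, min_value, max_value) tuples with increasing mid_time.

    Returns (time, low, high) samples on the grid start, start+step, ..., end.
    """
    times = [t for t, _, _ in blocks]
    out = []
    t = start
    while t <= end:
        i = bisect_right(times, t)
        if i == 0:                      # before the first midpoint
            _, low, high = blocks[0]
        elif i == len(blocks):          # at or after the last midpoint
            _, low, high = blocks[-1]
        else:                           # between two midpoints
            t0, lo0, hi0 = blocks[i - 1]
            t1, lo1, hi1 = blocks[i]
            low = lerp(t, t0, lo0, t1, lo1)
            high = lerp(t, t0, hi0, t1, hi1)
        out.append((t, low, high))
        t += step
    return out

# Hypothetical per-block midpoints with their MIN and MAX values
envelope = interpolate_envelope(
    [(500, 10.0, 20.0), (1500, 30.0, 50.0)], start=0, end=2000, step=500)
```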
According to a further embodiment, in the second case, the index stored in the local storage unit is used for providing extraction metadata including a data file name of a data file stored in the external storage, an offset in this data file and a data length of the data to be extracted, wherein the external storage is accessed using the provided extraction metadata for providing extracted data, wherein the response including the extracted data is output.
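The use of extraction metadata can be sketched as a ranged read. In this sketch a local temporary file stands in for the external storage, and the class name `ExtractionMetadata`, the function `read_range` and the file contents are hypothetical.

```python
import os
import tempfile
from dataclasses import dataclass

@dataclass
class ExtractionMetadata:
    file_name: str  # name of the data file in the (stand-in) external storage
    offset: int     # byte offset of the data to be extracted
    length: int     # number of bytes to be extracted

def read_range(meta):
    """Read only the requested byte range instead of the whole file."""
    with open(meta.file_name, "rb") as f:
        f.seek(meta.offset)
        return f.read(meta.length)

# Stand-in data file containing three 8-byte records
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"BLOCK-0;BLOCK-1;BLOCK-2;")
    path = tmp.name

meta = ExtractionMetadata(file_name=path, offset=8, length=7)
payload = read_range(meta)
os.remove(path)
```

With a cloud object store, the same three values would typically translate into a byte-range request, so that only the needed part of a data segment is transferred.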
According to a further embodiment, the method further includes discarding the acquired time series at the computerized device.
Since the acquired time series may not be needed at the computerized device, it may be favorably discarded after the building of the index in steps a) to f) has been completed, so as to reduce an amount of storage space required at the computerized device.
Any embodiment of the first aspect may be combined with any other embodiment of the first aspect to obtain another embodiment of the first aspect.
According to a second aspect, a computer program product comprises a program code for executing the above-described method for performing a range search in numeric time series data, when run on at least one computer.
A computer program product, such as a computer program means, may be embodied as a memory card, USB stick, CD-ROM, DVD or as a file which may be downloaded from a server in a network. For example, such a file may be provided by transferring the file comprising the computer program product from a wireless communication network.
According to a third aspect, a computerized device for processing numeric time series data from a number of sensors is proposed. The computerized device has a local storage unit and further:
a first entity for acquiring, from each of the sensors, a numeric time series including a plurality of readings and associated timestamps,
a second entity for converting the acquired time series into data blocks having a certain binary format,
a third entity for combining the data blocks into data segments, wherein each of the data segments includes a plurality of the data blocks,
a fourth entity for storing the data segments into an external storage being external to the computerized device,
a fifth entity for processing, for each of said sensors, the acquired numeric time series to derive an index allocated to the sensor, the derived index including segment indices and referenced block indices, wherein, for each of the data blocks, one block index including block information suitable for an index-based range search is created, and, for each of the data segments, one segment index including segment information suitable for the index-based range search is created,
a sixth entity for storing the index in the local storage unit,
a seventh entity for receiving a data request including a range criterion for a certain sensor of the sensors,
an eighth entity for processing the index allocated to the certain sensor to determine a range for which the numeric time series is known to match the range criterion (in the local storage unit and/or in the external storage), and
a ninth entity for outputting a response using the determined range in response to the data request.
The embodiments and features described with reference to the method of the present invention apply mutatis mutandis to the computerized device of the present invention. Specifically, the computerized device of the present invention may be implemented to carry out the method of the present invention. The respective entity, e.g. the first to ninth entity, may be implemented in hardware and/or in software. If said entity is implemented in hardware, it may be embodied as a device, e.g. as a computer or as a processor or as a part of a system, e.g. a computer system. If said entity is implemented in software, it may be embodied as a computer program product, as a function, as a routine, as a program code or as an executable object.
According to a fourth aspect, a system is suggested comprising a number of sensors installed in an industrial facility and a computerized device of the third aspect for processing numeric time series data from the sensors.
The system may provide a data search function and a data extraction function. In this regard, the computerized device of said system may include a search engine for providing the data search function and an extraction engine for providing the data extraction function. Further, the system may include a service endpoint for communicating with a number of clients or requesting entities transmitting a data request to the system and awaiting a response to said data request.
For example, the search engine may be adapted to scan the index stored in the local storage unit of the computerized device for determining the data segments to be extracted. Moreover, the search engine may calculate the start time and the end time for each of the identified data blocks contained in the identified data segments. This calculation is based on the timestamps included in the respective data blocks. Then, the search engine may create a response including a time range based on the calculated start times and the calculated end times of all the identified data blocks. The search engine may forward the created response to the service endpoint, which transmits said response to the requesting client.
In the case of a data extraction request, the extraction engine may selectively access the local storage unit or the local storage unit and the external storage. Further details are described below. Moreover, the system may use a master node including said local storage unit storing the indices and a number of slave nodes for answering and processing the data requests. In case of extraction from the external storage, the master node or one of the slave nodes may access the external storage.
For example, if the requested resolution of the received data extraction request indicates a data density higher than that indicated by the data size indicators of the data blocks referenced by the segment indices of the index allocated to the requested sensor, the external storage may be accessed by the extraction engine for answering the data extraction request. In this case, the index stored in the local storage unit of the master node may be used for providing extraction metadata. Said extraction metadata may be directed to a slave node which is adapted to access the external storage using the extraction metadata. The extraction metadata may include a data file, the name of the data file stored in the external storage, an offset in this data file and a data length of the data to be extracted.
Further possible implementations or alternative solutions of the invention also encompass combinations - that are not explicitly mentioned herein - of features described above or below with regard to the embodiments. The person skilled in the art may also add individual or isolated aspects and features to the most basic form of the invention.
Further embodiments, features and advantages of the present invention will become apparent from the subsequent description and dependent claims, taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows a flow chart illustrating steps of a method according to an exemplary embodiment;
Fig. 2 shows a block diagram of an embodiment of a computerized device according to the exemplary embodiment;
Fig. 3 shows a diagram illustrating one example of a numeric time series;
Fig. 4 illustrates an example for data blocks converted from the acquired time series illustrated in Fig. 3;
Fig. 5 illustrates an example for data segments combined from the data blocks illustrated in Fig. 4;
Fig. 6 shows a schematic example of block information of a block index for a data block illustrated in Fig. 4;
Fig. 7 shows a schematic example of segment information of a segment index for a data segment illustrated in Fig. 5;
Fig. 8 shows a schematic example of an index for a certain sensor including segment indices of Fig. 7 and referenced block indices of Fig. 6; and
Fig. 9 shows a block diagram of an embodiment of a system providing both functions, i.e. a data search function and a data extraction function.
In the Figures, like reference numerals designate like or functionally equivalent elements, unless otherwise indicated.
Fig. 1 shows a flow chart illustrating steps of a method, and Fig. 2 shows a block diagram of a computerized device 1 according to an exemplary embodiment. Reference will now be made to Fig. 1 and Fig. 2.
The computerized device 1 of Fig. 2 is connected to an industrial facility 5 in which a number of sensors 2 is arranged. Without loss of generality, Fig. 2 shows one sensor 2. Further, the computerized device 1 of Fig. 2 is coupled to an external storage 4, for example to a cloud or to a cloud service.
The computerized device 1 includes a local storage unit 3 for storing data, a first entity 10, a second entity 20, a third entity 30, a fourth entity 40, a fifth entity 50, a sixth entity 60, a seventh entity 70, an eighth entity 80 and a ninth entity 90.
Said entities 10 - 90 are adapted to execute the method steps S10 - S90 of Fig. 1.
In step S10, a numeric time series S including a plurality of readings v and associated timestamps TS is acquired from each of the sensors 2. This step S10 may be executed by said first entity 10 of Fig. 2. In this regard, Fig. 3 shows a diagram illustrating one example of a numeric time series S. Here, the x-axis of Fig. 3 shows the time t with the time intervals Δt. The y-axis of Fig. 3 shows an amplitude, in the example of Fig. 3 a temperature in the industrial facility 5.
In step S20, the acquired time series S is converted into data blocks DB having a certain binary format. Here, Fig. 4 shows an example for data blocks DB converted from the acquired time series S illustrated in Fig. 3. Each of the N data blocks DB 1 - DB N has a plurality of readings v and associated timestamps TS. In particular, in step S20, the readings v and timestamps TS are converted into the respective data block DB in a differential manner. Furthermore, the data blocks DB may be compressed using a certain compression scheme.
In step S30, the data blocks DB are combined into data segments DS. Here, Fig. 5 illustrates an example for data segments DS combined from the data blocks DB illustrated in Fig. 4. As shown in Fig. 5, each of the data segments DS includes a plurality of said data blocks DB.
In step S40, the data segments DS are stored into the external storage 4. With reference to Fig. 2, the fourth entity 40 may be adapted to provide said storing of the data segments DS into said external storage 4.
In step S50, the acquired numeric time series S is processed, for each of said sensors 2, to derive an index I allocated to the sensor 2. The derived index I includes segment indices SI and referenced block indices BI, referenced by said segment indices SI. In this regard, for each of the data blocks DB, one block index BI including block information suitable for an index-based range search is created, and, for each of the data segments DS, one segment index SI including segment information suitable for the index-based range search is created.
Details for creating said index I, said block indices BI and said segment indices SI are described with reference to Figs. 6 - 8 in the following. In this regard, Fig. 6 shows a schematic example of block information of a block index BI for a data block DB as illustrated in Fig. 4. Fig. 7 shows a schematic example of segment information of a segment index SI for a data segment DS as illustrated in Fig. 5 and, furthermore, Fig. 8 shows a schematic example of an index I for a certain sensor 2 including segment indices SI of Fig. 7 and referenced block indices BI of Fig. 6.
Moreover, with reference to Fig. 6, the block information of the block index BI for a certain data block DB includes a minimum value MIN of all the readings v contained in the certain data block DB, a maximum value MAX of all the readings v contained in the certain data block DB, a data size indicator SZ indicating a data size of the certain data block DB and an element number NUM indicating a number of the readings v contained in the certain data block DB.
Further, with reference to Fig. 7, the segment information of a segment index SI of a certain data segment DS includes a segment identification GID for identifying the certain data segment DS, an array A(BI) containing all the block indices BI of the data blocks DB contained in the certain data segment DS, a start time ST of the certain data segment DS, a minimum value MIN of all the readings v contained in the data blocks DB of the certain data segment DS and a maximum value MAX of all the readings v contained in the data blocks DB of the certain data segment DS.
In step S60, the created index I for the certain sensor is stored in the local storage unit 3 of the computerized device 1. This step S60 may be executed by the sixth entity 60 of Fig. 2.
In step S70, a data request DR, for example a data search request or a data extraction request, including a range criterion for a certain sensor of the sensors 2 is received. With reference to Fig. 2, the seventh entity 70 may receive said data request DR and forward it to the eighth entity 80.
In step S80, the eighth entity 80 may process the index I allocated to the certain sensor 2 by accessing said local storage unit 3 to determine a range for which the numeric time series S is known to match the range criterion. As discussed in detail below, said processing of step S80 may include first accessing the local storage unit 3 and then further accessing said external storage 4.
In step S90, a response R is output in response to the data request DR. Said response R is created using said determined range.
As indicated above, said data request DR may be a data search request or a data extraction request. In this regard, Fig. 9 shows a system 100 providing both functions, i.e. data search and data extraction. The system 100 of Fig. 9 includes a computerized device 1. Said computerized device 1 provides all the functionality as described with reference to Fig. 2. In particular, said computerized device 1 of Fig. 9 is adapted to create an index I for each of the sensors 2 and to store said created indices I in its local storage unit 3. This functionality is illustrated by circle 1 in Fig. 9.
Moreover, the computerized device 1 comprises a search engine 7 and an extraction engine 8.
Moreover, the system 100 includes a service endpoint 6 which is connectable to a client 9, for example a laptop or a PC. In the example of Fig. 9, the client 9 sends a data request DR, i.e. a data search request or a data extraction request, to the system 100 and awaits a response R to the data request DR. The functionality for communicating with the client 9 is provided by said service endpoint 6 of the system 100 of Fig. 9.
For the example in which the data request DR is a data search request, said data search request includes a sensor identification identifying the certain sensor 2 for which the data has to be extracted and an amplitude range criterion specifying an amplitude range within which readings v of the numeric time series S are required to fall in order to match the amplitude range criterion.
For said example in which the data request DR is a data search request, the service endpoint 6 forwards the received data search request to the search engine 7. The search engine 7 is adapted to scan the index I allocated to the certain sensor 2 and stored in the local storage unit 3 for determining the data segments DS satisfying the amplitude range criterion. Further, data blocks DB are identified which satisfy the amplitude range criterion in the determined data segments DS. Moreover, the search engine 7 calculates a start time and an end time for each of the identified data blocks DB based on the timestamps TS included in the respective data block DB (see Fig. 4).
Moreover, the search engine 7 creates a response R including a time range based on the calculated start times and the calculated end times of all the identified data blocks DB. The search engine 7 forwards the created response R to the service endpoint 6 which transmits said response R to the requesting client 9.
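The two-level scan described above may be illustrated by the following minimal Python sketch. It is not the implementation of the embodiment. The names BlockIndex, SegmentIndex and search_amplitude_range are hypothetical. The sketch further assumes that a data segment is a candidate when its [MIN, MAX] interval overlaps the requested amplitude range, that a data block satisfies the criterion when its [MIN, MAX] interval lies entirely within the requested range, and that the start time and the end time of each data block are already available (in the embodiment they are calculated from the timestamps TS).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BlockIndex:
    """Per-block index entry (BI); start/end are an assumption of this sketch."""
    min_value: float  # MIN of all readings in the block
    max_value: float  # MAX of all readings in the block
    start: float      # block start time (assumed to be precomputed)
    end: float        # block end time (assumed to be precomputed)


@dataclass
class SegmentIndex:
    """Per-segment index entry (SI) holding the array A(BI)."""
    min_value: float
    max_value: float
    blocks: List[BlockIndex] = field(default_factory=list)


def search_amplitude_range(
    index: List[SegmentIndex], low: float, high: float
) -> List[Tuple[float, float]]:
    """Return merged (start, end) ranges of blocks lying within [low, high]."""
    hits = []
    for seg in index:
        # Segment-level pruning: skip segments that cannot overlap the range.
        if seg.max_value < low or seg.min_value > high:
            continue
        for blk in seg.blocks:
            # Block-level test: every reading of the block is inside the range.
            if low <= blk.min_value and blk.max_value <= high:
                hits.append((blk.start, blk.end))
    # Merge touching or overlapping block ranges into contiguous time ranges.
    merged: List[Tuple[float, float]] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Because only the index entries are inspected, such a scan would not need to read the data segments held in the external storage.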
Furthermore, if the data request DR is a data extraction request, the receiving service endpoint 6 forwards the received data extraction request DR to the extraction engine 8. The data extraction request DR includes a sensor identification identifying the certain sensor 2 and a time range criterion. The time range criterion includes a start time indicating a time of an oldest reading v to be extracted, an end time indicating a time of a newest reading v to be extracted and a resolution information indicating a time resolution for the data to be extracted.
In a first case, if the resolution information of the data extraction request DR indicates a data density lower than or equal to that indicated by the data size indicators SZ of the data blocks DB referenced by the segment indices SI of the index I allocated to the certain sensor 2, only the local storage unit 3 is accessed by the extraction engine 8 for answering the data extraction request DR. This is shown by circle 2 in Fig. 9.
In the first case, the extraction engine 8 interpolates the minimum values MIN and the maximum values MAX of the data blocks DB referenced by the segment indices SI of the index I allocated to said certain sensor for providing extracted data. The extracted data are part of the response R which is transmitted to the requesting client 9.
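As an illustration of the first case, the following Python sketch derives a coarse minimum/maximum envelope from index entries only. It is a sketch under stated assumptions rather than the method of the embodiment: the function names lerp_at and extract_from_index are hypothetical, each data block is assumed to be represented by a single time point with strictly increasing times, and linear interpolation is assumed, since the description does not specify the kind of interpolation.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def lerp_at(t: float, times: Sequence[float], values: Sequence[float]) -> float:
    """Linearly interpolate values at time t, clamping outside the known times."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)  # now times[i - 1] <= t < times[i]
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)
    return values[i - 1] + w * (values[i] - values[i - 1])


def extract_from_index(
    blocks: List[Tuple[float, float, float]],
    start: float,
    end: float,
    step: float,
) -> List[Tuple[float, float, float]]:
    """blocks: (block_time, MIN, MAX) entries sorted by block_time (assumed layout).

    Returns (t, min, max) samples from start to end (inclusive) every `step`.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    times = [b[0] for b in blocks]
    mins = [b[1] for b in blocks]
    maxs = [b[2] for b in blocks]
    out = []
    n = 0
    while start + n * step <= end:
        t = start + n * step
        out.append((t, lerp_at(t, times, mins), lerp_at(t, times, maxs)))
        n += 1
    return out
```

In this sketch the requested time resolution corresponds to the parameter step, and no access to the stored data segments is required.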
In a second case, if the resolution information of the data extraction request DR indicates a data density higher than that indicated by the data size indicators SZ of the data blocks DB referenced by the segment indices SI of the index I allocated to said certain sensor 2, the external storage 4 is accessed by the extraction engine 8 for answering the data extraction request DR. This is shown by circle 3 in Fig. 9.
In the second case, the index I stored in the local storage unit 3 is used by the extraction engine 8 for providing extraction metadata. Said extraction metadata may include the data file name of a data file stored in the external storage 4, an offset in this data file and a data length of the data to be extracted. Then, the extraction engine 8 accesses the external storage 4 using the extraction metadata for providing extracted data. The extracted data provided by the external storage 4 is then transmitted to the requesting client 9.
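The use of the extraction metadata in the second case may be sketched as follows. The sketch is illustrative only: the names ExtractionMetadata, read_extracted_bytes and http_range_header are hypothetical, and a local directory stands in for the external storage 4, whose actual access interface is not assumed here.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ExtractionMetadata:
    file_name: str  # name of the data file in the external storage
    offset: int     # byte offset of the first byte to be read
    length: int     # number of bytes to be read


def read_extracted_bytes(storage_root: str, meta: ExtractionMetadata) -> bytes:
    """Read only the requested byte range instead of the whole data file."""
    with open(Path(storage_root) / meta.file_name, "rb") as fh:
        fh.seek(meta.offset)
        return fh.read(meta.length)


def http_range_header(meta: ExtractionMetadata) -> str:
    """Format the same byte range as an HTTP Range header value (inclusive end)."""
    return f"bytes={meta.offset}-{meta.offset + meta.length - 1}"
```

If the external storage were an object store accessed over HTTP, the same three values could be mapped to a ranged read request, so that only the required part of the data file is transferred.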
Although the present invention has been described in accordance with an exemplary embodiment, an exemplary use case and preferred variants thereof, it is obvious to the person skilled in the art that modifications are possible in all embodiments, use cases and variants.
The exemplary embodiment and its variants mainly refer to temperature readings. However, the proposed method and computerized device may be used with any kind of readings, such as pressure readings, power readings and any other analog or discrete readings, signals and the like.
Reference Numerals:
S10-S90 method steps
1 computerized device
2 sensor
3 local storage unit
4 external storage
5 industrial facility
6 service endpoint
7 search engine
8 extraction engine
9 client
10 first entity
20 second entity
30 third entity
40 fourth entity
50 fifth entity
60 sixth entity
70 seventh entity
80 eighth entity
90 ninth entity
A(BI) array of block indices
A(SI) array of segment indices
BI block index
BTI block time interval
DB data block
DR data request
I index
MIN minimum value
MAX maximum value
NUM number of elements
R response
SI segment index
ST start time
SZ data size indicator
SID sensor ID
STI segment time interval
t time
Δt time intervals
TS timestamp
v value, reading
References:
[1] Yang, Fangjin, et al. "Druid: A real-time analytical data store." Proceedings of the 2014 ACM SIGMOD international conference on Management of data. ACM, 2014.
[2] Benchmarking Druid [Electronic resource]. URL: http://druid.io/blog/2014/03/17/benchmarkingdruid.html
[3] Apache Cassandra NoSQL Performance Benchmarks [Electronic resource]. URL: https://academy.datastax.com/planet-cassandra/nosql-performance-benchmarks
[4] Apache Ignite and Apache Cassandra Benchmarks: The Power of In-Memory Computing [Electronic resource]. URL: https://dzone.com/articles/apachereg-ignite-and-apachereg-cassandrabenchmark
[5] Time series Database Benchmarks [Electronic resource]. URL: https://blog.outlyer.com/timeseries-database-benchmarks
[6] Column Store Database Benchmarks: MariaDB ColumnStore vs. Clickhouse vs. Apache Spark [Electronic resource]. URL: https://www.percona.com/blog/2017/03/17/column-store-databasebenchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark/
[7] ClickHouse vs Amazon RedShift Benchmark [Electronic resource]. URL: https://www.altinity.com/blog/2017/6/20/clickhouse-vs-redshift
[8] Distinctive Features of ClickHouse [Electronic resource]. URL: https://clickhouse.yandex/docs/en/introduction/distinctive_features/
[9] Interactive Real-Time Visualization for Streaming Data [Electronic resource]. URL: http://openproceedings.org/2017/conf/edbt/paper-276.pdf
[10] Zstandard, a real-time compression algorithm [Electronic resource]. URL: https://facebook.github.io/zstd/
[11] AWS Documentation, Amazon Simple Storage Service (S3), API Reference, Operations on Objects, GET Object [Electronic resource]. URL: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html

Claims

Patent claims
1. A method for processing numeric time series data from a number of sensors (2) by a computerized device (1) including a local storage unit (3) , the method comprising:
a) acquiring (S10) , from each of the sensors (2), a numeric time series (S) including a plurality of readings (v) and associated timestamps (TS) ,
b) converting (S20) the acquired time series (S) into data blocks (DB) having a certain binary format,
c) combining (S30) the data blocks (DB) into data segments (DS) , wherein each of the data segments (DS) includes a plurality of the data blocks (DB) ,
d) storing (S40) the data segments (DS) into an external storage (4) being external to the computerized device (1),
e) processing (S50), for each of said sensors (2), the acquired numeric time series (S) to derive an index (I) allocated to the sensor (2), the derived index (I) including segment indices (SI) and referenced block indices (BI), wherein, for each of the data blocks (DB), one block index (BI) including block information suitable for an index-based range search is created, and, for each of the data segments (DS), one segment index (SI) including segment information suitable for the index-based range search is created,
f) storing (S60) the index (I) in the local storage unit (3),
g) receiving (S70) a data request (DR) including a range criterion for a certain sensor of the sensors (2),
h) processing (S80) the index (I) allocated to the certain sensor (2) to determine a range for which the numeric time series (S) is known to match the range criterion, and
i) outputting (S90) a response (R) using the determined range in response to the data request (DR) .
2. The method of claim 1, characterized in that, in step b) , the readings (v) and the timestamps (TS) are converted into the respective data blocks (DB) in a differential manner.
3. The method of claim 1 or 2, characterized in that, after step b) and before step c) , the data blocks (DB) are compressed using a certain compression scheme.
4. The method of any of claims 1 to 3, characterized in that the block information of the block index (BI) of a certain one of the data blocks (DB) includes
a minimum value (MIN) of all the readings (v) contained in the certain data block (DB) ,
a maximum value (MAX) of all the readings (v) contained in the certain data block (DB) ,
a data size indicator (SZ) indicating a data size of the certain data block (DB) , and
an element number (NUM) indicating a number of the readings (v) contained in the certain data block (DB) .
5. The method of any of claims 1 to 4, characterized in that the segment information of a segment index (SI) of a certain one of the data segments (DS) includes
a segment identification (GID) for identifying the certain data segment (DS),
an array (A(BI)) containing all the block indices (BI) of the data blocks (DB) contained in the certain data segment (DS) ,
a start time (ST) of the certain data segment (DS),
a minimum value (MIN) of all the readings (v) contained in the data blocks (DB) of the certain data segment (DS), and
a maximum value (MAX) of all the readings (v) contained in the data blocks (DB) of the certain data segment (DS).
6. The method of any of claims 1 to 5, characterized in that the steps a) to f) are executed for each of the sensors (2).
7. The method of any of claims 1 to 6, characterized in that the data request (DR) is a data search request including a sensor identification identifying the certain sensor (2) and an amplitude range criterion specifying an amplitude range which readings (v) of the numeric time series (S) are required to match.
8. The method of claim 7, characterized by:
scanning the index (I) allocated to the certain sensor (2) and stored in the local storage unit (3) for determining the data segments (DS) satisfying the amplitude range criterion,
identifying data blocks (DB) satisfying the amplitude range criterion in the determined data segments (DS),
calculating a start time and an end time for each of the identified data blocks (DB) based on the timestamps (TS) included in the respective data block (DB), and
outputting the response (R) including a time range based on the calculated start times and the calculated end times of all the identified data blocks (DB) .
9. The method of any of claims 1 to 6, characterized in that the data request (DR) is a data extraction request for extracting data, wherein the data extraction request includes a sensor identification identifying the certain sensor (2) and a time range criterion, wherein the time range criterion includes a start time indicating a time of an oldest reading (v) to be extracted, an end time indicating a time of a newest reading (v) to be extracted and a resolution information indicating a time resolution for the data to be extracted.
10. The method of claim 9, characterized in
that in a first case, if the resolution information of the data extraction request indicates a data density lower than or equal to that indicated by the data size indicators (SZ) of the data blocks (DB) referenced by the segment indices (SI) of the index (I) allocated to the certain sensor (2), only the local storage unit (3) is accessed for answering the data extraction request,
and in a second case, if the resolution information of the data extraction request indicates a data density higher than that indicated by the data size indicators (SZ) of the data blocks (DB) referenced by the segment indices (SI) of the index (I) allocated to the certain sensor (2), the external storage (4) is accessed for answering the data extraction request.
11. The method of claim 10, characterized in that, in the first case, the minimum values (MIN) and the maximum values (MAX) of the data blocks (DB) referenced by the segment indices (SI) of the index (I) allocated to the certain sensor (2) are interpolated for providing extracted data, wherein the response (R) including the extracted data is output.
12. The method of claim 10 or 11, characterized in
that, in the second case, the index (I) stored in the local storage unit (3) is used for providing extraction metadata including a data file name of a data file stored in the external storage (4), an offset in this data file and a data length of the data to be extracted,
wherein the external storage (4) is accessed using the provided extraction metadata for providing extracted data, wherein the response (R) including the extracted data is output.
13. A computer program product comprising a program code for executing the method of any of claims 1 to 12 when run on at least one computer.
14. A computerized device (1) for processing numeric time series data from a number of sensors (2), the computerized device (1) comprising a local storage unit (3) and further comprising:
a first entity (10) for acquiring, from each of the sensors (2), a numeric time series (S) including a plurality of readings (v) and associated timestamps (TS),
a second entity (20) for converting the acquired time series (S) into data blocks (DB) having a certain binary format,
a third entity (30) for combining the data blocks (DB) into data segments (DS) , wherein each of the data segments (DS) includes a plurality of the data blocks (DB) ,
a fourth entity (40) for storing the data segments (DS) into an external storage (4) being external to the computerized device (1),
a fifth entity (50) for processing, for each of said sensors (2), the acquired numeric time series (S) to derive an index (I) allocated to the sensor (2), the derived index (I) including segment indices (SI) and referenced block indices (BI), wherein, for each of the data blocks (DB), one block index (BI) including block information suitable for an index-based range search is created, and, for each of the data segments (DS), one segment index (SI) including segment information suitable for the index-based range search is created,
a sixth entity (60) for storing the index (I) in the local storage unit (3),
a seventh entity (70) for receiving a data request (DR) including a range criterion for a certain sensor of the sensors (2),
an eighth entity (80) for processing the index (I) allocated to the certain sensor (2) to determine a range for which the numeric time series (S) is known to match the range criterion, and
a ninth entity (90) for outputting a response (R) using the determined range in response to the data request (DR) .
15. A system comprising a number of sensors installed in an industrial facility (5) and a computerized device (1) of claim 14 for processing numeric time series data from the sensors (2) .
PCT/RU2019/000055 2019-01-30 2019-01-30 Method and computerized device for processing numeric time series data WO2020159397A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/RU2019/000055 WO2020159397A1 (en) 2019-01-30 2019-01-30 Method and computerized device for processing numeric time series data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/RU2019/000055 WO2020159397A1 (en) 2019-01-30 2019-01-30 Method and computerized device for processing numeric time series data

Publications (1)

Publication Number Publication Date
WO2020159397A1 true WO2020159397A1 (en) 2020-08-06

Family

ID=65904517

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RU2019/000055 WO2020159397A1 (en) 2019-01-30 2019-01-30 Method and computerized device for processing numeric time series data

Country Status (1)

Country Link
WO (1) WO2020159397A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988916A (en) * 2021-03-05 2021-06-18 杭州天阙科技有限公司 Full and incremental synchronization method, device and storage medium for Clickhouse
CN114449011A (en) * 2021-12-21 2022-05-06 武汉中海庭数据技术有限公司 Data analysis and time sequence broadcasting method and system of multi-source fusion positioning system
WO2023024247A1 (en) * 2021-08-26 2023-03-02 苏州浪潮智能科技有限公司 Range query method, apparatus and device for tag data, and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150120749A1 (en) * 2013-10-30 2015-04-30 Microsoft Corporation Data management for connected devices

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150120749A1 (en) * 2013-10-30 2015-04-30 Microsoft Corporation Data management for connected devices

Non-Patent Citations (13)

* Cited by examiner, † Cited by third party
Title
APACHE CASSANDRA NOSQL PERFORMANCE BENCHMARKS [ELECTRONIC RESOURCE, Retrieved from the Internet <URL:https://academy.datastax.com/planet-cassandra/nosql-performance-benchmarks>
APACHE IGNITE AND APACHE CASSANDRA BENCHMARKS: THE POWER OF IN-MEMORY COMPUTING [ELECTRONIC RESOURCE, Retrieved from the Internet <URL:https://dzone.com/articles/apachereg-ignite-and-apachereg-cassandrabenchmark>
ASSFALG JOHANNES ET AL: "Periodic Pattern Analysis in Time Series Databases", 21 April 2009, IMAGE ANALYSIS AND RECOGNITION : 11TH INTERNATIONAL CONFERENCE, ICIAR 2014, VILAMOURA, PORTUGAL, OCTOBER 22-24, 2014, PROCEEDINGS, PART I; IN: LECTURE NOTES IN COMPUTER SCIENCE , ISSN 1611-3349 ; VOL. 8814; [LECTURE NOTES IN COMPUTER SCIENCE; LECT.NO, ISBN: 978-3-642-17318-9, XP047401848 *
AWS DOCUMENTATION, AMAZON SIMPLE STORAGE SERVICE (S3), API REFERENCE, OPERATIONS ON OBJECTS, GET OBJECT [ELECTRONIC RESOURCE, Retrieved from the Internet <URL:https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObj ectGET.html>
BENCHMARKING DRUID [ELECTRONIC RESOURCE, Retrieved from the Internet <URL:http://druid.io/blog/2014/03/17/benchmarkingdruid.html>
CLICKHOUSE VS AMAZON REDSHIFT BENCHMARK [ELECTRONIC RESOURCE, Retrieved from the Internet <URL:https://www.altinity.com/blog/2017/6/20/clickhouse-vs-redshift>
COLUMN STORE DATABASE BENCHMARKS: MARIADB COLUMNSTORE VS. CLICKHOUSE VS. APACHE SPARK [ELECTRONIC RESOURCE, Retrieved from the Internet <URL:https://www.percona.com/blog/2017/03/17/columnstore-databasebenchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark>
DISTINCTIVE FEATURES OF CLICKHOUSE [ELECTRONIC RESOURCE, Retrieved from the Internet <URL:https://clickhouse.yandex/docs/en/introduction/distinct ive features>
EUGENE SIOW ET AL: "TritanDB: Time-series Rapid Internet of Things Analytics", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 24 January 2018 (2018-01-24), XP081209380 *
INTERACTIVE REAL-TIME VISUALIZATION FOR STREAMING DATA [ELECTRONIC RESOURCE, Retrieved from the Internet <URL:http://openproceedings.org/2017/conf/edbt/paper-276.pdf>
TIME SERIES DATABASE BENCHMARKS [ELECTRONIC RESOURCE, Retrieved from the Internet <URL:https://blog.outlyer.com/timeseries-database-benchmarks>
YANG, FANGJIN ET AL.: "Proceedings of the 2014 ACM SIGMOD international conference on Management of data", 2014, ACM, article "Druid: A real-time analytical data store"
ZSTANDARD, A REAL-TIME COMPRESSION ALGORITHM [ELECTRONIC RESOURCE, Retrieved from the Internet <URL:https://facebook.github.io/zstd>

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988916A (en) * 2021-03-05 2021-06-18 杭州天阙科技有限公司 Full and incremental synchronization method, device and storage medium for Clickhouse
CN112988916B (en) * 2021-03-05 2023-06-16 杭州天阙科技有限公司 Full and incremental synchronization method, apparatus and storage medium for Clickhouse
WO2023024247A1 (en) * 2021-08-26 2023-03-02 苏州浪潮智能科技有限公司 Range query method, apparatus and device for tag data, and storage medium
CN114449011A (en) * 2021-12-21 2022-05-06 武汉中海庭数据技术有限公司 Data analysis and time sequence broadcasting method and system of multi-source fusion positioning system
CN114449011B (en) * 2021-12-21 2023-06-02 武汉中海庭数据技术有限公司 Data analysis and time sequence broadcasting method and system of multi-source fusion positioning system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19713228

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19713228

Country of ref document: EP

Kind code of ref document: A1