WO2020159397A1 - Method and computerized device for processing numeric time series data

Info

Publication number: WO2020159397A1
Application number: PCT/RU2019/000055
Authority: WO
Inventors: Yury Vladimirovich KUZNETCOV; Denis NASONOV; Alexander Fleksandrovich VISHERATIN; Ksenia Dmitrievna MUKHINA; Gali Ketema MBOGO
Original assignee: Siemens Aktiengesellschaft
Priority date: 2019-01-30
Filing date: 2019-01-30
Publication date: 2020-08-06

Abstract

The method comprises: acquiring a numeric time series, converting the acquired time series into data blocks, combining the data blocks into data segments, storing the data segments into an external storage, processing the acquired numeric time series to derive an index allocated to the sensor, the derived index including segment indices and referenced block indices, wherein, for each of the data blocks, one block index is created, and, for each of the data segments, one segment index is created, storing the index in the local storage unit, receiving a data request including a range criterion, processing the index allocated to the certain sensor to determine a range for which the numeric time series is known to match the range criterion in the local storage unit and/or in the external storage, and outputting a response using the determined range in response to the data request.

Description

METHOD AND COMPUTERIZED DEVICE FOR PROCESSING NUMERIC TIME SERIES DATA

The present invention relates to the field of industrial Big Data applications and, particularly, to a method and a computerized device for processing numeric time series data, in particular for performing a range search in numeric time series data acquired in an industrial facility.

Industrial facilities such as power plants are equipped with sensors supplying readings such as pressure or temperature. The readings are stored as time series for later analysis. The amount of data stored for an industrial facility is approaching the tera- or petabyte level. Such numeric time series data is typically stored in data warehouses or in a cloud.

There is a need to analyze the numeric time series data for fault diagnosis, operation monitoring, predictive maintenance and similar purposes. An expert user may need to identify a time interval during which a given sensor supplied readings within a given amplitude range.

A linear scan through all data to identify the applicable time intervals is too costly in terms of data traffic and CPU load, and takes too long to be of practical use. Known fast search mechanisms such as Google® search are adapted to search in alphanumeric data and do not work well with numeric time series data. A method and computerized device capable of performing an amplitude range search within terabytes of numeric time series data within a sub-second response time is not known.

PCT/RU2018/000373 discloses a method and device for performing an index-based amplitude range search within numeric time series data.

Moreover, conventional methods and devices which may be used for processing numeric time series data are described in references [1] to [11].

It is one object of the present invention to enhance the processing of numeric time series data.

According to a first aspect, a method for processing numeric time series data from a number of sensors by a computerized device including a local storage unit is proposed. The method comprises:

a) acquiring, from each of the sensors, a numeric time series including a plurality of readings and associated timestamps,

b) converting the acquired time series into data blocks having a certain binary format,

c) combining the data blocks into data segments, wherein each of the data segments includes a plurality of the data blocks,

d) storing the data segments into an external storage being external to the computerized device,

e) processing, for each of said sensors, the acquired numeric time series to derive an index allocated to the sensor, the derived index including segment indices and referenced block indices, wherein, for each of the data blocks, one block index including block information suitable for an index-based range search is created, and, for each of the data segments, one segment index including segment information suitable for the index-based range search is created,

f) storing the index in the local storage unit,

g) receiving a data request including a range criterion for a certain sensor of the sensors,

h) processing the index allocated to the certain sensor to determine a range for which the numeric time series is known to match the range criterion in the local storage unit and/or in the external storage, and

i) outputting a response using the determined range in response to the data request.

It is noted that in step h), it is the index, and not the numeric time series, that is processed to determine the range that corresponds to the data request. The above steps g), h) and i) may therefore also be collectively referred to as "performing an index-based range search". Likewise, steps a) to f) may be referred to as "building an index adapted for an index-based range search".

By performing an index-based range search by processing the index rather than performing a direct or linear range search by processing the numeric time series, it may be favorably possible to significantly reduce the processing time for determining the range in response to the data request.

Further, if the numeric time series is stored in the external storage, like a data warehouse or a cloud, since the index is stored in the local storage unit, costly data traffic to the data warehouse or to the cloud may be avoided. Therefore, a cost for responding to the data request may be advantageously reduced.

Specifically, the method may be a computer-implemented method. In particular, the method may be carried out using the computerized device which may include said local storage unit and one or more processing units, such as one or more CPUs. The local storage unit may include a hard disk, solid state disk, RAID storage, and the like.

The index may be a numeric index. The numeric index may be adapted to provide a response to a search request including an amplitude range criterion and/or a time range criterion. Advantageously, the index may comprise information necessary and sufficient to determine the range for which the numeric time series is known to match the range criterion with a predetermined precision. Therefore, advantageously, step h) may be carried out with the predetermined precision without having to process the numeric time series.

The index may favorably require less storage space than the numeric time series itself. Thereby, processing the index in step h) may require less time than processing the numeric time series. The index may thus be regarded as a compressed representation of the numeric time series. The compressed index may be advantageously stored in the local storage unit rather than in the external storage, like a cloud or data warehouse .

A reading may be a value originally provided by one of the sensors installed in an industrial facility, such as a temperature, a pressure, a power output or load, etc. For example, the sensor is a gas turbine sensor and the time series includes pairs of sensed pressure values and associated timestamps. The timestamp may have millisecond precision and may be represented by a 64-bit value, e.g. int64. The value may be a real number, e.g. representing the physical pressure parameter, represented by a 32-bit floating-point number, e.g. float32. Also int8, int16, int32, int64, float32 and float64 may be used.
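As a rough illustration of the reading layout described above, the following Python sketch packs one reading as a 64-bit millisecond timestamp followed by a 32-bit floating-point value. The little-endian byte order, the field order and the function names are assumptions made for the example; the description does not specify them.

```python
import struct
from typing import Tuple

# Assumed layout: little-endian int64 timestamp (ms) followed by a float32 value.
READING = struct.Struct("<qf")

def pack_reading(timestamp_ms: int, value: float) -> bytes:
    """Serialize one (timestamp, value) pair into 12 bytes."""
    return READING.pack(timestamp_ms, value)

def unpack_reading(raw: bytes) -> Tuple[int, float]:
    """Inverse of pack_reading."""
    return READING.unpack(raw)

raw = pack_reading(1_548_806_400_000, 101.5)
```

With this assumed layout, one reading occupies 12 bytes before any differential encoding or compression is applied.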

The numeric time series may be a non-equidistant numeric time series or an equidistant numeric time series.

In step a) , the numeric time series may be acquired by directly receiving the plurality of readings from the sensors installed in an industrial facility and storing, in the time series, a respective time of reception of each reading as the timestamp associated with the reading (value) .

In steps b) and c), the acquired time series is split into data blocks and data segments. In particular, the acquired time series is divided into said data blocks by time. Then, the data blocks are grouped into said data segments which may be limited by time duration, and thus have a limited number of data blocks.
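The splitting by time may be sketched as follows in Python. The block and segment durations, the choice to start each block at its first reading rather than on a fixed time grid, and the function names are assumptions made for illustration only.

```python
from typing import List, Tuple

Reading = Tuple[int, float]  # (timestamp in ms, value)

def split_into_blocks(readings: List[Reading], block_ms: int) -> List[List[Reading]]:
    """Divide time-ordered readings into blocks spanning less than block_ms each."""
    blocks: List[List[Reading]] = []
    block_start = None
    for ts, value in readings:
        if block_start is None or ts - block_start >= block_ms:
            blocks.append([])          # open a new block
            block_start = ts
        blocks[-1].append((ts, value))
    return blocks

def group_into_segments(blocks: List[List[Reading]], segment_ms: int) -> List[List[List[Reading]]]:
    """Group consecutive blocks into segments limited by time duration."""
    segments: List[List[List[Reading]]] = []
    segment_start = None
    for block in blocks:
        first_ts = block[0][0]
        if segment_start is None or first_ts - segment_start >= segment_ms:
            segments.append([])        # open a new segment
            segment_start = first_ts
        segments[-1].append(block)
    return segments

readings = [(t * 1000, float(t)) for t in range(10)]  # one reading per second
blocks = split_into_blocks(readings, block_ms=2000)
segments = group_into_segments(blocks, segment_ms=4000)
```

Because segments are bounded by duration, the number of blocks per segment is bounded as well, which keeps each segment index small.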

In particular, the steps a) to d) are executed repeatedly for different portions of the numeric time series; and the processing step e) to derive the index includes creating the index upon a first execution of the steps a) to d) and includes updating the index upon each subsequent execution of the steps a) to d).

In particular, in step d), the data segments including said data blocks are stored into the external storage which may be a cloud or cloud system, for example. Storing said time series data in the form of said data segments in a cloud storage service provides a cost-efficient solution. Moreover, high availability and scalability of storing said data segments are achieved by the respective qualities of the backing cloud storage service.

Thereby, the time series may be acquired portion-by-portion, and an amount of local storage space at the computerized device required for providing the index for each of the sensors may be reduced. Likewise, easy and efficient updates may be possible when further portions/ further readings are added to the numeric time series over time. In particular, such updates may be advantageously performed without repeating acquisition of the already acquired portions, thereby reducing an amount of data to be transferred for each update.
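The portion-by-portion creation and updating of the index may be pictured with the following minimal Python sketch. The dictionary-based index, the summary fields and the name `update_index` are hypothetical simplifications; the index actually described consists of segment indices and block indices.

```python
from typing import Dict, List, Tuple

Reading = Tuple[int, float]  # (timestamp in ms, value)

def update_index(index: Dict[str, List[dict]], sensor_id: str, portion: List[Reading]) -> None:
    """Summarize a newly acquired portion and append the summary to the sensor's index.

    The first call for a sensor creates its index entry; later calls only append,
    so previously acquired portions do not need to be acquired again.
    """
    values = [v for _, v in portion]
    summary = {"start_ms": portion[0][0], "min": min(values), "max": max(values)}
    index.setdefault(sensor_id, []).append(summary)

index: Dict[str, List[dict]] = {}
update_index(index, "sensor-1", [(0, 1.0), (1000, 3.0)])     # creates the index
update_index(index, "sensor-1", [(2000, 2.0), (3000, 5.0)])  # updates the index
```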

In step f) , the index for each sensor is advantageously stored in the local storage unit, such as a hard disk of a local workstation, even in a case where the actual numeric time series comprises Big Data that may only be stored in the external storage, like a data warehouse or a cloud.

In step g) , the data request may be received through an input unit, such as a keyboard, connected to a computerized device carrying out the method. Alternatively, the data request may be received via network from another computerized device by a service endpoint. The data request may include a logical expression formed by one or more range criteria and, particularly, one or more logical operators. Thus, the method may advantageously support complex and sophisticated search requests .

A logical operator may be a Boolean operator such as logical "AND", logical "OR" or logical "NOT". For example, complex search requests correlating different readings from different numeric time series may be made possible, such as "temperature greater than 500 degrees Celsius AND power lower than 300 Megawatts".
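The description does not state how a logical expression is evaluated. One conceivable approach, shown here purely as an assumption, is to run one range search per criterion and to combine the resulting matching time ranges; for a logical "AND" this amounts to intersecting two sorted lists of non-overlapping time ranges. The function name `intersect_ranges` is hypothetical.

```python
from typing import List, Tuple

TimeRange = Tuple[int, int]  # closed range (start_ms, end_ms)

def intersect_ranges(a: List[TimeRange], b: List[TimeRange]) -> List[TimeRange]:
    """Intersect two sorted lists of non-overlapping closed time ranges."""
    result: List[TimeRange] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start <= end:
            result.append((start, end))
        # Advance the list whose current range ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result

temperature_matches = [(0, 10), (20, 30)]  # e.g. "temperature greater than 500 degrees Celsius"
power_matches = [(5, 25)]                  # e.g. "power lower than 300 Megawatts"
both = intersect_ranges(temperature_matches, power_matches)
```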

In step h) , processing the index may refer to accessing the index based on the range criterion. For example, the range criterion, or an upper or lower boundary comprised therein, may be used in a keyword-like manner to gain fast access to portions of the index containing information about a matching range .

In step i), the determined range may be output by displaying a human readable representation of the determined range on a display device. The determined range may also be output by transmitting a digital representation of the determined range via a wired or wireless network.

It is noted that determining a matching range and outputting the matching range may also comprise determining a plurality of matching ranges and outputting the plurality of matching ranges .

In particular, the index or indices stored in the local storage unit provide fast random access, which allows searching and extracting the data even without accessing the external storage. In particular, a fast value range search is provided over a relatively small number of data segments and data blocks in the index locally stored at the local storage unit. Further, fast time series data extraction is provided by using information, in particular the metadata, about data segments and data blocks to read only specific parts of the data files instead of the whole files.

According to an embodiment, in step b), the readings and the timestamps are converted into the respective data blocks in a differential manner.

In particular, in step b), differences between a current reading and a preceding reading and differences between a current timestamp and a preceding timestamp are stored into the respective data block. This advantageously saves memory space in the computerized device.
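The differential conversion can be sketched as a simple delta encoding. In this Python sketch the first pair of a block is stored unchanged and every later pair is stored as the difference to its predecessor; that convention, the integer-valued readings (chosen so the round trip is exact) and the function names are assumptions.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (timestamp in ms, integer reading)

def delta_encode(readings: List[Pair]) -> List[Pair]:
    """Keep the first pair as-is, then store differences to the preceding pair."""
    if not readings:
        return []
    encoded = [readings[0]]
    for (prev_ts, prev_v), (ts, v) in zip(readings, readings[1:]):
        encoded.append((ts - prev_ts, v - prev_v))
    return encoded

def delta_decode(encoded: List[Pair]) -> List[Pair]:
    """Rebuild the original pairs by accumulating the stored differences."""
    decoded: List[Pair] = []
    ts = v = 0
    for k, (d_ts, d_v) in enumerate(encoded):
        if k == 0:
            ts, v = d_ts, d_v
        else:
            ts, v = ts + d_ts, v + d_v
        decoded.append((ts, v))
    return decoded

readings = [(1000, 500), (1010, 502), (1020, 501)]
encoded = delta_encode(readings)
```

The differences are typically much smaller numbers than the absolute values, which is what makes them cheaper to store and more compressible.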

According to a further embodiment, after step b) and before step c), the data blocks are compressed using a certain compression scheme.

In particular, the compression scheme is based on or corresponds to the Zstandard (see reference [10]).

According to a further embodiment, the block information of the block index of a certain one of the data blocks includes:

a minimum value of all the readings contained in the certain data block,

a maximum value of all the readings contained in the certain data block,

a data size indicator indicating a data size of the certain data block, and

an element number indicating a number of the readings contained in the certain data block.
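The block information listed above maps naturally onto a small record type. The following Python sketch is one possible representation; the class and field names are hypothetical, and the data size is taken here as the byte length of the already encoded block.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class BlockIndex:
    min_value: float    # minimum of all readings in the block
    max_value: float    # maximum of all readings in the block
    size_bytes: int     # data size indicator of the block
    num_readings: int   # number of readings in the block

def build_block_index(values: Sequence[float], encoded_block: bytes) -> BlockIndex:
    """Derive the block index from the readings and the encoded block bytes."""
    return BlockIndex(min(values), max(values), len(encoded_block), len(values))

block_index = build_block_index([3.0, 7.5, 5.0], b"\x00" * 36)
```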

According to a further embodiment, the segment information of a segment index of a certain one of the data segments includes:

a segment identification for identifying the certain data segment,

an array containing all the block indices of the data blocks contained in the certain data segment,

a start time of the certain data segment,

a minimum value of all the readings contained in the data blocks of the certain data segment, and

a maximum value of all the readings contained in the data blocks of the certain data segment.

In particular, the start time of the certain data segment corresponds to the oldest one of the timestamps contained in the data blocks of the certain data segment.
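A segment index with the fields listed above may be sketched in Python as follows. The class, field and function names are hypothetical; the segment minimum and maximum are derived from the block indices, and the start time is the oldest timestamp in the segment.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass(frozen=True)
class BlockIndex:
    min_value: float
    max_value: float
    size_bytes: int
    num_readings: int

@dataclass
class SegmentIndex:
    segment_id: str                  # segment identification
    block_indices: List[BlockIndex]  # array of all block indices in the segment
    start_time_ms: int               # oldest timestamp in the segment
    min_value: float                 # minimum over all blocks
    max_value: float                 # maximum over all blocks

def build_segment_index(segment_id: str,
                        block_indices: Sequence[BlockIndex],
                        timestamps_ms: Sequence[int]) -> SegmentIndex:
    """Aggregate the block indices of one segment into its segment index."""
    return SegmentIndex(
        segment_id=segment_id,
        block_indices=list(block_indices),
        start_time_ms=min(timestamps_ms),
        min_value=min(b.min_value for b in block_indices),
        max_value=max(b.max_value for b in block_indices),
    )

segment_index = build_segment_index(
    "seg-0001",
    [BlockIndex(1.0, 4.0, 24, 2), BlockIndex(3.0, 9.0, 24, 2)],
    [1000, 2000, 3000, 4000],
)
```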

According to a further embodiment, the steps a) to f) are executed for each of the sensors.

According to a further embodiment, the data request is a data search request including a sensor identification identifying the certain sensor and an amplitude range criterion specifying an amplitude range within which readings of the numeric time series are required to fall in order to match the amplitude range criterion. An amplitude range criterion may be a criterion that specifies a range (value range, amplitude range) within which readings are required to fall for the time series to match the amplitude range criterion. Examples of the amplitude range criterion are criteria such as "between 3000 and 4000 rotations per minute", "more than 300 megawatts" or "less than 300 degrees Celsius". In other words, the amplitude range criterion may specify at least one of a lower and an upper boundary for readings that match the criterion.

The data search request may constitute a digital representation of the amplitude range criterion.

The fact that "the numeric time series is known to match the amplitude range criterion" for the time range determined in step h) (which is also referred to as "matching time range" hereinbelow) may refer to the fact that, based on the index, it is known that the numeric time series comprises at least one reading (also referred to as "matching reading" hereinbelow) that is within the amplitude range specified by the amplitude range criterion and is associated with a time that is within the matching time range.

In particular, the time range determined in step h) is a time range that includes a time for which the numeric time series is known to include at least one reading within an amplitude range specified by the amplitude range criterion, and excludes a time for which the numeric time series is known not to include any reading within the specified amplitude range.

According to a further embodiment, the method includes:

scanning the index allocated to the certain sensor and stored in the local storage unit for determining the data segments satisfying the amplitude range criterion,

identifying data blocks satisfying the amplitude range criterion in the determined data segments,

calculating a start time and an end time for each of the identified data blocks based on the timestamps included in the respective data block, and

outputting the response including a time range based on the calculated start times and the calculated end times of all the identified data blocks.

The determined data segments are those data segments that are determined to satisfy the amplitude range criterion. Moreover, the identified data blocks are those data blocks of the determined data segments that are identified to satisfy the amplitude range criterion.
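The two-level scan described above can be sketched in Python as follows. Whole segments are skipped when their minimum/maximum interval does not overlap the requested amplitude range, and only the blocks of the remaining segments are tested. The type names are hypothetical, and holding each block's start and end time directly in the sketch is an assumption made to keep the example short; the description derives these times from the timestamps included in the block.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Block:
    min_value: float
    max_value: float
    start_ms: int  # assumed to be available for the sketch
    end_ms: int    # assumed to be available for the sketch

@dataclass
class Segment:
    min_value: float
    max_value: float
    blocks: List[Block]

def overlaps(lo: float, hi: float, min_v: float, max_v: float) -> bool:
    """True if [min_v, max_v] intersects the requested range [lo, hi]."""
    return min_v <= hi and max_v >= lo

def range_search(segments: List[Segment], lo: float, hi: float) -> List[Tuple[int, int]]:
    """Return (start, end) time ranges of blocks whose value interval overlaps [lo, hi]."""
    hits: List[Tuple[int, int]] = []
    for segment in segments:
        if not overlaps(lo, hi, segment.min_value, segment.max_value):
            continue  # the whole segment is skipped without looking at its blocks
        for block in segment.blocks:
            if overlaps(lo, hi, block.min_value, block.max_value):
                hits.append((block.start_ms, block.end_ms))
    return hits

segments = [
    Segment(0.0, 10.0, [Block(0.0, 4.0, 0, 999), Block(5.0, 10.0, 1000, 1999)]),
    Segment(20.0, 30.0, [Block(20.0, 30.0, 2000, 2999)]),
]
hits = range_search(segments, 8.0, 25.0)
```

A one-sided criterion such as "more than 300 megawatts" can be expressed by passing `float("inf")` as the upper bound. Note that interval overlap alone is only a necessary condition: a block is certain to contain a matching reading when its minimum or maximum itself lies inside the requested range.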

According to a further embodiment, the data request is a data extraction request for extracting data, wherein the data extraction request includes a sensor identification identifying the certain sensor and a time range criterion, wherein the time range criterion includes a start time indicating a time of an oldest reading to be extracted, an end time indicating a time of a newest reading to be extracted and a resolution information indicating a time resolution for the data to be extracted.

According to a further embodiment, in a first case, if the resolution information of the data extraction request indicates a data density lower than or equal to that indicated by the data size indicators of the data blocks referenced by the segment indices of the index allocated to the certain sensor, only the local storage unit is accessed for answering the data extraction request, and in a second case, if the resolution information of the data extraction request indicates a data density higher than that indicated by the data size indicators of the data blocks referenced by the segment indices of the index allocated to the certain sensor, the external storage is accessed for answering the data extraction request.

In particular, the data size indicators of the data blocks are referenced by the arrays of the data segments which are themselves referenced by the segment indices of the index of the certain sensor.

According to a further embodiment, in the first case, the minimum values and the maximum values of the data blocks referenced by the segment indices of the index allocated to the certain sensor are interpolated for providing extracted data, wherein the response including the extracted data is output.
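The kind of interpolation is not specified. As one possible reading, the following Python sketch linearly interpolates the per-block minimum and maximum values at the requested timestamps, yielding a lower and an upper envelope. The use of linear interpolation, the per-block reference times and the function name are all assumptions.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

def interpolate_envelope(block_times: Sequence[int],
                         block_mins: Sequence[float],
                         block_maxs: Sequence[float],
                         query_times: Sequence[int]) -> List[Tuple[int, float, float]]:
    """Linearly interpolate per-block MIN and MAX values at each query time."""
    def lerp(values: Sequence[float], t: int) -> float:
        if t <= block_times[0]:
            return values[0]
        if t >= block_times[-1]:
            return values[-1]
        k = bisect_right(block_times, t) - 1   # block at or before t
        t0, t1 = block_times[k], block_times[k + 1]
        weight = (t - t0) / (t1 - t0)
        return values[k] + weight * (values[k + 1] - values[k])

    return [(t, lerp(block_mins, t), lerp(block_maxs, t)) for t in query_times]

envelope = interpolate_envelope(
    block_times=[0, 1000, 2000],
    block_mins=[0.0, 2.0, 4.0],
    block_maxs=[10.0, 12.0, 8.0],
    query_times=[500, 1000, 1500],
)
```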

According to a further embodiment, in the second case, the index stored in the local storage unit is used for providing extraction metadata including a data file name of a data file stored in the external storage, an offset in this data file and a data length of the data to be extracted, wherein the external storage is accessed using the provided extraction metadata for providing extracted data, wherein the response including the extracted data is output.
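How the extraction metadata might be derived from the locally stored index is sketched below in Python. The sketch assumes that each data segment is stored as one file named after its segment identification and that the data blocks are laid out contiguously in that file, so that the data size indicators yield offsets by cumulative summation. The file naming, the layout and the function name are hypothetical.

```python
from itertools import accumulate
from typing import Dict, Sequence, Union

def extraction_metadata(segment_id: str,
                        block_sizes: Sequence[int],
                        first_block: int,
                        last_block: int) -> Dict[str, Union[str, int]]:
    """Compute file name, byte offset and length covering blocks first_block..last_block."""
    # offsets[k] is the byte position at which block k starts in the segment file.
    offsets = [0] + list(accumulate(block_sizes))
    return {
        "file_name": f"{segment_id}.bin",  # assumed naming scheme
        "offset": offsets[first_block],
        "length": offsets[last_block + 1] - offsets[first_block],
    }

metadata = extraction_metadata("seg-0001", [100, 150, 120, 80], first_block=1, last_block=2)
```

With such metadata, only the required byte range needs to be read from the external storage instead of the whole data file.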

According to a further embodiment, the method further includes discarding the acquired time series at the computerized device.

Since the acquired time series may no longer be needed at the computerized device, it may be favorably discarded after the building of the index in steps a) to f) has been completed, so as to reduce an amount of storage space required at the computerized device.

Any embodiment of the first aspect may be combined with any embodiment of the first aspect to obtain another embodiment of the first aspect.

According to a second aspect, a computer program product comprises a program code for executing the above-described method for performing a range search in numeric time series data, when run on at least one computer.

A computer program product, such as a computer program means, may be embodied as a memory card, USB stick, CD-ROM, DVD or as a file which may be downloaded from a server in a network. For example, such a file may be provided by transferring the file comprising the computer program product from a wireless communication network.

According to a third aspect, a computerized device for processing numeric time series data from a number of sensors is proposed. The computerized device has a local storage unit and further:

a first entity for acquiring, from each of the sensors, a numeric time series including a plurality of readings and associated timestamps,

a second entity for converting the acquired time series into data blocks having a certain binary format,

a third entity for combining the data blocks into data segments, wherein each of the data segments includes a plurality of the data blocks,

a fourth entity for storing the data segments into an external storage being external to the computerized device,

a fifth entity for processing, for each of said sensors, the acquired numeric time series to derive an index allocated to the sensor, the derived index including segment indices and referenced block indices, wherein, for each of the data blocks, one block index including block information suitable for an index-based range search is created, and, for each of the data segments, one segment index including segment information suitable for the index-based range search is created,

a sixth entity for storing the index in the local storage unit,

a seventh entity for receiving a data request including a range criterion for a certain sensor of the sensors,

an eighth entity for processing the index allocated to the certain sensor to determine a range for which the numeric time series is known to match the range criterion (in the local storage unit and/or in the external storage), and

a ninth entity for outputting a response using the determined range in response to the data request.

The embodiments and features described with reference to the method of the present invention apply mutatis mutandis to the computerized device of the present invention. Specifically, the computerized device of the present invention may be implemented to carry out the method of the present invention. The respective entity, e.g. the first to ninth entity, may be implemented in hardware and/or in software. If said entity is implemented in hardware, it may be embodied as a device, e.g. as a computer or as a processor or as a part of a system, e.g. a computer system. If said entity is implemented in software, it may be embodied as a computer program product, as a function, as a routine, as a program code or as an executable object.

According to a fourth aspect, a system is suggested comprising a number of sensors installed in an industrial facility and a computerized device of the third aspect for processing numeric time series data from the sensors.

The system may provide a data search function and a data extraction function. In this regard, the computerized device of said system may include a search engine for providing the data search function and an extraction engine for providing the data extraction function. Further, the system may include a service endpoint for communicating with a number of clients or requesting entities transmitting a data request to the system and awaiting a response to said data request.

For example, the search engine may be adapted to scan the index stored in the local storage unit of the computerized device for determining the data segments to be extracted. Moreover, the search engine may calculate the start time and the end time for each of the identified data blocks contained in the identified data segments. This calculation is based on the timestamps included in the respective data blocks. Then, the search engine may create a response including a time range based on the calculated start times and the calculated end times of all the identified data blocks. The search engine may forward the created response to the service endpoint, which transmits said response to the requesting client.

In the case of a data extraction request, the extraction engine may selectively access the local storage unit or the local storage unit and the external storage. Further details are described below. Moreover, the system may use a master node including said local storage unit storing the indices and a number of slave nodes for answering and processing the data requests. In case of extraction from the external storage, the master node or one of the slave nodes may access the external storage.

For example, if the requested resolution of the received data extraction request indicates a data density higher than that indicated by the data size indicators of the data blocks referenced by the segment indices of the index allocated to the requested sensor, the external storage may be accessed by the extraction engine for answering the data extraction request. In this case, the index stored in the local storage unit of the master node may be used for providing extraction metadata. Said extraction metadata may be directed to a slave node which is adapted to access the external storage using the extraction metadata. The extraction metadata may include the name of a data file stored in the external storage, an offset in this data file and a data length of the data to be extracted.

Further possible implementations or alternative solutions of the invention also encompass combinations - that are not explicitly mentioned herein - of features described above or below with regard to the embodiments. The person skilled in the art may also add individual or isolated aspects and features to the most basic form of the invention.

Further embodiments, features and advantages of the present invention will become apparent from the subsequent description and dependent claims, taken in conjunction with the accompanying drawings, in which:

Fig. 1 shows a flow chart illustrating steps of a method according to an exemplary embodiment;

Fig. 2 shows a block diagram of an embodiment of a computerized device according to the exemplary embodiment;

Fig. 3 shows a diagram illustrating one example of a numeric time series;

Fig. 4 illustrates an example for data blocks converted from the acquired time series illustrated in Fig. 3;

Fig. 5 illustrates an example for data segments combined from the data blocks illustrated in Fig. 4;

Fig. 6 shows a schematic example of block information of a block index for a data block illustrated in Fig. 4;

Fig. 7 shows a schematic example of segment information of a segment index for a data segment illustrated in Fig. 5;

Fig. 8 shows a schematic example of an index for a certain sensor including segment indices of Fig. 7 and referenced block indices of Fig. 6; and

Fig. 9 shows a block diagram of an embodiment of a system providing both functions, i.e. a data search function and a data extraction function.

In the Figures, like reference numerals designate like or functionally equivalent elements, unless otherwise indicated.

Fig. 1 shows a flow chart illustrating steps of a method, and Fig. 2 shows a block diagram of a computerized device 1 according to an exemplary embodiment. Reference will now be made to Fig. 1 and Fig. 2.

The computerized device 1 of Fig. 2 is connected to an industrial facility 5 in which a number of sensors 2 is arranged. Without loss of generality, Fig. 2 shows one sensor 2. Further, the computerized device 1 of Fig. 2 is coupled to an external storage 4, for example to a cloud or to a cloud service.

The computerized device 1 includes a local storage unit 3 for storing data, a first entity 10, a second entity 20, a third entity 30, a fourth entity 40, a fifth entity 50, a sixth entity 60, a seventh entity 70, an eighth entity 80 and a ninth entity 90.

Said entities 10 - 90 are adapted to execute the method steps S10 - S90 of Fig. 1.

In step S10, a numeric time series S including a plurality of readings v and associated timestamps TS is acquired from each of the sensors 2. This step S10 may be executed by said first entity 10 of Fig. 2. In this regard, Fig. 3 shows a diagram illustrating one example of a numeric time series S. Here, the x-axis of Fig. 3 shows the time t with the time intervals Δt. The y-axis of Fig. 3 shows an amplitude, in the example of Fig. 3 a temperature in the industrial facility 5.

In step S20, the acquired time series S is converted into data blocks DB having a certain binary format. Here, Fig. 4 shows an example for data blocks DB converted from the acquired time series S illustrated in Fig. 3. Each of the N data blocks DB 1 - DB N has a plurality of readings v and associated timestamps TS. In particular, in step S20, the readings v and timestamps TS are converted into the respective data block DB in a differential manner. Furthermore, the data blocks DB may be compressed using a certain compression scheme.

In step S30, the data blocks DB are combined into data segments DS. Here, Fig. 5 illustrates an example for data segments DS combined from the data blocks DB illustrated in Fig. 4. As shown in Fig. 5, each of the data segments DS includes a plurality of said data blocks DB.

In step S40, the data segments DS are stored into the external storage 4. With reference to Fig. 2, the fourth entity 40 may be adapted to provide said storing of the data segments DS into said external storage 4.

In step S50, the acquired numeric time series S is processed, for each of said sensors 2, to derive an index I allocated to the sensor 2. The derived index I includes segment indices SI and block indices BI referenced by said segment indices SI. In this regard, for each of the data blocks DB, one block index BI including block information suitable for an index-based range search is created, and, for each of the data segments DS, one segment index SI including segment information suitable for the index-based range search is created.

Details for creating said index I, said block indices BI and said segment indices SI are described with reference to Figs. 6 - 8 in the following. In this regard, Fig. 6 shows a schematic example of block information of a block index BI for a data block DB as illustrated in Fig. 4. Fig. 7 shows a schematic example of segment information of a segment index SI for a data segment DS as illustrated in Fig. 5 and, furthermore, Fig. 8 shows a schematic example of an index I for a certain sensor 2 including segment indices SI of Fig. 7 and referenced block indices BI of Fig. 6.

Moreover, with reference to Fig. 6, the block information of the block index BI for a certain data block DB includes a minimum value MIN of all the readings v contained in the certain data block DB, a maximum value MAX of all the readings v contained in the certain data block DB, a data size indicator SZ indicating a data size of the certain data block DB and an element number NUM indicating a number of the readings v contained in the certain data block DB.

Further, with reference to Fig. 7, the segment information of a segment index SI of a certain data segment DS includes a segment identification GID for identifying the certain data segment DS, an array A(BI) containing all the block indices BI of the data blocks DB contained in the certain data segment DS, a start time ST of the certain data segment DS, a minimum value MIN of all the readings v contained in the data blocks DB of the certain data segment DS and a maximum value MAX of all the readings v contained in the data blocks DB of the certain data segment DS.

In step S60, the created index I for the certain sensor is stored in the local storage unit 3 of the computerized device 1. This step S60 may be executed by the sixth entity 60 of Fig. 2.

In step S70, a data request DR, for example a data search request or a data extraction request, including a range criterion for a certain sensor of the sensors 2 is received. With reference to Fig. 2, the seventh entity 70 may receive said data request DR and forward it to the eighth entity 80.

In step S80, the eighth entity 80 may process the index I allocated to the certain sensor 2 by accessing said local storage unit 3 to determine a range for which the numeric time series S is known to match the range criterion. As discussed in detail below, said processing of step S80 may include first accessing the local storage unit 3 and then further accessing said external storage 4.

In step S90, a response R is output in response to the data request DR. Said response R is created using said determined range .

As indicated above, said data request DR may be a data search request or a data extraction request. In this regard, Fig. 9 shows a system 100 providing both functions, i.e. data search and data extraction. The system 100 of Fig. 9 includes a computerized device 1. Said computerized device 1 provides all the functionality as described with reference to Fig. 2. In particular, said computerized device 1 of Fig. 9 is adapted to create an index I for each of the sensors 2 and to store said created indices I in its local storage unit 3. This functionality is illustrated by circle 1 in Fig. 9.

Moreover, the computerized device 1 comprises a search engine 7 and an extraction engine 8.

Moreover, the system 100 includes a service endpoint 6 which is connectable to a client 9, for example a laptop or a PC. In the example of Fig. 9, the client 9 sends a data request DR, i.e. a data search request or a data extraction request, to the system 100 and awaits a response R to the data request DR. The functionality for communicating with the client 9 is provided by said service endpoint 6 of the system 100 of Fig. 9.

For the example in which the data request DR is a data search request, said data search request includes a sensor identification identifying the certain sensor 2 for which the data has to be searched and an amplitude range criterion specifying an amplitude range within which readings v of the numeric time series S are required to lie in order to match the amplitude range criterion.

For said example in which the data request DR is a data search request, the service endpoint 6 forwards the received data search request to the search engine 7. The search engine 7 is adapted to scan the index I allocated to the certain sensor 2 and stored in the local storage unit 3 for determining the data segments DS satisfying the amplitude range criterion. Further, data blocks DB which satisfy the amplitude range criterion are identified in the determined data segments DS. Moreover, the search engine 7 calculates a start time and an end time for each of the identified data blocks DB based on the timestamps TS included in the respective data block DB (see Fig. 4).

Moreover, the search engine 7 creates a response R including a time range based on the calculated start times and the calculated end times of all the identified data blocks DB. The search engine 7 forwards the created response R to the service endpoint 6 which transmits said response R to the requesting client 9.
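The two-level scan performed by the search engine 7 may be sketched as follows. This is one possible reading, not a definitive implementation: a data segment is assumed to be pruned when its [MIN, MAX] interval does not overlap the requested amplitude range, and a data block is assumed to satisfy the criterion when its [MIN, MAX] interval lies entirely inside the requested range, so that all of its readings are known to match. The type names and the precomputed start and end times per block are assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    vmin: float   # MIN of the block index
    vmax: float   # MAX of the block index
    start: float  # start time of the block (assumed already calculated)
    end: float    # end time of the block (assumed already calculated)


class Segment(NamedTuple):
    vmin: float   # MIN of the segment index
    vmax: float   # MAX of the segment index
    blocks: List[Block]


def search(index: List[Segment], lo: float, hi: float) -> List[Tuple[float, float]]:
    """Return (start, end) time ranges of blocks known to lie in [lo, hi]."""
    hits = []
    for seg in index:
        # Prune segments whose value interval cannot intersect [lo, hi].
        if seg.vmax < lo or seg.vmin > hi:
            continue
        for blk in seg.blocks:
            # Keep blocks whose readings all lie inside [lo, hi].
            if blk.vmin >= lo and blk.vmax <= hi:
                hits.append((blk.start, blk.end))
    return hits


index = [
    Segment(10, 30, [Block(10, 18, 0, 60), Block(21, 24, 60, 120), Block(25, 30, 120, 180)]),
    Segment(40, 50, [Block(40, 50, 180, 240)]),
]
result = search(index, lo=20, hi=26)
```

Only the index held in the local storage unit 3 is touched in this sketch; no data block has to be read from the external storage 4 to answer the search.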

Furthermore, if the data request DR is a data extraction request, the receiving service endpoint 6 forwards the received data extraction request DR to the extraction engine 8. The data extraction request DR includes a sensor identification identifying the certain sensor 2 and a time range criterion. The time range criterion includes a start time indicating a time of an oldest reading v to be extracted, an end time indicating a time of a newest reading v to be extracted and a resolution information indicating a time resolution for the data to be extracted.

In a first case, if the resolution information of the data extraction request DR indicates a data density lower than or equal to that indicated by the data size indicators SZ of the data blocks DB referenced by the segment indices SI of the index I allocated to the certain sensor 2, only the local storage unit 3 is accessed by the extraction engine 8 for answering the data extraction request DR. This is shown by circle 2 in Fig. 9.

In the first case, the extraction engine 8 interpolates the minimum values MIN and the maximum values MAX of the data blocks DB referenced by the segment indices SI of the index I allocated to said certain sensor for providing extracted data. The extracted data are part of the response R which is transmitted to the requesting client 9.
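The interpolation of the first case may be sketched as follows. The text does not specify the interpolation scheme; the sketch assumes that each data block DB is represented by a single time point (for example the middle of its block time interval) and that MIN and MAX are interpolated linearly between neighbouring blocks. The function names and the (time, MIN, MAX) tuple layout are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def lerp_at(ts: List[float], ys: List[float], t: float) -> float:
    """Piecewise-linear interpolation of ys over ts, clamped at both ends."""
    if t <= ts[0]:
        return ys[0]
    if t >= ts[-1]:
        return ys[-1]
    i = bisect_right(ts, t)          # ts[i - 1] <= t < ts[i]
    t0, t1 = ts[i - 1], ts[i]
    w = (t - t0) / (t1 - t0)
    return ys[i - 1] + w * (ys[i] - ys[i - 1])


def envelope(blocks: List[Tuple[float, float, float]],
             start: float, end: float, step: float) -> List[Tuple[float, float, float]]:
    """blocks: (time, MIN, MAX) per data block, sorted by strictly increasing time.

    Returns (t, interpolated MIN, interpolated MAX) at the requested resolution.
    """
    ts = [b[0] for b in blocks]
    mins = [b[1] for b in blocks]
    maxs = [b[2] for b in blocks]
    count = int((end - start) // step) + 1
    out = []
    for k in range(count):
        t = start + k * step
        out.append((t, lerp_at(ts, mins, t), lerp_at(ts, maxs, t)))
    return out


points = envelope([(0, 10, 20), (100, 20, 40)], start=0, end=100, step=50)
```

The result is a minimum and maximum envelope at the requested time resolution, computed from the locally stored index alone.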

In a second case, if the resolution information of the data extraction request DR indicates a data density higher than that indicated by the data size indicators SZ of the data blocks DB referenced by the segment indices SI of the index I allocated to said certain sensor 2, the external storage 4 is accessed by the extraction engine 8 for answering the data extraction request DR. This is shown by circle 3 in Fig. 9.

In the second case, the index I stored in the local storage unit 3 is used by the extraction engine 8 for providing extraction metadata. Said extraction metadata may include a data file name of a data file stored in the external storage 4, an offset in this data file and a data length of the data to be extracted. Then, the extraction engine 8 accesses the external storage 4 using the extraction metadata for providing extracted data. The extracted data provided by the external storage 4 is then transmitted to the requesting client 9.
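How the extraction metadata could be derived from the index I may be sketched as follows. The sketch assumes that each data segment DS is stored as one data file in which the data blocks DB are concatenated in index order, and that SZ is the size of each block in bytes, so that the offset follows from the sizes of the preceding blocks. The file naming scheme and all function names are hypothetical. The byte range is formatted as an HTTP Range header value, which is inclusive at both ends.

```python
from typing import List, NamedTuple


class ExtractionMetadata(NamedTuple):
    file_name: str  # data file name in the external storage
    offset: int     # byte offset of the first requested block
    length: int     # number of bytes to read


def extraction_metadata(sensor_id: str, gid: int, block_sizes: List[int],
                        first: int, last: int) -> ExtractionMetadata:
    """Locate blocks first..last (inclusive) inside the file of segment gid."""
    offset = sum(block_sizes[:first])           # bytes of all preceding blocks
    length = sum(block_sizes[first:last + 1])   # bytes of the requested blocks
    file_name = f"{sensor_id}/{gid}.seg"        # hypothetical naming scheme
    return ExtractionMetadata(file_name, offset, length)


def http_range_header(meta: ExtractionMetadata) -> str:
    """Format the byte range as an HTTP Range header value (inclusive end)."""
    return f"bytes={meta.offset}-{meta.offset + meta.length - 1}"


meta = extraction_metadata("s42", 7, [100, 120, 80, 90], first=1, last=2)
header = http_range_header(meta)
```

With such metadata, only the required byte range of the data file has to be requested from the external storage 4 instead of the whole data segment.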

Although the present invention has been described in accordance with an exemplary embodiment, an exemplary use case and preferred variants thereof, it is obvious to the person skilled in the art that modifications are possible in all embodiments, use cases and variants.

The exemplary embodiment and its variants mainly referred to temperature readings; however, the proposed method and computerized device may be used with any kind of readings, such as pressure readings, power readings and any other analog or discrete readings, signals and the like.

Reference Numerals:

S10-S90 method steps

1 computerized device

2 sensor

3 local storage unit

4 external storage

5 industrial facility

6 service endpoint

7 search engine

8 extraction engine

9 client

10 first entity

20 second entity

30 third entity

40 fourth entity

50 fifth entity

60 sixth entity

70 seventh entity

80 eighth entity

90 ninth entity

A(BI) array of block indices

A(SI) array of segment indices

BI block index

BTI block time interval

DB data block

DR data request

I index

MIN minimum value

MAX maximum value

NUM number of elements

R response

SI segment index

ST start time

SZ data size indicator

SID sensor ID

STI segment time interval

t time

Δt time intervals

TS timestamp

v value, reading

References:

[1] Yang, Fangjin, et al. "Druid: A real-time analytical data store." Proceedings of the 2014 ACM SIGMOD international conference on Management of data. ACM, 2014.

[2] Benchmarking Druid [Electronic resource]. URL: http://druid.io/blog/2014/03/17/benchmarkingdruid.html

[3] Apache Cassandra NoSQL Performance Benchmarks [Electronic resource]. URL: https://academy.datastax.com/planet-cassandra/nosql-performance-benchmarks

[4] Apache Ignite and Apache Cassandra Benchmarks: The Power of In-Memory Computing [Electronic resource]. URL: https://dzone.com/articles/apachereg-ignite-and-apachereg-cassandrabenchmark

[5] Time series Database Benchmarks [Electronic resource]. URL: https://blog.outlyer.com/timeseries-database-benchmarks

[6] Column Store Database Benchmarks: MariaDB ColumnStore vs. Clickhouse vs. Apache Spark [Electronic resource]. URL: https://www.percona.com/blog/2017/03/17/column-store-databasebenchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark/

[7] ClickHouse vs Amazon RedShift Benchmark [Electronic resource]. URL: https://www.altinity.com/blog/2017/6/20/clickhouse-vs-redshift

[8] Distinctive Features of ClickHouse [Electronic resource]. URL: https://clickhouse.yandex/docs/en/introduction/distinctive_features/

[9] Interactive Real-Time Visualization for Streaming Data [Electronic resource]. URL: http://openproceedings.org/2017/conf/edbt/paper-276.pdf

[10] Zstandard, a real-time compression algorithm [Electronic resource]. URL: https://facebook.github.io/zstd/

[11] AWS Documentation, Amazon Simple Storage Service (S3), API Reference, Operations on Objects, GET Object [Electronic resource]. URL: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html

Claims

Patent claims

1. A method for processing numeric time series data from a number of sensors (2) by a computerized device (1) including a local storage unit (3), the method comprising:

a) acquiring (S10), from each of the sensors (2), a numeric time series (S) including a plurality of readings (v) and associated timestamps (TS),

b) converting (S20) the acquired time series (S) into data blocks (DB) having a certain binary format,

c) combining (S30) the data blocks (DB) into data segments (DS), wherein each of the data segments (DS) includes a plurality of the data blocks (DB),

d) storing (S40) the data segments (DS) into an external storage (4) being external to the computerized device (1),

e) processing (S50), for each of said sensors (2), the acquired numeric time series (S) to derive an index (I) allocated to the sensor (2), the derived index (I) including segment indices (SI) and referenced block indices (BI), wherein, for each of the data blocks (DB), one block index (BI) including block information suitable for an index-based range search is created, and, for each of the data segments (DS), one segment index (SI) including segment information suitable for the index-based range search is created,

f) storing (S60) the index (I) in the local storage unit (3),

g) receiving (S70) a data request (DR) including a range criterion for a certain sensor of the sensors (2),

h) processing (S80) the index (I) allocated to the certain sensor (2) to determine a range for which the numeric time series (S) is known to match the range criterion, and

i) outputting (S90) a response (R) using the determined range in response to the data request (DR).

2. The method of claim 1, characterized in that, in step b), the readings (v) and the timestamps (TS) are converted into the respective data blocks (DB) in a differential manner.

3. The method of claim 1 or 2, characterized in that, after step b) and before step c), the data blocks (DB) are compressed using a certain compression scheme.

4. The method of any of claims 1 to 3, characterized in that the block information of the block index (BI) of a certain one of the data blocks (DB) includes

a minimum value (MIN) of all the readings (v) contained in the certain data block (DB),

a maximum value (MAX) of all the readings (v) contained in the certain data block (DB),

a data size indicator (SZ) indicating a data size of the certain data block (DB), and

an element number (NUM) indicating a number of the readings (v) contained in the certain data block (DB).

5. The method of any of claims 1 to 4, characterized in that the segment information of a segment index (SI) of a certain one of the data segments (DS) includes

a segment identification (GID) for identifying the certain data segment (DS),

an array (A(BI)) containing all the block indices (BI) of the data blocks (DB) contained in the certain data segment (DS),

a start time (ST) of the certain data segment (DS),

a minimum value (MIN) of all the readings (v) contained in the data blocks (DB) of the certain data segment (DS), and

a maximum value (MAX) of all the readings (v) contained in the data blocks (DB) of the certain data segment (DS).

6. The method of any of claims 1 to 5, characterized in that the steps a) to f) are executed for each of the sensors (2).

7. The method of any of claims 1 to 6, characterized in that the data request (DR) is a data search request including a sensor identification identifying the certain sensor (2) and an amplitude range criterion specifying an amplitude range within which readings (v) of the numeric time series (S) are required to lie in order to match the amplitude range criterion.

8. The method of claim 7, characterized by:

scanning the index (I) allocated to the certain sensor (2) and stored in the local storage unit (3) for determining the data segments (DS) satisfying the amplitude range criterion,

identifying data blocks (DB) satisfying the amplitude range criterion in the determined data segments (DS),

calculating a start time and an end time for each of the identified data blocks (DB) based on the timestamps (TS) included in the respective data block (DB), and

outputting the response (R) including a time range based on the calculated start times and the calculated end times of all the identified data blocks (DB).

9. The method of any of claims 1 to 6, characterized in that the data request (DR) is a data extraction request for extracting data, wherein the data extraction request includes a sensor identification identifying the certain sensor (2) and a time range criterion, wherein the time range criterion includes a start time indicating a time of an oldest reading (v) to be extracted, an end time indicating a time of a newest reading (v) to be extracted and a resolution information indicating a time resolution for the data to be extracted.

10. The method of claim 9, characterized in

that, in a first case, if the resolution information of the data extraction request indicates a data density lower than or equal to that indicated by the data size indicators (SZ) of the data blocks (DB) referenced by the segment indices (SI) of the index (I) allocated to the certain sensor (2), only the local storage unit (3) is accessed for answering the data extraction request,

and, in a second case, if the resolution information of the data extraction request indicates a data density higher than that indicated by the data size indicators (SZ) of the data blocks (DB) referenced by the segment indices (SI) of the index (I) allocated to the certain sensor (2), the external storage (4) is accessed for answering the data extraction request.

11. The method of claim 10, characterized in that, in the first case, the minimum values (MIN) and the maximum values (MAX) of the data blocks (DB) referenced by the segment indices (SI) of the index (I) allocated to the certain sensor (2) are interpolated for providing extracted data, wherein the response (R) including the extracted data is output.

12. The method of claim 10 or 11, characterized in

that, in the second case, the index (I) stored in the local storage unit (3) is used for providing extraction metadata including a data file name of a data file stored in the external storage (4), an offset in this data file and a data length of the data to be extracted,

wherein the external storage (4) is accessed using the provided extraction metadata for providing extracted data, wherein the response (R) including the extracted data is output.

13. A computer program product comprising a program code for executing the method of any of claims 1 to 12 when run on at least one computer.

14. A computerized device (1) for processing numeric time series data from a number of sensors (2), the computerized device (1) comprising a local storage unit (3) and further comprising:

a first entity (10) for acquiring, from each of the sensors (2), a numeric time series (S) including a plurality of readings (v) and associated timestamps (TS),

a second entity (20) for converting the acquired time series (S) into data blocks (DB) having a certain binary format,

a third entity (30) for combining the data blocks (DB) into data segments (DS), wherein each of the data segments (DS) includes a plurality of the data blocks (DB),

a fourth entity (40) for storing the data segments (DS) into an external storage (4) being external to the computerized device (1),

a fifth entity (50) for processing, for each of said sensors (2), the acquired numeric time series (S) to derive an index (I) allocated to the sensor (2), the derived index (I) including segment indices (SI) and referenced block indices (BI), wherein, for each of the data blocks (DB), one block index (BI) including block information suitable for an index-based range search is created, and, for each of the data segments (DS), one segment index (SI) including segment information suitable for the index-based range search is created,

a sixth entity (60) for storing the index (I) in the local storage unit (3),

a seventh entity (70) for receiving a data request (DR) including a range criterion for a certain sensor of the sensors (2),

an eighth entity (80) for processing the index (I) allocated to the certain sensor (2) to determine a range for which the numeric time series (S) is known to match the range criterion, and

a ninth entity (90) for outputting a response (R) using the determined range in response to the data request (DR).

15. A system comprising a number of sensors (2) installed in an industrial facility (5) and a computerized device (1) of claim 14 for processing numeric time series data from the sensors (2).