WO2020171729A1 - Method and computerized device for performing a range search based on numeric time series data

Info

Publication number: WO2020171729A1
Application number: PCT/RU2019/000111
Authority: WO
Inventors: Yury Vladimirovich KUZNETCOV; Uwe Pfeifer
Original assignee: Siemens Aktiengesellschaft
Priority date: 2019-02-21
Filing date: 2019-02-21
Publication date: 2020-08-27

Abstract

A method for performing a range search based on numeric time series data comprises: acquiring a numeric time series including a plurality of discrete readings of a physical quantity associated with time; processing the acquired numeric time series to derive an index therefrom; storing the index in a storage unit; receiving a search request including an amplitude range criterion; accessing the index to determine a time range within which the physical quantity has matched the amplitude range criterion; and outputting the determined time range in response to the search request. Advantages include the ability to determine a matching time range even if none of the discrete readings matches the amplitude range criterion, a smaller index size, and faster response times. A corresponding computerized device and computer program product are also proposed.

Description

METHOD AND COMPUTERIZED DEVICE FOR PERFORMING A RANGE SEARCH BASED ON NUMERIC TIME SERIES DATA

The present invention relates to the field of industrial Big Data applications and, particularly, to a method and a computerized device for performing a range search based on numeric time series data acquired in an industrial facility.

Industrial facilities such as power plants are equipped with sensors supplying readings of a physical quantity such as pressure or temperature. The readings are stored as a time series of discrete readings for later analysis. The amount of data stored for an industrial facility is approaching the petabyte level. Such data is typically stored in a cloud.

There is a need to analyze the numeric time series data for fault diagnosis, operation monitoring, predictive maintenance and similar purposes. During analysis, an expert user may need to quickly identify a time interval during which the physical quantity is known to have been within a given amplitude range.

A linear scan through all data to identify applicable time intervals is costly in terms of data traffic and CPU load, and takes too long to be of practical use. Known fast search mechanisms such as Google® search are adapted to search in alphanumeric data rather than numeric time series data.

PCT/RU2018/000373 discloses a method and device for performing an index-based amplitude range search within numeric time series data. However, there is a need for a further reduction of the size of the index. Further, it is desirable to determine also a time range in which the physical quantity has passed through the searched amplitude range, but no discrete reading has been recorded within the searched amplitude range.

It is therefore one object of the present invention to provide an improved computerized device and method for performing a range search in numeric time series data.

Accordingly, a method for performing, using a computerized device comprising at least a processing unit and a storage unit, a range search based on numeric time series data is proposed. The method comprises: a) acquiring, at least temporarily, a numeric time series including a plurality of discrete readings of a physical quantity associated with time; b) processing the acquired numeric time series to derive an index from the acquired numeric time series; c) storing the index in the storage unit; d) receiving a search request including an amplitude range criterion; e) accessing the index stored in the storage unit to determine a time range within which the physical quantity has matched the amplitude range criterion; and f) outputting the determined time range in response to the search request.

The above steps d), e) and f) may be referred to as "performing an index-based range search". Likewise, steps a), b) and c) may be referred to as "building an index adapted for an index-based range search".

A "time range, within which the physical quantity has matched the amplitude range criterion" may also be referred to as "a time range, for which it is known, determined and/or assumed, based on information included in the index, that the physical quantity has matched the amplitude range criterion", and will be referred to as a "matching time range" for brevity.

By performing the index-based range search, an amount of processing for determining the matching time range in response to the range search request may be significantly reduced.

Furthermore, steps b) and/or e) may leverage an assumption about a behavior of the physical quantity in between any two discrete readings. In particular, the assumption may relate to a steadiness of the physical quantity. Thereby, step e) may determine a time range within which not necessarily any of the discrete readings, from which the index has been derived in step b), matches the amplitude range criterion, but for which it is possible to determine, based on the index and the assumption, that the physical quantity has matched the amplitude range criterion. That is, it may be beneficially possible to determine a time range in which the physical quantity has passed through the searched amplitude range but no discrete reading within the searched amplitude range has been recorded.

Thus, a response to the search request may be more accurate.

Furthermore, through leveraging the above-mentioned assumption, an amount of information included in the index may be reduced. That is, the size of the index may be reduced. This may result in a further improvement of the speed of the index-based range search.

The proposed method may be a computer-implemented method. In particular, the method may be carried out using at least one computerized device, which may include one or more processing units, such as one or more CPUs, and one or more storage units, such as a hard disk, solid-state disk, RAID storage, network-attached storage and the like.

The index may be a numeric index specifically adapted to provide a response to a search request including an amplitude range criterion. Advantageously, the index may include information necessary and sufficient to determine the matching time range with a predetermined precision without requiring access to the numeric time series.

The index may favorably require less storage space than the numeric time series itself. More particularly, the index that allows determining the matching time range with a predetermined precision may favorably require less storage space than a comparative index that allows determining the matching time range with the same predetermined precision and constitutes a discrete quantization of the numeric time series in a time-amplitude space. Preferably, the index may be reduced to a size that fits into a RAM of the computerized device, thereby significantly improving the speed of the range search.

A discrete reading may be a value acquired from a sensor installed in an industrial facility. The discrete reading may be indicative of the physical quantity (of a value or amplitude of the physical quantity). The physical quantity may be a temperature, a pressure, a power output or load, or the like.

The numeric time series may be an equidistant or a non-equidistant series of discrete readings. In the numeric time series, each discrete reading may be stored in association with a time at which the respective discrete reading was acquired. Alternatively, the acquisition time may be derivable through calculations based on a position of the respective discrete reading in the numeric time series.

An amplitude range criterion may be a criterion that specifies a range (value range or amplitude range/interval) of the physical quantity. Examples of the amplitude range criterion are "between 3000 and 4000 rotations per minute", "more than 300 megawatts" or "less than 300 degrees Celsius". In other words, the amplitude range criterion may specify at least one of a lower and an upper boundary for the physical quantity.
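By way of non-limiting illustration, an amplitude range criterion having at least one of a lower and an upper boundary might be represented as sketched below. The class name `AmplitudeRange`, its fields and the inclusive treatment of the boundaries are assumptions made for this sketch and are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AmplitudeRange:
    """Hypothetical representation of an amplitude range criterion."""
    lower: Optional[float] = None  # None means "no lower boundary"
    upper: Optional[float] = None  # None means "no upper boundary"

    def __post_init__(self):
        # The criterion specifies at least one of the two boundaries.
        if self.lower is None and self.upper is None:
            raise ValueError("at least one boundary must be specified")

    def matches(self, value: float) -> bool:
        # Boundaries are treated as inclusive here (an assumption).
        if self.lower is not None and value < self.lower:
            return False
        if self.upper is not None and value > self.upper:
            return False
        return True


between_rpm = AmplitudeRange(lower=3000, upper=4000)  # 3000 to 4000 rotations per minute
above_mw = AmplitudeRange(lower=300)                  # more than 300 megawatts
below_celsius = AmplitudeRange(upper=300)             # less than 300 degrees Celsius
```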

The search request may constitute a representation of the amplitude range criterion, such as a digital or a machine-readable representation.

In step a), the numeric time series may be acquired from a sensor installed in an industrial facility or the like and/or may be acquired by reading from a storage medium such as a hard drive, data lake, data warehouse or a cloud.

In step b), processing the acquired numeric time series may include traversing, crawling, or processing the numeric time series reading by reading, and creating and/or updating the index, based on each of the readings, such that the index comprises information necessary and sufficient to provide a response to an amplitude range search request with the predetermined precision.

In step d), the search request may be received through an input unit, such as a keyboard, connected to the computerized device. Alternatively, the search request may be received via a network from another computerized device.

In step e), accessing the index may refer to accessing and/or processing the index based on the amplitude range criterion. For example, the method may traverse or iterate through the index to identify one or more matching time ranges. Said iteration may be a linear traversal or may follow a binary search scheme or the like.

More particularly, the time range determined in step e) may be a time range that includes a time for which the physical quantity is known and/or assumed to have matched the amplitude range criterion, and excludes a time for which the physical quantity is known and/or assumed not to have matched the amplitude range criterion.

In other words, it may not be possible to determine an exact time range when the physical quantity has matched the amplitude range criterion; however, it may be possible to determine a matching time range with the predetermined precision.

In step f), the determined time range may be output by displaying a human-readable representation of the determined time range on a display device. The determined time range may also be output by transmitting a digital representation of the determined time range via a wired or wireless network.

It is noted that determining a matching time range and outputting the matching time range may also include determining a plurality of matching time ranges and outputting the plurality of matching time ranges.

According to an embodiment, step c) further includes discarding the acquired numeric time series.

Since the acquired numeric time series is not used when performing the index-based range search in steps d), e) and f), it may be favorably discarded after the building of the index in steps a) and b) has been completed, so as to reduce an amount of long-term storage space required by the proposed method.

According to a further embodiment, the index is a lossy index. "Lossy", herein, may refer to the fact that, when deriving the index, information which is required to determine the matching time range for any given search request is retained, whereas at least some information which would be required to determine each discrete reading of the numeric time series is not retained (is discarded or lost).

An amount of lossiness may be adjusted so as to correspond to the predetermined precision. Thereby, a tradeoff between speed of the index-based range search, storage space required for storing the index, and the predetermined precision of the response to the search request may be suitably adjusted.

It is noted that a receiving device that has received the search response may retrieve the actual discrete readings from the numeric time series. The receiving device may use the search response to access only those portions of the actual numeric time series that are of interest, i.e. only the discrete readings in the time range or time ranges included in the search response. Thereby, an amount of time spent, data transferred and/or computing power involved in retrieving the discrete readings of interest may be significantly reduced.

According to a further embodiment, the lossy index includes a lossy compressed representation of a quantization of the numeric time series into time intervals according to a predetermined time resolution and into amplitude intervals according to a predetermined amplitude resolution.

In particular, the quantization may be a quantization of the numeric time series in a time-amplitude space.

A respective quantization, herein, may refer to a result of quantizing (constraining, discretizing) the discrete readings and times included in the numeric time series into a set of discrete time interval bins and amplitude interval bins.

The predetermined time resolution and the predetermined amplitude resolution may define the predetermined precision of the response to the search request.
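As a minimal sketch of such a quantization, each reading may be mapped to a pair of a time interval number and an amplitude interval number. The function name `quantize`, the origins `t0` and `a0`, the uniform bin widths and the sample readings are assumptions chosen for illustration only.

```python
import math


def quantize(reading_time, value, t0, time_res, a0, amp_res):
    """Map one reading to (time interval number, amplitude interval number).

    t0 / a0 are the assumed origins of the time and amplitude axes;
    time_res / amp_res are the predetermined resolutions (bin widths).
    """
    t_bin = math.floor((reading_time - t0) / time_res)
    a_bin = math.floor((value - a0) / amp_res)
    return t_bin, a_bin


# Hypothetical readings as (time in seconds, value) pairs,
# quantized into 60 s time intervals and 10-unit amplitude intervals.
series = [(5, 12.0), (30, 18.5), (70, 41.0), (125, 39.9)]
cells = {quantize(t, v, t0=0, time_res=60, a0=0.0, amp_res=10.0) for t, v in series}
```

The first two readings fall into the same cell of the time-amplitude space, which illustrates the first level of loss of information: the quantization no longer distinguishes them.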

It may be appreciated that the quantization of the numeric time series involves a first level of loss of information, and that the lossy compression of said quantization may involve a second level of loss of information.

The lossy compression may be performed based on an assumed behavior of the physical quantity represented by the discrete readings, such as its steadiness or the like. For example, the lossy compression may discard information about certain time intervals and/or certain amplitude intervals.

Through subjecting the numeric time series to two levels of loss of information when deriving the index, an amount of storage space required to store the lossy index and/or an amount of computing time required to access and/or process the lossy index may be reduced yet further.

According to a further embodiment, the lossy compressed representation of the quantization of the numeric time series includes, for each time interval, an indication of a lowest amplitude interval and a highest amplitude interval within which the physical quantity has been during the respective time interval.

The lowest/highest amplitude interval may be the lowest/highest amplitude interval in which at least one discrete reading has been observed in the numeric time series during the respective time interval.

Herein, an assumption about a behavior of the physical quantity, such as its steadiness, may be leveraged to discard (lose; not store) information about an intermediate amplitude interval between the lowest and the highest amplitude interval.

An indication of a respective amplitude interval may include a number indicating one of a plurality of amplitude intervals of the quantization of the numeric time series.

That is, the lossy compressed representation of the quantization of the numeric time series may be a representation in which information about intermediate amplitude intervals is discarded.

Thereby, the size of the index may be advantageously reduced.

According to a further embodiment, step b) includes: creating, for each of the time intervals, a flat histogram including, for each of the amplitude intervals, a binary bin indicative of whether or not the numeric time series includes at least one discrete reading that is within the respective amplitude interval and is associated with a time within the respective time interval; compressing each flat histogram into a compressed flat histogram constituted by a number of the lowest binary bin and a number of the highest binary bin that is indicative of the numeric time series including at least one respective discrete reading that is within the respective amplitude interval and is associated with a time within the respective time interval; and forming the index from the plurality of compressed flat histograms.

In particular, it is noted that each binary bin may be associated with a corresponding time interval and a corresponding amplitude interval. Each flat histogram may be associated with a corresponding time interval and include binary bins for all amplitude intervals of the corresponding time interval.

In particular, a flat histogram is a histogram including binary bins. A binary bin is a bin having (being able to assume) one of two possible states.

In particular, a first state of a respective binary bin may indicate that during the corresponding time interval, the numeric time series does not include any discrete reading that is within the corresponding amplitude interval. A second state of a respective binary bin may indicate that during the corresponding time interval, the numeric time series does include at least one discrete reading that is within the corresponding amplitude interval.

It is understood that a number of a respective binary bin may correspond to an indication of a respective amplitude interval. That is, the binary bins may be numbered according to their corresponding amplitude intervals. The flat histograms and binary bins may advantageously reduce an amount of memory and/or storage space involved while carrying out step b).

Furthermore, the flat histograms are compressed into respective compressed flat histograms. A respective compressed flat histogram may be constituted by the number of the lowest bin and the number of the highest binary bin found to be in the second state. In particular, the compressed flat histogram may not include the numbers or any other indications of any other bins found to be in the second state.

That is, the index may include, for each time interval, two binary bin numbers defining a binary bin number interval. Any other information may be advantageously discarded. Thereby, an extremely small index size may be achieved.
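The creation and compression of flat histograms described above may be sketched as follows. This is an illustrative sketch only: the function names, the use of Python lists of booleans as binary bins, the zero origins of both axes and the sample readings are assumptions, and an empty time interval is represented here by `None`.

```python
def build_flat_histograms(series, time_res, n_time, amp_res, n_amp):
    """Create one flat histogram (list of binary bins) per time interval."""
    hists = [[False] * n_amp for _ in range(n_time)]
    for t, v in series:
        t_bin = int(t // time_res)
        a_bin = int(v // amp_res)
        if 0 <= t_bin < n_time and 0 <= a_bin < n_amp:
            hists[t_bin][a_bin] = True  # second state: at least one reading seen
    return hists


def compress(hist):
    """Keep only the numbers of the lowest and highest bin in the second state."""
    set_bins = [i for i, flag in enumerate(hist) if flag]
    return (set_bins[0], set_bins[-1]) if set_bins else None


# Hypothetical readings as (time in seconds, value) pairs.
series = [(5, 12.0), (30, 48.5), (70, 41.0), (100, 22.0), (125, 39.9)]
flat = build_flat_histograms(series, time_res=60, n_time=3, amp_res=10.0, n_amp=6)
index = [compress(h) for h in flat]
```

In this sketch the index holds two bin numbers per time interval regardless of how many readings fell into that interval.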

It is noted that the structure of the index according to the present embodiment may allow step e) to be performed by identifying one or more target binary bin numbers that correspond to the amplitude range criterion, and traversing the index to determine time intervals for which the bin number interval defined by the corresponding compressed flat histogram includes at least one of the one or more target binary bin numbers.
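A linear traversal of this kind may be sketched as below, assuming that the target binary bin numbers form a contiguous range from `lo_target` to `hi_target` and that an empty time interval is stored as `None`. The function name and the sample index are hypothetical.

```python
def search(index, lo_target, hi_target):
    """Return the numbers of the time intervals whose [lowest, highest] bin
    interval contains at least one target bin number in [lo_target, hi_target]."""
    matches = []
    for t_bin, entry in enumerate(index):
        if entry is None:  # no information stored for this time interval
            continue
        lowest, highest = entry
        if lowest <= hi_target and lo_target <= highest:  # the intervals overlap
            matches.append(t_bin)
    return matches


# Hypothetical index: one (lowest bin, highest bin) pair per time interval.
index = [(1, 4), (2, 4), (3, 3), None, (0, 1)]

# Criterion mapped to the single target bin 2, e.g. "20 to 30" with 10-unit bins.
matching_intervals = search(index, 2, 2)
```

Time interval 0 is reported as matching even if its readings only fell into bins 1 and 4, because under the steadiness assumption the physical quantity passed through bin 2 on its way between them.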

Herein, advantageously, an amount of compressed flat histograms to be traversed may depend solely on the predetermined time resolution and may favorably not depend on an amount of readings included in the numeric time series. Thereby, an amount of processing time for determining the matching time range in step e) may be reduced significantly.

Likewise, an amount of storage space required to store the index may depend solely on the predetermined time resolution (number of compressed flat histograms) and may favorably not depend on an amount of readings included in the numeric time series. Further, by only storing two binary bin numbers per time interval, a dependence of the amount of storage space on the predetermined amplitude resolution may be reduced. Thereby, an amount of storage space for storing the index may be reduced significantly.
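As a rough, purely illustrative calculation of this dependence, the size of such an index may be estimated from the covered duration and the time resolution alone. The figures used below (one year of data, one-minute time intervals, one byte per bin number, which would allow up to 256 amplitude intervals) are assumptions and are not taken from the disclosure.

```python
def index_size_bytes(duration_s, time_res_s, bytes_per_bin_number=1):
    """Estimate the index size: two bin numbers per time interval."""
    n_intervals = -(-duration_s // time_res_s)  # ceiling division
    return n_intervals * 2 * bytes_per_bin_number


one_year_s = 365 * 24 * 3600
size = index_size_bytes(one_year_s, time_res_s=60)  # 525,600 intervals, about 1 MB
```

The number of readings does not appear in the estimate, so a sensor sampled once per second and one sampled once per millisecond would yield an index of the same size under these assumptions.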

According to a further development of the present embodiment, prior to compressing each flat histogram, step b) includes, for each of the time intervals, determining a first amplitude interval of the latest discrete reading of the numeric time series that is associated with a time in a time interval preceding the respective time interval and a second amplitude interval of the earliest discrete reading that is associated with a time in a time interval following after the respective time interval; and setting the binary bins corresponding to the determined first and second amplitude intervals to the second state.

In this way, a problem may be avoided if a time interval does not include any discrete reading, or if the physical quantity changes largely between discrete readings near a boundary between two time intervals.
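One possible way of setting the bins of the neighboring readings before compression is sketched below. The function name, the zero origins of both axes and the sample readings are assumptions, and the repeated list scans are kept for clarity rather than efficiency.

```python
def flat_histograms_with_neighbors(series, time_res, n_time, amp_res, n_amp):
    """Build flat histograms and additionally set, for each time interval, the
    bin of the latest reading before it and of the earliest reading after it."""
    binned = [(int(t // time_res), int(v // amp_res)) for t, v in sorted(series)]
    flat = [[False] * n_amp for _ in range(n_time)]
    for t_bin, a_bin in binned:
        flat[t_bin][a_bin] = True
    for k in range(n_time):
        before = [a for t_bin, a in binned if t_bin < k]
        after = [a for t_bin, a in binned if t_bin > k]
        if before:
            flat[k][before[-1]] = True  # latest reading of a preceding interval
        if after:
            flat[k][after[0]] = True    # earliest reading of a following interval
    return flat


# Two hypothetical readings; the middle time interval (60 s to 120 s) has none.
series = [(50, 12.0), (130, 45.0)]
flat = flat_histograms_with_neighbors(series, time_res=60, n_time=3, amp_res=10.0, n_amp=6)
set_bins = [[i for i, flag in enumerate(h) if flag] for h in flat]
```

After this step the middle time interval is no longer empty, so its compressed flat histogram spans bins 1 to 4 and a search for an intermediate amplitude range can still report it.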

According to a further embodiment, the lossy index includes a plurality of lossy compressed representations of respective quantizations of the numeric time series into different time intervals according to different predetermined time resolutions.

As noted above, a respective predetermined time resolution may correlate with an amount of data to be processed and a processing time required for determining the matching time range.

Specifically, by reducing the time resolution, the amount of data to be processed may be reduced and a time required to identify a matching time range may be reduced at the expense of a lower precision of the determined matching time range.

For example, the different predetermined time resolutions may include a first time resolution and a second time resolution that is higher than the first time resolution.

It may thus be possible to swiftly identify a less precise matching time range by accessing the lossy compressed representation of the quantization according to the first (or "low") time resolution. Thereafter, a more precise matching time range according to the second (or "high") time resolution may be identified by accessing only those portions of the lossy compressed representation of the quantization according to the second (or "high") time resolution that correspond to the less precise matching time range.

Thereby, it may advantageously be possible to reduce an amount of data to be processed and to reduce a processing time for identifying the matching time range even in a case where a high precision and/or time resolution is required.
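A two-level search of this kind may be sketched as follows, assuming that each entry of the low-resolution representation summarizes a fixed number `factor` of consecutive entries of the high-resolution representation. The function names and the sample data are hypothetical.

```python
def overlaps(entry, lo_target, hi_target):
    """True if the (lowest, highest) bin interval contains a target bin number."""
    return entry is not None and entry[0] <= hi_target and lo_target <= entry[1]


def coarse_to_fine(coarse, fine, factor, lo_target, hi_target):
    """Search the coarse level first, then only the matching parts of the fine level.

    Assumes coarse[i] summarizes fine[i * factor:(i + 1) * factor].
    Returns the matching fine interval numbers and the number of fine entries visited.
    """
    matches, visited = [], 0
    for i, entry in enumerate(coarse):
        if not overlaps(entry, lo_target, hi_target):
            continue  # the whole coarse interval can be skipped
        for j in range(i * factor, min((i + 1) * factor, len(fine))):
            visited += 1
            if overlaps(fine[j], lo_target, hi_target):
                matches.append(j)
    return matches, visited


# Hypothetical index levels; each coarse entry covers two fine entries.
fine = [(0, 1), (0, 0), (1, 1), (1, 4), (3, 4), (4, 4), (0, 0), (0, 1)]
coarse = [(0, 1), (1, 4), (3, 4), (0, 1)]

matches, visited = coarse_to_fine(coarse, fine, factor=2, lo_target=2, hi_target=2)
```

In this example only two of the eight high-resolution entries are visited, since three of the four low-resolution entries already rule out a match.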

According to another embodiment, the plurality of different predetermined time resolutions is a logarithmic series of time resolutions.

"Logarithmic", herein, may refer to a series with changing dimensions, or orders, of time.

For example, a logarithmic series of time resolutions may be time resolutions (or time interval lengths) such as 365 days or one year, 30 days or one month, 1 day, 1 hour, 1 minute.

The lossy index including a plurality of lossy compressed representations of respective quantizations of the numeric time series according to a logarithmic series of different predetermined time resolutions may further improve search speed and reduce memory and/or storage space requirements for the index.

According to a further embodiment, step a) is executed repeatedly for different portions of the numeric time series; and step b) includes creating the index upon a first execution of the acquiring step a), and includes updating the index upon each subsequent execution of the acquiring step a).

In step b), updating the index may include, for example, updating the highest and lowest binary bin numbers of each compressed flat histogram according to the discrete readings observed in the respective portion of the numeric time series that is read in the respective preceding step a). Thereby, the numeric time series may be acquired in portions, and an amount of memory and/or storage space required for acquiring the time series may be reduced. Likewise, easy and efficient updates may be possible when further portions/further readings are added to the numeric time series over time without requiring the index to be re-built from scratch.

In particular, it may be possible to keep a "live" index up to date for "live" numeric time series data acquired continuously and directly from an industrial facility or the like.
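An incremental update of this kind may be sketched as below. Storing the index as a dictionary keyed by time interval number, the function name and the sample portions are assumptions made for illustration.

```python
def update_index(index, portion, time_res, amp_res):
    """Update the lowest and highest bin numbers per time interval in place.

    index maps a time interval number to [lowest bin, highest bin];
    portion is an iterable of (time, value) readings.
    """
    for t, v in portion:
        t_bin = int(t // time_res)
        a_bin = int(v // amp_res)
        entry = index.get(t_bin)
        if entry is None:
            index[t_bin] = [a_bin, a_bin]  # first reading in this time interval
        else:
            entry[0] = min(entry[0], a_bin)
            entry[1] = max(entry[1], a_bin)
    return index


index = {}
# The first portion creates the index, a later portion updates it.
update_index(index, [(5, 12.0), (30, 48.5)], time_res=60, amp_res=10.0)
update_index(index, [(45, 3.0), (70, 41.0)], time_res=60, amp_res=10.0)
```

Each portion can be dropped from memory once it has been processed, since only the two bin numbers per time interval are retained.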

According to a further embodiment, a non-contiguous and non-overlapping plurality of smallest time ranges matching the amplitude range criterion is determined in step e) and output in step f).

Specifically, by accessing the index as described in the preceding embodiments, it may preferably be possible to determine a plurality of time intervals within which the physical quantity has matched (is known, determined and/or assumed to have matched) the amplitude range criterion ("matching time intervals"). The matching time intervals may then be combined into matching time ranges, wherein each matching time range comprises one or more matching time intervals. In particular, adjacent matching time intervals may be combined into a same matching time range. Thereby, a non-contiguous and non-overlapping plurality of smallest matching time ranges may be determined.

Advantageously, thereby, the response that is output in step f) may be made more concise.
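Combining adjacent matching time intervals into matching time ranges may be sketched as follows. The function name, the representation of a range as a `(start, end)` pair and the sample interval numbers are assumptions.

```python
def merge_intervals(matching_bins, t0, time_res):
    """Combine adjacent matching time intervals into (start, end) time ranges."""
    ranges = []
    for b in sorted(set(matching_bins)):
        start = t0 + b * time_res
        end = start + time_res
        if ranges and ranges[-1][1] == start:
            ranges[-1] = (ranges[-1][0], end)  # extend the previous range
        else:
            ranges.append((start, end))        # start a new range
    return ranges


# Hypothetical matching time interval numbers with 60 s time intervals.
time_ranges = merge_intervals([0, 1, 2, 5, 7, 8], t0=0, time_res=60)
```

Six matching time intervals are reported as three matching time ranges in this example.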

According to a further embodiment, in step d), the search request includes a logical expression formed by one or more amplitude range criteria and one or more logical operators, and in step e), a time range is determined within which the physical quantity is known, based on the index, to have matched the logical expression.

A logical operator may be a Boolean operator such as logical "AND", logical "OR" or logical "NOT". According to one variant, step e) may be repeated for each of the amplitude range criteria to determine a respective matching time range. A number of matching time ranges determined in this way may then be joined (in case of logical "OR"), intersected (in case of logical "AND") or negated (in case of logical "NOT") to form the matching time range within which the physical quantity is known, based on the index, to have matched the logical expression.
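The first variant may be sketched using sets of matching time interval numbers, as below. The function name and the sample index are hypothetical. As simplifying assumptions of this sketch, "NOT" is taken relative to all time intervals covered by the index (and therefore also returns intervals about which nothing is stored), and "AND" means that both criteria were matched at some time within the same time interval.

```python
def matching_bins(index, lo_target, hi_target):
    """Set of time interval numbers whose bin interval contains a target bin."""
    return {k for k, e in enumerate(index)
            if e is not None and e[0] <= hi_target and lo_target <= e[1]}


# Hypothetical index: one (lowest bin, highest bin) pair per time interval.
index = [(1, 4), (2, 4), (3, 3), None, (0, 1)]
all_bins = set(range(len(index)))

a = matching_bins(index, 0, 1)  # criterion A mapped to target bins 0..1
b = matching_bins(index, 4, 5)  # criterion B mapped to target bins 4..5

a_or_b = a | b          # joined (logical "OR")
a_and_b = a & b         # intersected (logical "AND")
not_a = all_bins - a    # negated (logical "NOT")
```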

According to another variant, a number of target binary bin numbers may be determined based on the logical expression. The index may then be traversed by comparing said target binary bin numbers with the lowest and highest binary bin numbers of each compressed flat histogram.

Thereby, a single traversal of the index may be sufficient to provide the response to the logical expression, thus further improving the search speed.

That is, the proposed method may advantageously support complex and sophisticated search requests.

According to a further embodiment, in step a), the numeric time series is acquired from a sensor installed in an industrial facility.

The numeric time series may be acquired directly from the sensor. Alternatively, the numeric time series may be acquired and stored in a data warehouse or cloud, and acquiring the numeric time series in step a) may comprise reading the numeric time series, portion-by-portion, from the data warehouse or cloud.

According to a further embodiment, the plurality of discrete readings of the physical quantity are subjected to dead-banding prior to being acquired as the numeric time series.

Dead-banding may refer to discarding a respective discrete reading if the discrete reading does not deviate from the last non-discarded discrete reading by more than a predetermined threshold. The predetermined threshold may be an absolute or relative value. Thereby, an amount of data that is acquired when creating the index and an amount of time required to create the index may be advantageously reduced.
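Dead-banding with an absolute threshold may be sketched as follows. The function name and the sample readings are assumptions, as is the choice to always keep the first reading.

```python
def dead_band(readings, threshold):
    """Discard readings that do not deviate from the last kept reading
    by more than an absolute threshold."""
    kept = []
    for t, v in readings:
        # The first reading is always kept (an assumption of this sketch).
        if not kept or abs(v - kept[-1][1]) > threshold:
            kept.append((t, v))
    return kept


# Hypothetical raw readings as (time, value) pairs.
raw = [(0, 10.0), (1, 10.2), (2, 10.4), (3, 11.5), (4, 11.6), (5, 9.0)]
filtered = dead_band(raw, threshold=1.0)
```

Note that each comparison is made against the last non-discarded reading rather than the immediately preceding one, so a slow drift is still captured once it exceeds the threshold.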

Any embodiment of the first aspect may be combined with any other embodiment of the first aspect to obtain another embodiment of the first aspect.

According to a further aspect, a computer program product comprises a program code for executing the above-described method for performing a range search in numeric time series data, when run on at least one computer.

A computer program product, such as a computer program means, may be embodied as a memory card, USB stick, CD-ROM, DVD or as a file, which may be downloaded from a server in a network. For example, such a file may be provided by transferring the file comprising the computer program product from a wireless communication network.

According to a further aspect, a computerized device for performing a range search in numeric time series data comprises at least one processing unit and a storage unit and further comprises: a) a first entity configured to acquire, at least temporarily, a numeric time series including a plurality of discrete readings of a physical quantity associated with time; b) a second entity configured to process the acquired numeric time series to derive an index from the acquired numeric time series; c) a third entity configured to store the index; d) a fourth entity configured to receive a search request including an amplitude range criterion; e) a fifth entity configured to access the index stored in the storage unit to determine a time range within which the physical quantity has matched the amplitude range criterion; and f) a sixth entity configured to output the determined time range.

The embodiments and features described with reference to the method of the present invention apply mutatis mutandis to the computerized device of the present invention. Specifically, the computerized device of the present invention may be implemented to carry out the method of the present invention. The respective entity, e.g. the at least one processing unit, the storage unit and/or the first to fifth entity, may be implemented in hardware and/or in software. If said entity is implemented in hardware, it may be embodied as a device, e.g. as a computer or as a processor or as a part of a system, e.g. a computer system. If said entity is implemented in software it may be embodied as a computer program product, as a function, as a routine, as a program code or as an executable object.

Further possible implementations or alternative solutions of the invention also encompass combinations - that are not explicitly mentioned herein - of features described above or below with regard to the embodiments. The person skilled in the art may also add individual or isolated aspects and features to the most basic form of the invention.

Further embodiments, features and advantages of the present invention will become apparent from the subsequent description and dependent claims, taken in conjunction with the accompanying drawings, in which:

Fig. 1 shows a flow chart illustrating steps of a method according to one exemplary embodiment;

Fig. 2 shows a block diagram of a computerized device according to the one exemplary embodiment;

Fig. 3 shows a diagram illustrating a graph of a physical quantity and an exemplary numeric time series indicative of the physical quantity;

Fig. 4 illustrates a quantization of the numeric time series illustrated in Fig. 3 in time-amplitude space;

Fig. 5 shows another graph of a physical quantity, a corresponding numeric time series, corresponding flat histograms and an interim state of corresponding compressed flat histograms while the index is being derived;

Fig. 6 shows a block diagram of a computerized device according to another exemplary embodiment, an industrial facility and further periphery;

Fig. 7 shows a flow chart illustrating steps of a method according to the other exemplary embodiment;

Fig. 8 illustrates compressed flat histograms traversed during a range search according to an exemplary use case; and

Fig. 9 shows a table illustrating numbers of available and traversed compressed flat histograms according to the exemplary use case.

In the Figures, like reference numerals designate like or functionally equivalent elements, unless otherwise indicated.

Fig. 1 shows a flow chart illustrating steps of a method, and Fig. 2 shows a block diagram of a computerized device 1 according to one exemplary embodiment. Reference will now be made to Fig. 1 and Fig. 2.

The computerized device of the exemplary embodiment will be referred to as range search device 1. The range search device 1 comprises a crawler 10 (example of a first entity), an indexer 20 (example of a second and third entity), a storage unit 30, a numeric search engine 50 (example of a fifth entity) and a serving layer API (Application Programming Interface) 40, 60 comprising a receiving section 40 (fourth entity) and a transmitting section 60 (sixth entity).

In step S10, the crawler 10 temporarily acquires a numeric time series including a plurality of discrete readings associated with time. A respective discrete reading is indicative of a physical quantity, such as for example a temperature or a pressure. In step S20, the indexer 20 processes the acquired numeric time series and thereby derives an index from the plurality of discrete readings associated with time that are included in the numeric time series. In step S30, the indexer 20 stores the index in the storage unit 30.

In step S40, the receiving section 40 of the serving layer API 40, 60 receives a search request. The search request comprises an amplitude range criterion. The receiving section 40 transmits the amplitude range criterion to the numeric search engine 50. In step S50, the numeric search engine 50 accesses the index stored in the storage unit 30 based on the amplitude range criterion. More specifically, the numeric search engine 50 traverses the index stored in the storage unit 30 to determine one or more time ranges for which the physical quantity has matched the amplitude range criterion ("matching time ranges").

In particular, the numeric search engine 50 may leverage an assumption about a behavior of the physical quantity such as its steadiness. For example, if the index stored in the storage unit 30 includes an indication that, for a given time range, the numeric time series included both a discrete reading within an amplitude range that is higher than an amplitude range specified by the amplitude range criterion and a discrete reading within an amplitude range that is lower than the amplitude range specified by the amplitude range criterion, the numeric search engine 50 may determine the given time range as a matching time range based on the assumption that the physical quantity has crossed the specified amplitude range while steadily transitioning from the higher amplitude range to the lower amplitude range or vice versa.

In step S60, the one or more matching time ranges determined in step S50 are output by the transmitting section 60 of the serving layer API 40, 60 as a response to the search request.

Thus, a response to the search request may be advantageously determined by performing a fast index-based range search. The index-based range search does not use the numeric time series. There is therefore no need for the numeric time series, which may comprise huge amounts of data, to be permanently stored in and/or to be transferred to the range search device 1.

Thus, according to a preferred variant, the acquired numeric time series is discarded after step S30. In other words, no portion of the numeric time series is permanently stored in the range search device 1. According to a further variant, the serving layer API 40, 60 may be configured to receive a request, such as the search request, via a wired or wireless network and to transmit a response, such as the matching time range, via the wired or wireless network.

According to a further variant, each of the crawler 10, the indexer 20, the numeric search engine 50 and the serving layer API 40, 60 may be embodied by a computer program product stored in a memory (not shown) and executed on a processor (not shown) of the computerized device 1. According to yet another variant, some or all of the entities 10, 20, 50, 40, 60 may be embodied in hardware.

Further exemplary embodiments with additional details will be described below. The further exemplary embodiments are based on the exemplary embodiment described above.

Fig. 3 shows a graph 100 indicating an amplitude of a physical quantity. Merely as an example, the physical quantity is a temperature. Fig. 3 further illustrates a numeric time series comprising, merely as an illustrative example, seven discrete readings R1 to R7 or samplings of the physical quantity.

In particular, the numeric time series illustrated in Fig. 3 is a non-equidistant numeric time series and comprises the temperature readings R1 (T=187.5°C at time t=5:30), R2 (T=87.5°C at time t=5:50), R3 (T=72°C at time t=5:55), R4 (T=62.5°C at time t=6:15), R5 (T=60°C at time t=6:55), R6 (T=70°C at time t=7:30) and R7 (T=120°C at time t=7:50).

It is noted that the graph 100 of the physical quantity is steady in between any two readings R1-R7. More particularly, the graph 100 of the physical quantity is well-behaved. For the graph 100 shown in Fig. 3, "well-behavedness" may mean that there are no local maxima or minima between any two readings R1-R7. However, "well-behavedness" may also mean that if there are any local maxima or minima between any two discrete readings R1-R7, then the local minima are no lower than the lower of the two discrete readings R1-R7, and the local maxima are no higher than the higher of the two discrete readings R1-R7. As will become evident in the following description, the proposed method and range search device (1 in Fig. 2) leverage the assumption that the physical quantity, a graph thereof and/or the numeric time series sampled therefrom, are steady and well-behaved.

Fig. 4 illustrates a quantization of the numeric time series illustrated in Fig. 3 in time-amplitude space. More specifically, the numeric time series is shown as being quantized into a plurality of flat histograms h5, h6, h7 each including a plurality of binary bins b0-b7.

By way of example, a predetermined time resolution (time interval width) is selected to be 1 hour, and a predetermined amplitude resolution (amplitude interval width) is selected to be 25°C.

Each of the binary bins b0-b7 of each of the flat histograms h5-h7 comprises a flag indicating either non-occurrence (shown as an empty rectangle) or occurrence (shown as a hatched rectangle) of a discrete reading in a temperature amplitude interval associated with the respective binary bin b0-b7 and a time interval associated with the respective flat histogram h5-h7. A flag indicating non-occurrence is also referred to as a flag in a first state; and a flag indicating occurrence is also referred to as a flag in a second state.

Specifically, binary bin b7 of flat histogram h5, which is associated with the amplitude interval T=175°C - 200°C and the time interval t=5:00-6:00, comprises a flag in the second state indicating occurrence of the discrete reading R1. Binary bin b3 of flat histogram h5, which is associated with the amplitude interval T=75°C - 100°C and the time interval t=5:00-6:00, comprises a flag in the second state indicating occurrence of the discrete reading R2. Binary bin b2 of flat histogram h5, which is associated with the amplitude interval T=50°C - 75°C and the time interval t=5:00-6:00, comprises a flag in the second state indicating occurrence of the discrete reading R3. Binary bin b2 of flat histogram h6, which is associated with the amplitude interval T=50°C - 75°C and the time interval t=6:00-7:00, comprises a flag in the second state indicating occurrence of the discrete readings R4 and R5.

Binary bins b2 and b4 of flat histogram h7, which are associated with the time interval t=7:00-8:00 and respective amplitude intervals T=50°C - 75°C and T=100°C - 125°C, comprise respective flags in the second state indicating occurrence of the discrete readings R6 and R7, respectively.

All other binary bins comprise a flag in the first state indicating no occurrence of any matching discrete reading.

It will be appreciated that the flat histograms h5 to h7 that are visualized in Fig. 4 may be represented as binary values, wherein a binary zero may indicate the first state (no matching reading) and a binary one may indicate the second state (at least one matching reading). The binary representations of flat histograms h5 to h7 may thus be written as follows:

      b0  b1  b2  b3  b4  b5  b6  b7
h5:    0   0   1   1   0   0   0   1
h6:    0   0   1   0   0   0   0   0
h7:    0   0   1   0   1   0   0   0

(Table 1)

It is noted that in the present example, 8 bits per flat histogram would be required to store the flat histograms shown in Table 1 in the storage unit 30 (Fig. 2).
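The quantization of Fig. 4 can be reproduced with a short sketch. The function name `build_flat_histograms` and the data layout are hypothetical. The sketch also assumes that each amplitude interval is half-open (lower bound included, upper bound excluded), which the description does not specify.

```python
# Readings R1..R7 of Fig. 3 as (hour, minute, temperature in deg C).
READINGS = [
    (5, 30, 187.5), (5, 50, 87.5), (5, 55, 72.0), (6, 15, 62.5),
    (6, 55, 60.0), (7, 30, 70.0), (7, 50, 120.0),
]
BIN_WIDTH = 25.0  # predetermined amplitude resolution in deg C
NUM_BINS = 8      # binary bins b0..b7 cover 0..200 deg C


def build_flat_histograms(readings):
    """Map each hour to a list of 0/1 flags, one flag per amplitude bin."""
    histograms = {}
    for hour, _minute, temp in readings:
        flags = histograms.setdefault(hour, [0] * NUM_BINS)
        # Assumption: half-open intervals; the top edge is clamped into b7.
        bin_no = min(int(temp // BIN_WIDTH), NUM_BINS - 1)
        flags[bin_no] = 1
    return histograms


flat = build_flat_histograms(READINGS)
for hour in sorted(flat):
    print(f"h{hour}:", *flat[hour])
```

With the readings of Fig. 3, the printed rows agree with Table 1.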

It will be appreciated that the flat histograms shown in Table 1 constitute a quantization of the numeric time series in time-amplitude space and contain sufficient information to provide a response to a range search request. That is, for example, if a search request "T>=75°C AND T<=100°C" is received, the numeric search engine 50 (Fig. 2) could check the above three flat histograms h5, h6, h7 to see that the physical quantity was in this amplitude interval in a time interval in which bin b3 (Fig. 4) is one. This is the case for the histogram h5, which is associated with the time interval from t=5:00 to t=6:00. However, it is not readily visible from the flat histograms of Table 1 that the physical quantity has also passed through the amplitude range between T=75°C and T=100°C in the time interval between t=7:00 and t=8:00.

The inventors of the present invention have realized that in a case where the numeric time series is indicative of a physical quantity such as a temperature, it may be beneficial to assume that the physical quantity, while transitioning from one discrete reading R6 to another discrete reading R7, has passed through all intermediate amplitude intervals that are in between the amplitude interval of the one discrete reading R6 and the amplitude interval of the other discrete reading R7.

Therefore, according to the present exemplary embodiment, it is proposed to use, as the index, a more beneficial representation of the quantization of the numeric time series. Specifically, a lossy compressed representation of the quantization of the numeric time series is used as the index. More specifically, the lossy compressed representation of the quantization is a series of compressed flat histograms, one for each time interval, as will be described below.

More particularly, a respective compressed flat histogram does not include the values of the individual binary bins b0-b7, but rather includes an indication of a lowest amplitude interval and an indication of a highest amplitude interval in which a discrete reading R1-R7 is observed in the corresponding time interval. Herein, the indication of the lowest amplitude interval may be the number of the lowest binary bin and the indication of the highest amplitude interval may be the number of the highest binary bin in the second state.

For example, a compressed flat histogram ch5 (Table 2 below) corresponding to the flat histogram h5 (Fig. 4 and Table 1) may include "2" as the lowest and "7" as the highest binary bin number. Table 2 shows decimal and binary representations of compressed flat histograms ch5, ch6 and ch7, which correspond to the flat histograms h5, h6 and h7 of Fig. 4 and Table 1:

       decimal    binary
ch5:   2  7       010 111
ch6:   2  2       010 010
ch7:   2  4       010 100

(Table 2)

It is noted that, when creating the compressed flat histograms ch5 to ch7, some information is lost. For example, the information about the discrete reading R2 is not included in the compressed flat histogram ch5. Therefore, the representation shown in Table 2 is called a lossy compressed representation of the quantization shown in Table 1.
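As a minimal sketch of this compression step, each row of Table 1 can be reduced to its lowest and highest set bin. The helper names `compress` and `to_bits` are hypothetical and only serve the illustration.

```python
TABLE_1 = {
    "h5": [0, 0, 1, 1, 0, 0, 0, 1],
    "h6": [0, 0, 1, 0, 0, 0, 0, 0],
    "h7": [0, 0, 1, 0, 1, 0, 0, 0],
}


def compress(flags):
    """Return (lowest, highest) set-bin numbers, or None if no bin is set."""
    set_bins = [i for i, flag in enumerate(flags) if flag]
    if not set_bins:
        return None
    return set_bins[0], set_bins[-1]


def to_bits(pair, width=3):
    """Render a (lowest, highest) pair as two fixed-width binary numbers."""
    low, high = pair
    return f"{low:0{width}b} {high:0{width}b}"


for name, flags in TABLE_1.items():
    pair = compress(flags)
    print(f"c{name}: {pair[0]} {pair[1]}   {to_bits(pair)}")
```

The set flag in bin b3 of h5 (reading R2) leaves no trace in the pair (2, 7), which is the information loss noted above.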

It is further noted that only 6 bits are required to represent each compressed flat histogram ch5-ch7. That is, when forming the index from the compressed flat histograms ch5, ch6, ch7, its size is 75 % of the size of a hypothetical uncompressed index formed from the flat histograms h5, h6, h7.
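The size ratio can be checked for other bin counts as well. The sketch below assumes that each of the two bin numbers is stored in the smallest whole number of bits able to address every bin; the function name `compressed_bits` is hypothetical.

```python
def compressed_bits(num_bins):
    """Bits for two bin numbers, each wide enough to address num_bins bins."""
    bits_per_number = (num_bins - 1).bit_length()
    return 2 * bits_per_number


for n in (8, 256, 2048):
    ratio = compressed_bits(n) / n
    print(f"{n} bins: {compressed_bits(n)} bits, {ratio:.2%} of uncompressed")
```

For 8 bins this gives 6 bits, i.e. the 75 % stated above; the ratio shrinks quickly as the number of bins grows.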

It is further noted that the fraction of storage space saved by using compressed flat histograms rather than uncompressed flat histograms grows with the predetermined amplitude precision, because the size of a compressed flat histogram increases only logarithmically with the number of bins. That is, when the amplitude space of the numeric time series is quantized into 256 bins, each flat histogram requires 256 bits, whereas a corresponding compressed flat histogram created according to the principle disclosed above only requires 16 bits to store two 8-bit binary bin numbers, and the resulting index is compressed to 16/256 = 6.25 % of the size of an uncompressed index formed from the original flat histograms. When the numeric time series is quantized into 2048 bins, a flat histogram requires 2048 bits, whereas a corresponding compressed flat histogram requires 22 bits, and the resulting index is 22/2048 = 1.07 % of the size of an uncompressed index.

With reference to Table 2 and Figs. 1 to 4, it is noted that there are at least two alternative modes of operation of the indexer 20 that may be used in step S20 to derive an index, such as the index shown in Table 2.

That is, according to one embodiment, the indexer 20 may actually, at least temporarily, create and store the corresponding flat histograms h5-h7 (Fig. 4, Table 1) in the storage unit 30. The indexer 20 may consume the numeric time series from the crawler 10 and fill the flat histograms h5-h7 in accordance with the discrete readings of the numeric time series. In this way, an uncompressed index as shown in Table 1 may be created and stored in the storage unit 30. After the flat histograms h5-h7 have been created and filled in this way, the indexer 20 may proceed to compress the flat histograms h5-h7 to arrive at the compressed flat histograms ch5-ch7 shown in Table 2. The indexer 20 may then form the lossy compressed index from the compressed flat histograms ch5-ch7 and store the lossy compressed index in the storage unit 30. After that, the indexer 20 may discard the flat histograms h5-h7 stored in the storage unit 30, as they are no longer required.

However, according to another embodiment, a necessity to temporarily create and store the flat histograms h5-h7 in the storage unit 30 may be avoided. Specifically, the indexer 20 may directly create the lossy compressed flat histograms ch5-ch7 shown in Table 2 and store them in the storage unit 30. Initially, for each compressed flat histogram ch5-ch7, the indexer 20 may set the lowest binary bin number to a value of INFINITY, or to a highest possible value, and may set the highest binary bin number to a value of -1 or to a lowest possible value, thereby marking the compressed flat histogram ch5-ch7 as uninitialized. Then, the indexer 20 may proceed to consume the discrete readings of the numeric time series. For each discrete reading, the indexer 20 may determine one of the compressed flat histograms ch5-ch7 according to the time associated with the discrete reading, and may further determine a binary bin number according to the amplitude (value) of the discrete reading. If the lowest binary bin number of the determined compressed flat histogram ch5-ch7 is higher than the determined binary bin number, the lowest binary bin number of the determined compressed flat histogram ch5-ch7 is set to the determined binary bin number. Likewise, if the highest binary bin number of the determined compressed flat histogram ch5-ch7 is lower than the determined binary bin number, the highest binary bin number of the determined compressed flat histogram ch5-ch7 is set to the determined binary bin number. In this way, the compressed index may be derived directly from the numeric time series.
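The direct derivation described above can be sketched as follows. The function name, the dictionary layout keyed by hour and the half-open amplitude intervals are assumptions made for the illustration; the INFINITY / -1 initialization follows the description.

```python
import math

UNINITIALIZED = (math.inf, -1)  # lowest = INFINITY, highest = -1


def derive_compressed_index(hours, readings, bin_width=25.0, num_bins=8):
    """Build {hour: (lowest_bin, highest_bin)} without any flat histograms."""
    index = {hour: UNINITIALIZED for hour in hours}
    for hour, temp in readings:
        # Assumption: half-open amplitude intervals, top edge clamped.
        bin_no = min(int(temp // bin_width), num_bins - 1)
        low, high = index[hour]
        if low > bin_no:
            low = bin_no
        if high < bin_no:
            high = bin_no
        index[hour] = (low, high)
    return index


# Readings R1..R7 of Fig. 3 as (hour, temperature in deg C).
READINGS = [(5, 187.5), (5, 87.5), (5, 72.0), (6, 62.5),
            (6, 60.0), (7, 70.0), (7, 120.0)]
index = derive_compressed_index([5, 6, 7], READINGS)
print(index)
```

The result agrees with the decimal column of Table 2, and only two numbers per time interval are held in memory at any time.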

It is noted that, according to some embodiments, the numeric time series may be subjected to dead-banding during or prior to acquisition. Dead-banding may be advantageously used to reduce the storage space required to acquire and store the numeric time series.

Fig. 5 shows another graph 100 of a physical quantity, a corresponding numeric time series R1-R14, corresponding flat histograms h1 to h8 and an interim state of corresponding compressed flat histograms ch1 to ch8 while the index is being derived.

The numeric time series has been subjected to dead-banding during acquisition by the crawler 10 (Fig. 2). As a result, the numeric time series only comprises the discrete readings R1 to R4 and R10 to R14, but does not comprise discrete readings R5 to R9, which have been removed by the dead-banding. Therefore, the flat histograms h3 to h6 for the dead-banded time intervals are empty. Thus, the corresponding compressed flat histograms ch3 to ch6 are still uninitialized (visualized as "INF;-1") after the numeric time series has been processed, or consumed, by the indexer 20. Therefore, for each compressed flat histogram ch3-ch6 that is uninitialized after the numeric time series has been processed, and/or for each flat histogram h3-h6 all bins of which are empty after the numeric time series has been processed, the indexer 20 (Fig. 2) may preferably perform the following step: The indexer 20 (Fig. 2) sets the lowest and highest binary bin number of the respective uninitialized compressed flat histogram ch3-ch6 to binary bin numbers corresponding to the amplitudes of the latest discrete reading R4 in the numeric time series that comes before the dead-banded time interval and of the earliest discrete reading R10 in the numeric time series that comes after the dead-banded time interval. That is, in the example shown in Fig. 5, each of the compressed flat histograms ch3-ch6 is set to "3; 3".
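A possible sketch of this gap-filling step is given below. It assumes that, when the readings before and after the gap fall into different bins, the lower of the two becomes the lowest and the higher becomes the highest bin number; the description only covers the case where both fall into bin 3. The bin numbers of all readings other than R4 and R10 are made up for the illustration and are not taken from Fig. 5.

```python
def build_and_fill(num_intervals, readings):
    """readings: time-ordered (interval_no, bin_no) pairs, intervals 1-based."""
    index = {i: None for i in range(1, num_intervals + 1)}  # None = "INF;-1"
    for interval, bin_no in readings:
        current = index[interval]
        if current is None:
            index[interval] = (bin_no, bin_no)
        else:
            index[interval] = (min(current[0], bin_no), max(current[1], bin_no))
    for i in index:
        if index[i] is None:
            before = [b for iv, b in readings if iv < i]
            after = [b for iv, b in readings if iv > i]
            if before and after:
                # Latest reading before and earliest reading after the gap.
                index[i] = (min(before[-1], after[0]), max(before[-1], after[0]))
    return index


# R1-R4 in intervals 1-2, R10-R14 in intervals 7-8; R4 and R10 lie in bin 3.
READINGS = [(1, 5), (1, 4), (2, 4), (2, 3),
            (7, 3), (7, 4), (8, 4), (8, 5), (8, 6)]
filled = build_and_fill(8, READINGS)
print(filled)
```

Intervals without any reading on one side of the gap are left uninitialized in this sketch, since the description does not say how they should be treated.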

Now, again with reference to Table 2 and Figs. 1 to 4, and according to one exemplary embodiment, an operation of the numeric search engine 50 to determine the matching time range in step S50 is described.

Let us assume that in step S40, a search request is received that includes an amplitude range criterion that specifies an amplitude interval between T=75°C and T=100°C.

Then, in step S50, the numeric search engine 50 determines one or more target binary bin numbers that correspond to the searched amplitude interval. In this case, the numeric search engine 50 translates the search request into the target binary bin number 3 (b3, Fig. 4). Then, the numeric search engine 50 traverses the compressed index (Table 2) stored in the storage unit 30. Herein, the numeric search engine 50 checks each of the compressed flat histograms ch5, ch6, ch7 to determine whether the target binary bin number 3 is larger than or equal to the lowest binary bin number and smaller than or equal to the highest binary bin number of the respective compressed flat histogram ch5, ch6, ch7. With reference to Table 2, this is the case for the compressed flat histograms ch5 and ch7. The compressed flat histogram ch5 is associated with the time interval t=5:00-6:00. The compressed flat histogram ch7 is associated with the time interval t=7:00-8:00. Therefore, in step S50, the numeric search engine 50 determines the time intervals t=5:00-6:00 and t=7:00-8:00 as respective matching time ranges to be transmitted as a response to the search request in step S60.
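The traversal in step S50 amounts to an interval-overlap test per compressed flat histogram. The following sketch uses the values of Table 2; the function name `search` and the string keys are hypothetical, and the target is passed as a bin number range so that a single bin and several adjacent bins are handled alike.

```python
# Compressed index of Table 2: time interval -> (lowest bin, highest bin).
INDEX = {
    "5:00-6:00": (2, 7),  # ch5
    "6:00-7:00": (2, 2),  # ch6
    "7:00-8:00": (2, 4),  # ch7
}


def search(index, target_low, target_high):
    """Return the time intervals whose bin span overlaps the target bins."""
    return [
        interval
        for interval, (low, high) in index.items()
        if low <= target_high and target_low <= high
    ]


# Amplitude interval 75..100 deg C corresponds to target bin 3.
print(search(INDEX, 3, 3))
```

The interval t=7:00-8:00 is reported although Table 1 holds no set flag in bin b3 for it, which reflects the assumption that the physical quantity passes through every bin between the lowest and the highest observed one.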

Several benefits of using a compressed index in the manner described above may be identified. That is: 1) The size of the compressed index may be reduced significantly through the lossy compression technique described above. 2) That notwithstanding, all time ranges in which the physical quantity has matched the amplitude range criterion were successfully determined. This also includes the time interval t=7:00-8:00, in which the numeric time series does not include any discrete reading in the amplitude range of T=75°C-100°C, but the physical quantity has nonetheless traversed the amplitude range of T=75°C-100°C. 3) Even though compressed flat histograms were used, no de-compression (re-creation of the original flat histograms h5-h7) is required when the numeric search engine 50 traverses the index. The numeric search engine 50 is able to directly access the compressed flat histograms ch5-ch7 and retrieve the required information therefrom.

According to one embodiment, and with further reference to Figs. 1 to 4 and Tables 1 and 2, the range search device 1 may be configured to support a search request that includes a logical expression.

For example, the receiving section 40 of the serving layer API of the range search device 1 may receive a search request that includes an amplitude range criterion including a number of logical expressions such as "T>=175°C OR (T>=100°C AND T<=125°C)". In response to such a search request, the range search device 1 may respond with two matching time ranges t1=5:00-6:00 and t2=7:00-8:00.

Specifically, the receiving section 40 may divide the search request into a first amplitude range criterion "T>=175°C" and a second amplitude range criterion "T>=100°C AND T<=125°C". The numeric search engine 50 may determine the time range t1=5:00-6:00 in response to the first amplitude range criterion, and may determine the time range t2=7:00-8:00 in response to the second amplitude range criterion. The numeric search engine 50 may further determine that the time ranges t1 and t2 are noncontiguous. The numeric search engine 50 may join the time ranges t1 and t2 in response to the logical "OR" operator. Herein, the numeric search engine 50 may refrain from combining the time ranges t1 and t2 into a single time range ts=5:00-8:00, and may rather supply the plurality of noncontiguous time ranges t1=5:00-6:00 and t2=7:00-8:00 to the transmitting section 60 of the serving layer API of the range search device 1 as a result of said joining. Specifically, the response may exclude a time from 6:00 to 7:00, for which the physical quantity is known not to have matched the amplitude range criterion of the search request.

Conversely, for a search request such as "T>=50°C AND T<=75°C", the numeric search engine 50 may determine that the time intervals t3=5:00-6:00, t4=6:00-7:00 and t5=7:00-8:00 are contiguous, and may merge the time intervals t3-t5 into a single matching time range t=5:00-8:00.

That is, the range search device 1 may respond to a range search request with a non-contiguous and non-overlapping plurality of smallest time ranges for which the physical quantity is known, based on the index, to have matched the search request (amplitude range criterion and/or logical expression including a plurality of amplitude range criteria).
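The joining and merging behavior can be sketched as a set union followed by a merge of adjacent hours. The helper names are hypothetical, the "OR" operator is modeled as a union of matching hours, and each amplitude range criterion is assumed to have been translated into a bin number range beforehand.

```python
# Compressed index of Table 2 as (start hour, (lowest bin, highest bin)).
INDEX = [(5, (2, 7)), (6, (2, 2)), (7, (2, 4))]


def matching_hours(index, target_low, target_high):
    """Start hours of all intervals whose bin span overlaps the target bins."""
    return {
        hour
        for hour, (low, high) in index
        if low <= target_high and target_low <= high
    }


def merge_contiguous(hours):
    """Merge adjacent one-hour intervals into (start, end) time ranges."""
    ranges = []
    for hour in sorted(hours):
        if ranges and ranges[-1][1] == hour:
            ranges[-1] = (ranges[-1][0], hour + 1)
        else:
            ranges.append((hour, hour + 1))
    return ranges


# "T>=175 OR (T>=100 AND T<=125)": bin 7 OR bin 4, joined by set union.
or_result = merge_contiguous(
    matching_hours(INDEX, 7, 7) | matching_hours(INDEX, 4, 4)
)
# "T>=50 AND T<=75": bin 2 only.
and_result = merge_contiguous(matching_hours(INDEX, 2, 2))
print(or_result, and_result)
```

The first request yields two separate ranges (5:00-6:00 and 7:00-8:00), the second a single merged range (5:00-8:00).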

It is noted that the range search device 1 (numeric search engine 50) may determine the one or more matching time ranges described above with a small amount of processing, by referring to a mere total of 18 bits of data, i.e. the compressed flat histograms ch5-ch7 shown in Table 2, and without having to access the numeric time series itself.

Another exemplary embodiment is now described with reference to Fig. 6.

Fig. 6 shows a block diagram of a range search device 1 according to the present exemplary embodiment, an industrial facility 2 and further periphery.

Specifically, Fig. 6 shows a gas turbine 2 (example of a portion of an industrial facility), a data warehouse 11, the range search device 1 and a personal computer 4. The range search device 1 of Fig. 6 comprises elements already described in connection with the range search device 1 of Fig. 2, and further comprises a cache memory 70 to be described later. The gas turbine 2 is equipped with a sensor 3. The sensor 3 supplies a signal, such as an analog or a digital signal, indicative of amplitude values of a physical quantity, such as temperature values or the like. The sensor signal is sampled, and the samplings are stored, in the data warehouse 11, as discrete readings in association with time that form a numeric time series. Over time, a large amount, such as terabytes or petabytes of data, is accumulated in the numeric time series stored in the data warehouse 11. The data warehouse 11 may be implemented as a cloud storage, as a Hadoop HDFS or Hive file system, or as a centralized server farm.

A user of the personal computer 4 may be a technician who wants to perform offline analysis of the gas turbine 2. For example, the technician may want to know in which time ranges a certain operating condition existed in the gas turbine 2. For example, the technician may want to know during which time ranges a temperature (physical quantity) measured by the sensor 3 exceeded a predetermined threshold such as 195 °C.

However, the numeric time series is stored in the data warehouse 11, and performing a linear search for corresponding readings in terabytes of data stored in the data warehouse 11 may be prohibitively slow and costly.

Therefore, the user of the personal computer 4 may use client software, such as a web client, which is installed on the personal computer 4, to wirelessly transmit a range search request including an amplitude range criterion, such as "T>=195°C", to the range search device 1.

The range search device 1 may respond to the range search request with a response indicating one or more matching time ranges.

For example, the web client may be a web browser displaying a web page that includes JavaScript code. For example, sending the range search request and responding to the range search request may involve communication between the web client and the range search device according to a Representational State Transfer API or REST API. As has been discussed hereinabove, the time range outputted by the range search device 1 may be precise up to a predetermined amplitude resolution and a predetermined time resolution.

In response to receiving the response from the range search device 1, a computer program implemented on the personal computer 4 may access the data warehouse 11 and request precise readings and precise times only for the matching time ranges included in the response from the range search device 1.

Thereby, a user may be provided with precise readings and precise times of interest while, advantageously, less data is requested from the data warehouse 11, less data is transferred, and a cost incurred while transferring said data from the data warehouse 11 is reduced.

Preferential details of the mode of operation of the range search device 1 according to one exemplary embodiment will now be discussed with reference to Fig. 6 and Fig. 7.

Fig. 7 shows a flow chart illustrating steps of the method according to the present exemplary embodiment.

Specifically, an amount of data occupied by the numeric time series stored in the data warehouse 11 may be larger than a capacity of the storage unit 30 of the computerized range search device 1. Therefore, the range search device 1 of Fig. 6 may be configured to execute steps S10, S20 and S30 in loops (see Fig. 7). Specifically, in step S10, for each loop, the crawler 10 acquires a different portion of the numeric time series stored in the data warehouse 11. In steps S20 and S30, for the first loop, the indexer 20 creates and stores the index in the storage unit 30 based on the acquired portion of the numeric time series. For each following loop, the indexer 20 updates the index stored in the storage unit 30 based on the respective acquired portion of the numeric time series.

Herein, updating the index may comprise adjusting the lowest and highest binary bin numbers of the compressed flat histograms according to the newly acquired discrete readings. Thereby, advantageously, the crawler 10 and the indexer 20 may build the index stored in the storage unit 30 step by step without having to acquire the numeric time series in its entirety. A less costly low priority communication link may be used for building the index over a certain amount of time.
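The looped execution of steps S10 to S30 can be sketched as an in-place update of the stored index, one acquired portion at a time. The function name `update_index` and the split of the readings of Fig. 3 into three portions are assumptions for the illustration.

```python
def update_index(index, portion, bin_width=25.0, num_bins=8):
    """Widen the (lowest, highest) pair of every hour touched by a portion."""
    for hour, temp in portion:
        bin_no = min(int(temp // bin_width), num_bins - 1)
        low, high = index.get(hour, (bin_no, bin_no))
        index[hour] = (min(low, bin_no), max(high, bin_no))
    return index


# Readings R1..R7 of Fig. 3, acquired in three separate loops.
PORTIONS = [
    [(5, 187.5), (5, 87.5)],              # loop 1: R1, R2
    [(5, 72.0), (6, 62.5), (6, 60.0)],    # loop 2: R3, R4, R5
    [(7, 70.0), (7, 120.0)],              # loop 3: R6, R7
]

index = {}
for portion in PORTIONS:
    update_index(index, portion)  # each portion can be discarded afterwards
print(index)
```

Because each update only takes a minimum and a maximum, the final index does not depend on how the numeric time series is split into portions.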

When the index is built, the range search device 1 is ready to respond to search requests. Specifically, the receiving section 40 may wirelessly receive a search request in step S40, the numeric search engine 50 may determine one or more matching time ranges in step S50, and the transmitting section 60 may wirelessly transmit a response including the one or more matching time ranges in step S60.

Steps S40, S50 and S60 may also be executed in loops, i.e., steps S40, S50 and S60 may be executed every time a search request is received by the receiving section 40.

The range search device 1 of Fig. 6 is preferably provided with a cache memory 70, such as a random access memory (RAM) or the like, to further reduce a response time. More preferably, a copy of the index may be stored in the cache memory 70, and the numeric search engine 50 may serve responses to incoming search requests directly from the copy of the index stored in the cache memory 70.

After the index has been built as described above, further discrete readings may be added to the numeric time series stored in the data warehouse 11 over time. Therefore, the range search device 1 may continue to perform steps S10, S20, S30 in loops in predetermined intervals after the index has been initially built. That is, the crawler 10 may continue crawling for live data. When new discrete readings (a new portion of the numeric time series) are detected by the crawler 10, the crawler 10 acquires the new portion of the numeric time series (step S10), the indexer 20 updates the index stored in the storage unit 30 (steps S20 and S30), and the cache memory 70 invalidates its contents. Thereby, the index may be kept up to date as new discrete readings are acquired from the gas turbine 2 over time.

An exemplary use case will now be described. Particular reference will be made to Fig. 8 and Fig. 9.

According to the exemplary use case, a range search device 1 (Fig. 2; Fig. 6) is used to first create a multi-leveled compressed index and then to perform a range search using an index of a numeric time series that includes discrete readings indicative of a temperature.

Fig. 8 illustrates compressed flat histograms traversed during a range search according to the exemplary use case. Fig. 9 shows a table illustrating numbers of available and traversed compressed flat histograms according to the exemplary use case.

In the exemplary use case, a quantization of the temperature space into 256 bins is used. In this way, each bin may cover 1/256 or 0.4% of a total amplitude range. In the exemplary use case, temperature readings are expected to be within 0°C and 200°C. That is, the 256 bin numbers are associated with amplitude ranges such that a total amplitude range covered by the quantization is from 0 to 200°C. Thereby, a predetermined amplitude resolution of +/-0.78°C may be attained.
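The bin width and the mapping between temperatures and bin numbers for this quantization can be sketched as follows. The names are hypothetical, and half-open bins with the top edge clamped into the last bin are an assumption.

```python
T_MIN, T_MAX, NUM_BINS = 0.0, 200.0, 256
BIN_WIDTH = (T_MAX - T_MIN) / NUM_BINS  # width of one bin in deg C


def bin_of(temp):
    """Bin number of a temperature; the top edge is clamped into bin 255."""
    return min(int((temp - T_MIN) // BIN_WIDTH), NUM_BINS - 1)


def bin_bounds(bin_no):
    """Lower and upper temperature bound of a bin in deg C."""
    return (T_MIN + bin_no * BIN_WIDTH, T_MIN + (bin_no + 1) * BIN_WIDTH)


print(BIN_WIDTH, bin_of(195.0), bin_bounds(bin_of(195.0)))
```

Each bin number fits into 8 bits, so one compressed flat histogram occupies two bytes.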

A numeric time series is stored in the data warehouse 11 (Fig. 6). The numeric time series comprises temperature readings associated with times from the three-year period of 2015, 2016 and 2017. It is noted that 2016 is a leap year having 366 days, while 2015 and 2017 have 365 days each, so the total time range covered is 1096 days. Let us suppose that the numeric time series comprises one discrete reading every second. Each temperature reading may be stored as a double-precision floating-point value requiring 8 bytes of storage space. A minimum amount of storage space required for storing just one numeric time series is therefore 757,555,200 bytes. The amount of storage space required for the numeric time series may be even larger if the readings are stored as textual data, XML data or the like, or are stored in association with meta data such as timestamps, status flags and the like.

A compressed index is built according to the proposed method. The compressed index comprises multiple levels of respective compressed flat histograms. A first level of compressed flat histograms h0330... is built with hourly time resolution. That is, each compressed flat histogram h033000, h033001, ... of the first level covers a time range of one hour. A second level of compressed flat histograms d03... is built with daily time resolution. A third level of compressed flat histograms m01, m02, ... is built with monthly time resolution. A fourth level of three compressed flat histograms y2015, y2016, y2017 is built with yearly time resolution. The different levels of time resolution (yearly, monthly, daily, hourly) thereby form a series of time resolutions that is considered to be a logarithmic series in the context of the present disclosure.

The column labeled "#h" in the table of Fig. 9 shows the number of compressed flat histograms created by the indexer 20 (Fig. 2; Fig. 6) on the yearly (y), monthly (m), daily (d) and hourly (h) level. The total number of compressed flat histograms comprised by the multi-level index is 27439. Each compressed flat histogram comprises 16 bits (two eight-bit binary bin numbers that can range from 0 to 255) and therefore consumes 2 bytes when stored. A total amount of storage space for storing the index is therefore 54,878 bytes, or a factor of roughly 14,000 less than the minimum storage size of the original numeric time series.
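The figures above can be re-derived with a few lines of arithmetic. The per-level counts in the sketch are computed from the calendar (3 years, 36 months, 1096 days, 24 hours per day) rather than read from Fig. 9.

```python
from datetime import date

# Days covered by the years 2015, 2016 (leap year) and 2017.
days = (date(2018, 1, 1) - date(2015, 1, 1)).days

# One 8-byte reading per second.
raw_bytes = days * 24 * 3600 * 8

# Compressed flat histograms per level: yearly, monthly, daily, hourly.
counts = {"y": 3, "m": 3 * 12, "d": days, "h": days * 24}
total_histograms = sum(counts.values())
index_bytes = total_histograms * 2  # two 8-bit bin numbers each

print(days, raw_bytes, counts, total_histograms, index_bytes)
print(round(raw_bytes / index_bytes))
```

The hourly level accounts for 26304 of the 27439 histograms, so the three coarser levels add less than 5 % to the size of the index.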

The step of determining one or more matching time ranges (step S50 in Fig. 7) will now be explained with special consideration of the logarithmic series of time resolutions.

In the exemplary use case, a search request including an amplitude range criterion such as "T>195°C" is received. The numeric search engine (50 in Fig. 6) translates the amplitude range criterion into a target binary bin number range of 251 to 255. (It is noted that, when quantizing a temperature range of 0°C - 200°C into 256 bins according to the present use case, bin number 249 corresponds to an amplitude range of T = 194.53125°C - 195.3125°C.) Reference is now made to Fig. 8 in conjunction with Fig. 6. The numeric search engine 50 continues by traversing the yearly compressed flat histograms y2015, y2016, y2017 stored as part of the index in the storage unit 30. For each compressed flat histogram y2015, y2016, y2017, the numeric search engine 50 checks whether the binary bin number range defined by the lowest and highest binary bin numbers included in the compressed flat histogram y2015, y2016, y2017 overlaps with the target binary bin number range of 251 to 255.
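The quantization and the overlap check described above may be sketched as follows. The sketch assumes equal-width bins over 0°C to 200°C, numbered 0 to 255, and a compressed flat histogram represented as a (lowest bin, highest bin) pair; all function names are hypothetical, and the sample readings are invented for illustration.

```python
T_MIN, T_MAX, N_BINS = 0.0, 200.0, 256
BIN_WIDTH = (T_MAX - T_MIN) / N_BINS  # 0.78125 degrees Celsius per bin


def bin_number(temperature):
    """Quantize a temperature into a bin number in 0..255 (clamped)."""
    index = int((temperature - T_MIN) / BIN_WIDTH)
    return min(max(index, 0), N_BINS - 1)


def bin_interval(number):
    """Return the (lower, upper) amplitude bounds of a bin."""
    return (T_MIN + number * BIN_WIDTH, T_MIN + (number + 1) * BIN_WIDTH)


def compress(readings):
    """Reduce the readings of one time interval to (lowest bin, highest bin)."""
    bins = [bin_number(r) for r in readings]
    return (min(bins), max(bins))


def overlaps(histogram, target):
    """True if the two inclusive bin number ranges share at least one bin."""
    return histogram[0] <= target[1] and target[0] <= histogram[1]


target = (251, 255)                      # target bin range from the use case
hot = compress([180.0, 197.5, 199.0])    # invented readings
cool = compress([20.0, 150.0])           # invented readings
print(bin_interval(249), hot, overlaps(hot, target), cool, overlaps(cool, target))
```

Only the two bin numbers per time interval are needed for the check, which is why a compressed flat histogram fits into 2 bytes.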

If there is no such overlap, it is determined that the numeric time series contains no discrete reading matching the amplitude range criterion in the corresponding year, and the numeric search engine 50 does not descend into the monthly, daily or hourly histograms belonging to that year.

In the example shown in Fig. 9, a match (overlap) is detected only for the yearly compressed flat histogram y2017. The numeric search engine 50 descends into the monthly compressed flat histograms m01-m12 corresponding to year 2017.

In a manner similar to the manner described with the yearly compressed flat histograms, the numeric search engine 50 identifies monthly compressed flat histogram m03 as the only monthly histogram indicating a match, and descends further down to identify daily compressed flat histogram d0330 as the only day having a match.

Finally, the numeric search engine 50 descends to the hourly level and identifies hourly compressed flat histograms h033001, h033019, h033020, h033021 and h033022 as the hourly compressed flat histograms indicating a match.

Based on the index traversal described hereinabove, the numeric search engine 50 determines the time range from 01:00 to 02:00 on March 30, 2017, and the time range from 19:00 to 23:00 on March 30, 2017, as the matching time ranges to be output by the transmitting section 60 of the serving layer API of the range search device 1.

Fig. 9 shows a table illustrating numbers of available histograms on a yearly (y), monthly (m), daily (d) and hourly (h) level (column "#h"), and numbers of histograms (column "#c") that have been checked or traversed in the exemplary processing described hereinabove.
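The step from the matching hourly histograms to the two output time ranges can be illustrated as follows. This is a sketch under the assumption that each hourly histogram key ends in a two-digit hour and stands for the interval from that hour to the next; the function name `merge_hours` is hypothetical.

```python
def merge_hours(hours):
    """Merge matching hours of one day into contiguous (start, end) ranges.

    Each hour h stands for the interval from h:00 to (h+1):00.
    """
    ranges = []
    for h in sorted(hours):
        if ranges and ranges[-1][1] == h:
            ranges[-1] = (ranges[-1][0], h + 1)  # extend the current range
        else:
            ranges.append((h, h + 1))            # start a new range
    return ranges


matching = ["h033001", "h033019", "h033020", "h033021", "h033022"]
hours = [int(key[-2:]) for key in matching]  # key layout: h + month + day + hour
time_ranges = merge_hours(hours)
print([f"{start:02d}:00-{end:02d}:00" for start, end in time_ranges])
```

Adjacent matching hours are joined, so the four histograms h033019 to h033022 yield the single range from 19:00 to 23:00.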

Attention is drawn to the fact that the index comprises 27439 histograms, but only 70 histograms have been traversed to determine the desired response.

In other words, instead of traversing at least 757,555,200 bytes of numeric time series data in a linear search, or traversing 52,608 bytes of compressed flat histogram data corresponding to 26304 hourly compressed flat histograms, only 140 bytes of compressed flat histogram data corresponding to 70 compressed flat histograms (3 yearly compressed flat histograms, 12 monthly compressed flat histograms, 31 daily compressed flat histograms and 24 hourly compressed flat histograms) were traversed. A processing time for providing the response may therefore be significantly reduced.
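The traversal and the count of 70 checked histograms can be reproduced on synthetic data. The following sketch is not the disclosed implementation: it assumes an index held in a dictionary keyed by tuples (year, month, day, hour), whereas the keys used in the disclosure, such as m03 or d0330, do not carry the year. The bin values are invented so that only five hours on March 30, 2017 reach the target bin range.

```python
import calendar

TARGET = (251, 255)                   # target bin range from the use case
MATCHING_HOURS = {1, 19, 20, 21, 22}  # synthetic hot hours on 2017-03-30


def hourly(year, month, day, hour):
    """Synthetic hourly compressed flat histogram (lowest bin, highest bin)."""
    hot = (year, month, day) == (2017, 3, 30) and hour in MATCHING_HOURS
    return (100, 253) if hot else (100, 200)


def combine(children):
    """A coarser histogram spans the lowest and highest bins of its children."""
    return (min(lo for lo, _ in children), max(hi for _, hi in children))


def build_index(years):
    """Map keys (y,), (y, m), (y, m, d), (y, m, d, h) to (lo, hi) pairs."""
    index = {}
    for y in years:
        months = []
        for m in range(1, 13):
            days = []
            for d in range(1, calendar.monthrange(y, m)[1] + 1):
                hours = [hourly(y, m, d, h) for h in range(24)]
                for h, hist in enumerate(hours):
                    index[(y, m, d, h)] = hist
                index[(y, m, d)] = combine(hours)
                days.append(index[(y, m, d)])
            index[(y, m)] = combine(days)
            months.append(index[(y, m)])
        index[(y,)] = combine(months)
    return index


def children(key):
    """Keys one level below `key` (months, days or hours)."""
    if len(key) == 1:
        return [key + (m,) for m in range(1, 13)]
    if len(key) == 2:
        return [key + (d,) for d in range(1, calendar.monthrange(*key)[1] + 1)]
    return [key + (h,) for h in range(24)]


def search(index, roots, target):
    """Return matching hourly keys and the number of histograms checked."""
    matches, checked, stack = [], 0, list(reversed(roots))
    while stack:
        key = stack.pop()
        checked += 1
        lo, hi = index[key]
        if hi < target[0] or lo > target[1]:
            continue                  # no overlap: skip everything below
        if len(key) == 4:
            matches.append(key)       # hourly level reached
        else:
            stack.extend(reversed(children(key)))
    return matches, checked


index = build_index((2015, 2016, 2017))
matches, checked = search(index, [(2015,), (2016,), (2017,)], TARGET)
print(len(index), checked, [key[3] for key in matches])
```

Each level is only entered below a histogram that overlaps the target range, so the number of checks grows with the depth of the hierarchy rather than with the total number of hourly histograms.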

A further exemplary use case is outlined briefly. A data warehouse was set up comprising 5 years of historical data from 1 million sensors installed in industrial machinery such as gas turbines. A computer program product implementing the proposed computer-based method was installed and executed on an industry-standard server computer. An uncompressed index comprising flat histograms was built. The uncompressed index required 1.3 terabytes of storage space. According to the proposed method, a compressed index comprising a lossy compressed representation of a quantization of the data was built. The compressed index required 80 gigabytes of storage space, thus enabling the compressed index to be loaded into RAM (cache memory 70 in Fig. 5). With the compressed index stored in RAM, a stable response time of below 1 millisecond was achieved.

Although the present invention has been described in accordance with preferred embodiments, it is obvious to the person skilled in the art that modifications are possible in all embodiments. The exemplary embodiments mainly refer to discrete readings indicative of temperature; however, the proposed method and computerized device may be used with any kind of physical quantity, such as pressure, power, load and the like.

The personal computer 4, the web client and the REST API are merely examples, and the range search request may be transmitted and the response may be received by any entity using any technology. Herein, for example, the range search device and the transmitting and receiving entity may each be implemented in hardware or software. When implemented in hardware, the range search device and the transmitting and receiving entity may be implemented as separate devices or as one integral device. When implemented in software, respective pieces of software implementing the range search device and the transmitting and receiving entity may be installed and executed on separate computing devices or on a same computing device.

In embodiments wherein multiple levels of time resolutions are used, the logarithmic or quasi-logarithmic series of time resolutions is not limited to hourly, daily, monthly, yearly. For example, time resolutions of 1 hour, 10 hours, 100 hours, 1000 hours, 10000 hours and so on could be used instead.
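The alternative series of 1 hour, 10 hours, 100 hours and so on can be illustrated with a short sketch. It assumes the same three-year span of 1096 days as in the use case above and simply counts how many time intervals each level would need; the function names are hypothetical.

```python
def decade_resolutions(levels, base_hours=1):
    """Interval lengths in hours: 1, 10, 100, ... (factor 10 per level)."""
    return [base_hours * 10 ** i for i in range(levels)]


def intervals_per_level(total_hours, resolutions):
    """Number of time intervals needed per level to cover total_hours."""
    return [-(-total_hours // r) for r in resolutions]  # ceiling division


resolutions = decade_resolutions(5)
interval_counts = intervals_per_level(1096 * 24, resolutions)
print(resolutions, interval_counts)
```

With a constant factor between levels, each level holds about one tenth as many intervals as the level below it.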

In the exemplary embodiments, the fourth and sixth entities have mainly been described as sections of a serving layer API 40, 60, which may be configured for communication via a wireless or wired network. However, other entities and steps for receiving the search request and outputting the matching time range are conceivable, such as input and output through a web interface or another type of graphical user interface.

In some variants, the range search device may be used in an automated maintenance scenario of an industrial facility. That is, in a system comprising an automated monitoring device and the range search device, the automated monitoring device may transmit a search request to the range search device, the range search device may transmit the determined time range to the automated monitoring device, the automated monitoring device may optionally access discrete readings of the numeric time series corresponding to the determined time range, and the automated monitoring device may cause a maintenance operation to be performed on the industrial facility based on and/or dependent on the determined time range and/or the accessed discrete readings. The maintenance operation may be a manual or an automated maintenance operation. An automated maintenance operation may involve changing an operating state of the industrial facility.

In the exemplary embodiment, a single numeric time series and a corresponding index have been described. However, the teachings disclosed herein are also applicable to a computerized device and method for performing a range search in numeric time series data including a plurality of numeric time series. In this case, a plurality of indices may be derived, one for each numeric time series. Search requests including logical expressions and amplitude range criteria for a number of different numeric time series may be supported, such as "Find when temperature T is between 195 °C-200 °C AND when power P is between 300-300 megawatts" or "Find when temperature T1 is between 195°C - 200°C OR temperature T2 is between 90°C - 100°C".
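One possible way to evaluate such logical expressions is to combine the matching time ranges obtained separately from each index. The following sketch assumes that each index lookup yields a sorted list of disjoint (start, end) ranges in hours; the function names and the power ranges are hypothetical. Because the index is lossy, an intersection at hourly resolution identifies hours in which both indices indicate a match, not a proof that both conditions held at the same instant.

```python
def union(a, b):
    """OR: merge two lists of (start, end) ranges into sorted, disjoint ranges."""
    merged = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersection(a, b):
    """AND: ranges covered by both inputs (inputs sorted and disjoint)."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        if a[i][1] < b[j][1]:   # advance the range that ends first
            i += 1
        else:
            j += 1
    return result


temperature_hours = [(1, 2), (19, 23)]  # result from the temperature index
power_hours = [(0, 2), (21, 24)]        # hypothetical result from a power index
both = intersection(temperature_hours, power_hours)
either = union(temperature_hours, power_hours)
print(both, either)
```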

Reference Numerals:

S10-S60 method steps

1 range search device

2 industrial facility

3 sensor

4 personal computer

10 first entity (crawler)

11 data warehouse

20 second and third entity (indexer)

30 storage unit

40 fourth entity (serving layer API)

50 fifth entity (numeric search engine)

60 sixth entity (serving layer API)

70 cache memory

100 graph of physical quantity

t time

T temperature

R1-R14 discrete readings

bnnn bin position nnn

hn flat histogram for time interval n

chn compressed flat histogram for time interval n

haabbcc hourly histogram for month aa, day bb, hour cc

daabb daily histogram for month aa, day bb

maa monthly histogram for month aa

yjjjj yearly histogram for year jjjj

#h number of available histograms

#c number of checked histograms

Claims

Patent claims

1. A method for performing, using a computerized device (1) comprising at least a processing unit and a storage unit (30), a range search based on numeric time series data, the method comprising:

a) acquiring (S10), at least temporarily, a numeric time series including a plurality of discrete readings (R1-R7) of a physical quantity (T) associated with time (t);

b) processing (S20) the acquired numeric time series to derive an index from the acquired numeric time series;

c) storing (S30) the index in the storage unit (30);

d) receiving (S40) a search request including an amplitude range criterion;

e) accessing (S50) the index stored in the storage unit (30) to determine a time range within which the physical quantity has matched the amplitude range criterion; and

f) outputting (S60) the determined time range in response to the search request.

2. The method of claim 1, characterized in that step c) further includes discarding the acquired numeric time series.

3. The method of claim 1 or 2, characterized in that the index is a lossy index.

4. The method of claim 3, characterized in that the lossy index includes a lossy compressed representation of a quantization of the numeric time series into time intervals according to a predetermined time resolution and into amplitude intervals according to a predetermined amplitude resolution.

5. The method of claim 4, characterized in that the lossy compressed representation of the quantization of the numeric time series includes, for each time interval, an indication of a lowest amplitude interval and a highest amplitude interval within which the physical quantity has been during the respective time interval.

6. The method of claim 5, characterized in that step b) (S20) includes:

- creating, for each of the time intervals, a flat histogram (h5-h7) including, for each of the amplitude intervals, a binary bin (b0-b7) indicative of whether or not the numeric time series includes at least one discrete reading (R1-R7) that is within the respective amplitude interval and is associated with a time within the respective time interval;

- compressing each flat histogram (h5-h7) into a compressed flat histogram (ch5-ch7) constituted by a number of the lowest binary bin (b0-b7) and a number of the highest binary bin (b0-b7) that is indicative of the numeric time series including at least one respective discrete reading (R1-R7) that is within the respective amplitude interval and is associated with a time within the respective time interval; and

- forming the index from the plurality of compressed flat histograms (ch5-ch7).

7. The method of any of claims 4 to 6, characterized in that the lossy index includes a plurality of lossy compressed representations of respective quantizations of the numeric time series into different time intervals according to different predetermined time resolutions.

8. The method of claim 7, characterized in that the plurality of different predetermined time resolutions is a logarithmic series of time resolutions.

9. The method of any of claims 1 to 8, characterized in that step a) (S10) is executed repeatedly for different portions of the numeric time series; and

step b) (S20) includes creating the index upon a first execution of the acquiring step a) (S10), and includes updating the index upon each subsequent execution of the acquiring step a) (S10).

10. The method of any of claims 1 to 9, characterized in that a non-contiguous and non-overlapping plurality of smallest time ranges matching the amplitude range criterion is determined in step e) (S50) and output in step f) (S60).

11. The method of any of claims 1 to 10, characterized in that in step d) (S40), the search request includes a logical expression formed by one or more amplitude range criteria and one or more logical operators, and

in step e) (S50), a time range is determined within which the physical quantity is known, based on the index, to have matched the logical expression.

12. The method of any of claims 1 to 11, characterized in that in step a) (S10), the numeric time series is acquired from a sensor (3) installed in an industrial facility (2).

13. The method of claim 12, characterized in that in step a) (S10), the plurality of discrete readings (R1-R14) of the physical quantity are subjected to dead-banding prior to being acquired as the numeric time series.

14. A computer program product comprising a program code for executing the method of any of claims 1 to 13 when run on at least one computer.

15. A computerized device (1) for performing a range search in numeric time series data, the computerized device (1) comprising at least one processing unit and a storage unit (30) and further comprising:

a) a first entity (10) configured to acquire, at least temporarily, a numeric time series including a plurality of discrete readings (R1-R7) of a physical quantity associated with time;

b) a second entity (20) configured to process the acquired numeric time series to derive an index from the acquired numeric time series;

c) a third entity (30) configured to store the index;

d) a fourth entity (40) configured to receive a search request including an amplitude range criterion;

e) a fifth entity (50) configured to access the index stored in the storage unit to determine a time range within which the physical quantity has matched the amplitude range criterion; and

f) a sixth entity (60) configured to output the determined time range.