WO2020064309A1 - System and methods monitoring the technical status of technical equipment - Google Patents

System and methods monitoring the technical status of technical equipment

Info

Publication number
WO2020064309A1
Authority
WO
WIPO (PCT)
Prior art keywords
technical
abnormality
univariate
alarm
threshold
Prior art date
Application number
PCT/EP2019/073957
Other languages
French (fr)
Inventor
Moncef Chioua
Matthieu Lucke
Emanuel Kolb
Martin Hollender
Nuo LI
Andrew Cohen
Original Assignee
Abb Schweiz Ag
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Abb Schweiz Ag
Priority to CN201980062659.0A (CN112740133A)
Publication of WO2020064309A1
Priority to US17/207,854 (US20210209189A1)

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B23/00 Testing or monitoring of control systems or parts thereof
    • G05B23/02 Electric testing or monitoring
    • G05B23/0205 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
    • G05B23/0208 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterized by the configuration of the monitoring system
    • G05B23/0213 Modular or universal configuration of the monitoring system, e.g. monitoring system having modules that may be combined to build monitoring program; monitoring system that can be applied to legacy systems; adaptable monitoring system; using different communication protocols
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B23/00 Testing or monitoring of control systems or parts thereof
    • G05B23/02 Electric testing or monitoring
    • G05B23/0205 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
    • G05B23/0218 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
    • G05B23/0221 Preprocessing measurements, e.g. data collection rate adjustment; Standardization of measurements; Time series or signal analysis, e.g. frequency analysis or wavelets; Trustworthiness of measurements; Indexes therefor; Measurements using easily measured parameters to estimate parameters difficult to measure; Virtual sensor creation; De-noising; Sensor fusion; Unconventional preprocessing inherently present in specific fault detection methods like PCA-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching

Definitions

  • the present invention generally relates to the monitoring of technical equipment and more particularly to alarm tools to support operators of technical equipment in controlling the equipment to avoid malfunctioning.
  • An alarm is an audible and/or visible means of indicating to the operator of equipment an equipment malfunction, process deviation, or abnormal condition requiring a timely response (see also International Society of Automation ISA-18.2). An instance of a particular alarm is referred to as an alarm activation.
  • alarm flood situations are characterized by a combination of a plurality of alarm activations which occur repeatedly. In other words, the same or similar combinations of alarms typically appear in multiple alarm floods. In general, permanent high alarm rates indicate bad alarm quality. Good alarm quality is achieved when:
  • Multi-Variable Operations US patent application US20080234840A1 ; Brooks et al.
  • statistical or mathematical analysis relies solely on historical values of the process variables and does not take into account any process knowledge of the monitored processes. It therefore suffers from high numbers of false positives in the detected alarms because it does not become clear what actually is an abnormal situation.
  • a certain deviation of a technical status parameter may be identified by statistical monitoring for triggering an alarm notification although the deviation may still be seen as being within the normal operation of the respective equipment.
  • a computer-implemented method for determining an abnormal technical status of a technical system.
  • a computer system is configured to execute said method by executing a corresponding computer program which includes program instructions that cause the computer system to execute corresponding method steps when loading the computer program into a memory of the computer system and processing the instructions with one or more processors.
  • the computer system receives a plurality of signals from a technical system.
  • Each signal is sampled over time (using the same sampling frequency, or resampled in a preprocessing step to ensure that either a measured or an estimated value of each signal is available at each instance of the computation) and reflects the technical status of at least one system component. That is, each signal relates to one system component but a particular system component can be monitored by an operator via multiple signals.
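The resampling step mentioned above is not specified further in this text. As a minimal sketch, assuming a zero-order hold (each grid time takes the most recent sample at or before it), signals with different sampling rates could be aligned to a common time grid as follows. The function name `resample_hold` and the example values are hypothetical.

```python
from bisect import bisect_right


def resample_hold(times, values, grid):
    """Zero-order hold: for each grid time, return the most recent sample
    taken at or before that time, or None if no sample exists yet.

    `times` must be sorted in ascending order and aligned with `values`.
    """
    out = []
    for t in grid:
        i = bisect_right(times, t)  # number of samples with time <= t
        out.append(values[i - 1] if i > 0 else None)
    return out


# Two hypothetical signals sampled at different rates, aligned to one grid.
grid = [0, 1, 2, 3, 4]
temp = resample_hold([0.0, 2.0, 4.0], [70.0, 71.0, 73.0], grid)
vib = resample_hold([1.0, 3.0], [0.2, 0.3], grid)
```

Other estimators (for example linear interpolation) would satisfy the same requirement; the hold is chosen here only because it never uses future samples.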
  • the technical system is monitored by one or more human operators.
  • the entirety of all signals reflects the overall technical status of the entire technical system.
  • a human operator cannot derive the information about the overall technical status of the technical system from single signals at the sensor level because there is no possibility for a human being to make sense out of the plurality of signals received in real-time from the sensors.
  • the computer system assists the operator in this monitoring task by deriving from the received sensor signals a single aggregate abnormality indicator reflecting the technical status of the entire system.
  • An alarm management system is associated with the technical system.
  • the alarm management system stores information in relation to alarms which are associated with the signals.
  • An alarm management system is a system for prioritizing, grouping and classifying alerts and event notifications used in supervisory control and data acquisition (SCADA) to improve the provisioning of technical status information to an operator.
  • SCADA supervisory control and data acquisition
  • Most often the major problem is that there are too many alarms annunciated in a plant upset, commonly referred to as alarm flood as explained above.
  • there can also be other problems with an alarm system such as poorly designed alarms, improperly set alarm points, ineffective annunciation, unclear alarm messages, etc. Poor alarm management is one of the leading causes of unplanned downtime and of major industrial incidents.
  • the alarm management system stores high alarm thresholds and low alarm thresholds associated with respective received signals. Signal values of a particular signal in a range between the associated high alarm threshold and the associated low alarm threshold reflect normal operation of the respective at least one system component. In other words, the alarm thresholds for a particular signal are based on historic knowledge of normal operation and abnormal system behavior. The alarm thresholds reflect critical values beyond which the respective signal value is not perceived anymore as being within the normal operation range. The alarm management system typically raises an alarm per signal when the signal value is exceeding any of the corresponding alarm thresholds. As many technical status parameters are correlated, this typically results in the so-called alarm floods overwhelming the operator with information which cannot be resolved by the operator.
  • the alarm management system can be an integral part of the computer system or it can be a remote system which is communicatively coupled with the computer system so that the computer system can access the data available in the alarm management system.
  • the computer system retrieves the high alarm thresholds and low alarm thresholds associated with the respective received signals from the alarm management system via an appropriate interface.
  • the retrieval of the alarm threshold values may, for example, occur as a kind of initialization step for the computer system. That is, before the computer system starts any computations, it may retrieve all available alarm thresholds from the alarm management system.
  • the retrieval may be repeated at regular update intervals to take into account changes in the alarm management system. For example, an update retrieval may only retrieve alarm thresholds for signals which are actually monitored via the computer system.
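The retrieval pattern described above (one full retrieval at initialization, then periodic updates limited to monitored signals) can be sketched as a small cache. This is an illustration under stated assumptions, not an interface of any real alarm management system: the class `ThresholdCache`, the `fetch` callable (which returns all thresholds when passed `None`), and the `(low, high)` tuple layout are all hypothetical.

```python
import time


class ThresholdCache:
    """Caches (low, high) alarm thresholds and refreshes them periodically."""

    def __init__(self, fetch, monitored, interval_s, clock=time.monotonic):
        self._fetch = fetch                  # hypothetical retrieval function
        self._monitored = list(monitored)    # signals actually monitored
        self._interval = interval_s          # update interval in seconds
        self._clock = clock
        # Initialization step: retrieve all available thresholds once.
        self._thresholds = dict(fetch(None))
        self._last = clock()

    def get(self, signal):
        """Return (low, high) for a signal, refreshing first if due."""
        now = self._clock()
        if now - self._last >= self._interval:
            # Update retrieval: only thresholds of monitored signals.
            self._thresholds.update(self._fetch(self._monitored))
            self._last = now
        return self._thresholds[signal]
```

Injecting the clock keeps the sketch testable without waiting for real time to pass.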
  • the computer system has a data processor which is configured to perform the computing tasks as described in the following. Firstly, the data processor computes, at every sampling time point, for each signal with associated alarm thresholds, a univariate distance to its associated alarm thresholds.
  • a univariate distance is the (simple) distance between the values of a single variable j for two observations i and I.
  • a univariate distance is the maximum of the distances between the value of the respective signal and its associated alarm thresholds to quantify a degree of abnormality for the respective at least one system component.
  • the univariate distance d(t) for a particular signal at sampling time point t can be expressed by the following mathematical formula:
  • x(t) is the sample of the signal at time t
  • $x_h$ is the high alarm threshold associated with the signal as defined in the alarm management system
  • $x_l$ is the low alarm threshold associated with the signal
  • a is the normal value of the variable ($x_l < a < x_h$).
  • a can be chosen as $(x_h + x_l)/2$ by default but other values between $x_l$ and $x_h$ can be chosen, for example by estimating the normal operating value based on normal operation data.
  • the univariate distance for a particular signal at a particular sampling time point can be computed as a piecewise linear index so that
  • the distance value is between 0 and 1 if the sampled signal value is between the low alarm threshold and the high alarm threshold (F2a); the distance value is 1 if the sampled signal value is less than or equal to the low alarm threshold, or greater than or equal to the high alarm threshold (F2b); and the distance value is 0 if the sampled signal value corresponds to a predefined parameter value reflecting normal operation (F2c).
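The formula itself did not survive in this text, so the following is only a sketch that satisfies the three stated properties (F2a to F2c): it is 0 at the normal value, rises linearly towards each threshold, and is capped at 1 at and beyond the thresholds. The function name `univariate_distance` and the default midpoint for `a` are assumptions for illustration.

```python
def univariate_distance(x, x_low, x_high, a=None):
    """Piecewise-linear distance of sample x to its alarm thresholds.

    Returns 0.0 at the normal value a, a value between 0 and 1 between
    the thresholds, and 1.0 at or beyond either threshold.
    """
    if not x_low < x_high:
        raise ValueError("x_low must be below x_high")
    if a is None:
        a = (x_low + x_high) / 2.0  # assumed default: midpoint
    if not x_low < a < x_high:
        raise ValueError("a must lie strictly between the thresholds")
    if x >= a:
        d = (x - a) / (x_high - a)  # scaled distance towards the high threshold
    else:
        d = (a - x) / (a - x_low)   # scaled distance towards the low threshold
    return min(1.0, d)              # cap at 1 beyond the thresholds
```

For example, with thresholds 0 and 100 and the default normal value 50, a sample of 75 yields 0.5 and a sample of 120 yields 1.0.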
  • the univariate distance d(t) can be computed as a smoothened index instead of the piecewise linear computation above.
  • a real signal is noisy in that its “normal” value is fluctuating around this value “a” with a Gaussian distribution. Therefore, the computation of the univariate distances can be further improved by introducing an interval defining a “normal range” $[a_1, a_2]$ of the signal, with the upper interval limit $a_2$ being less than the respective high alarm threshold $x_h$ and the lower interval limit $a_1$ being greater than the respective low alarm threshold $x_l$.
  • Such an interval is used as a deadband for the normal range between $a_1$ and $a_2$ ($x_l < a_1 < a_2 < x_h$).
  • a deadband (sometimes called a neutral zone or dead zone) is a band of input values in the domain of a transfer function in a control system or signal processing system where the output is zero (the output is 'dead' - no action occurs). Deadband regions can be used in control systems such as servo-amplifiers to prevent oscillation or repeated activation-deactivation cycles.
  • the univariate distance d(t) for a particular signal can be computed as the following index:
  • the distance value is 0 if the sampled signal value is inside the deadband interval for a particular signal at a particular sampling time point (F4a); the distance value is
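The description of the deadband index is cut off in this text. A minimal sketch, assuming the distance is 0 inside the deadband, rises linearly from each deadband limit to the corresponding alarm threshold, and is capped at 1 beyond the thresholds, might look as follows. The function name `deadband_distance` and the linear ramps are assumptions.

```python
def deadband_distance(x, x_low, a1, a2, x_high):
    """Univariate distance with a deadband [a1, a2] around normal operation."""
    if not x_low < a1 <= a2 < x_high:
        raise ValueError("expected x_low < a1 <= a2 < x_high")
    if a1 <= x <= a2:
        return 0.0                      # inside the deadband: no abnormality
    if x > a2:
        d = (x - a2) / (x_high - a2)    # assumed linear ramp towards x_high
    else:
        d = (a1 - x) / (a1 - x_low)     # assumed linear ramp towards x_low
    return min(1.0, d)                  # cap at 1 beyond the thresholds
```

With thresholds 0 and 100 and a deadband of 40 to 60, samples of 45 and 55 both map to 0, so noise around the normal value no longer contributes to the indicator.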
  • Once the univariate distances are determined by the data processor, a further computing step is executed.
  • the computer system now computes, based on the univariate distances at the respective sampling time points, an aggregate abnormality indicator reflecting the technical status of the entire technical system.
  • the aggregate abnormality indicator is computed as the Euclidean distance $D(t)$ based on the univariate distances of the respective signals and the total number of signals:
  • the aggregate abnormality indicator is computed as a weighted Euclidean distance $D_w(t)$ based on the univariate distances of the respective signals and the total number of signals wherein each univariate distance contribution is weighted with a weighting factor corresponding to the severity of an alarm associated with the respective signal as defined in the alarm management system:
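The formulas for the two indicators are missing from this text. The sketch below assumes one plausible reading: the square root of the (weighted) mean of the squared univariate distances, so that the result stays between 0 and 1 regardless of the number of signals. The normalization, the function name `aggregate_indicator`, and the weight handling are assumptions, not the formulas of the publication.

```python
import math


def aggregate_indicator(distances, weights=None):
    """Aggregate univariate distances into a single abnormality indicator.

    Assumed form: sqrt(sum(w_i * d_i**2) / sum(w_i)). With equal weights
    this is the Euclidean norm divided by sqrt(n).
    """
    n = len(distances)
    if n == 0:
        raise ValueError("at least one univariate distance is required")
    if weights is None:
        weights = [1.0] * n             # unweighted case
    if len(weights) != n:
        raise ValueError("one weight per distance is required")
    total_weight = sum(weights)
    squared = sum(w * d * d for w, d in zip(weights, distances))
    return math.sqrt(squared / total_weight)
```

Under this reading, a system in which every signal sits on an alarm threshold scores 1.0, and weights taken from alarm severities shift the score towards the more critical signals.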
  • the aggregate abnormality indicator now reflects the technical status of the entire technical system because it includes the technical status information with regards to all monitored system components.
  • the presentation of the aggregate abnormality indicator to an operator provides to the operator visual indications about the internal state prevailing in said technical system.
  • the system provides a comparison of the aggregate abnormality indicator with a predetermined abnormality threshold.
  • the abnormality threshold is chosen to ensure with a given probability (or confidence, e.g. 95%) that an aggregate abnormality indicator value being below the abnormality threshold reflects normal operation of the technical system.
  • the given probability may be defined as a target probability by the user or it may be a predefined confidence value.
  • the abnormality threshold can be determined by using a cumulative distribution function of the aggregate abnormality indicator during normal operation of the technical system as known to a person skilled in the art. An abnormal technical status is determined when the aggregate abnormality indicator exceeds the abnormality threshold.
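As a minimal sketch of the threshold selection, assuming an empirical cumulative distribution function built from indicator values recorded during normal operation, the threshold can be taken as the smallest recorded value whose cumulative share reaches the chosen confidence. The function names and the use of an empirical rather than a fitted distribution are assumptions.

```python
import math


def abnormality_threshold(normal_values, confidence=0.95):
    """Smallest recorded value v such that the share of normal-operation
    values less than or equal to v is at least `confidence`."""
    if not normal_values:
        raise ValueError("normal-operation data is required")
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must be in (0, 1]")
    ordered = sorted(normal_values)
    k = math.ceil(confidence * len(ordered))  # smallest k with k/n >= confidence
    return ordered[k - 1]


def is_abnormal(indicator_value, threshold):
    """An abnormal status is flagged when the indicator exceeds the threshold."""
    return indicator_value > threshold
```

The quality of such a threshold depends on how representative the recorded normal-operation period is.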
  • the aggregate abnormality indicator AAI provides simplified technical status information for the entire system which can easily be processed by the operator. For example, the moment the AAI exceeds the abnormality threshold in a respective graphical visualization the operator is alerted that the technical system shows abnormal behavior.
  • the AAI is a trigger for the operator to perform a more thorough system analysis to identify the root cause of the abnormal behavior.
  • the trigger point in the AAI curve is typically reached even before an alarm is triggered by the alarm management system as alarm triggers typically depend on patterns in the signal behavior which can easily extend over a longer time period.
  • the AAI does not need any pattern recognition but simply looks at the aggregate indicator for all signals. As a consequence, neither high-performance hardware nor complex models for pattern recognition are required, since the claimed approach is a purely data-driven approach which can readily be used for technical systems in plants without the need for adapting the hardware or OPC Alarm and Events (A&E) server.
  • A&E OPC Alarm and Events
  • a steady-state detection algorithm can be used to determine whether the technical system operates in a steady-state process. If the process is not in a steady state the AAI computation can be suppressed. This optional switching function saves computing resources for periods where a meaningful AAI computation is not possible.
  • Steady-state detection algorithms are well known in the art and disclosed in numerous papers, such as, for example, “An efficient method for on-line identification of steady state” by Cao, S., & Rhinehart, R. R., 1995, in Journal of Process Control, 5(6), 363-374.
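To illustrate the switching function only, the sketch below uses a deliberately simple stand-in, not the Cao and Rhinehart method: the process is treated as steady when the spread of the most recent samples stays within a tolerance. The class name `SteadyStateGate`, the window length, and the tolerance are hypothetical.

```python
from collections import deque


class SteadyStateGate:
    """Simple stand-in steady-state test: range of the last `window`
    samples must not exceed `tolerance`."""

    def __init__(self, window, tolerance):
        self._recent = deque(maxlen=window)
        self._tolerance = tolerance

    def update(self, value):
        """Add a sample and return True if the process looks steady."""
        self._recent.append(value)
        if len(self._recent) < self._recent.maxlen:
            return False  # not enough history yet
        return max(self._recent) - min(self._recent) <= self._tolerance


gate = SteadyStateGate(window=3, tolerance=0.5)
states = [gate.update(v) for v in [10.0, 10.1, 10.2, 12.0, 12.1, 12.2]]
```

A caller would compute the aggregate indicator only for time points where `update` returns True and skip the computation otherwise.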
  • the AAI can be interpreted as a trigger function for the operator to perform a root cause analysis for the technical system.
  • the disclosed method can also support the operator in this task.
  • the computer system further provides to the operator a subset of the univariate distances at the respective sampling time points wherein the subset relates to such univariate distances with the highest contributions to the augmentation of the aggregate abnormality indicator.
  • the size of this subset may be configurable by the operator. For example, the operator may specify 5 or 10 to configure the computer system to show, as a drill-down option for the AAI, the top 5 or the top 10 univariate distances. As a result, the operator can immediately see which signals - and therefore which system components - are primarily responsible for the AAI increase beyond the abnormality threshold.
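The drill-down can be sketched as a simple ranking. This assumes that a signal's contribution grows with its univariate distance, which holds for any aggregate built from squared non-negative distances; the function name `top_contributors` and the signal names are hypothetical.

```python
def top_contributors(distances, k=5):
    """Return the k (signal, distance) pairs with the largest univariate
    distances at one sampling time point, largest first."""
    ranked = sorted(distances.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical univariate distances at one sampling time point.
snapshot = {"S1": 0.1, "S2": 0.9, "S3": 0.5, "S4": 0.0}
top_two = top_contributors(snapshot, k=2)
```

Here `top_two` lists S2 and S3, pointing the operator to the components monitored by those signals.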
  • a component hierarchy of the technical system may define a plurality of functional blocks of the technical system.
  • the functional blocks can be represented by child nodes of the technical system in the component hierarchy.
  • Each functional block can again include a plurality of child nodes including further functional blocks and/or system components. That is the hierarchy can describe multiple levels of functional blocks (nested functional blocks).
  • the computer system can now compute aggregate block abnormality indicators (BAI) for the respective functional blocks at every sampling time point. The computation for a particular functional block is thereby based on a subset of univariate distances associated with the particular functional block (at the respective sampling time points).
  • BAI aggregate block abnormality indicators
  • the computed block abnormality indicator(s) reflect the technical status of the functional blocks of the technical system.
  • the computer system can now also provide a comparison of the block abnormality indicator (BAI) with a predetermined block abnormality threshold to the operator.
  • the block abnormality threshold is chosen to ensure with a given probability that an aggregate block abnormality indicator value below the block abnormality threshold reflects normal operation of the particular functional block.
  • the operator can quickly drill down to respective functional blocks of the technical system (e.g., boiler, pump, turbine, or area of the process) when the AAI exceeds the abnormality threshold and identify the functional blocks which contribute most to the abnormality. As with the univariate distances, the computer system can provide a ranking list with the functional blocks contributing most to the abnormal behavior. Of course, for each BAI a further drill down is possible to the respective univariate distances. Through this option, the operator can quickly identify the system components of the functional block which cause the malfunctioning of the entire system.
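A minimal sketch of the block-level computation, assuming the component hierarchy is given as nested dictionaries whose leaves are lists of signal names, and assuming the same root-mean-square aggregation as one plausible form of the indicator. The hierarchy layout, the function names, and the aggregation formula are all assumptions.

```python
import math


def leaf_signals(node):
    """Collect all signal names below a hierarchy node.

    A node is either a dict of child nodes (nested functional blocks)
    or a list of signal names.
    """
    if isinstance(node, dict):
        return [s for child in node.values() for s in leaf_signals(child)]
    return list(node)


def block_indicators(hierarchy, distances):
    """Compute one indicator per top-level functional block."""
    result = {}
    for block, node in hierarchy.items():
        d = [distances[s] for s in leaf_signals(node)]
        result[block] = math.sqrt(sum(v * v for v in d) / len(d))
    return result


# Hypothetical plant: the boiler has two nested blocks, the pump has none.
plant = {"boiler": {"drum": ["S1", "S2"], "burner": ["S3"]}, "pump": ["S4"]}
bai = block_indicators(plant, {"S1": 0.0, "S2": 0.0, "S3": 0.0, "S4": 1.0})
```

In this example the pump block scores 1.0 and the boiler block 0.0, so a ranking of blocks would direct the operator to the pump first. Calling `block_indicators` on a nested node gives the next drill-down level.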
  • a particular technical status parameter may be represented by multiple sensor signals providing redundant information in specifying the particular technical status.
  • the computation of a univariate distance for said technical status parameter can be performed in a way which is robust against failure of a sensor providing redundant information.
  • robustness against failure means that the failure of a single sensor does not significantly affect the reliability of the technical status parameter which is reflected by the corresponding univariate distance. This is achieved by aggregating the univariate distances associated with the multiple sensor signals to provide a robust univariate distance for the particular technical status parameter. Even if one of the signals disappears (e.g., because the battery or a data communication link of the sensor fails) the robust univariate distance still provides meaningful information about the normal/abnormal behavior of the respective system component.
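The text does not name the aggregation used for redundant sensors. As a sketch, assuming the median of the distances from the sensors that are still delivering data, a single failed or outlying sensor has little effect on the result. The function name `robust_distance` and the use of `None` for a missing signal are assumptions.

```python
from statistics import median


def robust_distance(redundant_distances):
    """Aggregate the univariate distances of redundant sensors.

    Missing signals are passed as None and ignored. Returns None only
    if no sensor delivers a value.
    """
    available = [d for d in redundant_distances if d is not None]
    if not available:
        return None
    return median(available)
```

For three redundant sensors reporting 0.2, 0.2 and 1.0, the result is 0.2, so one faulty reading does not push the parameter to the alarm level.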
  • a computer program product for determining an abnormal technical status of a technical system.
  • the program comprises instructions that, when loaded into a memory of a computer system and being executed by at least one processor of the computer system, cause the computer system to perform the method steps as disclosed herein.
  • the computer system for executing said computer program can be described by functional modules which are configured to execute said method steps at system runtime.
  • the computer system has an interface to receive, from the technical system, a plurality of signals wherein each signal is sampled over time and reflects the technical status of at least one system component. Further, via the interface, the computer system retrieves, from an alarm management system associated with the technical system, high alarm thresholds and low alarm thresholds associated with respective received signals. Signal values of a particular signal in a range between the associated high alarm threshold and the associated low alarm threshold reflect normal operation of the respective at least one system
  • the computer system has a data processor to compute for each signal with associated alarm thresholds, at every sampling time point, a univariate distance to its associated alarm thresholds as the maximum of the simple distances between the value of the respective signal and its associated alarm thresholds to quantify a degree of abnormality for the respective at least one system component; and to compute, at every sampling time point, based on the univariate distances at the respective sampling time points, an aggregate abnormality indicator reflecting the technical status of the entire technical system.
  • the term “at every sampling time point”, as used herein, refers to each sampling time point which is used for said computational steps. That is, in cases with high sampling frequencies, it may be sufficient to perform the computational steps only for every second, third, etc. sampling time point. The skilled person will understand that it is not necessary to use each physical sampling time point under any circumstances.
  • a user interface of the computer system provides a comparison of the aggregate abnormality indicator with a predetermined abnormality threshold to an operator.
  • the abnormality threshold ensures with a given probability (confidence) that an aggregate abnormality indicator value, when being below the abnormality threshold, reflects normal operation of the technical system. In other words, the technical system transitions into an abnormal technical status when the aggregate abnormality indicator exceeds the abnormality threshold.
  • the computer system further includes a computation switch with a steady-state detection algorithm (SDA) configured to determine whether the technical system operates in a steady-state process, and to suppress subsequent computation steps when the process is not in a steady state.
  • SDA steady-state detection algorithm
  • the computer system has a component hierarchy of the technical system.
  • the hierarchy defines a plurality of functional blocks as child nodes of the technical system with each functional block comprising a plurality of child nodes comprising further functional blocks and/or system components.
  • the processor of the computer system can compute, at every sampling time point (i.e., the sampling time points used for the computations), based on a subset of univariate distances associated with a particular functional block, at the respective sampling time points, an aggregate block abnormality indicator BAI for the particular functional block wherein the block abnormality indicator reflects the technical status of the functional block.
  • the user interface can provide, to the operator, a comparison of the BAI with a predetermined block abnormality threshold.
  • the block abnormality threshold ensures with a given probability that an aggregate block abnormality indicator value below the block abnormality threshold reflects normal operation of the particular functional block.
  • the user interface further provides to the operator a subset of the univariate distances at the respective sampling time points wherein the subset relates to such distances with the highest contributions to the augmentation of the aggregate abnormality indicator or a respective block abnormality indicator.
  • the subset has a size which is configurable (e.g., by the operator) or predefined.
  • FIG. 1 includes a block diagram of a computer system for determining an abnormal technical status of a technical system according to an embodiment
  • FIG. 2 is a simplified flow chart of a computer-implemented method for determining an abnormal technical status of a technical system according to an embodiment
  • FIG. 3A illustrates univariate distances for example signals reflecting the technical status of system components of the technical system
  • FIG. 3B shows an aggregate abnormality indicator for the technical system as computed according to an embodiment
  • FIG. 3C illustrates types of cumulative distribution functions which can be used for determining abnormality thresholds according to an embodiment
  • FIG. 3D shows univariate distances for a subset of signals with high contributions to the aggregate abnormality indicator according to an embodiment
  • FIG. 4 illustrates an example of a component hierarchy of the technical system including functional blocks
  • FIGs. 5A to 5C illustrate a real-world example scenario for which an aggregate abnormality indicator is determined
  • FIG. 6 is a diagram that shows an example of a generic computer device and a generic mobile computer device, which may be used with the techniques described herein.
  • FIG. 1 is a block diagram of an example embodiment of a computer system 100 for determining an abnormal technical status of a technical system 200 according to an embodiment.
  • the computer system 100 and the technical system 200 are communicatively coupled and the computer system 100 is configured to monitor the technical status of the technical system 200.
  • the technical system 200 can be a process plant, a power plant or any other equipment to execute an industrial process.
  • the industrial processes in the plant e.g., chemical, oil refineries, paper and pulp factories, etc.
  • the computer system 100 has an interface 110 to receive from the technical system 200 a plurality of signals S1 to Sn. Each signal is sampled over time and reflects the technical status of at least one system component. For example, a temperature signal may reflect the technical status of a motor component by indicating the temperature of the motor (a too high temperature can be an indicator for overheating). At the same time, a further signal, such as a vibration sensor signal may also provide technical status information about the motor as too high vibrations may indicate a problem with the bearings of the motor.
  • a person skilled in the art knows which types of sensors are suitable in a technical system to monitor the technical status of respective components or functional blocks of the technical system.
  • a functional block can include multiple system components which together perform a certain function (e.g., cleaning of a gas).
  • FIG. 2 is a simplified flow chart of a computer-implemented method 1000 for determining an abnormal technical status of the technical system 200.
  • the computer system 100 can execute the method when loading a computer program into a memory of the computer system 100 wherein the computer program has computer-readable instructions that, when loaded and being executed by at least one processor of the computer system 100, cause the computer system to perform the steps of the method 1000.
  • the computer system 100 of FIG. 1 is disclosed in the context of the flow chart of FIG. 2. For this reason, the following description uses reference numbers referring to FIG. 1 and FIG. 2. Optional components of the computer system 100 and optional method steps are illustrated by dashed lines in the respective figures.
  • the computer system 100 can use any appropriate protocol standard for process automation protocols.
  • For example, a person skilled in the art may select an appropriate protocol from the protocol standards listed in the Wikipedia list of automation protocols available at: https://en.wikipedia.org/wiki/List_of_automation_protocols.
  • the computer system 100 is communicatively coupled with an alarm management system 300 associated with the technical system 200.
  • the alarm management system 300 can also be an integral part of the computer system 100, or it may be running on a remote computer which is accessible by the computer system 100 through a respective network.
  • the alarm management system 300 stores or determines high alarm thresholds H1 to Hn and low alarm thresholds L1 to Ln associated with respective signals S1 to Sn of the technical system 200. Thereby, signal values of a particular signal in a range between the associated high alarm threshold and the associated low alarm threshold reflect normal operation of the respective system component which is monitored by said particular signal.
  • Alarm management is typically used in a process manufacturing environment that is controlled by an operator using a supervisory control system, such as a DCS, a SCADA system or a programmable logic controller (PLC).
  • Such a system may have hundreds of individual alarms that often are designed with only limited consideration of other alarms in the system. Since humans can only do one thing at a time and can pay attention to a limited number of things at a time, there needs to be a way to ensure that alarms are presented at a rate that can be assimilated by a human operator, particularly when the plant is upset or in an unusual condition.
  • alarms should be capable of directing the operator's attention to the most important problem that he or she needs to act upon, using a priority to indicate degree of importance or rank, for instance.
  • alarm management systems include all the knowledge of alarm situations for the associated technical system (reflected by the low/high alarm thresholds) the
  • the computer system 100 can retrieve 1050 the high alarm thresholds H1 to Hn and low alarm thresholds L1 to Ln associated with respective signals S1 to Sn of the technical system 200 from the alarm management system 300 and use such data for the following data processing steps to determine an indicator reflecting the technical status of the entire technical system 200 based on the received signal data and alarm thresholds.
  • This indicator will be referred to as aggregate abnormality indicator AAI of the technical system 200.
  • the computer system can perform update retrieval steps 1200 to accommodate for changes in the alarm management system during the operation of the technical system.
  • update retrievals 1200 may be limited to alarm thresholds associated with signals which are actually monitored via the computer system 100.
  • the computer system 100 has a data processor 120 with various modules for performing data processing tasks with respect to the received input data (signals S1 to Sn and high/low alarm threshold pairs H1/L1 to Hn/Ln).
  • each signal S1 to Sn has an associated alarm threshold pair.
  • the aggregate abnormality indicator is computed with alarms that have an associated limit like, for example, absolute alarms, deviation alarms, rate of change alarms as defined by the standard NAMUR NA 102 for the application of alarm management.
  • a version dated 02.10.2018 of the NA 102 specification can be obtained at https://www.namur.net/de/emptationitch-u-arbeitsblaetter/aberichte- nena.html.
  • a univariate distance module 121 of the data processor computes 1300 at every sampling time point a univariate distance (e.g., dS1 (t)) to the alarm thresholds associated with the respective signal.
  • the univariate distance is determined as the maximum of the distances between the value of the respective signal and its associated alarm thresholds to quantify a degree of abnormality for system component(s) associated with the respective signal.
  • exponential smoothing may be used in accordance with formulas F3, F4a to F4c.
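The univariate distance and the optional smoothing can be sketched as follows. The formulas referenced in the text (F3, F4a to F4c) are not reproduced in this section, so the normalization used here (0 at the midpoint between the thresholds, 1 at either threshold, clipped at 1) and the simple first-order exponential smoothing are assumptions chosen for illustration; the function names and the parameter `alpha` are hypothetical.

```python
def univariate_distance(value: float, low: float, high: float) -> float:
    """Normalized distance of a signal value to its alarm thresholds.

    Assumption: the distance is 0 at the midpoint of [low, high],
    reaches 1 at either threshold, and is clipped at 1 beyond it.
    """
    mid = (high + low) / 2.0
    half_range = (high - low) / 2.0
    d_high = (value - mid) / half_range  # 1.0 when value == high
    d_low = (mid - value) / half_range   # 1.0 when value == low
    return min(1.0, max(d_high, d_low))


def smooth(values, alpha: float = 0.5):
    """Simple exponential smoothing: s_t = alpha*x_t + (1-alpha)*s_(t-1)."""
    smoothed = []
    for v in values:
        previous = smoothed[-1] if smoothed else v
        smoothed.append(alpha * v + (1.0 - alpha) * previous)
    return smoothed


print(univariate_distance(75.0, low=0.0, high=100.0))
print(smooth([0.0, 1.0], alpha=0.5))
```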
  • the computed univariate distances are then provided as input to an abnormality indicator module 122 of the data processor.
  • Module 122 computes 1400, at every sampling time point, based on the univariate distances at the respective sampling time points, the aggregate abnormality indicator AAI reflecting the technical status of the entire technical system 200.
  • the aggregate abnormality indicator at a particular sampling time point may be computed as the Euclidean distance based on the univariate distances of the respective signals and the total number of signals in accordance with formula F5.
  • each univariate distance contribution is weighted with a weighting factor corresponding to the severity of an alarm associated with the respective signal as defined in the alarm management system.
  • alarms for signals whose associated components may have a lower impact on the overall technical performance of the technical system 200 may contribute less to the aggregate abnormality indicator.
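A possible reading of the aggregation step is sketched below. Formula F5 is not reproduced in this section, so the plain (optionally weighted) Euclidean norm used here is an assumption, as are the function name and the way the severity weights enter the sum.

```python
import math


def aggregate_abnormality_indicator(distances, weights=None):
    """Weighted Euclidean norm of the univariate distances (assumed form)."""
    if weights is None:
        weights = [1.0] * len(distances)  # all signals equally severe
    if len(weights) != len(distances):
        raise ValueError("one weight per univariate distance is required")
    return math.sqrt(sum(w * d * d for w, d in zip(weights, distances)))


# Four signals, all at their alarm threshold (distance 1).
print(aggregate_abnormality_indicator([1.0, 1.0, 1.0, 1.0]))
# A lower severity weight reduces the contribution of the second signal.
print(aggregate_abnormality_indicator([0.5, 1.0], weights=[1.0, 0.25]))
```

With unit weights and distances clipped at 1, this form cannot exceed the square root of the number of signals.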
  • the computer system 100 further has a user interface (UI) component 130.
  • the UI 130 can be implemented as any kind of human machine interface (HMI) which allows an operator 10 of the technical system to communicate with the computer system 100.
  • the UI 130 can include respective input/output means including but not limited to audio-visual means including display/sound output means to convey information to the user and data input means (e.g., keyboard, mouse, touch screen, etc.) to receive input data from the user.
  • the UI 130 provides 1500 a comparison of the aggregate abnormality indicator AAI with a predetermined abnormality threshold to the operator 10.
  • the abnormality threshold ensures with a given probability that an aggregate abnormality indicator value, when being below the abnormality threshold, reflects normal operation of the technical system 200.
  • the abnormality threshold is determined by using a cumulative distribution function of the aggregate abnormality indicator AAI during normal operation of the technical system 200.
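One way to derive such a threshold from a cumulative distribution function is to take a quantile of the empirical CDF of indicator values recorded during normal operation. The sketch below assumes exactly that; the function name, the default probability of 0.95 and the sample values are hypothetical.

```python
import math


def abnormality_threshold(normal_values, probability: float = 0.95) -> float:
    """Smallest recorded value x whose empirical CDF F(x) >= probability."""
    if not normal_values:
        raise ValueError("at least one normal-operation sample is required")
    ordered = sorted(normal_values)
    # Number of samples that must lie at or below the threshold.
    count = max(1, math.ceil(probability * len(ordered)))
    return ordered[count - 1]


# Hypothetical indicator values observed during normal operation.
history = [0.4, 0.1, 0.3, 0.2]
print(abnormality_threshold(history, probability=0.5))
```

An indicator value below the returned threshold is then consistent with normal operation at the chosen probability, under the assumption that the recorded history is representative.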
  • the computational tasks in steps 1300, 1400 and 1500 of the method 1000 are discussed in more detail with the description of FIGs. 3A to 3C.
  • the data processor 120 has a computation switch 123.
  • the computation switch is implemented as a steady-state detection algorithm SDA which can determine 1250 whether the technical system 200 operates in a steady-state process or not. If the technical system is not in a steady state (“no”) the computer system does not perform any of the computational tasks of steps 1300, 1400, 1500. Otherwise (“yes”) the method 1000 continues with step 1300. For said computational tasks it is advantageous that the process run by the technical system is in steady-state. Therefore, the computation switch 123 can switch off the computation of all indices (univariate distances and aggregate abnormality indicator) during transient stages.
  • a well-known steady-state detection algorithm can be used to identify when the computation of the indices should be turned on again (e.g., Cao, S., & Rhinehart, R. R. (1995). An efficient method for on-line identification of steady state. Journal of Process Control, 5(6), 363-374).
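The computation switch can be illustrated with a ratio-of-variances test in the spirit of the cited Cao and Rhinehart method: two exponentially filtered variance estimates are compared, and a ratio well above 1 indicates a transient. This is a sketch under assumptions, not the patented implementation; the class name, the filter factors and the critical value `r_crit` are hypothetical tuning choices.

```python
class SteadyStateDetector:
    """Ratio-of-variances steady-state test on a single signal (sketch)."""

    def __init__(self, l1=0.2, l2=0.1, l3=0.1, r_crit=2.0):
        self.l1, self.l2, self.l3, self.r_crit = l1, l2, l3, r_crit
        self.filtered = None   # exponentially filtered signal value
        self.previous = None   # previous raw sample
        self.var_num = 0.0     # variance estimate from deviation to filter
        self.var_den = 0.0     # variance estimate from successive differences

    def update(self, x: float) -> bool:
        """Feed one sample; return True if the signal looks steady."""
        if self.filtered is None:
            self.filtered = self.previous = x
            return True  # no evidence of a transient yet
        # The numerator uses the filtered value from the previous step.
        self.var_num = self.l2 * (x - self.filtered) ** 2 + (1 - self.l2) * self.var_num
        self.filtered = self.l1 * x + (1 - self.l1) * self.filtered
        self.var_den = self.l3 * (x - self.previous) ** 2 + (1 - self.l3) * self.var_den
        self.previous = x
        if self.var_den == 0.0:
            return True  # signal has not changed at all
        ratio = (2.0 - self.l1) * self.var_num / self.var_den
        return ratio < self.r_crit


detector = SteadyStateDetector()
for sample in [float(i) for i in range(200)]:  # steadily rising ramp
    steady = detector.update(sample)
print(steady)  # the indices would only be computed while this is True
```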
  • the computer system 100 can access a component hierarchy of the technical system 200.
  • a component hierarchy may either be stored by the computer system itself or it may be provided by the technical system or its associated automation system.
  • the component hierarchy defines a plurality of functional blocks as child nodes of the technical system.
  • Each functional block can include a plurality of child nodes which may either be further functional blocks and/or system components of the technical system.
  • a functional block is used to group multiple system components together which can be associated with the same function of the technical system.
  • Such functional blocks are sometimes also referred to as process blocks (e.g., boiler, pump, turbine, or area of the process). Details of the component hierarchy are discussed in the context of FIG. 4.
  • the data processor 120 is further configured to compute 1450 aggregate block abnormality indicator(s) BAI at every sampling time point.
  • the block abnormality indicator(s) BAI reflects the technical status of respective functional block(s). Based on a subset of univariate distances associated with a particular functional block (at the respective sampling time points) a corresponding aggregate block abnormality indicator BAI is computed for the particular functional block. The computation is performed in a similar manner as the computation of the AAI but only for the subset of univariate distances associated with the particular functional block. Further, the user interface 130 provides 1550, to the operator, a comparison of the particular block abnormality indicator BAI with a predetermined block abnormality threshold.
  • the block abnormality threshold ensures with a given probability that an aggregate block abnormality indicator value, when being below the block abnormality threshold, reflects normal operation of the particular functional block.
  • the operator can drill down from the original AAI to the BAIs of functional blocks of the technical system. This allows the operator to perform a root cause analysis at the level of functional blocks of the technical system and to quickly identify the functional block(s) which contribute most to an abnormal situation of the technical system as a whole as identified by the AAI.
  • a drilldown function to the level of system components is enabled.
  • the UI 130 further provides 1600 a subset TOPm of the univariate distances at the respective sampling time points to the operator.
  • the subset TOPm relates to such distances with the highest contributions to the augmentation of the aggregate abnormality indicator with the size m of the subset TOPm being predefined.
  • a drill down to the component level is enabled. For example, the operator may set the size m so that he receives an amount of technical status information which can still be handled with his cognitive capabilities. Different operators may select different sizes.
  • the computer system may set a default value which can be chosen as the average size used by all users of the computer system. Based on the technical status information conveyed to the operator 10 through the AAI (and the optional drill down information about BAIs and/or system components) the operator can initiate a corrective action 20 in response to the determined abnormality indicator(s). As a consequence, the computer system assists the operator in performing the technical task of monitoring the technical system and interacting with the technical system when required.
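The selection of the subset TOPm described above can be sketched as a simple ranking. Ranking by the distance value itself is an assumption (it matches the ranking by contribution when all signals carry the same weight); the function name and the example values are hypothetical.

```python
import heapq


def top_m(distances: dict, m: int) -> list:
    """Return the m (signal, distance) pairs with the largest distances."""
    return heapq.nlargest(m, distances.items(), key=lambda item: item[1])


# Hypothetical univariate distances at one sampling time point.
current = {"d3": 0.1, "d20": 0.9, "d21": 0.7, "d25": 0.8}
print(top_m(current, m=2))
```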
  • a particular technical status parameter such as the status of a chemical reactor
  • the sensors provide redundant information in specifying the particular technical status of the reactor. Nonetheless, each of the temperature signals indicates normal or abnormal operation of the reactor.
  • the data processor may aggregate the univariate distances associated with the multiple sensor signals to provide a robust univariate distance for the particular technical status parameter.
  • the univariate distances corresponding to the temperature signals of the respective temperature sensors can be aggregated. If one of the sensors fails, there is still a meaningful distance value available which characterizes the technical status of the reactor.
  • a “two over three” vote can be used to get the actual reactor temperature in the case of one sensor failure.
  • the sensor redundancy may be used, for example, with a first sensor used by the control system and a second sensor used by a safety system.
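A common way to realize a vote over three redundant sensors is median selection, sketched below. Using the median, marking a failed sensor with `None`, and requiring at least two valid readings are assumptions for illustration; the function name is hypothetical.

```python
from statistics import median


def voted_value(readings):
    """Median of the valid readings from redundant sensors.

    A failed sensor is represented by None. With three valid readings
    the middle value wins; with two, their mean is returned.
    """
    valid = [r for r in readings if r is not None]
    if len(valid) < 2:
        raise ValueError("at least two valid sensor readings are required")
    return median(valid)


print(voted_value([80.0, 81.0, 250.0]))  # one sensor reads far too high
print(voted_value([80.0, None, 82.0]))   # one sensor has failed
```

The voted value can then be fed into the univariate distance computation in place of the individual sensor signals.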
  • FIG. 3A illustrates univariate distances d1 to d34 for real world example signals reflecting the technical status of system components of a technical system. Some of the signals show an abnormal behavior at certain points in time which is reflected by a rise of the respective univariate distances (e.g., d3, d4, d13, d15, d20, d21, etc.) to the upper
  • FIG. 3B shows a view 360 with the aggregate abnormality indicator AAI for the technical system which is provided to the operator of the technical system.
  • the view 360 further includes a visualization of the abnormality threshold AAT against which the AAI is compared.
  • the AAI is computed on the basis of the univariate distances of FIG. 3A in accordance with formulas F5 or F6.
  • the abnormality threshold AAT is determined by using a cumulative distribution function (CDF) of the aggregate abnormality indicator AAI during normal operation of the technical system.
  • FIG. 3C illustrates CDF types of cumulative distribution functions which can be used for determining abnormality thresholds. Cumulative distribution functions are explained in detail in many publications, such as for example in “Introduction to Statistical Modelling” by Annette J. Dobson, Chapman and Hall, 1983.
  • CDF type 371 shows the cumulative distribution function of a discrete probability distribution.
  • CDF type 372 shows the cumulative distribution function of a continuous probability distribution.
  • CDF type 373 shows the cumulative distribution function of a distribution which has both a continuous part and a discrete part.
  • a person skilled in the art is able to select the appropriate CDF type for determining the abnormality threshold. In many cases CDF type 372 is appropriate.
  • FIG. 3D shows univariate distances d20, d21, d25, d32, d33 for a subset of signals with high contributions to the aggregate abnormality indicator.
  • the subset TOPm includes the top 5 distances amongst the univariate distances of FIG. 3A.
  • the subset allows the operator to immediately drill down to the most relevant signals contributing to the abnormal system behavior indicated by the AAI when exceeding the abnormality threshold AAT in FIG. 3B. Therefore, the operator can focus on the potential root causes of the abnormal system behavior right away by focusing on status parameters which have a potentially high relevance for the abnormal behavior.
  • FIG. 4 illustrates an example of a component hierarchy 400 of the technical system 200 including functional blocks 210, 220, 230.
  • the technical system 200 typically includes a substantial number of system components which are monitored by respective sensor signals.
  • Hierarchy 400 only shows a simplified view on technical system 200 with the system components 211, 212, 221, 231, 232, 233 which are supposed to be representative of hundreds or even thousands of components of a real-world technical process system.
  • Each system component is associated with a respective univariate distance d211, d212, d221, d231, d232, d233 reflecting the technical status of the component.
  • components 211, 212 are grouped into the functional block 210 for which the aggregate block abnormality indicator BAI1 is computed based on the subset of univariate distances d211, d212.
  • the functional block 210 may be an additive supply for a reactor which includes a tank 211 monitored via a level meter for which the univariate distance d211 is determined, and further includes a pump 212 monitored via a flow meter for which the univariate distance d212 is determined. The overall technical status of block 210 is then reflected by BAI1.
  • Functional blocks may also include functional sub-blocks as shown in the example of functional block 220 which includes functional block 230 one level down in the hierarchy 400.
  • the functional block 220 may reflect a reactor function of the technical system 200 which includes the functional block 230 representing the reactor itself and a component 221 representing a peripheral component (e.g., an output valve) of the reactor function.
  • a chemical reactor may include components such as valves, tanks, heaters, pumps, coolers, sensors, security devices such as emergency cut-off switches, and others.
  • the technical status of the valve 221 may be monitored by a respective flow meter for which the univariate distance d221 is determined.
  • the technical status of the reactor 230 may be characterized by the filling level, temperature, and pressure in the reactor.
  • a corresponding level meter 231, temperature meter 232 and pressure meter 233 are system components which are grouped into the reactor functional block 230.
  • the associated univariate distances d231, d232 and d233 are aggregated into the respective aggregate block abnormality indicator BAI3 reflecting the overall technical status of the reactor.
  • BAI3 is then aggregated with d221 into aggregate block abnormality indicator BAI2 which reflects the technical status of the overall reactor function 220 including peripheral components.
  • the use of aggregate block abnormality indicators associated with functional blocks of a component hierarchy 400 of the technical system 200 allows the operator to quickly drill down to a more granular view of the technical system and identify potential functions causing an abnormal behavior of the technical system. Similar to the TOPm view of univariate distances in FIG. 3D, the user interface for the operator may also present such a top ranking list of aggregate functional block indicators allowing the operator to quickly identify functions to be analyzed in detail because of the abnormal behavior contributions reflected by the associated BAIs.
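The block-level aggregation over the hierarchy of FIG. 4 can be sketched as a recursive walk that collects the univariate distances of all system components below a functional block. The dictionary layout, the function names, the example distance values and the use of an unweighted Euclidean norm are assumptions for illustration.

```python
import math

# Functional blocks of FIG. 4 mapped to their child nodes
# (further functional blocks or system components).
HIERARCHY = {
    "210": ["211", "212"],
    "220": ["230", "221"],
    "230": ["231", "232", "233"],
}

# Hypothetical univariate distances of the system components.
DISTANCES = {"211": 0.1, "212": 0.2, "221": 0.0,
             "231": 0.0, "232": 0.6, "233": 0.8}


def leaf_distances(node, hierarchy, distances):
    """Collect the distances of all system components below a node."""
    if node not in hierarchy:  # node is a system component
        return [distances[node]]
    collected = []
    for child in hierarchy[node]:
        collected.extend(leaf_distances(child, hierarchy, distances))
    return collected


def block_indicator(block, hierarchy, distances):
    """Euclidean norm over the component distances of one block."""
    leaves = leaf_distances(block, hierarchy, distances)
    return math.sqrt(sum(d * d for d in leaves))


for block in ("210", "230", "220"):  # BAI1, BAI3, BAI2
    print(block, round(block_indicator(block, HIERARCHY, DISTANCES), 3))
```

With an unweighted Euclidean norm, aggregating BAI3 with d221 gives the same result as aggregating d231, d232, d233 and d221 directly, so the nested and the flat computation agree.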
  • FIGs. 5A to 5C illustrate a real-world example scenario (including two reactor tanks) for which an aggregate abnormality indicator is determined.
  • Process alarms are a known methodology to indicate a required action to the operator. For example, when a tank level reaches a certain limit, a high alarm is raised that indicates that the tank reached a high level.
  • the affected equipment typically is sending an alarm and a message text which is shown to the operator in an alarm list. The operator can then act accordingly and, for example, open a valve and start a pump to decrease the level inside the tank.
  • When an alarm appears, it is usually visualized in an alarm list which includes technical names of the respective component signals as shown in table 1.
  • the alarm is also visualized in the human machine interface directly at the device.
  • the operator can now react on those alarms.
  • it is very difficult to perform any root cause analysis on this type of alarm information because often an alarm is followed by several consequential alarms.
  • a plurality of system components to control the two reactors raised alarms.
  • the operator is overwhelmed by the large number of process alarms (alarm floods) and is not able to decide on which alarm to react.
  • the operator therefore needs a compact visualization of technical status information which indicates the current process state and allows the process state to be tracked over time.
  • FIG. 5A shows a (simplified) part of a technical process system 500 which has two connected reactor tanks R1, R2.
  • a pump P can supply liquid to the tanks.
  • the inflow of the tanks is controlled by valves VA and VB.
  • Each reactor tank has a level meter L1, L2 to control the fill level of the respective tank R1, R2.
  • the outflow of the tanks is controlled by valves VC and VD in combination with the pumps PC and PD.
  • associated alarm visualizations AVC, AVD, APC, and APD may be available at the respective system components.
  • the level meter values may be visualized as a chart over time with a low level indicator LL (e.g., 5% of the tank level) and an upper level indicator UL (e.g., 95% of the tank level) as boundaries of the normal operating range.
  • the LL boundary may correspond to the low alarm threshold in the alarm management system of system 500 and UL may correspond to the high alarm threshold.
  • On an actual (real-world) operator screen typically only the current value of the monitored technical parameter is displayed.
  • the operator usually opens another page of the monitoring application. Therefore, the visualization of the time trend of the level meters L1, L2 in FIGs. 5A to 5C illustrates the concept of the visualization.
  • the data showing the time trend is typically retrieved in a multi-step interaction between the operator and the HMI.
  • the reactors R1, R2 may be connected to further pipes with further inflow valves (e.g., for adding additives to the liquid stored in the tanks).
  • Further system components like temperature or pressure sensors for characterizing the technical status of the tanks are not shown in this figure.
  • a person skilled in the art will understand that a real-world process system includes many more system components.
  • the simplified example of FIG. 5A is sufficient.
  • the actual fill levels rise over time and move above the average level indicated by the horizontal average line between UL and LL, approaching the upper limit UL.
  • the computer system can now determine the univariate distances for the level meter signals L1, L2 and compute the AAI for the overall process system 500.
  • the result can be visualized via a human machine interface HMI to the operator. The operator immediately sees that at time t1 the AAI exceeds the AAT threshold indicating an abnormal system behavior.
  • FIG. 5B illustrates that, for both reactors R1 and R2, the traditional alarm threshold for the respective level meter signals is exceeded at time points t1’, t1” later than t1.
  • the traditional alarm management raising alarms when signals exceed the high/low alarm thresholds indicates the abnormal situation in the system earliest at time point t1’ which occurs after time point t1.
  • the aggregate abnormality indicator AAI raises the alert to the operator at an earlier point in time than individual alarms at the signal level.
  • the operator is informed “early enough” (i.e. before an alarm flood is generated by the control system) that the process is evolving towards an abnormal situation. The operator can take anticipatory action on the process to prevent the process from reaching an abnormal situation.
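The early-warning effect described above can be reproduced with a toy simulation of two rising tank levels. All numbers are hypothetical: the level trajectories, the 5%/95% thresholds reused from the example, the normalized distance and the abnormality threshold value `AAT = 1.0` are assumptions chosen only to show the ordering of the two events.

```python
import math

LOW, HIGH = 5.0, 95.0  # low/high alarm thresholds in % of tank level
AAT = 1.0              # hypothetical abnormality threshold


def distance(level: float) -> float:
    """Assumed normalized distance: 0 at mid-range, 1 at a threshold."""
    mid = (HIGH + LOW) / 2.0
    half_range = (HIGH - LOW) / 2.0
    return min(1.0, abs(level - mid) / half_range)


def first_index(flags):
    """Index of the first True flag, or None if there is none."""
    return next((i for i, flag in enumerate(flags) if flag), None)


# Two fill levels rising from mid-range at different rates.
levels_r1 = [50.0 + 0.5 * t for t in range(101)]
levels_r2 = [50.0 + 0.4 * t for t in range(101)]

aai = [math.hypot(distance(a), distance(b))
       for a, b in zip(levels_r1, levels_r2)]

t_aai = first_index(value > AAT for value in aai)
t_alarm = first_index(a > HIGH or b > HIGH
                      for a, b in zip(levels_r1, levels_r2))
print(t_aai, t_alarm)
```

In this toy setup the aggregate indicator crosses its threshold while both levels are still inside their individual alarm limits, which mirrors the relation between t1 and t1’ in FIG. 5B.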
  • FIGs. 5B, 5C do not show univariate distances for the level meter parameters but show the signal values SR1, SR2 in comparison to the high alarm thresholds HTR1, HTR2. The respective univariate distances are then computed based on these values.
  • the process variables SR1, SR2 can exceed their alarm thresholds HTR1, HTR2 (i.e., reach a level above the upper limit).
  • the corresponding univariate distances d(t) are bounded by 1.
  • the aggregate abnormality indicator is bounded by √N with N being the number of process variables. This
  • FIG. 5C illustrates a situation where a drill down of the AAI in FIG. 3A facilitates root cause analysis for the operator.
  • L1 of reactor R1 shows an abnormal behavior whereas L2 of R2 stays completely within the normal range.
  • the operator can focus and react on the subset of process variables that are the most related to the deviation of the process abnormality indicator above its admissible limit.
  • FIG. 6 is a diagram that shows an example of a generic computer device 900 and a generic mobile computer device 950, which may be used with the techniques described here.
  • computing device 900 may relate to the system 100 (cf. FIG. 1).
  • Computing device 950 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, and other similar computing devices.
  • the computing device 950 may allow a human user to interact with the device 900.
  • the entire system 100 may be implemented on the mobile device 950.
  • the components shown here, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed in this document.
  • Computing device 900 includes a processor 902, memory 904, a storage device 906, a high-speed interface 908 connecting to memory 904 and high-speed expansion ports 910, and a low speed interface 912 connecting to low speed bus 914 and storage device 906.
  • Each of the components 902, 904, 906, 908, 910, and 912 are interconnected using various busses, and may be mounted on a common motherboard or in other manners as
  • the processor 902 can process instructions for execution within the computing device 900, including instructions stored in the memory 904 or on the storage device 906 to display graphical information for a GUI on an external input/output device, such as display 916 coupled to high speed interface 908.
  • multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory.
  • multiple computing devices 900 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).
  • the memory 904 stores information within the computing device 900.
  • the memory 904 is a volatile memory unit or units. In another
  • the memory 904 is a non-volatile memory unit or units.
  • the memory 904 may also be another form of computer-readable medium, such as a magnetic or optical disk.
  • the storage device 906 is capable of providing mass storage for the computing device 900.
  • the storage device 906 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations.
  • a computer program product can be tangibly embodied in an information carrier.
  • the computer program product may also contain instructions that, when executed, perform one or more methods, such as those described above.
  • the information carrier is a computer- or machine-readable medium, such as the memory 904, the storage device 906, or memory on processor 902.
  • the high speed controller 908 manages bandwidth-intensive operations for the computing device 900, while the low speed controller 912 manages lower bandwidth-intensive operations. Such allocation of functions is exemplary only.
  • the high-speed controller 908 is coupled to memory 904, display 916 (e.g., through a graphics processor or accelerator), and to high-speed expansion ports 910, which may accept various expansion cards (not shown).
  • low-speed controller 912 is coupled to storage device 906 and low-speed expansion port 914.
  • the low-speed expansion port which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet) may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.
  • the computing device 900 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 920, or multiple times in a group of such servers. It may also be implemented as part of a rack server system 924. In addition, it may be implemented in a personal computer such as a laptop computer 922. Alternatively, components from computing device 900 may be combined with other components in a mobile device (not shown), such as device 950. Each of such devices may contain one or more of computing device 900, 950, and an entire system may be made up of multiple computing devices 900, 950 communicating with each other.
  • Computing device 950 includes a processor 952, memory 964, an input/output device such as a display 954, a communication interface 966, and a transceiver 968, among other components.
  • the device 950 may also be provided with a storage device, such as a microdrive or other device, to provide additional storage.
  • Each of the components 950, 952, 964, 954, 966, and 968 are interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.
  • the processor 952 can execute instructions within the computing device 950, including instructions stored in the memory 964.
  • the processor may be implemented as a chipset of chips that include separate and multiple analog and digital processors.
  • the processor may provide, for example, for coordination of the other components of the device 950, such as control of user interfaces, applications run by device 950, and wireless communication by device 950.
  • Processor 952 may communicate with a user through control interface 958 and display interface 956 coupled to a display 954.
  • the display 954 may be, for example, a TFT LCD (Thin-Film-Transistor Liquid Crystal Display) or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology.
  • the display interface 956 may comprise appropriate circuitry for driving the display 954 to present graphical and other information to a user.
  • the control interface 958 may receive commands from a user and convert them for submission to the processor 952.
  • an external interface 962 may be provided in communication with processor 952, so as to enable near area communication of device 950 with other devices. External interface 962 may provide, for example, for wired
  • the memory 964 stores information within the computing device 950.
  • the memory 964 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units.
  • Expansion memory 984 may also be provided and connected to device 950 through expansion interface 982, which may include, for example, a SIMM (Single In Line Memory Module) card interface.
  • expansion memory 984 may provide extra storage space for device 950, or may also store applications or other information for device 950.
  • expansion memory 984 may include instructions to carry out or supplement the processes described above, and may include secure information also.
  • expansion memory 984 may act as a security module for device 950, and may be programmed with instructions that permit secure use of device 950.
  • secure applications may be provided via the SIMM cards, along with additional information, such as placing the identifying information on the SIMM card in a non-hackable manner.
  • the memory may include, for example, flash memory and/or NVRAM memory, as discussed below.
  • a computer program product is tangibly embodied in an information carrier.
  • the computer program product contains instructions that, when executed, perform one or more methods, such as those described above.
  • the information carrier is a computer- or machine-readable medium, such as the memory 964, expansion memory 984, or memory on processor 952, that may be received, for example, over transceiver 968 or external interface 962.
  • Device 950 may communicate wirelessly through communication interface 966, which may include digital signal processing circuitry where necessary.
  • Communication interface 966 may provide for communications under various modes or protocols, such as GSM voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA, CDMA2000, or GPRS, among others. Such communication may occur, for example, through radio-frequency transceiver 968. In addition, short-range communication may occur, such as using a
  • GPS Global Positioning System
  • Device 950 may also communicate audibly using audio codec 960, which may receive spoken information from a user and convert it to usable digital information. Audio codec 960 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of device 950. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.) and may also include sound generated by applications operating on device 950.
  • the computing device 950 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 980. It may also be implemented as part of a smart phone 982, personal digital assistant, or other similar mobile device.
  • implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
  • machine-readable medium refers to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
  • machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • the systems and techniques described here can be implemented in a computing device that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components.
  • the components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.
  • the computing device can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Abstract

System (100), methods and computer program products are provided for determining an abnormal technical status of a technical system (200). The computer system (100) receives, from the technical system (200), a plurality of signals wherein each signal (S1 to Sn) reflects the technical status of at least one system component. The system further retrieves, from an alarm management system (300), high alarm thresholds (H1 to Hn) and low alarm thresholds (L1 to Ln) associated with respective received signals (S1 to Sn). Signal values in a range between the associated high alarm threshold and the associated low alarm threshold reflect normal operation of the respective system component. For each signal (S1) a univariate distance to its associated alarm thresholds (H1/L1) is computed to quantify a degree of abnormality for the respective system component. Based on the univariate distances an aggregate abnormality indicator (AAI) is computed which reflects the technical status of the entire technical system (200). A comparison of the aggregate abnormality indicator (AAI) with a predetermined abnormality threshold (AAT) is provided to an operator (10).

Description

System and methods monitoring the technical status of technical equipment
Technical Field
[001] The present invention generally relates to the monitoring of technical equipment and more particularly to alarm tools to support operators of technical equipment in controlling the equipment to avoid malfunctioning.
Background
[002] Many technical systems, such as for example the technical equipment in automation systems, can generate alarms to indicate to an operator a need to interact with the technical equipment in order to take corresponding action in response to the generated alarm. Alarm, as used herein and as defined in the technical standard IEC 62682 section 3.1.7, is an audible and/or visible means of indicating to the operator of an equipment an equipment malfunction, process deviation, or abnormal condition requiring a timely response (see also International Society of Automation ISA-18.2). An instance of a particular alarm is referred to as an alarm activation.
[003] In real world situations, often a series of alarm activations are generated which depend on a single root cause where actually a single alarm would be sufficient to indicate the problem in the technical system. Such series of alarm activations are usually referred to as alarm floods. Alarm flood situations are characterized by a combination of a plurality of alarm activations which occur repeatedly. In other words, the same or similar combinations of alarms typically appear in multiple alarm floods. In general, permanent high alarm rates indicate bad alarm quality. Good alarm quality is achieved when:
- each alarm alerts, informs and guides,
- alarms are presented at a rate that operators can deal with, and
- detectable problems are alarmed as early as possible.
[004] There are different approaches for monitoring large and complex industrial systems to detect abnormal situations and to generate respective alarm notifications to the operator(s). For example, statistical data-driven methods for (multivariate) process monitoring such as PCA and PLS (cf. "Multivariate statistical monitoring of process operating performance" by Kresta, Macgregor, & Marlin, 1991, in The Canadian Journal of Chemical Engineering, 69(1), 35-47) apply statistical analysis to actual measurements or technical status data. Alternatively, intelligent visualization approaches such as parallel coordinate transformation combined with convex hulls calculated for each pair of variables (cf. Multi-Variable Operations; US patent application US20080234840A1; Brooks et al.) allow displaying ranges of the process variables in parallel coordinates as a pair of linear curves between corresponding parallel axes. However, such statistical or mathematical analyses rely solely on historical values of the process variables and do not take into account any process knowledge of the monitored processes. They therefore suffer from high numbers of false positives in the detected alarms because it does not become clear what actually constitutes an abnormal situation.
[005] A certain deviation of a technical status parameter may be identified by statistical monitoring for triggering an alarm notification although the deviation may still be seen as being within the normal operation of the respective equipment.
[006] As a consequence, it is difficult for operators to retrieve reliable abnormality information regarding the overall technical status of the monitored technical equipment from said alarm notifications merely based on such statistical analysis.
Summary
[007] There is therefore a need to improve alarm detection for operators in that the operator can quickly determine the overall technical status of the monitored equipment so that the number of false positives is reduced and the operator is enabled to take appropriate corrective action if required.
[008] The method, computer program product and computer system according to the independent claims disclose embodiments of a technical solution to the above problem.
[009] In one embodiment, a computer-implemented method is provided for determining an abnormal technical status of a technical system. In another embodiment, a computer system is configured to execute said method by executing a corresponding computer program which includes program instructions that cause the computer system to execute corresponding method steps when loading the computer program into a memory of the computer system and processing the instructions with one or more processors.
[0010] The computer system receives a plurality of signals from a technical system. Each signal is sampled over time (using the same sampling frequency, or resampled in a preprocessing step to ensure that either a measured or an estimated value of each of the plurality of signals is available at each instance of the computation) and reflects the technical status of at least one system component of the technical system. That is, each signal relates to one system component but a particular system component can be monitored by an operator via multiple signals. Typically, the technical system is monitored by one or more human operators. The entirety of all signals reflects the overall technical status of the entire technical system. However, a human operator cannot derive the information about the overall technical status of the technical system from single signals at the sensor level because there is no possibility for a human being to make sense out of the plurality of signals received in real-time from the sensors.
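The resampling step mentioned above can be illustrated with a minimal sketch. The text does not say how estimated values are obtained; linear interpolation with edge values held constant is an assumption made here, and the function name `resample` is hypothetical.

```python
from bisect import bisect_right


def resample(times, values, grid):
    """Estimate signal values on a common time grid.

    Assumption (not from the source): linear interpolation between
    neighbouring samples, holding the first/last value outside the
    sampled range. `times` must be strictly increasing.
    """
    out = []
    for t in grid:
        if t <= times[0]:
            out.append(values[0])
        elif t >= times[-1]:
            out.append(values[-1])
        else:
            k = bisect_right(times, t)  # times[k - 1] <= t < times[k]
            t0, t1 = times[k - 1], times[k]
            v0, v1 = values[k - 1], values[k]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out
```

Applying the same `grid` to every signal gives one measured or estimated value per signal at each computation instant.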
[0011] The computer system assists the operator in this monitoring task by deriving from the received sensor signals a single aggregate abnormality indicator reflecting the technical status of the entire system.
[0012] An alarm management system is associated with the technical system. The alarm management system stores information in relation to alarms which are associated with the signals. An alarm management system is a system for prioritizing, grouping and classifying alerts and event notifications used in supervisory control and data acquisition (SCADA) to improve the provisioning of technical status information to an operator. Most often the major problem is that there are too many alarms annunciated in a plant upset, commonly referred to as alarm flood as explained above. However, there can also be other problems with an alarm system such as poorly designed alarms, improperly set alarm points, ineffective annunciation, unclear alarm messages, etc. Poor alarm management is one of the leading causes of unplanned downtime and of major industrial incidents. The alarm management system stores high alarm thresholds and low alarm thresholds associated with respective received signals. Signal values of a particular signal in a range between the associated high alarm threshold and the associated low alarm threshold reflect normal operation of the respective at least one system component. In other words, the alarm thresholds for a particular signal are based on historic knowledge of normal operation and abnormal system behavior. The alarm thresholds reflect critical values beyond which the respective signal value is not perceived anymore as being within the normal operation range. The alarm management system typically raises an alarm per signal when the signal value is exceeding any of the corresponding alarm thresholds. As many technical status parameters are correlated, this typically results in the so-called alarm floods overwhelming the operator with information which cannot be resolved by the operator.
[0013] The alarm management system can be an integral part of the computer system or it can be a remote system which is communicatively coupled with the computer system so that the computer system can access the data available in the alarm management system. The computer system retrieves the high alarm thresholds and low alarm thresholds associated with the respective received signals from the alarm management system via an appropriate interface. The retrieval of the alarm threshold values may, for example, occur as a kind of initialization step for the computer system. That is, before the computer system starts any computations, it may retrieve all available alarm thresholds from the alarm management system. The retrieval may be repeated at regular update intervals to take into account changes in the alarm management system. For example, an update retrieval may only retrieve alarm thresholds for signals which are actually monitored via the computer system.
[0014] The computer system has a data processor which is configured to perform the computing tasks as described in the following. Firstly, the data processor computes, at every sampling time point, for each signal with associated alarm thresholds, a univariate distance to its associated alarm thresholds. In general, a univariate distance is the (simple) distance between the values of a single variable j for two observations i and I. In the present application, a univariate distance is the maximum of the distances between the value of the respective signal and its associated alarm thresholds to quantify a degree of abnormality for the respective at least one system component. The univariate distance d(t) for a particular signal at sampling time point t can be expressed by the following mathematical formula:
Figure imgf000005_0001
where x(t) is the sample of the signal at time t, xh is the high alarm threshold associated with the signal as defined in the alarm management system, xl is the low alarm threshold associated with the signal, and a is the normal value of the variable (xl < a < xh).
[0015] For example, a can be chosen as (xh + xl)/2 by default, but other values between xl and xh can be chosen, for example by estimating the normal operating value based on normal operation data.
[0016] In one embodiment, the univariate distance for a particular signal at a particular sampling time point can be computed as a piecewise linear index so that
d(t) ∈ ]0,1[ when x(t) ∈ ]xl, xh[ (F2a)
d(t) = 1 when x(t) ≥ xh or when x(t) ≤ xl (F2b)
d(t) = 0 when x(t) = a (F2c)
[0017] In other words, the distance value is between 0 and 1 if the sampled signal value is between the low alarm threshold and the high alarm threshold (F2a); the distance value is 1 if the sampled signal value is less than or equal to the low alarm threshold, or greater than or equal to the high alarm threshold (F2b); and the distance value is 0 if the sampled signal value corresponds to a predefined parameter value reflecting normal operation (F2c).
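The properties (F2a) to (F2c) can be illustrated with a minimal sketch. The exact expression of formula (F1) is not reproduced in this text, so the linear ramp below is an assumption chosen only to satisfy the stated properties (0 at the normal value, 1 at or beyond either alarm threshold); the function and parameter names are hypothetical.

```python
def univariate_distance(x, x_low, x_high, a=None):
    """Piecewise linear abnormality index in [0, 1] for one signal sample.

    Assumed form (not taken from the source): the larger of the two
    normalised deviations from the normal value `a`, clipped at 1.
    """
    if not x_low < x_high:
        raise ValueError("low threshold must be below high threshold")
    if a is None:
        a = (x_low + x_high) / 2.0  # default normal value: midpoint
    if not x_low < a < x_high:
        raise ValueError("normal value must lie strictly between the thresholds")
    raw = max((x - a) / (x_high - a), (a - x) / (a - x_low))
    return min(1.0, raw)
```

With thresholds 0 and 100 and the default midpoint, a sample of 75 lies halfway between the normal value and the high threshold, so the index is 0.5; any sample at or beyond a threshold yields 1.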
[0018] In an alternative embodiment, the univariate distance d(t) can be computed as a smoothened index instead of the piecewise linear computation above.
[0019] For example, d(t) can be computed using exponential smoothing as: (typically α = 2) (F3)
Figure imgf000006_0001
[0020] For example, α = 2 relates to parabolic smoothing and α = 3 relates to hyperbolic smoothing.
[0021] Further, a real signal is noisy in that its “normal” value fluctuates around the value “a” with a Gaussian distribution. Therefore, the computation of the univariate distances can be further improved by introducing an interval defining a “normal range” [a1, a2] of the signal, with the upper interval limit a2 being less than the respective high alarm threshold xh and the lower interval limit a1 being greater than the respective low alarm threshold xl. Such an interval is used as a deadband for the normal range between a1 and a2 (xl < a1 < a2 < xh). A deadband (sometimes called a neutral zone or dead zone) is a band of input values in the domain of a transfer function in a control system or signal processing system where the output is zero (the output is 'dead' - no action occurs). Deadband regions can be used in control systems such as servo-amplifiers to prevent oscillation or repeated activation-deactivation cycles.
[0022] With such a deadband the univariate distance d(t) for a particular signal can be computed as the following index:
d(t) = 0 for x(t) ∈ [a1, a2] (F4a)
Figure imgf000006_0003
In other words, the distance value is 0 if the sampled signal value is inside the deadband interval for a particular signal at a particular sampling time point (F4a); the distance value is (Figure imgf000006_0004) for signal values below the lower interval limit of the normal range (F4b), and the distance value is (Figure imgf000006_0005) for signal values above the upper interval limit of the normal range (F4c).
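A deadband variant can be sketched as follows. Only (F4a) is legible in this text; the expressions for (F4b) and (F4c) are not reproduced, so the linear ramps from the deadband limits to the alarm thresholds (clipped at 1) are an assumption by analogy with the piecewise linear index, and all names are hypothetical.

```python
def deadband_distance(x, x_low, a1, a2, x_high):
    """Abnormality index with a deadband [a1, a2] around normal operation.

    Assumed form (not taken from the source): 0 inside the deadband,
    then a linear ramp reaching 1 at the respective alarm threshold.
    """
    if not x_low < a1 < a2 < x_high:
        raise ValueError("expected x_low < a1 < a2 < x_high")
    if a1 <= x <= a2:
        return 0.0  # inside the normal range: noise is ignored
    if x < a1:
        raw = (a1 - x) / (a1 - x_low)
    else:
        raw = (x - a2) / (x_high - a2)
    return min(1.0, raw)
```

The deadband keeps ordinary measurement noise around the normal value from contributing to the aggregate indicator.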
[0023] Once the univariate distances are determined by the data processor, a further computing step is executed. At every sampling time point the computer system now computes, based on the univariate distances at the respective sampling time points, an aggregate abnormality indicator reflecting the technical status of the entire technical system.
[0024] In one embodiment, the aggregate abnormality indicator is computed as the Euclidean distance D(t) based on the univariate distances of the respective signals and the total number of signals:
(F5).
Figure imgf000007_0001
In an alternative embodiment, the aggregate abnormality indicator is computed as a weighted Euclidean distance Dw(t) based on the univariate distances of the respective signals and the total number of signals wherein each univariate distance contribution is weighted with a weighting factor corresponding to the severity of an alarm associated with the respective signal as defined in the alarm management system:
Figure imgf000007_0002
where di(t) corresponds to the univariate distance of signal i, and N is the total number of received signals.
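The aggregation step can be sketched as below. The formula images for D(t) and Dw(t) are not reproduced in this text, so the normalisation is an assumption: a root mean square over the N univariate distances, and, for the weighted variant, a weighted root mean square normalised by the sum of the weights. The function name and the weight values are hypothetical.

```python
import math


def aggregate_indicator(distances, weights=None):
    """Aggregate abnormality indicator from univariate distances.

    Assumed form (not taken from the source): sqrt(sum(w_i * d_i**2) / sum(w_i)).
    With equal weights this is the root mean square of the distances, so
    the result stays in [0, 1] when every d_i is in [0, 1].
    """
    n = len(distances)
    if n == 0:
        raise ValueError("at least one univariate distance is required")
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("one weight per distance is required")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_squares = sum(w * d * d for w, d in zip(weights, distances))
    return math.sqrt(weighted_squares / total_weight)
```

In the weighted call, each weight would be derived from the severity of the alarm associated with the signal in the alarm management system; how severities map to numbers is left open here.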
[0025] The aggregate abnormality indicator now reflects the technical status of the entire technical system because it includes the technical status information with regards to all monitored system components. In other words, the presentation of the aggregate abnormality indicator to an operator provides to the operator visual indications about the internal state prevailing in said technical system. To enable the operator to quickly recognize abnormal system behavior and take corrective action the system provides a comparison of the aggregate abnormality indicator with a predetermined abnormality threshold. The abnormality threshold is chosen to ensure with a given probability (or confidence, e.g. 95%) that an aggregate abnormality indicator value being below the abnormality threshold reflects normal operation of the technical system. The given probability may be defined as a target probability by the user or it may be a predefined confidence value. For example, the abnormality threshold can be determined by using a cumulative distribution function of the aggregate abnormality indicator during normal operation of the technical system as known by a person skilled in the art. An abnormal technical status is determined when the aggregate abnormality indicator exceeds the abnormality threshold.
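One way to derive such a threshold is sketched below, assuming the cumulative distribution function is estimated empirically from AAI values recorded during known normal operation. The text leaves the type of distribution function open, so the empirical quantile and the function names are assumptions.

```python
import math


def abnormality_threshold(normal_aai_values, confidence=0.95):
    """Smallest recorded AAI value whose empirical CDF reaches `confidence`.

    Assumption (not from the source): the CDF is the empirical one built
    from AAI samples collected during normal operation.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    ordered = sorted(normal_aai_values)
    if not ordered:
        raise ValueError("normal-operation AAI samples are required")
    k = math.ceil(confidence * len(ordered)) - 1
    return ordered[max(k, 0)]


def is_abnormal(aai, threshold):
    """An abnormal technical status is flagged when the AAI exceeds the threshold."""
    return aai > threshold
```

A fitted parametric distribution could replace the empirical quantile where enough is known about the shape of the AAI distribution.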
[0026] The aggregate abnormality indicator AAI provides simplified technical status information for the entire system which can easily be processed by the operator. For example, the moment the AAI exceeds the abnormality threshold in a respective graphical visualization the operator is alerted that the technical system shows abnormal behavior. In other words, the AAI is a trigger for the operator to perform a more thorough system analysis to identify the root cause of the abnormal behavior. The trigger point in the AAI curve is typically reached even before an alarm is triggered by the alarm management system, as alarm triggers typically depend on patterns in the signal behavior which can easily extend over a longer time period. The AAI does not need any pattern recognition but simply looks at the aggregate indicator for all signals. As a consequence, neither high-performance hardware nor complex models for pattern recognition are required, since the claimed approach is a purely data-driven approach which can readily be used for technical systems in plants without the need for adapting the hardware or OPC Alarm and Events (A&E) server.
[0027] For applying the method for determining an abnormal technical status of the technical system it is advantageous when the technical system is running in a steady-state. Therefore, prior to the computing steps for AAI computation, a steady-state detection algorithm can be used to determine whether the technical system operates in a steady-state process. If the process is not in a steady state the AAI computation can be suppressed. This optional switching function saves computing resources for periods where a meaningful AAI computation is not possible. Steady-state detection algorithms are well known in the art and disclosed in numerous papers, such as for example, “An efficient method for on-line identification of steady state” by Cao, S., & Rhinehart, R. R., 1995, in Journal of Process Control, 5(6), 363-374.
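The switching function can be sketched with a deliberately simple stand-in for a steady-state detector. The check below (range of the last few samples of a reference signal within a tolerance) is an illustrative heuristic only and is not the method of the cited paper; the class name, the window length and the tolerance are hypothetical.

```python
from collections import deque


class SteadyStateSwitch:
    """Gate for the AAI computation based on a sliding-window range check."""

    def __init__(self, window=5, tolerance=1.0):
        self.samples = deque(maxlen=window)
        self.tolerance = tolerance

    def update(self, value):
        """Add a sample; return True if the process is considered steady."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet
        return max(self.samples) - min(self.samples) <= self.tolerance


def gated_aai(switch, reference_value, compute_aai):
    """Run `compute_aai` only in steady state; otherwise suppress it (None)."""
    return compute_aai() if switch.update(reference_value) else None
```

Because `compute_aai` is only called when the switch reports a steady state, the distance and aggregation steps are skipped entirely during transients.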
[0028] As mentioned earlier, the AAI can be interpreted as a trigger function for the operator to perform a root cause analysis for the technical system. The disclosed method can also support the operator in this task. In one embodiment, the computer system further provides to the operator a subset of the univariate distances at the respective sampling time points wherein the subset relates to such univariate distances with the highest contributions to the augmentation of the aggregate abnormality indicator. The size of this subset may be configurable by the operator. For example, the operator may specify 5 or 10 to configure the computer system to show, as a drill-down option for the AAI, the top 5 or the top 10 univariate distances. As a result, the operator can immediately see which signals - and therefore which system components - are primarily responsible for the AAI increase beyond the abnormality threshold.
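The drill-down to the top contributors can be sketched as a simple ranking. How a contribution is measured is not specified in the text; ranking by the univariate distance itself is an assumption here (for an unweighted Euclidean aggregate it gives the same order as ranking by squared distance). The function name and signal names are hypothetical.

```python
def top_contributors(distances, k=5):
    """Return the k signals with the largest univariate distances.

    `distances` maps a signal name to its univariate distance at one
    sampling time point. Ties keep the input order.
    """
    ranked = sorted(distances.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Usage with made-up values: the operator configured a top-2 view.
print(top_contributors({"S1": 0.1, "S2": 0.9, "S3": 0.4}, k=2))
```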
[0029] In a further alternative embodiment, the support for root cause analysis is further improved. A component hierarchy of the technical system may define a plurality of functional blocks of the technical system. The functional blocks can be represented by child nodes of the technical system in the component hierarchy. Each functional block can again include a plurality of child nodes including further functional blocks and/or system components. That is, the hierarchy can describe multiple levels of functional blocks (nested functional blocks). The computer system can now compute aggregate block abnormality indicators (BAI) for the respective functional blocks at every sampling time point. The computation for a particular functional block is thereby based on a subset of univariate distances associated with the particular functional block (at the respective sampling time points). The computed block abnormality indicator(s) (BAI) reflect the technical status of the functional blocks of the technical system. The computer system can now also provide a comparison of the block abnormality indicator (BAI) with a predetermined block abnormality threshold to the operator. The block abnormality threshold is likewise chosen to ensure with a given probability that an aggregate block abnormality indicator value below the block abnormality threshold reflects normal operation of the particular functional block. By using such BAI in addition to the AAI and the univariate distances the operator receives simplified technical status parameters for each functional block defined in the component hierarchy. That is, the operator can quickly drill down to respective functional blocks of the technical system (e.g., boiler, pump, turbine, or area of the process) when the AAI exceeds the abnormality threshold and identify the functional blocks which contribute most to the abnormality. As for the univariate distances, the computer system can provide a ranking list with the functional blocks contributing most to the abnormal behavior. Of course, for each BAI a further drill down is possible to the respective univariate distances. Through this option, the operator can quickly identify the system components of the functional block which cause the malfunctioning of the entire system.
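The block-level computation can be sketched on a nested hierarchy. The representation (nested dictionaries whose leaves are lists of signal names), the root-mean-square aggregation and all names are assumptions for illustration; each block is assumed to contain at least one signal.

```python
import math


def leaf_signals(node):
    """Collect all signal names below a node (dict = block, list = signals)."""
    if isinstance(node, dict):
        return [s for child in node.values() for s in leaf_signals(child)]
    return list(node)


def block_indicators(hierarchy, distances):
    """Compute a block abnormality indicator for every (nested) block.

    Assumed aggregation (not taken from the source): root mean square of
    the univariate distances of all signals below the block.
    """
    result = {}
    for name, child in hierarchy.items():
        signals = leaf_signals(child)
        result[name] = math.sqrt(sum(distances[s] ** 2 for s in signals) / len(signals))
        if isinstance(child, dict):
            result.update(block_indicators(child, distances))
    return result


# Hypothetical plant layout and made-up distances at one sampling time point.
hierarchy = {"boiler": ["S1", "S2"], "turbine": {"rotor": ["S3"], "stator": ["S4"]}}
distances = {"S1": 0.0, "S2": 0.0, "S3": 1.0, "S4": 0.0}
bai = block_indicators(hierarchy, distances)
print(sorted(bai, key=bai.get, reverse=True))  # blocks ranked by contribution
```

Ranking the resulting dictionary gives the drill-down path from the whole system to a block and, via `leaf_signals`, on to the individual univariate distances.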
[0030] In one embodiment, a particular technical status parameter may be represented by multiple sensor signals providing redundant information in specifying the particular technical status. In such a scenario the computation of a univariate distance for said technical status parameter can be performed in a way which is robust against failure of a sensor providing redundant information. In other words, robustness against failure means that the failure of a single sensor does not significantly affect the reliability of the technical status parameter which is reflected by the corresponding univariate distance. This is achieved by aggregating the univariate distances associated with the multiple sensor signals to provide a robust univariate distance for the particular technical status parameter. Even if one of the signals disappears (e.g., because the battery or a data communication link of the sensor fails) the robust univariate distance still provides meaningful information about the normal/abnormal behavior of the respective system component.
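A robust aggregation over redundant sensors can be sketched as follows. The text does not specify the aggregation rule; taking the median of the distances that are currently available is an assumption, as is the convention that a failed sensor reports `None`.

```python
from statistics import median


def robust_distance(redundant_distances):
    """Aggregate the univariate distances of redundant sensors.

    Assumption (not from the source): missing readings are given as None
    and ignored; the median of the remaining distances is returned, or
    None if every redundant sensor has failed.
    """
    available = [d for d in redundant_distances if d is not None]
    if not available:
        return None
    return median(available)
```

With three redundant sensors, the loss of one reading (or one outlying reading) leaves the result close to the distances of the sensors that still work.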
[0031] In one embodiment, a computer program product is provided for determining an abnormal technical status of a technical system. The program comprises instructions that, when loaded into a memory of a computer system and being executed by at least one processor of the computer system, cause the computer system to perform the method steps as disclosed herein.
[0032] The computer system for executing said computer program can be described by functional modules which are configured to execute said method steps at system runtime. The computer system has an interface to receive, from the technical system, a plurality of signals wherein each signal is sampled over time and reflects the technical status of at least one system component. Further, via the interface, the computer system retrieves, from an alarm management system associated with the technical system, high alarm thresholds and low alarm thresholds associated with respective received signals. Signal values of a particular signal in a range between the associated high alarm threshold and the associated low alarm threshold reflect normal operation of the respective at least one system component.
[0033] Further, the computer system has a data processor to compute for each signal with associated alarm thresholds, at every sampling time point, a univariate distance to its associated alarm thresholds as the maximum of the simple distances between the value of the respective signal and its associated alarm thresholds to quantify a degree of abnormality for the respective at least one system component; and to compute, at every sampling time point, based on the univariate distances at the respective sampling time points, an aggregate abnormality indicator reflecting the technical status of the entire technical system. The term “at every sampling time point”, as used herein, refers to each sampling time point which is used for said computational steps. That is, in cases with high sampling frequencies, it may be sufficient to perform the computational steps only for every second, third, etc. sampling time point. The skilled person will understand that it is not necessary to use each physical sampling time point under any circumstances.
[0034] A user interface of the computer system provides a comparison of the aggregate abnormality indicator with a predetermined abnormality threshold to an operator. The abnormality threshold ensures with a given probability (confidence) that an aggregate abnormality indicator value, when being below the abnormality threshold, reflects normal operation of the technical system. In other words, the technical system transitions into an abnormal technical status when the aggregate abnormality indicator exceeds the abnormality threshold.
[0035] In one embodiment, the computer system further includes a computation switch with a steady-state detection algorithm (SDA) configured to determine whether the technical system operates in a steady-state process, and to suppress subsequent computation steps when the process is not in a steady state.
[0036] In one embodiment, the computer system has a component hierarchy of the technical system. The hierarchy defines a plurality of functional blocks as child nodes of the technical system with each functional block comprising a plurality of child nodes comprising further functional blocks and/or system components. The processor of the computer system can compute, at every sampling time point (i.e., the sampling time points used for the computations), based on a subset of univariate distances associated with a particular functional block, at the respective sampling time points, an aggregate block abnormality indicator BAI for the particular functional block wherein the block abnormality indicator reflects the technical status of the functional block. The user interface can provide, to the operator, a comparison of the BAI with a predetermined block abnormality threshold. The block abnormality threshold ensures with a given probability that an aggregate block abnormality indicator value below the block abnormality threshold reflects normal operation of the particular functional block.
[0037] In one embodiment, the user interface further provides to the operator a subset of the univariate distances at the respective sampling time points wherein the subset relates to such distances with the highest contributions to the augmentation of the aggregate abnormality indicator or a respective block abnormality indicator. The subset has a size which is configurable (e.g., by the operator) or predefined.
[0038] Further aspects of the invention will be realized and attained by means of the elements and combinations particularly depicted in the appended claims. It is to be understood that both, the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as described.
Short description of the figures
[0039]
FIG. 1 includes a block diagram of a computer system for determining an abnormal technical status of a technical system according to an embodiment;
FIG. 2 is a simplified flow chart of a computer-implemented method for determining an abnormal technical status of a technical system according to an embodiment;
FIG. 3A illustrates univariate distances for example signals reflecting the technical status of system components of the technical system;
FIG. 3B shows an aggregate abnormality indicator for the technical system as computed according to an embodiment;
FIG. 3C illustrates types of cumulative distribution functions which can be used for determining abnormality thresholds according to an embodiment;
FIG. 3D shows univariate distances for a subset of signals with high contributions to the aggregate abnormality indicator according to an embodiment;
FIG. 4 illustrates an example of a component hierarchy of the technical system including functional blocks;
FIGs. 5A to 5C illustrate a real-world example scenario for which an aggregate abnormality indicator is determined;
FIG. 6 is a diagram that shows an example of a generic computer device and a generic mobile computer device, which may be used with the techniques described herein.
Detailed description
[0040] FIG. 1 is a block diagram of an example embodiment of a computer system 100 for determining an abnormal technical status of a technical system 200. The computer system 100 and the technical system 200 are communicatively coupled and the computer system 100 is configured to monitor the technical status of the technical system 200. For example, the technical system 200 can be a process plant, a power plant or any other equipment to execute an industrial process. Typically, the industrial processes in the plant (e.g., chemical plants, oil refineries, paper and pulp factories, etc.) are controlled by an automation system which uses a network to interconnect sensors, controllers, operator terminals and actuators. Such automation systems often use a control system architecture called supervisory control and data acquisition (SCADA). The computer system 100 has an interface 110 to receive from the technical system 200 a plurality of signals S1 to Sn. Each signal is sampled over time and reflects the technical status of at least one system component. For example, a temperature signal may reflect the technical status of a motor component by indicating the temperature of the motor (an excessively high temperature can be an indicator for overheating). At the same time, a further signal, such as a vibration sensor signal, may also provide technical status information about the motor, as excessive vibrations may indicate a problem with the bearings of the motor. A person skilled in the art knows which types of sensors are suitable in a technical system to monitor the technical status of respective components or functional blocks of the technical system. A functional block can include multiple system components which together perform a certain function (e.g., cleaning of a gas).
[0041] FIG. 2 is a simplified flow chart of a computer-implemented method 1000 for determining an abnormal technical status of the technical system 200. The computer system 100 can execute the method when loading a computer program into a memory of the computer system 100 wherein the computer program has computer-readable instructions that, when loaded and being executed by at least one processor of the computer system 100, cause the computer system to perform the steps of the method 1000.
[0042] In the following, the computer system 100 of FIG. 1 is disclosed in the context of the flow chart of FIG. 2. For this reason, the following description uses reference numbers referring to FIG. 1 and FIG. 2. Optional components of the computer system 100 and optional method steps are illustrated by dashed lines in the respective figures.
[0043] To receive 1100 the sensor data S1 to Sn from the technical system 200 via the interface 110, the computer system 100 can use any appropriate process automation protocol standard. For example, a person skilled in the art may select an appropriate protocol from the protocol standards listed in the Wikipedia list of automation protocols available at: https://en.wikipedia.org/wiki/List_of_automation_protocols.
[0044] In addition, the computer system 100 is communicatively coupled with an alarm management system 300 associated with the technical system 200. The alarm management system 300 can also be an integral part of the computer system 100, or it may be running on a remote computer which is accessible by the computer system 100 through a respective network. The alarm management system 300 stores or determines high alarm thresholds H1 to Hn and low alarm thresholds L1 to Ln associated with respective signals S1 to Sn of the technical system 200. Thereby, signal values of a particular signal in a range between the associated high alarm threshold and the associated low alarm threshold reflect normal operation of the respective system component which is monitored by said particular signal. Alarm management is typically used in a process manufacturing environment that is controlled by an operator using a supervisory control system, such as a DCS, a SCADA system or a programmable logic controller (PLC). Such a system may have hundreds of individual alarms that often are designed with only limited consideration of other alarms in the system. Since humans can only do one thing at a time and can pay attention to a limited number of things at a time, there needs to be a way to ensure that alarms are presented at a rate that can be assimilated by a human operator, particularly when the plant is upset or in an unusual condition. Advantageously, alarms should be capable of directing the operator's attention to the most important problem that he or she needs to act upon, using a priority to indicate degree of importance or rank, for instance. However, although alarm management systems include all the knowledge of alarm situations for the associated technical system (reflected by the low/high alarm thresholds), the systems do not provide aggregate indicators which reflect the overall technical status of the entire plant. Still, the information about the alarm thresholds is valuable in this context because it includes the knowledge about the entire alarm history of the technical system. In an initialization step, the computer system 100 can retrieve 1050 the high alarm thresholds H1 to Hn and low alarm thresholds L1 to Ln associated with respective signals S1 to Sn of the technical system 200 from the alarm management system 300 and use such data for the following data processing steps to determine an indicator reflecting the technical status of the entire technical system 200 based on the received signal data and alarm thresholds. This indicator will be referred to as the aggregate abnormality indicator AAI of the technical system 200. Optionally, the computer system can perform update retrieval steps 1200 to accommodate changes in the alarm management system during the operation of the technical system. Such update retrievals 1200 may be limited to alarm thresholds associated with signals which are actually monitored via the computer system 100.
[0045] The computer system 100 has a data processor 120 with various modules for performing data processing tasks with respect to the received input data (signals S1 to Sn and high/low alarm threshold pairs H1/L1 to Hn/Ln). In the example, each signal S1 to Sn has an associated alarm threshold pair. In a real technical system, there may be signals with no associated alarm threshold pairs. Such signals can be ignored by the data processor when performing the following computations. The aggregate abnormality indicator is computed with alarms that have an associated limit, for example, absolute alarms, deviation alarms, and rate of change alarms as defined by the standard NAMUR NA 102 for the application of alarm management. A version dated 02.10.2018 of the NA 102 specification can be obtained at https://www.namur.net/de/empfehlungen-u-arbeitsblaetter/aktuelle-nena.html.
[0046] For each signal (e.g., signal S1) with an associated alarm threshold pair (e.g., H1/L1), a univariate distance module 121 of the data processor computes 1300 at every sampling time point a univariate distance (e.g., dS1(t)) to the alarm thresholds associated with the respective signal. The univariate distance is determined as the maximum of the distances between the value of the respective signal and its associated alarm thresholds to quantify a degree of abnormality for system component(s) associated with the respective signal. Thereby, a computation may be used in accordance with formulas F1, F2a to F2c. Alternatively, exponential smoothing may be used in accordance with formulas F3, F4a to F4c. The computed univariate distances are then provided as input to an abnormality indicator module 122 of the data processor.
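The following Python sketch illustrates one possible way to compute such a univariate distance. It does not reproduce formulas F1 to F4c. The normalization used here (0 at the midpoint between the two thresholds, 1 at or beyond either threshold) and the names `univariate_distance`, `smooth` and `alpha` are assumptions chosen for illustration only.

```python
def univariate_distance(value, low, high):
    """Degree of abnormality in [0, 1] for one sample of one signal.

    Assumed normalization (hypothetical, not taken from formulas F1 to F2c):
    0 at the midpoint of the normal range, 1 at or beyond either threshold.
    """
    if not low < high:
        raise ValueError("low alarm threshold must be below high alarm threshold")
    mid = (low + high) / 2.0
    half_range = (high - low) / 2.0
    toward_high = (value - mid) / half_range  # grows as the value nears `high`
    toward_low = (mid - value) / half_range   # grows as the value nears `low`
    # Keep the larger of the two distances and clip it at the upper limit 1.
    return min(1.0, max(toward_high, toward_low))


def smooth(previous, current, alpha=0.3):
    """Plain exponential smoothing of a distance; `alpha` is an assumed constant."""
    return alpha * current + (1.0 - alpha) * previous
```

For a tank level signal with thresholds 5 and 95, for instance, `univariate_distance(50, 5, 95)` would be 0 under this assumed normalization, and any value at or above 95 would yield 1.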
[0047] Module 122 computes 1400, at every sampling time point, based on the univariate distances at the respective sampling time points, the aggregate abnormality indicator AAI reflecting the technical status of the entire technical system 200. For example, the aggregate abnormality indicator at a particular sampling time point may be computed as the Euclidean distance based on the univariate distances of the respective signals and the total number of signals in accordance with formula F5.
[0048] Alternatively, it may be computed as a weighted Euclidean distance based on the univariate distances of the respective signals and the total number of signals in accordance with formula F6. Thereby, each univariate distance contribution is weighted with a weighting factor corresponding to the severity of an alarm associated with the respective signal as defined in the alarm management system. In other words, alarms for signals whose associated components may have a lower impact on the overall technical performance of the technical system 200 may contribute less to the aggregate abnormality indicator.
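As a minimal sketch of the plain and weighted variants, the aggregation can be written as a root-sum-square of the univariate distances. The exact form of formulas F5 and F6 is not reproduced here; the function name `aggregate_abnormality`, the optional `normalize` flag and the way weights enter the sum are illustrative assumptions.

```python
import math


def aggregate_abnormality(distances, weights=None, normalize=False):
    """Root-sum-square of univariate distances (assumed form, not F5/F6 verbatim).

    `weights` are hypothetical severity factors, one per signal.
    With `normalize=True` the result is scaled so that its upper bound is 1
    when every distance lies in [0, 1].
    """
    if not distances:
        raise ValueError("at least one univariate distance is required")
    if weights is None:
        weights = [1.0] * len(distances)
    if len(weights) != len(distances):
        raise ValueError("one weight per distance is required")
    total = sum(w * d * d for w, d in zip(weights, distances))
    if normalize:
        total /= sum(weights)  # equals N in the unweighted case
    return math.sqrt(total)
```

With unit weights this reduces to the ordinary Euclidean norm; a lower weight for a less critical signal reduces that signal's contribution to the indicator.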
[0049] The computer system 100 further has a user interface (UI) component 130. The UI 130 can be implemented as any kind of human machine interface (HMI) which allows an operator 10 of the technical system to communicate with the computer system 100. The UI 130 can include respective input/output means including but not limited to audio-visual means including display/sound output means to convey information to the user and data input means (e.g., keyboard, mouse, touch screen, etc.) to receive input data from the user. The UI 130 provides 1500 a comparison of the aggregate abnormality indicator AAI with a predetermined abnormality threshold to the operator 10. The abnormality threshold ensures with a given probability that an aggregate abnormality indicator value, when being below the abnormality threshold, reflects normal operation of the technical system 200. In other words, when the aggregate abnormality indicator value is less than the abnormality threshold, there is a given probability (e.g., a confidence of 0.95) that the technical system 200 is in normal operation. By using a corresponding abnormality threshold, this probability can be made even higher (e.g., 0.99). Advantageously, the abnormality threshold is determined by using a cumulative distribution function of the aggregate abnormality indicator AAI during normal operation of the technical system 200. The computational tasks in steps 1300, 1400 and 1500 of the method 1000 are discussed in more detail in the description of FIGs. 3A to 3C.
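One way to picture the threshold determination is to take the p-quantile of AAI values recorded during normal operation, using the empirical cumulative distribution function. The sketch below assumes such a history is available as a list; the function name `abnormality_threshold` and the choice of the empirical CDF (rather than a fitted distribution) are assumptions for illustration.

```python
import math


def abnormality_threshold(normal_aai_values, p=0.95):
    """Smallest recorded AAI value x whose empirical CDF satisfies F(x) >= p.

    `normal_aai_values` is a hypothetical history of AAI samples collected
    while the technical system was known to operate normally.
    """
    if not normal_aai_values:
        raise ValueError("a non-empty history of normal-operation values is required")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    ordered = sorted(normal_aai_values)
    k = math.ceil(p * len(ordered))  # number of samples that must lie at or below x
    return ordered[k - 1]


def exceeds_threshold(aai, threshold):
    """Comparison that would be presented to the operator."""
    return aai > threshold
```

A higher p moves the threshold further into the upper tail of the normal-operation distribution, so fewer normal samples exceed it.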
[0050] In an optional embodiment, the data processor 120 has a computation switch 123. The computation switch is implemented as a steady-state detection algorithm SDA which can determine 1250 whether the technical system 200 operates in a steady-state process or not. If the technical system is not in a steady state (“no”), the computer system does not perform any of the computational tasks of steps 1300, 1400, 1500. Otherwise (“yes”), the method 1000 continues with step 1300. For said computational tasks it is advantageous that the process run by the technical system is in steady state. Therefore, the computation switch 123 can switch off the computation of all indices (univariate distances and aggregate abnormality indicator) during transient stages. For example, a well-known steady-state detection algorithm can be used to identify when the computation of the indices should be turned on again (e.g., Cao, S., & Rhinehart, R. R. (1995). An efficient method for on-line identification of steady state. Journal of Process Control, 5(6), 363-374).
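A simplified sketch of such a switch is given below. It is loosely modeled on the variance-ratio idea of the cited steady-state identification method and is not claimed to reproduce it exactly; the filter constants `lam1`, `lam2`, `lam3` and the critical value `r_crit` are illustrative assumptions, as is the function name.

```python
def steady_state_flags(samples, lam1=0.2, lam2=0.1, lam3=0.1, r_crit=2.0):
    """Return one boolean per sample: True where the signal looks steady.

    Two exponentially filtered variance estimates are compared: one based on
    the deviation from a filtered mean, one based on successive differences.
    Their ratio stays small for noise around a constant level and grows
    during drifts or ramps.
    """
    if not samples:
        return []
    flags = [False]          # a single sample is not enough to decide
    filtered = samples[0]    # filtered signal value
    var_mean = 0.0           # variance estimate from deviation to the filtered value
    var_diff = 0.0           # variance estimate from successive differences
    previous = samples[0]
    for x in samples[1:]:
        var_mean = lam2 * (x - filtered) ** 2 + (1.0 - lam2) * var_mean
        filtered = lam1 * x + (1.0 - lam1) * filtered
        var_diff = lam3 * (x - previous) ** 2 + (1.0 - lam3) * var_diff
        previous = x
        if var_diff == 0.0:
            flags.append(var_mean == 0.0)  # perfectly flat signal counts as steady
        else:
            ratio = (2.0 - lam1) * var_mean / var_diff
            flags.append(ratio <= r_crit)
    return flags
```

In this sketch, steps 1300 to 1500 would only be executed at sampling time points where the flag is True.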
[0051] In a further optional embodiment, the computer system 100 can access a component hierarchy of the technical system 200. Such a component hierarchy may either be stored by the computer system itself or it may be provided by the technical system or its associated automation system. The component hierarchy defines a plurality of functional blocks as child nodes of the technical system. Each functional block can include a plurality of child nodes which may either be further functional blocks and/or system components of the technical system. In other words, a functional block is used to group multiple system components together which can be associated with the same function of the technical system. Such functional blocks are sometimes also referred to as process blocks (e.g., boiler, pump, turbine, or area of the process). Details of the component hierarchy are discussed in the context of FIG. 4.
[0052] In this optional embodiment, the data processor 120 is further configured to compute 1450 aggregate block abnormality indicator(s) BAI at every sampling time point. The block abnormality indicator(s) BAI reflect the technical status of respective functional block(s). Based on a subset of univariate distances associated with a particular functional block (at the respective sampling time points), a corresponding aggregate block abnormality indicator BAI is computed for the particular functional block. The computation is performed in a similar manner as the computation of the AAI but only for the subset of univariate distances associated with the particular functional block. Further, the user interface 130 provides 1550, to the operator, a comparison of the particular block abnormality indicator BAI with a predetermined block abnormality threshold. As with the AAI comparison, the block abnormality threshold ensures with a given probability that an aggregate block abnormality indicator value, when being below the block abnormality threshold, reflects normal operation of the particular functional block. In this embodiment, the operator can drill down from the original AAI to the BAIs of functional blocks of the technical system. This allows the operator to perform a root cause analysis at the level of functional blocks of the technical system and to quickly identify the functional block(s) which contribute most to an abnormal situation of the technical system as a whole as identified by the AAI.
[0053] In a further optional embodiment, a drill-down function to the level of system components is enabled. In this embodiment, the UI 130 further provides 1600 a subset TOPm of the univariate distances at the respective sampling time points to the operator. Thereby, the subset TOPm relates to those distances with the highest contributions to the augmentation of the aggregate abnormality indicator, with the size m of the subset TOPm being predefined. As each univariate distance is directly associated with a signal which in turn is associated with a system component, a drill down to the component level is enabled. For example, the operator may set the size m so that he or she receives an amount of technical status information which can still be handled with his or her cognitive capabilities. Different operators may select different sizes. The computer system may set a default value which can be chosen as the average size used by all users of the computer system. Based on the technical status information conveyed to the operator 10 through the AAI (and the optional drill-down information about BAIs and/or system components), the operator can initiate a corrective action 20 in response to the determined abnormality indicator(s). As a consequence, the computer system assists the operator in performing the technical task of monitoring the technical system and interacting with the technical system when required.
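The selection of the subset TOPm can be sketched as a simple ranking of the current univariate distances. The dictionary layout (signal name mapped to distance) and the function name `top_m` are assumptions made for illustration.

```python
def top_m(distances, m):
    """Return the m (signal, distance) pairs with the largest distances.

    `distances` is a hypothetical mapping from signal name to the univariate
    distance at the current sampling time point. Ties keep insertion order.
    """
    if m < 0:
        raise ValueError("m must be non-negative")
    ranked = sorted(distances.items(), key=lambda item: item[1], reverse=True)
    return ranked[:m]
```

Because each distance is tied to one signal and thereby to one system component, the returned names point the operator directly at candidate components.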
[0054] In a further optional embodiment, as elaborated earlier, a particular technical status parameter, such as the status of a chemical reactor, may be represented by multiple sensor signals such as, for example, temperatures measured by a plurality of temperature sensors. The sensors provide redundant information in specifying the particular technical status of the reactor. Nonetheless, each of the temperature signals indicates normal or abnormal operation of the reactor. The data processor may aggregate the univariate distances associated with the multiple sensor signals to provide a robust univariate distance for the particular technical status parameter. In the reactor example, the univariate distances corresponding to the temperature signals of the respective temperature sensors can be aggregated. If one of the sensors fails, there is still a meaningful distance value available which characterizes the technical status of the reactor. For example, a “two out of three” vote can be used to get the actual reactor temperature in the case of one sensor failure. In other cases, the sensor redundancy may be used, for example, with a first sensor used by the control system and a second sensor used by a safety system.
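A minimal sketch of such a robust aggregation is shown below. It uses the median of the available distances, which for three redundant sensors behaves like a vote that tolerates one faulty reading. Representing a failed sensor as `None` and the function name `robust_distance` are assumptions for illustration.

```python
import statistics


def robust_distance(redundant_distances):
    """Median of the distances from redundant sensors, ignoring failed ones.

    A failed sensor is represented here by None (hypothetical convention).
    With three healthy readings, a single outlier cannot shift the result.
    """
    available = [d for d in redundant_distances if d is not None]
    if not available:
        raise ValueError("no valid sensor reading available")
    return statistics.median(available)
```

With readings such as `[0.2, 0.9, 0.25]`, the single outlier 0.9 is outvoted and the result is 0.25.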
[0055] FIG. 3A illustrates univariate distances d1 to d34 for real-world example signals reflecting the technical status of system components of a technical system. Some of the signals show an abnormal behavior at certain points in time which is reflected by a rise of the respective univariate distances (e.g., d3, d4, d13, d15, d20, d21, etc.) to the upper (abnormality) limit of the univariate distance range. Some signals (e.g., d5 to d10) show no rise of the univariate distance at all. Some signals (e.g., d18, d19) show an intermediate rise of the univariate distances which normalizes again without reaching the upper limit.
[0056] FIG. 3B shows a view 360 with the aggregate abnormality indicator AAI for the technical system which is provided to the operator of the technical system. The view 360 further includes a visualization of the abnormality threshold AAT against which the AAI is compared. The AAI is computed on the basis of the univariate distances of FIG. 3A in accordance with formula F5 or F6. The abnormality threshold AAT is predetermined so that an aggregate abnormality indicator value, when being below the abnormality threshold AAT, reflects normal operation of the technical system with a given probability p (e.g., p=0.95). Advantageously, the abnormality threshold AAT is determined by using a cumulative distribution function of the aggregate abnormality indicator AAI during normal operation of the technical system. In probability theory and statistics, the cumulative distribution function (CDF) of a real-valued random variable X1, evaluated at x, is the probability that X1 will take a value less than or equal to x. In the case of a continuous distribution, it gives the area under the probability density function from minus infinity to x.
[0057] FIG. 3C illustrates types of cumulative distribution functions (CDF types) which can be used for determining abnormality thresholds. Cumulative distribution functions are explained in detail in many publications, such as for example in “Introduction to Statistical Modelling” by Annette J. Dobson, Chapman and Hall, 1983. CDF type 371 shows the cumulative distribution function of a discrete probability distribution. CDF type 372 shows the cumulative distribution function of a continuous probability distribution. CDF type 373 shows the cumulative distribution function of a distribution which has both a continuous part and a discrete part. A person skilled in the art is able to select the appropriate CDF type for determining the abnormality threshold. In many cases CDF type 372 is appropriate.
[0058] FIG. 3D shows univariate distances d20, d21, d25, d32, d33 for a subset of signals with high contributions to the aggregate abnormality indicator. In the example, the subset TOPm includes the top 5 distances amongst the univariate distances of FIG. 3A. The subset TOPm includes the predefined number m of univariate distances (in the example: m=5) making the highest contributions to the augmentation of the aggregate abnormality indicator AAI in FIG. 3B. The subset allows the operator to immediately drill down to the most relevant signals contributing to the abnormal system behavior indicated by the AAI when exceeding the abnormality threshold AAT in FIG. 3B. Therefore, the operator can focus on the potential root causes of the abnormal system behavior right away by focusing on status parameters which have a potentially high relevance for the abnormal behavior.
[0059] FIG. 4 illustrates an example of a component hierarchy 400 of the technical system 200 including functional blocks 210, 220, 230. As described in detail above, the technical status of the technical system 200 is reflected by the associated AAI. The technical system 200 typically includes a substantial number of system components which are monitored by respective sensor signals. Hierarchy 400 only shows a simplified view on technical system 200 with the system components 211, 212, 221, 231, 232, 233 which are supposed to be representative of hundreds or even thousands of components of a real-world technical process system. Each system component is associated with a respective univariate distance d211, d212, d221, d231, d232, d233 reflecting the technical status of the component. Typically, certain functions of the technical system 200 are performed by sub-sets of components acting together to perform a respective function. In the example hierarchy 400, components 211, 212 are grouped into the functional block 210 for which the aggregate block abnormality indicator BAI1 is computed based on the subset of univariate distances d211, d212. For example, the functional block 210 may be an additive supply for a reactor which includes a tank 211 monitored via a level meter for which the univariate distance d211 is determined, and further includes a pump 212 monitored via a flow meter for which the univariate distance d212 is determined. The overall technical status of block 210 is then reflected by BAI1. Functional blocks may also include functional sub-blocks as shown in the example of functional block 220 which includes functional block 230 one level down in the hierarchy 400. For example, the functional block 220 may reflect a reactor function of the technical system 200 which includes the functional block 230 representing the reactor itself and a component 221 representing a peripheral component (e.g., an output valve) of the reactor function. For example, a chemical reactor may include components such as valves, tanks, heaters, pumps, coolers, sensors, security devices such as emergency cut-off switches, and others. The technical status of the valve 221 may be monitored by a respective flow meter for which the univariate distance d221 is determined. The technical status of the reactor 230 may be characterized by the filling level, temperature, and pressure in the reactor. A corresponding level meter 231, temperature meter 232 and pressure meter 233 are system components which are grouped into the reactor functional block 230. The associated univariate distances d231, d232 and d233 are aggregated into the respective aggregate block abnormality indicator BAI3 reflecting the overall technical status of the reactor. BAI3 is then aggregated with d221 into aggregate block abnormality indicator BAI2 which reflects the technical status of the overall reactor function 220 including peripheral components.
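The aggregation along the component hierarchy of FIG. 4 can be sketched recursively. The nested-list representation of the hierarchy, the function name `block_indicator` and the use of an unweighted root-sum-square at every level are assumptions for illustration; under that assumed form, aggregating a sub-block indicator with a sibling distance gives the same value as aggregating all leaf distances of the block directly.

```python
import math


def block_indicator(node, distances):
    """Indicator for a hierarchy node.

    A node is either a signal name (leaf, looked up in `distances`) or a list
    of child nodes (functional block, aggregated by root-sum-square).
    """
    if isinstance(node, str):
        return distances[node]
    return math.sqrt(sum(block_indicator(child, distances) ** 2 for child in node))


# Hypothetical encoding of the example hierarchy 400.
block_230 = ["d231", "d232", "d233"]   # reactor: level, temperature, pressure
block_220 = ["d221", block_230]        # reactor function incl. output valve
block_210 = ["d211", "d212"]           # additive supply: tank, pump
system_200 = [block_210, block_220]
```

Evaluating `block_indicator` on `block_210`, `block_230`, `block_220` and `system_200` would then give values playing the roles of BAI1, BAI3, BAI2 and the AAI, respectively.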
[0060] The use of aggregate block abnormality indicators associated with functional blocks of a component hierarchy 400 of the technical system 200 allows the operator to quickly drill down to a more granular view of the technical system and identify potential functions causing an abnormal behavior of the technical system. Similar to the TOPm view of univariate distances in FIG. 3D, the user interface for the operator may also present a top-ranking list of aggregate block abnormality indicators, allowing the operator to quickly identify functions to be analyzed in detail because of the abnormal behavior contributions reflected by the associated BAIs.
[0061] FIGs. 5A to 5C illustrate a real-world example scenario (including two reactor tanks) for which an aggregate abnormality indicator is determined. Process alarms are a known methodology to indicate a required action to the operator. For example, when a tank level reaches a certain limit, a high alarm is raised that indicates that the tank reached a high level. The affected equipment (system component(s)) typically sends an alarm and a message text which is shown to the operator in an alarm list. The operator can then act accordingly and, for example, open a valve and start a pump to decrease the level inside the tank. When an alarm appears, it is usually visualized in an alarm list which includes technical names of the respective component signals as shown in Table 1.
Table 1: traditional alarm list example (reproduced as an image, imgf000020_0001, in the original publication)
Additionally, in some cases the alarm is also visualized in the human machine interface directly at the device. The operator can now react to those alarms. However, it is very difficult to perform any root cause analysis on this type of alarm information because often an alarm is followed by several consequential alarms. In the example in Table 1, a plurality of system components used to control the two reactors raised alarms. The operator is overwhelmed by the large number of process alarms (alarm floods) and is not able to decide which alarm to react to. The operator therefore needs a compact visualization of technical status information which indicates the current process state and allows the process state to be tracked over time.
[0062] FIG. 5A shows a (simplified) part of a technical process system 500 which has two connected reactor tanks R1, R2. A pump P can supply liquid to the tanks. The inflow of the tanks is controlled by valves VA and VB. There can be associated alarm visualizations AP, AVA and AVB implemented directly on the respective devices. Each reactor tank has a level meter L1, L2 to control the fill level of the respective tank R1, R2. The outflow of the tanks is controlled by valves VC and VD in combination with the pumps PC and PD. Again, associated alarm visualizations AVC, AVD, APC, and APD may be available at the respective system components. For the tanks R1, R2 the level meter values may be visualized as a chart over time with a low level indicator LL (e.g., 5% of the tank level) and an upper level indicator UL (e.g., 95% of the tank level) as boundaries of the normal operating range. For example, the LL boundary may correspond to the low alarm threshold in the alarm management system of system 500 and UL may correspond to the high alarm threshold. On an actual (real-world) operator screen, typically only the current value of the monitored technical parameter is displayed. To get the time trend of a process variable, the operator usually opens another page of the monitoring application. Therefore, the visualization of the time trend of the level meters L1, L2 in FIGs. 5A to 5C illustrates the concept of the visualization. In a real system, the data showing the time trend is typically retrieved in a multi-step interaction between the operator and the HMI.
[0063] As the figure is simplified, in reality, the reactors R1, R2 may be connected to further pipes with further inflow valves (e.g., for adding additives to the liquid stored in the tanks). Further system components like temperature or pressure sensors for characterizing the technical status of the tanks are not shown in this figure. A person skilled in the art will understand that a real-world process system includes many more system components. For explaining the inventive concept, however, the simplified example of FIG. 5A is sufficient.
[0064] For both reactors, the actual fill levels rise over time and move above the average level indicated by the horizontal average line between UL and LL, approaching the upper limit UL. The computer system can now determine the univariate distances for the level meter signals L1, L2 and compute the AAI for the overall process system 500. The result can be visualized via a human machine interface HMI to the operator. The operator immediately sees that at time t1 the AAI exceeds the AAT threshold, indicating an abnormal system behavior.
[0065] FIG. 5B illustrates that, for both reactors R1 and R2, the traditional alarm threshold for the respective level meter signals is exceeded at time points t1’, t1” later than t1. In other words, the traditional alarm management, which raises alarms when signals exceed the high/low alarm thresholds, indicates the abnormal situation in the system at time point t1’ at the earliest, which occurs after time point t1. That is, the aggregate abnormality indicator AAI raises the alert to the operator at an earlier point in time than individual alarms at the signal level. In this case, the operator is informed “early enough” (i.e., before an alarm flood is generated by the control system) that the process is evolving towards an abnormal situation. The operator can take anticipatory action on the process to prevent the process from reaching an abnormal situation. This can be advantageous in case an immediate shutdown of some equipment is required to avoid damage to certain system components. It is to be noted that FIGs. 5B and 5C do not show univariate distances for the level meter parameters but show the signal values SR1, SR2 in comparison to the high alarm thresholds HTR1, HTR2. The respective univariate distances are then computed based on these values. The process variables SR1, SR2 can exceed their alarm thresholds HTR1, HTR2 (i.e., reach a level above the upper limit). The corresponding univariate distances d(t) are bounded by 1. The aggregate abnormality indicator is bounded by √N with N being the number of process variables. This value can be normalized, i.e., divided by √N, to have a bound of 1 for D(t).
[0066] FIG. 5C illustrates a situation where a drill down of the AAI in FIG. 3A facilitates root cause analysis for the operator. In this example, only the level meter L1 of reactor R1 shows an abnormal behavior whereas L2 of R2 stays completely within the normal range. The operator can focus and react on the subset of process variables that are the most related to the deviation of the process abnormality indicator above its admissible limit.
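The bound on the aggregate abnormality indicator mentioned in paragraph [0065] can be checked with a short derivation. It assumes that D(t) is the unweighted Euclidean norm of the N univariate distances d_i(t), each lying between 0 and 1; the exact form of formula F5 is not reproduced here.

```latex
0 \le d_i(t) \le 1 \quad (i = 1, \dots, N)
\;\Longrightarrow\;
D(t) = \sqrt{\sum_{i=1}^{N} d_i(t)^2} \;\le\; \sqrt{\sum_{i=1}^{N} 1} = \sqrt{N},
\qquad
\frac{D(t)}{\sqrt{N}} \le 1 .
```

The upper bound is reached only when every univariate distance is at its limit of 1, that is, when all monitored signals are at or beyond one of their alarm thresholds.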
[0067] FIG. 6 is a diagram that shows an example of a generic computer device 900 and a generic mobile computer device 950, which may be used with the techniques described here. In some embodiments, computing device 900 may relate to the system 100 (cf. FIG. 1). Computing device 950 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, and other similar computing devices. In the context of this disclosure the computing device 950 may allow a human user to interact with the device 900. In other embodiments, the entire system 100 may be implemented on the mobile device 950. The components shown here, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed in this document.
[0068] Computing device 900 includes a processor 902, memory 904, a storage device 906, a high-speed interface 908 connecting to memory 904 and high-speed expansion ports 910, and a low-speed interface 912 connecting to low-speed bus 914 and storage device 906. Each of the components 902, 904, 906, 908, 910, and 912 is interconnected using various busses, and may be mounted on a common motherboard or in other manners as appropriate. The processor 902 can process instructions for execution within the computing device 900, including instructions stored in the memory 904 or on the storage device 906 to display graphical information for a GUI on an external input/output device, such as display 916 coupled to high-speed interface 908. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices 900 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).
[0069] The memory 904 stores information within the computing device 900. In one implementation, the memory 904 is a volatile memory unit or units. In another implementation, the memory 904 is a non-volatile memory unit or units. The memory 904 may also be another form of computer-readable medium, such as a magnetic or optical disk.
[0070] The storage device 906 is capable of providing mass storage for the computing device 900. In one implementation, the storage device 906 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations. A computer program product can be tangibly embodied in an information carrier. The computer program product may also contain instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 904, the storage device 906, or memory on processor 902.
[0071] The high-speed controller 908 manages bandwidth-intensive operations for the computing device 900, while the low-speed controller 912 manages less bandwidth-intensive operations. Such allocation of functions is exemplary only. In one implementation, the high-speed controller 908 is coupled to memory 904, display 916 (e.g., through a graphics processor or accelerator), and to high-speed expansion ports 910, which may accept various expansion cards (not shown). In the implementation, low-speed controller 912 is coupled to storage device 906 and low-speed expansion port 914. The low-speed expansion port, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.
[0072] The computing device 900 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 920, or multiple times in a group of such servers. It may also be implemented as part of a rack server system 924. In addition, it may be implemented in a personal computer such as a laptop computer 922. Alternatively, components from computing device 900 may be combined with other components in a mobile device (not shown), such as device 950. Each of such devices may contain one or more of computing device 900, 950, and an entire system may be made up of multiple computing devices 900, 950 communicating with each other.
[0073] Computing device 950 includes a processor 952, memory 964, an input/output device such as a display 954, a communication interface 966, and a transceiver 968, among other components. The device 950 may also be provided with a storage device, such as a microdrive or other device, to provide additional storage. Each of the components 950, 952, 964, 954, 966, and 968, are interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.
[0074] The processor 952 can execute instructions within the computing device 950, including instructions stored in the memory 964. The processor may be implemented as a chipset of chips that include separate and multiple analog and digital processors. The processor may provide, for example, for coordination of the other components of the device 950, such as control of user interfaces, applications run by device 950, and wireless communication by device 950.
[0075] Processor 952 may communicate with a user through control interface 958 and display interface 956 coupled to a display 954. The display 954 may be, for example, a TFT LCD (Thin-Film-Transistor Liquid Crystal Display) or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology. The display interface 956 may comprise appropriate circuitry for driving the display 954 to present graphical and other information to a user. The control interface 958 may receive commands from a user and convert them for submission to the processor 952. In addition, an external interface 962 may be provided in communication with processor 952, so as to enable near area communication of device 950 with other devices. External interface 962 may provide, for example, for wired communication in some implementations, or for wireless communication in other implementations, and multiple interfaces may also be used.
[0076] The memory 964 stores information within the computing device 950. The memory 964 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units. Expansion memory 984 may also be provided and connected to device 950 through expansion interface 982, which may include, for example, a SIMM (Single In Line Memory Module) card interface. Such expansion memory 984 may provide extra storage space for device 950, or may also store applications or other information for device 950. Specifically, expansion memory 984 may include instructions to carry out or supplement the processes described above, and may include secure information also. Thus, for example, expansion memory 984 may act as a security module for device 950, and may be programmed with instructions that permit secure use of device 950. In addition, secure applications may be provided via the SIMM cards, along with additional information, such as placing the identifying information on the SIMM card in a non-hackable manner.
[0077] The memory may include, for example, flash memory and/or NVRAM memory, as discussed below. In one implementation, a computer program product is tangibly embodied in an information carrier. The computer program product contains instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 964, expansion memory 984, or memory on processor 952, that may be received, for example, over transceiver 968 or external interface 962.
[0078] Device 950 may communicate wirelessly through communication interface 966, which may include digital signal processing circuitry where necessary. Communication interface 966 may provide for communications under various modes or protocols, such as GSM voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA, CDMA2000, or GPRS, among others. Such communication may occur, for example, through radio-frequency transceiver 968. In addition, short-range communication may occur, such as using a Bluetooth, WiFi, or other such transceiver (not shown). In addition, GPS (Global Positioning System) receiver module 980 may provide additional navigation- and location-related wireless data to device 950, which may be used as appropriate by applications running on device 950.
[0079] Device 950 may also communicate audibly using audio codec 960, which may receive spoken information from a user and convert it to usable digital information. Audio codec 960 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of device 950. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.) and may also include sound generated by applications operating on device 950.
[0080] The computing device 950 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 980. It may also be implemented as part of a smart phone 982, personal digital assistant, or other similar mobile device.
[0081] Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
[0082] These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms “machine-readable medium” and “computer-readable medium” refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor.
[0083] To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
[0084] The systems and techniques described here can be implemented in a computing device that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.
[0085] The computing device can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Claims

1. A computer-implemented method (1000) for determining an abnormal technical status of a technical system (200), comprising: receiving (1100), from the technical system (200), a plurality of signals wherein each signal (S1 to Sn) is sampled over time and reflects the technical status of at least one system component; computing (1300) for each signal (S1) with associated high and low alarm thresholds obtained from an alarm management system (300), at every sampling time point, a univariate distance to its associated alarm thresholds (H1/L1) as the maximum of the distances between the value of the respective signal and its associated alarm thresholds to quantify a degree of abnormality for the respective at least one system component; computing (1400), at every sampling time point, based on the univariate distances at the respective sampling time points, an aggregate abnormality indicator (AAI) reflecting the technical status of the entire technical system (200); and providing (1500), to an operator (10), a comparison of the aggregate abnormality indicator (AAI) with a predetermined abnormality threshold, the abnormality threshold ensuring with a given probability that an aggregate abnormality indicator value, when being below the abnormality threshold, reflects normal operation of the technical system, wherein the abnormal technical status is determined when the aggregate abnormality indicator exceeds the abnormality threshold.
2. The method of claim 1 , wherein the abnormality threshold is determined by using a cumulative distribution function of the aggregate abnormality indicator (AAI) during normal operation of the technical system (200).
3. The method of claim 1 or 2, wherein the aggregate abnormality indicator at a particular sampling time point is computed as: the Euclidean distance based on the univariate distances of the respective signals and the total number of signals, or a weighted Euclidean distance based on the univariate distances of the respective signals and the total number of signals wherein each univariate distance contribution is weighted with a weighting factor corresponding to the severity of an alarm associated with the respective signal as defined in the alarm management system.
4. The method of any of the previous claims, wherein, prior to the computing steps (1300, 1400), a steady-state detection algorithm (SDA) is used to determine (1250) whether the technical system (200) operates in a steady-state process and the computing steps (1300, 1400) are suppressed when the process is not in a steady state.
5. The method of any of the previous claims, further comprising: further providing to the operator a subset (TOPm) of the univariate distances at the respective sampling time points wherein the subset relates to such univariate distances with the highest contributions to the augmentation of the aggregate abnormality indicator, with the size m of the subset (TOPm) being predefined.
6. The method of any of the previous claims, wherein the univariate distance for a particular signal at a particular sampling time point is computed so that: the distance value is between 0 and 1 if the sampled signal value is between the low alarm threshold and the high alarm threshold; the distance value is 1 if the sampled signal value is less than or equal to the low alarm threshold, or greater than or equal to the high alarm threshold; and the distance value is 0 if the sampled signal value corresponds to a predefined parameter value reflecting normal operation.
7. The method of any of the claims 1 to 5, wherein the univariate distance for a particular signal at a particular sampling time point is smoothed by exponential smoothing.
8. The method of claim 7, wherein the univariate distance for a particular signal at a particular sampling time point is computed by introducing an interval defining a normal range [a1, a2] of the signal, with the upper interval limit a2 being less than the respective high alarm threshold xh and the lower interval limit a1 being greater than the respective low alarm threshold xl, so that: the distance value is 0 if the sampled signal value is inside the interval; the distance value is [formula image imgf000028_0001] for x(t) < a1; and the distance value is [formula image imgf000028_0002] for x(t) > a2, where a > 1.
9. The method of any of the previous claims, wherein a component hierarchy (400) of the technical system defines a plurality of functional blocks (210, 220) as child nodes of the technical system (200) with each functional block (210, 220) comprising a plurality of child nodes including further functional blocks (230) and/or system components (211 , 212, 221 ), the method further comprising: computing (1450), at every sampling time point, based on a subset of univariate distances associated with a particular functional block, at the respective sampling time points, an aggregate block abnormality indicator (BAI) for the particular functional block wherein the block abnormality indicator (BAI) reflects the technical status of the functional block; and providing (1550), to the operator, a comparison of the block abnormality indicator (BAI) with a predetermined block abnormality threshold, the block abnormality threshold ensuring with a given probability that an aggregate block abnormality indicator value, when being below the block abnormality threshold, reflects normal operation of the particular functional block.
10. The method of any of the previous claims, wherein a particular technical status parameter is represented by multiple sensor signals providing redundant information in specifying the particular technical status, the method further comprising: aggregating the univariate distances associated with the multiple sensor signals to provide a robust univariate distance for the particular technical status parameter.
11. A computer program product for determining an abnormal technical status of a technical system (200), the computer program product comprising instructions that, when loaded into a memory of a computer system and being executed by at least one processor of the computer system, cause the computer system to perform the method steps according to any of the claims 1 to 10.
12. A computer system (100) for determining an abnormal technical status of a technical system (200), the computer system (100) comprising: an interface (110) configured to receive, from the technical system (200), a plurality of signals wherein each signal (S1 to Sn) is sampled over time and reflects the technical status of at least one system component; and to retrieve, from an alarm management system (300) associated with the technical system (200), high alarm thresholds (H1 to Hn) and low alarm thresholds (L1 to Ln) associated with respective received signals (S1 to Sn), wherein signal values of a particular signal in a range between the associated high alarm threshold and the associated low alarm threshold reflect normal operation of the respective at least one system component; and a data processor (120) configured to compute for each signal (S1) with associated alarm thresholds, at every sampling time point, a univariate distance to its associated alarm thresholds (H1/L1) as the maximum of the distances between the value of the respective signal and its associated alarm thresholds to quantify a degree of abnormality for the respective at least one system component; and to compute, at every sampling time point, based on the univariate distances at the respective sampling time points, an aggregate abnormality indicator (AAI) reflecting the technical status of the entire technical system (200); and a user interface component (130) configured to provide, to an operator (10), a comparison of the aggregate abnormality indicator (AAI) with a predetermined abnormality threshold (AAT), the abnormality threshold ensuring with a given probability that an aggregate abnormality indicator value, when being below the abnormality threshold, reflects normal operation of the technical system, wherein the abnormal technical status is determined when the aggregate abnormality indicator exceeds the abnormality threshold.
13. The computer system of claim 12, with the data processor (120) further comprising: a computation switch (123) with a steady-state detection algorithm (SDA) configured to determine whether the technical system (200) operates in a steady-state process and to suppress subsequent computation steps when the process is not in a steady state.
14. The computer system of claim 12 or 13, wherein a component hierarchy of the technical system defines a plurality of functional blocks as child nodes of the technical system (200) with each functional block comprising a plurality of child nodes comprising further functional blocks and/or system components, the processor (120) further configured to compute, at every sampling time point, based on a subset of univariate distances associated with a particular functional block, at the respective sampling time points, an aggregate block abnormality indicator (BAI) for the particular functional block wherein the block abnormality indicator (BAI) reflects the technical status of the functional block; and the user interface (130) further configured to provide, to the operator, a comparison of the block abnormality indicator (BAI) with a predetermined block abnormality threshold, the block abnormality threshold ensuring with a given probability that an aggregate block abnormality indicator value, when being below the block abnormality threshold, reflects normal operation of the particular functional block.
15. The computer system of any of the claims 12 to 14, the user interface (130) further configured to: provide to the operator a subset (TOPm) of the univariate distances at the respective sampling time points wherein the subset relates to such distances with the highest contributions to the augmentation of the aggregate abnormality indicator, with the size m of the subset (TOPm) being predefined.
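The computations recited in claims 1 to 3, 5 and 7 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the piecewise-linear distance, the choice of the threshold midpoint as the default normal-operation value, the normalization of the Euclidean distance by the number (or total weight) of signals, the empirical-quantile threshold, and all function and parameter names (`univariate_distance`, `nominal`, `smoothing_factor`, and so on) are hypothetical choices that merely satisfy the properties stated in the claims.

```python
import math


def univariate_distance(x, low, high, nominal=None):
    """Distance in [0, 1]: 0 at the nominal value, 1 at or beyond an alarm threshold.

    Assumption: a piecewise-linear distance; the nominal value defaults to the
    midpoint between the low and high alarm thresholds.
    """
    if nominal is None:
        nominal = 0.5 * (low + high)
    d_high = (x - nominal) / (high - nominal)  # normalized distance toward the high threshold
    d_low = (nominal - x) / (nominal - low)    # normalized distance toward the low threshold
    return min(1.0, max(d_high, d_low, 0.0))   # maximum of both, capped at 1


def exponential_smoothing(series, smoothing_factor):
    """Simple exponential smoothing: s_t = f * x_t + (1 - f) * s_(t-1)."""
    smoothed = []
    for value in series:
        if not smoothed:
            smoothed.append(value)  # the first sample initializes the filter
        else:
            smoothed.append(smoothing_factor * value
                            + (1.0 - smoothing_factor) * smoothed[-1])
    return smoothed


def aggregate_indicator(distances, weights=None):
    """(Weighted) Euclidean aggregate, normalized by the total weight of the signals.

    Assumption: with unit weights this is the root mean square of the distances,
    so the result stays in [0, 1].
    """
    if weights is None:
        weights = [1.0] * len(distances)
    weighted_squares = sum(w * d * d for w, d in zip(weights, distances))
    return math.sqrt(weighted_squares / sum(weights))


def abnormality_threshold(normal_indicators, probability):
    """Smallest observed value whose empirical CDF reaches the given probability.

    normal_indicators: aggregate indicator values recorded during normal operation.
    """
    ordered = sorted(normal_indicators)
    index = max(0, math.ceil(probability * len(ordered)) - 1)
    return ordered[index]


def top_contributors(distances, m, weights=None):
    """Indices of the m signals contributing most to the aggregate indicator."""
    if weights is None:
        weights = [1.0] * len(distances)
    contributions = [w * d * d for w, d in zip(weights, distances)]
    ranked = sorted(range(len(distances)),
                    key=lambda i: contributions[i], reverse=True)
    return ranked[:m]
```

In such a sketch, an abnormal status at a sampling time point would be flagged when `aggregate_indicator(...)` exceeds the value returned by `abnormality_threshold(...)`, and `top_contributors(...)` would select the signals shown to the operator as the TOPm subset.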
PCT/EP2019/073957 2018-09-24 2019-09-09 System and methods monitoring the technical status of technical equipment WO2020064309A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980062659.0A CN112740133A (en) 2018-09-24 2019-09-09 System and method for monitoring the technical state of a technical installation
US17/207,854 US20210209189A1 (en) 2018-09-24 2021-03-22 System and methods monitoring the technical status of technical equipment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP18196241.6A EP3627263B8 (en) 2018-09-24 2018-09-24 System and methods monitoring the technical status of technical equipment
EP18196241.6 2018-09-24

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/207,854 Continuation US20210209189A1 (en) 2018-09-24 2021-03-22 System and methods monitoring the technical status of technical equipment

Publications (1)

Publication Number Publication Date
WO2020064309A1 (en) 2020-04-02

Family

ID=63683658

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2019/073957 WO2020064309A1 (en) 2018-09-24 2019-09-09 System and methods monitoring the technical status of technical equipment

Country Status (4)

Country Link
US (1) US20210209189A1 (en)
EP (1) EP3627263B8 (en)
CN (1) CN112740133A (en)
WO (1) WO2020064309A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116802365A (en) * 2020-12-31 2023-09-22 施耐德电子系统美国股份有限公司 System and method for providing operator variation analysis for transient operation of continuous or batch continuous processes
CN114484732B (en) * 2022-01-14 2023-06-02 南京信息工程大学 Air conditioning unit sensor fault diagnosis method based on voting network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080234840A1 (en) 2005-07-30 2008-09-25 Curvaceous Software Limited Multi-Variable Operations
EP2720100A1 (en) * 2012-10-10 2014-04-16 General Electric Company Systems and methods for comprehensive alarm management
US20150095100A1 (en) * 2013-09-30 2015-04-02 Ge Oil & Gas Esp, Inc. System and Method for Integrated Risk and Health Management of Electric Submersible Pumping Systems
US9197658B2 (en) * 2010-11-18 2015-11-24 Nant Holdings Ip, Llc Vector-based anomaly detection

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050203881A1 (en) * 2004-03-09 2005-09-15 Akio Sakamoto Database user behavior monitor system and method
US7272531B2 (en) * 2005-09-20 2007-09-18 Fisher-Rosemount Systems, Inc. Aggregation of asset use indices within a process plant
JP2010165242A (en) * 2009-01-16 2010-07-29 Hitachi Cable Ltd Method and system for detecting abnormality of mobile body
IT1397489B1 (en) * 2009-12-19 2013-01-16 Nuovo Pignone Spa METHOD AND SYSTEM FOR DIAGNOSTICATING COMPRESSORS.
US8229999B2 (en) * 2010-01-05 2012-07-24 International Business Machines Corporation Analyzing anticipated value and effort in using cloud computing to process a specified workload
CN102200769B (en) * 2011-03-30 2013-03-27 北京三博中自科技有限公司 Real-time alarm system for industrial enterprise and method thereof
CN202394083U (en) * 2011-06-14 2012-08-22 北京三博中自科技有限公司 Process industry pipe network system, monitoring system of steam system and process industry steam system
CN102231081B (en) * 2011-06-14 2012-10-24 北京三博中自科技有限公司 Energy utilization state diagnosis method for process industrial equipment
US8311973B1 (en) * 2011-09-24 2012-11-13 Zadeh Lotfi A Methods and systems for applications for Z-numbers
CN102539154A (en) * 2011-10-16 2012-07-04 浙江吉利汽车研究院有限公司 Engine fault diagnosis method and device based on exhaust noise vector quantitative analysis
US9378112B2 (en) * 2012-06-25 2016-06-28 International Business Machines Corporation Predictive alert threshold determination tool
CN104598995A (en) * 2015-01-27 2015-05-06 四川大学 Regional water resource allocation bi-level decision-making optimization method based on water right
CN205880599U (en) * 2016-05-05 2017-01-11 华电国际电力股份有限公司技术服务中心 Unit exception monitored control system
US20170331844A1 (en) * 2016-05-13 2017-11-16 Sikorsky Aircraft Corporation Systems and methods for assessing airframe health
KR101857691B1 (en) * 2016-10-17 2018-05-15 고려대학교 산학협력단 Method and appratus for detecting anomaly of vehicle based on euclidean distance measure
CN106775929B (en) * 2016-11-25 2019-11-26 中国科学院信息工程研究所 A kind of virtual platform safety monitoring method and system
CN207123598U (en) * 2016-12-02 2018-03-20 Abb瑞士股份有限公司 Configurable state monitoring apparatus
CN108445865B (en) * 2018-03-08 2021-04-30 云南电网有限责任公司电力科学研究院 Method and system for dynamic alarm of main and auxiliary equipment of thermal power generating unit
CN109213654B (en) * 2018-07-05 2023-01-03 北京奇艺世纪科技有限公司 Anomaly detection method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080234840A1 (en) 2005-07-30 2008-09-25 Curvaceous Software Limited Multi-Variable Operations
US9197658B2 (en) * 2010-11-18 2015-11-24 Nant Holdings Ip, Llc Vector-based anomaly detection
EP2720100A1 (en) * 2012-10-10 2014-04-16 General Electric Company Systems and methods for comprehensive alarm management
US20150095100A1 (en) * 2013-09-30 2015-04-02 Ge Oil & Gas Esp, Inc. System and Method for Integrated Risk and Health Management of Electric Submersible Pumping Systems

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANNETTE J. DOBSON: "Introduction to Statistical Modelling", CHAPMAN AND HALL, 1983
CAO, S.; RHINEHART, R. R.: "An efficient method for on-line identification of steady state", JOURNAL OF PROCESS CONTROL, vol. 5, no. 6, 1995, pages 363 - 374
KRESTA; MACGREGOR; MARLIN: "Multivariate statistical monitoring of process operating performance", THE CANADIAN JOURNAL OF CHEMICAL ENGINEERING, vol. 69, no. 1, 1991, pages 35 - 47, XP055155402, doi:10.1002/cjce.5450690105

Also Published As

Publication number Publication date
CN112740133A (en) 2021-04-30
US20210209189A1 (en) 2021-07-08
EP3627263B8 (en) 2021-11-17
EP3627263A1 (en) 2020-03-25
EP3627263B1 (en) 2021-09-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19762992; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19762992; Country of ref document: EP; Kind code of ref document: A1)