WO2021194631A1 - System and method for ensuring that the results of machine learning models can be audited - Google Patents
- Publication number
- WO2021194631A1 (PCT/US2021/015802)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- time series
- data
- values
- state estimation
- original time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B23/00—Testing or monitoring of control systems or parts thereof
- G05B23/02—Electric testing or monitoring
- G05B23/0205—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
- G05B23/0218—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
- G05B23/0224—Process history based detection method, e.g. whereby history implies the availability of large amounts of data
- G05B23/024—Quantitative history assessment, e.g. mathematical relationships between available data; Functions therefor; Principal component analysis [PCA]; Partial least square [PLS]; Statistical classifiers, e.g. Bayesian networks, linear regression or correlation analysis; Neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/10—Numerical modelling
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- Sensors are added to or associated with components of electrical, mechanical, distribution, and other systems to enable sensor-based automation. These sensors generate a flood of information describing the behavior of the components, which is stored, at least for a time, in databases.
- Machine learning algorithms may be applied to the sensed information to enable prognostics, anomaly discovery/detection, and predictive maintenance for system components monitored by sensors.
- Examples of sensor-based automation can be found in the electrical utility, oil-and-gas, environmental and water quality monitoring, data processing, manufacturing, passenger and cargo transportation, and even financial service sectors.
- The behavior of equipment or systems, and/or decisions made by machine learning processes, may be subject to review by regulators.
- Such review may be based on the stored sensor information, and stored sensor information that shows non-compliant behavior may result in hefty fines for the offending entity. It is therefore worthwhile for the regulated entity and the regulator to be able to ensure that stored sensor information is not corrupted or tampered with.
- a computer-implemented method for auditing the results of a machine learning model comprising: retrieving a set of state estimates for original time series data values from a database under audit, wherein each of the state estimates is generated by a state estimation computation for one of the time series data values; reversing the state estimation computation for each of the state estimates to produce reconstituted time series data values for each of the state estimates; retrieving the original time series data values from the database under audit; comparing the original time series data values pairwise with the reconstituted time series data values to determine whether the original time series and reconstituted time series match; and generating a signal that the database under audit (i) has not been modified where the original time series and reconstituted time series match, and (ii) has been modified where the original time series and reconstituted time series do not match.
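The audit loop recited above can be sketched in Python. This is a minimal sketch under loudly stated assumptions: the state estimation is replaced by a hypothetical, deterministic, exactly invertible linear map `A` (not MSET), and `estimate_states`, `reverse_states`, `audit`, and the tolerance `atol` are illustrative names, not the patent's implementation.

```python
import numpy as np


def estimate_states(A: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for state estimation (NOT MSET): apply a fixed,
    invertible linear map A to every observation. Rows of X are time steps,
    columns are sensors."""
    return X @ A.T


def reverse_states(A: np.ndarray, E: np.ndarray) -> np.ndarray:
    """Reverse the stand-in computation by solving A x = e for each row e of E,
    producing reconstituted time series values."""
    return np.linalg.solve(A, E.T).T


def audit(A: np.ndarray, estimates: np.ndarray, originals: np.ndarray,
          atol: float = 1e-9) -> bool:
    """Return True ("not modified") if every original value matches its
    reconstituted counterpart within atol, else False ("modified")."""
    reconstituted = reverse_states(A, estimates)
    # Pairwise (element-by-element) comparison of original and reconstituted values.
    return bool(np.allclose(originals, reconstituted, rtol=0.0, atol=atol))


# Usage: estimates are stored next to the raw data when the data are ingested.
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5], [0.1, 1.0]])   # assumed invertible (det = 1.95)
X = rng.normal(size=(5, 2))              # original time series, 5 steps x 2 sensors
E = estimate_states(A, X)                # stored state estimates

tampered = X.copy()
tampered[3, 1] += 0.01                   # a small edit to one stored raw value
print(audit(A, E, X))         # True
print(audit(A, E, tampered))  # False
```

The check works only because the stand-in is deterministic and exactly invertible; with a lossy (compressed) estimator, `atol` would have to reflect the expected approximation error.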
- the computer-implemented method for auditing the results of a machine learning model further comprising: training a multivariate state estimation model with a set of training values selected from the original time series values; decomposing a matrix associated with the multivariate state estimation model into a set of eigenvectors; selecting a subset of major eigenvectors from the set of eigenvectors; and creating a limited multivariate state estimation model from the subset of major eigenvectors; wherein the reversal of the state estimation computation further comprises generating the reverse of a computation by which the limited multivariate state estimation model forms state estimates.
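One way to picture the training, eigendecomposition, and limiting steps is the sketch below. It assumes, beyond what the text states, that the "matrix associated with" the model is the Gram matrix of the training data and that the limited model estimates an observation by projecting it onto the major eigenvectors. This is a PCA-style stand-in, and `train_limited_model` and `limited_estimate` are hypothetical names.

```python
import numpy as np


def train_limited_model(training: np.ndarray, k: int) -> np.ndarray:
    """Return the k major eigenvectors (largest eigenvalues) as columns.
    `training` has one row per time step and one column per sensor."""
    gram = training.T @ training             # symmetric sensors x sensors matrix (assumed)
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigh returns eigenvalues in ascending order
    major = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    return eigvecs[:, major]


def limited_estimate(V_k: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Estimate each observation (row of X) by projecting it onto span(V_k)."""
    return X @ V_k @ V_k.T


# Usage: three perfectly correlated sensors, so one major eigenvector suffices.
s = np.linspace(1.0, 10.0, 20)
training = np.column_stack([s, 2.0 * s, -s])
V_k = train_limited_model(training, k=1)
print(V_k.shape)                                           # (3, 1)
print(limited_estimate(V_k, np.array([[1.0, 0.0, 0.0]])))  # approx [[1/6, 1/3, -1/6]]
```

A projection like this is lossy: observations that lie off the retained subspace are recovered only approximately.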
- the computer-implemented method for auditing the results of a machine learning model further comprising: omitting training data values from the original time series data values from the database under audit; preprocessing remaining original time series data values to mitigate the effects of sensor disturbances and generating a model that records the changes to the remaining original time series data; and performing state estimation for the preprocessed, remaining original time series data values using the limited multivariate state estimation model to create a compressed time series database.
- the computer-implemented method for auditing the results of a machine learning model further comprising: preprocessing the original time series values to mitigate effects of sensor disturbances on quality of the original time series values; and generating a data preprocessing model that records one or more changes to the original time series values during the preprocessing; wherein the reversal of the state estimation computation further comprises: retrieving the data preprocessing model; and reversing the preprocess described by the data preprocessing model for each of the state estimates.
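The preprocessing model can be as simple as a log of change records, which is what makes the preprocessing reversible. The sketch below assumes that "mitigating sensor disturbances" means clipping readings to a plausible range `[low, high]`; the record layout and function names are hypothetical, and the referenced intelligent data preprocessing technique may work quite differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChangeRecord:
    index: int          # position of the changed value in the series
    original: float     # value before preprocessing
    replacement: float  # value after preprocessing


@dataclass
class PreprocessingModel:
    records: list[ChangeRecord] = field(default_factory=list)


def preprocess(values: list[float], low: float, high: float):
    """Clip out-of-range readings and record every change that was made."""
    model = PreprocessingModel()
    cleaned = list(values)
    for i, v in enumerate(values):
        clipped = min(max(v, low), high)
        if clipped != v:
            model.records.append(ChangeRecord(i, v, clipped))
            cleaned[i] = clipped
    return cleaned, model


def reverse_preprocess(cleaned: list[float], model: PreprocessingModel) -> list[float]:
    """Undo the preprocessing by restoring each recorded original value."""
    restored = list(cleaned)
    for rec in model.records:
        restored[rec.index] = rec.original
    return restored


raw = [1.0, 250.0, 2.0, -40.0, 3.0]
cleaned, model = preprocess(raw, low=0.0, high=10.0)
print(cleaned)                                    # [1.0, 10.0, 2.0, 0.0, 3.0]
print(len(model.records))                         # 2
print(reverse_preprocess(cleaned, model) == raw)  # True
```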
- the computer-implemented method for auditing the results of a machine learning model further comprising generating an electronic data report data structure that includes: a preprocessing model that records one or more changes to the original time series values during preprocessing to mitigate effects of sensor disturbances on quality of the original time series values; a compressed time series database generated by performing state estimation with a limited multivariate state estimation model and excluding from the compressed time series database those values that are not parameters of the limited multivariate state estimation model; and one or more of (i) a set of training values selected from the original time series values and used to train the limited multivariate state estimation model, and (ii) the limited multivariate state estimation model.
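The report data structure recited above could be modeled as follows. This is a hypothetical layout: the field names are assumptions, and the compressed time series database is assumed to hold only the per-observation coefficients on the limited model's eigenvectors. The validation mirrors the requirement that the report include the training values, the limited model, or both.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class AuditDataReport:
    preprocessing_model: list                   # change records from preprocessing
    compressed_db: np.ndarray                   # coefficients only (rows x k)
    training_values: Optional[np.ndarray] = None
    limited_model: Optional[np.ndarray] = None  # major eigenvectors (sensors x k)

    def __post_init__(self) -> None:
        # The report must carry (i) the training values, (ii) the limited model, or both.
        if self.training_values is None and self.limited_model is None:
            raise ValueError("report needs training values and/or the limited model")


# Usage: a report that ships the limited model itself rather than the training data.
V_k = np.array([[1.0], [2.0], [-1.0]]) / np.sqrt(6.0)   # one unit-length eigenvector
X = np.array([[1.0, 2.0, -1.0], [2.0, 4.0, -2.0]])
report = AuditDataReport(preprocessing_model=[], compressed_db=X @ V_k, limited_model=V_k)
print(report.compressed_db.shape)  # (2, 1)
```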
- the computer-implemented method for auditing the results of a machine learning model wherein the reversal of the state estimation computation further comprises: identifying a reverse state estimation computation that undoes the steps performed by the state estimation computation to form the state estimates from the original time series data values; generating a set of reverse state estimates for the original time series data from the set of state estimates, wherein each of the reverse state estimates is generated by performing the reverse state estimation for one of the set of state estimates; wherein the reconstituted time series data values for each of the state estimates is based on the reverse state estimates.
- the computer-implemented method for auditing the results of a machine learning model further comprising: in response to the signal that the database under audit (i) has not been modified, generating an electronic verification report message indicating that the database under audit is certified to be uncorrupted and not tampered with, and (ii) has been modified, generating an electronic verification report message indicating that the database under audit is either corrupted or tampered with; and transmitting the generated electronic verification report to a computing device to cause the verification report message to be stored by the computing device or displayed by the computing device.
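Generating the verification report message is straightforward once the audit signal is available. The sketch below serializes a hypothetical message to JSON; the field names and wording are assumptions, and transmitting it to another computing device for storage or display is left out.

```python
import json
from datetime import datetime, timezone


def verification_report(database: str, unmodified: bool) -> str:
    """Build a JSON verification report from the audit signal."""
    if unmodified:
        text = "certified to be uncorrupted and not tampered with"
    else:
        text = "either corrupted or tampered with"
    return json.dumps({
        "database": database,
        "unmodified": unmodified,
        "message": f"The database under audit is {text}.",
        "generated_at": datetime.now(timezone.utc).isoformat(),
    })


print(verification_report("substation_telemetry", unmodified=False))
```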
- a non-transitory computer-readable medium storing computer-executable instructions for auditing the results of a machine learning model, that, when executed by at least a processor of a computer, cause the computer to: retrieve a set of state estimates for original time series data values from a database under audit, wherein the state estimates were generated by a state estimation computation for each of the time series data values; reverse the state estimation computation for each of the state estimates to produce reconstituted time series data values for each of the state estimates; retrieve the original time series data values from the database under audit; compare the original time series data values pairwise with the reconstituted time series data values to determine whether the original time series and reconstituted time series match; and generate a signal that the database under audit (i) has not been modified where the original time series and reconstituted time series match, and (ii) has been modified where the original time series and reconstituted time series do not match.
- the non-transitory computer-readable medium wherein the instructions further cause the computer to: train a multivariate state estimation model with a set of training values selected from the original time series values; decompose a matrix associated with the multivariate state estimation model into a set of eigenvectors; select a subset of eigenvectors having the largest eigenvalues from the set of eigenvectors; create a limited multivariate state estimation model from the subset of eigenvectors; preprocess the original time series values to mitigate effects of sensor disturbances on quality of the original time series values; create a data preprocessing model that records one or more changes to the original time series values during the preprocess; perform state estimation for the original time series data values using the limited multivariate state estimation model to create a compressed time series database; and generate a report including the preprocessing model, the compressed time series database, and one or more of (i) the set of training values and (ii) the limited multivariate state estimation model.
- a computing system for auditing the results of a machine learning model comprising: a processor; a memory operably connected to the processor; a sensor interface operably connected to the processor and memory; a non-transitory computer-readable medium operably connected to the processor and memory and storing computer-executable instructions that when executed by at least a processor of a computer cause the computer to: retrieve a set of state estimates for original time series data values received through the sensor interface from a database under audit, wherein the state estimates were generated by a state estimation computation for each of the time series data values; reverse the state estimation computation for each of the state estimates to produce reconstituted time series data values for each of the state estimates; retrieve the original time series data values from the database under audit; compare the original time series data values pairwise with the reconstituted time series data values to determine whether the original time series and reconstituted time series match; and generate a signal that the database under audit (i) has not been modified where the original time series and reconstituted time series match, and (ii) has been modified where the original time series and reconstituted time series do not match.
- the computing system for auditing the results of a machine learning model wherein the non-transitory computer-readable medium further comprises instructions that when executed by at least the processor cause the computing system to: train a multivariate state estimation model with a set of training values selected from the original time series values; decompose a matrix associated with the multivariate state estimation model into a set of eigenvectors; select a subset of major eigenvectors from the set of eigenvectors; and create a limited multivariate state estimation model from the subset of major eigenvectors; wherein the reversal of the state estimation computation further comprises generating the reverse of a computation by which the limited multivariate state estimation model forms state estimates.
- the computing system for auditing the results of a machine learning model wherein the non-transitory computer-readable medium further comprises instructions that when executed by at least the processor cause the computing system to: omit training data values from the original time series data values from the database under audit; preprocess remaining original time series data values to mitigate the effects of sensor disturbances and generate a model that records the changes to the remaining original time series data; and perform state estimation for the preprocessed, remaining original time series data values using the limited multivariate state estimation model to create a compressed time series database.
- the computing system for auditing the results of a machine learning model wherein the instructions for reversal of the state estimation computation further cause the computing system to generate an electronic data report data structure that includes: a preprocessing model that records one or more changes to the original time series values during preprocessing to mitigate effects of sensor disturbances on quality of the original time series values; a compressed time series database generated by performing state estimation with a limited multivariate state estimation model and excluding from the compressed time series database those values that are not parameters of the limited multivariate state estimation model; and one or more of (i) a set of training values selected from the original time series values and used to train the limited multivariate state estimation model, and (ii) the limited multivariate state estimation model; wherein the instructions for reversal of the state estimation computation further cause the computer to: undo the steps performed by the state estimation computation using a reverse state estimation computation; and reverse the preprocess described by the preprocessing model for each of the state estimates.
- the computing system for auditing the results of a machine learning model wherein the non-transitory computer-readable medium further comprises instructions that when executed by at least the processor cause the computing system to: identify a reverse state estimation computation that undoes the steps performed by the state estimation computation to form the state estimates from the original time series data values; generate a set of reverse state estimates for the original time series data from the set of state estimates, wherein each of the reverse state estimates is generated by performing the reverse state estimation for one of the set of state estimates; wherein the reconstituted time series data values for each of the state estimates is based on the reverse state estimates.
- the computing system for auditing the results of a machine learning model wherein the non-transitory computer-readable medium further comprises instructions that when executed by at least the processor cause the computing system to: in response to the signal that the database under audit (i) has not been modified, generate an electronic verification report message indicating that the database under audit is certified to be uncorrupted and not tampered with, and (ii) has been modified, generate an electronic verification report message indicating that the database under audit is either corrupted or tampered with; and transmit the generated electronic verification report to a computing device to cause the verification report message to be stored by the computing device or displayed by the computing device.
- FIG. 1 illustrates one embodiment of a system associated with ensuring that the results of machine learning models can be audited.
- FIG. 2 illustrates one embodiment of a method associated with ensuring that the results of machine learning models can be audited.
- FIG. 3 illustrates a flowchart of one embodiment of a process associated with ensuring that the results of machine learning models can be audited.
- FIG. 4 illustrates a schematic of one embodiment of storing change records from intelligent data preprocessing in an example IDP model.
- FIG. 5 illustrates a schematic of one embodiment of an MSET model training process associated with ensuring that the results of machine learning models can be audited.
- FIG. 6 illustrates a schematic of one embodiment of an MSET model limiting process associated with ensuring that the results of machine learning models can be audited.
- FIG. 7 illustrates a schematic of one embodiment of a data compression process associated with ensuring that the results of machine learning models can be audited.
- FIG. 8A illustrates a first example data report format resulting from one embodiment of a data compression process.
- FIG. 8B illustrates a second example data report format resulting from another embodiment of a data compression process.
- FIG. 9 illustrates a schematic of one embodiment of a data reconstruction process using a first data report format.
- FIG. 10 illustrates a schematic of one embodiment of a data reconstruction process using a second data report format.
- FIG. 11 illustrates an example computing device that is configured and/or programmed with one or more of the example systems and methods described herein, and/or equivalents.
DETAILED DESCRIPTION
- Sensors such as Internet-of-Things (IoT) sensors may be added to physical devices to monitor the operation of those devices.
- These sensors can be numerous, especially in dense-sensor industries such as utilities, oil & gas, and manufacturing.
- For example, an oil refinery can include over one million sensors.
- A utility grid can include well in excess of that number, especially when sensors for supervisory control and data acquisition (SCADA) on utility assets such as generating stations and transformer substations, and sensors for advanced metering infrastructure (AMI), are taken into consideration.
- SCADA: supervisory control and data acquisition
- AMI: advanced metering infrastructure
- The data from these sensors may be stored as time series, that is, series of data points indexed in time order, or pairs of values and associated times.
- The time-series data may be stored in time-series databases, which are database systems optimized for storing and serving time series. Accordingly, the utility industry can generate very large time-series databases of sensor readings, on the order of petabytes or greater.
- Where time-series data is used for product development or other scientific purposes, the accuracy of data analyses, and in some cases the soundness of conclusions drawn, will be negatively impacted by the above types of anomalies.
- the systems and methods described herein also enable a solution to this problem by providing for signal validation and sensor operability validation for time-series databases that originate from sensors monitoring critical assets.
- Machine learning (ML) algorithms may be applied to sensor data stored in time-series databases to enable prognostics, anomaly discovery/detection, and predictive maintenance for devices monitored by sensors. This may be referred to as automated prognostic surveillance.
- ML: machine learning
- Techniques for extracting data for legal and regulatory purposes are often subject to some or all of the following disadvantages: (i) the technique may select data using ad hoc, as-needed, or arbitrary criteria without a rigorous, theoretical basis; (ii) the selected data sets for the technique are inadequate for recovering the original source data; and (iii) the technique may make it possible to hide violations of regulatory requirements because of the inability to recover the original source data. Indeed, regarding the third disadvantage, it is reported that this has actually occurred in practice.
- the PUC scrutinizes the time-series signals very carefully because the PUC automatically assumes that the utility has an incentive to adjust some time series values to make it appear that generation capacity never drops below the 10% threshold. Similarly, the utility automatically assumes that the PUC has an incentive to adjust some time series values to make it appear that generation capacity dropped a fraction of a percent below the 10% threshold, thereby extracting significant fines from the utility.
- Sensed time-series data (for example, time-stamped sensor readings and the user associated with the sensor input, if any).
- A ‘knowledge’ state at the time of the measurement (for example, one or more particular machine learning models, particular conditions, and particular data used at the time of the analysis).
- the sensed time-series data may be very bulky.
- A relatively small but salient (most prominent or important) collection of data can be saved from which one can recover a close approximation to the original data, which is sufficient for legal and regulatory requirements.
- a collection of saved data with this property of recoverability — that is, recoverability of the approximation of the original data from the saved data — is referred to herein as a “tamper-proof” data set.
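The recoverability property can be illustrated with a generic low-rank compression. The sketch swaps in a truncated singular value decomposition (SVD) for the MSET-based approach described here, purely to show how a small saved data set can yield a close approximation of correlated sensor signals; the synthetic signals and the rank `k` are assumptions.

```python
import numpy as np


def compress(X: np.ndarray, k: int):
    """Keep only the k largest singular components of X (rows = time, cols = sensors)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :k] * s[:k]   # per-time-step coefficients
    basis = Vt[:k]              # k shared basis vectors over the sensors
    return scores, basis


def recover(scores: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Rebuild the approximation of the original signals."""
    return scores @ basis


rng = np.random.default_rng(42)
t = np.linspace(0.0, 10.0, 1000)
latent = np.column_stack([np.sin(t), np.cos(t)])   # two underlying drivers
X = latent @ rng.normal(size=(2, 20)) + 0.001 * rng.normal(size=(1000, 20))

scores, basis = compress(X, k=2)
X_hat = recover(scores, basis)
print(X.size, scores.size + basis.size)  # 20000 2040
print(np.abs(X - X_hat).max() < 0.01)    # True
```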
- the creation of a tamper-proof data set may involve one or more of the techniques described in “Intelligent Preprocessing of Multi-Dimensional Time- Series Data”, inventors D. Gawlick, K.
- the systems and methods described herein thus enable creation of a close approximation to the original source data with a relatively small amount of processed data — a tamper-proof data set — which can be used to satisfy legal and regulatory requirements. Further, the tamper-proof data set can be used to satisfy these requirements while still accommodating machine-learning signal validation and sensor operability validation techniques.
- The processing technique for extracting the tamper-proof data set exhibits three advantageous properties:
- the processed data of the tamper-proof data set is relatively small compared with the input data.
- the processed data of the tamper-proof data set may be several orders of magnitude smaller.
- Processing techniques such as the Multivariate State Estimation Technique (MSET) exhibit the properties of determinism, compression, and reversibility; accordingly, MSET is used in one embodiment of this invention.
- MSET: Multivariate State Estimation Technique
- any processing technique for extracting a data set that satisfies the three properties above can be used for the creation of a tamper-proof data set in accordance with the systems and methods described herein.
- The systems and methods described herein improve existing ML surveillance systems to which they are applied, increasing the accuracy of alerts and reducing false alarm rates for ML prognostic anomaly discovery. Note that these improvements may be realized by the implementation of the systems and methods described herein, and do not require hardware upgrades anywhere in the systems in which they are implemented. The systems and methods described herein are therefore immediately backward compatible with any existing IoT system. This is particularly advantageous in the power utility, oil & gas, manufacturing, and aviation industries, where legacy sensor data collection systems are already in place and would require significant labor to upgrade.
- The systems and methods described herein are described with reference to the power utility sector, but clearly have application wherever IoT sensor time series data is collected and used, for example in the oil & gas, manufacturing, and aviation sectors.
- The systems and methods described herein may be applied in processing of streaming digitized data for utility assets inside generating facilities (for example, coal power plants, oil power plants, nuclear power plants, wind turbines, geothermal generators, gas turbine power plants, and others), as well as for critical assets in the power distribution grid, such as transformers, substations, and SCADA systems.
- A method and process for ensuring that the results of machine learning models can be audited includes features of (i) tamper-proofing, (ii) snapshot isolation, (iii) journaling, (iv) activity journaling, and (v) data provenance recording.
- tamper-proofing is a process or system configuration to ensure that malicious data modification (tampering) or accidental data modification (corruption) will be detected.
- compact records that will expose any changes to the original data can be included in an audit report.
- Anomalous values in time series data may be identified by ML processing of the raw time series data to produce estimates of what the values ‘ought to be’ in the context of surrounding data.
- NNs: neural networks
- SVMs: support vector machines
- MSET and its variants (such as Oracle’s proprietary advanced MSET pattern recognition, “MSET2”) may also be employed for anomaly detection. All three approaches (NNs, SVMs, and MSET) are, on a black-box level, nonlinear nonparametric (NLNP) regression algorithms.
- NLNP regression is employed for prognostics, anomaly discovery/detection, and predictive maintenance in a time-series dataflow process primarily because an NLNP machine learning technique makes no assumptions about the linear or nonlinear relationships between/among the time series “signals,” but instead learns those relationships empirically.
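The basic MSET estimate is commonly written as $\hat{x} = D\,(D^\top \otimes D)^{-1}(D^\top \otimes x)$, where the memory matrix $D$ holds training exemplars as columns and $\otimes$ is a nonlinear similarity operator. The sketch below assumes a Gaussian similarity operator with a hand-picked `bandwidth`; it is a textbook-style illustration, not Oracle's MSET2. It uses no random numbers, so the same inputs always yield the same estimates.

```python
import numpy as np


def similarity(A: np.ndarray, B: np.ndarray, bandwidth: float = 1.0) -> np.ndarray:
    """Assumed similarity operator: Gaussian kernel between the columns of A and B."""
    sq_dist = ((A[:, :, None] - B[:, None, :]) ** 2).sum(axis=0)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mset_estimate(D: np.ndarray, x: np.ndarray, bandwidth: float = 1.0) -> np.ndarray:
    """Estimate the observation x (one value per sensor) from memory matrix D
    (sensors x exemplars)."""
    G = similarity(D, D, bandwidth)                 # exemplar-to-exemplar similarity
    a = similarity(D, x[:, None], bandwidth)[:, 0]  # exemplar-to-observation similarity
    w = np.linalg.solve(G, a)                       # weights on the exemplars
    return D @ w


# Memory matrix: 3 sensors, 4 training exemplars (columns).
D = np.array([[0.0, 1.0, 2.0, 3.0],
              [0.0, 2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0, 0.0]])
x_obs = np.array([2.1, 3.9, 1.0])
print(mset_estimate(D, x_obs))
```

An observation identical to a stored exemplar is reproduced exactly, since its similarity vector equals the corresponding column of $G$.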
- NNs and SVMs both employ stochastic (or apparently random) processes for optimization of the weights.
- In NNs, stochastic optimization of weights occurs between perceptron layers.
- In SVMs, stochastic optimization occurs in convex quadratic programming optimization of the regularization parameter to keep a balance between bias and variance in the SVM estimates. For example, in both cases (NNs and SVMs), if the pattern recognition is trained with data from Monday versus data from Tuesday, the relationship between the output estimates and the input raw signals will be extremely similar, but not identical.
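The contrast between stochastic training and a deterministic computation shows up even in a toy linear model. In the sketch below (an assumed illustration, not an NN or SVM), stochastic gradient descent with a random start and random sample order gives slightly different weights on each retraining, while a closed-form least-squares solve returns identical weights every time.

```python
import numpy as np


def sgd_fit(X: np.ndarray, y: np.ndarray, seed: int,
            epochs: int = 20, lr: float = 0.01) -> np.ndarray:
    """Stochastic training: random initial weights and random sample order."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            w -= lr * (X[i] @ w - y[i]) * X[i]   # squared-error gradient step
    return w


def closed_form_fit(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Deterministic training: ordinary least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * data_rng.normal(size=200)

w_run1, w_run2 = sgd_fit(X, y, seed=1), sgd_fit(X, y, seed=2)
print(np.array_equal(w_run1, w_run2), np.allclose(w_run1, w_run2, atol=0.1))  # False True
print(np.array_equal(closed_form_fit(X, y), closed_form_fit(X, y)))           # True
```

With a pinned seed the SGD run is reproducible; the point is that retraining reproduces the weights only if every source of randomness is held fixed.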
- MSET for anomaly detection (and associated prognostics and predictive maintenance) yields estimates that may be stored alongside the original raw time series telemetry values.
- Because MSET is a deterministic (but complex) mathematical algorithm, the MSET estimates are reversible as described above, which is key for tamper-proofing in auditability assurance.
- MSET is applied for anomaly detection in a time-series data process to enable tamper proofing as part of auditability assurance.
- If the original raw data values are changed, the change can be detected based on the accompanying MSET estimates. Conversely, where the original raw data streams have not been tampered with and are uncorrupted, the original raw data values can be validated, or confirmed to be unchanged, based on the accompanying estimates.
- This tamper-free certification may be performed at any time following the creation of the MSET estimates. This tamper-free certification is based on incorporation of the deterministic, reversible MSET algorithm into an anomaly detection process as described herein.
- A compact data set for estimating the original values of time-series data may be captured at any point in time, stored alongside other information about changes made over the life of the time-series database, and employed in an auditability process.
- snapshot isolation forms a part of the auditability process.
- a temporal database offers the ability to store and retrieve any version of a record. The versions are identified by a strictly ascending transaction time — the time a version was available or visible for queries and follow-on processing. Each specific version may be referred to as a snapshot.
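The versioning behavior of a temporal database can be sketched with a per-record version list. The class below is a hypothetical in-memory illustration, assuming integer transaction times; it enforces strictly ascending transaction times and answers "as of" queries by binary search.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TemporalRecord:
    tx_times: list[int] = field(default_factory=list)  # strictly ascending
    versions: list[Any] = field(default_factory=list)

    def write(self, tx_time: int, value: Any) -> None:
        """Add a new version; transaction times must strictly ascend."""
        if self.tx_times and tx_time <= self.tx_times[-1]:
            raise ValueError("transaction time must be strictly ascending")
        self.tx_times.append(tx_time)
        self.versions.append(value)

    def as_of(self, tx_time: int) -> Optional[Any]:
        """Return the snapshot visible at tx_time: the latest version written at or before it."""
        i = bisect.bisect_right(self.tx_times, tx_time)
        return self.versions[i - 1] if i > 0 else None


rec = TemporalRecord()
rec.write(10, 1.5)
rec.write(20, 1.7)
print(rec.as_of(5), rec.as_of(15), rec.as_of(25))  # None 1.5 1.7
```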
- a tamper-proof data set created at a specific time may be used to validate the data values of the snapshots available at that creation time.
- journaling processes are configured to track all changes to the database. Each change in the database results in creation of a journal entry in a journal associated with the database. Journals are typically highly resistant to data loss. For example, in a journaled database, no data changes on permanent media for the database, and no external notifications regarding the database are permitted until the journal data describing the change are stored. This enables auditing of the database even in the presence of failures. Journals allow the system to reconstruct any snapshot of a database even without temporal support, at the cost of a tremendous performance burden in snapshot queries. Every database has a copy of the most current snapshot, and temporal databases provide many snapshots back in time. For immutable data such as sensor readings, snapshots and journals can be stored as a single copy.
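The write-ahead rule and snapshot reconstruction by replay can be sketched as follows. This is an assumed, in-memory illustration: a real journal would force each entry to stable storage before the data change is applied, which a Python list does not do, and deletes are not modeled. Replaying from the start also shows why journal-only snapshot queries are costly: the work grows with the length of the journal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalEntry:
    seq: int      # strictly increasing sequence number
    key: str
    value: float


class JournaledStore:
    def __init__(self) -> None:
        self.journal: list[JournalEntry] = []  # stands in for durable journal storage
        self.data: dict[str, float] = {}       # the most current snapshot

    def put(self, key: str, value: float) -> int:
        entry = JournalEntry(len(self.journal) + 1, key, value)
        self.journal.append(entry)  # 1) journal the change first ...
        self.data[key] = value      # 2) ... then apply it to the data
        return entry.seq

    def snapshot_at(self, seq: int) -> dict[str, float]:
        """Rebuild the snapshot as of journal entry `seq` by replaying the journal."""
        snapshot: dict[str, float] = {}
        for entry in self.journal:
            if entry.seq > seq:
                break
            snapshot[entry.key] = entry.value
        return snapshot


store = JournaledStore()
store.put("pump_7.pressure", 101.3)
store.put("pump_7.temp", 55.0)
store.put("pump_7.pressure", 99.8)
print(store.snapshot_at(2))  # {'pump_7.pressure': 101.3, 'pump_7.temp': 55.0}
```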
- auditing may use additional knowledge about the activities of users, for example, answering questions about “who saw what information and when (in response to a query)?” Or, “who inserted, updated, or deleted what information and when?” And, “what actions were done in the same transactions (how are the actions related)?” Such information may be recorded in an activity journal entry.
- the entries of the activity journal are synchronized with the standard journal entries.
- The activity journals thus provide additional context information describing the circumstances that led to a database change and the attendant creation of a journal entry.
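Synchronizing the activity journal with the standard journal can be as simple as sharing a transaction identifier. The sketch below is hypothetical (the entry types, field names, and sample users are illustrative only) and shows how two of the audit questions above could be answered by joining on `tx_id`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEntry:      # standard journal entry
    tx_id: int
    key: str
    op: str             # "insert", "update", or "delete"


@dataclass(frozen=True)
class ActivityEntry:    # activity journal entry, synchronized via tx_id
    tx_id: int
    user: str
    timestamp: str


def who_changed(key: str, changes: list[ChangeEntry],
                activities: list[ActivityEntry]) -> list[tuple[str, str, str]]:
    """Who inserted, updated, or deleted `key`, and when?"""
    by_tx = {a.tx_id: a for a in activities}
    return [(by_tx[c.tx_id].user, c.op, by_tx[c.tx_id].timestamp)
            for c in changes if c.key == key and c.tx_id in by_tx]


def same_transaction(tx_id: int, changes: list[ChangeEntry]) -> list[str]:
    """Which keys were changed together in one transaction?"""
    return [c.key for c in changes if c.tx_id == tx_id]


changes = [ChangeEntry(1, "s1", "insert"), ChangeEntry(2, "s1", "update"),
           ChangeEntry(2, "s2", "insert")]
activities = [ActivityEntry(1, "alice", "2021-01-01T00:00:00Z"),
              ActivityEntry(2, "bob", "2021-01-02T00:00:00Z")]
print(who_changed("s1", changes, activities))
print(same_transaction(2, changes))  # ['s1', 's2']
```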
- Information describing the data provenance may also be included in or synchronized with the standard journal entries, providing still further context information describing the circumstances that led to a database change and the attendant creation of a journal entry.
- Data provenance information associates derived data with corresponding inputs, processing steps, and physical-processing environment. For example, provenance information identifies the data that form the basis of a query result. In one embodiment, provenance processes may be configured to re-write queries in order to determine these data. In the context of an audit, provenance metadata substantially reduces the data that have to be considered, as it retains a record of all data that formed the basis of a query result. In one embodiment, the provenance information may include further information such as descriptions of sensors that provide the data, operability states of the sensor, assets associated with the sensors, version of the asset, relationships between assets.
- the tamper-proof data sets are created in response to the change to the database, contemporaneously with the creation of the journal entry.
- the tamper-proof data set is captured in the journal just like other metadata (change metadata including activity journaling as well as data provenance metadata) associated with the change.
- the tamper-proof data set is captured for every change to the database, assuring that any subsequent alteration of data will be detected in an audit process conducted in accordance with the systems and methods described herein (and incidentally also assuring that the machine learning software configuration in operation at the time of the change is captured and auditable along with the rest of the journal information).
- the systems and methods disclosed herein therefore enable an audit to be performed on original data in any snapshot of the data, regardless of later temporal evolution of the data in the time-series database. Auditability of the results of machine learning in a time-series database is thus ensured by enabling capture of all necessary information at each change to the database — journal entries, activity journal entries, data provenance, and tamper-proof machine learning records — thus tracking the entire historical provenance of a database of signals.
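The following sketches a journal entry that carries the context listed above alongside the tamper-proof data set. The field names, `AuditableJournalEntry`, and `journal_change` are hypothetical. The `estimate` callable stands in for whatever state estimation (such as MSET) produces the tamper-proof data set. The only point illustrated is that all of this is captured in the same entry, at the moment of the change.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class AuditableJournalEntry:
    txn_time: int
    change: Dict[str, float]            # change metadata: values written
    user: str                           # activity journal: who made the change
    related_actions: Tuple[str, ...]    # activity journal: same-transaction actions
    provenance: Tuple[str, ...]         # provenance: data behind the change
    model_version: str                  # ML configuration in force at change time
    state_estimates: Tuple[float, ...]  # tamper-proof data set


def journal_change(
    journal: List[AuditableJournalEntry],
    txn_time: int,
    series: Sequence[float],
    change: Dict[str, float],
    user: str,
    related_actions: Sequence[str],
    provenance: Sequence[str],
    model_version: str,
    estimate: Callable[[Sequence[float]], Sequence[float]],
) -> AuditableJournalEntry:
    # The tamper-proof data set is computed together with the journal entry,
    # so every change carries its own state estimates.
    entry = AuditableJournalEntry(
        txn_time=txn_time,
        change=dict(change),
        user=user,
        related_actions=tuple(related_actions),
        provenance=tuple(provenance),
        model_version=model_version,
        state_estimates=tuple(estimate(series)),
    )
    journal.append(entry)
    return entry
```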
- FIG. 1 illustrates one embodiment of a system 100 associated with ensuring that the results of machine learning models can be audited.
- the system 100 includes a time series data service 105 and an enterprise network 110 connected by a network 115 such as the Internet.
- the time series data service 105 is connected either directly to sensors (such as sensors 120) or remote terminal units (RTUs) through a network 125, or indirectly to sensors (such as sensors 130) or RTUs through one or more upstream devices 135.
- networks 115 and 125 are the same network, and in another embodiment, networks 115 and 125 are separate networks.
- time series data service 105 includes various systems which may include a machine learning audit assurance system 140, a sensor interface server 145, a prognostics, anomaly discovery, and predictive maintenance system 150, a web interface server 155, and data store 160.
- Each of these systems 140 - 160 is interconnected by server side network 165.
- Each of these systems 140 - 160 is configured with logic, for example by various software modules, for executing the functions it is described as performing.
- the systems 140 - 160 are implemented by dedicated computing devices.
- one or more of the systems 140 - 160 may be implemented by a common (or shared) computing device, even though represented as discrete units in FIG. 1.
- time series data service 105 may be hosted by a third party, and/or operated by a third party for the benefit of multiple account owners/tenants, each of whom is operating a business, and each of whom has an associated enterprise network 110.
- time series data service 105 is associated with a utility entity such as a power utility, or associated with a major utility asset such as a generation facility, substation, or other major power grid component.
- time series data service 105 is configured with logic, such as software modules, to operate the time series data service 105 to (i) create and export time-series databases and/or (ii) audit a time series database in accordance with the systems and methods described herein.
- the sensors 120, 130 can be affixed to or otherwise configured to detect the performance of one or more components of a device or system.
- the devices or systems generally include any type of machinery or facility with components that perform measurable activities.
- the sensors 120, 130 may include (but are not limited to): a voltage sensor, a current sensor, a temperature sensor, a pressure sensor, a rotational speed sensor, a flow meter sensor, a vibration sensor, a microphone, an electromagnetic radiation sensor, a proximity sensor, a gyroscope, an inclinometer, an accelerometer, a global positioning system (GPS) sensor, a torque sensor, a flex sensor, a nuclear radiation detector, or any of a wide variety of other sensors or transducers for generating electrical signals that describe detected or sensed physical behavior.
- the sensors 120, 130 are connected through network 125 to sensor interface server 145.
- sensor interface server 145 is configured with logic, such as software modules, to collect readings from sensors 120, 130 and store them as observations in a time series, for example in data store 160.
- the sensor interface server 145 is configured to interact with the sensors, for example by exposing one or more application programming interfaces (APIs) configured to accept readings from sensors using sensor data formats and communication protocols applicable to the various sensors 120, 130.
- the sensor data format will generally be dictated by the sensor device.
- the communication protocol may be a custom protocol (such as a legacy protocol predating IoT implementation) or any of a variety of IoT or machine to machine (M2M) protocols such as Constrained Application Protocol (CoAP), Data Distribution Service (DDS), Devices Profile for Web Services (DPWS), Hypertext Transport Protocol / Representational State Transfer (HTTP/REST), MQ Telemetry Transport (MQTT), Universal Plug and Play (UPnP), and Extensible Messaging and Presence Protocol (XMPP).
- SCADA protocols such as OLE for Process Control Unified Architecture (OPC UA), Modbus RTU, RP-570, Profibus, Conitel, IEC 60870-5-101 or 104, IEC 61850, and DNP3 may also be employed when extended to operate over TCP/IP or UDP.
- the sensor interface server 145 polls sensors 120, 130 to retrieve sensor readings. In one embodiment, the sensor interface server passively receives sensor readings actively transmitted by sensors 120, 130.
- enterprise network 110 may be associated with a utility entity such as a power utility. In one embodiment, enterprise network 110 may be associated with a regulatory entity, such as a government.
- the enterprise network 110 is represented by an on-site local area network 170 to which one or more personal computers 175, or servers 180 are operably connected, along with one or more remote user computers 185 that are connected to the enterprise network 110 through the network 115 or other suitable communications network or combination of networks.
- the personal computers 175 and remote user computers 185 can be, for example, a desktop computer, laptop computer, tablet computer, smartphone, or other device having the ability to connect to local area network 170 or network 115 or having other synchronization capabilities.
- the computers of the enterprise network 110 interface with time series data service 105 across the network 115 or another suitable communications network or combination of networks.
- remote computing systems may access information or applications provided by the time series data service 105 through web interface server 155.
- computers 175, 180, 185 of the enterprise network 110 may request a time-series database from time series data service 105.
- computers 175, 180, 185 of the enterprise network 110 may perform an audit of a time series database in accordance with the systems and methods described herein.
- the remote computing system may send requests to and receive responses from web interface server 155.
- access to the information or applications may be effected through use of a web browser on a personal computer 175 or remote user computers 185.
- these communications may be exchanged between web interface server 155 and server 180, and may take the form of remote representational state transfer (REST) requests using JavaScript object notation (JSON) as the data interchange format for example, or simple object access protocol (SOAP) requests to and from XML servers.
- data store 160 includes one or more time-series databases configured to store and serve time series data received by sensor interface server 145 from sensors 120, 130.
- the time-series database is an Oracle® database configured to store and serve time-series data.
- data store(s) 160 may be implemented using a network-attached storage (NAS) device and/or other dedicated server device.
- upstream device 135 may be a third-party service for managing IoT connected devices.
- upstream device 135 may be a gateway device configured to enable sensors 130 to communicate with sensor interface server 145 (for example, where sensors 130 are not IoT-enabled, and therefore unable to communicate directly with sensor interface server 145).
- each step of computer-implemented methods described herein may be performed by a processor (such as processor 1110 as shown and described with reference to FIG. 11) of one or more computing devices (i) accessing memory (such as memory 1115 and/or other computing device components shown and described with reference to FIG. 11) and (ii) configured with logic to cause the system to execute the step of the method (such as machine learning audit assurance logic 1130 shown and described with reference to FIG. 11).
- the processor accesses and reads from or writes to the memory to perform the steps of the computer-implemented methods described herein.
- These steps may include (i) retrieving any necessary information, (ii) calculating, determining, generating, classifying, or otherwise creating any data, and (iii) storing any data calculated, determined, generated, classified, or otherwise created.
- References to storage or storing indicate storage as a data structure in memory or storage/disks of a computing device (such as memory 1115, or storage/disks 1135 of computing device 1105 or remote computers 1165 shown and described with reference to FIG. 11).
- each subsequent step of a method commences in response to parsing a signal received or stored data retrieved indicating that the previous step has been performed at least to the extent necessary for the subsequent step to commence.
- the signal received or the stored data retrieved indicates completion of the previous step.
- FIG. 2 illustrates one embodiment of a method 200 associated with ensuring that the results of machine learning models can be audited.
- the steps of method 200 are performed by machine learning audit assurance system 140 or any of computers 175, 180, 185 in enterprise network 110 (as shown and described with reference to FIG. 1).
- machine learning audit assurance system 140 or any of computers 175, 180, 185 are special purpose computing devices (such as computing device 1105) configured with machine learning audit assurance logic 1130.
- the method 200 may be initiated based on various triggers, such as receiving a signal over a network or parsing stored data indicating that (i) a user (or administrator) of time series data service 105 or of computers 175, 180, 185 has initiated method 200, (ii) method 200 is scheduled to be initiated at defined times or time intervals, (iii) a user (associated with a utility or a regulatory entity) of time series data service 105 or of computers 175, 180, 185 has requested an audit of a time series database, or (iv) some other trigger indicating that method 200 should begin.
- the method 200 initiates at START block 205 in response to parsing a signal received or stored data retrieved and determining that the signal or stored data indicates that the method 200 should begin. Processing continues to process block 210.
- At process block 210, the processor retrieves a set of state estimates for original time series data values from a database under audit.
- the state estimates were generated by a state estimation computation for each of the time series data values. Processing at process block 210 completes, and processing continues to process block 215.
- At process block 215, the processor reverses the state estimation computation for each of the state estimates to produce reconstituted time series data values for each of the state estimates. Processing at process block 215 completes, and processing continues to process block 220.
- At process block 220, the processor retrieves the original time series data values from the database under audit. Processing at process block 220 completes, and processing continues to process block 225.
- At process block 225, the processor compares the original time series data values pairwise with the reconstituted time series data values to determine whether the original time series and reconstituted time series match. Processing at process block 225 completes, and processing continues to decision block 230.
- At decision block 230, the processor evaluates whether the original time series matches the reconstituted time series. If the original time series matches the reconstituted time series (YES), processing at decision block 230 completes and processing continues to process block 235. If the original time series does not match the reconstituted time series (NO), processing at decision block 230 completes and processing continues to process block 240.
- At process block 235, the processor generates a signal that the database under audit has not been modified because the original time series and reconstituted time series match. Processing at process block 235 completes, and processing continues to END block 245, where method 200 ends.
- At process block 240, the processor generates a signal that the database under audit has been modified because the original time series and reconstituted time series do not match. Processing at process block 240 completes, and processing continues to END block 245, where method 200 ends.
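Method 200 can be sketched as a single function. This is a sketch under stated assumptions. `audit_database` and its parameters are hypothetical. The reverse state estimation is passed in as a callable because this passage does not spell out how the MSET computation is reversed. The toy affine estimator in the usage lines is only a stand-in for MSET. Because the values are floating point, the pairwise comparison uses a tolerance rather than exact equality.

```python
import math
from typing import Callable, Sequence


def audit_database(
    state_estimates: Sequence[float],
    original_values: Sequence[float],
    reverse_estimation: Callable[[float], float],
    rel_tol: float = 1e-9,
    abs_tol: float = 1e-9,
) -> bool:
    """True: not modified (block 235). False: modified (block 240)."""
    # Block 215: reverse the state estimation for each state estimate.
    reconstituted = [reverse_estimation(e) for e in state_estimates]
    # Blocks 225 and 230: pairwise comparison; a length mismatch counts as a mismatch.
    if len(reconstituted) != len(original_values):
        return False
    return all(
        math.isclose(o, r, rel_tol=rel_tol, abs_tol=abs_tol)
        for o, r in zip(original_values, reconstituted)
    )


# Toy stand-in for a reversible state estimation: e = 2x + 1.
originals = [10.0, 10.5, 11.0]
estimates = [2 * x + 1 for x in originals]
print(audit_database(estimates, originals, lambda e: (e - 1) / 2))  # True
tampered = [10.0, 12.0, 11.0]
print(audit_database(estimates, tampered, lambda e: (e - 1) / 2))   # False
```

The tolerances are illustrative. In practice they would be chosen from the numeric precision of the stored estimates.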
- FIG. 3 illustrates a flowchart of one embodiment of an auditability process 300 associated with ensuring that the results of machine learning models can be audited.
- MSET is incorporated in a data flow process, and the MSET operations are reversed to reconstitute the original time-series data for auditability. This figure demonstrates the auditability of this application of MSET.
- the auditability process 300 starts when the system retrieves MSET estimates from a time-series database. Next, the system reverses the MSET computation to produce reconstituted time-series data from the MSET estimates. As discussed above, MSET is a reversible process, which means that performing the reverse MSET computation on the MSET estimates will reconstitute the original time-series data. The system then receives the original time-series data from the database being audited. Finally, the system compares the reconstituted time-series data with the original time-series data and, if the comparison indicates a match, certifies that there has been no deliberate tampering (or accidental data corruption) in the original time-series data.
- actions described with reference to the auditability process 300 may be performed by a processor of one or more computing devices accessing memory, storage and/or other computing device components shown and described with reference to FIGs. 1 and 11.
- an archived time series database 305 is presented for audit to determine if the original time series data in the database is intact or if the original time series data in the database is corrupted or tampered with.
- the archived time series database 305 may include one or more time series signals, each of which is a sequence of time series values for all observations of a time series.
- the archived time series database 305 will be presented for pairwise difference analysis 310 between original time series data values from the archived time series database 305 and reconstituted time series values derived from MSET modeling of the archived database and MSET estimation of the values.
- archived time series database 305 is a snapshot of a database at a time when a change was made to the database. Additionally, at the time the change is made to the database, a journal entry describing the change is made, and a set of state estimates is created for later pairwise difference analysis 310 in the context of an audit of the archived time series database 305.
- the set of state estimates is stored in association with the journal entry for the database under audit, archived time series database 305.
- the set of state estimates may have been generated in response to one or more commands indicating a change to the database under audit, and stored in a data structure associated with a journal entry describing the change.
- Other metadata may also be stored in association with the journal entry, including provenance metadata that identify the data on which query results (that lead to the journaled change) are based.
- Sensor disturbances including de-calibration bias, intermittent stuck-at faults, change-of-gain drifts, and episodic spikiness degrade signal quality and are a primary cause of false alarms (Type-I errors) and missed alarms (Type-II errors) in ML prognostics for IoT applications.
- the original time series data values in the archived time series database 305 may include some of these sensor disturbances.
- the archived time series database 305 is subjected to a series of intelligent data preprocessing 315 steps to cleanse the data and prepare it for MSET modeling and estimation. The preprocessing serves to mitigate the undesirable effects of the sensor disturbances.
- the intelligent data preprocessing 315 is iterated for one or more time series included in the archived time series database 305.
- all observations from all signals in archived time series database 305 are first preprocessed and then optimally re-sampled and “harmonized,” for example by using the Oracle® analytical resampling process (ARP), to produce an updated database of cleansed and optimally re-sampled/synchronized signals.
- ARP may involve one or more of the techniques described in “Automated Analytic Resampling Process for Optimally Synchronizing Time-Series Signals”, inventors K. C. Gross and G. C. Wang, U.S. Pat. App. Ser. No. 16/168,193, filed October 23, 2018, which is hereby incorporated by reference herein in its entirety.
- the archived time series database 305 undergoes a missing value imputation process 320 during the intelligent data preprocessing 315.
- the missing value imputation process 320 parses the original time series signal with a missing value check.
- the missing value imputation process 320 fills in the missing value with an estimated value.
- the estimated value is a simple interpolation.
- the estimate is a highly accurate estimate based on MSET-derived serial correlation and cross-correlations with the existing values, rather than a simple interpolation.
- the updated, filled-in time series is stored for future processing.
- the missing value imputation process 320 may also store a record of the locations of missing values in the time series that have been filled with an estimate for later reversal.
- the missing value imputation process 320 may involve one or more of the techniques described in “Missing Value Imputation to Facilitate Prognostic Analysis of Time-Series Sensor Data”, inventors G. C. Wang, K. C. Gross, and D. Gawlick, U.S. Pat. App. Ser. No. 16/005,495, filed Jun. 11, 2018, which is hereby incorporated by reference herein in its entirety.
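The following is a minimal sketch of the simple-interpolation embodiment of missing value imputation process 320, including the marks kept for later reversal. It assumes missing observations arrive as `None`. Gaps are filled by linear interpolation between the nearest observed neighbors, and the nearest observed value is held at either end of the series. The MSET-based imputation of the other embodiment is not reproduced, and `impute_missing` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple


def impute_missing(
    series: Sequence[Optional[float]],
) -> Tuple[List[float], List[int]]:
    """Fill None gaps by linear interpolation; return (filled, missing_marks)."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled: List[float] = []
    marks: List[int] = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(float(v))
            continue
        marks.append(i)  # record the location so the fill can be reversed
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:     # leading gap: use the first observation
            filled.append(float(series[right]))
        elif right is None:  # trailing gap: use the last observation
            filled.append(float(series[left]))
        else:
            y0, y1 = float(series[left]), float(series[right])
            filled.append(y0 + (y1 - y0) * (i - left) / (right - left))
    return filled, marks
```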
- the intelligent data preprocessing 315 includes a despiking process 325.
- the despiking process 325 may occur after missing values in the original time series are replaced by the missing value imputation process 320.
- the updated time series signals are parsed through an outlier check to detect and remove data “spikes”: abrupt, short-lived variations that do not represent accurate sensor readings.
- the outlier check detects the spikes in the signals by iteratively characterizing (generating descriptive parameters to describe the characteristics and behavior of) a variety of statistical distributions for the signals. Time series data values that are outliers based on these characterizations are flagged as data value spikes. The captured spikes are replaced temporarily with the signal average.
- the updated, despiked time series is stored for later processing.
- the despiking process 325 may also store a record of the value and location within the time series of the detected spikes.
- the despiking process 325 may involve one or more of the techniques described in “Synthesizing High-Fidelity Signals with Spikes for Prognostic Surveillance Applications”, inventors G. C. Wang and K. C. Gross, U.S. Pat. App. Ser. No. 16/215,345, filed December 10, 2018, which is hereby incorporated by reference herein in its entirety.
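The outlier check can be approximated by an iterated z-score test, sketched below. This substitutes a simpler technique. The disclosure characterizes a variety of statistical distributions, whereas this sketch assumes roughly normal behavior and uses a hypothetical `z_threshold` of 3. Each pass re-estimates the mean and standard deviation from the values not yet flagged. Flagged spikes are recorded as (index, amplitude) tuples and replaced with the average of the remaining observations. `despike` is a hypothetical name.

```python
import statistics
from typing import List, Sequence, Set, Tuple


def despike(
    series: Sequence[float], z_threshold: float = 3.0, max_iter: int = 10
) -> Tuple[List[float], List[Tuple[int, float]]]:
    """Flag outliers iteratively and replace them with the signal average.

    Returns (despiked series, [(index, original spike amplitude), ...]).
    """
    values = [float(v) for v in series]
    flagged: Set[int] = set()
    for _ in range(max_iter):
        # Re-characterize the distribution using only unflagged values.
        clean = [v for i, v in enumerate(values) if i not in flagged]
        if len(clean) < 2:
            break
        mean = statistics.fmean(clean)
        std = statistics.pstdev(clean)
        if std == 0:
            break
        new = {
            i for i, v in enumerate(values)
            if i not in flagged and abs(v - mean) / std > z_threshold
        }
        if not new:
            break
        flagged |= new
    clean = [v for i, v in enumerate(values) if i not in flagged]
    average = statistics.fmean(clean) if clean else 0.0
    spikes = [(i, values[i]) for i in sorted(flagged)]
    despiked = [average if i in flagged else v for i, v in enumerate(values)]
    return despiked, spikes
```

With the population standard deviation, no point in a window of n values can have a z-score above sqrt(n - 1), so a threshold of 3 cannot flag anything in windows of 10 or fewer observations.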
- the intelligent data preprocessing 315 includes an un-quantizing process 330.
- the un-quantizing process 330 may occur after data spikes are detected and removed from the time series by the despiking process 325.
- the updated signals are parsed through a “quantization” check to determine whether data quantization — a lossy data compression technique in which intervals of data are grouped or binned into single representative values — has caused signal values to switch rapidly and repeatedly between adjacent representative values.
- the quantization check identifies sections of the time series where the observation points bounce back and forth between a certain number of observation caps.
- Data quantization can be caused by low-resolution “quantized” transducers or sensors.
- the un-quantizing process 330 then converts quantized values detected in the time series to high-accuracy continuous signals. These continuous signals are very close approximations to what the signal values would have been if they had been detected using higher resolution transducers.
- the updated, un-quantized time series is stored for later processing.
- the un-quantizing process 330 may also store a record of the values and location within the time series of the detected quantized values.
- the un-quantizing process 330 may involve one or more of the techniques described in “Dequantizing Low-Resolution IOT Signals to Produce High-Accuracy Prognostic Indicators”, inventors M. Li and K. C. Gross, U.S. Pat. No. 10,496,084, granted Dec.
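The following is a hedged sketch of un-quantizing. It assumes the quantization step (`step`) is known. A window that contains exactly two distinct values one step apart is treated as bouncing between adjacent levels. Those observations are replaced by the window average, and their original values are kept for reversal. This moving-average smoothing is a stand-in and does not reproduce the technique of the patent cited above. `unquantize` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def unquantize(
    series: Sequence[float], step: float, window: int = 5
) -> Tuple[List[float], List[Tuple[int, float]]]:
    """Smooth observations that bounce between two adjacent quantization levels.

    Returns (smoothed series, [(index, original quantized value), ...]).
    """
    values = [float(v) for v in series]
    half = window // 2
    smoothed = list(values)
    quantized: List[Tuple[int, float]] = []
    for i in range(len(values)):
        # Centered window over the original (not yet smoothed) values.
        neighborhood = values[max(0, i - half): i + half + 1]
        levels = sorted(set(neighborhood))
        # Bouncing: exactly two distinct values, one quantization step apart.
        if len(levels) == 2 and math.isclose(levels[1] - levels[0], step, rel_tol=1e-9):
            smoothed[i] = sum(neighborhood) / len(neighborhood)
            quantized.append((i, values[i]))
    return smoothed, quantized
```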
- the intelligent data preprocessing 315 includes an un-stairstepping process 335.
- the un-stairstepping process 335 may occur after quantized values are detected and replaced in the time series by the un-quantizing process 330.
- Stairstepping results from a mismatch in sampling rates between recording systems and detection systems where the slower sampling rate signals simply repeat their last measured values at a higher sampling rate, so that all measured signals result in a uniform sampling rate.
- the time series values for slower sampling rate sensors have sequences of flat segments, resembling stair steps. Stairstepping is a common problem with commercial data historical archives that use a simple algorithm for collecting low sampling rate data into higher sampling rate time series.
- the un-stairstepping process 335 parses the time series to identify any stairstepped values present.
- the un-stairstepping process 335 “fills in” the stairstepped portions of the signals with the higher-sampling rate signals.
- the higher sampling rate signal values may be derived, in one embodiment, using MSET estimates.
- the updated, filled-in time series is stored for future processing.
- the un-stairstepping process 335 may also store a record of the values and location within the time series of the detected stairstepped values for later reversal.
- the un-stairstepping process 335 may involve one or more of the techniques described in “Replacing Stair-Stepped Values in Time-Series Sensor Signals With Inferential Values to Facilitate Prognostic Surveillance Operations”, inventors K. C. Gross and G. C. Wang, U.S. Pat. App. Ser. No. 16/128,071, filed September 11, 2018, which is hereby incorporated by reference herein in its entirety.
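Un-stairstepping can be sketched by treating each run of repeated values as one held sample and interpolating across the run toward the next step. Linear interpolation here stands in for the MSET-derived values of one embodiment. `unstairstep` is a hypothetical name. The final step is left flat because there is no later sample to interpolate toward. The original held values are returned for later reversal.

```python
from typing import List, Sequence, Tuple


def unstairstep(series: Sequence[float]) -> Tuple[List[float], List[Tuple[int, float]]]:
    """Replace repeated (held) samples with values interpolated between steps.

    Returns (filled series, [(index, original stairstepped value), ...]).
    """
    values = [float(v) for v in series]
    # Indices where a new step begins (value differs from its predecessor).
    starts = [0] + [i for i in range(1, len(values)) if values[i] != values[i - 1]]
    filled = list(values)
    stairstepped: List[Tuple[int, float]] = []
    for a, b in zip(starts, starts[1:]):
        # Samples a+1 .. b-1 repeat values[a]; interpolate toward values[b].
        for i in range(a + 1, b):
            stairstepped.append((i, values[i]))
            filled[i] = values[a] + (values[b] - values[a]) * (i - a) / (b - a)
    return filled, stairstepped
```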
- the intelligent data preprocessing 315 includes a uniform sampling process 340.
- uniform sampling process 340 may occur after stairstepped values are detected and replaced in the time series by the un-stairstepping process 335.
- the signals are parsed with a sampling rate check to identify whether the sampling rates of signals differ. If the signals exhibit different sampling rates (such as having a different number of observations over the same period of time), then the observations of the slower signals will be resampled to match the highest sampling rate of the signals.
- the updated, resampled time series is stored for future processing.
- the uniform sampling process 340 may also store a record of the original, un-resampled values and their placement within the time series for later reversal.
- the intelligent data preprocessing 315 includes a phase synchronization process 345.
- phase synchronization process 345 may occur after the uniform sampling process 340.
- phase synchronization process 345 may occur in parallel with the uniform sampling process 340.
- the updated signals are parsed through a correlation check to detect out-of-phase observations.
- Out-of-phase observations (time series values that are associated with an incorrect time index in the time series) may be due to, for example, clock synchronization disparities in measurement instrumentation such as sensors.
- the phase synchronization process 345 shifts the out-of-phase values in the time domain to align them with the correct time index.
- the updated, phase-synchronized time series is stored for future processing.
- the phase synchronization process 345 may also store a record of the original placement of the out-of-phase values within the time series for later reversal.
- one or more of the uniform sampling process 340 and phase synchronization process is performed using the Oracle® analytical resampling process (ARP), and may involve one or more of the techniques described in “Automated Analytic Resampling Process for Optimally Synchronizing Time-Series Signals,” incorporated by reference above.
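The correlation check for phase synchronization can be sketched as an integer-lag search that maximizes the cross-correlation of a signal against a reference signal. This is not the Oracle ARP. `best_lag` and `synchronize` are hypothetical names. The signals are assumed to be uniformly sampled with interval `dt`, and the original timestamps are returned so the shift can be reversed.

```python
from typing import List, Sequence, Tuple


def best_lag(reference: Sequence[float], signal: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) maximizing the mean cross-product of the demeaned signals.

    A positive lag means signal[i + lag] lines up with reference[i].
    """
    ref_mean = sum(reference) / len(reference)
    sig_mean = sum(signal) / len(signal)
    ref = [x - ref_mean for x in reference]
    sig = [x - sig_mean for x in signal]
    n = min(len(ref), len(sig))
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(ref[i], sig[i + lag]) for i in range(n) if 0 <= i + lag < n]
        if not pairs:
            continue
        score = sum(r * s for r, s in pairs) / len(pairs)
        if score > best_score:
            best, best_score = lag, score
    return best


def synchronize(
    timestamps: Sequence[float],
    reference: Sequence[float],
    signal: Sequence[float],
    dt: float,
    max_lag: int = 5,
) -> Tuple[List[float], List[float], int]:
    """Shift the signal's timestamps by the estimated lag.

    Returns (corrected timestamps, original timestamps kept for reversal, lag).
    """
    lag = best_lag(reference, signal, max_lag)
    corrected = [t - lag * dt for t in timestamps]
    return corrected, list(timestamps), lag
```

Averaging over the overlap keeps scores for different lags comparable. However, when `max_lag` is large relative to the series length, short overlaps can dominate the score, so the search range should stay small.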
- signal sampling may vary between time series signals in the archived time series database 305.
- one time series may have a high, but regular sampling rate or interval between observations.
- Another time series may have a low, but regular sampling rate or interval between observations.
- Another time series may have an irregular or uneven interval between observations.
- the phase of time series signals may be adjusted so that a set of time series signals in the archived time series database 305 are aligned with respect to observation time.
- the data values of a time series signal may be re-sampled at a new sampling interval by interpolating estimated data values for observations at the new sampling interval within the time series signal. This may be performed for multiple synchronized time series signals to produce a common sampling interval for the time series signals.
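The interpolation-based re-sampling described above can be sketched as linear interpolation onto a common time grid. `resample` is a hypothetical name. Sample times are assumed to be sorted and may be irregularly spaced. Grid times outside the observed range take the value of the nearest endpoint. The ARP itself is not reproduced.

```python
from bisect import bisect_right
from typing import List, Sequence


def resample(
    times: Sequence[float], values: Sequence[float], grid: Sequence[float]
) -> List[float]:
    """Linearly interpolate (times, values) at each grid time; clamp at the ends."""
    out: List[float] = []
    for t in grid:
        j = bisect_right(times, t)  # times[j - 1] <= t < times[j]
        if j == 0:
            out.append(float(values[0]))
        elif j == len(times):
            out.append(float(values[-1]))
        else:
            t0, t1 = times[j - 1], times[j]
            v0, v1 = values[j - 1], values[j]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out
```

To harmonize several signals, one would build `grid` at the highest sampling rate among them and call `resample` for each slower signal.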
- the six intelligent data preprocessing 315 procedures described above result in a high-quality, “cleansed” or “enhanced” version of archived time series database 305 that may be stored and then retrieved and used for subsequent machine learning processes. Note that other data preprocessing techniques may also be applied, or fewer than all six of these techniques may be performed in the intelligent data preprocessing 315. All observations from all signals are now preprocessed and optimally re-sampled and “harmonized” using ARP.
- The intelligent data preprocessing 315 procedures correct common issues in a typical machine learning dataset. However, each of the steps modifies some of the original data. A record should be kept for each modification so that the original data can be reconstructed.
- each procedure will store a record indicating the changes to the data set.
- the change records for all the intelligent data preprocessing 315 procedures applied to the archived time series database 305 will be stored in a single electronic data structure called an intelligent data preprocessing (IDP) model.
- FIG. 4 illustrates a schematic 400 of one embodiment of storing change records from intelligent data preprocessing 315 in an example IDP model 405.
- IDP model 405 is a complete, ordered record of every action performed on the archived time series database in preparation for machine learning operations.
- Missing value imputation process 320 stores the record (also referred to as a mark or marker) of the locations of missing values in the time series that have been filled with an estimate in a missing value marks data structure 410 in IDP model 405.
- the missing value marks data structure includes a set of arrays associated with time series in the archived time series database 305.
- the missing value marks array may be an array of observation index (such as time) values, where each index or time value indicates an observation in the associated time series where a missing value was filled in with an estimated value.
- Despiking process 325 stores the record of the value and location within the time series of the detected spikes in a spikes data structure 415 in IDP model 405.
- the spikes data structure includes a set of arrays associated with time series in the archived time series database 305. There is a spikes array associated with each time series in which spikes were detected.
- the arrays may be arrays of tuples including an observation index value (such as time) and an associated amplitude value of the spike (that is, the erroneous sensor reading value) at that observation.
- Un-quantizing process 330 stores the record of the values and location within the time series of the detected quantized values in a smoothing model data structure 420 in IDP model 405.
- the smoothing model 420 includes a set of quantized observations arrays associated with time series in the archived time series database 305. There is a quantized observations array associated with each time series in which quantized values were detected. The arrays may be arrays of tuples including an observation index value (such as time) and an associated quantized value at that observation.
- Un-stairstepping process 335 stores the record of the values and location within the time series of the detected stairstepped values in the smoothing model data structure 420.
- the smoothing model 420 includes a set of stairstepped observations arrays associated with time series in the archived time series database 305. There is a stairstepped observations array associated with each time series in which stairstepped values were detected. The arrays may be arrays of tuples including an observation index value (such as time) and an associated stairstepped (original or not yet un-stairstepped) value at that observation.
- the quantized observations array and stairstepped observations array are a single array containing all the original quantized and stairstepped values indexed by their locations within the time series. In this situation, the un-stairstepping process may insert stairstepped value tuples into the quantized observations arrays created by the un-quantizing process 330.
- Uniform sampling process 340 stores the record of the original, un-resampled values and their placement within the time series in a timestamp sequence data structure 425 in IDP model 405.
- phase synchronization process 345 stores the record of the original placement of the out-of-phase values within the time series in the timestamp sequence data structure 425.
- the timestamp sequence data structure 425 includes a set of timestamp arrays associated with time series in the archived time series database 305. There is a timestamp array associated with each time series in which out-of-phase observations were detected. The arrays include an observation index value (such as time) for each observation of the time series.
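The record-keeping in the IDP model 405 can be pictured as a small container keyed by time series name. The following Python sketch is illustrative only. The class name `IDPModel`, its field names, and the `record_*` helpers are hypothetical. The single `smoothing` list assumes the embodiment in which quantized and stairstepped values share one array.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class IDPModel:
    """Hypothetical stand-in for IDP model 405; keys are time series names."""
    # Missing value marks: observation indices where an estimated value was filled in.
    missing_value_marks: Dict[str, List[int]] = field(default_factory=dict)
    # Spikes data structure 415: (observation index, original spike amplitude).
    spikes: Dict[str, List[Tuple[int, float]]] = field(default_factory=dict)
    # Smoothing model 420: (observation index, original quantized or stairstepped value).
    smoothing: Dict[str, List[Tuple[int, float]]] = field(default_factory=dict)
    # Timestamp sequence 425: original observation times of each affected series.
    timestamps: Dict[str, List[float]] = field(default_factory=dict)

    def record_missing(self, series: str, index: int) -> None:
        self.missing_value_marks.setdefault(series, []).append(index)

    def record_spike(self, series: str, index: int, amplitude: float) -> None:
        self.spikes.setdefault(series, []).append((index, amplitude))

    def record_smoothed(self, series: str, index: int, original: float) -> None:
        self.smoothing.setdefault(series, []).append((index, original))

    def record_timestamps(self, series: str, times: List[float]) -> None:
        # Store a copy so later resampling cannot alter the recorded originals.
        self.timestamps[series] = list(times)
```

The helpers create entries only on first use via `setdefault`. As a result, a series appears in a given structure only if the corresponding disturbance was detected in it.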
- a processor preprocesses the original time series values to mitigate effects of sensor disturbances on quality of the original time series values, as shown in intelligent data preprocessing 315. Over the course of that intelligent data preprocessing, the processor generates a data preprocessing model that records one or more changes to the original time series values during the preprocessing, as shown for example by IDP model 405.
- the reversal of the state estimation computation described with reference to process block 215, FIG. 2 also includes retrieving the data preprocessing model, such as IDP model 405, from storage or memory; and reversing the preprocess described by the data preprocessing model for each of the state estimates.
- Each of the six intelligent data preprocessing 315 procedures described above may be reversed or otherwise undone by retrieving the original time series values retained in IDP model 405 and replacing the corresponding enhanced values in the enhanced time series database with the original values.
- the reversal processes for the various processes of the intelligent data preprocessing 315 are performed in the reverse of the order in which the intelligent data preprocessing 315 processes were performed.
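A short sketch shows why the reversal must run in reverse order. It assumes, as a simplification, that each preprocessing step logs the value it overwrote at every index it changed; the text says only that the original values are retained. The names `apply_step` and `reverse_preprocessing` are hypothetical. Steps that move timestamps (resampling, phase synchronization) are not modeled.

```python
from typing import List, Tuple

# One log per preprocessing step: (index, value before this step changed it).
ChangeLog = List[Tuple[int, float]]


def apply_step(values: List[float], changes: List[Tuple[int, float]]) -> ChangeLog:
    """Apply (index, new_value) replacements in place; return what was overwritten."""
    log: ChangeLog = []
    for idx, new_value in changes:
        log.append((idx, values[idx]))
        values[idx] = new_value
    return log


def reverse_preprocessing(values: List[float], logs: List[ChangeLog]) -> List[float]:
    """Undo the steps by restoring logged values, last step first."""
    restored = list(values)
    for log in reversed(logs):
        # Within a step, undo later replacements first as well.
        for idx, old_value in reversed(log):
            restored[idx] = old_value
    return restored


# Example: smoothing touches an index that despiking already changed.
original = [1.0, 50.0, 3.0, 4.0]
enhanced = list(original)
logs = [
    apply_step(enhanced, [(1, 2.0)]),            # despike: 50.0 -> 2.0
    apply_step(enhanced, [(1, 2.5), (2, 3.2)]),  # smoothing: 2.0 -> 2.5, 3.0 -> 3.2
]
restored = reverse_preprocessing(enhanced, logs)
```

Restoring the logs in forward order would end with the despiked value 2.0 at index 1 instead of the original 50.0. Reversing the order recovers `original` exactly.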
- FIG. 5 illustrates a schematic of one embodiment of an MSET model training process 500 associated with ensuring that the results of machine learning models can be audited.
- Archived time series database 305 is provided to a select training data process 505.
- data is selected from the time series data to form a training data set.
- a subset of the observation vectors of the time series is selected to form the training set.
- the selected training data set is then stored in a training data structure 510.
- the intelligent data preprocessing 315 is commenced, and the IDP model 405 is created and stored, as described above with reference to FIGs. 3 and 4.
- the selection of the training data 505 precedes the intelligent data preprocessing 315, and the training data is selected from the unenhanced data.
- the selection of the training data 505 follows the intelligent data preprocessing 315, and the training data is selected from the enhanced data.
- a loop of training vector selection process 515 and MSET model training process 520 is commenced.
- a set of training vectors is selected from the training data to provide to the MSET model for training. For example, 50 or 100 vectors may be selected.
- the preprocessed training data is split into two parts to improve the model. For example, even numbered observations may form a first part of the training set, and odd numbered observations may form a second part of the training set.
- the set of training vectors is selected from the odd numbered observations and then from the even numbered observations in an even/odd “hopscotch” vector selection.
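The even/odd "hopscotch" selection can be sketched as follows. The sketch assumes 1-based observation numbering, so the first observation is odd, and fixed-size batches. The function name and the `batch_size` parameter are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def hopscotch_select(observations: Sequence[T], batch_size: int) -> List[List[T]]:
    """Split observations into odd- and even-numbered parts, then draw training
    batches first from the odd-numbered part and then from the even-numbered part."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    # With 1-based numbering, list position 0 holds observation 1 (odd).
    odd = list(observations[0::2])
    even = list(observations[1::2])
    batches: List[List[T]] = []
    for part in (odd, even):
        for start in range(0, len(part), batch_size):
            batches.append(part[start:start + batch_size])
    return batches
```

For observations numbered 1 to 10 and a batch size of 3, the batches are drawn as [1, 3, 5], [7, 9], [2, 4, 6], [8, 10].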
- training an MSET model such as MSET model 350 is a deterministic mathematical procedure.
- the MSET training 520 uses time series signals that are representative of the time series recorded in the archived time series database to learn the correlations between the time series signals.
- the MSET model training process 520 uses the enhanced signals to identify all signals in the archived time series database 305 that have any degree of association with any other signals in the archived time series database 305. In one embodiment, identification of associations between signals is performed both for the full universe of signals in archived time series database 305 as a whole, and also performed separately for clusters of signals.
- the final output of the MSET training process 520 is a trained MSET model 350.
- the training process is an MSET2 training process.
- the MSET model is an MSET2 model.
- the vector selection 515 and MSET training process 520 may be repeated in a loop (of one or more iterations) until MSET model 350 stabilizes, as indicated at decision block 525. If the MSET model 350 does not change significantly with a change to the training data set, the model 350 is stable (YES), the training is complete, and the trained MSET model 350 is stored as a data structure for further use. If the MSET model 350 still changes significantly when the training data set is changed, the model 350 is not yet stable (NO). In that case an additional set of training vectors is selected at 515, and the MSET model 350 is further trained with the additional set of training vectors at 520.
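The stability test at decision block 525 can be sketched as a loop that grows the training set and stops once the learned model barely changes. This is not MSET or MSET2. As an assumption for illustration, the "model" here is simply the signal-to-signal correlation matrix of the training vectors. The relative Frobenius-norm change compared against `tol` is a hypothetical stability criterion.

```python
import numpy as np


def train_until_stable(training_data: np.ndarray, batch_size: int = 50,
                       tol: float = 1e-3, max_iter: int = 100) -> np.ndarray:
    """Grow the training set batch by batch until the learned model stabilizes.

    training_data: rows are observation vectors, columns are signals.
    Stand-in model: the correlation matrix between signals (NOT MSET/MSET2).
    """
    n_obs = training_data.shape[0]
    used = min(batch_size, n_obs)
    model = np.corrcoef(training_data[:used], rowvar=False)
    for _ in range(max_iter):
        if used >= n_obs:
            break  # no training vectors left to add
        used = min(used + batch_size, n_obs)
        new_model = np.corrcoef(training_data[:used], rowvar=False)
        # Relative change of the model caused by the additional training vectors.
        change = np.linalg.norm(new_model - model) / np.linalg.norm(model)
        model = new_model
        if change < tol:
            break  # stable: more training data barely changes the model
    return model
```

The same loop structure applies to any deterministic training procedure. Only the `np.corrcoef` call would be replaced by the actual model fit.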
- the trained MSET model 350 is used to compute MSET state estimates 355 of each signal in the archived time series database 305, based upon the empirical correlation patterns learned during model training 520 between each signal and other signals in the archived time series database 305.
- the MSET estimates are highly accurate, although the degree of accuracy may differ based on the extent of the training of the MSET model.
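For readers unfamiliar with MSET, the general shape of the estimation step can be sketched with textbook MSET-style nonlinear regression. A memory matrix `D` of training vectors yields the estimate `x_hat = D (D' ⊗ D)^-1 (D' ⊗ x)` for an observation `x`, where `⊗` is a similarity operator. The Gaussian similarity, `bandwidth`, and small `ridge` term below are assumptions. The document does not disclose the similarity operator or any other internals of MSET2.

```python
import numpy as np


def similarity(A: np.ndarray, B: np.ndarray, bandwidth: float = 1.0) -> np.ndarray:
    """Gaussian similarity between columns of A (n x p) and B (n x q), giving p x q."""
    sq_dist = ((A[:, :, None] - B[:, None, :]) ** 2).sum(axis=0)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mset_estimate(D: np.ndarray, X: np.ndarray, bandwidth: float = 1.0,
                  ridge: float = 1e-8) -> np.ndarray:
    """Estimate each column of X (n_signals x n_obs) from memory matrix D
    (n_signals x m training vectors): X_hat = D (D' (x) D)^-1 (D' (x) X)."""
    G = similarity(D, D, bandwidth)        # m x m similarity among training vectors
    G = G + ridge * np.eye(G.shape[0])     # small ridge term for numerical stability
    K = similarity(D, X, bandwidth)        # m x n_obs similarity to the observations
    W = np.linalg.solve(G, K)              # weight of each training vector per observation
    return D @ W                           # n_signals x n_obs state estimates
```

Estimating the training vectors themselves returns them almost exactly, which is a quick sanity check on the solver.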
- the MSET estimates, MSET parameters, and training vectors are saved to a report data structure.
- This report data structure may be associated in a database or data structure with a journal entry describing a change to the archived time series database 305 (for example, the change that resulted in the archived state of the archived time series database 305).
- process blocks 320 through 355 are initiated and performed in response to the change to the archived time series database so that their results can be included alongside the journal entry describing the change, ensuring that data tampering in the archived time series database 305 is detectable.
- activity journal metadata describing one or more queries preceding the change and provenance metadata describing the data underlying the query results are stored along with the journal entry and the report data structure.
- a report may be generated in response to any change to the database under audit, creating a trail of tamper-proof data sets associated with each change to the database.
- both the historical provenance information (including the change, the activity journal describing activity that preceded the change, and the provenance information describing the basis of the information presented in response to queries associated with the change) and a record for reconstructing the original data are recorded at each journal entry.
- the complete history of the database under audit is recorded and can be reviewed for audit.
- the record is far more compact than maintaining an additional snapshot of the database, resulting in a significant performance improvement and improved portability of the data used for audit.
- the MSET estimates 355 are stored along with the original raw signals (the MSET parameters) in archived time series database 305. Also stored in archived time series database 305 are “sensor operability flags”, with “1” for fully validated, or “0” for signals for which anomalies were discovered in the sensor that measured the original raw signal. In one embodiment, these sensor operability flags are determined based on the results of fault detection estimation using a sequential probability ratio test (SPRT) to analyze the residual between the MSET estimate and original raw signal value, and label the value as either anomalous or non-anomalous. Where a threshold number of anomalous values occur in the time series data for a particular sensor, the sensor may be flagged as partially or completely inoperable (“0”) in the time series.
- the inoperable flag may indicate signals from failing or degrading sensors/transducers, stuck-at faults, or intermittent problems with sensors/transducers or upstream data collection electronics or networks. Where the number of anomalous values occurring in the time series data for a particular sensor does not exceed the threshold number, the sensor may be flagged as validated as operable (“1”) in the time series.
- the threshold may be as low as a single anomalous reading in the time series, or may be higher as appropriate.
- the threshold may be set by machine learning analysis of the time series, such as MSET analysis. In one embodiment, these flags may be set in a data structure including the time series.
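The operability flags can be illustrated with Wald's sequential probability ratio test applied to the residuals between MSET estimates and raw values. The sketch makes several assumptions:

- residuals are Gaussian with known standard deviation `sigma`;
- it tests only for a positive mean shift of size `shift`, whereas a practical monitor would usually also test for a negative shift and a variance change;
- the function names and the default false-alarm and missed-alarm probabilities are hypothetical.

```python
import math
from typing import List, Sequence


def sprt_alarms(residuals: Sequence[float], sigma: float, shift: float,
                alpha: float = 0.01, beta: float = 0.01) -> List[bool]:
    """Wald SPRT for a positive mean shift in Gaussian residuals.

    H0: mean 0; H1: mean `shift`; both with standard deviation `sigma`.
    Returns one flag per residual: True where the test decides H1 (anomalous).
    """
    upper = math.log((1 - beta) / alpha)   # decide H1 at or above this
    lower = math.log(beta / (1 - alpha))   # decide H0 at or below this
    llr = 0.0
    flags: List[bool] = []
    for r in residuals:
        # Log-likelihood ratio increment for one Gaussian observation.
        llr += (shift / sigma ** 2) * (r - shift / 2.0)
        if llr >= upper:
            flags.append(True)
            llr = 0.0  # restart the test after an alarm
        else:
            if llr <= lower:
                llr = 0.0  # accept H0 and restart
            flags.append(False)
    return flags


def operability_flag(residuals: Sequence[float], sigma: float, shift: float,
                     threshold: int = 1) -> int:
    """Return 1 (validated as operable) or 0 (inoperable) from the SPRT alarm count."""
    alarms = sum(sprt_alarms(residuals, sigma, shift))
    return 0 if alarms >= threshold else 1
```

With `threshold=1`, a single alarm is enough to flag the sensor, which corresponds to the lowest threshold the text mentions.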
- the trained MSET model 350 may not be very compact. Thus, for portability, it may be desirable to limit the size of the MSET model 350.
- FIG. 6 illustrates a schematic of one embodiment of an MSET model limiting process 600 associated with ensuring that the results of machine learning models can be audited.
- the system initially trains a multivariate state estimation model, such as MSET model 350, with a set of training values selected from the original time series values, for example as shown and described with reference to FIG. 5.
- the complexity of the trained multivariate state estimation model may be reduced by principal component analysis of a matrix for the trained multivariate state estimation model, and limiting the trained multivariate state estimation model to major components of the matrix.
- MSET model 350 may be reduced by principal component analysis of the model 350’s MSET matrix, and limiting the model 350 to major components of the matrix.
- the MSET model consists of more than just the MSET matrix, but the MSET matrix is central to the MSET algorithm.
- the MSET model 350 may be an MSET2 model.
- the system decomposes the matrix associated with the multivariate state estimation model into a set of eigenvectors. For example, the complexity of the MSET model 350 is reduced by performing a singular value decomposition (SVD) 605 of the MSET matrix included in the MSET model 350.
- the output of the SVD 605 consists of a sequence of eigenvectors and their associated eigenvalues.
- the eigenvectors resulting from the SVD 605 are sorted in decreasing order by their respective associated eigenvalues. The system then stores the sorted sequence of eigenvectors and their associated eigenvalues 610, for example as an SVD data structure for further processing.
- the system selects a subset of major eigenvectors from the set of eigenvectors stored in the SVD data structure. For example, the system selects the major eigenvectors 615 from the SVD 605 based on the sorted eigenvectors and eigenvalues 610. The eigenvectors with the largest eigenvalues are the major eigenvectors. These major eigenvectors account for the greatest variability in sensor data over the time series, and are therefore the most informative for state estimation modeling.
- the processor selects the “top” N eigenvectors having the largest associated eigenvalues to be the major eigenvectors with their associated major eigenvalues 620.
- the processor then deletes all eigenvectors and eigenvalues in the SVD data structure except for the top N eigenvectors, removing all eigenvectors (along with their associated eigenvalues) that are not major eigenvectors.
- the system creates a “limited” multivariate state estimation model from the subset of major eigenvectors. For example, the retained top N major eigenvectors and their associated major eigenvalues 620 are provided to an MSET model limiter 625.
- the MSET model 350 is also provided to the MSET model limiter 625.
- the MSET model limiter 625 operates to limit the MSET model 350 to the top N major eigenvectors and their eigenvalues 620. This substantially reduces the amount of data required for encoding the results of the MSET algorithm.
- the MSET model limiter 625 constructs a limited MSET matrix from the major eigenvectors and associated eigenvalues 620.
- the model limiter 625 replaces the original MSET matrix in MSET model 350 with the limited MSET matrix to create a limited MSET model 630.
- the limited MSET model 630 is stored as a data structure in memory or storage.
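The limiting step can be sketched in NumPy, assuming the MSET matrix is an ordinary dense array. `np.linalg.svd` returns singular values already sorted in decreasing order. For a symmetric positive semidefinite matrix, such as a similarity matrix, the singular vectors and values coincide with the eigenvectors and eigenvalues used in the text. The function names, and the choice to store the three truncated factors rather than the rebuilt matrix, are assumptions.

```python
from typing import Tuple

import numpy as np


def limit_matrix(M: np.ndarray, n_major: int) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Keep only the top-N components of M's decomposition."""
    # Singular values come back sorted in decreasing order.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Discard every component that is not among the N major ones.
    return U[:, :n_major], s[:n_major], Vt[:n_major, :]


def rebuild(U_n: np.ndarray, s_n: np.ndarray, Vt_n: np.ndarray) -> np.ndarray:
    """Construct the limited matrix U_n diag(s_n) Vt_n from the retained factors."""
    return (U_n * s_n) @ Vt_n
```

Consider a 20 x 30 matrix of rank 3. Its truncated factors hold 20*3 + 3 + 3*30 = 153 numbers instead of 600, which is the kind of saving the limiter aims for.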
- the reversal of the state estimation computation also includes generating the reverse of a computation by which the limited multivariate state estimation model (such as limited MSET model 630) forms state estimates.
- the full archived time series database 305 may be very large. Thus, for portability, it may be desirable to limit the size of the database to the parameters of the limited MSET model 630.
- FIG. 7 illustrates a schematic of one embodiment of a data compression process 700 associated with ensuring that the results of machine learning models can be audited. In data compression process 700, archived time series database 305 is reduced to a minimal size suitable for audit of the uncompressed or original archived time series database 305.
- the system omits training data values from the original time series values in the database under audit, archived time series database 305.
- the system creates a copy of the archived time series database 305 that does not include training data used to train an MSET model.
- training data for an MSET model such as training data 510 for MSET model 350, is removed from the copy of the archived time series database 305.
- all observation records used for training MSET model 350 may be deleted from the copy of the archived time series database 305.
- the copy of the archived time series database 305 is initially created without the training data, avoiding the need to delete it. The system stores the reduced copy of the archived time series database for subsequent processing.
- the system preprocesses the remaining original time series data values to mitigate the effects of sensor disturbances and generates a model that records the changes to the remaining original time series data.
- the intelligent data preprocessing 315 (as described above) may be performed for the reduced copy of the archived time series database 305.
- the system forms an IDP operation model 710 similar to IDP model 405, where the IDP operation model 710 covers only the observation records remaining in the reduced copy of the archived time series database 305 after the training observations were omitted.
- the system stores the pre-processed, reduced copy of the archived time series database for subsequent processing.
- the system performs state estimation (such as MSET state estimation) for the preprocessed, remaining original time series data values using the limited multivariate state estimation model to create a compressed time series database.
- the limited MSET model 630 and the copy of the archived time series database 305 with the training data omitted are used to perform the MSET operation 715.
- the MSET operation 715 and limited MSET model 630 are an MSET2 operation and an MSET2 model.
- the MSET operation 715 (i) forms a state estimate for each observation based on the remaining parameters, and (ii) removes from the copy of the archived time series database those time series variables that are not parameters of the limited MSET model 630.
- data values that do not inform MSET state estimation are deleted from the copy of the archived time series database, further reducing the size of the data.
- the MSET state estimates formed for each observation and the remaining parameter values for each observation form compressed data 720.
- the compressed data 720 is stored as a data structure in memory or storage.
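The row and column pruning in data compression process 700 can be sketched on a plain NumPy array, with rows as observations and columns as signals. The function name, the dictionary layout of the result, and the `estimate` callable are all hypothetical. The `estimate` callable stands in for state estimation with limited MSET model 630. Preprocessing with the IDP operation model 710 is left out for brevity.

```python
from typing import Callable, Dict, Sequence

import numpy as np


def compress_database(values: np.ndarray, columns: Sequence[str],
                      training_rows: Sequence[int], model_parameters: Sequence[str],
                      estimate: Callable[[np.ndarray], np.ndarray]) -> Dict[str, np.ndarray]:
    """Sketch of compression: drop training rows, keep model parameters, estimate."""
    # Omit the observation records that were used as training data.
    all_rows = np.arange(values.shape[0])
    keep_rows = np.setdiff1d(all_rows, np.asarray(training_rows, dtype=int))
    # Keep only the signals that are parameters of the limited model.
    params = set(model_parameters)
    keep_cols = [i for i, name in enumerate(columns) if name in params]
    reduced = values[np.ix_(keep_rows, keep_cols)]
    return {
        "rows": keep_rows,               # original observation indices retained
        "parameters": reduced,           # remaining parameter values
        "estimates": estimate(reduced),  # state estimate for each remaining observation
    }
```

The returned dictionary mirrors the text's description of compressed data 720. It holds the state estimates together with the remaining parameter values for each observation.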
- the MSET algorithm is performed using the limited MSET model 630.
- the limited MSET model 630 along with the MSET parameters for the limited MSET model 630 constitutes a compressed version of the original time series database (compressed data 720).
- the MSET estimates along with the MSET model are stored, representing the original time series data in reduced size.
- the system will be able to reverse the stored MSET estimates and the MSET model to reconstitute the original time series data, which can then be used to validate or invalidate purported original time series data, as discussed below.
- FIGs. 8A and 8B illustrate two example data report formats. Each of these two formats contains sufficient information to perform an audit of the database under audit, such as original archived time series database 305.
- the system generates an electronic data report data structure following one of the two example data report formats.
- the system generates an electronic data report data structure that includes a preprocessing model that records one or more changes to the original time series values during preprocessing to mitigate effects of sensor disturbances on quality of the original time series values; a compressed time series database generated by performing state estimation with a limited multivariate state estimation model trained with a set of training values selected from the original time series values and excluding from the compressed time series database those values that are not parameters of the limited multivariate state estimation model; and one or more of (i) the set of training values and (ii) the limited multivariate state estimation model.
- the first data report format 800 specifies
- Both the first data report format 800 and the second data report format 850 specify the IDP operation model 710 and compressed data 720.
- the electronic data report data structure includes a preprocessing model (IDP operation model 710) that records one or more changes to the original time series values during preprocessing to mitigate effects of sensor disturbances on quality of the original time series values (as shown and described with reference to FIGs. 3, 4 and 7).
- the electronic data report data structure also includes a compressed time series database (compressed data 720) generated by performing state estimation with a limited multivariate state estimation model and excluding from the compressed time series database those values that are not parameters of the limited multivariate state estimation model (as shown and described with reference to FIG. 7).
- the electronic data report data structure also includes one or more of (i) a set of training values selected from the original time series values and used to train the limited multivariate state estimation model, such as in first data report format 800, and (ii) the limited multivariate state estimation model, such as in second data report format 850.
- An advantage of the first data report format 800 is that the details of the MSET (or MSET2) auditability algorithm are not revealed.
- An advantage of the second data report format 850 is the reduced amount of data that must be provided in the report. Note that the amount or volume of compressed data 720 will generally dominate the size of the report in both the first data report format 800 and the second data report format 850, so in practice the two formats may not differ significantly in size.
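The two report formats differ only in what accompanies the IDP operation model 710 and compressed data 720. The layout can be sketched as a Python dataclass. The class, the field names, and the `format_id` rule are illustrative assumptions, not a defined file format. Under that rule, a report that includes the model is treated as format 850.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DataReport:
    """Hypothetical electronic data report following formats 800 and 850."""
    idp_operation_model: Any               # IDP operation model 710 (both formats)
    compressed_data: Any                   # compressed data 720 (both formats)
    training_values: Optional[Any] = None  # format 800: training data 510
    limited_model: Optional[Any] = None    # format 850: limited MSET model 630

    def __post_init__(self) -> None:
        # The report must carry at least one of the two optional parts.
        if self.training_values is None and self.limited_model is None:
            raise ValueError("report needs the training values or the limited model")

    @property
    def format_id(self) -> int:
        # Format 850 ships the model, so reconstruction can skip retraining.
        return 850 if self.limited_model is not None else 800
```

A reconstruction routine could branch on `format_id`. For format 800 it would retrain and limit the model from `training_values` first, and for format 850 it would use `limited_model` directly.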
- the system reverses the MSET computation to reconstitute the raw data as shown at process block 365.
- a data report following one of the data report formats 800, 850 is created and may be used to reconstruct data for audit of the archived time series database 305.
- the reports are stored, for example, alongside journal entries, to await evaluation in the context of an audit of archived time series database 305.
- the reports may be stored for a significant period of time before being used to reconstitute raw data.
- the audit of archived time series database 305 may not occur, and the data reports are never retrieved and used to reconstitute raw data at process block 365.
- the report may be distributed to third parties for external audit of the third party’s copy of the archived time series database.
- FIG. 9 illustrates a schematic of one embodiment of a data reconstruction process 900 using first data report format 800, the data reconstruction process 900 further being associated with ensuring that the results of machine learning models can be audited.
- a limited MSET (or MSET2) model 630 is first trained using the IDP model 405 and training data 510 (for example as shown and described with reference to FIG. 5), and then limited (for example as shown and described with reference to FIG. 6), as shown at process block 905.
- the IDP model 405 is retrieved from storage and provided to the training and limitation process 905.
- the training data 510 is read from the data report 800.
- Creation of the limited MSET model 630 is needed due to the absence of the limited MSET model 630 from the first data report format 800.
- the system identifies a reverse state estimation computation that undoes the steps performed by the state estimation computation to form the state estimates from the original time series data values.
- the MSET (or MSET2) algorithm is reversed and applied to the compressed data 720, as shown at process block 910.
- the steps to form an MSET estimate performed by the trained model 630 are parsed, and a sequence of discrete operations is recorded.
- for each discrete operation, the inverse operation that will undo it is identified and recorded in a sequence of reverse operations.
- the sequence of reverse operations should be in reverse order from the sequence of discrete operations.
- for example, the inverse operation of the first discrete operation is to be performed last in the sequence of reverse operations, the inverse operation of the second discrete operation is to be performed second to last, and so forth, such that the inverse operations of the discrete operations are performed in the reverse order of the discrete operations.
- the system stores the sequence of reverse operations for subsequent processing.
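The bookkeeping for the sequence of reverse operations can be sketched with scalar operations. The sketch assumes every recorded operation has an exact inverse. The center, scale, and weight example operations and the function names are hypothetical and are not taken from MSET or MSET2.

```python
from typing import Callable, List, Tuple

Op = Callable[[float], float]


def build_reverse_sequence(ops: List[Tuple[Op, Op]]) -> List[Op]:
    """Given (operation, inverse) pairs in forward order, return the inverses
    ordered so that the inverse of the last operation runs first."""
    return [inverse for _, inverse in reversed(ops)]


def run(sequence: List[Op], x: float) -> float:
    """Apply a sequence of operations to x, left to right."""
    for op in sequence:
        x = op(x)
    return x


# Hypothetical forward steps paired with their inverses: center, scale, weight.
EXAMPLE_OPS: List[Tuple[Op, Op]] = [
    (lambda x: x - 10.0, lambda x: x + 10.0),
    (lambda x: x / 2.0, lambda x: x * 2.0),
    (lambda x: x * 3.0, lambda x: x / 3.0),
]
```

Here `run([f for f, _ in EXAMPLE_OPS], 16.0)` yields 9.0. Running `run(build_reverse_sequence(EXAMPLE_OPS), 9.0)` recovers 16.0 by undoing the weight, then the scale, then the centering.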
- the system then generates a set of reverse state estimates for the original time series data from the set of state estimates.
- Each of the reverse state estimates is generated by performing the reverse state estimation for one of the set of state estimates, for example as indicated by the sequence of reverse operations.
- the system performs the reverse of the MSET computations used to create the state estimates stored in the compressed data 720 on those state estimates.
- Reverse MSET estimates of the original time series data values at each observation are created from the state estimates of those values and the observed values of the other parameters stored in the compressed data 720.
- the sequence of reverse operations is executed for the estimated data values at each observation to create reverse MSET estimates for each of the estimated data values.
- the intelligent data preprocessing 315 is also reversed and applied to the reverse MSET estimates, as shown at process block 915.
- the IDP operation model 710 is read from the data report.
- the reconstituted data 920 is stored for subsequent use in an audit process, including for example the verification process shown and described with reference to FIG. 3, blocks 310 and 370-380.
- the reconstituted time series data values for each of the state estimates are based on the reverse state estimates.
- the resulting reconstituted time series data approximates the original data in archived time series database 305, and may differ slightly from the original data. Nevertheless, the approximation may be used to verify that the archived time series database 305 is not tampered with or corrupted. Thus, the approximation can be used to verify that an archived time series database 305 that indicates compliance with regulations, or violation of regulations, truly indicates such compliance or violation.
- FIG. 10 illustrates a schematic of one embodiment of a data reconstruction process 1000 using second data report format 850, the data reconstruction process 1000 further being associated with ensuring that the results of machine learning models can be audited.
- the reconstruction process 1000 follows generally the same process steps as process 900, except that the limited MSET model 630 does not need to be first computed, because it is already stored in second data report format 850.
- the order of the two reversals may be switched in process 1000. In process 900, MSET reversal 910 precedes intelligent data preprocessing 315 reversal 915. In process 1000, intelligent data preprocessing 315 reversal 1010 may precede MSET reversal 1015.
- the intelligent data preprocessing 315 is first reversed and applied to the MSET estimates in the compressed data 720, as shown at process block 1010.
- the IDP operation model 710 is read from the data report 850 and the original time series values retained in IDP operation model 710 are substituted for any corresponding values in MSET estimates in the compressed data, thereby reversing the intelligent data preprocessing 315 for the compressed data 720.
- the reconstituted time series data values for each of the state estimates are based on reverse state estimates such as those described with reference to process block 1015.
- the MSET (or MSET2) algorithm is reversed and applied to the compressed data 720, as shown at process block 1015, in a similar manner as that described with reference to process block 910 above.
- the system performs the reverse of the MSET computations used to create the state estimates stored in the compressed data 720 on those state estimates.
- Reverse MSET estimates of the original time series data values at each observation are created from the state estimates of those values and the observed values of the other parameters stored in the compressed data 720.
- the reverse MSET estimates and the replaced intelligent data preprocessing values from process block 1010 form the reconstituted data, approximate original data 1020.
- the reconstituted data 1020 is stored for subsequent use in an audit process which may include, for example, the validation process shown and described with reference to FIG.
- the resulting time series data approximates the original data in archived time series database 305, and may differ slightly from the original data, but may be used to verify that the archived time series database 305 is not tampered with or corrupted.
- the system proceeds to perform an audit of the state (intact or corrupted/tampered with) of an original time series in the archived time series database 305.
- the correctness of a reported time series is verified using the reconstituted data: the original data is compared with reconstituted (or reconstructed) data, for example reconstituted data 920 resulting from process 900 or reconstituted data 1020 resulting from process 1000.
- the two data streams, the original time series and the reconstituted time series, are compared pairwise, as shown at pairwise difference analyzer process block 310.
- the pairwise difference analyzer 310 compares the original time series data values and the reconstituted time series data values for each observation of the two time series to see if the reconstituted value matches (or closely approximates within a threshold) the original value.
- In one embodiment, the pairwise difference analyzer compares each original time series data value with the reconstituted time series data value for each observation to determine whether they differ by more than a preset threshold amount, for example a percentage amount. In another embodiment, the pairwise difference analyzer compares each original time series data value with the reconstituted time series data value for each observation to determine whether the difference triggers a fault detection using a trained fault detection model included in the trained, limited MSET2 model 630. The fault detection model may employ a sequential probability ratio test (SPRT) to analyze the residuals between the original time series data value and the reconstituted time series data value for each observation to determine whether or not the purported original time series data is anomalous.
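The percentage-threshold embodiment of the pairwise difference analyzer, together with the resulting passed/failed verdict, can be sketched as follows. The 5% default, the fallback to an absolute difference when the original value is zero, and the function names are assumptions. The SPRT-based embodiment is not shown.

```python
from typing import List, Sequence


def pairwise_mismatches(original: Sequence[float], reconstituted: Sequence[float],
                        pct_threshold: float = 5.0) -> List[int]:
    """Indices of observations whose reconstituted value deviates from the
    original value by more than pct_threshold percent."""
    if len(original) != len(reconstituted):
        raise ValueError("time series must have the same number of observations")
    mismatches: List[int] = []
    for i, (orig, recon) in enumerate(zip(original, reconstituted)):
        # Relative difference; fall back to an absolute difference when orig is 0.
        scale = abs(orig) if orig != 0 else 1.0
        if abs(orig - recon) / scale * 100.0 > pct_threshold:
            mismatches.append(i)
    return mismatches


def verification_report(original: Sequence[float], reconstituted: Sequence[float],
                        pct_threshold: float = 5.0) -> str:
    """'passed' if every observation matches within the threshold, else 'failed'."""
    bad = pairwise_mismatches(original, reconstituted, pct_threshold)
    return "passed" if not bad else "failed"
```

Returning the mismatch indices, rather than a single boolean, lets a failed report point to the specific observations that appear corrupted or tampered with.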
- a ‘passed’ verification report indicating that the original time series data is intact is generated.
- the passed verification report is a signal that when parsed, indicates that the original time series data is intact.
- the passed verification report is a human readable document indicating that the original time series data is intact.
- in response to the passed verification report, the system generates and either executes or transmits for execution an instruction to display an indication that the original time series data is intact on a graphical user interface (GUI).
- the GUI may be associated with a utility entity or a regulatory entity. Processing in process 300 then ends.
- a ‘failed’ verification report indicating that the original time series data is corrupted or tampered with is generated.
- the failed verification report is a signal that when parsed, indicates that the original time series data is corrupted or tampered with.
- the failed verification report is a human readable document indicating that the original time series data is corrupted or tampered with.
- in response to the failed verification report, the system generates and either executes or transmits for execution an instruction to display an indication that the original time series data is corrupted or tampered with on a GUI.
- the GUI may be associated with a utility entity or a regulatory entity. Processing in process 300 then ends.
- in response to the signal that the database under audit has not been modified, the system generates an electronic verification report message indicating that the database under audit is certified to be uncorrupted and not tampered with. In response to the signal that the database under audit has been modified, the system generates an electronic verification report message indicating that the database under audit is either corrupted or tampered with. The system may then transmit the generated electronic verification report to a computing device to cause the verification report message to be stored by the computing device or displayed by the computing device. In one embodiment, the computing device may be associated with a utility entity or a regulatory entity.
- software instructions are designed to be executed by a suitably programmed processor. These software instructions may include, for example, computer-executable code and source code that may be compiled into computer-executable code. These software instructions may also include instructions written in an interpreted programming language, such as a scripting language.
- Such instructions are typically arranged into program modules with each such module performing a specific task, process, function, or operation.
- the entire set of modules may be controlled or coordinated in their operation by an operating system (OS) or other form of organizational platform.
- one or more of the components, functions, methods, or processes described herein are configured as modules stored in a non-transitory computer readable medium.
- the modules are configured with stored software instructions that when executed by at least a processor accessing memory or storage cause the computing device to perform the corresponding function(s) as described herein.
- Cloud System, Multi-Tenant, and Enterprise Embodiments
- The present system is a computing/data processing system including an application or collection of distributed applications for enterprise organizations.
- The applications and computing system may be configured to operate with or be implemented as a cloud-based networking system, a software as a service (SaaS) architecture, or other type of networked computing solution.
- The present system is a centralized server-side application that provides at least the functions disclosed herein and that is accessed by many users via computing devices/terminals communicating with the computing system (functioning as the server) over a computer network.
- FIG. 11 illustrates an example computing device 1100 that is configured and/or programmed with one or more of the example systems and methods described herein, and/or equivalents.
- The example computing device may be a computer 1105 that includes a processor 1110, a memory 1115, and input/output ports 1120 operably connected by a bus 1125.
- The computer 1105 may include machine learning audit assurance logic 1130 configured to facilitate ensuring that the results of machine learning models can be audited, similar to the logic and systems shown in FIGs. 1 through 10.
- The logic 1130 may be implemented in hardware, a non-transitory computer-readable medium with stored instructions, firmware, and/or combinations thereof.
- While the logic 1130 is illustrated as a hardware component attached to the bus 1125, it is to be appreciated that in other embodiments, the logic 1130 could be implemented in the processor 1110, stored in memory 1115, or stored in disk 1135.
- The logic 1130 or the computer is a means (e.g., structure: hardware, non-transitory computer-readable medium, firmware) for performing the actions described.
- The computing device may be a server operating in a cloud computing system, a server configured in a Software as a Service (SaaS) architecture, a smart phone, a laptop, a tablet computing device, and so on.
- The means may be implemented, for example, as an ASIC programmed to ensure that the results of machine learning models can be audited.
- The means may also be implemented as stored computer-executable instructions that are presented to computer 1105 as data 1140 that are temporarily stored in memory 1115 and then executed by processor 1110.
- Logic 1130 may also provide means (e.g., hardware, non-transitory computer-readable medium that stores executable instructions, firmware) for ensuring that the results of machine learning models can be audited.
- The processor 1110 may be any of a variety of processors, including dual microprocessor and other multi-processor architectures.
- The memory 1115 may include volatile memory and/or non-volatile memory.
- Non-volatile memory may include, for example, ROM, PROM, and so on.
- Volatile memory may include, for example, RAM, SRAM, DRAM, and so on.
- A storage disk 1135 may be operably connected to the computer 1105 via, for example, an input/output (I/O) interface (for example, card, device) 1145 and an input/output port 1120.
- The disk 1135 may be, for example, a magnetic disk drive, a solid-state disk drive, a floppy disk drive, a tape drive, a Zip drive, a flash memory card, a memory stick, and so on.
- The disk 1135 may also be a CD-ROM drive, a CD-R drive, a CD-RW drive, a DVD-ROM drive, and so on.
- The memory 1115 can store a process 1150 and/or data 1140, for example.
- The disk 1135 and/or the memory 1115 can store an operating system that controls and allocates resources of the computer 1105.
- The computer 1105 may interact with input/output (I/O) devices via the I/O interfaces 1145 and the input/output ports 1120.
- Input/output devices may be, for example, a keyboard 1180, a microphone 1184, a pointing and selection device 1182, cameras 1186, video cards, displays 1170, scanners 1188, printers 1172, speakers 1174, the disk 1135, the network devices 1155, and so on.
- The input/output ports 1120 may include, for example, serial ports, parallel ports, and USB ports.
- The computer 1105 can operate in a network environment and thus may be connected to the network devices 1155 via the I/O interfaces 1145 and/or the I/O ports 1120. Through the network devices 1155, the computer 1105 may interact with a network 1160. Through the network 1160, the computer 1105 may be logically connected to remote computers 1165. Networks with which the computer 1105 may interact include, but are not limited to, a LAN, a WAN, and other networks.
- A non-transitory computer-readable/storage medium is configured with stored computer-executable instructions of an algorithm/executable application that, when executed by one or more machines, cause the machine(s) (and/or associated components) to perform the method.
- Example machines include, but are not limited to, a processor, a computer, a server operating in a cloud computing system, a server configured in a Software as a Service (SaaS) architecture, a smart phone, and so on.
- A computing device is implemented with one or more executable algorithms that are configured to perform any of the disclosed methods.
- The disclosed methods or their equivalents are performed by either: computer hardware configured to perform the method; or computer instructions embodied in a module stored in a non-transitory computer-readable medium, where the instructions are configured as an executable algorithm configured to perform the method when executed by at least a processor of a computing device.
- References to “one embodiment”, “an embodiment”, “one example”, “an example”, and so on, indicate that the embodiment(s) or example(s) so described may include a particular feature, structure, characteristic, property, element, or limitation, but that not every embodiment or example necessarily includes that particular feature, structure, characteristic, property, element or limitation. Furthermore, repeated use of the phrase “in one embodiment” does not necessarily refer to the same embodiment, though it may.
- ASIC: application specific integrated circuit.
- CD: compact disk.
- CD-R: CD recordable.
- CD-RW: CD rewriteable.
- DVD: digital versatile disk and/or digital video disk.
- HTTP: hypertext transfer protocol.
- LAN: local area network.
- RAM: random access memory.
- DRAM: dynamic RAM.
- SRAM: static RAM.
- ROM: read only memory.
- PROM: programmable ROM.
- EPROM: erasable PROM.
- EEPROM: electrically erasable PROM.
- USB: universal serial bus.
- XML: extensible markup language.
- WAN: wide area network.
- A “data structure”, as used herein, is an organization of data in a computing system that is stored in a memory, a storage device, or other computerized system.
- A data structure may be any one of, for example, a data field, a data file, a data array, a data record, a database, a data table, a graph, a tree, a linked list, and so on.
- A data structure may be formed from and contain many other data structures (e.g., a database includes many data records). Other examples of data structures are possible as well, in accordance with other embodiments.
- Computer-readable medium refers to a non-transitory medium that stores instructions and/or data configured to perform one or more of the disclosed functions when executed. Data may function as instructions in some embodiments.
- A computer-readable medium may take forms including, but not limited to, non-volatile media and volatile media. Non-volatile media may include, for example, optical disks, magnetic disks, and so on. Volatile media may include, for example, semiconductor memories, dynamic memory, and so on.
- A computer-readable medium may include, but is not limited to, a floppy disk, a flexible disk, a hard disk, a magnetic tape, other magnetic medium, an application specific integrated circuit (ASIC), a programmable logic device, a compact disk (CD), other optical medium, a random access memory (RAM), a read only memory (ROM), a memory chip or card, a memory stick, a solid state storage device (SSD), a flash drive, and other media with which a computer, a processor, or other electronic device can function.
- Each type of media, if selected for implementation in one embodiment, may include stored instructions of an algorithm configured to perform one or more of the disclosed and/or claimed functions.
- Logic represents a component that is implemented with computer or electrical hardware, a non-transitory medium with stored instructions of an executable application or program module, and/or combinations of these to perform any of the functions or actions as disclosed herein, and/or to cause a function or action from another logic, method, and/or system to be performed as disclosed herein.
- Equivalent logic may include firmware, a microprocessor programmed with an algorithm, a discrete logic (e.g., ASIC), at least one circuit, an analog circuit, a digital circuit, a programmed logic device, a memory device containing instructions of an algorithm, and so on, any of which may be configured to perform one or more of the disclosed functions.
- Logic may include one or more gates, combinations of gates, or other circuit components configured to perform one or more of the disclosed functions. Where multiple logics are described, it may be possible to incorporate the multiple logics into one logic. Similarly, where a single logic is described, it may be possible to distribute that single logic between multiple logics. In one embodiment, one or more of these logics are the corresponding structure associated with performing the disclosed and/or claimed functions. The choice of which type of logic to implement may be based on desired system conditions or specifications. For example, if greater speed is a consideration, then hardware would be selected to implement the functions. If lower cost is a consideration, then stored instructions/executable application would be selected to implement the functions.
- An “operable connection”, or a connection by which entities are “operably connected”, is one in which signals, physical communications, and/or logical communications may be sent and/or received.
- An operable connection may include a physical interface, an electrical interface, and/or a data interface.
- An operable connection may include differing combinations of interfaces and/or connections sufficient to allow operable control.
- Two entities can be operably connected to communicate signals to each other directly or through one or more intermediate entities (e.g., processor, operating system, logic, non-transitory computer-readable medium).
- Logical and/or physical communication channels can be used to create an operable connection.
- “User”, as used herein, includes but is not limited to one or more persons, computers or other devices, or combinations of these.
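The verification-report step described above (certifying the database under audit, or flagging it as corrupted or tampered with, and then storing or displaying the report) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the audit signal is assumed to arrive as a boolean `modified`, and the names `AuditOutcome`, `VerificationReport`, `build_verification_report`, and `dispatch_report`, as well as the choice to archive every report but display only failures, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class AuditOutcome(Enum):
    """Hypothetical outcome labels for an audit of a time-series database."""
    CERTIFIED = "certified"  # database under audit has not been modified
    FAILED = "failed"        # database under audit is corrupted or tampered with


@dataclass(frozen=True)
class VerificationReport:
    """Hypothetical electronic verification report message."""
    database_id: str
    outcome: AuditOutcome
    message: str


def build_verification_report(database_id: str, modified: bool) -> VerificationReport:
    """Map the audit signal (modified / not modified) to a report message."""
    if modified:
        return VerificationReport(
            database_id,
            AuditOutcome.FAILED,
            f"Database {database_id} is either corrupted or tampered with.",
        )
    return VerificationReport(
        database_id,
        AuditOutcome.CERTIFIED,
        f"Database {database_id} is certified to be uncorrupted and not tampered with.",
    )


def dispatch_report(
    report: VerificationReport,
    store: List[VerificationReport],
    display: Callable[[str], None],
) -> None:
    """Assumed delivery policy: archive every report, display only failures."""
    store.append(report)
    if report.outcome is AuditOutcome.FAILED:
        # e.g., an indication on a GUI used by a utility or regulatory entity
        display(f"ALERT: {report.message}")


if __name__ == "__main__":
    archive: List[VerificationReport] = []
    dispatch_report(build_verification_report("meter-db", modified=True), archive, print)
    # prints: ALERT: Database meter-db is either corrupted or tampered with.
```

Passing `store` and `display` in as parameters keeps the sketch independent of any particular GUI or transport; in a deployment they might stand in for a utility's or regulator's archive and dashboard.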
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Artificial Intelligence (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Algebra (AREA)
- Operations Research (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Hardware Design (AREA)
- Geometry (AREA)
- Automation & Control Theory (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Testing And Monitoring For Control Systems (AREA)
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP21707123.2A EP4128086B1 (en) | 2020-03-23 | 2021-01-29 | System and method for ensuring that the results of machine learning models can be audited |
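| CN202180015990.4A CN115176254B (zh) | 2020-03-23 | 2021-01-29 | System and method for ensuring that the results of machine learning models can be audited |
| JP2022557900A JP7644140B2 (ja) | 2020-03-23 | 2021-01-29 | System and method for ensuring that the results of machine learning models can be audited |
| JP2025029274A JP2025098009A (ja) | 2020-03-23 | 2025-02-26 | System and method for ensuring that the results of machine learning models can be audited |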
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/826,478 US11948051B2 (en) | 2020-03-23 | 2020-03-23 | System and method for ensuring that the results of machine learning models can be audited |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021194631A1 (en) | 2021-09-30 |
Family
ID=74672473
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2021/015802 Ceased WO2021194631A1 (en) | 2020-03-23 | 2021-01-29 | System and method for ensuring that the results of machine learning models can be audited |
Country Status (5)
| Country | Link |
|---|---|
| US (2) | US11948051B2 |
| EP (1) | EP4128086B1 |
| JP (2) | JP7644140B2 |
| CN (1) | CN115176254B |
| WO (1) | WO2021194631A1 |
Families Citing this family (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11948051B2 (en) * | 2020-03-23 | 2024-04-02 | Oracle International Corporation | System and method for ensuring that the results of machine learning models can be audited |
| US20240012388A1 (en) * | 2020-10-21 | 2024-01-11 | Full Speed Automation, Inc. | System and apparatus for optimizing the energy consumption of manufacturing equipment |
| US20220156574A1 (en) * | 2020-11-19 | 2022-05-19 | Kabushiki Kaisha Toshiba | Methods and systems for remote training of a machine learning model |
| US20220261689A1 (en) * | 2021-02-12 | 2022-08-18 | Oracle International Corporation | Off-duty-cycle-robust machine learning for anomaly detection in assets with random down times |
| CN113778284B (zh) * | 2021-09-24 | 2024-06-04 | 北京字跳网络技术有限公司 | Audit information display method, apparatus, device, and storage medium |
| US20230214555A1 (en) * | 2021-12-30 | 2023-07-06 | PassiveLogic, Inc. | Simulation Training |
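| CN116738143B (zh) * | 2023-05-05 | 2026-02-10 | 北京云星宇交通科技股份有限公司 | Intelligent monitoring and acquisition station for a bridge health monitoring system |
| CN116494816B (zh) * | 2023-06-30 | 2023-09-15 | 江西驴宝宝通卡科技有限公司 | Charging management system for charging piles and method thereof |
| TWI860054B (zh) * | 2023-08-22 | 2024-10-21 | 國立清華大學 | Method, apparatus, and computer program product for training a machine learning model |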
| US12519838B2 (en) | 2024-05-13 | 2026-01-06 | Florida Power & Light Company | Substantiating a compliance standard with secondary evidence |
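| CN121098201B (zh) * | 2025-11-07 | 2026-03-27 | 中国科学院宁波材料技术与工程研究所 | Method for generating motor control parameter data based on conditional flow matching |
| CN121256649B (zh) * | 2025-12-03 | 2026-02-27 | 清华大学深圳国际研究生院 | Time series anomaly detection method, electronic device, and medium |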
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190197145A1 (en) * | 2017-12-21 | 2019-06-27 | Oracle International Corporation | Mset-based process for certifying provenance of time-series data in a time-series database |
| US20190286725A1 (en) * | 2018-03-19 | 2019-09-19 | Oracle International Corporation | Intelligent preprocessing of multi-dimensional time-series data |
| US10496084B2 (en) | 2018-04-06 | 2019-12-03 | Oracle International Corporation | Dequantizing low-resolution IoT signals to produce high-accuracy prognostic indicators |
Family Cites Families (51)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH05161265A (ja) * | 1991-11-28 | 1993-06-25 | Toshiba Corp | Power system monitoring device |
| US7020802B2 (en) | 2002-10-17 | 2006-03-28 | Sun Microsystems, Inc. | Method and apparatus for monitoring and recording computer system performance parameters |
| US7702485B2 (en) | 2006-12-06 | 2010-04-20 | Oracle America, Inc. | Method and apparatus for predicting remaining useful life for a computer system |
| US7613580B2 (en) | 2007-04-12 | 2009-11-03 | Sun Microsystems, Inc. | Method and apparatus for generating an EMI fingerprint for a computer system |
| US7613576B2 (en) | 2007-04-12 | 2009-11-03 | Sun Microsystems, Inc. | Using EMI signals to facilitate proactive fault monitoring in computer systems |
| US7680624B2 (en) | 2007-04-16 | 2010-03-16 | Sun Microsystems, Inc. | Method and apparatus for performing a real-time root-cause analysis by analyzing degrading telemetry signals |
| US8069490B2 (en) | 2007-10-16 | 2011-11-29 | Oracle America, Inc. | Detecting counterfeit electronic components using EMI telemetric fingerprints |
| US8055594B2 (en) | 2007-11-13 | 2011-11-08 | Oracle America, Inc. | Proactive detection of metal whiskers in computer systems |
| US8200991B2 (en) | 2008-05-09 | 2012-06-12 | Oracle America, Inc. | Generating a PWM load profile for a computer system |
| US8457913B2 (en) | 2008-06-04 | 2013-06-04 | Oracle America, Inc. | Computer system with integrated electromagnetic-interference detectors |
| US20100023282A1 (en) | 2008-07-22 | 2010-01-28 | Sun Microsystem, Inc. | Characterizing a computer system using radiating electromagnetic signals monitored through an interface |
| US7869977B2 (en) | 2008-08-08 | 2011-01-11 | Oracle America, Inc. | Using multiple antennas to characterize a computer system based on electromagnetic signals |
| US9152530B2 (en) * | 2009-05-14 | 2015-10-06 | Oracle America, Inc. | Telemetry data analysis using multivariate sequential probability ratio test |
| US8275738B2 (en) | 2009-05-27 | 2012-09-25 | Oracle America, Inc. | Radio frequency microscope for amplifying and analyzing electromagnetic signals by positioning the monitored system at a locus of an ellipsoidal surface |
| US8543346B2 (en) | 2009-05-29 | 2013-09-24 | Oracle America, Inc. | Near-isotropic antenna for monitoring electromagnetic signals |
| US10475754B2 (en) | 2011-03-02 | 2019-11-12 | Nokomis, Inc. | System and method for physically detecting counterfeit electronics |
| EP2790604B1 (en) | 2011-12-13 | 2022-10-19 | Koninklijke Philips N.V. | Distortion fingerprinting for electromagnetic tracking compensation, detection and error correction |
| US8548497B2 (en) | 2011-12-16 | 2013-10-01 | Microsoft Corporation | Indoor localization using commercial frequency-modulated signals |
| US9851386B2 (en) | 2012-03-02 | 2017-12-26 | Nokomis, Inc. | Method and apparatus for detection and identification of counterfeit and substandard electronics |
| US9305073B1 (en) | 2012-05-24 | 2016-04-05 | Luminix, Inc. | Systems and methods for facilitation communications among customer relationship management users |
| JP5530020B1 (ja) | 2013-11-01 | 2014-06-25 | 株式会社日立パワーソリューションズ | Abnormality diagnosis system and abnormality diagnosis method |
| US10395032B2 (en) | 2014-10-03 | 2019-08-27 | Nokomis, Inc. | Detection of malicious software, firmware, IP cores and circuitry via unintended emissions |
| US9594147B2 (en) | 2014-10-03 | 2017-03-14 | Apple Inc. | Wireless electronic device with calibrated reflectometer |
| JP7064333B2 (ja) * | 2015-03-23 | 2022-05-10 | オラクル・インターナショナル・コーポレイション | Knowledge-intensive data processing system |
| US10116675B2 (en) | 2015-12-08 | 2018-10-30 | Vmware, Inc. | Methods and systems to detect anomalies in computer system behavior based on log-file sampling |
| US10331802B2 (en) * | 2016-02-29 | 2019-06-25 | Oracle International Corporation | System for detecting and characterizing seasons |
| US10859609B2 (en) | 2016-07-06 | 2020-12-08 | Power Fingerprinting Inc. | Methods and apparatuses for characteristic management with side-channel signature analysis |
| US10234492B2 (en) | 2016-08-31 | 2019-03-19 | Ca, Inc. | Data center monitoring based on electromagnetic wave detection |
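| JP6715740B2 (ja) | 2016-10-13 | 2020-07-01 | 株式会社日立製作所 | Power flow monitoring device for a power system, power system stabilization device, and power flow monitoring method for a power system |
| WO2018101363A1 (ja) * | 2016-11-30 | 2018-06-07 | 日本電気株式会社 | State estimation device, method, and program |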
| US11429885B1 (en) * | 2016-12-21 | 2022-08-30 | Cerner Innovation | Computer-decision support for predicting and managing non-adherence to treatment |
| US10896064B2 (en) | 2017-03-27 | 2021-01-19 | International Business Machines Corporation | Coordinated, topology-aware CPU-GPU-memory scheduling for containerized workloads |
| CN107181543B (zh) | 2017-05-23 | 2020-10-27 | 张一嘉 | Three-dimensional indoor passive positioning method based on a propagation model and location fingerprints |
| US10817803B2 (en) | 2017-06-02 | 2020-10-27 | Oracle International Corporation | Data driven methods and systems for what if analysis |
| US20190102718A1 (en) | 2017-09-29 | 2019-04-04 | Oracle International Corporation | Techniques for automated signal and anomaly detection |
| US10452510B2 (en) | 2017-10-25 | 2019-10-22 | Oracle International Corporation | Hybrid clustering-partitioning techniques that optimizes accuracy and compute cost for prognostic surveillance of sensor data |
| US10606919B2 (en) | 2017-11-29 | 2020-03-31 | Oracle International Corporation | Bivariate optimization technique for tuning SPRT parameters to facilitate prognostic surveillance of sensor data from power plants |
| JP2019117428A (ja) * | 2017-12-26 | 2019-07-18 | 株式会社クボタ | Management server, monitoring terminal, management server control method, and control program |
| US10977110B2 (en) | 2017-12-27 | 2021-04-13 | Palo Alto Research Center Incorporated | System and method for facilitating prediction data for device based on synthetic data with uncertainties |
| US11392850B2 (en) | 2018-02-02 | 2022-07-19 | Oracle International Corporation | Synthesizing high-fidelity time-series sensor signals to facilitate machine-learning innovations |
| JP7021010B2 (ja) * | 2018-06-06 | 2022-02-16 | 株式会社Nttドコモ | Machine learning system |
| US11775873B2 (en) | 2018-06-11 | 2023-10-03 | Oracle International Corporation | Missing value imputation technique to facilitate prognostic analysis of time-series sensor data |
| US11188691B2 (en) | 2018-12-21 | 2021-11-30 | Utopus Insights, Inc. | Scalable system and method for forecasting wind turbine failure using SCADA alarm and event logs |
| US11409992B2 (en) | 2019-06-10 | 2022-08-09 | International Business Machines Corporation | Data slicing for machine learning performance testing and improvement |
| US11055396B2 (en) | 2019-07-09 | 2021-07-06 | Oracle International Corporation | Detecting unwanted components in a critical asset based on EMI fingerprints generated with a sinusoidal load |
| US20210081573A1 (en) | 2019-09-16 | 2021-03-18 | Oracle International Corporation | Merged surface fast scan technique for generating a reference emi fingerprint to detect unwanted components in electronic systems |
| US11797882B2 (en) | 2019-11-21 | 2023-10-24 | Oracle International Corporation | Prognostic-surveillance technique that dynamically adapts to evolving characteristics of a monitored asset |
| US11367018B2 (en) | 2019-12-04 | 2022-06-21 | Oracle International Corporation | Autonomous cloud-node scoping framework for big-data machine learning use cases |
| CN110941020B (zh) | 2019-12-16 | 2022-06-10 | 杭州安恒信息技术股份有限公司 | Method and apparatus for detecting covert filming equipment based on electromagnetic leakage |
| US11255894B2 (en) | 2020-02-28 | 2022-02-22 | Oracle International Corporation | High sensitivity detection and identification of counterfeit components in utility power systems via EMI frequency kiviat tubes |
| US11948051B2 (en) * | 2020-03-23 | 2024-04-02 | Oracle International Corporation | System and method for ensuring that the results of machine learning models can be audited |
2020
- 2020-03-23 US US16/826,478 patent/US11948051B2/en active Active

2021
- 2021-01-29 WO PCT/US2021/015802 patent/WO2021194631A1/en not_active Ceased
- 2021-01-29 CN CN202180015990.4A patent/CN115176254B/zh active Active
- 2021-01-29 JP JP2022557900A patent/JP7644140B2/ja active Active
- 2021-01-29 EP EP21707123.2A patent/EP4128086B1/en active Active

2024
- 2024-03-25 US US18/615,599 patent/US12367423B2/en active Active

2025
- 2025-02-26 JP JP2025029274A patent/JP2025098009A/ja active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| US20240265308A1 (en) | 2024-08-08 |
| EP4128086B1 (en) | 2025-12-24 |
| US12367423B2 (en) | 2025-07-22 |
| JP2025098009A (ja) | 2025-07-01 |
| US11948051B2 (en) | 2024-04-02 |
| JP2023518866A (ja) | 2023-05-08 |
| JP7644140B2 (ja) | 2025-03-11 |
| EP4128086A1 (en) | 2023-02-08 |
| US20210295210A1 (en) | 2021-09-23 |
| CN115176254B (zh) | 2026-04-17 |
| CN115176254A (zh) | 2022-10-11 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
|  | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21707123; Country of ref document: EP; Kind code of ref document: A1 |
|  | ENP | Entry into the national phase | Ref document number: 2022557900; Country of ref document: JP; Kind code of ref document: A |
|  | NENP | Non-entry into the national phase | Ref country code: DE |
|  | ENP | Entry into the national phase | Ref document number: 2021707123; Country of ref document: EP; Effective date: 20221024 |
|  | WWG | WIPO information: grant in national office | Ref document number: 2021707123; Country of ref document: EP |