CN111985677A - Causal link analysis method and device and computer readable storage medium - Google Patents

Causal link analysis method and device and computer readable storage medium

Info

Publication number
CN111985677A
Authority
CN
China
Prior art keywords
directed
link
causal
correlation
specified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010618648.0A
Other languages
Chinese (zh)
Inventor
唐建权
杨帆
金继民
张成松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN202010618648.0A
Publication of CN111985677A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/04: Manufacturing
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Strategic Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Manufacturing & Machinery (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a causal link analysis method, a causal link analysis device and a computer-readable storage medium. The method comprises: recording and analyzing a production process of a specified object to obtain a plurality of specified variables corresponding to the production process; determining undirected correlation links between the plurality of specified variables based on a correlation judgment rule; predicting on the undirected correlation links according to a causal probability prediction model to obtain directed causal probability values corresponding to the undirected correlation links; and determining directed causal links among the plurality of specified variables according to the directed causal probability values, wherein the directed causal links are used for representing the causal relationships of the production process of the specified object. In this way, the order in which the specified variables influence one another during the production process can be determined.

Description

Causal link analysis method and device and computer readable storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a causal link analysis method and apparatus, and a computer-readable storage medium.
Background
In the industrial field, a production process involves many process parameters that are strongly coupled with one another: a change in one process parameter can cause one or more other process parameters to change, thereby affecting the final product. When process parameters are adjusted in a targeted manner based on experience, a "chain reaction" may result; for example, raising the temperature at location A may cause the temperature at location B to become too high. At present, when process parameters are modeled and analyzed, they are usually adjusted by analyzing the correlations among them. Correlation analysis, however, can only establish that process parameters are correlated; it cannot accurately predict what will result from adjusting a given process parameter. The process parameters therefore cannot be accurately controlled, and the results in areas such as improving product quality and intelligent early warning for devices are not ideal.
Disclosure of Invention
Embodiments of the invention provide a causal link analysis method, a causal link analysis device and a computer-readable storage medium, which make it possible to determine the order in which the specified variables influence one another during a production process.
One aspect of the present invention provides a causal link analysis method, including: recording and analyzing a production process of a specified object to obtain a plurality of specified variables corresponding to the production process; determining undirected correlation links between the plurality of specified variables based on a correlation judgment rule; predicting the undirected correlation link according to a causal probability prediction model to obtain a directed causal probability value corresponding to the undirected correlation link; and determining a directed causal link among the plurality of specified variables according to the directed causal probability value, wherein the directed causal link is used for representing a causal relation of the specified object production process.
In an embodiment, the performing record analysis on the production process of the specified object to obtain a plurality of specified variables corresponding to the production process includes: recording the operation process of the specified object through a distributed control system to obtain an operation record; screening and supplementing the operation records to obtain associated information corresponding to the production process; wherein the production process is included in the operational process; and carrying out standardization processing on the associated information to obtain a specified variable.
In an embodiment, determining the undirected correlation links between the plurality of specified variables based on the correlation judgment rule includes: establishing a directed complete graph corresponding to the plurality of specified variables, wherein the directed complete graph comprises a plurality of bidirectional links connecting the specified variables, and each bidirectional link comprises two directed links with opposite directions; determining a correlation value corresponding to each directed link according to the plurality of specified variables; screening the correlation values that meet a first threshold, and determining the directed links corresponding to those correlation values as first directed links; and determining a directed correlation graph according to the plurality of specified variables and the first directed links, wherein the directed correlation graph is used for representing the undirected correlation links between the specified variables.
In one embodiment, the causal probability prediction model is a graph neural network model; correspondingly, predicting on the undirected correlation links according to the causal probability prediction model to obtain the directed causal probability values corresponding to the undirected correlation links includes: determining a variable set and a link set according to the directed correlation graph, wherein the variable set comprises the plurality of specified variables and the link set comprises the first directed links; and predicting on the variable set and the link set through the graph neural network model to obtain a causal probability set corresponding to the link set, wherein the causal probability set comprises the directed causal probability values corresponding to the first directed links.
In an embodiment, determining a directed causal link between the plurality of specified variables according to the directed causal probability values comprises: screening directed causal probability values meeting a second threshold value, and determining directed links corresponding to the directed causal probability values meeting the second threshold value as second directed links; determining a directed causal link graph from the plurality of specified variables and the second directed link, the directed causal link graph characterizing directed causal links between the specified variables.
In an embodiment, before determining the undirected correlation links between the plurality of specified variables based on the correlation judgment rule, the method further comprises: labeling causal links between at least two specified variables to determine known directed causal links; and performing semi-supervised learning based on the known directed causal links to obtain the causal probability prediction model.
Another aspect of the present invention provides a causal link analysis device, comprising: an analysis module, configured to record and analyze the production process of a specified object to obtain a plurality of specified variables; a determining module, configured to determine undirected correlation links between the plurality of specified variables based on a correlation judgment rule; and a prediction module, configured to predict on the undirected correlation links according to a causal probability prediction model to obtain directed causal probability values corresponding to the undirected correlation links; wherein the determining module is further configured to determine directed causal links between the plurality of specified variables according to the directed causal probability values.
In one embodiment, the analysis module includes: the recording submodule is used for recording the operation process of the specified object through a distributed control system to obtain an operation record; the screening and supplementing submodule is used for screening and supplementing the operation records to obtain associated information corresponding to the production process; wherein the production process is included in the operational process; and the processing submodule is used for carrying out standardization processing on the associated information to obtain the specified variable.
In an embodiment, the determining module includes: the establishing submodule is used for establishing a directed complete graph corresponding to the specified variables, the directed complete graph comprises a plurality of bidirectional links for connecting the specified variables, and any bidirectional link comprises two directed links with opposite directions; the determining submodule is used for determining a correlation value corresponding to each directed link according to the plurality of specified variables; the screening submodule is used for screening correlation values meeting a first threshold value, and determining directed links corresponding to the correlation values meeting the first threshold value as first directed links; the determining submodule is further configured to determine a directed correlation graph according to the plurality of specified variables and the first directed link, where the directed correlation graph is used to characterize undirected correlation links between the specified variables.
In one embodiment, the causal probability prediction model is a graph neural network model; correspondingly, predicting on the undirected correlation links according to the causal probability prediction model to obtain the directed causal probability values corresponding to the undirected correlation links includes: determining a variable set and a link set according to the directed correlation graph, wherein the variable set comprises the plurality of specified variables and the link set comprises the first directed links; and predicting on the variable set and the link set through the graph neural network model to obtain a causal probability set corresponding to the link set, wherein the causal probability set comprises the directed causal probability values corresponding to the first directed links.
In an embodiment, the screening submodule is further configured to screen the directed causal probability values that meet a second threshold and determine the directed links corresponding to those values as second directed links; the determining submodule is further configured to determine a directed causal link graph according to the plurality of specified variables and the second directed links, where the directed causal link graph is used to characterize the directed causal links between the specified variables.
In an embodiment, the apparatus further comprises: the marking module is used for marking causal links between at least two specified variables and determining known directed causal links; a learning module configured to perform semi-supervised learning on the causal probability prediction model based on the known directed causal link to obtain the causal probability prediction model.
Another aspect of the invention provides a computer-readable storage medium comprising a set of computer-executable instructions that, when executed, perform any of the causal link analysis methods described above.
The causal link analysis method provided in the embodiments of the present invention analyzes a plurality of specified variables in a production process to determine the causal relationships among them. The resulting causal relationships help determine how any specified variable affects the production process as a whole. By accurately controlling each specified variable, the stability of the production process and the quality of its products can be ensured, the product yield can be effectively improved, and energy saving and emission reduction can be achieved.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the accompanying drawings, in which the same or corresponding reference numerals indicate the same or corresponding parts:
Fig. 1 is a schematic flow chart illustrating an implementation of a causal link analysis method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a flow chart of a causal link analysis method for analyzing specified variables according to an embodiment of the present invention;
Fig. 3 is a schematic diagram illustrating a flow of determining the correlation of a causal link analysis method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram illustrating an implementation flow of a causal link analysis method for predicting a directed causal probability value according to an embodiment of the present invention;
Fig. 5 is a schematic flow chart illustrating an implementation of a causal link analysis method for determining a directed causal link according to an embodiment of the present invention;
Fig. 6 is a schematic view of a scenario in which a causal link analysis method according to an embodiment of the present invention establishes a directed complete graph;
Fig. 7 is a diagram illustrating a scenario of a causal link analysis method model update according to an embodiment of the present invention;
Fig. 8 is a schematic diagram illustrating a scenario of a causal link graph obtained by a causal link analysis method according to an embodiment of the present invention;
Fig. 9 is a schematic flow chart of an implementation of a causal link analysis device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart illustrating an implementation of a causal link analysis method according to an embodiment of the present invention.
Referring to Fig. 1, in one aspect, an embodiment of the present invention provides a causal link analysis method, which includes: operation 101, recording and analyzing a production process of a specified object to obtain a plurality of specified variables corresponding to the production process; operation 102, determining undirected correlation links between the plurality of specified variables based on a correlation judgment rule; operation 103, predicting on the undirected correlation links according to a causal probability prediction model to obtain directed causal probability values corresponding to the undirected correlation links; and operation 104, determining directed causal links among the specified variables according to the directed causal probability values, wherein the directed causal links are used for representing the causal relationships of the production process of the specified object.
The causal link analysis method provided by the embodiment of the invention analyzes a plurality of specified variables in the production process to determine the causal relationships among them. The resulting causal relationships help determine how any specified variable affects the production process as a whole. By accurately controlling each specified variable, the stability of the production process and the quality of its products can be ensured, the product yield can be effectively improved, and energy saving and emission reduction can be achieved.
The production process addressed by the method can be a production process in various fields, such as the petrochemical field, the new materials field and the bioengineering field. The method is particularly suitable for production processes that have a long production cycle and many specified variables that are strongly coupled with one another, that is, where a change in one specified variable can cause several other specified variables to change and thereby affect the final product. For example, when the method is applied to a catalytic cracking reaction, the causal links between a plurality of specified variables, namely a plurality of process parameters and the product yield, are studied so that the process parameters can be accurately controlled and the yield effectively improved; intelligent detection and early warning of device faults can also be achieved, improving enterprise benefits. In the method, the specified variables include at least one of: variables used to characterize the process parameters, and variables used to characterize the product produced. The process parameters referred to by the method include at least one of: variables for characterizing raw material parameters, variables for characterizing equipment parameters, and variables for characterizing environmental parameters.
Specifically, the method first records and analyzes the production process of a specified object to obtain a plurality of specified variables corresponding to the production process. Here, a specified object refers to a production process within a certain range. The range may be delimited by the product produced, by the production cycle, or by a specific device. For example, in one case, the production process of the specified object may be a production process in which a specific raw material is processed to obtain a specific product, where the specific raw material and the specific product are determined according to actual conditions. In another case, it may be the production process within one time phase of producing a specific product. In still another case, it may be the production process corresponding to a specific device and/or system. The following description takes as an example a production process in which a specific raw material is processed to obtain a specific product. It is to be understood that the parameters corresponding to the specific raw material and the specific product are included in the plurality of specified variables.
Once the production process of the specified object has been determined to be one in which a specific raw material is processed to obtain a specific product, the specified variables represent parameter information that can change during the production process and that can cause other variables to change. The specified variables can be determined by analyzing the production records corresponding to the production process. It will be appreciated that some parameters unrelated to the production process, such as maintenance parameters, may also change; because they do not affect the production process, they are not treated as specified variables in this operation.
After the plurality of specified variables are determined, the method determines undirected correlation links between them based on the correlation judgment rule, which is used to determine whether specified variables are correlated. Here, two specified variables are considered correlated if a change in one is accompanied by a change in the other. For example, if the first variable and the third variable both change when the second variable changes, it may be determined that the second variable is correlated with the first variable and with the third variable. The correlation judgment rule is applied to all the specified variables to determine which of them are correlated, and the undirected correlation links are determined from the correlated specified variables. Each undirected correlation link connects two correlated specified variables; that is, it represents the correlation between the specified variables at its two ends.
After the undirected correlation links are determined, the method predicts on them according to the causal probability prediction model to obtain the directed causal probability value corresponding to each undirected correlation link. The causal probability prediction model takes an undirected correlation link and the specified variables at its two ends as input and outputs a directed causal probability value, which indicates whether a causal relationship exists between the specified variables at the two ends of the link. In this manner, a prediction can be made for every undirected correlation link and its corresponding specified variables to determine the directed causal probability value of that link.
Finally, the method determines directed causal links among the specified variables according to the directed causal probability values, where the directed causal links are used for representing the causal relationships of the production process of the specified object.
The direction of a directed causal probability value can be represented by its sign. From the directed causal probability value, it can be determined whether a direction, that is, a causal relationship, exists between the specified variables at the two ends of an undirected correlation link. For example, a directed causal probability value that falls within a preset range may be taken to indicate that a causal relationship exists between the two specified variables, with the sign giving its direction: when the value falls within the preset range and is positive, it is determined that the first variable causes the second variable to change; when it is negative, it is determined that the second variable causes the first variable to change. Each directed causal probability value is evaluated in this manner, thereby obtaining the directed causal links among the plurality of specified variables. The causal relationships among the specified variables can be determined from the directed causal links, so that each specified variable can be accurately controlled based on them and the stability of the production process is ensured.
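The sign-and-range rule described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the text does not define the preset range, so it is assumed here to mean "magnitude at or above a threshold", and the function name, the threshold of 0.5, the variable names and the score values are all hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]

def directed_links_from_scores(
    scores: Dict[Link, float],
    threshold: float = 0.5,  # assumed stand-in for the "preset range"
) -> List[Link]:
    """Turn signed causal scores on undirected links into directed links.

    Each key is an undirected link (first, second). A positive score is read
    as first -> second, a negative score as second -> first. Links whose
    score magnitude is below the threshold are dropped.
    """
    links: List[Link] = []
    for (first, second), value in scores.items():
        if abs(value) < threshold:
            continue  # outside the assumed range: no causal link asserted
        links.append((first, second) if value > 0 else (second, first))
    return links

# Hypothetical scores for three undirected correlation links.
scores = {
    ("feed_temperature", "reactor_temperature"): 0.82,
    ("product_yield", "reactor_temperature"): -0.67,
    ("ambient_humidity", "product_yield"): 0.12,
}
causal_links = directed_links_from_scores(scores)
print(causal_links)
```

With these made-up scores, the first link keeps its orientation, the second is reversed because its score is negative, and the third is discarded because its magnitude is below the assumed threshold.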
Fig. 2 is a schematic diagram of an implementation process of analyzing a specified variable by using a causal link analysis method according to an embodiment of the present invention.
Referring to Fig. 2, in an implementation, operation 101, recording and analyzing the production process of the specified object to obtain a plurality of specified variables corresponding to the production process, includes: operation 1011, recording the operation process of the specified object through the distributed control system to obtain an operation record; operation 1012, screening and supplementing the operation record to obtain associated information corresponding to the production process, wherein the production process is included in the operation process; and operation 1013, normalizing the associated information to obtain the specified variables.
The method for analyzing the production process of the specified object comprises the steps of setting a Distributed Control System (DCS) in the production process to collect and record all production information in the production process and obtain operation records. It is understood that the operation records include production process related records and production process unrelated records.
Therefore, screening supplement is needed to be carried out on the operation records, and particularly, the method comprises a plurality of screening supplement methods. It will be appreciated that the method employs at least one of the following screening supplements:
in a first screening supplement method, the method deletes production process independent records based on production process run daily. For example, the association information corresponding to the production process is obtained by deleting all data during the damage of the device and the maintenance of the device based on the device operation daily report.
In the second screening and supplementing method, the method counts the number of missing values of a plurality of specified variables, deletes all variables whose missing values are greater than a preset threshold, for example, sets the preset threshold to 10%, deletes all variables greater than 10%, and fills the missing values existing in the variables by using an upward filling method to obtain associated information corresponding to the production process.
In a third complementary screening method, the method obtains the associated information corresponding to the production process by performing outlier processing on a plurality of specified variables, for example, processing the outliers by using a 3 σ method.
It is understood that the screening and supplementing methods can be combined according to actual needs. For example, in one implementation scenario, the method first collects and aggregates the DCS data through the distributed control system; then deletes all data recorded while the device was damaged or under maintenance, according to the daily operation report of the device, to obtain screened variables; then counts the missing values of each screened variable, deletes every variable whose missing values account for more than 10%, and fills the missing values in each remaining variable by the upward filling method to obtain supplemented variables; and finally processes the outliers of the supplemented variables by the 3σ method and determines the processed variables as the associated information corresponding to the production process.
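The combined screening and supplementing steps above can be sketched as follows. This is a minimal illustration and not the implementation of the embodiment: the function name `clean_records`, the dict-of-lists data layout, and the use of `None` for a missing value are assumptions. "Upward filling" is assumed here to mean taking the nearest preceding value, and clipping to the 3σ band is only one possible way of "processing" outliers; the description does not specify either.

```python
from statistics import mean, pstdev

def clean_records(columns, downtime_rows, max_missing_ratio=0.10):
    """columns: dict name -> list of float or None (None marks a missing value).
    downtime_rows: set of row indices recorded during damage or maintenance."""
    cleaned = {}
    for name, values in columns.items():
        # Step 1: drop rows recorded while the device was damaged or under maintenance.
        kept = [v for i, v in enumerate(values) if i not in downtime_rows]
        # Step 2: drop the whole variable if more than 10% of its values are missing.
        missing = sum(v is None for v in kept)
        if not kept or missing / len(kept) > max_missing_ratio:
            continue
        # Step 3 (assumption): fill each gap with the nearest preceding value.
        filled, last = [], None
        for v in kept:
            if v is not None:
                last = v
            filled.append(last)
        # A leading gap has no preceding value; fall back to the first observed one.
        first = next(v for v in filled if v is not None)
        filled = [first if v is None else v for v in filled]
        # Step 4 (assumption): 3-sigma rule, clipping values outside mean +/- 3*std.
        mu, sigma = mean(filled), pstdev(filled)
        low, high = mu - 3 * sigma, mu + 3 * sigma
        cleaned[name] = [min(max(v, low), high) for v in filled]
    return cleaned
```

A variable survives only if it passes the missing-value check after the downtime rows have been removed, which matches the order of the steps in the scenario above.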
Then, the associated information is standardized to obtain the specified variables. It is understood that the associated information includes the yield of the product in the production process. Specifically, the standardization performs unit conversion on each variable according to a preset conversion unit so that all variables are expressed in the same conversion unit, and the variables expressed in the same conversion unit are determined as the specified variables.
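As a small illustration of the unit conversion described above, the sketch below converts a series to a preset target unit. The conversion table, the unit names, and the function name `standardize` are hypothetical; the description does not list the units involved, and this sketch assumes one preset target unit per physical quantity.

```python
# Hypothetical conversion table: source unit -> (preset target unit, multiplicative factor).
TO_COMMON_UNIT = {
    "kPa": ("kPa", 1.0),
    "MPa": ("kPa", 1000.0),
    "bar": ("kPa", 100.0),
    "m3/h": ("m3/h", 1.0),
    "L/min": ("m3/h", 0.06),
}

def standardize(series, unit):
    """Convert every value of `series` from `unit` to its preset target unit."""
    target, factor = TO_COMMON_UNIT[unit]
    return target, [v * factor for v in series]
```

For example, a pressure series recorded in MPa and another recorded in bar would both be expressed in kPa after this step.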
Fig. 3 is a schematic flow chart illustrating the implementation of the correlation determination of the causal link analysis method according to the embodiment of the present invention.
Referring to Fig. 3, in one embodiment, operation 102, determining undirected correlation links between the plurality of specified variables based on a correlation determination rule, includes: operation 1021, establishing a directed complete graph corresponding to the plurality of specified variables, where the directed complete graph includes a plurality of bidirectional links connecting the plurality of specified variables, and each bidirectional link includes two directed links with opposite directions; operation 1022, determining a correlation value corresponding to each directed link according to the plurality of specified variables; operation 1023, screening the correlation values that meet a first threshold, and determining the directed links corresponding to those correlation values as first directed links; and operation 1024, determining a directed correlation graph according to the plurality of specified variables and the first directed links, the directed correlation graph being used to characterize the undirected correlation links between the specified variables.
For the correlation determination, a graph is used to represent the undirected correlation links between the specified variables; specifically, the method uses a directed correlation graph. The method takes the specified variables as vertices and connects every pair of vertices through a bidirectional link to form a directed complete graph. On the directed complete graph, the degree of correlation between every two specified variables is calculated by, for example, the Pearson correlation coefficient or mutual information, and the resulting correlation value is assigned to the bidirectional link between those two variables. A bidirectional link whose correlation value does not meet the first threshold is removed, and a bidirectional link whose correlation value meets the first threshold is retained; the retained bidirectional links and the vertices are determined as the directed correlation graph. That is, the directed correlation graph includes the vertices corresponding to the plurality of specified variables and the bidirectional links whose correlation values meet the first threshold, each bidirectional link connecting two specified variables that are correlated. The first threshold characterizes the condition for a correlation to exist and is preset according to actual conditions. The directed correlation graph can thus characterize the undirected correlation links between the specified variables.
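The correlation screening can be sketched as below, using the Pearson correlation coefficient and the criterion abs(corr) > th1 given later in the implementation scenario. The function names and the dict-of-lists input are assumptions made for illustration; mutual information or another measure could be substituted for `pearson`.

```python
from itertools import permutations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant series has no defined correlation; treat it as uncorrelated.
    return cov / (sx * sy) if sx and sy else 0.0

def directed_correlation_graph(variables, th1):
    """variables: dict name -> time series. Returns the set of first directed links."""
    # Directed complete graph: every ordered pair (u, v) with u != v is a directed link.
    links = permutations(variables, 2)
    # Keep a directed link only if abs(corr) > th1. Pearson is symmetric, so the
    # two directions of a bidirectional link are kept or removed together.
    return {(u, v) for u, v in links
            if abs(pearson(variables[u], variables[v])) > th1}
```

Because a symmetric measure scores both directions identically, the retained graph carries only undirected information at this stage; the direction is decided later by the causal probability prediction model.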
Fig. 4 is a schematic flow chart illustrating an implementation process of predicting a directional causal probability value by a causal link analysis method according to an embodiment of the present invention.
Referring to Fig. 4, in one embodiment, the causal probability prediction model is a graph neural network model; correspondingly, operation 103, performing prediction on the undirected correlation links according to the causal probability prediction model to obtain directed causal probability values corresponding to the undirected correlation links, includes: operation 1031, determining a variable set and a link set according to the directed correlation graph, where the variable set includes the plurality of specified variables and the link set includes the first directed links; and operation 1032, performing prediction on the variable set and the link set through the graph neural network model to obtain a causal probability set corresponding to the link set, where the causal probability set includes the directed causal probability value corresponding to each first directed link.
The causal probability prediction model of the method is a Graph Neural Network (GNN) model. Because the method involves a plurality of specified variables, predicting the production process through the graph neural network model allows the structure and the prediction results to be presented more intuitively and clearly. When the causal probability prediction model performs prediction, the variable set and the link set are the input, and the directed causal probability values corresponding to the first directed links are the output.
Specifically, the method establishes a graph model G = (V, E) corresponding to the set of specified variables and the set of undirected correlation links, where V is the set of all vertices corresponding to the specified variables in the directed correlation graph, i.e., the variable set, and E is the set of all first directed links connecting the vertices in the directed correlation graph, i.e., the link set. It should be noted that the first directed links include first forward links and first reverse links. The graph neural network model performs prediction on the variable set and the link set to obtain the directed causal probability values, and each directed causal probability value corresponds to one directed link.
Fig. 5 is a schematic flow chart illustrating an implementation of determining a directed causal link according to a causal link analysis method in the embodiment of the present invention.
Referring to Fig. 5, in one embodiment, operation 104, determining directed causal links between the plurality of specified variables according to the directed causal probability values, includes: operation 1041, screening the directed causal probability values that meet a second threshold, and determining the directed links corresponding to those values as second directed links; and operation 1042, determining a directed causal link graph according to the plurality of specified variables and the second directed links, the directed causal link graph being used to characterize the directed causal links between the specified variables.
The second threshold characterizes the condition for a causal relationship to exist. That is, when a directed causal probability value meets the second threshold, the directed link corresponding to that value can be determined to represent a causal relationship between the specified variables it connects, and is therefore determined as a second directed link. The second directed link characterizes both the causal relationship between specified variables and the direction of that relationship. The plurality of specified variables are taken as vertices and connected by the second directed links to form a directed causal link graph, through which the causal relationships between the specified variables are determined. In this way, the directed causal links between all specified variables in the production process of the specified object are determined.
In one embodiment, before operation 102, determining the undirected correlation links between the plurality of specified variables based on the correlation determination rule, the method further includes: first, labeling the causal links between at least two specified variables to determine known directed causal links; and then performing semi-supervised learning on the causal probability prediction model based on the known directed causal links to obtain the trained causal probability prediction model.
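The screening by the second threshold can be sketched as follows. The function name, the dictionary layout, and the strict "greater than" comparison (taken from the implementation scenario's threshold th2) are illustrative assumptions.

```python
def causal_link_graph(variables, causal_prob, th2):
    """variables: iterable of specified-variable names (the vertices).
    causal_prob: dict (cause, effect) -> directed causal probability value."""
    # A directed link becomes a second directed link only if its probability exceeds th2.
    second_links = {link for link, p in causal_prob.items() if p > th2}
    return {"vertices": set(variables), "links": second_links}
```

Since the two directions of a pair are scored separately, one direction can be retained while the opposite direction is removed, which is what gives the resulting graph its causal direction.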
Specifically, a schematic diagram of the update of the graph neural network model designed by the method is shown in Fig. 7. A first function (rendered as formula image RE-GDA0002688797560000121 in the original publication) is set to perform convolution and pooling operations; ρ^{v→e} is set to stack the vertex data at the two ends of a directed link; and a second function (rendered as formula image RE-GDA0002688797560000122 in the original publication) is set to a single-layer LSTM network whose output is the causal relationship probability value. In addition, the loss function of the model is set to the cross-entropy loss, the number of iterations is set, and semi-supervised learning is performed. For the training samples of the semi-supervised learning, experts can be invited to label a small number of direct causal relationships and a small number of unrelated relationships using domain knowledge of the production process of the specified object, and the labeled direct causal relationships and unrelated relationships are then standardized. Designing the update scheme of the graph neural network model includes, but is not limited to, setting the functions or network structures used for the vertex update and the edge update.
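One common way to combine a cross-entropy loss with a small set of expert labels is to compute the loss only over the labeled links and let the unlabeled links contribute nothing. The sketch below illustrates that idea; the masking strategy, the function name, and the 1/0 label encoding are assumptions, since the description names the loss and the semi-supervised setting but does not state how unlabeled links are treated.

```python
from math import log

def masked_cross_entropy(pred, labels, eps=1e-12):
    """pred: dict link -> predicted causal probability in [0, 1].
    labels: dict link -> 1 (labeled direct causal relationship)
            or 0 (labeled unrelated relationship).
    Links missing from `labels` are unlabeled and are skipped."""
    total = 0.0
    for link, y in labels.items():
        # Clamp to avoid log(0) when a prediction is exactly 0 or 1.
        p = min(max(pred[link], eps), 1.0 - eps)
        total -= y * log(p) + (1 - y) * log(1.0 - p)
    # Average over the labeled links only.
    return total / len(labels)
```

With this formulation the predictions for unlabeled links do not enter the loss directly; they are influenced only through the model parameters shared with the labeled links.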
To facilitate understanding of the above embodiments, a specific implementation scenario is described below. In this scenario, the causal link analysis method provided by the embodiment of the present invention is applied to a causal link analysis device, which is used to analyze a production process in the petrochemical field, such as a petroleum refining process, in which the production process is complex, the process parameters are numerous, and the process parameters are strongly coupled with one another.
During the analysis, the device first collects and aggregates distributed control system data (DCS data) through the distributed control system. Then, according to the daily operation report of the device used in the production process, all data recorded while the device was damaged or under maintenance are deleted to obtain production data, and the production data are analyzed to determine production variables. The daily operation report is contained in the distributed control system data.
Next, the missing values of each production variable are counted, every variable whose missing values account for more than 10% is deleted, and the missing values in the remaining production variables are filled by the upward filling method to obtain filled variables. The outliers of the filled variables are processed by the 3σ method to obtain processed variables. The processed variables are standardized to determine the specified variables corresponding to the production process, of which there are a plurality.
Then, the device takes the specified variables as vertices, connects them through bidirectional links, and establishes a directed complete graph as shown in Fig. 6, in which the dots represent the vertices and each pair of line segments that connect two dots with opposite arrow directions represents a bidirectional link. The degree of correlation corresponding to each bidirectional link is calculated by a method such as the Pearson correlation coefficient or mutual information, to characterize the degree of correlation between the specified variables. The degree of correlation is evaluated against a threshold th1: when the degree of correlation between two vertices does not satisfy the preset formula abs(corr) > th1, the bidirectional link between the two vertices is deleted, where corr denotes the degree of correlation between the two vertices, abs denotes the absolute value, and th1 denotes the set threshold. The graph obtained after deleting the bidirectional links that do not satisfy the preset formula is determined as the directed correlation graph.
Then, as shown in Fig. 7, a graph model G = (V, E) is established, where V is the set of all vertices in the directed correlation graph and the initial value of each vertex is the full time series of the variable corresponding to that vertex, and E is the set of all edges in the directed correlation graph, with its initial value set to an all-ones matrix. The update scheme of the GNN model is designed by setting the functions or network structures for the vertex update and the edge update. A first function (rendered as formula image BDA0002562100500000131 in the original publication) is set to perform convolution and pooling operations on the vertex set V and the edge set E. ρ^{v→e} is set to stack the data of the two end vertices of an edge; that is, for each edge in the edge set, the convolved and pooled data of its two end vertices are stacked to obtain stacked data. A second function (rendered as formula image BDA0002562100500000141 in the original publication) may be set to a single-layer LSTM network, which outputs the corresponding causal relationship probability values E' based on the input stacked data and the edge set. Experts can be invited to label a small number of direct causal relationships and a small number of unrelated relationships using domain knowledge; the labeled causal and unrelated relationships are standardized and then used as training samples, the loss function is set to the cross-entropy loss, the number of iterations is set, and semi-supervised learning is performed on the GNN model. The GNN model then performs prediction on the vertex set and the edge set corresponding to the directed correlation graph to obtain the directed causal probability value corresponding to each edge.
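The stacking step performed by ρ^{v→e} can be illustrated with a short sketch. The function name, the dictionary layout, and the source-first ordering are assumptions; the convolution, pooling, and LSTM stages are omitted because their exact configuration is given only in the formula images of the original publication.

```python
def stack_endpoints(vertex_features, links):
    """vertex_features: dict vertex -> feature list (e.g. its pooled time series).
    links: iterable of directed links (u, v).
    Returns dict link -> [features of u, features of v], source vertex first."""
    return {(u, v): [vertex_features[u], vertex_features[v]] for u, v in links}
```

Under this ordering assumption, the links (u, v) and (v, u) receive different stacked inputs, so a downstream edge model is able to assign different causal probability values to the two directions of the same bidirectional link.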
Finally, a threshold th2 is set and, as shown in Fig. 8, the edges in the directed correlation graph whose directed causal probability value is larger than the threshold th2 are retained, so that a causal link graph is obtained. The dots in Fig. 8 and the dots in Fig. 6 refer to the same specified variables, and a directed line segment between two dots in Fig. 8 represents a causal relationship between the two specified variables: when the specified variable at the tail of the arrow changes, the specified variable at the head of the arrow changes as a result.
Fig. 9 is a schematic flow chart of an implementation of a causal link analysis device according to an embodiment of the present invention.
Referring to Fig. 9, another aspect of the present invention provides a causal link analysis device, including: an analysis module 601, configured to perform record analysis on the production process of a specified object to obtain a plurality of specified variables; a determining module 602, configured to determine undirected correlation links between the plurality of specified variables based on a correlation determination rule; and a prediction module 603, configured to perform prediction on the undirected correlation links according to a causal probability prediction model to obtain directed causal probability values corresponding to the undirected correlation links; the determining module 602 is further configured to determine directed causal links between the plurality of specified variables according to the directed causal probability values.
In one embodiment, the analysis module 601 includes: a recording submodule 6011, configured to record the operation process of the specified object through the distributed control system to obtain operation records; a screening and supplementing submodule 6012, configured to screen and supplement the operation records to obtain associated information corresponding to the production process, wherein the production process is included in the operation process; and a processing submodule 6013, configured to standardize the associated information to obtain the specified variables.
In one embodiment, the determining module 602 includes: an establishing submodule 6021, configured to establish a directed complete graph corresponding to the plurality of specified variables, where the directed complete graph includes a plurality of bidirectional links connecting the plurality of specified variables, and each bidirectional link includes two directed links with opposite directions; a determining submodule 6021, configured to determine a correlation value corresponding to each directed link according to the plurality of specified variables; and a screening submodule 6022, configured to screen the correlation values that meet a first threshold and determine the directed links corresponding to those correlation values as first directed links; the determining submodule 6021 is further configured to determine a directed correlation graph according to the plurality of specified variables and the first directed links, where the directed correlation graph is used to characterize the undirected correlation links between the specified variables.
In one embodiment, the causal probability prediction model is a graph neural network model; accordingly, the prediction module 603 is configured to: determine a variable set and a link set according to the directed correlation graph, where the variable set includes the plurality of specified variables and the link set includes the first directed links; and perform prediction on the variable set and the link set through the graph neural network model to obtain a causal probability set corresponding to the link set, where the causal probability set includes the directed causal probability value corresponding to each first directed link.
In one embodiment, the screening submodule 6022 is further configured to screen the directed causal probability values that meet a second threshold and determine the directed links corresponding to those values as second directed links; the determining submodule 6021 is further configured to determine a directed causal link graph according to the plurality of specified variables and the second directed links, the directed causal link graph being used to characterize the directed causal links between the specified variables.
In one embodiment, the device further includes: a labeling module 604, configured to label the causal links between at least two specified variables and determine known directed causal links; and a learning module 605, configured to perform semi-supervised learning on the causal probability prediction model based on the known directed causal links to obtain the trained causal probability prediction model.
Another aspect of the invention provides a computer-readable storage medium comprising a set of computer-executable instructions that, when executed, perform any of the causal link analysis methods described above.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. The particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the various embodiments or examples described in this specification, as well as the features of those embodiments or examples, provided that they do not contradict one another.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
The above description covers only specific embodiments of the present invention, and the scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall fall within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A causal link analysis method, comprising:
recording and analyzing a production process of a specified object to obtain a plurality of specified variables corresponding to the production process;
determining undirected correlation links between the plurality of specified variables based on a correlation judgment rule;
predicting the undirected correlation link according to a causal probability prediction model to obtain a directed causal probability value corresponding to the undirected correlation link;
and determining a directed causal link among the plurality of specified variables according to the directed causal probability value, wherein the directed causal link is used for representing a causal relation of the specified object production process.
2. The method of claim 1, wherein the performing record analysis on the specified object production process to obtain a plurality of specified variables corresponding to the production process comprises:
recording the operation process of the specified object through a distributed control system to obtain an operation record;
screening and supplementing the operation records to obtain associated information corresponding to the production process; wherein the production process is included in the operational process;
and carrying out standardization processing on the associated information to obtain a specified variable.
3. The method according to claim 1 or 2, wherein the determining the undirected correlation link between the plurality of specified variables based on the correlation determination rule comprises:
establishing a directed complete graph corresponding to the plurality of specified variables, wherein the directed complete graph comprises a plurality of bidirectional links for connecting the plurality of specified variables, and any bidirectional link comprises two directed links with opposite directions;
determining a correlation value corresponding to each directed link according to the plurality of specified variables;
screening correlation values meeting a first threshold value, and determining directed links corresponding to the correlation values meeting the first threshold value as first directed links;
and determining a directed correlation graph according to the plurality of specified variables and the first directed link, wherein the directed correlation graph is used for representing the undirected correlation link between the specified variables.
4. The method of claim 3, wherein the causal probability prediction model is a graph neural network model;
correspondingly, the predicting the undirected correlated link according to the causal probability prediction model to obtain a directed causal probability value corresponding to the undirected correlated link includes:
determining a variable set and a link set according to the directed correlation graph; wherein the variable set comprises a plurality of specified variables, and the link set comprises a first directed link;
predicting the variable set and the link set through a graph neural network model to obtain a causal probability set corresponding to the link set, wherein the causal probability set comprises a directed causal probability value corresponding to the first directed link.
5. The method of claim 1 or 4, wherein determining the directed causal link between the plurality of specified variables according to the directed causal probability values comprises:
screening directed causal probability values meeting a second threshold value, and determining directed links corresponding to the directed causal probability values meeting the second threshold value as second directed links;
determining a directed causal link graph from the plurality of specified variables and the second directed link, the directed causal link graph characterizing directed causal links between the specified variables.
6. The method of claim 1, wherein before the determining undirected correlation links between the plurality of specified variables based on a correlation judgment rule, the method further comprises:
marking causal links between at least two specified variables, and determining known directed causal links;
semi-supervised learning is performed on the causal probability prediction model based on the known directed causal links to obtain a causal probability prediction model.
7. A causal link analysis device, characterized in that it comprises:
the analysis module is used for recording and analyzing the production process of the specified object to obtain a plurality of specified variables;
a determining module, configured to determine undirected correlation links between the plurality of specified variables based on a correlation judgment rule;
the prediction module is used for predicting the undirected correlation link according to a causal probability prediction model to obtain a directed causal probability value corresponding to the undirected correlation link;
the determining module is further configured to determine a directed causal link between the plurality of specified variables according to the directed causal probability value.
8. The apparatus of claim 7, wherein the analysis module comprises:
the recording submodule is used for recording the operation process of the specified object through a distributed control system to obtain an operation record;
the screening and supplementing submodule is used for screening and supplementing the operation records to obtain associated information corresponding to the production process; wherein the production process is included in the operational process;
and the processing submodule is used for carrying out standardization processing on the associated information to obtain the specified variable.
9. The apparatus of claim 7, wherein the determining module comprises:
the establishing submodule is used for establishing a directed complete graph corresponding to the plurality of specified variables, wherein the directed complete graph comprises a plurality of bidirectional links for connecting the plurality of specified variables, and any bidirectional link comprises two directed links with opposite directions;
the determining submodule is used for determining a correlation value corresponding to each directed link according to the plurality of specified variables;
the screening submodule is used for screening correlation values meeting a first threshold value, and determining directed links corresponding to the correlation values meeting the first threshold value as first directed links;
the determining submodule is further configured to determine a directed correlation graph according to the plurality of specified variables and the first directed link, where the directed correlation graph is used to characterize undirected correlation links between the specified variables.
10. A computer-readable storage medium comprising a set of computer-executable instructions that, when executed, perform the causal link analysis method of any of claims 1-7.
CN202010618648.0A 2020-06-30 2020-06-30 Causal link analysis method and device and computer readable storage medium Pending CN111985677A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010618648.0A CN111985677A (en) 2020-06-30 2020-06-30 Causal link analysis method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111985677A true CN111985677A (en) 2020-11-24

Family

ID=73438474

Country Status (1)

Country Link
CN (1) CN111985677A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7088427B1 (en) 2022-01-20 2022-06-21 富士電機株式会社 Driving support equipment, driving support methods and programs

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1740934A (en) * 2004-08-27 2006-03-01 株式会社日立制作所 Quality control system for manufacturing industrial products
US20060066617A1 (en) * 2004-09-30 2006-03-30 John Antanies Computerized method and software for data analysis
JP2008084039A (en) * 2006-09-28 2008-04-10 Hitachi Ltd Method for analyzing manufacturing process
CN107563596A (en) * 2017-08-03 2018-01-09 清华大学 A kind of evaluation index equilibrium state analysis method based on Bayes's causal network
CN108983710A (en) * 2017-06-02 2018-12-11 欧姆龙株式会社 Procedure analysis device, procedure analysis method and storage medium
CN109754158A (en) * 2018-12-07 2019-05-14 国网江苏省电力有限公司南京供电分公司 A method of generating the big data Causal model under corresponding operation of power networks environment
CN110555047A (en) * 2018-03-29 2019-12-10 日本电气株式会社 Data processing method and electronic equipment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1740934A (en) * 2004-08-27 2006-03-01 株式会社日立制作所 Quality control system for manufacturing industrial products
US20060066617A1 (en) * 2004-09-30 2006-03-30 John Antanies Computerized method and software for data analysis
JP2008084039A (en) * 2006-09-28 2008-04-10 Hitachi Ltd Method for analyzing manufacturing process
CN108983710A (en) * 2017-06-02 2018-12-11 欧姆龙株式会社 Procedure analysis device, procedure analysis method and storage medium
CN107563596A (en) * 2017-08-03 2018-01-09 清华大学 A kind of evaluation index equilibrium state analysis method based on Bayes's causal network
CN110555047A (en) * 2018-03-29 2019-12-10 日本电气株式会社 Data processing method and electronic equipment
CN109754158A (en) * 2018-12-07 2019-05-14 国网江苏省电力有限公司南京供电分公司 A method of generating the big data Causal model under corresponding operation of power networks environment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7088427B1 (en) 2022-01-20 2022-06-21 富士電機株式会社 Driving support equipment, driving support methods and programs
JP2023106043A (en) * 2022-01-20 2023-08-01 富士電機株式会社 Driving assist system, driving assist method, and program

Similar Documents

Publication Publication Date Title
CN105491599B (en) Novel regression system for predicting LTE network performance indicators
CN112699113B (en) Industrial manufacturing process operation monitoring system driven by time sequence data stream
CN106682835B (en) Data-driven complex electromechanical system service quality state evaluation method
CN114444231B (en) Online self-adaptive prediction method, device, equipment and medium for residual life of mold
CA3157588A1 (en) Methods and systems for the estimation of the computational cost of simulation
CN114757307B (en) Artificial intelligence automatic training method, system, device and storage medium
CN114666224A (en) Dynamic allocation method, device, equipment and storage medium for business resource capacity
CN113869521A (en) Method, device, computing equipment and storage medium for constructing prediction model
CN111813618A (en) Data anomaly detection method, device, equipment and storage medium
CN116484269B (en) Parameter processing method, device and equipment of display screen module and storage medium
CN112002114A (en) Electromechanical equipment wireless data acquisition system and method based on 5G-ZigBee communication
CN117494292A (en) Engineering progress management method and system based on BIM and AI large model
CN111985677A (en) Causal link analysis method and device and computer readable storage medium
CN115481726A (en) Industrial robot complete machine health assessment method and system
CN117635219B (en) Intelligent analysis system and method for big data of metal mine production
CN117668743A (en) Time series data prediction method based on associated spatio-temporal relations
CN117578715A (en) Intelligent monitoring and early warning method, system and storage medium for power operation and maintenance
CN115935285A (en) Multi-element time series anomaly detection method and system based on mask map neural network model
CN111222203A (en) Method for establishing and predicting service life model of bearing
Magro et al. A confirmation technique for predictive maintenance using the Rough Set Theory
CN115577295A (en) Data detection method and device, computer equipment and storage medium
CN110956675B (en) Method and device for automatically generating technology maturity curve
CN112596391B (en) Deep neural network large time lag system dynamic modeling method based on data driving
CN115348293A (en) Intelligent control remote operation and maintenance method and platform for industrial internet equipment
CN113191306A (en) Equipment abnormal state prediction method based on edge calculation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination