CN112434790B - Self-interpretation method for distinguishing part of black box problem of convolutional neural network - Google Patents

Self-interpretation method for distinguishing part of black box problem of convolutional neural network

Info

Publication number
CN112434790B
CN112434790B CN202011249200.2A CN202011249200A
Authority
CN
China
Prior art keywords
dclm
layer
cnn
function
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011249200.2A
Other languages
Chinese (zh)
Other versions
CN112434790A (en
Inventor
赵金伟
王启舟
邱万力
黑新宏
答龙超
王伟
谢国
胡潇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology
Priority to CN202011249200.2A
Publication of CN112434790A
Application granted
Publication of CN112434790B
Active legal-status Current
Anticipated expiration legal-status

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • G06N5/042Backward inferencing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a self-interpretation method for the black-box problem of the discrimination part of a convolutional neural network, which is implemented according to the following steps: step 1, proposing an interpretation distance as an index for measuring the interpretability of a model; step 2, forming a feature extractor from the convolution layers and pooling layers of the CNN network; step 3, taking the output of the feature extractor obtained in step 2, namely the feature map, as the input of the fully connected layers of the CNN network; step 4, using the feature map of step 3 in the fully connected layers to classify and label the input image samples of the CNN network; step 5, using the fully connected layers to form the discrimination part of the CNN network; step 6, constructing a DCLM model with three layers of nodes; and step 7, proposing a novel game method to perform game training of the DCLM model constructed in step 6 and the CNN, so as to improve the interpretability of the discrimination part of the CNN. The invention solves the problem of large errors of interpretation methods in the prior art.

Description

Self-interpretation method for distinguishing part of black box problem of convolutional neural network
Technical Field
The invention belongs to the technical field of deep learning within computing, and particularly relates to a self-interpretation method for the black-box problem of the discrimination part of a convolutional neural network.
Background
In recent years, convolutional neural networks (hereinafter CNN), as representative algorithms of deep learning, have significantly exceeded human ability on certain tasks such as computer vision and computer games. However, the manner in which a neural network handles a problem is difficult to understand and interpret, and this ubiquitous difficulty of interpretation in turn leads to many problems of fairness, privacy, robustness, and credibility in applications; many researchers are currently attempting to explain the black-box problem of CNNs in different ways.
The main difficulty of the so-called black-box problem of CNNs lies in the unexplainability of the discrimination part, and there are currently two main classes of methods for dealing with it: prior (ante-hoc) and post-hoc methods (Holzinger et al., 2019). Since the prior approach is generally a transparent modeling approach (Arrieta et al., 2020), it is difficult to obtain an explanation of the discrimination part from it, and thus common techniques mainly focus on the second, post-hoc class of methods.
Early post-hoc approaches mainly obtained global interpretations of neural networks by extracting a predictable model. (Craven & Shavlik, 1999; Krishnan et al., 1999; Boz, 2002; Johansson & Niklasson, 2009) proposed methods that find a decision tree for interpreting a neural network by maximizing the gain ratio and estimating the fidelity of the current model;
(Craven & Shavlik, 1994; Johansson & Niklasson, 2003; Augasta & Kathirvalavakumar, 2012; Sebastin et al., 2015; Zilke et al., 2016) proposed specific rule extraction methods that search for optimal interpretable rules from a neural network;
in recent years, feature attribution methods have also appeared, for example, the network classification decision decomposition method based on deep Taylor decomposition proposed by Montavon et al. (Montavon et al., 2017), and the DeepLIFT algorithm for computing importance scores of a multi-layer neural network proposed by Shrikumar et al. (Shrikumar et al., 2016), which explains the difference of the output from a reference output in terms of the differences between the inputs and their reference inputs;
in addition, che et al have proposed a simple rectification method using gradient enhancement trees to extract an interpretable simple model called interpretable simulation learning, thiagarajan et al build a tree-like view representation of a complex model through hierarchical division of feature space to improve interpretability; still other scholars have proposed a rectification method for converting knowledge from a set of models to a single model, wu et al have proposed a tree regularization method based on knowledge rectification to represent neural network output feature space based on multi-layer perception;
however, these approaches only address the unexplained problem that exists with trained neural networks or trained deep neural networks with explicit input features; moreover, because data deviation and noise data exist in the training data set, the traditional machine learning method is difficult to ensure the interpretation performance and generalization performance of the identification part to be consistent and convergent; in addition, the interpretable feature is a feature that represents the distance between the discriminating portion and its optimal interpretable model, and when the discriminating portion has better generalization, it tends to deviate from its optimal interpretable model, thereby producing an erroneous interpretable model.
Disclosure of Invention
The invention aims to provide a self-interpretation method for the black-box problem of the discrimination part of a convolutional neural network, which solves the problem of excessively large errors of interpretation methods in the prior art.
The technical scheme adopted by the invention is a self-interpretation method for the black-box problem of the discrimination part of a convolutional neural network, implemented according to the following steps:
step 1, providing an interpretable distance as an interpretability index of a measurement model;
step 2, forming a feature extractor by a convolution layer and a pooling layer of the CNN network;
step 3, taking the output of the feature extractor obtained in the step 2, namely the feature map, as the input of the CNN network full-connection layer;
step 4, classifying and labeling input image samples in the CNN network by using the feature map in the step 3 in the full connection layer;
step 5, using the full connection layer to form a judging part of the CNN network;
step 6, constructing a DCLM model of the three-layer node;
and 7, providing a novel game method to perform game training of the DCLM model and the CNN constructed in the step 6 so as to improve the interpretability of the judging part of the CNN.
The present invention is also characterized in that,
in step 1, it is assumed that, under the same input data set, the similarity is measured by the variance of the difference between the outputs of the original CNN network model and the constructed DCLM model; this measure is named the interpretation distance. The smaller the interpretation distance, the closer the shape of the discrimination part of the CNN network is to the shape of the optimal interpretable model, and hence the better the interpretable performance of the discrimination part;
assume that Q is a compact metric space, Z is the sample set, and v is a Borel measure on Q, such as the Lebesgue measure or a marginal measure. In the space of square-integrable functions over the metric space Q, the interpretation distance φ_d(P*, f) between the discrimination part f(x) and the optimal interpretable model P*(x) is expressed as:
φ_d(P*, f) = ∫_Z (f(x) − P*(x) − μ_{P*}(f))² dv
where μ_{P*}(f) = ∫_Z (f(x) − P*(x)) dv.
The step 6 is specifically as follows:
the DCLM has a three-layer network structure. The feature predicate layer, formed by feature predicate nodes, is the first layer; each feature predicate node has a feature predicate function Z(Γ) used to represent whether a neuron of the discrimination part in the first fully connected layer of the CNN network has the ability to capture a feature, and the feature predicate function Z(Γ) is expressed as follows:
where i ∈ {1, 2, ..., k}, τ_i is the i-th feature map in Γ, w_i is its corresponding weight vector in the first fully connected layer (called the feature capture layer), and * denotes convolution.
In step 6, the result predicate layer is the bottom layer, and each result predicate node has a result predicate function indicating whether the corresponding output neuron of the discrimination part is greater than 0; the corresponding function is expressed as follows:
in step 6, all feature predicate layer nodes and result predicate layer nodes are connected to the disjunction nodes of one or more intermediate layers; these layers are called disjunction layers and represent true or false truth conditions. Each disjunction node represents the disjunctive relation of all feature predicate layer nodes and result predicate layer nodes, and the relation is represented by a disjunctive normal form. Whether a predicate layer node is connected to a disjunction layer node by a true or a false edge determines whether its predicate function enters the disjunction through a NOT operation; the potential function of a disjunctive normal form is obtained by the Lukasiewicz method as follows:
φ_c(y) = min(1, T(Γ, y))  (3)
where, in the expression for T(Γ, y):
N is the number of nodes of the feature predicate layer; Γ is a feature map; Z_j(·) is a feature predicate layer function; and D(·) is the result layer function. The logic network parameter a_j = 1 indicates that the edge between the predicate node and the disjunction node is a false edge, and a_j = 0 indicates that it is a true edge.
Step 6 sets the DCLM model to include a disjunctive relation equal to:
where Z_j(·) is a feature predicate layer function, j ∈ {1, ..., 4}, and y is the neuron output of the CNN network; the potential function of this disjunctive normal form is:
where a_1 = a_4 = 1 and a_2 = a_3 = 0;
The conditional probability distribution of the DCLM over ground formulas is:
where G is the number of ground formulas, and the partition function Ξ in the formula is expressed as follows:
The output value y_dclm corresponding to the neuron output layer of the CNN network is:
y_dclm = (y_dclm,1, y_dclm,2, ..., y_dclm,G)  (7)
where λ_i is the weight of the i-th ground formula. The optimal values of y_dclm, the logic network parameters a_j, and λ_i in the DCLM model can be obtained by maximizing the likelihood function; the process of maximizing the likelihood function is shown in the following formula:
where φ_ci is a potential function and a_i is a logic network parameter;
for the DCLM, a logic network is first constructed, and then a maximum a posteriori (MAP) algorithm is adopted to extract the model, as follows:
input the feature maps, the feature capture matrix, the labels corresponding to the inputs, and the number of inputs;
initialize the set of true/false edges and the initial value of y_dclm, and set the number of iterations to 1; enter the outer loop body and perform the following operations on all feature maps:
compute and store the cosine similarity between the current feature map and its corresponding weight matrix, and compute and store the product of the norms of the current feature map and the weight matrix;
traverse each dimension of the output vector, solve for the latest edge value of the disjunction corresponding to the current dimension using the cosine similarity and the product of norms obtained above together with the corresponding output y_dclm, merge it with the existing set of edge values, and increase the iteration count by 1;
when the number of iterations equals the number of inputs, the termination condition of the algorithm is met.
The iterative optimization idea of the game in step 7 is as follows:
input: the input data set and the target outputs;
initialization: the logic network is named LN and the CNN network is named CN;
first: feed a sample into CN to obtain the feature map FM and the output of CN; use the feature map FM in the logic network model LN to obtain the output of the logic network; use the output of the logic network, the label, and the output of CN to obtain the loss function Loss, and update CN;
substitute a sample into the updated CN to obtain a new feature map, the weight matrix of the fully connected layer corresponding to the feature map, and the output of CN, and continue to update and construct the logic network LN;
in this way the logic network continuously corrects CN, and CN is used to construct the logic network; the loop is repeated until the end condition is satisfied.
The step 7 is specifically as follows:
as can be seen from the measurement method of the interpretation distance, when the discrimination part of the CNN is sufficiently similar in shape to its optimal interpretable model, the discrimination part has good interpretable performance, but its generalization performance tends to decrease. This is mainly because the condition for consistent convergence of the two performances is difficult to guarantee, namely:
φ_d(P*, f) = 0, where f(x) is the discrimination part of the CNN and P*(x) is the optimal interpretable model;
since this sufficient condition is difficult to meet, the trade-off problem always exists, and both the optimal interpretable model P* and the optimal discrimination part are unknown. When considering the trade-off problem, it may therefore be a viable solution to automatically extract an interpretable model from the discrimination part during training and then iteratively reduce the interpretation distance between the two models, as follows:
to avoid degrading the generalization performance, the probability p(w | X, y_t) should be maximized, where X is a training sample, w is the parameter set of the CNN, and y_t is the target vector of X; it can be obtained from:
where
p(y_t | w, X) = ∫ p(y_t | f, w, X) ∫ p(f | y_dclm, w, X) p(y_dclm | w, X) dy_dclm df  (13)
When the DCLM is known, y*_dclm is its optimal solution, and
p(y*_dclm | w, X) = 1  (14)
Based on this, one obtains:
∫ p(f | y_dclm, w, X) p(y_dclm | w, X) dy_dclm = p(f | y*_dclm, w, X)  (15)
Likewise, with the training samples X and the parameter set w of the CNN known, y_t the target vector of X, and f_nn the optimal solution of the CNN network, one obtains:
p(y_t | w, X) = p(y_t | f_nn, w, X) p(f_nn | y*_dclm, w, X)  (16)
If the parameter set w of the CNN and the training sample X are given, and the loss function is
the conditional probability distribution function is:
and at the same time:
where Ξ_1 and Ξ_2 are partition functions. By maximizing the likelihood function of p(w | X, y_t), the optimal parameter set w of the CNN can be obtained; assuming that w obeys a Gaussian distribution, one obtains:
where α is a meta-parameter determined by the variance of the selected Gaussian distribution, and converting this equation to a minimization problem yields:
the self-interpretation method for the black box problem of the convolutional neural network discrimination part has the beneficial effects that on the premise of not damaging the generalization performance of the convolutional neural network, the causal relationship of the CNN without the prior structure is actively extracted, the interpretation is finally obtained through the causal relationship, and meanwhile, the interpretable performance of the discrimination part of the CNN can be improved. In the process, the invention provides an interpretable model, namely a depth perception learning model (DCLM), for expressing the causal relationship of the judging part, and provides a greedy algorithm for automatically extracting the DCLM from the judging part by solving the maximum satisfiability problem of the judging part. A new game method is provided, the distance between two models is reduced through iteration, the two models are corrected, and the interpretable performance of the judging part is improved on the premise that the generalization performance of the judging part is not greatly reduced. Meanwhile, the generalization performance of the DCLM is improved. An interpretable distance is proposed for evaluating and measuring the distance between the discriminating portion and the interpretable model on the unexplained problem.
Drawings
Fig. 1 is a specific structural representation of a three-layer network structure of a deep perception learning model DCLM;
FIG. 2 is a representation of a disjunctive relationship in a deep-aware learning model DCLM;
FIG. 3 is the accuracy of DCLMs and CNN-DCLMs for each epoch in the gaming process;
FIG. 4 is the interpretation distance between the DCLMs and the CNNs and CNN-DCLMs;
FIG. 5 is the information entropy of DCLM;
FIG. 6 is the accuracy of CNN and CNN-DCLM.
Detailed Description
The invention will be described in detail below with reference to the drawings and the detailed description.
The invention discloses a self-interpretation method for distinguishing part of black box problems of a convolutional neural network, which is implemented according to the following steps:
step 1, providing an interpretable distance as an interpretability index of a measurement model;
in step 1, it is assumed that, under the same input data set, the similarity is measured by the variance of the difference between the outputs of the original CNN network model and the constructed DCLM model; this measure is named the interpretation distance. The smaller the interpretation distance, the closer the shape of the discrimination part of the CNN network is to the shape of the optimal interpretable model, and hence the better the interpretable performance of the discrimination part;
assume that Q is a compact metric space, Z is the sample set, and v is a Borel measure on Q, such as the Lebesgue measure or a marginal measure. In the space of square-integrable functions over the metric space Q, the interpretation distance φ_d(P*, f) between the discrimination part f(x) and the optimal interpretable model P*(x) is expressed as:
φ_d(P*, f) = ∫_Z (f(x) − P*(x) − μ_{P*}(f))² dv
where μ_{P*}(f) = ∫_Z (f(x) − P*(x)) dv.
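As a minimal numerical sketch (not part of the patent text), the interpretation distance of equations (9)-(10) can be approximated on a finite sample set by replacing the integrals with sample means; the array names f_out and p_out below stand for the outputs of the discrimination part f(x) and of an interpretable model P(x) and are illustrative only.

```python
import numpy as np

def interpretation_distance(f_out: np.ndarray, p_out: np.ndarray) -> float:
    """Empirical interpretation distance phi_d(P, f).

    f_out: outputs of the CNN discrimination part f(x) on a sample set Z
    p_out: outputs of the interpretable model P(x) on the same samples
    Approximates equations (9)-(10): the variance of f(x) - P(x) over Z.
    """
    diff = f_out - p_out
    mu = diff.mean()                        # mu_P(f): mean of f(x) - P(x) over Z
    return float(((diff - mu) ** 2).mean())

# toy usage with stand-in outputs
rng = np.random.default_rng(0)
f_out = rng.normal(size=1000)
p_out = f_out + rng.normal(scale=0.1, size=1000)   # a model whose shape is close to f
print(interpretation_distance(f_out, p_out))        # small value -> shapes are similar
```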
The step 6 is specifically as follows:
as shown in FIG. 1, when constructing the DCLM model, the relation between the features captured by the weight matrix of the first fully connected layer and its output vector is an internal relation of the CNN network structure; it is independent of the input data and of the degree of abstraction of the features, so the discrimination part can be better explained by a logical relation. This logical relation is represented by the DCLM three-layer network structure. The feature predicate layer, formed by feature predicate nodes, is the first layer; each feature predicate node has a feature predicate function Z(Γ) used to represent whether a neuron of the discrimination part in the first fully connected layer of the CNN network has the ability to capture a feature, and the feature predicate function Z(Γ) is expressed as follows:
where i ∈ {1, 2, ..., k}, τ_i is the i-th feature map in Γ, w_i is its corresponding weight vector in the first fully connected layer (called the feature capture layer), and * denotes convolution.
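Equation (1) itself is not rendered in this text, so the following is only a hedged sketch of what a feature predicate could compute: it scores whether the weight vector w_i of the feature capture layer responds to the i-th feature map τ_i, using the cosine similarity that the extraction algorithm of step 6 relies on. The flattening of the feature map and the zero threshold are assumptions, not taken from the patent.

```python
import numpy as np

def feature_predicate(tau_i: np.ndarray, w_i: np.ndarray, thresh: float = 0.0) -> float:
    """Truth value of a feature predicate Z_i(Gamma) (assumed form).

    tau_i: the i-th feature map output by the feature extractor
    w_i:   the corresponding weight vector in the first fully connected
           (feature capture) layer
    Returns 1.0 if the cosine similarity between the flattened feature map
    and the weight vector exceeds the threshold, i.e. the neuron is read
    as capturing the feature; otherwise 0.0.
    """
    tau = tau_i.ravel()
    cos = float(tau @ w_i) / (np.linalg.norm(tau) * np.linalg.norm(w_i) + 1e-12)
    return 1.0 if cos > thresh else 0.0
```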
Step 2, forming a feature extractor by a convolution layer and a pooling layer of the CNN network;
step 3, taking the output of the feature extractor obtained in the step 2, namely the feature map, as the input of the CNN network full-connection layer;
step 4, classifying and labeling input image samples in the CNN network by using the feature map in the step 3 in the full connection layer;
step 5, using the full connection layer to form a judging part of the CNN network;
step 6, constructing a DCLM model of the three-layer node;
in step 6, the result predicate layer is the bottom layer, and each result predicate node has a result predicate function indicating whether the corresponding output neuron of the discrimination part is greater than 0; the corresponding function is expressed as follows:
in step 6, all feature predicate layer nodes and result predicate layer nodes are connected to the disjunction nodes of one or more intermediate layers; these layers are called disjunction layers and represent true or false truth conditions. Each disjunction node represents the disjunctive relation of all feature predicate layer nodes and result predicate layer nodes, and the relation is represented by a disjunctive normal form. Whether a predicate layer node is connected to a disjunction layer node by a true or a false edge determines whether its predicate function enters the disjunction through a NOT operation; the potential function of a disjunctive normal form is obtained by the Lukasiewicz method as follows:
φ_c(y) = min(1, T(Γ, y))  (3)
where, in the expression for T(Γ, y):
N is the number of nodes of the feature predicate layer; Γ is a feature map; Z_j(·) is a feature predicate layer function; and D(·) is the result layer function. The logic network parameter a_j = 1 indicates that the edge between the predicate node and the disjunction node is a false edge, and a_j = 0 indicates that it is a true edge.
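The exact form of T(Γ, y) is not reproduced in this text, so the sketch below only illustrates the Lukasiewicz clause potential of equation (3) under a common reading: each literal contributes its soft truth value, negated when its parameter a_j marks a false edge, and the clause potential is the sum capped at 1. That reading of a_j and the particular combination rule are assumptions.

```python
from typing import Sequence

def clause_potential(truth_values: Sequence[float], a: Sequence[int]) -> float:
    """Lukasiewicz potential phi_c = min(1, T) of one disjunctive clause.

    truth_values: soft truth values in [0, 1] of the predicate nodes
                  (feature predicates Z_j and the result predicate D)
    a:            logic network parameters; a_j = 1 is read here as a false
                  edge, so the literal enters the disjunction negated
                  (an assumption, since equation (4) is not shown).
    """
    assert len(truth_values) == len(a)
    t = sum(v if a_j == 0 else 1.0 - v for v, a_j in zip(truth_values, a))
    return min(1.0, t)

# clause of FIG. 2 style: a_1 = a_4 = 1, a_2 = a_3 = 0 for four feature predicates,
# followed by the result predicate with a true edge
print(clause_potential([0.9, 0.2, 0.1, 0.8, 1.0], [1, 0, 0, 1, 0]))
```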
As shown in FIG. 2, the DCLM model set in step 6 includes a disjunctive relation equal to:
where Z_j(·) is a feature predicate layer function, j ∈ {1, ..., 4}, and y is the neuron output of the CNN network; the potential function of this disjunctive normal form is:
where a_1 = a_4 = 1 and a_2 = a_3 = 0;
The conditional probability distribution of the DCLM over ground formulas is:
where G is the number of ground formulas, and the partition function Ξ in the formula is expressed as follows:
The output value y_dclm corresponding to the neuron output layer of the CNN network is:
y_dclm = (y_dclm,1, y_dclm,2, ..., y_dclm,G)  (7)
where λ_i is the weight of the i-th ground formula. The optimal values of y_dclm, the logic network parameters a_j, and λ_i in the DCLM model can be obtained by maximizing the likelihood function; the process of maximizing the likelihood function is shown in the following formula:
where φ_ci is a potential function and a_i is a logic network parameter;
for the DCLM, a logic network is first constructed, and then a maximum a posteriori (MAP) algorithm is adopted to extract the model, as follows (a code sketch of this extraction loop follows the listed steps):
input the feature maps, the feature capture matrix, the labels corresponding to the inputs, and the number of inputs;
initialize the set of true/false edges and the initial value of y_dclm, and set the number of iterations to 1; enter the outer loop body and perform the following operations on all feature maps:
compute and store the cosine similarity between the current feature map and its corresponding weight matrix, and compute and store the product of the norms of the current feature map and the weight matrix;
traverse each dimension of the output vector, solve for the latest edge value of the disjunction corresponding to the current dimension using the cosine similarity and the product of norms obtained above together with the corresponding output y_dclm, merge it with the existing set of edge values, and increase the iteration count by 1;
when the number of iterations equals the number of inputs, the termination condition of the algorithm is met.
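The listed steps can be sketched as the following loop. The patent text only names the quantities computed (cosine similarity, product of norms, per-dimension edge values), not the exact update rule, so the zero threshold, the sign-voting merge of edge values, and the y_dclm update below are illustrative assumptions.

```python
import numpy as np

def extract_logic_network(feature_maps, w, outputs):
    """Greedy MAP-style sketch of the DCLM extraction loop (step 6).

    feature_maps: list of flattened feature maps, one per input sample
    w:            feature capture matrix of the first fully connected layer,
                  shape (num_neurons, feature_dim)
    outputs:      list of output vectors of the discrimination part, one per sample
    Returns accumulated edge values per output dimension and the DCLM outputs y_dclm.
    """
    num_dims = len(outputs[0])
    edges = [dict() for _ in range(num_dims)]       # per output dimension: neuron -> edge value
    y_dclm = [np.zeros(num_dims) for _ in outputs]  # initial DCLM outputs
    iteration = 1
    for s, (gamma, y) in enumerate(zip(feature_maps, outputs)):
        norm_prod = np.linalg.norm(w, axis=1) * np.linalg.norm(gamma)  # product of norms
        cos = (w @ gamma) / (norm_prod + 1e-12)                        # cosine similarity
        for d in range(num_dims):                                      # each output dimension
            for j, c in enumerate(cos):
                sign = 1 if (c > 0.0) == (y[d] > 0.0) else -1          # candidate edge value
                edges[d][j] = edges[d].get(j, 0) + sign                # merge with existing edges
            y_dclm[s][d] = float(sum(edges[d].values()) > 0)           # current DCLM truth value
        iteration += 1
        if iteration > len(feature_maps):                              # end condition met
            break
    return edges, y_dclm

# toy usage: 5 samples, 3 feature-capture neurons, 2 output dimensions
rng = np.random.default_rng(1)
fms = [rng.normal(size=8) for _ in range(5)]
W = rng.normal(size=(3, 8))
outs = [rng.normal(size=2) for _ in range(5)]
edges, y_dclm = extract_logic_network(fms, W, outs)
```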
The iterative optimization idea of the game in step 7 is as follows (a code sketch follows these steps):
input: the input data set and the target outputs;
initialization: the logic network is named LN and the CNN network is named CN;
first: feed a sample into CN to obtain the feature map FM and the output of CN; use the feature map FM in the logic network model LN to obtain the output of the logic network; use the output of the logic network, the label, and the output of CN to obtain the loss function Loss, and update CN;
substitute a sample into the updated CN to obtain a new feature map, the weight matrix of the fully connected layer corresponding to the feature map, and the output of CN, and continue to update and construct the logic network LN;
in this way the logic network continuously corrects CN, and CN is used to construct the logic network; the loop is repeated until the end condition is satisfied.
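The alternating loop can be sketched in PyTorch as below. SimpleCNN, build_ln, and ln_forward are stand-ins for the actual CN network, the extraction algorithm of step 6, and the logic network inference, which are specified elsewhere in the patent; the mixing weight beta and the particular loss terms are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    """Stand-in for CN: a feature extractor plus one fully connected discrimination layer."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.fc = nn.Linear(8 * 14 * 14, num_classes)   # feature capture / discrimination part

    def forward(self, x):
        fm = self.features(x)                 # feature map FM
        return fm, self.fc(fm.flatten(1))     # FM and the output of CN

def build_ln(cn: SimpleCNN):
    """Placeholder for the DCLM extraction of step 6: the sign pattern of the
    feature capture weights stands in for the constructed logic network LN."""
    return torch.sign(cn.fc.weight.detach())

def ln_forward(ln, fm):
    """Placeholder logic-network inference: truth values in {0, 1} per output neuron."""
    return (fm.flatten(1) @ ln.t() > 0).float()

def game_training(loader, epochs: int = 1, beta: float = 0.1):
    """Alternating (game) loop: update CN with a loss mixing the label term and the
    LN term, then rebuild LN from the updated CN. beta is an assumed weight."""
    cn = SimpleCNN()
    opt = torch.optim.Adam(cn.parameters(), lr=1e-3)
    ln = None
    for _ in range(epochs):
        for x, y in loader:
            fm, out = cn(x)
            loss = F.cross_entropy(out, y)                        # label term
            if ln is not None:
                y_dclm = ln_forward(ln, fm)                       # output of LN on FM
                loss = loss + beta * F.mse_loss(torch.sigmoid(out), y_dclm)
            opt.zero_grad(); loss.backward(); opt.step()          # update CN with Loss
            ln = build_ln(cn)                                     # reconstruct LN from updated CN
    return cn, ln

# toy usage with random tensors standing in for image batches
data = [(torch.randn(4, 1, 28, 28), torch.randint(0, 10, (4,))) for _ in range(10)]
cn, ln = game_training(data, epochs=2)
```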
And 7, providing a novel game method to perform game training of the DCLM model and the CNN constructed in the step 6 so as to improve the interpretability of the judging part of the CNN.
The step 7 is specifically as follows:
as can be seen from the measurement method of the interpretation distance, when the discrimination part of the CNN is sufficiently similar in shape to its optimal interpretable model, the discrimination part has good interpretable performance, but its generalization performance tends to decrease. This is mainly because the condition for consistent convergence of the two performances is difficult to guarantee, namely:
φ_d(P*, f) = 0, where f(x) is the discrimination part of the CNN and P*(x) is the optimal interpretable model;
To prove this conclusion, one can first focus on a single neuron of the CNN network. Each input channel f(x) of the neuron can be regarded as a kernel function K(x, w), where w is the weight vector, including the bias, of the neuron; it therefore spans a kernel Hilbert space H_K. Let Q be a compact metric space, Z the sample set, and v a Borel measure on Q, such as the Lebesgue measure or a marginal measure, and consider the space of square-integrable functions over the metric space Q.
H_K is a set of linear functions on this space; however, because of data bias and noisy data in the training data set, the optimal interpretable model on this space is often a nonlinear function, and the set of continuous linear functions is not dense in it. Therefore the sufficient condition is difficult to satisfy and the trade-off problem always exists; the same holds for the discrimination part of the CNN. However, its optimal interpretable model P* and its optimal discrimination part are unknown, so when considering the trade-off problem, it may be a viable solution to extract an interpretable model from the discrimination part during training and then iteratively reduce the interpretation distance between the two models, as follows: if the minimized loss function of the CNN can guarantee interpretability, the learning algorithm of the CNN can improve its interpretable performance with the lowest loss of generalization performance; if this cannot be guaranteed, the trade-off between generalization performance and interpretable performance would not exist. To prove the existence of the problem, only one neuron of the CNN needs to be considered: if an input channel f(x) of a neuron is regarded as a kernel function K(x, w) (w being the weight vector containing the bias of the neuron), it spans a kernel Hilbert space for the neuron.
H_K can be regarded as a set of linear functions on this space and is the solution space of the neuron. In the following lemmas, the conditions for consistent convergence of generalization performance and interpretable performance are discussed:
Lemma 1: the set of continuous linear functionals on a separable Hilbert space is not dense in the space of square-integrable functions;
Lemma 2: a set of continuous nonlinear functions on a separable Hilbert space is dense everywhere in L².
The discussion is as follows:
In H_K, when the optimal input channel f(x) approximates a linear function, the optimal interpretable model P*(x) is not any linear function in H_K; according to Lemma 1, the conventional training procedure cannot guarantee that f(x) approximates P*(x). From Lemma 2, it can be found that convergence cannot be achieved unless P*(x) approximates f(x). Here, if approximation is defined as the similarity of the shapes of the function curves of f(x) and P*(x), the condition for consistent convergence of the two performances is φ_d(P*, f) = 0. For the discrimination part of the CNN, this sufficient condition still holds formally; however, in most engineering applications it is difficult to guarantee because of data bias and noisy data in the training data set, so a trade-off between the two performances of the discrimination part always exists.
From the above conclusion, to completely solve the trade-off problem, φ_d(P*, f) should be reduced to 0, where f(x) is the discrimination part of the CNN and P*(x) is the optimal interpretable model. But P* and the optimal f are unknown, so extracting an interpretable model P(x) from the discrimination part f(x) during training and then iteratively reducing φ_d(P, f) is a more practical method.
To avoid degrading the generalization performance, the probability p(w | X, y_t) should be maximized, where X is a training sample, w is the parameter set of the CNN, and y_t is the target vector of X; it can be obtained from:
where
p(y_t | w, X) = ∫ p(y_t | f, w, X) ∫ p(f | y_dclm, w, X) p(y_dclm | w, X) dy_dclm df  (13)
When the DCLM is known, y*_dclm is its optimal solution, and
p(y*_dclm | w, X) = 1  (14)
Based on this, one obtains:
∫ p(f | y_dclm, w, X) p(y_dclm | w, X) dy_dclm = p(f | y*_dclm, w, X)  (15)
Likewise, with the training samples X and the parameter set w of the CNN known, y_t the target vector of X, and f_nn the optimal solution of the CNN network, one obtains:
p(y_t | w, X) = p(y_t | f_nn, w, X) p(f_nn | y*_dclm, w, X)  (16)
If the parameter set w of the CNN and the training sample X are given, and the loss function is
the conditional probability distribution function is:
and at the same time:
where Ξ_1 and Ξ_2 are partition functions. By maximizing the likelihood function of p(w | X, y_t), the optimal parameter set w of the CNN can be obtained; assuming that w obeys a Gaussian distribution, one obtains:
where α is a meta-parameter determined by the variance of the selected Gaussian distribution, and converting this equation to a minimization problem yields:
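Equations (17)-(21) are not reproduced in this text, so the following only sketches the shape of the resulting minimization problem as described in the paragraph above: a data term toward the target y_t, a term that keeps the discrimination part close to the DCLM output y*_dclm, and an α-weighted penalty corresponding to the Gaussian prior on w. The concrete loss forms and the weight beta are assumptions.

```python
import torch
import torch.nn.functional as F

def self_interpretation_loss(out, y_t, y_dclm, params, alpha=1e-4, beta=0.1):
    """Sketch of the minimization objective obtained from the MAP derivation above.

    out:    outputs of the CNN discrimination part f_nn on a batch
    y_t:    target class labels
    y_dclm: optimal DCLM outputs y*_dclm for the same batch (values in [0, 1])
    params: iterable of CNN parameters w; the Gaussian prior on w becomes an
            L2 penalty weighted by the meta-parameter alpha
    beta:   assumed weight of the interpretation term (not specified in the text)
    """
    data_term = F.cross_entropy(out, y_t)                    # fit the targets y_t
    interp_term = F.mse_loss(torch.sigmoid(out), y_dclm)     # keep f_nn close to y*_dclm
    prior_term = sum((p ** 2).sum() for p in params)         # L2 penalty from the Gaussian prior
    return data_term + beta * interp_term + alpha * prior_term
```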
and (3) experimental verification:
experiments were designed in the present invention to verify the effectiveness of the invention. The first experiment verifies whether the self-explanatory method can improve the interpretable performance of CNN without degrading the generalization performance of CNN. A second experiment verifies whether the method can tend to stabilize and converge during the gaming process.
The structures of CNN3 (including 3 convolutional layers, 3 max pooling layers, 3 Fully Connected Layers (FCLs), and 1 output layer), CNN5 (including 5 convolutional layers, 5 max pooling layers, 3 fully connected layers, and 1 output layer), and CNN8 (including 8 convolutional layers, 8 max pooling layers, 3 fully connected layers, and 1 output layer) were used in the experiments. The final structures obtained after training these CNNs by this method are named CNN3-DCLM, CNN5-DCLM and CNN8-DCLM, respectively. All experiments were performed using Mnist (Lecunet al, 1998), fashion Mnist (Zalando, 2017) and Emnist (cohenetal, 2017) baseline data sets. All algorithms were implemented in Python using Pytorch libraries (paszkey et al 2019), and experiments were run on a server equipped with Intel Xeon 4110 (2.1 GHz) processor, 20GBRAM and nvidia telsat 4.
Table 1 classification accuracy and interpretation distance for three gaming methods on DCLM-CNN
Experiment 1: performance verification of the method on CNNs
The experiments also compared Soft Decision Trees (SDT) (Frosst & Hinton, 2017) built from the trained CNN3, CNN5 and CNN8, denoted CNN3-sdt, CNN5-sdt and CNN8-sdt. The baseline methods also included SDTs built from the trained CNN3-dclm, CNN5-dclm and CNN8-dclm, denoted CNN3*-SDT, CNN5*-SDT and CNN8*-SDT, respectively.
As can be seen from Table 1, the accuracy of the CNN-DCLMs is higher than that of both interpretable models, SDT and DCLM, and only about 1.4 percentage points lower than that of the CNNs on all benchmark data sets. It is noted that, except for CNN3-SDT on the EMNIST data set, the interpretation distance of all CNN-DCLMs on most data sets is about 5 percentage points lower than the interpretation distance of most SDTs from their CNNs, which indicates that the self-interpretation method can improve the interpretable performance of the discrimination part of the CNN without significantly degrading the generalization performance of the CNN;
on the MNIST data set, except for CNN3-SDT, the accuracy of the DCLMs was only 0.7 percentage points lower than that of the SDTs for CNN5-SDT and CNN8-SDT. On the EMNIST and Fashion-MNIST data sets, the DCLMs were about 2 percentage points less accurate than CNN3-SDT; it was also found that the accuracy of the DCLMs was about 4.3 percentage points higher than that of the SDTs for CNN5-SDT and CNN8-SDT on both data sets. This is mainly because, for the EMNIST and Fashion-MNIST data sets, the feature maps output by the feature extraction parts of CNN5 and CNN8 are more abstract than those of CNN3. The abstract feature maps can represent the nature of the DCLM predicates without distortion, so they do not interfere with or affect the generation of the DCLM;
experiment 2 Convergence test of the proposed method
The convergence of the method is verified through experiments: CNN3, CNN5, CNN8 were compared with CNN3-dclm, CNN5-dclm, CNN8-dclm, respectively. 25 epochs are required for each training and all results are measured at each epoch as shown in figures 3, 4, 6, 5. Each graph includes 9 secondary graphs. The three subgraphs in the left column are experiments of the Mnist dataset. The three subgraphs in the middle column are the FashionMnist dataset and the three subgraphs in the right column are the Emnist dataset. Figure 3 shows the accuracy of the DCLMs and CNN-DCLMs for each epoch in the gaming process. It is evident from these data that the accuracy of dclm and that of these cnn-dclm are steadily increasing in the early stages. In the next stage, their accuracy tends to stabilize. This illustrates that the gaming method does not affect the improvement in dclm and cnn generalization performance;
FIG. 4 shows the interpretation distance of DCLMs from CNNs and CNN-DCLMs. It can be seen from these subgraphs that most of the time, CNN-DCLMs that are not involved in gaming are interpreted more than CNN-DCLMs, especially at the end of training. The result shows that the game method can effectively improve the interpretable performance of CNN-DCLMs.
As can be seen from fig. 5, dclm of CNN3, CNN5, CNN8 all have a stable entropy value at the end of the game. On the Mnist dataset, the entropy eventually converges between 120 and 145. On the fashionmist dataset, the entropy value eventually converges between 125 and 136. On the Emnist dataset, the entropy eventually converges between 100 and 120. The result shows that the game algorithm can ensure that the DCLM converges to a stable state
FIG. 6 shows the accuracy of the epochs CNN and CNN-DCLMs in the gaming process; from these sub-graphs, it can be seen that the accuracy of CNNs and CNN-DCLMs steadily increased in the early stages, while the accuracy of CNN-DCLMs was lower than CNNs. But in the last stage its accuracy tends to be stable and consistent. The main reason is that in the early stage, the trade-off problem between generalization performance and interpretive performance CNN-DCLMs must reduce the performance of CNN-DCLMs whose generalization performance increases the performance of its interpretive performance. The proposed gaming method can effectively reduce the gap between the two performances. This suggests that this approach is effective against trade-off problems
From the sub-graphs of fig. 3 and 4, it can also be seen that one DCLM model is available on each epoch. After 15 epochs, their interpretation distance tends to converge. This phenomenon suggests that the gaming process can be reduced in distance, while the training process of the network neural network identification section can be interpreted in real time with the dclm model after 15 epochs.

Claims (1)

1. A self-interpretation method for distinguishing part of black box problems of a convolutional neural network is characterized by comprising the following steps:
step 1, providing an interpretable distance as an interpretability index of a measurement model;
in step 1, it is assumed that, under the same input data set, the similarity is measured by the variance of the difference between the outputs of the original CNN network model and the constructed DCLM model; this measure is named the interpretation distance, and the smaller the interpretation distance, the closer the shape of the discrimination part of the CNN network is to the shape of the optimal interpretable model, and hence the better the interpretable performance of the discrimination part;
assuming that Q is a compact metric space, Z is the sample set, and v is a Borel measure on Q, such as the Lebesgue measure or a marginal measure, in the space of square-integrable functions over the metric space Q the interpretation distance φ_d(P*, f) between the discrimination part f(x) and the optimal interpretable model P*(x) is expressed as:
φ_d(P*, f) = ∫_Z (f(x) − P*(x) − μ_{P*}(f))² dv  (9)
where μ_{P*}(f) = ∫_Z (f(x) − P*(x)) dv  (10);
Step 2, forming a feature extractor by a convolution layer and a pooling layer of the CNN network;
step 3, taking the output of the feature extractor obtained in the step 2, namely the feature map, as the input of the CNN network full-connection layer;
step 4, classifying and labeling input image samples in the CNN network by using the feature map in the step 3 in the full connection layer;
step 5, using the full connection layer to form a judging part of the CNN network;
step 6, constructing a DCLM model of the three-layer node;
the step 6 is specifically as follows:
the DCLM has a three-layer network structure: the feature predicate layer, formed by feature predicate nodes, is the first layer; each feature predicate node has a feature predicate function Z(Γ) used to represent whether a neuron of the discrimination part in the first fully connected layer of the CNN network has the ability to capture a feature, and the feature predicate function Z(Γ) is expressed as follows:
where i ∈ {1, 2, ..., k}, τ_i is the i-th feature map in Γ, w_i is its corresponding weight vector in the first fully connected layer, called the feature capture layer, and * denotes convolution;
in step 6, the result predicate layer is the bottom layer, and each result predicate node has a result predicate function indicating whether the corresponding output neuron of the discrimination part is greater than 0; the corresponding function is expressed as follows:
in step 6, all feature predicate layer nodes and result predicate layer nodes are connected to the disjunction nodes of one or more intermediate layers; these layers are called disjunction layers and represent true or false truth conditions; each disjunction node represents the disjunctive relation of all feature predicate layer nodes and result predicate layer nodes, and the relation is represented by a disjunctive normal form; whether a predicate layer node is connected to a disjunction layer node by a true or a false edge determines whether its predicate function enters the disjunction through a NOT operation; the potential function of a disjunctive normal form is obtained by the Lukasiewicz method as follows:
φ_c(y) = min(1, T(Γ, y))  (3)
where, in the expression for T(Γ, y):
N is the number of nodes of the feature predicate layer; Γ is a feature map; Z_j(·) is a feature predicate layer function; and D(·) is the result layer function; the logic network parameter a_j = 1 indicates that the edge between the predicate node and the disjunction node is a false edge, and a_j = 0 indicates that it is a true edge;
in step 6, the DCLM model is set to include a disjunctive relation equal to:
where Z_j(·) is a feature predicate layer function, j ∈ {1, ..., 4}, and y is the neuron output of the CNN network; the potential function of this disjunctive normal form is:
where a_1 = a_4 = 1 and a_2 = a_3 = 0;
the conditional probability distribution of the DCLM over ground formulas is:
where G is the number of ground formulas, and the partition function Ξ in the formula is expressed as follows:
the output value y_dclm corresponding to the neuron output layer of the CNN network is:
y_dclm = (y_dclm,1, y_dclm,2, ..., y_dclm,G)  (7)
where λ_i is the weight of the i-th ground formula; the optimal values of y_dclm, the logic network parameters a_j, and λ_i in the DCLM model can be obtained by maximizing the likelihood function; the process of maximizing the likelihood function is shown in the following formula:
where φ_ci is a potential function and a_i is a logic network parameter;
for the DCLM, a logic network is first constructed, and then a maximum a posteriori (MAP) algorithm is adopted to extract the model, as follows:
input the feature maps, the feature capture matrix, the labels corresponding to the inputs, and the number of inputs;
initialize the set of true/false edges and the initial value of y_dclm, and set the number of iterations to 1; enter the outer loop body and perform the following operations on all feature maps:
compute and store the cosine similarity between the current feature map and its corresponding weight matrix, and compute and store the product of the norms of the current feature map and the weight matrix;
traverse each dimension of the output vector, solve for the latest edge value of the disjunction corresponding to the current dimension using the cosine similarity and the product of norms obtained above together with the corresponding output y_dclm, merge it with the existing set of edge values, and increase the iteration count by 1;
when the number of iterations equals the number of inputs, the termination condition of the algorithm is met;
step 7, proposing a novel game method to perform game training of the DCLM model constructed in step 6 and the CNN, so as to improve the interpretability of the discrimination part of the CNN;
the iterative optimization idea of the game in step 7 is as follows:
input: the input data set and the target outputs;
initialization: the logic network is named LN and the CNN network is named CN;
first: feed a sample into CN to obtain the feature map FM and the output of CN; use the feature map FM in the logic network model LN to obtain the output of the logic network; use the output of the logic network, the label, and the output of CN to obtain the loss function Loss, and update CN;
substitute a sample into the updated CN to obtain a new feature map, the weight matrix of the fully connected layer corresponding to the feature map, and the output of CN, and continue to update and construct the logic network LN;
in this way the logic network continuously corrects CN, and CN is used to construct the logic network; the loop is repeated until the end condition is satisfied;
the step 7 is specifically as follows:
as can be seen from the measurement method of the interpretation distance, when the discrimination part of the CNN is sufficiently similar in shape to its optimal interpretable model, the discrimination part has good interpretable performance, but its generalization performance tends to decrease; this is mainly because the condition for consistent convergence of the two performances is difficult to guarantee, namely:
φ_d(P*, f) = 0, where f(x) is the discrimination part of the CNN and P*(x) is the optimal interpretable model;
since this sufficient condition is difficult to meet, the trade-off problem always exists, and both the optimal interpretable model P* and the optimal discrimination part are unknown; when considering the trade-off problem, it may therefore be a viable solution to extract an interpretable model from the discrimination part during training and then iteratively reduce the interpretation distance between the two models, as follows:
to avoid degrading the generalization performance, the probability p(w | X, y_t) should be maximized, where X is a training sample, w is the parameter set of the CNN, and y_t is the target vector of X; it can be obtained from:
where
p(y_t | w, X) = ∫ p(y_t | f, w, X) ∫ p(f | y_dclm, w, X) p(y_dclm | w, X) dy_dclm df  (13)
when the DCLM is known, y*_dclm is its optimal solution, and
p(y*_dclm | w, X) = 1  (14)
based on this, one obtains:
∫ p(f | y_dclm, w, X) p(y_dclm | w, X) dy_dclm = p(f | y*_dclm, w, X)  (15)
likewise, with the training samples X and the parameter set w of the CNN known, y_t the target vector of X, and f_nn the optimal solution of the CNN network, one obtains:
p(y_t | w, X) = p(y_t | f_nn, w, X) p(f_nn | y*_dclm, w, X)  (16)
if the parameter set w of the CNN and the training sample X are given, and the loss function is
the conditional probability distribution function is:
and at the same time:
where Ξ_1 and Ξ_2 are partition functions; by maximizing the likelihood function of p(w | X, y_t), the optimal parameter set w of the CNN can be obtained; assuming that w obeys a Gaussian distribution, one obtains:
where α is a meta-parameter determined by the variance of the selected Gaussian distribution, and converting this equation to a minimization problem yields:
CN202011249200.2A 2020-11-10 2020-11-10 Self-interpretation method for distinguishing part of black box problem of convolutional neural network Active CN112434790B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011249200.2A CN112434790B (en) 2020-11-10 2020-11-10 Self-interpretation method for distinguishing part of black box problem of convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011249200.2A CN112434790B (en) 2020-11-10 2020-11-10 Self-interpretation method for distinguishing part of black box problem of convolutional neural network

Publications (2)

Publication Number Publication Date
CN112434790A CN112434790A (en) 2021-03-02
CN112434790B true CN112434790B (en) 2024-03-29

Family

ID=74701211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011249200.2A Active CN112434790B (en) 2020-11-10 2020-11-10 Self-interpretation method for distinguishing part of black box problem of convolutional neural network

Country Status (1)

Country Link
CN (1) CN112434790B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673627B (en) * 2021-09-02 2024-02-13 哈尔滨工程大学 Automatic commodity classification method and system with interpretation

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609634A (en) * 2017-08-21 2018-01-19 哈尔滨工程大学 A kind of convolutional neural networks training method based on the very fast study of enhancing
CN108073917A (en) * 2018-01-24 2018-05-25 燕山大学 A kind of face identification method based on convolutional neural networks
CN110084215A (en) * 2019-05-05 2019-08-02 上海海事大学 A kind of pedestrian of the twin network model of binaryzation triple recognition methods and system again
WO2019180310A1 (en) * 2018-03-21 2019-09-26 Nokia Technologies Oy A method, an apparatus and a computer program product for an interpretable neural network representation
CN111339935A (en) * 2020-02-25 2020-06-26 西安电子科技大学 Optical remote sensing picture classification method based on interpretable CNN image classification model
WO2020140386A1 (en) * 2019-01-02 2020-07-09 平安科技(深圳)有限公司 Textcnn-based knowledge extraction method and apparatus, and computer device and storage medium
WO2020172838A1 (en) * 2019-02-26 2020-09-03 长沙理工大学 Image classification method for improvement of auxiliary classifier gan

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932480B (en) * 2018-06-08 2022-03-15 电子科技大学 Distributed optical fiber sensing signal feature learning and classifying method based on 1D-CNN

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609634A (en) * 2017-08-21 2018-01-19 哈尔滨工程大学 A kind of convolutional neural networks training method based on the very fast study of enhancing
CN108073917A (en) * 2018-01-24 2018-05-25 燕山大学 A kind of face identification method based on convolutional neural networks
WO2019180310A1 (en) * 2018-03-21 2019-09-26 Nokia Technologies Oy A method, an apparatus and a computer program product for an interpretable neural network representation
WO2020140386A1 (en) * 2019-01-02 2020-07-09 平安科技(深圳)有限公司 Textcnn-based knowledge extraction method and apparatus, and computer device and storage medium
WO2020172838A1 (en) * 2019-02-26 2020-09-03 长沙理工大学 Image classification method for improvement of auxiliary classifier gan
CN110084215A (en) * 2019-05-05 2019-08-02 上海海事大学 A kind of pedestrian of the twin network model of binaryzation triple recognition methods and system again
CN111339935A (en) * 2020-02-25 2020-06-26 西安电子科技大学 Optical remote sensing picture classification method based on interpretable CNN image classification model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Image classification method based on parameter pooling convolutional neural network; Jiang Zetao; Qin Jiaqi; Zhang Shaoqin; Acta Electronica Sinica (No. 09); full text *
Traffic sign recognition based on fusion of CNN and SVM; Wang Xinmei; Ding Ailing; Lei Mengning; Kang Meng; Computer Technology and Development (No. 06); full text *
Object recognition based on multi-layer feature extraction of a convolutional neural network; Jiang Tongtong; Cheng Jinyong; Lu Wenpeng; Computer Systems & Applications (No. 12); full text *
Image set recognition based on deep learning algorithms; Zhang Yubing; Yu Weiwei; Modern Computer (Professional Edition) (No. 21); full text *

Also Published As

Publication number Publication date
CN112434790A (en) 2021-03-02

Similar Documents

Publication Publication Date Title
Hechtlinger et al. A generalization of convolutional neural networks to graph-structured data
CN107122809B (en) Neural network feature learning method based on image self-coding
Chen et al. Unsupervised multi-manifold clustering by learning deep representation
Liang et al. Clustseg: Clustering for universal segmentation
Wang et al. Encoder-X: solving unknown coefficients automatically in polynomial fitting by using an autoencoder
Xie et al. Point clouds learning with attention-based graph convolution networks
Bi et al. A survey on evolutionary computation for computer vision and image analysis: Past, present, and future trends
CN109325993B (en) Saliency feature enhanced sampling method based on class octree index
US20190087964A1 (en) Method and apparatus for parsing and processing three-dimensional cad model
CN107133496B (en) Gene feature extraction method based on manifold learning and closed-loop deep convolution double-network model
Sun et al. Recognition of SAR target based on multilayer auto-encoder and SNN
CN110263855B (en) Method for classifying images by utilizing common-basis capsule projection
Zeng et al. Convolutional Neural Network Based Multi-feature Fusion for Non-rigid 3D Model Retrieval.
CN112434790B (en) Self-interpretation method for distinguishing part of black box problem of convolutional neural network
Kamada et al. An object detection by using adaptive structural learning of deep belief network
Tsai et al. Machine learning based common radiologist-level pneumonia detection on chest X-rays
Alamgir et al. An artificial intelligence driven facial emotion recognition system using hybrid deep belief rain optimization
Chauhan et al. Empirical study on convergence of capsule networks with various hyperparameters
Li et al. FVGNN: A novel GNN to finger vein recognition from limited training data
Bai et al. A unified deep learning model for protein structure prediction
Yu et al. Prototypical network based on Manhattan distance
Lu et al. Image classification and recognition of rice diseases: a hybrid DBN and particle swarm optimization algorithm
Jiu et al. Deep context networks for image annotation
An et al. Logic Rule Guided Attribution with Dynamic Ablation
Yang Image feature extraction algorithm based on random deep neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant