CN115019061A - Entropy optimization method based on deep neural network information entropy estimation - Google Patents

Entropy optimization method based on deep neural network information entropy estimation

Info

Publication number
CN115019061A
CN115019061A
Authority
CN
China
Prior art keywords
entropy
neural network
deep neural
layer
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210924688.7A
Other languages
Chinese (zh)
Inventor
张新钰
张世焱
李骏
杨昊波
杨卓异
吴新刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202210924688.7A
Publication of CN115019061A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses an entropy optimization method based on deep neural network information entropy estimation, comprising the following steps: step 1) modeling the input data and output data of the deep neural network to be optimized based on communication theory to obtain expectations and constraints on information entropy, the deep neural network comprising n network layers, the n-th network layer being the output layer; step 2) establishing a probability model of the training process of the deep neural network according to the structure of each network layer; step 3) calculating the information entropy output by each layer of the deep neural network during training using a K-nearest-neighbor entropy estimation method; and step 4) establishing a loss function on the information entropy according to the expectations and constraints, to guide the training process and optimization direction of the deep neural network. The invention improves the interpretability of the deep neural network training process, makes the training process more transparent, and allows it to be evaluated quantitatively.

Description

Entropy optimization method based on deep neural network information entropy estimation
Technical Field
The invention belongs to the technical field of deep neural network optimization and interpretability, and particularly relates to an entropy optimization method based on deep neural network information entropy estimation.
Background
With the development of artificial intelligence, deep learning algorithms have shown outstanding performance across industries and application scenarios, from the multilayer perceptron (MLP), convolutional neural network (CNN) and recurrent neural network (RNN) to the many structural modifications, improvements and optimizations of recent years. Yet while deep neural networks have proven powerful, they remain poorly interpretable: the network behaves like a black box, and there is still no way to quantitatively observe how it trains, or to quantitatively guide and optimize its training process.
Deep learning algorithms are widely used in the rapidly developing field of autonomous driving, particularly for environment perception by autonomous vehicles. Environment perception today mostly relies on information and data from multiple sensors (such as point cloud data captured by a lidar and RGB image data from a vehicle-mounted camera) to perform tasks such as object detection. After the sensor information is acquired, features must be extracted from data of different types and structures so that subsequent tasks can use those features. Deep neural networks play an important role in this feature extraction. However, whether the network's feature extraction process is reasonable, and whether the extracted features are effective, are difficult questions to answer.
When current deep neural networks perform feature extraction on multi-modal data: (1) because neural networks are poorly interpretable and the feature extraction process is opaque, optimizing the training process is very difficult; (2) feature extraction from multi-modal data is critical for the subsequent feature fusion step, yet the effectiveness and rationality of features extracted by a deep neural network are hard to evaluate quantitatively.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an entropy optimization method based on deep neural network information entropy estimation.
In order to achieve the above object, the present invention provides an entropy optimization method based on deep neural network information entropy estimation, the method including:
step 1) modeling the input data and output data of the deep neural network to be optimized based on communication theory to obtain expectations and constraints on information entropy; the deep neural network comprises n network layers, and the n-th network layer is the output layer;
step 2) establishing a probability model for the training process of the deep neural network according to each layer of network structure of the deep neural network;
step 3) calculating the information entropy output by each layer of the deep neural network during training using a K-nearest-neighbor entropy estimation method;
and 4) establishing a loss function of the information entropy according to the expectation and the constraint of the information entropy, and guiding the training process and the optimization direction of the deep neural network.
As an improvement of the above method, the information entropy expectations and constraints of step 1) include: in each round of training, the entropy of the output layer of the deep neural network decreases; and the information entropy of each trained layer's output equals the information entropy of the deep neural network's input.
As an improvement of the above method, the probabilistic model of step 2) includes:
for a deep neural network with n layers in total, the output of each network layer is taken as a multi-dimensional continuous random variable X; the i-th channel of each layer is taken as a sampling sample x_i of X; the number of pixels d in each channel is the dimension of x_i; and each layer has m sampling samples.
As an improvement of the above method, the K-nearest-neighbor entropy estimation method of step 3) includes:
calculating the sphere neighborhood radius ε_k(x_i) of each sampling sample x_i, where ε_k(x_i) is the Euclidean distance between the d-dimensional sample x_i and its k-th nearest sample point, V_d is the volume of the d-dimensional unit sphere, and ψ(·) is the Digamma function;
calculating a boundary correction term for the entropy estimate, where V(x_i) denotes the volume of the neighborhood sphere of radius ε_k(x_i) around the sample x_i and B(X) denotes the boundary constraint of the random variable X;
and obtaining the information entropy H(X) output by each network layer from these quantities, where ψ(·) is the Digamma function, ψ(1) = -γ, γ is the Euler-Mascheroni constant, and ψ(m) ≈ ln(m-1) (≈ denotes approximately equal to).
As a modification of the above method, the step 3) includes:
step 3-1) traversing the n multi-dimensional continuous random variables X;
step 3-2) for each X, traversing each sampling sample x_i of the network layer concerned and determining the ellipsoid neighborhood of each sample x_i; sorting the radii of the d-dimensional ellipsoid of each sampling sample x_i from large to small to obtain its ellipsoid correction term; and combining the information entropy H(X) output by each network layer to obtain a modified entropy estimate.
As a modification of the above method, the step 3-2) includes:
step 3-2-1) selecting the k sample points closest to x_i, performing PCA (principal component analysis) on the k+1 sample points including x_i itself, i.e. computing the covariance matrix of the d-dimensional random variable from the k+1 sample points and computing the d eigenvectors of that covariance matrix;
step 3-2-2) taking the directions of the d eigenvectors as the axes of a d-dimensional ellipsoid, searching among the selected k+1 samples for the sample point farthest along each eigenvector direction, and taking that point's distance in that direction as the ellipsoid radius on that axis, thereby determining the ellipsoid neighborhood of the sampling sample x_i;
step 3-2-3) sorting the radii of the d-dimensional ellipsoid from large to small as r_1(x_i) ≥ r_2(x_i) ≥ ... ≥ r_d(x_i) and obtaining therefrom the ellipsoid correction term of the sample x_i; and, from the information entropy H(X) of each network layer, obtaining the modified entropy estimate.
As a modification of the above method, the step 4) includes:
designing a loss function Loss_1, where H_0 is the information entropy of the original input data, H_j is the information entropy output by the j-th layer of the deep neural network, and n is the number of network layers;
according to the expectation that the entropy of the deep neural network's output layer decreases in each round of training, when the information entropy output after the q-th round of training is larger than the information entropy output after the (q-1)-th round, adding a loss function Loss_2, where H_q and H_{q-1} are respectively the information entropy output after the q-th and (q-1)-th rounds of training;
and taking Loss_1 and Loss_2 as auxiliary terms which, combined with the cross-entropy loss of the network, form the loss function for training the deep neural network.
Compared with the prior art, the invention has the advantages that:
1. the invention improves the interpretability of the deep neural network training process, makes the training process more transparent, and allows it to be evaluated quantitatively;
2. the invention establishes expectations on the network's information entropy, adds an entropy-based loss term, and better guides the network's gradient descent;
3. the invention adds an information-entropy check on the network training result, better ensuring its validity and rationality;
4. the proposed method and idea are not limited to a single deep learning task or a single neural network, are not restricted to a particular network structure, and can be applied to various deep neural networks.
Drawings
FIG. 1 is a flow chart of an entropy optimization method based on deep neural network information entropy estimation according to the present invention;
FIG. 2 is a neural network probabilistic model of the present invention;
FIG. 3 is a block diagram of the entropy optimizer of the present invention.
Detailed Description
Before describing the embodiments of the present invention, the relevant terms are first explained as follows:
Information entropy: the average amount of information carried by the network output. It is the expectation of the information content of the output; its magnitude measures the amount of information the output carries, and also the average uncertainty and complexity of the output.
Differential entropy: the generalization of information entropy from discrete random variables to continuous random variables.
Source coding: a transformation of source symbols for the purpose of increasing communication efficiency, or of reducing or eliminating source redundancy.
Point cloud: a set of discrete three-dimensional points, obtained by a lidar or similar device, describing the surface profile of an object in space; each point contains (x, y, z) coordinate information.
RGB image: three-channel image data collected by a camera.
Convolutional neural network: a type of feedforward neural network that contains convolution computations and has a deep structure.
The invention aims to segment and interpret the deep neural network layer by layer, quantitatively evaluate the training process and final result, guide the gradient descent process and the selection of network structure hyper-parameters, prevent overfitting, and approach the model's performance boundary.
In the multi-modal data feature extraction scenario, the invention quantitatively explains the neural network training process by computing entropy and uses it to guide the direction of training and optimization. The work comprises two aspects. The first is probabilistic and communication-theoretic modeling of the deep neural network model and of the multi-modal feature extraction process, from which expectations about the model's information entropy are obtained. The second is calculating and estimating the information entropy during network training and, through entropy-related loss functions based on those expectations, using it to guide the direction of neural network training and optimization.
The invention provides a universal entropy optimization method based on deep neural network information entropy estimation, which comprises the following steps:
1. Model the input and output data of the deep neural network based on communication theory to obtain the corresponding expectations and constraints on information entropy.
2. Establish a probability model of the deep neural network's computation and training process according to the structure of each network layer.
3. Calculate the information entropy output by each network layer during training and the information entropy of the final network output.
4. Establish an entropy loss function according to the expectations and constraints on entropy, and use it to guide the training process and optimization direction.
The technical solution of the present invention will be described in detail below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1, embodiment 1 of the present invention provides an entropy optimization method based on deep neural network information entropy estimation, which quantitatively explains the neural network training process by computing entropy, thereby guiding the direction of neural network training and optimization, and applies the method in a multi-modal data feature extraction scenario.
The specific implementation steps are as follows:
to summarize: when the deep neural network is used for feature extraction of multi-modal data, the interpretability of the network is poor, and the feature extraction process is not transparent. And (3) carrying out segmentation and interpretation on the network layer by adopting an entropy optimization method, quantitatively evaluating the process and the final result of feature extraction, guiding the gradient descent process of the network and the selection of the super-parameters of the network structure, preventing overfitting and approaching the performance boundary of the model.
Step 1) obtaining expectation and constraint on information entropy.
A neural network has a strong learning capability: through continuous iteration and parameter adjustment during training, it learns the complex nonlinear relation between the input layer and the desired output, thereby meeting the requirements of different tasks. The training process can therefore be regarded as continuously searching, under constraints, for the connection between the input data and the desired output. Before the network is trained, the "connection" it may learn is uncertain, i.e. its final output is uncertain; as the network is trained on data, the "connection" between inputs and outputs is continuously strengthened and the certainty of the network output keeps increasing.
To quantify and measure this uncertainty of the neural network, the concept of information entropy is introduced to represent the average amount of information in the network output. It is the expectation of the information content of the output; its magnitude measures the amount of information the output carries, and also the average uncertainty and complexity of the output. The final output of an untrained network therefore has the greatest uncertainty, i.e. the greatest information entropy. As training proceeds and the "connection" between input and output is learned, the uncertainty of the final output keeps decreasing, i.e. the information entropy decreases. This yields the first expectation about entropy during neural network training: the entropy of the network's output layer decreases as training proceeds.
In the multi-modal feature extraction task, the network is expected to extract abstract features that form structured data representing all the information of the original input, removing the redundant part of the original data. By analogy with a communication model, the concept of source coding is introduced, and feature extraction from multi-modal data is regarded as a source coding process. To ensure that the network can extract all of the information in the data, the source coding should be lossless, i.e. the original input of the network and the output of the feature extraction should carry the same amount of information. This yields the second expectation of feature extraction on information entropy: the output of the network layer has the same information entropy as the original input.
Step 2) establishing a neural network probability model
As shown in fig. 2, computing the entropy-optimization loss function from the expectations on entropy requires estimating the information entropy of every layer's output in the feature extraction network. Finding a way to compute the entropy of each layer's output requires probabilistic modeling of the neural network, and the modeling should apply to general neural networks. In the multi-modal feature extraction task, the multi-modal inputs are an image and a point cloud: the point cloud is projected into a two-dimensional front view (FV) and bird's eye view (BEV), and the image and the point cloud views are each fed into a convolutional network for feature extraction. The following settings are made for the convolutional neural network:
the channel generated by convolution of the image by the convolution check is regarded as a multi-dimensional continuous random variable X, each channel { X1, X2.., xi } is regarded as a sample (i is the number of channels) of the random variable X, and the number of pixels of each channel is the dimension d of the continuous random variable X. The proposed probabilistic modeling method for the neural network can be applied to other deep neural networks in an extended way, namely, the output of each layer of the network is regarded as a continuous random variable, and the actual output of the layer is taken as the sampling of the continuous random variable. (taking MLP as an example, in an MLP network, a continuous random variable X is the output of each hidden layer, and each neuron sees a sample of the random variable X (i.e., the random variable X is a continuous random variable of 1 dimension).
Step 3) calculating the information entropy of the network layer and the final output
Through this probabilistic modeling of the neural network, the entropy estimation problem for each layer becomes: estimate the entropy from samples without knowing the probability distribution of the continuous random variable X.
There are many ways to solve this problem; essentially it amounts to computing the differential entropy of a continuous random variable. Differential entropy (also known as continuous entropy) is a concept in information theory that originates from Shannon's attempt to extend Shannon entropy to continuous probability distributions. Let X be a random variable with probability density function f whose support is the set 𝒳. The differential entropy h(X) is defined as:
h(X) = -∫_𝒳 f(x) ln f(x) dx
Since the probability distribution of the random variable is not known in advance, its probability density function is unknown; only a finite number of sample values are available. The information entropy is therefore computed with a K-nearest-neighbor entropy estimation method.
The "K-near entropy estimation" method is described below:
A continuous variable is discretized by sampling. So that the m sampled points approximately represent the entire sample space, each sample point is extended into a d-dimensional hyper-sphere whose radius is the distance from that sample point to its nearest sample point. If the variable were distributed completely uniformly over the sample space, the probability of each sample point could be approximated as 1/m. Since the distribution of the random variable over the sample space is unknown and may differ greatly from uniform, it is corrected using the spatial distribution of the samples: where a sample point is closer to its nearest neighbor, the variable tends to be more densely distributed in that region; conversely, where a sample point is farther from its nearest neighbor, the variable tends to be more sparsely distributed there. The density or sparsity of the variable in different regions of the sample space directly affects the probability density near each sample point. The discrete probability of each sample point is estimated as:
p̂(x_i) ≈ 1 / ((m-1) · V_d · r_d(x_i)^d)
where m is the number of samples, r_d(x_i) is the d-dimensional Euclidean distance from sample x_i to its nearest sample point, and V_d is the volume of the unit sphere in d-dimensional space.
The entropy of the random variable X is then estimated as:
Ĥ(X) = (d/m) Σ_{i=1}^{m} ln r_d(x_i) + ln V_d + ln(m-1) + γ
where γ is the Euler-Mascheroni constant, approximately 0.5772.
The K-nearest-neighbor entropy estimation method extends the distance from each sample point to its nearest sample point to the distance to its k-th nearest sample point, and the estimate of the entropy of the random variable X becomes:
Ĥ(X) = -ψ(k) + ψ(m) + ln V_d + (d/m) Σ_{i=1}^{m} ln ε_k(x_i)
where ψ(·) is the Digamma function, ψ(1) = -γ, ψ(m) ≈ ln(m-1), and ε_k(x_i) is the d-dimensional Euclidean distance from the sample x_i to its k-th nearest sample point. It can be shown that when k = 1 this estimate is equivalent to the preceding one.
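For concreteness, the following sketch implements the classical Kozachenko-Leonenko k-nearest-neighbor estimator that the formula above describes, using the stated quantities (m samples, dimension d, unit-ball volume V_d, Digamma function ψ). It is an illustrative baseline, not the patent's corrected estimator, and the Gaussian sanity check is an assumed usage example.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.special import digamma, gammaln

def knn_entropy(samples: np.ndarray, k: int = 3) -> float:
    """Kozachenko-Leonenko k-nearest-neighbor estimate of differential
    entropy (in nats) from an (m, d) array of sampling samples x_i."""
    m, d = samples.shape
    dist = cdist(samples, samples)                 # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)                 # ignore each point's distance to itself
    eps_k = np.sort(dist, axis=1)[:, k - 1]        # distance to the k-th nearest neighbor
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log volume of the unit d-ball
    return float(digamma(m) - digamma(k) + log_vd + d * np.mean(np.log(eps_k)))

# Sanity check against a known closed form (assumed usage, not from the patent):
# a 1-D standard Gaussian has differential entropy 0.5 * ln(2*pi*e) ≈ 1.419 nats.
rng = np.random.default_rng(0)
print(knn_entropy(rng.normal(size=(2000, 1)), k=3))  # typically close to 1.419
```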
K-nearest-neighbor entropy estimation works well when the dimension of the variable is low, but the estimation bias grows as the dimension increases, mainly for two reasons:
1. if the distribution of the random variable is limited by a boundary, the neighborhoods of samples near the boundary may extend beyond it (i.e. the range covered by a sample's neighborhood is larger than the actual range of the variable), so the entropy is overestimated (the sketch after this list illustrates how much of a neighborhood can fall outside the boundary);
2. k-nearest-neighbor estimation assumes the probability distribution is uniform within each sample's neighborhood; as the dimension increases, this assumption may be increasingly violated, producing larger errors.
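To illustrate the first issue only (the patent's actual boundary correction formula is not reproduced in this text), the hedged sketch below estimates by Monte Carlo how much of a sample's sphere neighborhood V(x_i) actually lies inside a box-shaped boundary constraint B(X); the box form, the function name, and the sample counts are assumptions.

```python
import numpy as np

def fraction_inside_box(center, radius, low, high, n_draws=10000, rng=None):
    """Monte Carlo estimate of the fraction of a d-dimensional ball
    (center, radius) lying inside the box [low, high]^d.

    A value below 1 means the sphere neighborhood spills over the boundary
    constraint, which is the situation a boundary correction must account for.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = center.shape[0]
    # Uniform points in the ball: random directions times radii scaled by U^(1/d).
    dirs = rng.normal(size=(n_draws, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    radii = radius * rng.random(n_draws) ** (1.0 / d)
    pts = center + dirs * radii[:, None]
    inside = np.all((pts >= low) & (pts <= high), axis=1)
    return float(inside.mean())
```

For example, fraction_inside_box(np.zeros(2), 1.0, 0.0, 1.0) returns roughly 0.25, since only a quarter of a unit disc centered at a corner of the unit square lies inside it.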
The improvements and corrections for the above deviations are as follows:
1. Let V(x_i) denote the neighborhood sphere volume of the sample x_i, and let B(X) denote the boundary constraint of the random variable X. The entropy estimate is modified accordingly, where d is the dimension of the samples, m is the total number of samples per layer, ε_k(x_i) is the Euclidean distance between the d-dimensional sample x_i and its k-th nearest sample point, V_d is the volume of the d-dimensional unit sphere, ψ(·) is the Digamma function, c(x_i) is the boundary correction term of the entropy estimate, V(x_i) is the volume of the neighborhood sphere of radius ε_k(x_i) around the sample x_i, and H(X) is the information entropy output by each network layer.
2. The sphere neighborhood of conventional k-nearest-neighbor estimation is replaced by an ellipsoid neighborhood for each sample point. Taking the ellipsoid neighborhood of a sample point x_i as an example: first, the k sample points closest to x_i are selected, and PCA is applied to the k+1 sample points including x_i itself, i.e. the covariance matrix of the d-dimensional random variable is computed from the k+1 sample points and its d eigenvectors are computed. The directions of the d eigenvectors are taken as the axes of a d-dimensional ellipsoid; among the selected k+1 samples, the sample point farthest along each eigenvector direction is found, and its distance in that direction is taken as the ellipsoid radius on that axis. This determines the ellipsoid neighborhood of x_i (a sketch of this construction is given at the end of this step). The radii of the d-dimensional ellipsoid, sorted from large to small, are r_1(x_i), r_2(x_i), ..., r_d(x_i), from which the correction term for the sample x_i is obtained.
Adding the correction terms for these two sources of error gives the corrected entropy estimate Ĥ'(X).
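The sketch below follows steps 3-2-1) and 3-2-2) for the ellipsoid-neighborhood construction. How the sorted radii enter the correction term follows the patent's formula, which is not reproduced in this text, so only the geometric part is shown; interpreting "distance in that direction" as the absolute projection measured from x_i is an assumption.

```python
import numpy as np

def ellipsoid_radii(samples: np.ndarray, i: int, k: int = 3) -> np.ndarray:
    """Ellipsoid-neighborhood radii of sample x_i, sorted from large to small.

    PCA over x_i and its k nearest neighbors gives the axes; the largest
    absolute projection of those points (relative to x_i) on each axis
    gives the radius on that axis.
    """
    dist = np.linalg.norm(samples - samples[i], axis=1)
    neighbors = np.argsort(dist)[: k + 1]          # x_i itself plus its k nearest points
    pts = samples[neighbors]
    cov = np.atleast_2d(np.cov(pts, rowvar=False)) # covariance of the k+1 points
    _, eigvecs = np.linalg.eigh(cov)               # d eigenvectors = ellipsoid axes
    rel = pts - samples[i]                         # offsets of the k+1 points from x_i
    radii = np.abs(rel @ eigvecs).max(axis=0)      # largest |projection| on each axis
    return np.sort(radii)[::-1]                    # r_1(x_i) >= ... >= r_d(x_i)
```

Note that if k+1 is smaller than d the covariance matrix is rank-deficient and some radii come out as zero, so in high dimension k has to be chosen large enough for the construction to be meaningful.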
step 4) establishing a loss function of entropy
According to the two expectations and the entropy estimation calculation result of the network layer, a loss function of model training can be designed to optimize the feature extraction process. According to the final output and input information entropy unchanged, design loss function
Figure 142509DEST_PATH_IMAGE064
Figure 346089DEST_PATH_IMAGE065
Wherein
Figure 486083DEST_PATH_IMAGE066
As the information entropy of the original input data,
Figure DEST_PATH_IMAGE067
and n is the number of network layers.
According to the decreasing of the information entropy output by the network in the training process, the judgment of the first time
Figure 846526DEST_PATH_IMAGE068
The entropy of the information output after the secondary training is larger than that of the information output after the secondary training
Figure 529311DEST_PATH_IMAGE069
Then, the loss function is increased
Figure 28426DEST_PATH_IMAGE070
Figure 780350DEST_PATH_IMAGE071
Wherein the content of the first and second substances,
Figure 757533DEST_PATH_IMAGE072
is as followsqThe information entropy output after the secondary training;
Loss_1 and Loss_2 are used as auxiliary terms which, combined with the loss formed by the network's other constraints (such as the cross-entropy loss), constitute the loss function for training the whole neural network.
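Purely as an illustration of how such terms could be assembled, the sketch below combines a cross-entropy task loss with the two entropy-based auxiliary terms. The absolute-difference form of Loss_1, the hinge form of Loss_2, and the weights alpha and beta are assumptions, not the patent's formulas.

```python
import torch
import torch.nn.functional as F

def entropy_auxiliary_losses(h_input, h_layers, h_prev_round):
    """Hedged sketch of the two entropy-based auxiliary terms.

    h_input:      information entropy of the original input data H_0
    h_layers:     per-layer output entropies [H_1, ..., H_n] for this round
    h_prev_round: output-layer entropy after the previous training round
                  (None on the first round)
    """
    # Loss_1: keep each layer's output entropy close to the input entropy
    # (lossless source-coding expectation); absolute difference is an assumption.
    loss1 = sum(abs(h_j - h_input) for h_j in h_layers) / len(h_layers)

    # Loss_2: penalize the output-layer entropy rising between rounds
    # (entropy-decrease expectation); the hinge form is an assumption.
    h_out = h_layers[-1]
    loss2 = max(h_out - h_prev_round, 0.0) if h_prev_round is not None else 0.0
    return loss1, loss2

def total_loss(logits, targets, h_input, h_layers, h_prev_round,
               alpha=0.1, beta=0.1):
    """Cross-entropy task loss plus the entropy auxiliary terms (weights assumed)."""
    loss1, loss2 = entropy_auxiliary_losses(h_input, h_layers, h_prev_round)
    return F.cross_entropy(logits, targets) + alpha * loss1 + beta * loss2
```

For these auxiliary terms to influence gradients in practice, the per-layer entropy estimates would have to be computed from the layer outputs with differentiable operations; the sketch only shows how the terms are combined.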
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes may be made and equivalents substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (7)

1. An entropy optimization method based on deep neural network information entropy estimation is characterized by comprising the following steps:
step 1) modeling the input data and output data of the deep neural network to be optimized based on communication theory to obtain expectations and constraints on information entropy; the deep neural network comprises n network layers, and the n-th network layer is the output layer;
step 2) establishing a probability model for the training process of the deep neural network according to each layer of network structure of the deep neural network;
step 3) calculating the information entropy output by each layer of the deep neural network during training using a K-nearest-neighbor entropy estimation method;
and 4) establishing a loss function of the information entropy according to the expectation and the constraint of the information entropy, and guiding the training process and the optimization direction of the deep neural network.
2. The entropy optimization method based on deep neural network information entropy estimation according to claim 1, wherein the information entropy expectations and constraints of step 1) include: in each round of training, the entropy of the output layer of the deep neural network decreases; and the information entropy of each trained layer's output equals the information entropy of the deep neural network's input.
3. The entropy optimization method based on deep neural network information entropy estimation according to claim 2, wherein the probability model of step 2) comprises: for a deep neural network with n layers in total, taking the output of each network layer as a multi-dimensional continuous random variable X; taking the i-th channel of each layer as a sampling sample x_i of X; the number of pixels d of each channel being the dimension of x_i, and each layer having m sampling samples.
4. The entropy optimization method based on deep neural network information entropy estimation according to claim 3, wherein the K-nearest-neighbor entropy estimation method of step 3) comprises:
calculating the sphere neighborhood radius ε_k(x_i) of each sampling sample x_i, where ε_k(x_i) is the Euclidean distance between the d-dimensional sample x_i and its k-th nearest sample point, V_d is the volume of the d-dimensional unit sphere, and ψ(·) is the Digamma function;
calculating a boundary correction term for the entropy estimate, where V(x_i) denotes the volume of the neighborhood sphere of radius ε_k(x_i) around the sample x_i and B(X) denotes the boundary constraint of the random variable X;
and obtaining the information entropy H(X) output by each network layer from these quantities, where ψ(·) is the Digamma function, ψ(1) = -γ, γ is the Euler-Mascheroni constant, and ψ(m) ≈ ln(m-1) (≈ denotes approximately equal to).
5. The entropy optimization method based on deep neural network information entropy estimation according to claim 4, wherein step 3) comprises:
step 3-1) traversing the n multi-dimensional continuous random variables X;
step 3-2) for each X, traversing each sampling sample x_i of the network layer concerned and determining the ellipsoid neighborhood of each sample x_i; sorting the radii of the d-dimensional ellipsoid of each sampling sample x_i from large to small to obtain its ellipsoid correction term; and combining the information entropy H(X) output by each network layer to obtain a modified entropy estimate.
6. The entropy optimization method based on deep neural network information entropy estimation according to claim 5, wherein step 3-2) comprises:
step 3-2-1) selecting the k sample points closest to x_i, performing PCA (principal component analysis) on the k+1 sample points including x_i itself, i.e. computing the covariance matrix of the d-dimensional random variable from the k+1 sample points and computing the d eigenvectors of that covariance matrix;
step 3-2-2) taking the directions of the d eigenvectors as the axes of a d-dimensional ellipsoid, searching among the selected k+1 samples for the sample point farthest along each eigenvector direction, and taking that point's distance in that direction as the ellipsoid radius on that axis, thereby determining the ellipsoid neighborhood of the sampling sample x_i;
step 3-2-3) sorting the radii of the d-dimensional ellipsoid from large to small as r_1(x_i) ≥ r_2(x_i) ≥ ... ≥ r_d(x_i) and obtaining therefrom the ellipsoid correction term of the sample x_i; and, from the information entropy H(X) of each network layer, obtaining the modified entropy estimate.
7. The entropy optimization method based on deep neural network information entropy estimation according to claim 6, wherein step 4) comprises:
designing a loss function Loss_1, where H_0 is the information entropy of the original input data, H_j is the information entropy output by the j-th layer of the deep neural network, and n is the number of network layers;
according to the expectation that the entropy of the deep neural network's output layer decreases in each round of training, when the information entropy output after the q-th round of training is larger than the information entropy output after the (q-1)-th round, adding a loss function Loss_2, where H_q and H_{q-1} are respectively the information entropy output after the q-th and (q-1)-th rounds of training;
and taking Loss_1 and Loss_2 as auxiliary terms which, combined with the cross-entropy loss of the network, constitute the loss function for training the deep neural network.
CN202210924688.7A (priority date 2022-08-03, filing date 2022-08-03) Entropy optimization method based on deep neural network information entropy estimation, Pending, CN115019061A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210924688.7A CN115019061A (en) 2022-08-03 2022-08-03 Entropy optimization method based on deep neural network information entropy estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210924688.7A CN115019061A (en) 2022-08-03 2022-08-03 Entropy optimization method based on deep neural network information entropy estimation

Publications (1)

Publication Number Publication Date
CN115019061A true CN115019061A (en) 2022-09-06

Family

ID=83065323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210924688.7A Pending CN115019061A (en) 2022-08-03 2022-08-03 Entropy optimization method based on deep neural network information entropy estimation

Country Status (1)

Country Link
CN (1) CN115019061A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022583A (en) * 2016-05-12 2016-10-12 中国电力科学研究院 Electric power communication service risk calculation method and system based on fuzzy decision tree
CN109189921A (en) * 2018-08-07 2019-01-11 阿里巴巴集团控股有限公司 Comment on the training method and device of assessment models
CN110320806A (en) * 2019-07-24 2019-10-11 东北大学 Sewage disposal process adaptive prediction control method based on integrated instant learning
CN110531313A (en) * 2019-08-30 2019-12-03 西安交通大学 A kind of near-field signals source localization method based on deep neural network regression model
CN110690912A (en) * 2019-10-10 2020-01-14 宾斌 Single-beam device, near-field communication of self-organizing computing network and construction method
CN110929802A (en) * 2019-12-03 2020-03-27 北京迈格威科技有限公司 Information entropy-based subdivision identification model training and image identification method and device
CN112364975A (en) * 2020-10-14 2021-02-12 山东大学 Terminal operation state prediction method and system based on graph neural network
CN113011722A (en) * 2021-03-04 2021-06-22 中国工商银行股份有限公司 System resource data allocation method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
黎旭 (Li Xu): "Research on surrogate model techniques and their application in aircraft reliability optimization", China Doctoral Dissertations Full-text Database (Engineering Science and Technology II) *

Similar Documents

Publication Publication Date Title
US11481585B2 (en) Segmentation of data
CN108388896B (en) License plate identification method based on dynamic time sequence convolution neural network
CN111583263B (en) Point cloud segmentation method based on joint dynamic graph convolution
CN109800692B (en) Visual SLAM loop detection method based on pre-training convolutional neural network
CN112488025B (en) Double-temporal remote sensing image semantic change detection method based on multi-modal feature fusion
CN110880010A (en) Visual SLAM closed loop detection algorithm based on convolutional neural network
CN112052802A (en) Front vehicle behavior identification method based on machine vision
CN112084895B (en) Pedestrian re-identification method based on deep learning
CN112926696A (en) Interpretable local migration mutual learning method based on attention diagram
CN114419413A (en) Method for constructing sensing field self-adaptive transformer substation insulator defect detection neural network
CN113962281A (en) Unmanned aerial vehicle target tracking method based on Siamese-RFB
Hu et al. A video streaming vehicle detection algorithm based on YOLOv4
CN114283325A (en) Underwater target identification method based on knowledge distillation
CN116152554A (en) Knowledge-guided small sample image recognition system
CN115048870A (en) Target track identification method based on residual error network and attention mechanism
CN111325259A (en) Remote sensing image classification method based on deep learning and binary coding
Chen et al. A finger vein recognition algorithm based on deep learning
CN111950635A (en) Robust feature learning method based on hierarchical feature alignment
CN111578956A (en) Visual SLAM positioning method based on deep learning
CN115019061A (en) Entropy optimization method based on deep neural network information entropy estimation
CN116109649A (en) 3D point cloud instance segmentation method based on semantic error correction
CN115578574A (en) Three-dimensional point cloud completion method based on deep learning and topology perception
CN113222867B (en) Image data enhancement method and system based on multi-template image
CN113469133A (en) Deep learning-based lane line detection method
Ahuja et al. Convolutional Neural Network and Kernel Extreme Learning Machine for Face Recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20220906)