CN115019061A - Entropy optimization method based on deep neural network information entropy estimation - Google Patents
- Publication number
- CN115019061A (application CN202210924688.7A)
- Authority
- CN
- China
- Prior art keywords
- entropy
- neural network
- deep neural
- layer
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses an entropy optimization method based on deep neural network information entropy estimation, which comprises the following steps: step 1) modeling the input data and output data of the deep neural network to be optimized on the basis of communication theory to obtain expectations and constraints on information entropy, the deep neural network comprising n network layers, of which the n-th layer is the output layer; step 2) establishing a probability model of the training process of the deep neural network according to the structure of each network layer; step 3) calculating, with a k-nearest-neighbor entropy estimation method, the information entropy output by each layer of the deep neural network during training; and step 4) establishing an information-entropy loss function from the entropy expectations and constraints, and using it to guide the training process and optimization direction of the deep neural network. The invention improves the interpretability of the deep-neural-network training process, makes training more transparent, and enables quantitative evaluation.
Description
Technical Field
The invention belongs to the technical field of deep neural network optimization and interpretability, and particularly relates to an entropy optimization method based on deep neural network information entropy estimation.
Background
With the development of artificial intelligence, deep-learning neural-network algorithms in machine learning, from the multilayer perceptron (MLP), the convolutional neural network (CNN) and the recurrent neural network (RNN) to the many recent modifications, improvements and optimizations of network structure, have shown outstanding performance in different application scenarios across many industries. Yet while deep neural networks have been proved to be powerful, they have always suffered from poor interpretability: a neural network resembles a black box, and there is a lack of quantitative observation of how a deep neural network trains, and further a lack of quantitative guidance and optimization of its training process.
Deep-learning algorithms have found widespread use in the field of autonomous driving, which has developed rapidly in recent years, particularly in the environment-perception tasks of autonomous vehicles. At present, environment perception for autonomous vehicles mostly uses the information and data of multiple sensors (such as point-cloud data captured by a lidar and RGB image data obtained by a vehicle-mounted camera) to perform tasks such as target detection. After the sensor information is acquired, features must be extracted from data of different types and structures so that subsequent tasks can use those features. In this feature-extraction process, the deep neural network plays an important role. However, whether the process by which the deep neural network extracts features is reasonable, and whether the extracted features are effective, are questions that are difficult to answer.
When current deep neural networks perform feature extraction on multi-modal data: (1) because the interpretability of the neural network is poor and the feature-extraction process is opaque, optimizing the training process of the neural network is very difficult; (2) feature extraction from multi-modal data is important for the subsequent feature-fusion step, yet it is difficult to quantitatively evaluate the effectiveness and rationality of the features extracted by the deep neural network.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an entropy optimization method based on deep neural network information entropy estimation.
In order to achieve the above object, the present invention provides an entropy optimization method based on deep neural network information entropy estimation, the method including:
step 1) modeling the input data and output data of the deep neural network to be optimized based on communication theory to obtain expectations and constraints on information entropy; the deep neural network comprises n network layers, of which the n-th layer is the output layer;
step 2) establishing a probability model for the training process of the deep neural network according to each layer of network structure of the deep neural network;
step 3) calculating, by a k-nearest-neighbor entropy estimation method, the information entropy output by each layer of the deep neural network in the training process;
and 4) establishing a loss function of the information entropy according to the expectation and the constraint of the information entropy, and guiding the training process and the optimization direction of the deep neural network.
As an improvement of the above method, the information entropy expectations and constraints of step 1) include: in each round of training, the entropy value of the output layer of the deep neural network decreases; and the information entropy output by each trained network layer is the same as the information entropy of the input to the deep neural network.
As an improvement of the above method, the probabilistic model of step 2) includes:
for a deep neural network with n layers in total, the output of each network layer is taken as a multi-dimensional continuous random variable X, and the i-th channel of each layer is taken as a sample x_i of the random variable X; the number d of pixel points in each channel is the dimension d of x_i, and each layer samples the same number m of samples.
As an improvement of the above method, the k-nearest-neighbor entropy estimation method of step 3) includes:
calculating the sphere-neighborhood radius ε_k(x_i) of each sampled sample x_i, wherein ε_k(x_i) is the Euclidean distance between the d-dimensional sample x_i and its k-th nearest sample point, V_d = π^(d/2) / Γ(d/2 + 1) is the volume of the d-dimensional unit sphere, and Γ(·) is the Gamma function;

wherein V_d · ε_k(x_i)^d denotes the volume of the neighborhood sphere of radius ε_k(x_i) around the sample x_i, and the support of X represents the boundary constraint of the random variable X;

obtaining the information entropy Ĥ(X) output by each network layer according to the following formula:

Ĥ(X) ≈ ψ(m) − ψ(k) + log V_d + (d/m) · Σ_{i=1}^{m} log ε_k(x_i)

wherein ψ(·) is the Digamma function, ψ(1) = −γ, γ is the Euler-Mascheroni constant, ψ(m) ≈ log(m − 1), and ≈ means approximately equal.
As a modification of the above method, the step 3) includes:

step 3-2) for each network layer, traversing each sampled sample x_i and determining the ellipsoid neighborhood of each sample x_i; sorting the radii of the d-dimensional ellipsoid of each sample x_i from large to small to obtain the correction term ξ(x_i) of the sample x_i; and combining the information entropy Ĥ(X) output by each network layer to obtain a corrected entropy estimate Ĥ′(X).
As a modification of the above method, the step 3-2) includes:

step 3-2-1) selecting the k sample points nearest to x_i, and performing principal component analysis (PCA) on the k+1 sample points including x_i: calculating the covariance matrix of the d-dimensional random variable from the k+1 sample points, and computing the d eigenvectors of the covariance matrix;

step 3-2-2) taking the directions of the d eigenvectors as the axes of a d-dimensional ellipsoid; among the selected k+1 samples, searching along each eigenvector direction for the farthest sample point, and taking its distance in that direction as the ellipsoid radius on that axis, thereby determining the ellipsoid neighborhood of the sample x_i;
step 3-2-3) sorting the radii of the d-dimensional ellipsoid from large to small as r_1(x_i) ≥ r_2(x_i) ≥ … ≥ r_d(x_i), so that the correction term of the sample x_i is:

ξ(x_i) = log( (r_1(x_i) · r_2(x_i) · … · r_d(x_i)) / r_1(x_i)^d )

and, according to the information entropy Ĥ(X) of each network layer, obtaining the corrected entropy estimate:

Ĥ′(X) = Ĥ(X) + (1/m) · Σ_{i=1}^{m} ξ(x_i)
as a modification of the above method, the step 4) includes:

establishing, according to the constraint that the final output and the input have the same information entropy, the loss function:

Loss_1 = Σ_{j=1}^{n} | Ĥ_j − H_0 |

wherein H_0 is the information entropy of the original input data, Ĥ_j is the information entropy output by the j-th layer of the deep neural network, and n is the number of network layers;

according to the expectation that the entropy value of the output layer of the deep neural network decreases in each round of training, when it is judged that the information entropy output after the q-th round of training is larger than the information entropy output after the (q−1)-th round of training, increasing the loss function:

Loss_2 = max( 0, Ĥ^(q) − Ĥ^(q−1) )

wherein Ĥ^(q) and Ĥ^(q−1) are respectively the information entropy output after the q-th and (q−1)-th rounds of training;

taking Loss_1 and Loss_2 as auxiliary terms, combined with the cross-entropy loss of the network, to form the loss function of deep-neural-network training.
Compared with the prior art, the invention has the advantages that:
1. the invention improves the interpretability of the deep neural network training process, makes the training process more transparent and can carry out quantitative evaluation;
2. the invention establishes expectations on the information entropy of the neural network and adds an entropy-based network loss function, better guiding the gradient-descent process of the network;

3. the invention adds information-entropy verification of the network training result, better ensuring the validity and rationality of the training result;
4. the method and the thought provided by the invention are not limited to the optimization of a single deep learning task or a single neural network, are not restricted by different neural network structures, and can be applied to various deep neural networks.
Drawings
FIG. 1 is a flow chart of an entropy optimization method based on deep neural network information entropy estimation according to the present invention;
FIG. 2 is a neural network probabilistic model of the present invention;
FIG. 3 is a block diagram of the entropy optimizer of the present invention.
Detailed Description
Before describing the embodiments of the present invention, the related terms related to the embodiments of the present invention are first explained as follows:
information entropy: the magnitude of the average amount of information used to represent the network output is an expectation of the amount of information the network output has, a measure of the amount of information the network output has, and also a measure of the average uncertainty and complexity of the output.
Differential entropy: the generalization of information entropy from discrete random variables to continuous random variables, obtained by computing the entropy of a continuous random variable.
Source coding: a transformation of source symbols whose purpose is to increase communication efficiency by reducing or eliminating the redundancy of the source.
Point cloud: a series of discrete three-dimensional points, obtained by a lidar or similar device from the surface contour of an object in space, each containing (x, y, z) coordinate information.
RGB image: the image data collected by the camera is a three-channel image.
A convolutional neural network: a class of feedforward neural networks that performs convolution computations and has a deep structure.
The invention aims to segment and interpret the network layers of a deep neural network, quantitatively evaluate the network training process and its final result, guide the gradient-descent process of the network and the selection of network-structure hyper-parameters, prevent overfitting, and approach the model's performance boundary.
The invention quantitatively explains the training process of a neural network by computing entropy in the scenario of multi-modal data feature extraction, and uses this to guide the direction of network training and optimization. The work comprises the following two aspects: one aspect is probabilistic and communication-theoretic modeling of the deep-neural-network model and of the multi-modal feature-extraction process, obtaining expectations about the model's information entropy. The other aspect is calculating and estimating the information entropy during network training and, based on an entropy-related loss function derived from the above expectations, using it as a guide for the direction of neural-network training and optimization.
The invention provides a universal entropy optimization method based on deep neural network information entropy estimation, which comprises the following steps:

1. modeling the input and output data of the deep neural network based on communication theory to obtain the corresponding expectations and constraints on information entropy;

2. establishing a probability model of the computation and training process of the deep neural network according to the structure of each network layer;

3. calculating the information entropy output by each network layer during training and the information entropy finally output by the network;

4. establishing an entropy loss function according to the entropy expectations and constraints, and guiding the process of network training and the direction of optimization.
The technical solution of the present invention will be described in detail below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1, embodiment 1 of the present invention provides an entropy optimization method based on deep neural network information entropy estimation, which utilizes a way of calculating entropy to quantitatively explain a neural network training process, so as to guide the direction of neural network training and optimization, and apply the method in a multi-modal data feature extraction scenario.
The specific implementation steps are as follows:
To summarize: when a deep neural network is used for feature extraction from multi-modal data, the interpretability of the network is poor and the feature-extraction process is not transparent. An entropy optimization method is adopted to segment and interpret the network layers, quantitatively evaluate the feature-extraction process and its final result, guide the gradient-descent process of the network and the selection of network-structure hyper-parameters, prevent overfitting, and approach the performance boundary of the model.
Step 1) obtaining expectation and constraint on information entropy.
A neural network has a strong learning capability: through continuous iteration and adjustment of its parameters during training, it learns the complex nonlinear relation between the input layer and the expected output, thereby meeting the requirements of different tasks. The training process of a neural network can therefore be regarded as continuously searching, under the given constraints, for the connection between the input data and the expected output, thereby achieving the learning effect. Before the neural network is trained, the "connection" it can learn is uncertain, i.e. the final output is uncertain; as the network is trained with data, the "connection" it learns between inputs and outputs is continuously strengthened, and the certainty of the network output continuously improves.
To quantify and measure this uncertainty of a neural network, the concept of information entropy is introduced to represent the average amount of information in the network output. It is the expectation of the amount of information the output carries; its magnitude is a measure of that amount of information, and also of the average uncertainty and complexity of the output. The final output of the neural network is therefore most uncertain before training, i.e. its information entropy is largest. As the "connection" between input and output is continuously learned during training, the uncertainty of the final output continuously decreases, i.e. the information entropy decreases. This yields one of the expectations about entropy in the neural-network training process: the entropy value of the output layer decreases as training proceeds.
In the task of multi-modal information feature extraction, it is expected that abstract features can be extracted, structured data is formed to represent all information of original input, and the effect of removing redundant parts in original data is achieved. With reference to a communication model, the concept of source coding is introduced into the task, and the feature extraction of the multi-modal data is regarded as a source coding process. To ensure that the network can extract all the information in the data completely, the source coding should be lossless coding without losing information amount, i.e. the original input of the network and the output of the feature extraction should have the same information amount. Thus, another expectation of feature extraction on information entropy is derived: the output of the network layer is the same as the information entropy of the original input.
Step 2) establishing a neural network probability model
As shown in fig. 2, calculating the entropy-optimized loss function from the entropy expectations requires estimating the information entropy of each layer's output in the feature-extraction network. A method of computing entropy for each layer's output therefore calls for probabilistic modeling of the neural network, and the modeling should apply to general neural networks. In the multi-modal information feature-extraction task, the multi-modal input consists of an image and a point cloud: the point cloud is projected into a two-dimensional front view (FV) and bird's-eye view (BEV), and the image and the point-cloud views are each input into a convolutional network for feature extraction. The following settings are now made for the convolutional neural network:
the channels generated by convolving the image with the convolution kernels are regarded as samples of a multi-dimensional continuous random variable X: each channel in {x_1, x_2, ..., x_i} is regarded as a sample of the random variable X (i being the number of channels), and the number of pixels in each channel is the dimension d of the continuous random variable X. The proposed probabilistic modeling method can be extended to other deep neural networks: the output of each network layer is regarded as a continuous random variable, and the layer's actual outputs are taken as samples of that continuous random variable. (Taking the MLP as an example: in an MLP network, the continuous random variable X is the output of each hidden layer, and each neuron is regarded as one sample of the random variable X, that is, X is a 1-dimensional continuous random variable.)
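As a concrete illustration of this convention, a minimal sketch follows; the function name and the (channels, H, W) memory layout are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def layer_as_samples(feature_map):
    """Treat a conv layer's output (channels, H, W) as samples of X.

    Each of the c channels is one sample x_i of the continuous random
    variable X; the H*W pixels of a channel are the d dimensions of X.
    """
    c, h, w = feature_map.shape
    return feature_map.reshape(c, h * w)  # shape: (samples, d)
```

With an 8-channel 4x4 feature map, this yields 8 samples of a 16-dimensional random variable.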
Step 3) calculating the information entropy of the network layer and the final output
Through this probabilistic modeling of the neural network, the problem of estimating the entropy of each layer is converted into: estimating the entropy from samples without knowing the probability distribution of the continuous random variable X.
There are many ways to solve the above problem, which is essentially calculating the differential entropy of a continuous random variable. Differential entropy (also known as continuous entropy) is a concept in information theory that originates from Shannon's attempt to extend his notion of Shannon entropy to continuous probability distributions. Let X be a random variable whose probability density function f has support set S. The differential entropy h(X) is defined as:

h(X) = − ∫_S f(x) log f(x) dx
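As a quick sanity check of the definition, the Gaussian density gives a well-known closed form (a standard information-theory result, not specific to the patent):

```python
import numpy as np

def gaussian_diff_entropy(sigma):
    """Closed-form differential entropy (in nats) of X ~ N(mu, sigma^2).

    Follows from h(X) = -integral f(x) log f(x) dx for the Gaussian density:
    h(X) = 0.5 * log(2 * pi * e * sigma^2).
    """
    return 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)
```

Doubling sigma raises the entropy by log 2, reflecting the wider, more uncertain distribution.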
Since, in this problem, the probability distribution of the random variable is not known in advance, the probability density function is unknown and only a finite number of sample values are drawn from it. The information entropy is therefore calculated with a k-nearest-neighbor entropy estimation method.
The k-nearest-neighbor entropy estimation method is described below:
A continuous variable is discretized by sampling. So that the m sampled points approximately represent the entire sample space, each sample point is extended into a d-dimensional hypersphere whose radius is the distance from that sample point to its nearest sample point. If the variable were distributed completely uniformly over the sample space, the probability of each sample point could be approximated as 1/m. Since the distribution of the random variable over the sample space is unknown, and may differ greatly from the uniform distribution, it is corrected using the positions of the samples in space: when a sample point is closer to its nearest sample point, the distribution of the variable around that region tends to be denser; conversely, when a sample point is farther from its nearest sample point, the distribution of the variable around that region tends to be sparser. The density or sparsity of the variable in different regions of the sample space directly affects the probability density near each sample point. The discrete probability of each sample point is estimated as:

p̂(x_i) ≈ 1 / ( (m − 1) · V_d · r_d(x_i)^d )

wherein m is the number of samples, r_d(x_i) is the d-dimensional Euclidean distance from sample x_i to its nearest sample point, and V_d is the unit-sphere volume in d-dimensional space.
The estimate of the entropy of the random variable X is then:

Ĥ(X) = −(1/m) · Σ_{i=1}^{m} log p̂(x_i) = log(m − 1) + log V_d + (d/m) · Σ_{i=1}^{m} log r_d(x_i)
The k-nearest-neighbor entropy estimation method extends the distance from each sample point to its nearest sample point to the distance to its k-th nearest sample point; the estimate of the entropy of the random variable X then becomes:

Ĥ_k(X) ≈ ψ(m) − ψ(k) + log V_d + (d/m) · Σ_{i=1}^{m} log ε_k(x_i)

where ψ(·) is the Digamma function, ψ(1) = −γ, ψ(m) ≈ log(m − 1), and ≈ denotes approximately equal; ε_k(x_i) is the d-dimensional Euclidean distance between the sample x_i and its k-th nearest sample point. It can be shown that when k = 1, Ĥ_k(X) and Ĥ(X) are equivalent.
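A minimal numpy sketch of such a k-nearest-neighbor (Kozachenko-Leonenko-style) estimator; the brute-force neighbor search and the function names are illustrative assumptions:

```python
import math
import numpy as np

def _digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -0.5772156649015329 + sum(1.0 / j for j in range(1, n))

def knn_entropy(samples, k=3):
    """k-nearest-neighbor differential-entropy estimate (nats) from (m, d) samples."""
    m, d = samples.shape
    # brute-force pairwise Euclidean distances (O(m^2), fine for a sketch)
    diff = samples[:, None, :] - samples[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)          # exclude the point itself
    eps = np.sort(dist, axis=1)[:, k - 1]   # distance to k-th nearest neighbor
    # log volume of the d-dimensional unit ball: pi^(d/2) / Gamma(d/2 + 1)
    log_vd = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    return _digamma_int(m) - _digamma_int(k) + log_vd + (d / m) * np.log(eps).sum()
```

On samples from a known distribution, the estimate can be compared against the closed-form entropy; for a 1-D standard Gaussian the target is 0.5·log(2πe) ≈ 1.419 nats.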
The k-nearest-neighbor entropy estimation gives a good estimate of the entropy when the dimension of the variable is low, but the deviation of the estimate grows as the dimension increases. The reasons are mainly as follows:
1. if the distribution of the random variable is limited by the boundary, the neighborhood of the sample near the boundary may exceed the boundary (i.e., the estimated range of the sample is larger than the distribution range of the actual variable), resulting in the entropy being overestimated.
2. k-nearest estimates assume a uniform distribution for the probability distribution in the neighborhood of the sample, which may be more distorted as the dimension of the sample increases, thereby producing larger errors.
The improvements and corrections for the above deviations are as follows:
1. Let V(x_i) = V_d · ε_k(x_i)^d denote the neighborhood-sphere volume of the sample x_i, and let V′(x_i) denote the volume of the intersection of that sphere with the support of X, which represents the boundary constraint of the random variable X. The boundary correction term of the entropy estimation is ξ_1(x_i) = log( V′(x_i) / V(x_i) ), and the modified entropy estimate is:

Ĥ_B(X) = Ĥ_k(X) + (1/m) · Σ_{i=1}^{m} ξ_1(x_i)

wherein d is the dimension of the samples, m is the total number of sampled samples per layer, ε_k(x_i) is the Euclidean distance between the d-dimensional sample x_i and its k-th nearest sample point, V_d = π^(d/2) / Γ(d/2 + 1) is the volume of the d-dimensional unit sphere, Γ(·) is the Gamma function, and Ĥ_k(X) is the information entropy output by each network layer.
2. The sphere neighborhood of the traditional k-nearest-neighbor estimation is changed into an ellipsoid neighborhood for each sample point. Taking the ellipsoid neighborhood of the sample point x_i as an example: first, the k sample points nearest to x_i are selected, and PCA is performed on the k+1 sample points including x_i. That is, the covariance matrix of the d-dimensional random variable is calculated from the k+1 sample points, and the d eigenvectors of the covariance matrix are computed. The directions of the d eigenvectors are taken as the axes of a d-dimensional ellipsoid; among the selected k+1 samples, the farthest sample point along each eigenvector direction is found, and its distance in that direction is taken as the ellipsoid radius on that axis. This determines the ellipsoid neighborhood of x_i. The radii of the d-dimensional ellipsoid, sorted from large to small, are denoted r_1(x_i) ≥ r_2(x_i) ≥ ... ≥ r_d(x_i). The correction term of the sample x_i is:

ξ_2(x_i) = log( (r_1(x_i) · r_2(x_i) · ... · r_d(x_i)) / r_1(x_i)^d )
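The PCA-based ellipsoid neighborhood can be sketched as follows (a hypothetical helper; it computes only the sorted axis radii, under the assumption that distances are measured from the sample point x_i along each eigenvector direction):

```python
import numpy as np

def ellipsoid_radii(samples, i, k):
    """Sorted (large -> small) axis radii of the PCA ellipsoid around sample i."""
    x = samples[i]
    dist = np.linalg.norm(samples - x, axis=1)
    dist[i] = np.inf                            # exclude the point itself
    nbrs = np.argsort(dist)[:k]                 # k nearest sample points
    pts = np.vstack([samples[nbrs], x[None]])   # k+1 points including x_i
    # covariance of the k+1 points; its eigenvectors are the ellipsoid axes
    cov = np.atleast_2d(np.cov(pts - pts.mean(axis=0), rowvar=False))
    _, vecs = np.linalg.eigh(cov)
    proj = np.abs((pts - x) @ vecs)             # |offset from x_i| along each axis
    radii = proj.max(axis=0)                    # farthest point per axis
    return np.sort(radii)[::-1]
```

For points spread along one axis much more than the other, the two radii separate accordingly.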
Adding the correction terms for the two errors, where ξ_1(x_i) is the boundary correction of item 1 and ξ_2(x_i) is the ellipsoid correction of item 2, gives the corrected entropy estimate:

Ĥ′(X) = Ĥ_k(X) + (1/m) · Σ_{i=1}^{m} ( ξ_1(x_i) + ξ_2(x_i) )
step 4) establishing a loss function of entropy
According to the two expectations and the entropy estimation calculation result of the network layer, a loss function of model training can be designed to optimize the feature extraction process. According to the final output and input information entropy unchanged, design loss function:
According to the decreasing of the information entropy output by the network in the training process, the judgment of the first timeThe entropy of the information output after the secondary training is larger than that of the information output after the secondary trainingThen, the loss function is increased:
wherein Ĥ_q is the information entropy output after the q-th training round;
The two loss terms above serve as auxiliary terms and, combined with the loss functions formed by the network's other constraints (such as the cross-entropy loss), constitute the loss function for training the whole neural network.
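Since the formula images for the two loss terms are not reproduced in this text, the sketch below assumes plausible forms: an absolute-difference penalty for entropy preservation across layers (matching the per-layer entropies Ĥ_j and input entropy Ĥ_0 referenced in the claims) and a hinge penalty on the output-layer entropy rising between rounds. Both forms and the function name are assumptions, not the patent's exact formulas:

```python
def entropy_losses(h_layers, h_input, h_prev_round):
    """Auxiliary entropy loss terms (assumed forms, see lead-in).

    h_layers:     information entropy output by each of the n layers this round
    h_input:      information entropy of the original input data
    h_prev_round: output-layer entropy after the previous training round
    """
    # Term 1: each layer's output entropy should match the input entropy
    l1 = sum(abs(h_j - h_input) for h_j in h_layers)
    # Term 2: penalize the output-layer entropy increasing between rounds (hinge)
    l2 = max(0.0, h_layers[-1] - h_prev_round)
    return l1, l2
```

Both terms would then be weighted and added to the network's task loss (e.g. cross-entropy) each round.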
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes may be made and equivalents substituted without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (7)
1. An entropy optimization method based on deep neural network information entropy estimation is characterized by comprising the following steps:
step 1) modeling the input data and output data of the deep neural network to be optimized based on communication theory to obtain expectations and constraints on the information entropy; the deep neural network comprises n network layers, the n-th layer being the output layer;
step 2) establishing a probability model for the training process of the deep neural network according to each layer of network structure of the deep neural network;
step 3) calculating the information entropy output by each layer of the deep neural network in the training process by adopting a k-nearest-neighbor entropy estimation method;
and step 4) establishing a loss function of the information entropy according to the expectations and constraints on the information entropy, so as to guide the training process and optimization direction of the deep neural network.
2. The entropy optimization method based on deep neural network information entropy estimation according to claim 1, wherein the information entropy expectations and constraints of step 1) comprise: in each round of training, the entropy value of the output layer of the deep neural network decreases; and after training, the information entropy output by each network layer is the same as the information entropy input to the deep neural network.
3. An entropy optimization method based on deep neural network information entropy estimation according to claim 2, wherein the probability model of the step 2) comprises:
for a deep neural network with n layers in total, the output of each network layer is taken as a multi-dimensional continuous random variable X, and the i-th channel of each layer is taken as a multi-dimensional continuous random variable x_i; the number of pixels of each channel is the dimension d of x_i, and each layer has m sampled samples.
4. The entropy optimization method based on deep neural network information entropy estimation according to claim 3, wherein the k-nearest-neighbor entropy estimation method of step 3) comprises:
calculating the sphere neighborhood radius of the sampling sample xi according to the following formula:
wherein ε(x_i) is the Euclidean distance between the d-dimensional sample x_i and its k-th nearest sample point, V_d is the volume of the d-dimensional unit sphere, and Γ(·) is the Gamma function;
wherein V(x_i) denotes the volume of the neighborhood sphere of radius ε(x_i) around the sample x_i, and the second term represents the boundary constraint of the random variable;
the information entropy Ĥ output by each layer of the network is obtained according to the following formula:
5. An entropy optimization method based on deep neural network information entropy estimation according to claim 4, wherein the step 3) comprises:
Step 3-2) for each network layer, traversing each sampled sample x_i and determining the ellipsoid neighborhood of each sample x_i; sorting the radii of the d-dimensional ellipsoid of the sample x_i from large to small to obtain the correction term c(x_i) of the sample x_i, and combining it with the information entropy Ĥ output by each network layer to obtain the corrected entropy estimate Ĥ_c.
6. An entropy optimization method based on deep neural network information entropy estimation according to claim 5, wherein the step 3-2) comprises:
step 3-2-1) selecting the k sample points closest to x_i, performing PCA (principal component analysis) on the k+1 sample points including x_i, namely calculating the covariance matrix of the d-dimensional random variable by using the k+1 sample points, and calculating the d eigenvectors of the covariance matrix;
step 3-2-2) taking the directions of the d eigenvectors as the axes of the d-dimensional ellipsoid, searching, among the selected k+1 samples, the farthest sample point along the direction of each eigenvector, and taking that sample point's distance along the direction as the radius of the ellipsoid on that axis, thereby determining the ellipsoid neighborhood of the sample x_i;
step 3-2-3) sorting the radii of the d-dimensional ellipsoid from large to small, so that the correction term c(x_i) of the sample x_i is:
combining with the information entropy Ĥ of each network layer, the corrected entropy estimate Ĥ_c is obtained as:
7. an entropy optimization method based on deep neural network information entropy estimation according to claim 6, wherein the step 4) comprises:
wherein Ĥ_0 is the information entropy of the original input data, Ĥ_j is the information entropy output by the j-th layer of the deep neural network, and n is the number of network layers;
according to the constraint that the entropy value of the output layer of the deep neural network decreases in each round of training, if the information entropy output after the q-th training round is judged to be larger than the information entropy output after the (q-1)-th round, the following loss term is added:
wherein Ĥ_q and Ĥ_{q-1} are the information entropy output after the q-th and (q-1)-th training rounds, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210924688.7A CN115019061A (en) | 2022-08-03 | 2022-08-03 | Entropy optimization method based on deep neural network information entropy estimation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115019061A true CN115019061A (en) | 2022-09-06 |
Family
ID=83065323
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210924688.7A Pending CN115019061A (en) | 2022-08-03 | 2022-08-03 | Entropy optimization method based on deep neural network information entropy estimation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115019061A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106022583A (en) * | 2016-05-12 | 2016-10-12 | 中国电力科学研究院 | Electric power communication service risk calculation method and system based on fuzzy decision tree |
CN109189921A (en) * | 2018-08-07 | 2019-01-11 | 阿里巴巴集团控股有限公司 | Comment on the training method and device of assessment models |
CN110320806A (en) * | 2019-07-24 | 2019-10-11 | 东北大学 | Sewage disposal process adaptive prediction control method based on integrated instant learning |
CN110531313A (en) * | 2019-08-30 | 2019-12-03 | 西安交通大学 | A kind of near-field signals source localization method based on deep neural network regression model |
CN110690912A (en) * | 2019-10-10 | 2020-01-14 | 宾斌 | Single-beam device, near-field communication of self-organizing computing network and construction method |
CN110929802A (en) * | 2019-12-03 | 2020-03-27 | 北京迈格威科技有限公司 | Information entropy-based subdivision identification model training and image identification method and device |
CN112364975A (en) * | 2020-10-14 | 2021-02-12 | 山东大学 | Terminal operation state prediction method and system based on graph neural network |
CN113011722A (en) * | 2021-03-04 | 2021-06-22 | 中国工商银行股份有限公司 | System resource data allocation method and device |
Non-Patent Citations (1)
Title |
---|
Li Xu (黎旭): "Research on Surrogate Model Technology and Its Application in Aircraft Reliability Optimization", China Doctoral Dissertations Full-text Database (Engineering Science and Technology II) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11481585B2 (en) | Segmentation of data | |
CN108388896B (en) | License plate identification method based on dynamic time sequence convolution neural network | |
CN111583263B (en) | Point cloud segmentation method based on joint dynamic graph convolution | |
CN109800692B (en) | Visual SLAM loop detection method based on pre-training convolutional neural network | |
CN112488025B (en) | Double-temporal remote sensing image semantic change detection method based on multi-modal feature fusion | |
CN110880010A (en) | Visual SLAM closed loop detection algorithm based on convolutional neural network | |
CN112052802A (en) | Front vehicle behavior identification method based on machine vision | |
CN112084895B (en) | Pedestrian re-identification method based on deep learning | |
CN112926696A (en) | Interpretable local migration mutual learning method based on attention diagram | |
CN114419413A (en) | Method for constructing sensing field self-adaptive transformer substation insulator defect detection neural network | |
CN113962281A (en) | Unmanned aerial vehicle target tracking method based on Siamese-RFB | |
Hu et al. | A video streaming vehicle detection algorithm based on YOLOv4 | |
CN114283325A (en) | Underwater target identification method based on knowledge distillation | |
CN116152554A (en) | Knowledge-guided small sample image recognition system | |
CN115048870A (en) | Target track identification method based on residual error network and attention mechanism | |
CN111325259A (en) | Remote sensing image classification method based on deep learning and binary coding | |
Chen et al. | A finger vein recognition algorithm based on deep learning | |
CN111950635A (en) | Robust feature learning method based on hierarchical feature alignment | |
CN111578956A (en) | Visual SLAM positioning method based on deep learning | |
CN115019061A (en) | Entropy optimization method based on deep neural network information entropy estimation | |
CN116109649A (en) | 3D point cloud instance segmentation method based on semantic error correction | |
CN115578574A (en) | Three-dimensional point cloud completion method based on deep learning and topology perception | |
CN113222867B (en) | Image data enhancement method and system based on multi-template image | |
CN113469133A (en) | Deep learning-based lane line detection method | |
Ahuja et al. | Convolutional Neural Network and Kernel Extreme Learning Machine for Face Recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20220906