Image classification method based on gated neural network information fusion
Technical Field
The invention belongs to the field of machine learning and neural networks, and particularly relates to a gated information fusion method for a neural network.
Background
Neural networks (NNs) have achieved good results in many fields such as speech recognition, natural language processing, image processing, and pattern recognition.
Multi-task, multi-branch neural networks are becoming more popular, and several well-known neural network models, such as ResNet (He et al., 2016), DenseNet (Huang et al., 2017), and the GRU (Cho et al., 2014), all include an operation that fuses two branches of information into one. In integrated systems such as robots, unmanned aerial vehicles, and autonomous driving systems, complex neural network models with multiple branches and multiple tasks are increasingly common, and information fusion is particularly important in these applications. Most existing information fusion methods use concatenation or weighted averaging as the fusion strategy: concatenation greatly increases the feature dimensionality and requires a large amount of computing resources, while weighted averaging, being a simple linear combination, cannot fit a nonlinear fusion function.
He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition [C]. IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 770–778.
Huang G, Liu Z, van der Maaten L, et al. Densely Connected Convolutional Networks [C]. IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 2261–2269.
Cho K, van Merrienboer B, Bahdanau D, et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches [C]. Workshop on Syntax, Semantics and Structure in Statistical Translation. Doha, Qatar, 2014.
Disclosure of Invention
The invention aims to provide a neural network information fusion method with low computational cost and strong fitting capability. The technical scheme of the invention is as follows:
A gated neural network information fusion method comprises the following steps:
1) Given the neural network feature tensors to be fused, I_1, I_2, …, I_n, n in total; these tensors are called the fusion inputs.
2) Determine the dimensionality of the output tensor, which is denoted O.
3) Apply a neural network computation, including transformation and activation, to each input so that it has the same dimensionality as the output tensor O.
4) Select suitable fusion evidence for each input: the fusion evidence is the feature tensor from which the fusion weight controlling each fusion input is computed. The fusion evidence of the i-th input is denoted E_i, so for the inputs I_1, I_2, …, I_n there are evidences E_1, E_2, …, E_n.
5) For the fusion of the i-th path, perform a neural network computation on E_i.
6) Activate the computed fusion evidence E_i to obtain the fusion weight α_i.
7) Multiply the fusion weight α_i by the input I_i.
8) Combine the paths linearly or nonlinearly into the output tensor O (see the sketch after this list).
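The following is a minimal PyTorch sketch of steps 1) to 8) for image-style feature maps. It is an illustration under assumptions, not a definitive implementation: the 1×1 convolutions standing in for the neural network computations of steps 3) and 5), and the sigmoid used as the activation of step 6), are choices made here for concreteness.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Minimal sketch of the n-way gated fusion (steps 1-8)."""
    def __init__(self, in_channels, out_channels, n_inputs):
        super().__init__()
        # Step 3: transform each fusion input to the output dimensionality.
        self.transforms = nn.ModuleList(
            nn.Conv2d(in_channels, out_channels, kernel_size=1)
            for _ in range(n_inputs))
        # Step 5: one fusion-control computation per fusion evidence E_i.
        self.controls = nn.ModuleList(
            nn.Conv2d(in_channels, out_channels, kernel_size=1)
            for _ in range(n_inputs))

    def forward(self, inputs, evidences):
        out = 0
        for x, e, transform, control in zip(inputs, evidences,
                                            self.transforms, self.controls):
            alpha = torch.sigmoid(control(e))   # step 6: fusion weight alpha_i
            out = out + alpha * transform(x)    # steps 7-8: weight, then combine
        return out

# Usage: fuse three 64-channel feature maps into one 64-channel output.
fuse = GatedFusion(64, 64, n_inputs=3)
xs = [torch.randn(2, 64, 16, 16) for _ in range(3)]  # fusion inputs I_i
es = [torch.randn(2, 64, 16, 16) for _ in range(3)]  # fusion evidences E_i
o = fuse(xs, es)                                     # output tensor O
```

Because each α_i has the same shape as its input, every element of every input receives its own weight, which is what makes the fusion fine-grained.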
The substantial features of the invention are as follows: by introducing fusion evidence and nonlinear operations, it provides a nonlinear, intelligent fusion method that can fuse information from any number of paths, including heterogeneous multi-source information, can robustly perform fine-grained fusion of arbitrary feature tensors, and can be used to improve any neural network model and some non-neural-network models. The beneficial effects are as follows:
1. It is applicable to all neural networks and to some non-neural-network methods.
2. Compared with existing fusion methods, the invention achieves better fusion performance and provides a multi-path, nonlinear, fine-grained fusion strategy.
3. The method is simple to implement and has little impact on the existing structure.
Drawings
FIG. 1 Structure of the invention
FIG. 2 Embodiment of the fusion structure
Detailed Description
The method of the invention can fuse feature tensors element by element; the sizes and number of the input and output tensors are flexible and variable; the fusion strategy is self-learned and has strong expressive capability; and the method is not limited to a particular neural network or network structure, giving it strong universality and practicality. In order to solve the above problems and achieve the above object, the technical solution of the invention is as follows:
1) Given any number of neural network feature tensors to be fused, record these inputs as I_1, I_2, …, I_n, n in total (yellow arrows in FIG. 1); these inputs are called the fusion inputs.
2) Determine the dimensionality of the output tensor, which is denoted O.
3) Apply a neural network computation (including transformation, activation, and the like) to each input so that it has the same dimensionality as the output tensor O.
4) Select suitable fusion evidence for each input: the fusion evidence is the feature tensor from which the fusion weight controlling this fusion input is computed (green arrows in FIG. 1). The fusion evidence of the i-th input is denoted E_i, so for the inputs I_1, I_2, …, I_n there are evidences E_1, E_2, …, E_n.
5) For the fusion of the i-th path, optionally perform a neural network computation on E_i (the "arbitrary function" in FIG. 1).
6) Activate the computed fusion evidence E_i (e.g., with Sigmoid, tanh, or ReLU; the "activation function" in FIG. 1) to obtain the fusion weight α_i.
7) Multiply the fusion weight α_i by the input I_i (the "×" symbol in FIG. 1).
8) Combine the paths linearly or nonlinearly into the output tensor O (the "+" symbol in FIG. 1).
9) In particular, when the data to be fused has only two paths, the fusion evidence of the two paths can be shared, and α and 1−α can be used as the weights of the two paths (as shown in FIG. 1; see the sketch after this list).
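For the two-path special case of step 9), a minimal sketch with a shared fusion-control branch might look as follows; the single 1×1 convolution and the sigmoid gate are again illustrative choices, and the two inputs are assumed to already have the output dimensionality.

```python
import torch
import torch.nn as nn

class TwoWayGatedFusion(nn.Module):
    """Two-path fusion with shared evidence: O = alpha*I1 + (1-alpha)*I2."""
    def __init__(self, channels):
        super().__init__()
        # Shared fusion-control branch computing alpha from the common evidence.
        self.control = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x1, x2, evidence):
        alpha = torch.sigmoid(self.control(evidence))  # weight in (0, 1)
        return alpha * x1 + (1.0 - alpha) * x2         # element-wise convex combination
```

With the sigmoid, α and 1−α form an element-wise convex combination of the two paths, the same gating pattern used in the GRU.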
This section is based on the Inception-v4 network architecture proposed by Szegedy et al. (2017), which features multiple parallel branches. Clearly, the invention is not limited to one base architecture; this is only one example.
Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning [C]. AAAI Conference on Artificial Intelligence. San Francisco, CA, USA: AAAI, 2017.
(1) Prepare suitable training data; in this example the training data comprise training images and class labels.
(2) Establish the Inception-v4 base network.
(3) Determine the data to be fused (FIG. 2); in this example a three-input fusion unit is used. Specifically, the parallel branches of each unit in Inception-v4 are used as the fusion inputs.
(4) Select the fusion evidence. Specifically, in this example the output of the previous unit in Inception-v4 is taken as the fusion evidence of the current unit, and a convolution followed by a rectified linear unit (ReLU) is appended in turn as a fusion control branch.
(5) To obtain different receptive fields, the convolutions added for the large-, medium-, and small-scale convolution branches in Inception-v4 are, respectively, a 3×3 convolution, a 3×3 dilated convolution with dilation rate 2, and a 3×3 dilated convolution with dilation rate 4.
(6) Activate the fusion evidence: at the end of the fusion control branch from the previous step, tanh is added as the activation function.
(7) Multiply each fused input by its corresponding fusion weight.
(8) Add the resulting data of the paths (see the first sketch after these steps).
(9) Input the training data obtained in step (1) into the resulting neural network and train with mini-batch stochastic gradient descent (mini-batch SGD), taking the sum of the cross-entropy loss and the weight decay loss as the loss term, with batches of 32 training images and a weight decay coefficient of 0.01, and with the learning rate starting at 0.001 and decaying exponentially by a factor of 0.95 every epoch, until the loss function value converges (see the training sketch after these steps).
(10) Save the neural network weights obtained by the training in step (9).
(11) Input the image to be classified into the neural network model obtained in step (10); the prediction result obtained is the classification result of the image.
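As an illustration of steps (3) to (8), one three-input fusion unit of this example might be sketched as follows. This is a simplified sketch under assumptions: the channel counts, the exact wiring into Inception-v4, and the equal spatial size of the fusion evidence and the branch outputs are not taken from the original; the dilation rates 1, 2, and 4 follow step (5), and each fusion control branch is convolution, then ReLU, then tanh, as in steps (4) and (6).

```python
import torch
import torch.nn as nn

class InceptionGatedFusion(nn.Module):
    """Sketch of one three-input fusion unit (steps (3)-(8))."""
    def __init__(self, evidence_channels, branch_channels):
        super().__init__()
        # Steps (4)-(5): one fusion control branch (dilated conv + ReLU) per
        # fused input, with dilation rates 1, 2, 4 for different receptive fields.
        self.controls = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(evidence_channels, branch_channels,
                          kernel_size=3, padding=d, dilation=d),
                nn.ReLU(inplace=True),
            ) for d in (1, 2, 4))

    def forward(self, branches, evidence):
        # branches: the three parallel Inception branch outputs (same shape);
        # evidence: the output of the previous unit (step (4)).
        out = 0
        for x, control in zip(branches, self.controls):
            alpha = torch.tanh(control(evidence))  # step (6): tanh activation
            out = out + alpha * x                  # steps (7)-(8): weight and add
        return out
```

Note that tanh applied after ReLU yields weights in [0, 1), so each element of each branch is softly gated.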
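A sketch of the training schedule of steps (9) and (10) follows. The `model` and `train_loader` here are dummy stand-ins so the snippet runs; in the example they would be the Inception-v4 network with the fusion units above and a loader over the training images and class labels from step (1).

```python
import torch
import torch.nn as nn

# Dummy stand-ins (assumptions, not part of the original method): replace with
# the Inception-v4 + gated-fusion network and a real image/label data loader.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
train_loader = [(torch.randn(32, 3, 32, 32), torch.randint(0, 10, (32,)))]
num_epochs = 3

# Step (9): mini-batch SGD with batches of 32, weight decay 0.01 (applied by
# the optimizer), and a learning rate of 0.001 decayed by 0.95 each epoch.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
criterion = nn.CrossEntropyLoss()  # cross-entropy loss term

for epoch in range(num_epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # exponential decay: lr = 0.001 * 0.95**epoch

# Step (10): save the trained weights for the inference of step (11).
torch.save(model.state_dict(), "gated_fusion_inception_v4.pt")  # file name assumed
```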