Image classification method based on gated neural network information fusion
Technical Field
The invention belongs to the field of machine learning and neural networks, and particularly relates to a gated information fusion method for a neural network.
Background
Neural networks (NNs) have achieved good results in many fields such as speech recognition, natural language processing, image processing, and pattern recognition.
Multi-task, multi-branch neural networks are becoming more popular, and several well-known neural network models, such as ResNet (He et al., 2016), DenseNet (Huang et al., 2017), and the GRU (Cho et al., 2014), all include an operation that fuses two branches of information into one. In integrated systems such as robots, unmanned aerial vehicles, and autonomous driving systems, complex neural network models with multiple branches and multiple tasks are increasingly common, and information fusion is particularly important in these applications. Most existing information fusion methods use concatenation or weighted averaging as the fusion strategy: concatenation greatly increases the feature dimensionality and requires a large amount of computing resources, while weighted averaging, being a simple linear combination, cannot fit a nonlinear fusion function.
He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition [C]. IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 770–778.
Huang G, Liu Z, van der Maaten L, et al. Densely Connected Convolutional Networks [C]. IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 2261–2269.
Cho K, van Merrienboer B, Bahdanau D, et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches [C]. Workshop on Syntax, Semantics and Structure in Statistical Translation. Doha, Qatar, 2014.
Disclosure of Invention
The invention aims to provide a neural network information fusion method with low computational cost and strong fitting capability. The technical scheme of the invention is as follows:
A gated neural network information fusion method comprises the following steps:
1) Given the neural network feature tensors to be fused, I_1, I_2, …, I_n, n in total; these tensors are called the fusion inputs.
2) Determine the dimensionality of the output tensor, which is denoted O.
3) Apply a neural network computation, including transformation and activation, to each input so that it has the same dimensionality as the output tensor O.
4) Select suitable fusion evidence for each input: the fusion evidence is the feature tensor from which the fusion weight controlling each fusion input is computed. The fusion evidence of the i-th input is denoted E_i, so for the inputs I_1, I_2, …, I_n there are evidences E_1, E_2, …, E_n.
5) For the fusion of the i-th path, perform a neural network computation on E_i.
6) Activate the computed fusion evidence E_i to obtain the fusion weight α_i.
7) Multiply the fusion weight α_i by the input I_i.
8) Combine the paths linearly or nonlinearly into the output tensor O (see the sketch after this list).
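The following is a minimal PyTorch sketch of steps 1) to 8) for image-style feature maps. It is an illustration under assumptions, not a definitive implementation: the 1×1 convolutions standing in for the neural network computations of steps 3) and 5), and the sigmoid used as the activation of step 6), are choices made here for concreteness.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Minimal sketch of the n-way gated fusion (steps 1-8)."""
    def __init__(self, in_channels, out_channels, n_inputs):
        super().__init__()
        # Step 3: transform each fusion input to the output dimensionality.
        self.transforms = nn.ModuleList(
            nn.Conv2d(in_channels, out_channels, kernel_size=1)
            for _ in range(n_inputs))
        # Step 5: one fusion-control computation per fusion evidence E_i.
        self.controls = nn.ModuleList(
            nn.Conv2d(in_channels, out_channels, kernel_size=1)
            for _ in range(n_inputs))

    def forward(self, inputs, evidences):
        out = 0
        for x, e, transform, control in zip(inputs, evidences,
                                            self.transforms, self.controls):
            alpha = torch.sigmoid(control(e))   # step 6: fusion weight alpha_i
            out = out + alpha * transform(x)    # steps 7-8: weight, then combine
        return out

# Usage: fuse three 64-channel feature maps into one 64-channel output.
fuse = GatedFusion(64, 64, n_inputs=3)
xs = [torch.randn(2, 64, 16, 16) for _ in range(3)]  # fusion inputs I_i
es = [torch.randn(2, 64, 16, 16) for _ in range(3)]  # fusion evidences E_i
o = fuse(xs, es)                                     # output tensor O
```

Because each α_i has the same shape as its input, every element of every input receives its own weight, which is what makes the fusion fine-grained.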
The substantial features of the invention are as follows: by introducing fusion evidence and nonlinear operations, it provides a nonlinear, intelligent fusion method that can fuse information from any number of paths, including heterogeneous multi-source information, can robustly perform fine-grained fusion of arbitrary feature tensors, and can be used to improve any neural network model and some non-neural-network models. The beneficial effects are as follows:
1. It is applicable to all neural networks and to some non-neural-network methods.
2. Compared with existing fusion methods, the invention achieves better fusion performance and provides a multi-path, nonlinear, fine-grained fusion strategy.
3. The method is simple to implement and has little impact on the existing structure.
Drawings
FIG. 1 Structure of the invention
FIG. 2 Embodiment of the fusion structure
Detailed Description
The method of the invention can fuse feature tensors element by element; the sizes and number of the input and output tensors are flexible and variable; the fusion strategy is self-learned and has strong expressive capability; and the method is not limited to a particular neural network or network structure, giving it strong universality and practicality. In order to solve the above problems and achieve the above object, the technical solution of the invention is as follows:
1) Given any number of neural network feature tensors to be fused, record these inputs as I_1, I_2, …, I_n, n in total (yellow arrows in FIG. 1); these inputs are called the fusion inputs.
2) Determine the dimensionality of the output tensor, which is denoted O.
3) Apply a neural network computation (including transformation, activation, and the like) to each input so that it has the same dimensionality as the output tensor O.
4) Select suitable fusion evidence for each input: the fusion evidence is the feature tensor from which the fusion weight controlling this fusion input is computed (green arrows in FIG. 1). The fusion evidence of the i-th input is denoted E_i, so for the inputs I_1, I_2, …, I_n there are evidences E_1, E_2, …, E_n.
5) For the fusion of the i-th path, optionally perform a neural network computation on E_i (the "arbitrary function" in FIG. 1).
6) Activate the computed fusion evidence E_i (e.g., with Sigmoid, tanh, or ReLU; the "activation function" in FIG. 1) to obtain the fusion weight α_i.
7) Multiply the fusion weight α_i by the input I_i (the "×" symbol in FIG. 1).
8) Combine the paths linearly or nonlinearly into the output tensor O (the "+" symbol in FIG. 1).
9) In particular, when the data to be fused has only two paths, the fusion evidence of the two paths can be shared, and α and 1−α can be used as the weights of the two paths (as shown in FIG. 1; see the sketch after this list).
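For the two-path special case of step 9), a minimal sketch with a shared fusion-control branch might look as follows; the single 1×1 convolution and the sigmoid gate are again illustrative choices, and the two inputs are assumed to already have the output dimensionality.

```python
import torch
import torch.nn as nn

class TwoWayGatedFusion(nn.Module):
    """Two-path fusion with shared evidence: O = alpha*I1 + (1-alpha)*I2."""
    def __init__(self, channels):
        super().__init__()
        # Shared fusion-control branch computing alpha from the common evidence.
        self.control = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x1, x2, evidence):
        alpha = torch.sigmoid(self.control(evidence))  # weight in (0, 1)
        return alpha * x1 + (1.0 - alpha) * x2         # element-wise convex combination
```

With the sigmoid, α and 1−α form an element-wise convex combination of the two paths, the same gating pattern used in the GRU.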
This section is based on the Inception-v4 network architecture proposed by Szegedy et al. (2017), which features multiple parallel branches. Clearly, the invention is not limited to one base architecture; this is only one example.
Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning [C]. AAAI Conference on Artificial Intelligence. San Francisco, CA, USA: AAAI, 2017.
(1) Prepare suitable training data; in this example the training data comprise training images and class labels.
(2) Establish the Inception-v4 base network.
(3) Determine the data to be fused (FIG. 2); in this example a three-input fusion unit is used. Specifically, the parallel branches of each unit in Inception-v4 are used as the fusion inputs.
(4) Select the fusion evidence. Specifically, in this example the output of the previous unit in Inception-v4 is taken as the fusion evidence of the current unit, and a convolution followed by a rectified linear unit (ReLU) is appended in turn as a fusion control branch.
(5) To obtain different receptive fields, the convolutions added for the large-, medium-, and small-scale convolution branches in Inception-v4 are, respectively, a 3×3 convolution, a 3×3 dilated convolution with dilation rate 2, and a 3×3 dilated convolution with dilation rate 4.
(6) Activate the fusion evidence: at the end of the fusion control branch from the previous step, tanh is added as the activation function.
(7) Multiply each fused input by its corresponding fusion weight.
(8) Add the resulting data of the paths (see the first sketch after these steps).
(9) Input the training data obtained in step (1) into the resulting neural network and train with mini-batch stochastic gradient descent (mini-batch SGD), taking the sum of the cross-entropy loss and the weight decay loss as the loss term, with batches of 32 training images and a weight decay coefficient of 0.01, and with the learning rate starting at 0.001 and decaying exponentially by a factor of 0.95 every epoch, until the loss function value converges (see the training sketch after these steps).
(10) Save the neural network weights obtained by the training in step (9).
(11) Input the image to be classified into the neural network model obtained in step (10); the prediction result obtained is the classification result of the image.
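As an illustration of steps (3) to (8), one three-input fusion unit of this example might be sketched as follows. This is a simplified sketch under assumptions: the channel counts, the exact wiring into Inception-v4, and the equal spatial size of the fusion evidence and the branch outputs are not taken from the original; the dilation rates 1, 2, and 4 follow step (5), and each fusion control branch is convolution, then ReLU, then tanh, as in steps (4) and (6).

```python
import torch
import torch.nn as nn

class InceptionGatedFusion(nn.Module):
    """Sketch of one three-input fusion unit (steps (3)-(8))."""
    def __init__(self, evidence_channels, branch_channels):
        super().__init__()
        # Steps (4)-(5): one fusion control branch (dilated conv + ReLU) per
        # fused input, with dilation rates 1, 2, 4 for different receptive fields.
        self.controls = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(evidence_channels, branch_channels,
                          kernel_size=3, padding=d, dilation=d),
                nn.ReLU(inplace=True),
            ) for d in (1, 2, 4))

    def forward(self, branches, evidence):
        # branches: the three parallel Inception branch outputs (same shape);
        # evidence: the output of the previous unit (step (4)).
        out = 0
        for x, control in zip(branches, self.controls):
            alpha = torch.tanh(control(evidence))  # step (6): tanh activation
            out = out + alpha * x                  # steps (7)-(8): weight and add
        return out
```

Note that tanh applied after ReLU yields weights in [0, 1), so each element of each branch is softly gated.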
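A sketch of the training schedule of steps (9) and (10) follows. The `model` and `train_loader` here are dummy stand-ins so the snippet runs; in the example they would be the Inception-v4 network with the fusion units above and a loader over the training images and class labels from step (1).

```python
import torch
import torch.nn as nn

# Dummy stand-ins (assumptions, not part of the original method): replace with
# the Inception-v4 + gated-fusion network and a real image/label data loader.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
train_loader = [(torch.randn(32, 3, 32, 32), torch.randint(0, 10, (32,)))]
num_epochs = 3

# Step (9): mini-batch SGD with batches of 32, weight decay 0.01 (applied by
# the optimizer), and a learning rate of 0.001 decayed by 0.95 each epoch.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
criterion = nn.CrossEntropyLoss()  # cross-entropy loss term

for epoch in range(num_epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # exponential decay: lr = 0.001 * 0.95**epoch

# Step (10): save the trained weights for the inference of step (11).
torch.save(model.state_dict(), "gated_fusion_inception_v4.pt")  # file name assumed
```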