CN108664924A - Multi-label object identification method based on convolutional neural network - Google Patents

Multi-label object identification method based on convolutional neural network

Info

Publication number
CN108664924A
CN108664924A
Authority
CN
China
Prior art keywords
convolutional neural networks
tag
network
object identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810443651.6A
Other languages
Chinese (zh)
Other versions
CN108664924B (en)
Inventor
李新德
孙振华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201810443651.6A priority Critical patent/CN108664924B/en
Publication of CN108664924A publication Critical patent/CN108664924A/en
Application granted granted Critical
Publication of CN108664924B publication Critical patent/CN108664924B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/20Scenes; Scene-specific elements in augmented reality scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-label object identification method based on convolutional neural networks. The method exploits the inclusion relation between labels: a CNN feature-extraction part is built for each label in order of inclusion, increasingly abstract features are extracted layer by layer through convolution operations, and a classifier for each label is placed at a different depth of the network. The feature maps extracted by the CNN feature-extraction part of each label are fed into the corresponding classifier, and the errors of the multiple classifiers are back-propagated simultaneously to supervise the training of the weight parameters of the corresponding layers; the class of each label is finally obtained to complete recognition. The multi-label convolutional neural network used in the invention effectively fuses information across labels, addresses the low accuracy of traditional multi-label object identification, and improves the efficiency of both training and recognition.

Description

Multi-label object identification method based on convolutional neural network
Technical field
The invention belongs to the technical field of image processing, and more particularly relates to a multi-label object identification method based on convolutional neural networks.
Background art
With the rapid development of artificial intelligence, automatic object identification has become a research hotspot at home and abroad in recent years, with broad application prospects in fields such as intelligent surveillance, telemetry and remote sensing, robotics, and medical image processing. In real life, objects are of many kinds and individuals can be highly similar; humans perceive shape, color, distance, and other information visually and combine these cues to judge an object's category accurately, but this is often rather difficult for a computer. How to give computers recognition abilities similar to, or even beyond, those of humans has therefore become an important direction and a major challenge in object identification.
The main approach to object identification is to extract object features to represent the object, learn the object categories with some machine-learning algorithm, and finally classify the object to achieve recognition. However, the objects to be identified are often highly similar, and the features extracted in such cases generally fail to reflect the large differences between classes and the commonality within classes. Especially when an object carries multiple labels, the limitations of traditional feature extraction mean that the features built for different labels are usually unrelated, which makes object identification even more difficult.
At present, multi-label object identification relies mainly on two kinds of methods. The first is based on traditional image processing: features such as SIFT, HOG, and SURF are first extracted from the object image, transformed by combining optimized BoW models and fusion algorithms, and finally fed to a traditional classifier (such as an SVM) to obtain the predicted object category. This kind of method is fast, but its recognition accuracy is relatively low.
The other is based on deep learning, which mainly uses the convolutional neural network structures that have become popular in image processing in recent years to automatically extract the features needed for classification; its recognition accuracy is higher than that of traditional image processing. However, when handling multi-label object identification, most current convolutional neural network structures train an independent network for each label and then let each network predict its label separately. This approach has two main problems. On the one hand, too many networks are used and they are trained without any connection to each other, which increases the cost of network training, causes information redundancy, and lowers time efficiency. On the other hand, the intrinsic relations between different labels are ignored, so part of the prior knowledge is lost, useful information is not fully exploited, and the object identification accuracy stays low.
Summary of the invention
Object of the invention: to address the repeated extraction of image features by the single-label convolutional neural networks of the prior art and the lack of connection between the networks of different labels, which leads to low accuracy of object recognition algorithms, the present invention provides a multi-label object identification method based on convolutional neural networks that uses the inclusion relations between labels to perform feature extraction and classification at different levels, thereby solving the technical problem of the low accuracy of traditional object identification.
Technical solution: to solve the above technical problems, the present invention provides a multi-label object identification method based on convolutional neural networks, comprising the following steps (a high-level sketch of this flow follows the list):
(1) read the data set used for object identification and convert it into the standard input format of the convolutional neural network;
(2) build the multi-label convolutional neural network model and initialize its parameters;
(3) train the constructed multi-label convolutional neural network model, continuously optimizing the internal structural parameters of the network;
(4) judge whether the multi-label convolutional neural network model trained in step (3) meets the training requirement; if so, go to step (5); otherwise return to step (3) and retrain;
(5) test and evaluate the trained multi-label convolutional neural network model to obtain the test accuracy;
(6) judge whether the test accuracy reaches grade A; if so, carry out step (7); otherwise update the model and return to step (2) to rebuild, retrain, and retest it;
(7) output the final multi-label convolutional neural network model and its parameters to obtain an object identification method that can be applied in practice.
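For illustration only, steps (1) to (7) can be read as the control flow sketched below in Python; the helper callables load_dataset, build_model, train_model and evaluate, as well as the accuracy threshold standing in for "grade A", are hypothetical placeholders rather than part of the claimed method.

def run_pipeline(load_dataset, build_model, train_model, evaluate,
                 target_accuracy=0.90, max_rounds=5):
    """Steps (1)-(7): read data, build, train, test, and export the MLCNN."""
    train_set, test_set = load_dataset()                    # step (1)
    for _ in range(max_rounds):
        model = build_model()                               # step (2)
        while not train_model(model, train_set):            # step (3): optimize parameters
            pass                                            # step (4): repeat until requirement met
        accuracy = evaluate(model, test_set)                # step (5)
        if accuracy >= target_accuracy:                     # step (6): "grade A" check
            return model, accuracy                          # step (7): export final model
    raise RuntimeError("required accuracy not reached")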
Further, the multi-label convolutional neural network model consists of a data input part, CNN feature-extraction parts, and a classifier part.
Further, in the multi-label convolutional neural network model, the CNN feature-extraction parts perform multiple convolutions on the input feature maps to carry out local feature extraction.
Further, in the multi-label convolutional neural network model, the classifier part comprises several classifiers with similar structures.
Further, training the constructed multi-label convolutional neural network model in step (3) comprises the following steps (a code sketch follows this list):
(3.1) initialize the parameters of the MLCNN model;
(3.2) read the training images of the current step into the network layers;
(3.3) let the images propagate forward through the network model to obtain the training error: according to the current network parameter values, apply convolution and pooling operations to each input image layer by layer, starting from the first convolutional layer, until the network outputs the training loss of each classifier;
(3.4) judge whether the training losses output in step (3.3) meet the loss requirement or the preset number of steps has been reached; if so, go to step (3.5); otherwise, use the network loss with error back-propagation to obtain the change of each layer's parameters, update the corresponding layers for the forward pass of the next step, and return to step (3.2);
(3.5) output the network parameter model.
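A minimal PyTorch sketch of steps (3.2) to (3.5) for a model that returns one logit tensor per label level (as in the MLCNN sketch given later in the description); the SGD optimizer, the equal weighting of the per-label losses, and the stopping thresholds are assumptions that the text does not fix.

import torch
import torch.nn as nn

def train_mlcnn(model, loader, max_steps=10000, loss_target=0.05, lr=0.01):
    criterion = nn.CrossEntropyLoss()                        # softmax regression loss per label
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)   # gradient-descent weight updates
    step = 0
    for images, labels in loader:                            # step (3.2): labels is (batch, n_labels)
        logits = model(images)                               # step (3.3): feed-forward, one output per classifier
        losses = [criterion(out, labels[:, i]) for i, out in enumerate(logits)]
        total = sum(losses)                                  # back-propagate all classifiers at once
        optimizer.zero_grad()
        total.backward()                                     # error back-propagation
        optimizer.step()                                     # update the corresponding layer parameters
        step += 1
        if total.item() < loss_target or step >= max_steps:  # step (3.4): loss or step-count check
            break
    return model.state_dict()                                # step (3.5): export the network parameters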
Further, the methods for initializing the parameters of the MLCNN model in step (3.1) include constant initialization, Gaussian initialization, and uniform initialization.
Further, testing and evaluating the trained multi-label convolutional neural network model in step (5) comprises the following steps (an evaluation sketch follows this list):
(5.1) load the network parameter model obtained in step (4) into the multi-label convolutional neural network model;
(5.2) read the test images of the current step into the network layers;
(5.3) using the model structure and the trained parameters, apply the feed-forward operations such as convolution to the test images of the current step layer by layer, output the predicted class of each image through the classifier part, and record the predictions;
(5.4) judge whether the current step has reached the minimum number of steps needed to traverse all the test-set images; if so, output the label data of all images recorded so far and go to step (5.5); otherwise return to step (5.2);
(5.5) compare the recorded predicted classes of all test-set images with the actual class of each image and compute the test-set classification accuracy under this model parameterization.
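The sketch below follows steps (5.1) to (5.5) for the same kind of multi-output model; reporting one recognition rate per label level, computed as correctly identified samples over total test samples, is an assumption consistent with the formula P = n_r / N given in the experiments.

import torch

def evaluate_mlcnn(model, state_dict, loader, n_labels=3):
    model.load_state_dict(state_dict)                        # step (5.1): load trained parameters
    model.eval()
    correct = [0] * n_labels
    total = 0
    with torch.no_grad():
        for images, labels in loader:                        # steps (5.2)-(5.4): traverse the test set
            logits = model(images)                           # step (5.3): feed-forward only
            for i, out in enumerate(logits):
                preds = out.argmax(dim=1)                    # predicted class for label level i
                correct[i] += (preds == labels[:, i]).sum().item()
            total += images.size(0)
    return [c / total for c in correct]                      # step (5.5): accuracy per label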
Compared with the prior art, the advantages of the present invention are as follows:
The multi-label object identification method based on convolutional neural networks provided by the invention can address the low recognition rate that, when object surface structures are similar and viewing angles vary, results from the incomplete use of the relations between labels and the uncertainty of the acquired information, and it can shorten the time needed to train the structure of a multi-label recognition system.
Description of the drawings
Fig. 1 is the flow chart of the method of the present invention;
Fig. 2 is the flow chart of the training of the multi-label convolutional neural network model in Fig. 1;
Fig. 3 is the flow chart of the testing of the multi-label convolutional neural network model in Fig. 1;
Fig. 4 is the structural schematic diagram of the multi-label convolutional neural network model in Fig. 1;
Fig. 5 is the structural schematic diagram of the CNN feature-extraction part in the embodiment;
Fig. 6 is the structural schematic diagram of the classifier in the embodiment;
Fig. 7 is a schematic diagram of the label hierarchy of the aircraft data set in the embodiment.
Detailed description of the embodiments
The present invention is further elucidated below with reference to the accompanying drawings and specific embodiments. The embodiments described here are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative work fall within the scope protected by the present invention.
The present invention proposes a multi-label object identification method based on convolutional neural networks. To address the problems that traditional convolutional neural networks face in multi-label recognition, the method uses the multi-label convolutional neural network (Multi-Label Convolutional Neural Network, MLCNN) structure of Fig. 4 and, by exploiting the relations between labels, merges the feature extraction and classification of multiple labels into one complete network.
As shown in Fig. 4, the MLCNN network consists mainly of three parts: a data input part, CNN feature-extraction parts Ci (i = 1, 2, ..., n), and classifier parts Li (i = 1, 2, ..., n), where the classifier of each label Li has a corresponding feature extractor Ci. Here n is the number of labels in the classification problem to which the network is applied, and the labels satisfy the following inclusion relation:
L1 ⊇ L2 ⊇ ... ⊇ Ln
where Li ⊇ Lj indicates that any class in label Li contains one or more classes of label Lj; that is, label Lj has more, and more finely divided, classes than label Li. The different classes belonging to label Lj are therefore more similar to each other and harder to distinguish, so features more abstract than those of label Li are required, which in turn calls for a deeper convolutional network to carry the feature extraction further.
Accordingly, the lowest-level feature-extraction part C1 of the MLCNN network is mainly responsible for the preliminary extraction of image features, and its output serves as the input of the L1 label classifier and of the deeper network C2. The feature-extraction part C2 then performs higher-level semantic abstraction on the feature maps, and its output serves as the input of the L2 label classifier and of the deeper network C3. This continues down to the feature-extraction part Cn of label Ln, which captures the finest, highest-level semantics.
Starting from the raw data input, the data stream first goes through augmentation and pre-processing; it is then abstracted by the corresponding CNN feature-extraction part Ci through convolution and pooling dimension reduction, yielding the feature maps needed to classify each label Li. Each feature map is fed into the classifier corresponding to label Li, where, after dimension reduction by a 1×1 convolution kernel and processing by a fully connected layer with dropout, it is input into a softmax layer for regression analysis, and the class corresponding to each label Li is finally obtained.
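A minimal PyTorch sketch of the MLCNN structure of Fig. 4, assuming the three-level case of the embodiment (n = 3, class counts 2/14/35 and the 9-3-3 layer split reported below); the channel widths, the pooling to a fixed 4×4 map before the fully connected layer, and the hidden size are illustrative assumptions added to keep the sketch self-contained.

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    """A CNN feature-extraction part Ci: n_convs 3x3 convolutions with ELU, then max-pooling."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, padding=1),
                   nn.ELU()]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

def classifier_head(in_ch, n_classes, reduced_ch=64, hidden=256):
    """A classifier Li: 1x1 dimension-reducing convolution, dropout fully connected layer, logits for softmax."""
    return nn.Sequential(
        nn.Conv2d(in_ch, reduced_ch, kernel_size=1), nn.ELU(),
        nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        nn.Dropout(0.5), nn.Linear(reduced_ch * 4 * 4, hidden), nn.ELU(),
        nn.Linear(hidden, n_classes))

class MLCNN(nn.Module):
    def __init__(self, layers_per_block=(9, 3, 3), classes_per_label=(2, 14, 35)):
        super().__init__()
        channels = (64, 128, 256)
        self.blocks = nn.ModuleList()
        self.heads = nn.ModuleList()
        in_ch = 3
        for n_convs, out_ch, n_cls in zip(layers_per_block, channels, classes_per_label):
            self.blocks.append(conv_block(in_ch, out_ch, n_convs))
            self.heads.append(classifier_head(out_ch, n_cls))
            in_ch = out_ch

    def forward(self, x):
        logits = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)             # features of Ci feed both the classifier Li and C(i+1)
            logits.append(head(x))   # classifier Li attached at this depth
        return logits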
Each part is described in detail as follows:
1.1 Data input layer
The data input layer is mainly responsible for augmenting and pre-processing the raw data set, in order to enhance the diversity of the data and prevent model over-fitting. The data-augmentation methods implemented in this layer mainly include horizontal flipping, scale transformation, rotation transformation, and Fancy PCA.
After the data set has been augmented, a centering normalization is applied to the input images: the mean of the training-set images is subtracted from each feature, so as to highlight the individual differences between images.
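As an illustration, the flip, scale, and rotation augmentations together with the mean-centering normalization could be written with torchvision as below; the 224×224 input size and the per-channel mean values are assumptions, and Fancy PCA is omitted because torchvision provides no built-in transform for it.

from torchvision import transforms

TRAIN_MEAN = (0.485, 0.456, 0.406)     # assumed per-channel mean of the training-set images

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),                          # horizontal flip
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),        # scale transformation
    transforms.RandomRotation(15),                              # rotation transformation
    transforms.ToTensor(),
    transforms.Normalize(mean=TRAIN_MEAN, std=(1.0, 1.0, 1.0)), # subtract the mean only
])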
1.2 CNN feature extraction
As shown in Fig. 5, the "CNN feature extraction" part applies the core idea of CNNs, convolution, to perform local feature extraction on the input feature maps; in order to control the depth of the network and tune its parameters more efficiently, the convolution kernel size is fixed at 3×3. Since the overall structure of the multi-label network is rather deep, the exponential linear unit (ELU) is used as the activation function of the feature values after convolution, in order to reduce gradient vanishing while avoiding the "dead zone" problem that is common with the ReLU function. The function is
f(x) = x for x > 0, and f(x) = α(e^x − 1) for x ≤ 0, with α a positive constant.
After several convolutions, the resulting feature maps are down-sampled by max-pooling, which reduces the size of the feature maps, guarantees a degree of feature invariance, and prevents over-fitting.
1.3 Classifier
The classifier part of the MLCNN network contains n classifiers with similar structures, corresponding to the labels of the n levels. The basic structure of the classifier is shown in Fig. 6.
The feature maps produced by the feature-extraction part first pass through a dimension-reducing convolution with a 1×1 kernel, are then fed into fully connected neurons trained with a dropout strategy, and the feature vector output by these neurons is finally input into a softmax layer for regression analysis.
Softmax regression is the generalization of logistic regression to multi-class classification problems. Its loss function for the classification target is
J(ω) = −(1/N) Σ_{l=1}^{N} Σ_{i=1}^{C} 1{y_l = i} · log( exp(ω_i^T x_l) / Σ_{j=1}^{C} exp(ω_j^T x_l) )
where ω denotes the network structure parameters, x_l is the input of the l-th sample, N is the number of training samples, C is the number of classes contained in a single label, and the indicator 1{y_l = i} equals 1 when y_l = i and 0 otherwise.
Since it is difficult to solve this objective function directly by analytical methods, gradient descent is used to approach the minimum of the cost function, and the parameters corresponding to the minimum found are taken as the estimated system parameters of the network. The gradient is
∇_{ω_i} J(ω) = −(1/N) Σ_{l=1}^{N} x_l · ( 1{y_l = i} − P(y_l = i | x_l; ω) ), with P(y_l = i | x_l; ω) = exp(ω_i^T x_l) / Σ_{j=1}^{C} exp(ω_j^T x_l).
With the partial derivatives above, a weight update is performed in each iteration of the gradient-descent algorithm until the target loss or the number of training steps reaches the requirement.
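The NumPy sketch below instantiates the loss and gradient formulas above and performs one gradient-descent weight update; the random data, the feature dimension, and the learning rate are illustrative assumptions.

import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)           # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grad(W, X, y, C):
    """W: (D, C) parameters, X: (N, D) sample inputs, y: (N,) class indices in [0, C)."""
    N = X.shape[0]
    P = softmax(X @ W)                             # P(y_l = i | x_l; w)
    onehot = np.eye(C)[y]                          # the indicator 1{y_l = i}
    loss = -np.sum(onehot * np.log(P)) / N         # J(w)
    grad = -(X.T @ (onehot - P)) / N               # gradient of J(w) with respect to W
    return loss, grad

# one gradient-descent step on random illustrative data
rng = np.random.default_rng(0)
X, y, C = rng.normal(size=(8, 16)), rng.integers(0, 3, size=8), 3
W = np.zeros((16, C))
loss, grad = loss_and_grad(W, X, y, C)
W -= 0.1 * grad                                    # weight update with an assumed learning rate of 0.1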
2 Experiments
2.1 Data set
The present invention takes aircraft objects as an example and performs network training and testing on an aircraft data set with multiple labels in order to evaluate the recognition performance of the proposed method. To ensure the reliability of the experimental results and the applicability to multi-label classification problems, so that the experiments can accurately test the validity of the algorithm, the data set was built mainly from Boeing and Airbus, two manufacturers with aircraft of multiple types, and the amounts of data for the different labels were appropriately balanced.
The images and labels in the data set were mainly collected from the FGVC-Aircraft data set; in addition, pictures were crawled from the web by aircraft-type label as needed to expand the corresponding label data, giving 10,000 aircraft pictures in total. As shown in Fig. 7, the data set contains three levels of labels, "manufacturer", "series", and "model", with the following inclusion relation: one manufacturer contains one or more series, and one series contains one or more models. "Manufacturer" contains 2 classes in total, "series" contains 14 classes, and "model" contains 35 classes, and every aircraft picture carries a unique "manufacturer", "series", and "model" label.
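For illustration, the three-level inclusion relation "manufacturer" -> "series" -> "model" can be stored as a nested mapping so that the three labels of an image are derived from its finest label; the series and model names below are hypothetical placeholders, not the actual classes of the data set.

LABEL_HIERARCHY = {
    "Boeing": {"Series-B1": ["Model-B1a", "Model-B1b"],
               "Series-B2": ["Model-B2a"]},
    "Airbus": {"Series-A1": ["Model-A1a", "Model-A1b"]},
}

def labels_for(model_name, hierarchy=LABEL_HIERARCHY):
    """Return the (manufacturer, series, model) triple implied by a model label."""
    for manufacturer, series_map in hierarchy.items():
        for series, models in series_map.items():
            if model_name in models:
                return manufacturer, series, model_name
    raise KeyError(model_name)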
To assess the performance of the proposed algorithm, multiple experiments were carried out on this data set and the correct recognition rate was computed in several settings. The correct recognition rate is computed as
P = n_r / N
where P denotes the correct recognition rate, n_r denotes the number of correctly identified test samples, and N denotes the total number of test samples.
2.2 Experimental results and analysis
After the data set had been collected and built, two thirds of it were split off as the training set and the remaining third was used as the test set. The training set is used to train the neural network, allowing it to update its own weights under the supervision of the objective function and thereby learn on its own, while the test set is used to verify the validity of the algorithm.
Since the data set used by the present invention has 3 labels, n = 3 in the MLCNN structure of Fig. 4; that is, the MLCNN used in the experiments contains 3 CNN feature-extraction parts (C1, C2, C3) and the classifiers of the 3 corresponding labels (L1, L2, L3). Because the number of convolutional layers directly determines the feature-extraction capacity of a CNN and thus affects the recognition performance of the classifiers, the numbers of convolutional layers in the three feature-extraction parts (C1, C2, C3) are the main structural parameters influencing the recognition performance of this MLCNN network.
To select an MLCNN network with good recognition performance on this data set, MLCNN networks with different distributions of the C1, C2, and C3 layer counts were trained and tested separately. To keep the groups comparable, the total number of layers of every MLCNN network was fixed at 15.
Seven groups of MLCNN networks with different layer distributions were chosen for the experiment. The 1st group corresponds to a C1 of 5 convolutional layers, a C2 of 5 convolutional layers, and a C3 of 5 convolutional layers; the 2nd group corresponds to a C1 of 6 convolutional layers, a C2 of 6 convolutional layers, and a C3 of 3 convolutional layers. Likewise, the (C1, C2, C3) layer counts of the remaining 5 groups are, in order: (6, 3, 6), (3, 6, 6), (3, 3, 9), (3, 9, 3), and (9, 3, 3). The experimental results are shown in Table 1.
Table 1. Test results for different distributions of feature-extraction layers
As can be seen from Table 1, when the recognition rate of the most finely divided label L3 is high, the recognition rates of labels L1 and L2 are also high. Comparing only the recognition rate of the finest label L3, the 2nd and 7th groups are higher than all the other groups, and for these two groups C1 + C2 = 12, the largest value in this set of experiments. Comparing only the recognition rate of label L2, the 2nd and 7th groups again achieve the highest recognition rates; their C1 has 6 and 9 layers respectively, no fewer than in any other group. Comparing only the recognition rate of label L1, the 2nd and 7th groups are likewise the highest.
From this experiment, the following rules of thumb for choosing the numbers of feature-extraction layers can be summarized:
(1) When the total number of convolutional layers is not limited, appropriately increasing the total number of layers of the MLCNN can improve the accuracy of the most finely divided label.
(2) When the total number of convolutional layers is limited, appropriately reducing the number of layers of the last feature-extraction part C3 and increasing the numbers of layers of C1 and C2 can improve the recognition rate of label L3, while appropriately reducing the number of layers of C2 and increasing the number of layers of C1 can improve the recognition rate of label L2. This is summarized as the decreasing strategy for choosing the numbers of MLCNN feature-extraction layers:
C1 ≥ C2 ≥ C3 ≥ ... ≥ Cn
The experimental results show that when the numbers of "CNN feature extraction" layers follow this decreasing strategy, the features required by the label classifier of each level can be extracted more accurately, so the classification accuracy of every label generally improves.
Taking the recognition of all labels into account, the result of the 7th group above is selected as the best MLCNN result for the data set of the present invention. Meanwhile, two traditional schemes were trained and tested on the same data set for comparison. One extracts SIFT features and classifies them with an SVM; the other trains an independent CNN for each label, where, for ease of comparison, the depth of each CNN matches the corresponding feature-extraction depth in the MLCNN: 9, 12, and 15 layers respectively. The experimental results are shown in Table 2.
Table 2. Test results of different methods on the data set of the present invention
As can be seen from Table 2, the MLCNN proposed by the invention outperforms the mutually independent single-label CNNs on the recognition rates of all three labels; in particular, on the highest-level label "model", it improves the recognition rate on the test set by 7.54% compared with the independent 15-layer "model" CNN classification network. This shows that the MLCNN proposed by the present invention can use the relational information between labels of different levels as a basis for feature extraction, thereby reducing interfering information and improving the recognition performance of the network.
To test the time efficiency of the MLCNN during training, the training runs were timed. The computer configuration used in the experiments was a TITAN XP GPU, an E5-2650 CPU, and 64 GB of memory. The training times of the independent CNNs corresponding to the three labels were 5 h, 6.5 h, and 7 h respectively, while the training time of the MLCNN was 7.5 h. Training one MLCNN therefore saves more than half the time compared with training three independent CNNs, so the MLCNN also brings a considerable improvement in time efficiency.

Claims (7)

1. A multi-label object identification method based on convolutional neural networks, characterized by comprising the following steps:
(1) reading the data set used for object identification and converting it into the standard input format of the convolutional neural network;
(2) building a multi-label convolutional neural network model and initializing its parameters;
(3) training the constructed multi-label convolutional neural network model, continuously optimizing the internal structural parameters of the network;
(4) judging whether the multi-label convolutional neural network model trained in step (3) meets the training requirement; if so, going to step (5); otherwise returning to step (3) and retraining;
(5) testing and evaluating the trained multi-label convolutional neural network model to obtain the test accuracy;
(6) judging whether the test accuracy reaches grade A; if so, carrying out step (7); otherwise updating the model and returning to step (2) to rebuild, retrain, and retest it;
(7) outputting the final multi-label convolutional neural network model and its parameters to obtain an object identification method that can be applied in practice.
2. The multi-label object identification method based on convolutional neural networks according to claim 1, characterized in that the multi-label convolutional neural network model consists of a data input part, CNN feature-extraction parts, and a classifier part.
3. The multi-label object identification method based on convolutional neural networks according to claim 1, characterized in that, in the multi-label convolutional neural network model, the CNN feature-extraction parts perform multiple convolutions on the input feature maps to carry out local feature extraction.
4. The multi-label object identification method based on convolutional neural networks according to claim 1, characterized in that, in the multi-label convolutional neural network model, the classifier part comprises several classifiers with similar structures.
5. The multi-label object identification method based on convolutional neural networks according to claim 1, characterized in that training the constructed multi-label convolutional neural network model in step (3) comprises the following steps:
(3.1) initializing the parameters of the MLCNN model;
(3.2) reading the training images of the current step into the network layers;
(3.3) letting the images propagate forward through the network model to obtain the training error: according to the current network parameter values, applying convolution and pooling operations to each input image layer by layer, starting from the first convolutional layer, until the network outputs the training loss of each classifier;
(3.4) judging whether the training losses output in step (3.3) meet the loss requirement or the preset number of steps has been reached; if so, going to step (3.5); otherwise, using the network loss with error back-propagation to obtain the change of each layer's parameters, updating the corresponding layers for the forward pass of the next step, and finally returning to step (3.2);
(3.5) outputting the network parameter model.
6. The multi-label object identification method based on convolutional neural networks according to claim 5, characterized in that the methods for initializing the parameters of the MLCNN model in step (3.1) include constant initialization, Gaussian initialization, and uniform initialization.
7. The multi-label object identification method based on convolutional neural networks according to claim 1, characterized in that testing and evaluating the trained multi-label convolutional neural network model in step (5) comprises the following steps:
(5.1) loading the network parameter model obtained in step (4) into the multi-label convolutional neural network model;
(5.2) reading the test images of the current step into the network layers;
(5.3) using the model structure and the trained parameters, applying the feed-forward operations such as convolution to the test images of the current step layer by layer, outputting the predicted class of each image through the classifier part, and recording the predictions;
(5.4) judging whether the current step has reached the minimum number of steps needed to traverse all the test-set images; if so, outputting the label data of all images recorded so far and going to step (5.5); otherwise returning to step (5.2);
(5.5) comparing the recorded predicted classes of all test-set images with the actual class of each image and computing, by statistics, the test-set classification accuracy under this model parameterization.
CN201810443651.6A 2018-05-10 2018-05-10 Multi-label object identification method based on convolutional neural network Active CN108664924B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810443651.6A CN108664924B (en) 2018-05-10 2018-05-10 Multi-label object identification method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810443651.6A CN108664924B (en) 2018-05-10 2018-05-10 Multi-label object identification method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN108664924A true CN108664924A (en) 2018-10-16
CN108664924B CN108664924B (en) 2022-07-08

Family

ID=63778964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810443651.6A Active CN108664924B (en) 2018-05-10 2018-05-10 Multi-label object identification method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN108664924B (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886335A (en) * 2019-02-21 2019-06-14 厦门美图之家科技有限公司 Disaggregated model training method and device
CN109977902A (en) * 2019-04-03 2019-07-05 刘西 A kind of construction vehicle identification method based on deep learning
CN109993058A (en) * 2019-02-27 2019-07-09 北京大学 The recognition methods of road signs based on multi-tag classification
CN110070113A (en) * 2019-03-29 2019-07-30 广州思德医疗科技有限公司 A kind of training method and device of training set
CN110163161A (en) * 2019-05-24 2019-08-23 西安电子科技大学 Multiple features fusion pedestrian detection method based on Scale invariant
CN110598849A (en) * 2019-08-26 2019-12-20 广州大学 HMET scattering parameter extraction method and system based on neural network and storage medium
CN110648655A (en) * 2019-09-11 2020-01-03 北京探境科技有限公司 Voice recognition method, device, system and storage medium
CN110689081A (en) * 2019-09-30 2020-01-14 中国科学院大学 Weak supervision target classification and positioning method based on bifurcation learning
CN111353515A (en) * 2018-12-21 2020-06-30 湖南工业大学 Multi-scale grading-based classification and identification method for damage of train wheel set tread
CN111507403A (en) * 2020-04-17 2020-08-07 腾讯科技(深圳)有限公司 Image classification method and device, computer equipment and storage medium
CN111626357A (en) * 2020-05-27 2020-09-04 北京微智信业科技有限公司 Image identification method based on neural network model
CN111898467A (en) * 2020-07-08 2020-11-06 浙江大华技术股份有限公司 Attribute identification method and device, storage medium and electronic device
CN111967364A (en) * 2020-08-11 2020-11-20 西交利物浦大学 Composite fault diagnosis method, device, electronic equipment and storage medium
CN112580750A (en) * 2020-12-30 2021-03-30 Oppo广东移动通信有限公司 Image recognition method and device, electronic equipment and storage medium
CN113449781A (en) * 2021-06-17 2021-09-28 上海深至信息科技有限公司 Generation method and system of thyroid nodule classification model
CN113673482A (en) * 2021-09-03 2021-11-19 四川大学 Cell antinuclear antibody fluorescence recognition method and system based on dynamic label distribution
CN115406852A (en) * 2021-12-28 2022-11-29 中山小池科技有限公司 Fabric fiber component qualitative method based on multi-label convolutional neural network
US11978280B2 (en) 2021-07-30 2024-05-07 Lemon Inc. Method and device for evaluating effect of classifying fuzzy attribute

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060147126A1 (en) * 2005-01-06 2006-07-06 Leo Grady System and method for multilabel random walker segmentation using prior models
CN102184422A (en) * 2011-05-15 2011-09-14 中科院成都信息技术有限公司 Average error classification cost minimized classifier integrating method
CN105740906A (en) * 2016-01-29 2016-07-06 中国科学院重庆绿色智能技术研究院 Depth learning based vehicle multi-attribute federation analysis method
CN106504255A (en) * 2016-11-02 2017-03-15 南京大学 A kind of multi-Target Image joint dividing method based on multi-tag multi-instance learning
CN107330444A (en) * 2017-05-27 2017-11-07 苏州科技大学 A kind of image autotext mask method based on generation confrontation network
CN107577785A (en) * 2017-09-15 2018-01-12 南京大学 A kind of level multi-tag sorting technique suitable for law identification

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060147126A1 (en) * 2005-01-06 2006-07-06 Leo Grady System and method for multilabel random walker segmentation using prior models
CN102184422A (en) * 2011-05-15 2011-09-14 中科院成都信息技术有限公司 Average error classification cost minimized classifier integrating method
CN105740906A (en) * 2016-01-29 2016-07-06 中国科学院重庆绿色智能技术研究院 Depth learning based vehicle multi-attribute federation analysis method
CN106504255A (en) * 2016-11-02 2017-03-15 南京大学 A kind of multi-Target Image joint dividing method based on multi-tag multi-instance learning
CN107330444A (en) * 2017-05-27 2017-11-07 苏州科技大学 A kind of image autotext mask method based on generation confrontation network
CN107577785A (en) * 2017-09-15 2018-01-12 南京大学 A kind of level multi-tag sorting technique suitable for law identification

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353515B (en) * 2018-12-21 2024-01-26 湖南工业大学 Multi-scale classification-based train wheel set tread damage classification and identification method
CN111353515A (en) * 2018-12-21 2020-06-30 湖南工业大学 Multi-scale grading-based classification and identification method for damage of train wheel set tread
CN109886335A (en) * 2019-02-21 2019-06-14 厦门美图之家科技有限公司 Disaggregated model training method and device
CN109886335B (en) * 2019-02-21 2021-11-26 厦门美图之家科技有限公司 Classification model training method and device
CN109993058A (en) * 2019-02-27 2019-07-09 北京大学 The recognition methods of road signs based on multi-tag classification
CN110070113B (en) * 2019-03-29 2021-03-30 广州思德医疗科技有限公司 Training method and device for training set
CN110070113A (en) * 2019-03-29 2019-07-30 广州思德医疗科技有限公司 A kind of training method and device of training set
CN109977902A (en) * 2019-04-03 2019-07-05 刘西 A kind of construction vehicle identification method based on deep learning
CN110163161A (en) * 2019-05-24 2019-08-23 西安电子科技大学 Multiple features fusion pedestrian detection method based on Scale invariant
CN110598849A (en) * 2019-08-26 2019-12-20 广州大学 HMET scattering parameter extraction method and system based on neural network and storage medium
CN110648655A (en) * 2019-09-11 2020-01-03 北京探境科技有限公司 Voice recognition method, device, system and storage medium
CN110648655B (en) * 2019-09-11 2022-03-01 北京探境科技有限公司 Voice recognition method, device, system and storage medium
CN110689081A (en) * 2019-09-30 2020-01-14 中国科学院大学 Weak supervision target classification and positioning method based on bifurcation learning
CN111507403A (en) * 2020-04-17 2020-08-07 腾讯科技(深圳)有限公司 Image classification method and device, computer equipment and storage medium
CN111626357A (en) * 2020-05-27 2020-09-04 北京微智信业科技有限公司 Image identification method based on neural network model
CN111626357B (en) * 2020-05-27 2021-11-23 北京东方通网信科技有限公司 Image identification method based on neural network model
CN111898467B (en) * 2020-07-08 2023-02-28 浙江大华技术股份有限公司 Attribute identification method and device, storage medium and electronic device
CN111898467A (en) * 2020-07-08 2020-11-06 浙江大华技术股份有限公司 Attribute identification method and device, storage medium and electronic device
CN111967364A (en) * 2020-08-11 2020-11-20 西交利物浦大学 Composite fault diagnosis method, device, electronic equipment and storage medium
CN111967364B (en) * 2020-08-11 2023-11-21 西交利物浦大学 Composite fault diagnosis method, device, electronic equipment and storage medium
CN112580750A (en) * 2020-12-30 2021-03-30 Oppo广东移动通信有限公司 Image recognition method and device, electronic equipment and storage medium
CN113449781A (en) * 2021-06-17 2021-09-28 上海深至信息科技有限公司 Generation method and system of thyroid nodule classification model
CN113449781B (en) * 2021-06-17 2023-04-07 上海深至信息科技有限公司 Generation method and system of thyroid nodule classification model
US11978280B2 (en) 2021-07-30 2024-05-07 Lemon Inc. Method and device for evaluating effect of classifying fuzzy attribute
CN113673482B (en) * 2021-09-03 2023-04-18 四川大学 Cell antinuclear antibody fluorescence recognition method and system based on dynamic label distribution
CN113673482A (en) * 2021-09-03 2021-11-19 四川大学 Cell antinuclear antibody fluorescence recognition method and system based on dynamic label distribution
CN115406852A (en) * 2021-12-28 2022-11-29 中山小池科技有限公司 Fabric fiber component qualitative method based on multi-label convolutional neural network

Also Published As

Publication number Publication date
CN108664924B (en) 2022-07-08

Similar Documents

Publication Publication Date Title
CN108664924A (en) A kind of multi-tag object identification method based on convolutional neural networks
US11195051B2 (en) Method for person re-identification based on deep model with multi-loss fusion training strategy
CN109961034B (en) Video target detection method based on convolution gating cyclic neural unit
CN111553193B (en) Visual SLAM closed-loop detection method based on lightweight deep neural network
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
CN109325547A (en) Non-motor vehicle image multi-tag classification method, system, equipment and storage medium
CN101097564B (en) Parameter learning method, parameter learning apparatus, pattern classification method, and pattern classification apparatus
CN110276248B (en) Facial expression recognition method based on sample weight distribution and deep learning
CN109002755B (en) Age estimation model construction method and estimation method based on face image
CN112883839B (en) Remote sensing image interpretation method based on adaptive sample set construction and deep learning
CN105389326B (en) Image labeling method based on weak matching probability typical relevancy models
CN111461038A (en) Pedestrian re-identification method based on layered multi-mode attention mechanism
CN106845525A (en) A kind of depth confidence network image bracket protocol based on bottom fusion feature
CN110765960B (en) Pedestrian re-identification method for adaptive multi-task deep learning
CN117149944B (en) Multi-mode situation emotion recognition method and system based on wide time range
CN108879732A (en) Transient stability evaluation in power system method and device
KR102049331B1 (en) Apparatus and method for classifying images, and apparatus for training images for classification of images
Chen et al. Military image scene recognition based on CNN and semantic information
CN109948498A (en) A kind of dynamic gesture identification method based on 3D convolutional neural networks algorithm
CN113920472A (en) Unsupervised target re-identification method and system based on attention mechanism
CN103136540A (en) Behavior recognition method based on concealed structure reasoning
CN116310647A (en) Labor insurance object target detection method and system based on incremental learning
Yao Application of higher education management in colleges and universities by deep learning
CN116363712B (en) Palmprint palm vein recognition method based on modal informativity evaluation strategy
Banik et al. Multi-label object attribute classification using a convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant