CN109102024A - Hierarchical semantic embedding model for fine-grained object recognition and implementation method thereof - Google Patents
Hierarchical semantic embedding model for fine-grained object recognition and implementation method thereof
- Publication number
- CN109102024A CN109102024A CN201810924288.XA CN201810924288A CN109102024A CN 109102024 A CN109102024 A CN 109102024A CN 201810924288 A CN201810924288 A CN 201810924288A CN 109102024 A CN109102024 A CN 109102024A
- Authority
- CN
- China
- Prior art keywords
- layer
- feature map
- branch network
- branch
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/231—Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a hierarchical semantic embedding model for fine-grained object recognition and an implementation method thereof. The hierarchical semantic embedding model comprises: a trunk network, which extracts shallow features from an input image and outputs them in the form of a feature map to each branch network; and several branch networks, which perform further deep feature extraction on the shallow image feature map output by the trunk network, so that the feature map each branch outputs suits the recognition task of the level corresponding to that branch, and which, by introducing a semantic knowledge embedding mechanism, realize the guidance of upper-level semantic knowledge over the feature learning of lower-level branch networks. The invention solves the problem of the high annotation cost of the additional information on which fine-grained object recognition schemes that rely on additional information to guide learning depend.
Description
Technical field
The present invention relates to the technical field of fine-grained object recognition, and in particular to a Hierarchical Semantic Embedding (HSE) model for fine-grained object recognition and its training method.
Background art
In recent years, advances in deep visual computing have ignited demand for visual analysis technology in every field: e-commerce urgently needs precise online retrieval of clothing images, the security industry urgently needs accurate matching of vehicles involved in cases, and agricultural and environmental-protection departments urgently need fine recognition of wild animals and plants. These demands often require that a recognition algorithm can carefully distinguish the subordinate categories of some basic class; this technology is usually called fine-grained object recognition.
In general, the technical difficulties of fine-grained object recognition are:
1) Inter-class differences that are hard to distinguish: for objects from similar categories, the visual differences between them are in many cases very small, and some are difficult even for humans to distinguish;
2) Obvious intra-class differences: objects from the same category can show large visual differences due to scale, viewing angle, occlusion, and varied backgrounds.
At present, fine-grained recognition technology mainly distinguishes objects based on several discriminative regions, and two classes of schemes dominate:
First, using an attention mechanism to automatically mine discriminative regions;
Second, using additional information to guide model learning, so as to better express the features of discriminative regions.
However, the former is usually implemented with multiple networks and repeated operations, which raises model complexity; meanwhile, lacking effective supervision or guidance, it tends to localize discriminative regions ambiguously. The latter, although it effectively improves the discriminability of key regions, introduces additional information whose annotation cost is often very high.
Summary of the invention
To overcome the above deficiencies of the existing technologies, the purpose of the present invention is to provide a hierarchical semantic embedding model for fine-grained object recognition and an implementation method thereof, to solve the problem of the high annotation cost of the additional information in fine-grained object recognition schemes that rely on additional information to guide learning.
In view of the above and other objects, the present invention proposes a hierarchical semantic embedding model for fine-grained object recognition, comprising:
a trunk network, which extracts shallow features from the input image and outputs them in the form of a feature map to each branch network;
several branch networks, which perform further deep feature extraction on the shallow image feature map output by the trunk network, so that the feature map each branch outputs suits the recognition task of the level corresponding to that branch, and which, by introducing a semantic knowledge embedding mechanism, realize the guidance of upper-level semantic knowledge over the feature learning of lower-level branch networks.
Preferably, each branch network performs a secondary feature expression on the feature map from the trunk network to generate a new branch feature map, learns an attention weight map by combining the score vector predicted by the upper level with the branch feature map of the lower level, and applies the attention weight map to the branch feature map to finally generate a weighted branch feature map, with which it predicts the label distribution of that level's categories.
Preferably, the trunk network uses only the layer4_x layer of the ResNet-50 network structure and the input layers before it, 41 parameter layers in total, and the parameters of the trunk network are shared by the prediction networks of all levels.
Preferably, each branch network comprises:
a deep feature extraction submodule, which performs further feature extraction on the feature map output by the trunk network and outputs a feature expression guided by upper-level semantic knowledge and a feature expression without guidance;
an upper-level semantic knowledge embedding submodule, which maps the score vector s_{i-1} predicted by the upper level through a fully connected layer into a semantic knowledge expression vector, splices this vector onto each site of the W × H plane of the feature map output by the deep feature extraction submodule, learns an attention coefficient vector from the spliced feature map through an attention model, and applies this attention coefficient vector to the feature map output by the deep feature extraction submodule to obtain a weighted feature map, where W and H denote width and height respectively;
a score fusion submodule, which passes the feature maps output by the upper-level semantic knowledge embedding submodule and the deep feature extraction submodule through a score fusion operation and outputs the corresponding score vector.
Preferably, the deep feature extraction submodule uses the layer5_x layer structure of the ResNet-50 network; the layer5_x layer structure consists of 3 residual modules, and the layer5_x layer structure is reused twice, in one place facing the upper-level semantic knowledge embedding submodule and in the other facing the expression of global features.
Preferably, for each site in the W × H plane of the spliced feature map, the attention model gradually maps it through two consecutive fully connected layers to the corresponding dimensions, finally obtaining the attention coefficient vector.
Preferably, the score fusion process of the score fusion submodule is as follows:

S = (fc_1 + fc_2 + fc_cat) / 3

where fc_1, fc_2 and fc_cat are c × 1 dimensional vectors; the first two are obtained directly by passing the feature maps output by the upper-level semantic knowledge embedding submodule and the deep feature extraction submodule each through a fully connected layer, and the latter is obtained by connecting fc_1 and fc_2 in series and then applying a fully connected layer fc_concate operation, yielding the same dimension as fc_1 and fc_2.
Preferably, for the branch network of the topmost-level classification, except that the last fully connected layer corresponds to the number of categories of that level, the parameter settings of the other layers are consistent with the original ResNet-50 network.
To achieve the above objects, the present invention also provides an implementation method of the hierarchical semantic embedding model for fine-grained object recognition, comprising the following steps:
Step S1: perform hierarchical annotation on each training datum;
Step S2: taking the weighted combination of the classification loss function and the regularization constraint loss function as the objective function for optimizing the HSE model, train the branch network corresponding to each level step by step, from the 1st-level classification to the N-th-level classification in turn;
Step S3: after all branch networks have received preliminary training, perform joint optimization on all parameters of the complete HSE model.
Preferably, the optimization objective function of the i-th branch network is:

L_i = L_cls^(i) + γ · L_reg^(i)

where γ is a balance parameter used to balance the influence of the classification loss function term L_cls^(i) and the regularization constraint loss function term L_reg^(i) on the network parameters.
Compared with the prior art, the hierarchical semantic embedding model for fine-grained object recognition of the present invention and its implementation method use the hierarchical structure of object categories as a kind of semantic information and embed this semantic information into the feature expression of a deep neural network model, solving the problem of the high annotation cost of the additional information in fine-grained object recognition schemes that rely on additional information to guide learning, and reducing the complexity of the model.
Brief description of the drawings
Fig. 1 is a system architecture diagram of the hierarchical semantic embedding model for fine-grained object recognition of the present invention;
Fig. 2 is a schematic diagram of the trunk network in an embodiment of the present invention;
Fig. 3 is a comparison diagram of the branch network structures of ResNet-50 and the present invention;
Fig. 4 is a schematic diagram of the upper-level semantic embedding expression process in an embodiment of the present invention;
Fig. 5 is a schematic diagram of the attention mechanism of the branch networks in an embodiment of the present invention;
Fig. 6 is a schematic diagram of the branch network structure of the topmost level in an embodiment of the present invention;
Fig. 7 is a flow chart of the steps of the implementation method of the hierarchical semantic embedding model for fine-grained object recognition of the present invention.
Specific embodiments
Embodiments of the present invention are described below through specific examples with reference to the drawings; those skilled in the art can easily understand further advantages and effects of the invention from the contents disclosed in this specification. The present invention can also be implemented or applied through other different specific examples, and the details in this specification can also be modified and changed in various ways based on different perspectives and applications without departing from the spirit of the invention.
Fig. 1 is a system architecture diagram of the hierarchical semantic embedding model for fine-grained object recognition of the present invention. In the present invention, the hierarchical semantic knowledge embedding model (Hierarchical Semantic Embedding, abbreviated HSE) involves three aspects: the extraction of deep image features, the embedded expression learning of semantic knowledge, and the constraint imposed by semantic knowledge on the semantic space of prediction results. The HSE model of the present invention is an algorithmic model based on deep learning technology and depends on deep neural networks; deep expression learning runs through the entire HSE framework, which uses hierarchical semantic knowledge in two ways, embodied respectively in the embedding of semantic knowledge in feature expression and in using semantic knowledge to regularize prediction results during model training. Specifically, as shown in Fig. 1, the hierarchical semantic embedding (Hierarchical Semantic Embedding, abbreviated HSE) model for fine-grained object recognition of the present invention comprises:
Trunk network 1, which extracts shallow features from the input image and outputs them in the form of a feature map to each branch network 2; that is to say, the input image has its features preliminarily extracted by trunk network 1 and output in the form of a feature map to the branch networks 2;
Several branch networks 2, which perform further deep feature extraction on the shallow image feature map output by the trunk network, so that the feature map each branch outputs suits the recognition task of the level corresponding to that branch, and which, by introducing a semantic knowledge embedding mechanism, realize the guidance of upper-level semantic knowledge over the feature learning of lower-level branch networks. That is, the feature map output by trunk network 1 is separately input into the branch networks 2 corresponding to each level's categories for further feature expression and output in the form of a feature vector; this feature vector, through the calculation of a softmax classifier, yields the predicted label distribution of each level's categories.
In the present invention, the semantic knowledge embedding mechanism is embodied in the branch networks 2, which adopt an attention mechanism guided by semantic knowledge. Specifically, a branch network 2 first performs a secondary feature expression on the feature map from trunk network 1 to generate a new branch feature map, which is essentially a stack of several feature maps, i.e. a 3-dimensional tensor. By combining the score vector predicted by the upper level with the branch feature map of the lower level, an attention weight map is learned, which is likewise essentially a stack of several feature maps, a three-dimensional tensor. The attention weight map represents the importance of each spatial position of the new feature map produced by the branch network for recognizing the target category: positions with higher discriminability attract more attention, and the weights at the corresponding positions of the weight map are larger. This weight map is applied to the branch feature map to finally generate a weighted branch feature map, with which the label distribution of that level's categories is predicted.
It can be seen that, through the method of sharing shallow network parameters, this "trunk-branch" multi-branch network structure of the invention reduces computing overhead, while the multiple independent branches enable the model to take into account the optimization objectives of different tasks.
It should be noted here that the regularization of the semantic space of prediction results by semantic knowledge is embodied in the training process of the HSE model: the present invention takes the score vector predicted by the upper level as a soft target that constrains the prediction results of its lower level to conform to the semantic rules of the category tree, thereby regularizing the semantic space of the lower level's prediction results, as will be explained in detail later.
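One plausible reading of this soft-target constraint can be sketched in numpy: lower-level probabilities are aggregated up the category tree and penalized for disagreeing with the upper-level soft target. The toy tree, the aggregation-by-summing step, and the cross-entropy form of the penalty are assumptions for illustration, not details fixed by the patent.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

# Toy category tree: parent (coarse) index of each of 5 fine classes.
parent = np.array([0, 0, 1, 1, 1])

s_coarse = softmax(np.array([1.0, 0.2]))                 # upper-level soft target
p_fine = softmax(np.array([0.3, 1.1, -0.2, 0.5, 0.0]))  # lower-level prediction

# Aggregate fine probabilities to the coarse level along the tree, then
# penalize disagreement with the soft target (cross-entropy is assumed).
p_agg = np.array([p_fine[parent == k].sum() for k in range(2)])
reg_loss = -(s_coarse * np.log(p_agg + 1e-12)).sum()

assert abs(p_agg.sum() - 1.0) < 1e-9
assert reg_loss > 0
```

A prediction that violates the tree (high species probability under the wrong family) inflates this penalty, which is the sense in which the upper level regularizes the lower level's semantic space.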
Specifically, the following is the basic workflow of the hierarchical semantic embedding model of the present invention:
(1) input an image I;
(2) the trunk network Tr extracts the feature map of image I, denoted f_I;
(3) f_I is input into the branch network Br_1 of the highest level;
(4) Br_1 performs a forward calculation on f_I to obtain the prediction score vector s_1 of the highest-level categories;
(5) from the i-th level to the N-th level (i ≥ 2):
(5.1) f_I is input into the i-th-level branch network Br_i;
(5.2) the prediction score vector s_{i-1} of the level above is input into branch network Br_i;
(5.3) under the guidance of s_{i-1}, Br_i performs a forward calculation on f_I to obtain the i-th level's prediction score vector s_i.
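The coarse-to-fine workflow above can be sketched as a plain Python loop. The `trunk` and `branch` functions here are random stand-ins for the real networks (their internals, the channel count, and the per-level class counts are illustrative assumptions); only the control flow, where each branch receives both the shared feature map f_I and the upper level's score vector, follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def trunk(image):
    # Stand-in for trunk network Tr: 448x448 image -> 28x28 feature map f_I.
    return rng.standard_normal((512, 28, 28))   # channel count is illustrative

def branch(f_I, upper_scores, num_classes):
    # Stand-in for branch network Br_i: predict this level's score vector,
    # guided by the upper level's scores s_{i-1} (None for the top level).
    flat = f_I.mean(axis=(1, 2))                # global pooling, illustrative
    W = rng.standard_normal((num_classes, flat.size)) * 0.01
    s = W @ flat
    if upper_scores is not None:
        s = s + 0.0 * upper_scores.sum()        # placeholder for semantic guidance
    return s

# e.g. order / family / genus / species class counts (illustrative).
level_sizes = [13, 37, 122, 200]

image = rng.standard_normal((3, 448, 448))
f_I = trunk(image)                              # steps (1)-(2)
scores, s_prev = [], None
for n_cls in level_sizes:                       # steps (3)-(5): coarse to fine
    s_prev = branch(f_I, s_prev, n_cls)
    scores.append(s_prev)

assert [s.shape[0] for s in scores] == level_sizes
```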
Compared with the prior art, the present invention achieves 1.6% higher accuracy than the previously best algorithm on the Caltech-UCSD Birds dataset and 2.3% higher accuracy than the previously best algorithm on the VegFru dataset. In addition, on the Caltech-UCSD Birds dataset, the present invention can save 20% of the training data while reaching accuracy comparable to the previously best algorithm.
The present invention is further illustrated below through a specific embodiment:
Embodiment:
1. Hierarchical annotation of the data
Taking images of birds as an example, hierarchical annotation information in addition to the images needs to be prepared. For example, if the categories of the 4 levels of order, family, genus and species of birds are to be annotated, each training/test datum should include: the image, the order-level class label, the family-level class label, the genus-level class label and the species-level class label.
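One training record of the kind described above might be represented as follows; the file path and the taxonomy values are hypothetical placeholders, only the four-level structure comes from the text.

```python
# One hierarchically annotated training record (all values hypothetical).
record = {
    "image": "images/0001.jpg",
    "order": "Passeriformes",
    "family": "Corvidae",
    "genus": "Corvus",
    "species": "Corvus corax",
}
levels = ["order", "family", "genus", "species"]  # coarse to fine
assert all(k in record for k in levels)
```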
2. Realization of the HSE model
The HSE model comprises a trunk network (trunk net) and branch networks (branch nets). The role of the trunk network is mainly the shallow feature extraction of the input image. The role of the branch networks has two aspects: first, performing further deep feature extraction on the shallow image feature map output by the trunk network, so that the feature map each branch outputs suits the recognition task of the level corresponding to that branch; second, by introducing a knowledge embedding module, realizing the guidance of upper-level semantic knowledge over the feature learning of lower-level branch networks. Through the method of sharing shallow network parameters, this "trunk-branch" multi-branch network structure reduces computing overhead, while the multiple independent branches enable the model to take into account the optimization objectives of different tasks. The network structures of the trunk network and the branch networks are described in detail below.
1) Trunk network
In an embodiment of the invention, the trunk network is built on the layer structure of a residual network; its comparison with the structure of the ResNet-50 network is shown in Fig. 2. In the figure, conv1 is a single-layer convolution operation, and layer2_x to layer5_x are layer structures formed by stacking and connecting several residual modules, each containing several convolution layers. In the structure of ResNet-50, layer2_x to layer5_x consist of 3, 4, 6 and 3 residual modules respectively, each residual module containing 3 convolution layers, 48 layers in total; together with the bottom conv1 and the output fully connected layer fc, they build the 50-layer network structure.
The trunk network of the HSE model of the present invention uses only the layer4_x structure of the ResNet-50 network and the input layers before it, 41 parameter layers in total. Table 1 describes the specific network parameters of the trunk network of the invention; the HSE model performs class prediction at multiple levels, and the parameters of the trunk network are shared by the prediction networks of all levels. Given an input picture, the trunk network performs preliminary shallow feature extraction on it and outputs the result in the form of a feature map. In an embodiment of the invention, the input is a picture of resolution 448 × 448, and the feature map output by the trunk network has spatial size 28 × 28.
Table 1: Key parameters of the HSE model trunk network
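The spatial sizes quoted here are consistent with the standard ResNet-50 strides, which the following sketch checks: truncating after layer4_x gives a total stride of 16 (448 → 28), and the layer5_x stage used later in the branches halves this again to 14. The per-stage stride values are those of the standard ResNet-50 design, stated here as an assumption about the embodiment.

```python
# Per-stage spatial strides of standard ResNet-50 (assumed for this embodiment).
strides = {
    "conv1": 2, "maxpool": 2,
    "layer2_x": 1, "layer3_x": 2, "layer4_x": 2,  # trunk: up to layer4_x
    "layer5_x": 2,                                # branch deep-feature stage
}

def out_size(in_size, stages):
    # Spatial size after running the listed stages in order.
    s = in_size
    for name in stages:
        s //= strides[name]
    return s

trunk_stages = ["conv1", "maxpool", "layer2_x", "layer3_x", "layer4_x"]
assert out_size(448, trunk_stages) == 28                 # trunk feature map
assert out_size(448, trunk_stages + ["layer5_x"]) == 14  # after branch layer5_x
```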
2) Branch networks
Fig. 3 compares the branch network structures of ResNet-50 and the present invention. In the structure diagram of the branch networks, each level's classification corresponds to one branch network. Specifically, a branch network comprises: a deep feature extraction submodule 201, an upper-level semantic knowledge embedding submodule 202 and a score fusion submodule 203.
In order to keep consistency with the ResNet-50 network structure and to facilitate fair comparison in subsequent experiments, the deep feature extraction submodule 201 continues to use the layer5_x layer structure of ResNet-50. In an embodiment of the invention, this layer5_x layer structure consists of 3 residual modules, performs deep feature extraction on the feature map output by the trunk network, and outputs a feature expression guided by upper-level semantic knowledge (emphasizing local discriminability) and a feature expression without guidance (emphasizing global discriminability). Specifically, taking the feature map output by the trunk network (28 × 28 resolution) as input, after the arithmetic processing of the layer5_x layer structure of the deep feature extraction submodule 201, a feature map of size 14 × 14 is output. The dimensions of the output feature map are actually n × C × W × H; in an embodiment of the invention, n is 8, indicating the batch size; C denotes the number of channels, with value 2048; and W and H denote width and height respectively, both 14. It should be particularly noted that the layer5_x structure is reused twice in a branch network: in one place facing the upper-level semantic knowledge embedding submodule, in the other facing the expression of global features. To distinguish the two layer5_x layer structures, the former is denoted φ_i(·) and the latter ψ_i(·); φ_i(·) and ψ_i(·) are mutually independent and do not share parameters.
In the upper-level semantic knowledge embedding submodule 202, the score vector s_{i-1} predicted by the upper level first passes through a fully connected layer and is mapped into a 1024-dimensional semantic knowledge expression vector. This vector is spliced onto each site in the W × H plane of the feature map output by φ_i(·); this concatenation operation is indicated in the figure. For convenience of implementation, the knowledge expression vector can simply be copied to all W × H sites. Fig. 4 demonstrates the above process.
The spliced feature map is learned into an attention coefficient vector through an attention model α(·). Fig. 5 annotates the processing of the attention model: each site in the W × H plane of the spliced feature map is gradually mapped by two consecutive fully connected layers fc to 1024 and then 2048 dimensions, finally obtaining an attention coefficient vector (such as the figure on the far right of Fig. 5).
The obtained attention coefficient vector is applied to the feature map output by φ_i(·); the "⊙" in Fig. 5 indicates that the attention coefficient vector is multiplied by the value of each corresponding position of the feature map output by φ_i(·), which yields the weighted feature map f_i.
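The embedding-and-attention pipeline just described can be sketched in numpy. The dimensions (2048 channels, 14 × 14 sites, 1024-dimensional knowledge vector, per-site mapping 1024 → 2048) follow the text; the random stand-in weights, the tanh/sigmoid nonlinearities, and the upper-level class count of 37 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
C, W, H, K = 2048, 14, 14, 1024   # channels, width, height, knowledge dim
c_upper = 37                      # upper-level class count (illustrative)

# Random stand-ins for learned parameters.
W_embed = rng.standard_normal((K, c_upper)) * 0.01   # scores -> knowledge vector
W_a1 = rng.standard_normal((1024, C + K)) * 0.01     # attention FC layer 1
W_a2 = rng.standard_normal((C, 1024)) * 0.01         # attention FC layer 2

s_upper = rng.standard_normal(c_upper)    # upper-level prediction score vector
k = W_embed @ s_upper                     # 1024-d semantic knowledge vector
feat = rng.standard_normal((C, W * H))    # phi_i output, sites flattened

# Splice k onto every site, then two FC layers per site -> attention vector.
spliced = np.vstack([feat, np.tile(k[:, None], (1, W * H))])  # (C+K, W*H)
h = np.tanh(W_a1 @ spliced)               # nonlinearity is an assumption
alpha = 1 / (1 + np.exp(-(W_a2 @ h)))     # per-site attention coefficients
weighted = (alpha * feat).reshape(C, W, H)  # the "⊙" weighting -> f_i

assert weighted.shape == (2048, 14, 14)
```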
In the score fusion submodule 203, the feature maps output by φ_i(·) and ψ_i(·) pass through a score fusion operation and the corresponding score vector is output.
Specifically, the score fusion process is expressed as follows:

S = (fc_1 + fc_2 + fc_cat) / 3

where fc_1, fc_2 and fc_cat are c × 1 dimensional vectors; the first two are obtained directly by passing the feature maps output by φ_i(·) and ψ_i(·) through the fully connected layers fc2 and fc1 respectively, and the latter is obtained by connecting fc_1 and fc_2 in series and then applying a fully connected layer fc_concate operation, yielding the same dimension as fc_1 and fc_2, i.e. c × 1.
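A minimal numpy sketch of this fusion formula follows. Global pooling of the two feature maps into 2048-dimensional vectors, the class count c = 200, and the random stand-in weights are assumptions; the averaging of the three score vectors is the formula from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
c = 200                          # class count at this level (illustrative)
g1 = rng.standard_normal(2048)   # pooled feature from phi_i (guided path)
g2 = rng.standard_normal(2048)   # pooled feature from psi_i (global path)

W1 = rng.standard_normal((c, 2048)) * 0.01   # stand-in for fc layer on g1
W2 = rng.standard_normal((c, 2048)) * 0.01   # stand-in for fc layer on g2
Wc = rng.standard_normal((c, 2 * c)) * 0.01  # stand-in for fc_concate

fc_1 = W1 @ g1
fc_2 = W2 @ g2
fc_cat = Wc @ np.concatenate([fc_1, fc_2])   # series connection of fc_1, fc_2

S = (fc_1 + fc_2 + fc_cat) / 3               # the fusion formula in the text
assert S.shape == (c,)
```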
In particular, in an embodiment of the invention, since the topmost-level classification has no upper-level semantics to guide it, its network structure is actually as shown in Fig. 6: except that the last fully connected layer fc1 corresponds to the number of categories of that level, the parameter settings of its other layers are consistent with the original ResNet-50, and are not repeated here.
Fig. 7 is a flow chart of the steps of the implementation method of the hierarchical semantic embedding model for fine-grained object recognition of the present invention. When training the HSE model, the present invention takes the normal class labels as the optimization target and uses the cross-entropy loss function as the objective function of the optimization. Specifically, the prediction score vector of the i-th level is normalized with the softmax function:

p_i = softmax(s_i / T)

It needs to be particularly pointed out that the softmax function here and the softmax function mentioned above differ only in the numerical setting of the temperature coefficient; in the implementation here the temperature coefficient T is set to 1.
For an image sample whose correct label at the current level's classification is c_i, its loss value can be expressed as:

L_cls^(i) = -log p_i[c_i]

Similarly, summing L_cls^(i) over all samples gives the overall classification loss value L_cls of the entire training set.
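The temperature-scaled softmax and per-sample cross-entropy loss described above can be written out directly; the toy score vector and label index are illustrative.

```python
import numpy as np

def softmax_T(s, T=1.0):
    # Temperature-scaled softmax; T = 1 for the classification loss here.
    z = (s - s.max()) / T          # max-shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

s_i = np.array([2.0, 0.5, -1.0])   # toy prediction score vector for level i
p = softmax_T(s_i, T=1.0)
c_i = 0                            # index of the correct class at this level
loss = -np.log(p[c_i])             # cross-entropy loss for one sample

assert abs(p.sum() - 1.0) < 1e-9
assert loss > 0
```

A higher temperature T flattens p, which matters when the same softmax is reused with T ≠ 1 for the soft-target regularization mentioned earlier.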
Specifically, as shown in Fig. 7, the implementation method of the hierarchical semantic embedding model for fine-grained object recognition of the present invention comprises the following steps:
Step S1: perform hierarchical annotation on each training datum.
Taking images of birds as an example, hierarchical annotation information in addition to the images needs to be prepared. For example, if the categories of the 4 levels of order, family, genus and species of birds are to be annotated, each training/test datum should include: the image, the order-level class label, the family-level class label, the genus-level class label and the species-level class label.
Step S2: taking the weighted combination of the classification loss function and the regularization loss function as the objective function for optimizing the HSE model, train the branch network corresponding to each level step by step, from the 1st-level classification to the N-th-level classification in turn.
When training the branch network corresponding to a certain level's classification, the prediction score vector of the level above it must first be obtained. Therefore, this step trains the branch network corresponding to each level step by step, from the 1st-level classification to the N-th-level classification in turn. Since the parameters of the trunk network are shared by all branches, in this step it is temporarily unnecessary to optimize them; it is only necessary to initialize the parameters of the trunk network with those of a ResNet-50 network model pre-trained on the ImageNet dataset. In this step the parameters of the trunk network are always kept fixed and need no optimization or updating.
When training the branch network corresponding to the i-th hierarchy level, the HSE model already integrates the network structures corresponding to the preceding i-1 levels; the parameters of the first i-1 branch networks in the HSE model are therefore initialized from the previously trained branch network models of those levels. For the i-th branch network, the 9 parameter layers involved in the sub-networks ψi(·) and φ(·) are likewise initialized with the parameters of a ResNet-50 model pre-trained on the ImageNet dataset. In addition, the semantic embedding function and a(·) are composed of fully connected layers whose parameters are initialized with the Xavier algorithm. The optimization objective function of the branch network is:
L^(i) = L_cls^(i) + γ·L_reg^(i)
In the formula, γ is a balance parameter that balances the influence of the classification loss term and the regularization constraint loss term on the network parameters. Since the gradients produced by the regularization constraint loss term are comparatively small in magnitude, a relatively large weight must be set (γ = 2 is used in the specific embodiment of the invention).
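The weighted combination above, and its sum over all levels used later in the joint optimization of step S3, can be sketched as follows (a minimal sketch; the losses are taken as precomputed scalars):

```python
def branch_objective(cls_loss, reg_loss, gamma=2.0):
    """Objective for one branch network: classification loss plus the
    regularization-constraint loss weighted by the balance parameter
    gamma (gamma = 2 in the embodiment, offsetting the smaller gradient
    magnitude of the regularization term)."""
    return cls_loss + gamma * reg_loss

def joint_objective(cls_losses, reg_losses, gamma=2.0):
    """Joint objective over all N hierarchy levels: the sum of the
    per-level branch objectives."""
    return sum(branch_objective(c, r, gamma)
               for c, r in zip(cls_losses, reg_losses))
```

For the top level, which introduces no upper-level semantic knowledge, only the classification term would be used.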
It should be pointed out that, since the top-level classification introduces no upper-level semantic knowledge, its corresponding branch network needs only the classification loss term as the objective function for parameter optimization.
In the specific embodiment of the invention, training images are rescaled to 512 × 512. Data augmentation consists of randomly cropping 448 × 448 regions for training and applying horizontal flips to the training samples. For optimization, the invention uses the SGD algorithm with a mini-batch strategy, where the batch size is 8, the SGD momentum is 0.9, the weight decay factor is 0.00005, and the initial learning rate is 0.001; after roughly 300 traversals of the training set, the learning rate is divided by 10 and training continues.
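The training hyperparameters stated above can be collected into a configuration sketch with a piecewise-constant learning-rate schedule (illustrative only; the config keys are not from the patent):

```python
# Training hyperparameters as stated in the embodiment (sketch).
config = {
    "input_size": 512,         # images rescaled to 512x512
    "crop_size": 448,          # random 448x448 crops + horizontal flips
    "batch_size": 8,
    "momentum": 0.9,
    "weight_decay": 0.00005,
    "initial_lr": 0.001,
    "lr_decay_factor": 10,     # learning rate divided by 10
    "decay_after_epochs": 300  # after ~300 passes over the training set
}

def learning_rate(epoch, cfg=config):
    """Piecewise-constant schedule: keep the initial learning rate,
    then divide it by 10 once ~300 traversals are complete."""
    if epoch < cfg["decay_after_epochs"]:
        return cfg["initial_lr"]
    return cfg["initial_lr"] / cfg["lr_decay_factor"]
```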
Step S3: after all branch networks have received preliminary training, jointly optimize all parameters of the entire complete HSE model. The objective function of the joint optimization is:
L = Σ_{i=1..N} (L_cls^(i) + γ·L_reg^(i))
During this training, apart from using a smaller learning rate of 0.00001, the invention keeps the same data augmentation methods and the same hyperparameter configuration as in the previous step, which are not repeated here.
It should be noted here that the core network of the invention uses the ResNet-50 network structure; similarly, other general convolutional neural network structures, such as VGG16, may be used instead.
The network structure cited by the invention has 4 levels; in fact, the number of levels depends only on the number of levels in the dataset's taxonomic hierarchy, and the invention applies equally regardless of how many levels there are.
One of the loss functions used when training the model of the invention is the KL divergence; in fact, general distance metric functions, such as the Euclidean distance, are equally applicable.
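The two interchangeable distance measures mentioned above can be sketched for discrete score distributions (a minimal sketch; `eps` guards against log of zero and is an implementation detail, not from the patent):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions,
    used as the regularization constraint between levels."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))

def euclidean_distance(p, q):
    """Drop-in alternative distance metric, as the text notes."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
```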
In conclusion a kind of Layer semantics incorporation model finely identified for object of the present invention and its implementation use
The hierarchical structure of object classification is embedded into deep neural network model as a kind of semantic information, and by this semantic information
Feature representation, solve rely on additional information study-leading object fining identification technology scheme in additional information mark at
This high problem reduces the complexity of model.
The above-described embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Those skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. The scope of the invention should therefore be as listed in the claims.
Claims (10)
1. A hierarchical semantic embedding model for fine-grained object recognition, comprising:
a core network for extracting shallow features from an input image and outputting them, in the form of feature maps, to each branch network; and
several branch networks for performing further deep feature extraction on the shallow image feature maps output by the core network, so that the output feature maps suit the recognition task of the level corresponding to each branch network, and for realizing, by introducing a semantic knowledge embedding mechanism, the guidance of upper-level semantic knowledge over the feature learning of lower-level branch networks.
2. The hierarchical semantic embedding model for fine-grained object recognition of claim 1, wherein the branch network performs secondary feature expression on the feature map from the core network to generate a new branch feature map; an attention weight map is learned by combining the score vector predicted by the upper level with the branch feature map of its lower level; and the attention weight map is applied to the branch feature map to finally generate a weighted branch feature map, with which the label distribution of the level's categories is predicted.
3. The hierarchical semantic embedding model for fine-grained object recognition of claim 1, wherein the core network uses the layer4_x layer of the ResNet-50 network structure and the layers preceding it, 41 parameter layers in total, and the parameters of the core network are shared by the prediction networks of all levels.
4. The hierarchical semantic embedding model for fine-grained object recognition of claim 1, wherein the branch network comprises:
a deep feature extraction sub-module for performing further deep feature extraction on the feature map output by the core network, and for outputting both a feature representation guided by upper-level semantic knowledge and a feature representation without guidance;
an upper-level semantic knowledge embedding sub-module, which maps the score vector s_{i-1} predicted by the upper level, through a fully connected layer, into a semantic knowledge representation vector, concatenates this vector at every position in the W × H plane of the feature map output by the deep feature extraction sub-module, learns an attention coefficient vector from the concatenated feature map through an attention model, and applies the attention coefficient vector to the feature map output by the deep feature extraction sub-module to obtain a weighted feature map, where W and H denote width and height respectively; and
a score fusion sub-module for taking the feature maps output by the upper-level semantic knowledge embedding sub-module and the deep feature extraction sub-module and outputting the corresponding score vector through a score fusion operation.
5. The hierarchical semantic embedding model for fine-grained object recognition of claim 4, wherein the deep feature extraction sub-module uses the layer5_x layer structure of the ResNet-50 network; the layer5_x structure is composed of 3 residual modules and is reused twice, once oriented toward the upper-level semantic knowledge embedding sub-module and once toward the global feature representation.
6. The hierarchical semantic embedding model for fine-grained object recognition of claim 4, wherein the attention model successively maps, with two fully connected layers, each position in the W × H plane of the concatenated feature map to the corresponding dimension, finally obtaining the attention coefficient vector.
7. The hierarchical semantic embedding model for fine-grained object recognition of claim 4, wherein the score fusion process of the score fusion sub-module is as follows:
S = (fc_1 + fc_2 + fc_cat) / 3
where fc_1, fc_2, and fc_cat are c × 1 dimensional vectors; the first two are obtained by passing the feature maps output by the upper-level semantic knowledge embedding sub-module and the deep feature extraction sub-module each through a fully connected layer, and the last is obtained by concatenating fc_1 and fc_2 in series and passing the result through a fully connected layer fc_concate operation, yielding the same dimension as fc_1 and fc_2.
8. The hierarchical semantic embedding model for fine-grained object recognition of claim 4, wherein, in the network structure of the top-level branch network, except that the last fully connected layer corresponds to the number of categories at that level, the parameter settings of the other layers are consistent with the original ResNet-50 network.
9. An implementation method of a hierarchical semantic embedding model for fine-grained object recognition, comprising the following steps:
step S1: performing hierarchical annotation on each training sample;
step S2: using a weighted combination of the classification loss function and the regularization constraint loss function as the objective function for optimizing the HSE model, training the branch network corresponding to each level, proceeding level by level from the 1st hierarchy level to the N-th hierarchy level; and
step S3: after all branch networks have received preliminary training, jointly optimizing all parameters of the entire complete HSE model.
10. The implementation method of the HSE model for fine-grained object recognition of claim 9, wherein the optimization objective function of the branch network is:
L^(i) = L_cls^(i) + γ·L_reg^(i)
where γ is a balance parameter for balancing the influence of the classification loss term L_cls^(i) and the regularization constraint loss term L_reg^(i) on the network parameters.
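As an illustration of the score fusion in claim 7 (a minimal sketch, not part of the claims: in the model, fc_1, fc_2, and fc_cat would be produced by fully connected layers; here they are taken as plain c-dimensional score lists):

```python
def fuse_scores(fc_1, fc_2, fc_cat):
    """Element-wise average of the three c-dimensional score vectors:
    S = (fc_1 + fc_2 + fc_cat) / 3."""
    assert len(fc_1) == len(fc_2) == len(fc_cat)
    return [(a + b + c) / 3.0 for a, b, c in zip(fc_1, fc_2, fc_cat)]
```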
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810924288.XA CN109102024B (en) | 2018-08-14 | 2018-08-14 | Hierarchical semantic embedded model for fine object recognition and implementation method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109102024A true CN109102024A (en) | 2018-12-28 |
CN109102024B CN109102024B (en) | 2021-08-31 |
Family
ID=64849727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810924288.XA Active CN109102024B (en) | 2018-08-14 | 2018-08-14 | Hierarchical semantic embedded model for fine object recognition and implementation method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109102024B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100215277A1 (en) * | 2009-02-24 | 2010-08-26 | Huntington Stephen G | Method of Massive Parallel Pattern Matching against a Progressively-Exhaustive Knowledge Base of Patterns |
CN106682060A (en) * | 2015-11-11 | 2017-05-17 | 奥多比公司 | Structured Knowledge Modeling, Extraction and Localization from Images |
CN107979606A (en) * | 2017-12-08 | 2018-05-01 | 电子科技大学 | It is a kind of that there is adaptive distributed intelligence decision-making technique |
CN108229543A (en) * | 2017-12-22 | 2018-06-29 | 中国科学院深圳先进技术研究院 | Image classification design methods and device |
Non-Patent Citations (3)
Title |
---|
TIANSHUI CHEN ET AL: "Knowledge-Embedded Representation Learning for Fine-Grained Image Recognition", 《27TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE》 * |
TIANSHUI CHEN ET AL: "Recurrent Attentional Reinforcement Learning for Multi-label Image Recognition", 《AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE》 * |
YAO XIANG: "Cross-view action recognition based on non-linear knowledge transfer", 《JOURNAL OF CHONGQING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS (NATURAL SCIENCE EDITION)》 *
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109961107B (en) * | 2019-04-18 | 2022-07-19 | 北京迈格威科技有限公司 | Training method and device for target detection model, electronic equipment and storage medium |
CN109961107A (en) * | 2019-04-18 | 2019-07-02 | 北京迈格威科技有限公司 | Training method, device, electronic equipment and the storage medium of target detection model |
CN110097108B (en) * | 2019-04-24 | 2021-03-02 | 佳都新太科技股份有限公司 | Method, device, equipment and storage medium for identifying non-motor vehicle |
CN110097108A (en) * | 2019-04-24 | 2019-08-06 | 佳都新太科技股份有限公司 | Recognition methods, device, equipment and the storage medium of non-motor vehicle |
CN110288049B (en) * | 2019-07-02 | 2022-05-24 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating image recognition model |
CN110288049A (en) * | 2019-07-02 | 2019-09-27 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating image recognition model |
CN110321970A (en) * | 2019-07-11 | 2019-10-11 | 山东领能电子科技有限公司 | A kind of fine-grained objective classification method of multiple features based on branch neural network |
CN110837856A (en) * | 2019-10-31 | 2020-02-25 | 深圳市商汤科技有限公司 | Neural network training and target detection method, device, equipment and storage medium |
CN113095349A (en) * | 2020-01-09 | 2021-07-09 | 北京沃东天骏信息技术有限公司 | Image identification method and device |
CN111242222A (en) * | 2020-01-14 | 2020-06-05 | 北京迈格威科技有限公司 | Training method of classification model, image processing method and device |
CN111242222B (en) * | 2020-01-14 | 2023-12-19 | 北京迈格威科技有限公司 | Classification model training method, image processing method and device |
CN111711821A (en) * | 2020-06-15 | 2020-09-25 | 南京工程学院 | Information hiding method based on deep learning |
CN111814920A (en) * | 2020-09-04 | 2020-10-23 | 中国科学院自动化研究所 | Fine classification method and system for multi-granularity feature learning based on graph network |
CN112990147A (en) * | 2021-05-06 | 2021-06-18 | 北京远鉴信息技术有限公司 | Method and device for identifying administrative-related images, electronic equipment and storage medium |
CN113642415A (en) * | 2021-07-19 | 2021-11-12 | 南京南瑞信息通信科技有限公司 | Face feature expression method and face recognition method |
CN113642415B (en) * | 2021-07-19 | 2024-06-04 | 南京南瑞信息通信科技有限公司 | Face feature expression method and face recognition method |
Also Published As
Publication number | Publication date |
---|---|
CN109102024B (en) | 2021-08-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109102024A (en) | A kind of Layer semantics incorporation model finely identified for object and its implementation | |
CN108596882B (en) | The recognition methods of pathological picture and device | |
CN110287800A (en) | A kind of remote sensing images scene classification method based on SGSE-GAN | |
CN110377686A (en) | A kind of address information Feature Extraction Method based on deep neural network model | |
CN110532920A (en) | Smallest number data set face identification method based on FaceNet method | |
CN109299274A (en) | A kind of natural scene Method for text detection based on full convolutional neural networks | |
CN107330444A (en) | A kind of image autotext mask method based on generation confrontation network | |
CN104933428B (en) | A kind of face identification method and device based on tensor description | |
Weng et al. | Semi-supervised vision transformers | |
CN110533024A (en) | Biquadratic pond fine granularity image classification method based on multiple dimensioned ROI feature | |
CN110516530A (en) | A kind of Image Description Methods based on the enhancing of non-alignment multiple view feature | |
CN111444343A (en) | Cross-border national culture text classification method based on knowledge representation | |
CN106372597B (en) | CNN Vehicle Detection method based on adaptive contextual information | |
CN107526798A (en) | A kind of Entity recognition based on neutral net and standardization integrated processes and model | |
CN110334724A (en) | Remote sensing object natural language description and multiple dimensioned antidote based on LSTM | |
CN109800768A (en) | The hash character representation learning method of semi-supervised GAN | |
CN107577983A (en) | It is a kind of to circulate the method for finding region-of-interest identification multi-tag image | |
Zhang et al. | Knowledge amalgamation for object detection with transformers | |
CN109948628A (en) | A kind of object detection method excavated based on identification region | |
Qiu et al. | Semantic-visual guided transformer for few-shot class-incremental learning | |
CN109886105A (en) | Price tickets recognition methods, system and storage medium based on multi-task learning | |
Xu et al. | Pixdet: Prohibited item detection in x-ray image based on whole-process feature fusion and local-global semantic dependency interaction | |
Wang et al. | Detection of key structure of auroral images based on weakly supervised learning | |
Wang et al. | Generative Adversarial Networks Based on Dynamic Word-Level Update for Text-to-Image Synthesis | |
Lee et al. | Boundary-aware camouflaged object detection via deformable point sampling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||