CN109102024A - Hierarchical semantic embedding model for fine-grained object recognition and implementation method thereof - Google Patents

Hierarchical semantic embedding model for fine-grained object recognition and implementation method thereof

Info

Publication number
CN109102024A
Authority
CN
China
Prior art keywords
layer
feature map
branch network
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810924288.XA
Other languages
Chinese (zh)
Other versions
CN109102024B (en)
Inventor
聂琳 (Nie Lin)
吴文熙 (Wu Wenxi)
陈添水 (Chen Tianshui)
王青 (Wang Qing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201810924288.XA
Publication of CN109102024A
Application granted
Publication of CN109102024B
Legal status: Active (granted)

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/23: Clustering techniques
    • G06F 18/231: Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a hierarchical semantic embedding model for fine-grained object recognition and an implementation method thereof. The hierarchical semantic embedding model comprises: a trunk network, which extracts shallow features from an input image and outputs them in the form of feature maps to each branch network; and several branch networks, which perform further deep feature extraction on the shallow image feature maps output by the trunk network, so that the feature maps they output suit the recognition task of the level to which each branch network corresponds, and which, by introducing a semantic knowledge embedding mechanism, realize the guidance of upper-level semantic knowledge over the feature learning of lower-level branch networks. The invention solves the problem of the high annotation cost of additional information in fine-grained object recognition schemes that rely on additional information to guide learning.

Description

Hierarchical semantic embedding model for fine-grained object recognition and implementation method thereof
Technical field
The present invention relates to the technical field of fine-grained object recognition, and more particularly to a hierarchical semantic embedding (Hierarchical Semantic Embedding, HSE) model for fine-grained object recognition and a training method thereof.
Background technique
In recent years, changes in deep visual computing have ignited demand for visual analysis technology across many fields: e-commerce urgently needs accurate online retrieval of clothing images, the security industry urgently needs to accurately match vehicles involved in cases, and agriculture and environmental protection urgently need fine recognition of wild animals and plants. These demands often require that a recognition algorithm be able to finely distinguish the subordinate categories of a certain basic class; this technology is usually called fine-grained object recognition.
In general, the technical difficulties of fine-grained object recognition are:
1) Subtle inter-class differences: for objects from similar categories, the visual differences are in many cases very small, and some are difficult even for humans to distinguish;
2) Pronounced intra-class differences: for objects from the same category, owing to scale, viewpoint, occlusion and varied backgrounds, the objects exhibit large visual differences.
Currently, fine-grained recognition technology mainly distinguishes objects on the basis of several discriminative regions. There are two main classes of existing schemes:
First, using an attention mechanism to automatically mine discriminative regions;
Second, using additional information to guide model learning, so as to better represent the discriminative regions.
However, the former is usually implemented with multiple networks and repeated computation, which raises model complexity, and, for lack of effective supervision or guidance, leads to inaccurate localization of the discriminative regions; the latter, although it effectively improves the discriminability of key regions, introduces additional information whose annotation cost is often very high.
Summary of the invention
In order to overcome the above deficiencies of the existing technology, an object of the present invention is to provide a hierarchical semantic embedding model for fine-grained object recognition and an implementation method thereof, so as to solve the problem of the high annotation cost of additional information in fine-grained object recognition schemes that rely on additional information to guide learning.
In view of the above and other objects, the present invention proposes a hierarchical semantic embedding model for fine-grained object recognition, comprising:
a trunk network for extracting shallow features from an input image and outputting them in the form of feature maps to each branch network;
several branch networks for performing further deep feature extraction on the shallow image feature maps output by the trunk network, so that the feature maps they output suit the recognition task of the level to which each branch network corresponds, and for realizing, by introducing a semantic knowledge embedding mechanism, the guidance of upper-level semantic knowledge over the feature learning of lower-level branch networks.
Preferably, the branch network performs a secondary feature representation on the feature map from the trunk network to generate a new branch feature map; by combining the score vector predicted by the upper level with the branch feature map of the lower level, it learns an attention weight map, applies the attention weight map to the branch feature map, and finally generates a weighted branch feature map, with which it predicts the label distribution of the categories at that level.
Preferably, the trunk network uses the layer4_x layers of the ResNet-50 network structure and the input layers before them, 41 parameter layers in total, and the parameters of the trunk network are shared by the prediction networks of all levels.
Preferably, the branch network comprises:
a deep feature extraction submodule for performing deep feature extraction on the feature map output by the trunk network, and outputting both a feature representation under the guidance of upper-level semantic knowledge and a feature representation without guidance;
an upper-level semantic knowledge embedding submodule, which maps the score vector s_{i-1} predicted by the upper level through a fully connected layer into a semantic knowledge representation vector, concatenates this vector with every spatial location in the W × H plane of the feature map output by the deep feature extraction submodule, learns from the concatenated feature map, through an attention model, an attention coefficient vector, and applies this attention coefficient vector to the feature map output by the deep feature extraction submodule to obtain a weighted feature map, where W and H denote width and height respectively;
a score fusion submodule for passing the feature maps output by the upper-level semantic knowledge embedding submodule and the deep feature extraction submodule through a score fusion operation and outputting the corresponding score vector.
Preferably, the deep feature extraction submodule uses the layer5_x layer structure of the ResNet-50 network; the layer5_x layer structure consists of 3 residual modules and is reused twice, once oriented to the upper-level semantic knowledge embedding submodule and once to the representation of global features.
Preferably, the attention model maps each spatial location in the W × H plane of the concatenated feature map, through two successive fully connected layers, step by step to the corresponding dimension, finally obtaining the attention coefficient vector.
Preferably, the score fusion process of the score fusion submodule is as follows:
s = (fc_1 + fc_2 + fc_cat) / 3
where fc_1, fc_2 and fc_cat are c × 1 vectors; the first two are obtained directly by passing the feature maps output by the upper-level semantic knowledge embedding submodule and the deep feature extraction submodule each through one fully connected layer, and the latter is obtained by concatenating fc_1 and fc_2 in series and then passing them through one fully connected layer fc_concate, yielding the same dimension as fc_1 and fc_2.
Preferably, in the network structure of the top-level classification of the branch networks, except that the last fully connected layer corresponds to the number of categories at that level, the parameter settings of the other layers are consistent with the original ResNet-50 network.
In order to achieve the above objects, the present invention also provides an implementation method of the hierarchical semantic embedding model for fine-grained object recognition, comprising the following steps:
Step S1: performing hierarchical annotation on each training datum;
Step S2: using the weighted combination of the classification loss function and the regularization constraint loss function as the objective function for optimizing the HSE model, and training, level by level from the 1st-level classification to the Nth-level classification, the branch network corresponding to each level;
Step S3: after all branch networks have received preliminary training, performing joint optimization on all parameters of the entire HSE model.
Preferably, the optimization objective function of the branch network is:

L_i = L_cls^i + γ · L_reg^i

where γ is a balance parameter used to balance the influence of the classification loss term L_cls^i and the regularization constraint loss term L_reg^i on the network parameters.
Compared with the prior art, the hierarchical semantic embedding model for fine-grained object recognition and its implementation method of the present invention use the hierarchical structure of object categories as a kind of semantic information and embed this semantic information into the feature representation of a deep neural network model, solving the problem of the high annotation cost of additional information in fine-grained object recognition schemes that rely on additional information to guide learning, and reducing model complexity.
Brief description of the drawings
Fig. 1 is a system architecture diagram of a hierarchical semantic embedding model for fine-grained object recognition according to the present invention;
Fig. 2 is a schematic diagram of the trunk network in a specific embodiment of the present invention;
Fig. 3 is a comparison of the branch network structures of ResNet-50 and the present invention;
Fig. 4 is a schematic diagram of the upper-level semantic embedding representation process in a specific embodiment of the present invention;
Fig. 5 is a schematic diagram of the attention mechanism of the branch networks in a specific embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the top-level branch network in a specific embodiment of the present invention;
Fig. 7 is a flow chart of the steps of an implementation method of a hierarchical semantic embedding model for fine-grained object recognition according to the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described below by way of specific examples with reference to the drawings; those skilled in the art can readily understand further advantages and effects of the invention from the contents disclosed in this specification. The present invention may also be implemented or applied through other different specific examples, and the details in this specification may be modified and changed in various ways from different perspectives and for different applications without departing from the spirit of the invention.
Fig. 1 is a system architecture diagram of a hierarchical semantic embedding model for fine-grained object recognition according to the present invention. In the present invention, the hierarchical semantic embedding model (Hierarchical Semantic Embedding, HSE for short) involves three aspects: the extraction of deep image features, the embedded representation learning of semantic knowledge, and the constraint imposed by semantic knowledge on the semantic space of the prediction results. The HSE model of the present invention is an algorithmic model based on deep learning technology and depends on deep neural networks; deep representation learning runs through the entire HSE framework, which exploits hierarchical semantic knowledge in two ways, embodied respectively in the embedding of semantic knowledge during feature representation and in the use of semantic knowledge to regularize the prediction results during model training. Specifically, as shown in Fig. 1, a hierarchical semantic embedding (HSE) model for fine-grained object recognition according to the present invention comprises:
a trunk network 1 for extracting shallow features from an input image and outputting them in the form of feature maps to each branch network 2; that is to say, the input image passes through the trunk network 1 for preliminary image feature extraction, and the result is output in the form of feature maps to the branch networks 2;
several branch networks 2 for performing further deep feature extraction on the shallow image feature maps output by the trunk network, so that the feature maps they output suit the recognition task of the level to which each branch network corresponds, and for realizing, by introducing a semantic knowledge embedding mechanism, the guidance of upper-level semantic knowledge over the feature learning of lower-level branch networks. That is, the feature maps output by the trunk network 1 are separately input into the branch networks 2 corresponding to the categories of each level for further feature representation and are output in the form of feature vectors; these feature vectors, through the computation of a softmax classifier, yield the predicted label distribution of the categories at each level.
In the present invention, the semantic knowledge embedding mechanism is embodied in the branch networks 2, which adopt an attention mechanism guided by semantic knowledge. Specifically, a branch network 2 first performs a secondary feature representation on the feature map from the trunk network 1 and generates a new branch feature map, which is essentially a stack of several feature maps, i.e. a 3-dimensional tensor. By combining the score vector predicted by the upper level with the branch feature map of the lower level, the branch network learns an attention weight map, likewise essentially a stack of several feature maps, i.e. a three-dimensional tensor. The attention weight map expresses the importance of each spatial position of the new feature map produced by the branch network for recognizing the target category: positions of higher discriminability attract more attention, and the weights at the corresponding positions of the weight map are larger. This weight map is applied to the branch feature map, finally generating a weighted branch feature map, with which the label distribution of the categories at that level is predicted.
It can be seen that, by sharing shallow network parameters, the 'trunk-branch' multi-branch network structure of the present invention reduces computational overhead, while the multiple independent branches allow the model to accommodate the optimization objectives of different tasks.
It should be noted here that the regularization of the semantic space of the prediction results by semantic knowledge is embodied in the training process of the HSE model: the present invention takes the score vector predicted by the upper level as a soft target that constrains the prediction results of the lower level to conform to the semantic rules of the classification tree, thereby regularizing the semantic space of the lower-level prediction results, as will be explained in detail later.
Specifically, the following is the basic workflow of the hierarchical semantic embedding model of the present invention:
(1) input an image I;
(2) the trunk network Tr extracts the feature map of image I, denoted f_I;
(3) f_I is input into the branch network Br_1 of the highest level;
(4) Br_1 performs a forward computation on f_I to obtain the prediction score vector s_1 of the highest-level classification;
(5) from the i-th level to the N-th level (i ≥ 2):
(5.1) f_I is input into the branch network Br_i of the i-th level;
(5.2) the prediction score vector s_{i-1} of the (i-1)-th level classification is input into the branch network Br_i;
(5.3) under the guidance of s_{i-1}, Br_i performs a forward computation on f_I to obtain the prediction score vector s_i of the i-th level.
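The workflow above maps directly onto code. The following is a minimal sketch of the HSE forward pass, assuming `trunk` and `branches` are callables organized as described; the function and variable names are illustrative assumptions, not the patent's code:

```python
import torch

def hse_forward(trunk, branches, image):
    """Sketch of the HSE forward pass: trunk, then Br_1 through Br_N.

    `trunk` maps an image to the shared feature map f_I; `branches[0]`
    consumes f_I alone, and every later branch also consumes the score
    vector predicted by the level above it.
    """
    f_I = trunk(image)                   # step (2): shared shallow feature map
    scores = [branches[0](f_I)]          # step (4): s_1, unguided top level
    for br in branches[1:]:              # step (5): levels i = 2 .. N
        s_prev = scores[-1]              # s_{i-1} from the level above
        scores.append(br(f_I, s_prev))   # s_i computed under s_{i-1} guidance
    return scores                        # one prediction score vector per level
```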
Compared with the prior art, the present invention achieves 1.6% higher accuracy than the previously best algorithm on the Caltech-UCSD Birds dataset, and 2.3% higher accuracy than the previously best algorithm on the VegFru dataset. In addition, on the Caltech-UCSD Birds dataset, the present invention can save 20% of the training data while matching the accuracy of the previously best algorithm.
The present invention is further illustrated below through a specific embodiment:
Embodiment:
1. Hierarchical annotation of the data
Taking images of birds as an example, hierarchical annotation information beyond the images themselves needs to be prepared. For example, if categories at the 4 levels of order, family, genus and species are to be annotated, each training/test datum should include: the image, an order-level class label, a family-level class label, a genus-level class label and a species-level class label.
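As a concrete illustration, one hierarchically annotated record could be organized as follows; the field names and sample values are illustrative assumptions, since the patent prescribes the content of the annotation but not its format:

```python
# One training record for a bird image with labels at 4 taxonomic levels.
sample = {
    "image_path": "images/albatross_0001.jpg",     # hypothetical path
    "labels": {
        "order":   "Procellariiformes",
        "family":  "Diomedeidae",
        "genus":   "Phoebastria",
        "species": "Phoebastria immutabilis",
    },
}
```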
2. Realization of the HSE model
The HSE model consists of a trunk network (trunk net) and branch networks (branch nets). The role of the trunk network is mainly the shallow feature extraction of the input image. The branch networks serve two purposes: first, performing further deep feature extraction on the shallow image feature maps output by the trunk network, so that the feature maps they output suit the recognition task of the level to which each branch network corresponds; second, realizing, by introducing a knowledge embedding module, the guidance of upper-level semantic knowledge over the feature learning of lower-level branch networks. This 'trunk-branch' multi-branch network structure reduces computational overhead by sharing shallow network parameters, while the multiple independent branches allow the model to accommodate the optimization objectives of different tasks. The network structures of the trunk network and the branch networks are described in detail below.
1) Trunk network
In a specific embodiment of the present invention, the trunk network is built on the layer structure of the residual network; its comparison with the ResNet-50 network structure is shown in Fig. 2. In the figure, conv1 is a single-layer convolution operation, and layer2_x to layer5_x are layer structures formed by stacking several residual modules, each containing several convolution layers. In the structure of ResNet-50, layer2_x to layer5_x consist of 3, 4, 6 and 3 residual modules respectively, each residual module containing 3 convolution layers, 48 layers in total; together with the bottom conv1 and the output fully connected layer fc, they form a 50-layer network structure.
The trunk network of the HSE model of the present invention uses only the layer4_x layers of the ResNet-50 network structure and the input layers before them, 41 parameter layers in total. Table 1 describes the specific network parameters of the trunk network of the present invention; since the HSE model is oriented to class prediction at multiple levels, the parameters of the trunk network are shared by the prediction networks of all levels. Given an input picture, the trunk network performs preliminary shallow feature extraction on it and outputs the result in the form of a feature map. In a specific embodiment of the present invention, for an input picture of resolution 448 × 448, the trunk network outputs a feature map of spatial size 28 × 28.
Table 1. Key parameters of the trunk network of the HSE model
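In PyTorch terms, such a trunk can be sketched by truncating a pre-trained ResNet-50 after the block the patent calls layer4_x (torchvision's `layer3`); the use of torchvision and the exact slicing below are assumptions consistent with the text, not the patent's own code:

```python
import torch
import torchvision

# ResNet-50 kept up to and including torchvision's layer3 (the patent's
# layer4_x); the patent counts 41 parameter layers in this prefix.
resnet = torchvision.models.resnet50(weights="IMAGENET1K_V1")
trunk = torch.nn.Sequential(
    resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
    resnet.layer1, resnet.layer2, resnet.layer3,
)

x = torch.randn(8, 3, 448, 448)   # a batch of 8 pictures at 448 x 448
f_I = trunk(x)
print(f_I.shape)                  # torch.Size([8, 1024, 28, 28])
```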
2) Branch networks
Fig. 3 compares the branch network structures of ResNet-50 and the present invention. In the branch network structure diagram, each level of classification corresponds to one branch network. Specifically, a branch network comprises: a deep feature extraction submodule 201, an upper-level semantic knowledge embedding submodule 202 and a score fusion submodule 203.
In order to keep consistency with the ResNet-50 network structure and facilitate fair comparison in subsequent experiments, the deep feature extraction submodule 201 retains the layer5_x layer structure of ResNet-50. In a specific embodiment of the present invention, the layer5_x layer structure consists of 3 residual modules; it performs deep feature extraction on the feature map output by the trunk network, and outputs both a feature representation under the guidance of upper-level semantic knowledge (emphasizing local discriminability) and a feature representation without guidance (emphasizing global discriminability). Specifically, taking the feature map output by the trunk network (28 × 28 resolution) as input, after the arithmetic processing of the layer5_x layer structure of the deep feature extraction submodule 201, the output feature map has size 14 × 14. The dimensions of the output feature map are actually n × C × W × H, where in a specific embodiment of the present invention n = 8 is the batch size, C = 2048 is the number of channels, and W and H, the width and height respectively, are both 14. It should be particularly noted that the layer5_x structure is reused twice in the branch network, once oriented to the upper-level semantic knowledge embedding submodule and once to the representation of global features. To distinguish the two layer5_x layer structures, the former is denoted φ_i(·) and the latter ψ_i(·); φ_i(·) and ψ_i(·) are mutually independent and do not share parameters.
In the upper-level semantic knowledge embedding submodule 202, the score vector s_{i-1} predicted by the upper level first passes through a fully connected layer and is mapped to a 1024-dimensional semantic knowledge representation vector. This vector is then concatenated with every spatial location in the W × H plane of the feature map output by φ_i(·), a concatenation denoted ⊕ in the figure. In implementation, for convenience, the knowledge representation vector can simply be copied over the W × H locations. Fig. 4 illustrates this process.
The concatenated feature map is then learned by an attention model α(·) into an attention coefficient vector. Fig. 5 annotates the processing of the attention model: each spatial location in the W × H plane of the concatenated feature map is mapped step by step, by two successive fully connected layers fc, to 1024 and then 2048 dimensions, finally yielding an attention coefficient vector (as in the rightmost graph of Fig. 5).
The obtained attention coefficient vector is then applied to the feature map output by φ_i(·): '⊙' in Fig. 5 denotes the element-wise multiplication of the attention coefficient vector with the values at each corresponding position of the feature map output by φ_i(·), which yields the weighted feature map f_i.
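A minimal sketch of this knowledge embedding and attention step is given below. The module boundaries, the ReLU between the two fully connected layers and the sigmoid on the coefficients are assumptions; the patent fixes only the 1024-dimensional knowledge vector, the per-location concatenation, and the two fully connected layers mapping to 1024 and then 2048 dimensions:

```python
import torch
import torch.nn as nn

class SemanticAttention(nn.Module):
    """Knowledge-guided attention over a 2048-channel feature map (a sketch)."""

    def __init__(self, n_upper_classes, c_feat=2048, c_know=1024):
        super().__init__()
        self.embed = nn.Linear(n_upper_classes, c_know)  # s_{i-1} -> knowledge vector
        self.fc1 = nn.Linear(c_feat + c_know, 1024)      # first mapping, to 1024-d
        self.fc2 = nn.Linear(1024, c_feat)               # second mapping, to 2048-d

    def forward(self, feat, s_upper):
        # feat: (n, 2048, W, H) from phi_i; s_upper: (n, n_upper_classes)
        n, c, w, h = feat.shape
        k = self.embed(s_upper)                              # (n, 1024)
        k = k[:, :, None, None].expand(n, k.size(1), w, h)   # copy over W x H sites
        cat = torch.cat([feat, k], dim=1)                    # (n, 3072, W, H)
        cat = cat.permute(0, 2, 3, 1)                        # one vector per site
        a = self.fc2(torch.relu(self.fc1(cat)))              # attention coefficients
        a = torch.sigmoid(a).permute(0, 3, 1, 2)             # back to (n, 2048, W, H)
        return feat * a                                      # weighted feature map f_i
```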
In the score fusion submodule 203, the feature maps output by φ_i(·) and ψ_i(·) are passed through a score fusion operation to output the corresponding score vector.
Specifically, the score fusion process is expressed as follows:
s = (fc_1 + fc_2 + fc_cat) / 3
where fc_1, fc_2 and fc_cat are c × 1 vectors; the first two are obtained directly by passing the feature maps output by φ_i(·) and ψ_i(·) through the fully connected layers fc2 and fc1 respectively, and the latter is obtained by concatenating fc_1 and fc_2 in series and then passing them through one fully connected layer fc_concate, yielding the same dimension as fc_1 and fc_2, i.e. c × 1.
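A sketch of the fusion follows; the global average pooling of each feature map to a vector before its fully connected layer is an assumption, since the patent specifies only the fully connected layers and the averaging of the three scores:

```python
import torch
import torch.nn as nn

class ScoreFusion(nn.Module):
    """Score fusion submodule (a sketch): s = (fc_1 + fc_2 + fc_cat) / 3."""

    def __init__(self, c_classes, c_feat=2048):
        super().__init__()
        self.fc1 = nn.Linear(c_feat, c_classes)            # from psi_i's feature map
        self.fc2 = nn.Linear(c_feat, c_classes)            # from phi_i's weighted map
        self.fc_concate = nn.Linear(2 * c_classes, c_classes)

    def forward(self, f_guided, f_global):
        v1 = f_global.mean(dim=(2, 3))                     # assumed global pooling
        v2 = f_guided.mean(dim=(2, 3))
        fc_1 = self.fc1(v1)                                # c x 1 score vector
        fc_2 = self.fc2(v2)                                # c x 1 score vector
        fc_cat = self.fc_concate(torch.cat([fc_1, fc_2], dim=1))
        return (fc_1 + fc_2 + fc_cat) / 3                  # fused score vector s
```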
In particular, in a specific embodiment of the present invention, since the top-level classification has no upper-level semantics to guide it, its network structure is actually as shown in Fig. 6. Except that the last fully connected layer fc1 corresponds to the number of categories at that level, the parameter settings of the other layers of its network structure are consistent with the original ResNet-50 and are not repeated here.
Fig. 7 is a flow chart of the steps of an implementation method of the hierarchical semantic embedding model for fine-grained object recognition according to the present invention. When training the HSE model, the present invention takes the normal class labels as the optimization target and uses the cross-entropy loss function as the objective function to be optimized. Specifically, the prediction score vector of the i-th level is normalized with the softmax function:

p_j = exp(s_j / T) / Σ_k exp(s_k / T)

It needs to be particularly pointed out that the softmax function here and the softmax function mentioned earlier differ only in the numerical setting of the temperature coefficient; in the implementation here, the temperature coefficient T is set to 1.
For a given image sample, assuming its correct label at the current level of classification is c_i, its loss value can be expressed as:

L_cls^i = -log p_{c_i}

Similarly, summing L_cls^i over all samples yields the overall classification loss of the entire training set.
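The two formulas above amount to a temperature-scaled softmax followed by a cross-entropy; a minimal sketch:

```python
import torch
import torch.nn.functional as F

def classification_loss(scores, target, T=1.0):
    """Cross-entropy over a temperature softmax (T = 1 for the hard-label loss).

    scores: (n, c) prediction score vectors of one level; target: (n,) labels.
    Averaging -log p_{c_i} over the batch matches the summed loss up to scale.
    """
    log_p = F.log_softmax(scores / T, dim=1)
    return F.nll_loss(log_p, target)
```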
Specifically, as shown in Fig. 7, an implementation method of the hierarchical semantic embedding model for fine-grained object recognition according to the present invention comprises the following steps:
Step S1: perform hierarchical annotation on each training datum.
Taking images of birds as an example, hierarchical annotation information beyond the images themselves needs to be prepared. For example, if categories at the 4 levels of order, family, genus and species are to be annotated, each training/test datum should include: the image, an order-level class label, a family-level class label, a genus-level class label and a species-level class label.
Step S2: using the weighted combination of the classification loss function and the regularization loss function as the objective function for optimizing the HSE model, train, level by level from the 1st-level classification to the Nth-level classification, the branch network corresponding to each level.
When training the branch network corresponding to a certain level of classification, the prediction score vector of the level above it must first be obtained; therefore, this step trains the branch networks corresponding to the levels one by one, from the 1st-level classification to the Nth-level classification. Since the parameters of the trunk network are shared by all branches, they do not need to be optimized in this step for the time being; it suffices to initialize the parameters of the trunk network with the parameters of a ResNet-50 network model pre-trained on the ImageNet dataset. In this step, the parameters of the trunk network are always kept fixed and need no optimization or updating.
When training the branch network corresponding to the i-th level of classification, the HSE model already integrates the network structures corresponding to the preceding i-1 levels of classification; therefore, the branch network models of the preceding i-1 levels trained earlier are used to initialize the parameters of the first i-1 branch networks in the HSE model. For the i-th level branch network, the parameters of the 9 layers involved in each of the sub-networks ψ_i(·) and φ_i(·) are likewise initialized with the parameters of a ResNet-50 network model pre-trained on the ImageNet dataset. In addition, the realizations of the knowledge embedding mapping and α(·) consist of fully connected layers, whose parameters are initialized with the Xavier algorithm. The optimization objective function of the branch network is:

L_i = L_cls^i + γ · L_reg^i

where γ is a balance parameter used to balance the influence of the classification loss term and the regularization constraint loss term on the network parameters. Since the gradient values produced by the regularization constraint loss term L_reg^i are scaled down in magnitude, a relatively large weight value needs to be set (γ = 2 is used in a specific embodiment of the present invention).
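The regularization term is characterized in this description only as a soft-target constraint, measured by a KL divergence (see the remarks at the end of this description), that makes the lower level's prediction obey the classification tree. The sketch below is one plausible reading under stated assumptions: `child_to_parent` (a mapping from each lower-level class to its upper-level class), the aggregation of child probabilities, and the temperature value are all assumptions, not the patent's exact formulation:

```python
import torch
import torch.nn.functional as F

def regularization_loss(scores_i, scores_upper, child_to_parent, T=4.0):
    """Soft-target KL constraint of the upper level on the lower level (a sketch).

    scores_i: (n, c_i) lower-level scores; scores_upper: (n, c_up) upper-level
    scores; child_to_parent: LongTensor of length c_i mapping each lower-level
    class to its parent class in the taxonomy.
    """
    p_i = F.softmax(scores_i / T, dim=1)
    n, c_up = scores_upper.shape
    p_agg = p_i.new_zeros(n, c_up)
    p_agg.index_add_(1, child_to_parent, p_i)            # sum children per parent
    q_up = F.softmax(scores_upper / T, dim=1).detach()   # upper level as soft target
    return F.kl_div(torch.log(p_agg + 1e-8), q_up, reduction="batchmean")

def branch_objective(loss_cls, loss_reg, gamma=2.0):
    return loss_cls + gamma * loss_reg                   # L_i = L_cls + gamma * L_reg
```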
It should be pointed out that, since the top-level classification introduces no upper-level semantic knowledge, its corresponding branch network only needs to use the classification loss term as the objective function for model parameter optimization.
In a specific embodiment of the present invention, the training images are scaled to size 512 × 512, and the data augmentation means include randomly cropping 448 × 448 regions for training and applying horizontal flipping to the training samples. For optimization, the present invention uses the SGD algorithm with a mini-batch strategy, where the batch size is 8, the momentum term of SGD is 0.9, the weight decay factor is 0.00005 and the initial learning rate is 0.001; after about 300 passes over the training set, the learning rate is decayed by a factor of 10 and training continues.
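This recipe translates into a few lines of configuration; the placeholder model object and the scheduler choice below are assumptions for illustration:

```python
import torch
import torchvision.transforms as T

hse_model = torch.nn.Linear(2048, 200)   # stand-in for the full HSE model

train_tf = T.Compose([
    T.Resize((512, 512)),        # scale training images to 512 x 512
    T.RandomCrop(448),           # randomly crop a 448 x 448 region
    T.RandomHorizontalFlip(),    # horizontal-flip augmentation
    T.ToTensor(),
])

optimizer = torch.optim.SGD(hse_model.parameters(),
                            lr=0.001, momentum=0.9, weight_decay=0.00005)
# After roughly 300 passes over the training set, decay the learning rate 10x.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=300, gamma=0.1)
```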
Step S3: after all branch networks have received preliminary training, perform joint optimization on all parameters of the entire HSE model. The objective function of the joint optimization is the sum of the optimization objective functions of all levels:

L = Σ_{i=1}^{N} L_i

During training, apart from a smaller learning rate of 0.00001, the present invention uses the same data augmentation methods and the same hyperparameter configuration as in the preceding step, which are not repeated here.
It should be noted here that the trunk network of the present invention uses the network structure of ResNet-50; similarly, other general convolutional neural network structures, such as VGG16, can also be used as substitutes.
The network structure cited by the present invention has 4 levels; in fact, the number of levels depends only on the number of levels of the dataset's taxonomic hierarchy, and the present invention applies equally regardless of how many levels there are.
One of the loss functions used in training the model of the present invention is the KL divergence; in fact, general distance metric functions, such as the Euclidean distance, are equally applicable.
In conclusion a kind of Layer semantics incorporation model finely identified for object of the present invention and its implementation use The hierarchical structure of object classification is embedded into deep neural network model as a kind of semantic information, and by this semantic information Feature representation, solve rely on additional information study-leading object fining identification technology scheme in additional information mark at This high problem reduces the complexity of model.
The above-described embodiments merely illustrate the principles and effects of the present invention, and is not intended to limit the present invention.Any Without departing from the spirit and scope of the present invention, modifications and changes are made to the above embodiments by field technical staff.Therefore, The scope of the present invention, should be as listed in the claims.

Claims (10)

1. A hierarchical semantic embedding model for fine-grained object recognition, comprising:
a trunk network for extracting shallow features from an input image and outputting them in the form of feature maps to each branch network;
several branch networks for performing further deep feature extraction on the shallow image feature maps output by the trunk network, so that the feature maps they output suit the recognition task of the level to which each branch network corresponds, and for realizing, by introducing a semantic knowledge embedding mechanism, the guidance of upper-level semantic knowledge over the feature learning of lower-level branch networks.
2. The hierarchical semantic embedding model for fine-grained object recognition according to claim 1, wherein the branch network performs a secondary feature representation on the feature map from the trunk network to generate a new branch feature map, learns an attention weight map by combining the score vector predicted by the upper level with the branch feature map of the lower level, applies the attention weight map to the branch feature map, and finally generates a weighted branch feature map, with which it predicts the label distribution of the categories at that level.
3. The hierarchical semantic embedding model for fine-grained object recognition according to claim 1, wherein the trunk network uses the layer4_x layers of the ResNet-50 network structure and the input layers before them, 41 parameter layers in total, and the parameters of the trunk network are shared by the prediction networks of all levels.
4. The hierarchical semantic embedding model for fine-grained object recognition according to claim 1, wherein the branch network comprises:
a deep feature extraction submodule for performing deep feature extraction on the feature map output by the trunk network, and outputting both a feature representation under the guidance of upper-level semantic knowledge and a feature representation without guidance;
an upper-level semantic knowledge embedding submodule, which maps the score vector s_{i-1} predicted by the upper level through a fully connected layer into a semantic knowledge representation vector, concatenates this vector with every spatial location in the W × H plane of the feature map output by the deep feature extraction submodule, learns from the concatenated feature map, through an attention model, an attention coefficient vector, and applies this attention coefficient vector to the feature map output by the deep feature extraction submodule to obtain a weighted feature map, where W and H denote width and height respectively;
a score fusion submodule for passing the feature maps output by the upper-level semantic knowledge embedding submodule and the deep feature extraction submodule through a score fusion operation and outputting the corresponding score vector.
5. The hierarchical semantic embedding model for fine-grained object recognition according to claim 4, wherein the deep feature extraction submodule uses the layer5_x layer structure of the ResNet-50 network, the layer5_x layer structure consists of 3 residual modules, and the layer5_x layer structure is reused twice, once oriented to the upper-level semantic knowledge embedding submodule and once to the representation of global features.
6. The hierarchical semantic embedding model for fine-grained object recognition according to claim 4, wherein the attention model maps each spatial location in the W × H plane of the concatenated feature map, through two successive fully connected layers, step by step to the corresponding dimension, finally obtaining the attention coefficient vector.
7. The hierarchical semantic embedding model for fine-grained object recognition according to claim 4, wherein the score fusion process of the score fusion submodule is as follows:
s = (fc_1 + fc_2 + fc_cat) / 3
where fc_1, fc_2 and fc_cat are c × 1 vectors; the first two are obtained directly by passing the feature maps output by the upper-level semantic knowledge embedding submodule and the deep feature extraction submodule each through one fully connected layer, and the latter is obtained by concatenating fc_1 and fc_2 in series and then passing them through one fully connected layer fc_concate, yielding the same dimension as fc_1 and fc_2.
8. The hierarchical semantic embedding model for fine-grained object recognition according to claim 4, wherein, in the network structure of the top-level classification of the branch networks, except that the last fully connected layer corresponds to the number of categories at that level, the parameter settings of the other layers are consistent with the original ResNet-50 network.
9. An implementation method of a hierarchical semantic embedding model for fine-grained object recognition, comprising the following steps:
Step S1: performing hierarchical annotation on each training datum;
Step S2: using the weighted combination of the classification loss function and the regularization constraint loss function as the objective function for optimizing the HSE model, and training, level by level from the 1st-level classification to the Nth-level classification, the branch network corresponding to each level;
Step S3: after all branch networks have received preliminary training, performing joint optimization on all parameters of the entire HSE model.
10. The implementation method of the HSE model for fine-grained object recognition according to claim 9, wherein the optimization objective function of the branch network is:
L_i = L_cls^i + γ · L_reg^i
where γ is a balance parameter used to balance the influence of the classification loss term L_cls^i and the regularization constraint loss term L_reg^i on the network parameters.
CN201810924288.XA 2018-08-14 2018-08-14 Hierarchical semantic embedded model for fine object recognition and implementation method thereof Active CN109102024B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810924288.XA CN109102024B (en) 2018-08-14 2018-08-14 Hierarchical semantic embedded model for fine object recognition and implementation method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810924288.XA CN109102024B (en) 2018-08-14 2018-08-14 Hierarchical semantic embedded model for fine object recognition and implementation method thereof

Publications (2)

Publication Number Publication Date
CN109102024A true CN109102024A (en) 2018-12-28
CN109102024B CN109102024B (en) 2021-08-31

Family

ID=64849727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810924288.XA Active CN109102024B (en) 2018-08-14 2018-08-14 Hierarchical semantic embedded model for fine object recognition and implementation method thereof

Country Status (1)

Country Link
CN (1) CN109102024B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100215277A1 (en) * 2009-02-24 2010-08-26 Huntington Stephen G Method of Massive Parallel Pattern Matching against a Progressively-Exhaustive Knowledge Base of Patterns
CN106682060A (en) * 2015-11-11 2017-05-17 奥多比公司 Structured Knowledge Modeling, Extraction and Localization from Images
CN107979606A (en) * 2017-12-08 2018-05-01 电子科技大学 Adaptive distributed intelligent decision-making method
CN108229543A (en) * 2017-12-22 2018-06-29 中国科学院深圳先进技术研究院 Image classifier design method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
TIANSHUI CHEN ET AL: "Knowledge-Embedded Representation Learning for Fine-Grained Image Recognition", 27th International Joint Conference on Artificial Intelligence *
TIANSHUI CHEN ET AL: "Recurrent Attentional Reinforcement Learning for Multi-label Image Recognition", AAAI Conference on Artificial Intelligence *
YAO XIANG: "Cross-view action recognition based on nonlinear knowledge transfer", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961107B (en) * 2019-04-18 2022-07-19 北京迈格威科技有限公司 Training method and device for target detection model, electronic equipment and storage medium
CN109961107A (en) * 2019-04-18 2019-07-02 北京迈格威科技有限公司 Training method and device for target detection model, electronic equipment and storage medium
CN110097108B (en) * 2019-04-24 2021-03-02 佳都新太科技股份有限公司 Method, device, equipment and storage medium for identifying non-motor vehicle
CN110097108A (en) * 2019-04-24 2019-08-06 佳都新太科技股份有限公司 Method, device, equipment and storage medium for identifying non-motor vehicles
CN110288049B (en) * 2019-07-02 2022-05-24 北京字节跳动网络技术有限公司 Method and apparatus for generating image recognition model
CN110288049A (en) * 2019-07-02 2019-09-27 北京字节跳动网络技术有限公司 Method and apparatus for generating image recognition model
CN110321970A (en) * 2019-07-11 2019-10-11 山东领能电子科技有限公司 Multi-feature fine-grained object classification method based on branch neural networks
CN110837856A (en) * 2019-10-31 2020-02-25 深圳市商汤科技有限公司 Neural network training and target detection method, device, equipment and storage medium
CN113095349A (en) * 2020-01-09 2021-07-09 北京沃东天骏信息技术有限公司 Image identification method and device
CN111242222A (en) * 2020-01-14 2020-06-05 北京迈格威科技有限公司 Training method of classification model, image processing method and device
CN111242222B (en) * 2020-01-14 2023-12-19 北京迈格威科技有限公司 Classification model training method, image processing method and device
CN111711821A (en) * 2020-06-15 2020-09-25 南京工程学院 Information hiding method based on deep learning
CN111814920A (en) * 2020-09-04 2020-10-23 中国科学院自动化研究所 Fine classification method and system for multi-granularity feature learning based on graph network
CN112990147A (en) * 2021-05-06 2021-06-18 北京远鉴信息技术有限公司 Method and device for identifying administrative-related images, electronic equipment and storage medium
CN113642415A (en) * 2021-07-19 2021-11-12 南京南瑞信息通信科技有限公司 Face feature expression method and face recognition method
CN113642415B (en) * 2021-07-19 2024-06-04 南京南瑞信息通信科技有限公司 Face feature expression method and face recognition method

Also Published As

Publication number Publication date
CN109102024B (en) 2021-08-31

Similar Documents

Publication Publication Date Title
CN109102024A (en) A kind of Layer semantics incorporation model finely identified for object and its implementation
CN108596882B (en) The recognition methods of pathological picture and device
CN110287800A (en) A kind of remote sensing images scene classification method based on SGSE-GAN
CN110377686A (en) A kind of address information Feature Extraction Method based on deep neural network model
CN110532920A (en) Smallest number data set face identification method based on FaceNet method
CN109299274A (en) A kind of natural scene Method for text detection based on full convolutional neural networks
CN107330444A (en) A kind of image autotext mask method based on generation confrontation network
CN104933428B (en) A kind of face identification method and device based on tensor description
Weng et al. Semi-supervised vision transformers
CN110533024A (en) Biquadratic pond fine granularity image classification method based on multiple dimensioned ROI feature
CN110516530A (en) A kind of Image Description Methods based on the enhancing of non-alignment multiple view feature
CN111444343A (en) Cross-border national culture text classification method based on knowledge representation
CN106372597B (en) CNN Vehicle Detection method based on adaptive contextual information
CN107526798A (en) A kind of Entity recognition based on neutral net and standardization integrated processes and model
CN110334724A (en) Remote sensing object natural language description and multiple dimensioned antidote based on LSTM
CN109800768A (en) The hash character representation learning method of semi-supervised GAN
CN107577983A (en) It is a kind of to circulate the method for finding region-of-interest identification multi-tag image
Zhang et al. Knowledge amalgamation for object detection with transformers
CN109948628A (en) A kind of object detection method excavated based on identification region
Qiu et al. Semantic-visual guided transformer for few-shot class-incremental learning
CN109886105A (en) Price tickets recognition methods, system and storage medium based on multi-task learning
Xu et al. Pixdet: Prohibited item detection in x-ray image based on whole-process feature fusion and local-global semantic dependency interaction
Wang et al. Detection of key structure of auroral images based on weakly supervised learning
Wang et al. Generative Adversarial Networks Based on Dynamic Word-Level Update for Text-to-Image Synthesis
Lee et al. Boundary-aware camouflaged object detection via deformable point sampling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant