CN109102024A - Hierarchical semantic embedding model for fine-grained object recognition and implementation method thereof - Google Patents
Hierarchical semantic embedding model for fine-grained object recognition and implementation method thereof
- Publication number
- CN109102024A CN109102024A CN201810924288.XA CN201810924288A CN109102024A CN 109102024 A CN109102024 A CN 109102024A CN 201810924288 A CN201810924288 A CN 201810924288A CN 109102024 A CN109102024 A CN 109102024A
- Authority
- CN
- China
- Prior art keywords
- layer
- feature map
- branch network
- branch
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/231—Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a hierarchical semantic embedding model for fine-grained object recognition and an implementation method thereof. The hierarchical semantic embedding model comprises: a trunk network, which extracts shallow features from an input image and outputs them in the form of a feature map to each branch network; and several branch networks, which perform further deep feature extraction on the shallow image feature map output by the trunk network, so that the feature map each branch outputs suits the recognition task of the level corresponding to that branch, and which, by introducing a semantic knowledge embedding mechanism, realize the guidance of upper-level semantic knowledge over the feature learning of lower-level branch networks. The invention solves the problem of the high annotation cost of the additional information on which fine-grained object recognition schemes that rely on additional information to guide learning depend.
Description
Technical field
The present invention relates to the technical field of fine-grained object recognition, and in particular to a Hierarchical Semantic Embedding (HSE) model for fine-grained object recognition and its training method.
Background art
In recent years, advances in deep visual computing have ignited demand for visual analysis technology in every field: e-commerce urgently needs precise online retrieval of clothing images, the security industry urgently needs accurate matching of vehicles involved in cases, and agricultural and environmental-protection departments urgently need fine recognition of wild animals and plants. These demands often require that a recognition algorithm can carefully distinguish the subordinate categories of some basic class; this technology is usually called fine-grained object recognition.
In general, the technical difficulties of fine-grained object recognition are:
1) Inter-class differences that are hard to distinguish: for objects from similar categories, the visual differences between them are in many cases very small, and some are difficult even for humans to distinguish;
2) Obvious intra-class differences: objects from the same category can show large visual differences due to scale, viewing angle, occlusion, and varied backgrounds.
At present, fine-grained recognition technology mainly distinguishes objects based on several discriminative regions, and two classes of schemes dominate:
First, using an attention mechanism to automatically mine discriminative regions;
Second, using additional information to guide model learning, so as to better express the features of discriminative regions.
However, the former is usually implemented with multiple networks and repeated operations, which raises model complexity; meanwhile, lacking effective supervision or guidance, it tends to localize discriminative regions ambiguously. The latter, although it effectively improves the discriminability of key regions, introduces additional information whose annotation cost is often very high.
Summary of the invention
To overcome the above deficiencies of the existing technologies, the purpose of the present invention is to provide a hierarchical semantic embedding model for fine-grained object recognition and an implementation method thereof, to solve the problem of the high annotation cost of the additional information in fine-grained object recognition schemes that rely on additional information to guide learning.
In view of the above and other objects, the present invention proposes a hierarchical semantic embedding model for fine-grained object recognition, comprising:
a trunk network, which extracts shallow features from the input image and outputs them in the form of a feature map to each branch network;
several branch networks, which perform further deep feature extraction on the shallow image feature map output by the trunk network, so that the feature map each branch outputs suits the recognition task of the level corresponding to that branch, and which, by introducing a semantic knowledge embedding mechanism, realize the guidance of upper-level semantic knowledge over the feature learning of lower-level branch networks.
Preferably, each branch network performs a secondary feature expression on the feature map from the trunk network to generate a new branch feature map, learns an attention weight map by combining the score vector predicted by the upper level with the branch feature map of the lower level, and applies the attention weight map to the branch feature map to finally generate a weighted branch feature map, with which it predicts the label distribution of that level's categories.
Preferably, the trunk network uses only the layer4_x layer of the ResNet-50 network structure and the input layers before it, 41 parameter layers in total, and the parameters of the trunk network are shared by the prediction networks of all levels.
Preferably, each branch network comprises:
a deep feature extraction submodule, which performs further feature extraction on the feature map output by the trunk network and outputs a feature expression guided by upper-level semantic knowledge and a feature expression without guidance;
an upper-level semantic knowledge embedding submodule, which maps the score vector s_{i-1} predicted by the upper level through a fully connected layer into a semantic knowledge expression vector, splices this vector onto each site of the W × H plane of the feature map output by the deep feature extraction submodule, learns an attention coefficient vector from the spliced feature map through an attention model, and applies this attention coefficient vector to the feature map output by the deep feature extraction submodule to obtain a weighted feature map, where W and H denote width and height respectively;
a score fusion submodule, which passes the feature maps output by the upper-level semantic knowledge embedding submodule and the deep feature extraction submodule through a score fusion operation and outputs the corresponding score vector.
Preferably, the deep feature extraction submodule uses the layer5_x layer structure of the ResNet-50 network; the layer5_x layer structure consists of 3 residual modules, and the layer5_x layer structure is reused twice, in one place facing the upper-level semantic knowledge embedding submodule and in the other facing the expression of global features.
Preferably, for each site in the W × H plane of the spliced feature map, the attention model gradually maps it through two consecutive fully connected layers to the corresponding dimensions, finally obtaining the attention coefficient vector.
Preferably, the score fusion process of the score fusion submodule is as follows:

S = (fc_1 + fc_2 + fc_cat) / 3

where fc_1, fc_2 and fc_cat are c × 1 dimensional vectors; the first two are obtained directly by passing the feature maps output by the upper-level semantic knowledge embedding submodule and the deep feature extraction submodule each through a fully connected layer, and the latter is obtained by connecting fc_1 and fc_2 in series and then applying a fully connected layer fc_concate operation, yielding the same dimension as fc_1 and fc_2.
Preferably, for the branch network of the topmost-level classification, except that the last fully connected layer corresponds to the number of categories of that level, the parameter settings of the other layers are consistent with the original ResNet-50 network.
To achieve the above objects, the present invention also provides an implementation method of the hierarchical semantic embedding model for fine-grained object recognition, comprising the following steps:
Step S1: perform hierarchical annotation on each training datum;
Step S2: taking the weighted combination of the classification loss function and the regularization constraint loss function as the objective function for optimizing the HSE model, train the branch network corresponding to each level step by step, from the 1st-level classification to the N-th-level classification in turn;
Step S3: after all branch networks have received preliminary training, perform joint optimization on all parameters of the complete HSE model.
Preferably, the optimization objective function of the i-th branch network is:

L_i = L_cls^(i) + γ · L_reg^(i)

where γ is a balance parameter used to balance the influence of the classification loss function term L_cls^(i) and the regularization constraint loss function term L_reg^(i) on the network parameters.
Compared with the prior art, the hierarchical semantic embedding model for fine-grained object recognition of the present invention and its implementation method use the hierarchical structure of object categories as a kind of semantic information and embed this semantic information into the feature expression of a deep neural network model, solving the problem of the high annotation cost of the additional information in fine-grained object recognition schemes that rely on additional information to guide learning, and reducing the complexity of the model.
Brief description of the drawings
Fig. 1 is a system architecture diagram of the hierarchical semantic embedding model for fine-grained object recognition of the present invention;
Fig. 2 is a schematic diagram of the trunk network in an embodiment of the present invention;
Fig. 3 is a comparison diagram of the branch network structures of ResNet-50 and the present invention;
Fig. 4 is a schematic diagram of the upper-level semantic embedding expression process in an embodiment of the present invention;
Fig. 5 is a schematic diagram of the attention mechanism of the branch networks in an embodiment of the present invention;
Fig. 6 is a schematic diagram of the branch network structure of the topmost level in an embodiment of the present invention;
Fig. 7 is a flow chart of the steps of the implementation method of the hierarchical semantic embedding model for fine-grained object recognition of the present invention.
Specific embodiments
Embodiments of the present invention are described below through specific examples with reference to the drawings; those skilled in the art can easily understand further advantages and effects of the invention from the contents disclosed in this specification. The present invention can also be implemented or applied through other different specific examples, and the details in this specification can also be modified and changed in various ways based on different perspectives and applications without departing from the spirit of the invention.
Fig. 1 is a system architecture diagram of the hierarchical semantic embedding model for fine-grained object recognition of the present invention. In the present invention, the hierarchical semantic knowledge embedding model (Hierarchical Semantic Embedding, abbreviated HSE) involves three aspects: the extraction of deep image features, the embedded expression learning of semantic knowledge, and the constraint imposed by semantic knowledge on the semantic space of prediction results. The HSE model of the present invention is an algorithmic model based on deep learning technology and depends on deep neural networks; deep expression learning runs through the entire HSE framework, which uses hierarchical semantic knowledge in two ways, embodied respectively in the embedding of semantic knowledge in feature expression and in using semantic knowledge to regularize prediction results during model training. Specifically, as shown in Fig. 1, the hierarchical semantic embedding (Hierarchical Semantic Embedding, abbreviated HSE) model for fine-grained object recognition of the present invention comprises:
Trunk network 1, which extracts shallow features from the input image and outputs them in the form of a feature map to each branch network 2; that is to say, the input image has its features preliminarily extracted by trunk network 1 and output in the form of a feature map to the branch networks 2;
Several branch networks 2, which perform further deep feature extraction on the shallow image feature map output by the trunk network, so that the feature map each branch outputs suits the recognition task of the level corresponding to that branch, and which, by introducing a semantic knowledge embedding mechanism, realize the guidance of upper-level semantic knowledge over the feature learning of lower-level branch networks. That is, the feature map output by trunk network 1 is separately input into the branch networks 2 corresponding to each level's categories for further feature expression and output in the form of a feature vector; this feature vector, through the calculation of a softmax classifier, yields the predicted label distribution of each level's categories.
In the present invention, the semantic knowledge embedding mechanism is embodied in the branch networks 2, which adopt an attention mechanism guided by semantic knowledge. Specifically, a branch network 2 first performs a secondary feature expression on the feature map from trunk network 1 to generate a new branch feature map, which is essentially a stack of several feature maps, i.e. a 3-dimensional tensor. By combining the score vector predicted by the upper level with the branch feature map of the lower level, an attention weight map is learned, which is likewise essentially a stack of several feature maps, a three-dimensional tensor. The attention weight map represents the importance of each spatial position of the new feature map produced by the branch network for recognizing the target category: positions with higher discriminability attract more attention, and the weights at the corresponding positions of the weight map are larger. This weight map is applied to the branch feature map to finally generate a weighted branch feature map, with which the label distribution of that level's categories is predicted.
It can be seen that, through the method of sharing shallow network parameters, this "trunk-branch" multi-branch network structure of the invention reduces computing overhead, while the multiple independent branches enable the model to take into account the optimization objectives of different tasks.
It should be noted here that the regularization of the semantic space of prediction results by semantic knowledge is embodied in the training process of the HSE model: the present invention takes the score vector predicted by the upper level as a soft target that constrains the prediction results of its lower level to conform to the semantic rules of the category tree, thereby regularizing the semantic space of the lower level's prediction results, as will be explained in detail later.
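One plausible reading of this soft-target constraint can be sketched in numpy: lower-level probabilities are aggregated up the category tree and penalized for disagreeing with the upper-level soft target. The toy tree, the aggregation-by-summing step, and the cross-entropy form of the penalty are assumptions for illustration, not details fixed by the patent.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

# Toy category tree: parent (coarse) index of each of 5 fine classes.
parent = np.array([0, 0, 1, 1, 1])

s_coarse = softmax(np.array([1.0, 0.2]))                 # upper-level soft target
p_fine = softmax(np.array([0.3, 1.1, -0.2, 0.5, 0.0]))  # lower-level prediction

# Aggregate fine probabilities to the coarse level along the tree, then
# penalize disagreement with the soft target (cross-entropy is assumed).
p_agg = np.array([p_fine[parent == k].sum() for k in range(2)])
reg_loss = -(s_coarse * np.log(p_agg + 1e-12)).sum()

assert abs(p_agg.sum() - 1.0) < 1e-9
assert reg_loss > 0
```

A prediction that violates the tree (high species probability under the wrong family) inflates this penalty, which is the sense in which the upper level regularizes the lower level's semantic space.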
Specifically, the following is the basic workflow of the hierarchical semantic embedding model of the present invention:
(1) input an image I;
(2) the trunk network Tr extracts the feature map of image I, denoted f_I;
(3) f_I is input into the branch network Br_1 of the highest level;
(4) Br_1 performs a forward calculation on f_I to obtain the prediction score vector s_1 of the highest-level categories;
(5) from the i-th level to the N-th level (i ≥ 2):
(5.1) f_I is input into the i-th-level branch network Br_i;
(5.2) the prediction score vector s_{i-1} of the level above is input into branch network Br_i;
(5.3) under the guidance of s_{i-1}, Br_i performs a forward calculation on f_I to obtain the i-th level's prediction score vector s_i.
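The coarse-to-fine workflow above can be sketched as a plain Python loop. The `trunk` and `branch` functions here are random stand-ins for the real networks (their internals, the channel count, and the per-level class counts are illustrative assumptions); only the control flow, where each branch receives both the shared feature map f_I and the upper level's score vector, follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def trunk(image):
    # Stand-in for trunk network Tr: 448x448 image -> 28x28 feature map f_I.
    return rng.standard_normal((512, 28, 28))   # channel count is illustrative

def branch(f_I, upper_scores, num_classes):
    # Stand-in for branch network Br_i: predict this level's score vector,
    # guided by the upper level's scores s_{i-1} (None for the top level).
    flat = f_I.mean(axis=(1, 2))                # global pooling, illustrative
    W = rng.standard_normal((num_classes, flat.size)) * 0.01
    s = W @ flat
    if upper_scores is not None:
        s = s + 0.0 * upper_scores.sum()        # placeholder for semantic guidance
    return s

# e.g. order / family / genus / species class counts (illustrative).
level_sizes = [13, 37, 122, 200]

image = rng.standard_normal((3, 448, 448))
f_I = trunk(image)                              # steps (1)-(2)
scores, s_prev = [], None
for n_cls in level_sizes:                       # steps (3)-(5): coarse to fine
    s_prev = branch(f_I, s_prev, n_cls)
    scores.append(s_prev)

assert [s.shape[0] for s in scores] == level_sizes
```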
Compared with the prior art, the present invention achieves 1.6% higher accuracy than the previously best algorithm on the Caltech-UCSD Birds dataset and 2.3% higher accuracy than the previously best algorithm on the VegFru dataset. In addition, on the Caltech-UCSD Birds dataset, the present invention can save 20% of the training data while reaching accuracy comparable to the previously best algorithm.
The present invention is further illustrated below through a specific embodiment:
Embodiment:
1. Hierarchical annotation of the data
Taking images of birds as an example, hierarchical annotation information in addition to the images needs to be prepared. For example, if the categories of the 4 levels of order, family, genus and species of birds are to be annotated, each training/test datum should include: the image, the order-level class label, the family-level class label, the genus-level class label and the species-level class label.
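One training record of the kind described above might be represented as follows; the file path and the taxonomy values are hypothetical placeholders, only the four-level structure comes from the text.

```python
# One hierarchically annotated training record (all values hypothetical).
record = {
    "image": "images/0001.jpg",
    "order": "Passeriformes",
    "family": "Corvidae",
    "genus": "Corvus",
    "species": "Corvus corax",
}
levels = ["order", "family", "genus", "species"]  # coarse to fine
assert all(k in record for k in levels)
```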
2. Realization of the HSE model
The HSE model comprises a trunk network (trunk net) and branch networks (branch nets). The role of the trunk network is mainly the shallow feature extraction of the input image. The role of the branch networks has two aspects: first, performing further deep feature extraction on the shallow image feature map output by the trunk network, so that the feature map each branch outputs suits the recognition task of the level corresponding to that branch; second, by introducing a knowledge embedding module, realizing the guidance of upper-level semantic knowledge over the feature learning of lower-level branch networks. Through the method of sharing shallow network parameters, this "trunk-branch" multi-branch network structure reduces computing overhead, while the multiple independent branches enable the model to take into account the optimization objectives of different tasks. The network structures of the trunk network and the branch networks are described in detail below.
1) Trunk network
In an embodiment of the invention, the trunk network is built on the layer structure of a residual network; its comparison with the structure of the ResNet-50 network is shown in Fig. 2. In the figure, conv1 is a single-layer convolution operation, and layer2_x to layer5_x are layer structures formed by stacking and connecting several residual modules, each containing several convolution layers. In the structure of ResNet-50, layer2_x to layer5_x consist of 3, 4, 6 and 3 residual modules respectively, each residual module containing 3 convolution layers, 48 layers in total; together with the bottom conv1 and the output fully connected layer fc, they build the 50-layer network structure.
The trunk network of the HSE model of the present invention uses only the layer4_x structure of the ResNet-50 network and the input layers before it, 41 parameter layers in total. Table 1 describes the specific network parameters of the trunk network of the invention; the HSE model performs class prediction at multiple levels, and the parameters of the trunk network are shared by the prediction networks of all levels. Given an input picture, the trunk network performs preliminary shallow feature extraction on it and outputs the result in the form of a feature map. In an embodiment of the invention, the input is a picture of resolution 448 × 448, and the feature map output by the trunk network has spatial size 28 × 28.
Table 1: Key parameters of the HSE model trunk network
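The spatial sizes quoted here are consistent with the standard ResNet-50 strides, which the following sketch checks: truncating after layer4_x gives a total stride of 16 (448 → 28), and the layer5_x stage used later in the branches halves this again to 14. The per-stage stride values are those of the standard ResNet-50 design, stated here as an assumption about the embodiment.

```python
# Per-stage spatial strides of standard ResNet-50 (assumed for this embodiment).
strides = {
    "conv1": 2, "maxpool": 2,
    "layer2_x": 1, "layer3_x": 2, "layer4_x": 2,  # trunk: up to layer4_x
    "layer5_x": 2,                                # branch deep-feature stage
}

def out_size(in_size, stages):
    # Spatial size after running the listed stages in order.
    s = in_size
    for name in stages:
        s //= strides[name]
    return s

trunk_stages = ["conv1", "maxpool", "layer2_x", "layer3_x", "layer4_x"]
assert out_size(448, trunk_stages) == 28                 # trunk feature map
assert out_size(448, trunk_stages + ["layer5_x"]) == 14  # after branch layer5_x
```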
2) Branch networks
Fig. 3 compares the branch network structures of ResNet-50 and the present invention. In the structure diagram of the branch networks, each level's classification corresponds to one branch network. Specifically, a branch network comprises: a deep feature extraction submodule 201, an upper-level semantic knowledge embedding submodule 202 and a score fusion submodule 203.
In order to keep consistency with the ResNet-50 network structure and to facilitate fair comparison in subsequent experiments, the deep feature extraction submodule 201 continues to use the layer5_x layer structure of ResNet-50. In an embodiment of the invention, this layer5_x layer structure consists of 3 residual modules, performs deep feature extraction on the feature map output by the trunk network, and outputs a feature expression guided by upper-level semantic knowledge (emphasizing local discriminability) and a feature expression without guidance (emphasizing global discriminability). Specifically, taking the feature map output by the trunk network (28 × 28 resolution) as input, after the arithmetic processing of the layer5_x layer structure of the deep feature extraction submodule 201, a feature map of size 14 × 14 is output. The dimensions of the output feature map are actually n × C × W × H; in an embodiment of the invention, n is 8, indicating the batch size; C denotes the number of channels, with value 2048; and W and H denote width and height respectively, both 14. It should be particularly noted that the layer5_x structure is reused twice in a branch network: in one place facing the upper-level semantic knowledge embedding submodule, in the other facing the expression of global features. To distinguish the two layer5_x layer structures, the former is denoted φ_i(·) and the latter ψ_i(·); φ_i(·) and ψ_i(·) are mutually independent and do not share parameters.
In the upper-level semantic knowledge embedding submodule 202, the score vector s_{i-1} predicted by the upper level first passes through a fully connected layer and is mapped into a 1024-dimensional semantic knowledge expression vector. This vector is spliced onto each site in the W × H plane of the feature map output by φ_i(·); this concatenation operation is indicated in the figure. For convenience of implementation, the knowledge expression vector can simply be copied to all W × H sites. Fig. 4 demonstrates the above process.
The spliced feature map is learned into an attention coefficient vector through an attention model α(·). Fig. 5 annotates the processing of the attention model: each site in the W × H plane of the spliced feature map is gradually mapped by two consecutive fully connected layers fc to 1024 and then 2048 dimensions, finally obtaining an attention coefficient vector (such as the figure on the far right of Fig. 5).
The obtained attention coefficient vector is applied to the feature map output by φ_i(·); the "⊙" in Fig. 5 indicates that the attention coefficient vector is multiplied by the value of each corresponding position of the feature map output by φ_i(·), which yields the weighted feature map f_i.
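The embedding-and-attention pipeline just described can be sketched in numpy. The dimensions (2048 channels, 14 × 14 sites, 1024-dimensional knowledge vector, per-site mapping 1024 → 2048) follow the text; the random stand-in weights, the tanh/sigmoid nonlinearities, and the upper-level class count of 37 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
C, W, H, K = 2048, 14, 14, 1024   # channels, width, height, knowledge dim
c_upper = 37                      # upper-level class count (illustrative)

# Random stand-ins for learned parameters.
W_embed = rng.standard_normal((K, c_upper)) * 0.01   # scores -> knowledge vector
W_a1 = rng.standard_normal((1024, C + K)) * 0.01     # attention FC layer 1
W_a2 = rng.standard_normal((C, 1024)) * 0.01         # attention FC layer 2

s_upper = rng.standard_normal(c_upper)    # upper-level prediction score vector
k = W_embed @ s_upper                     # 1024-d semantic knowledge vector
feat = rng.standard_normal((C, W * H))    # phi_i output, sites flattened

# Splice k onto every site, then two FC layers per site -> attention vector.
spliced = np.vstack([feat, np.tile(k[:, None], (1, W * H))])  # (C+K, W*H)
h = np.tanh(W_a1 @ spliced)               # nonlinearity is an assumption
alpha = 1 / (1 + np.exp(-(W_a2 @ h)))     # per-site attention coefficients
weighted = (alpha * feat).reshape(C, W, H)  # the "⊙" weighting -> f_i

assert weighted.shape == (2048, 14, 14)
```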
In the score fusion submodule 203, the feature maps output by φ_i(·) and ψ_i(·) pass through a score fusion operation and the corresponding score vector is output.
Specifically, the score fusion process is expressed as follows:

S = (fc_1 + fc_2 + fc_cat) / 3

where fc_1, fc_2 and fc_cat are c × 1 dimensional vectors; the first two are obtained directly by passing the feature maps output by φ_i(·) and ψ_i(·) through the fully connected layers fc2 and fc1 respectively, and the latter is obtained by connecting fc_1 and fc_2 in series and then applying a fully connected layer fc_concate operation, yielding the same dimension as fc_1 and fc_2, i.e. c × 1.
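A minimal numpy sketch of this fusion formula follows. Global pooling of the two feature maps into 2048-dimensional vectors, the class count c = 200, and the random stand-in weights are assumptions; the averaging of the three score vectors is the formula from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
c = 200                          # class count at this level (illustrative)
g1 = rng.standard_normal(2048)   # pooled feature from phi_i (guided path)
g2 = rng.standard_normal(2048)   # pooled feature from psi_i (global path)

W1 = rng.standard_normal((c, 2048)) * 0.01   # stand-in for fc layer on g1
W2 = rng.standard_normal((c, 2048)) * 0.01   # stand-in for fc layer on g2
Wc = rng.standard_normal((c, 2 * c)) * 0.01  # stand-in for fc_concate

fc_1 = W1 @ g1
fc_2 = W2 @ g2
fc_cat = Wc @ np.concatenate([fc_1, fc_2])   # series connection of fc_1, fc_2

S = (fc_1 + fc_2 + fc_cat) / 3               # the fusion formula in the text
assert S.shape == (c,)
```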
In particular, in an embodiment of the invention, since the topmost-level classification has no upper-level semantics to guide it, its network structure is actually as shown in Fig. 6: except that the last fully connected layer fc1 corresponds to the number of categories of that level, the parameter settings of its other layers are consistent with the original ResNet-50, and are not repeated here.
Fig. 7 is a flow chart of the steps of the implementation method of the hierarchical semantic embedding model for fine-grained object recognition of the present invention. When training the HSE model, the present invention takes the normal class labels as the optimization target and uses the cross-entropy loss function as the objective function of the optimization. Specifically, the prediction score vector of the i-th level is normalized with the softmax function:

p_i = softmax(s_i / T)

It needs to be particularly pointed out that the softmax function here and the softmax function mentioned above differ only in the numerical setting of the temperature coefficient; in the implementation here the temperature coefficient T is set to 1.
For an image sample whose correct label at the current level's classification is c_i, its loss value can be expressed as:

L_cls^(i) = -log p_i[c_i]

Similarly, summing L_cls^(i) over all samples gives the overall classification loss value L_cls of the entire training set.
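The temperature-scaled softmax and per-sample cross-entropy loss described above can be written out directly; the toy score vector and label index are illustrative.

```python
import numpy as np

def softmax_T(s, T=1.0):
    # Temperature-scaled softmax; T = 1 for the classification loss here.
    z = (s - s.max()) / T          # max-shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

s_i = np.array([2.0, 0.5, -1.0])   # toy prediction score vector for level i
p = softmax_T(s_i, T=1.0)
c_i = 0                            # index of the correct class at this level
loss = -np.log(p[c_i])             # cross-entropy loss for one sample

assert abs(p.sum() - 1.0) < 1e-9
assert loss > 0
```

A higher temperature T flattens p, which matters when the same softmax is reused with T ≠ 1 for the soft-target regularization mentioned earlier.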
Specifically, as shown in Fig. 7, the implementation method of the hierarchical semantic embedding model for fine-grained object recognition of the present invention comprises the following steps:
Step S1: perform hierarchical annotation on each training datum.
Taking images of birds as an example, hierarchical annotation information in addition to the images needs to be prepared. For example, if the categories of the 4 levels of order, family, genus and species of birds are to be annotated, each training/test datum should include: the image, the order-level class label, the family-level class label, the genus-level class label and the species-level class label.
Step S2: taking the weighted combination of the classification loss function and the regularization loss function as the objective function for optimizing the HSE model, train the branch network corresponding to each level step by step, from the 1st-level classification to the N-th-level classification in turn.
When training the branch network corresponding to a certain level's classification, the prediction score vector of the level above it must first be obtained. Therefore, this step trains the branch network corresponding to each level step by step, from the 1st-level classification to the N-th-level classification in turn. Since the parameters of the trunk network are shared by all branches, in this step it is temporarily unnecessary to optimize them; it is only necessary to initialize the parameters of the trunk network with those of a ResNet-50 network model pre-trained on the ImageNet dataset. In this step the parameters of the trunk network are always kept fixed and need no optimization or updating.
When training the branch network corresponding to the i-th hierarchy level, the HSE model already integrates the network structures corresponding to the preceding i-1 levels; the parameters of the first i-1 branch networks in the HSE model are therefore initialized from the previously trained branch network models of those levels. For the i-th branch network, the 9 parameter layers involved in the sub-networks ψi(·) and φ(·) are likewise initialized with the parameters of a ResNet-50 model pre-trained on the ImageNet dataset. In addition, the semantic embedding function and a(·) are composed of fully connected layers whose parameters are initialized with the Xavier algorithm. The optimization objective function of the branch network is:
L^(i) = L_cls^(i) + γ·L_reg^(i)
In the formula, γ is a balance parameter that balances the influence of the classification loss term and the regularization constraint loss term on the network parameters. Since the gradients produced by the regularization constraint loss term are comparatively small in magnitude, a relatively large weight must be set (γ = 2 is used in the specific embodiment of the invention).
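The weighted combination above, and its sum over all levels used later in the joint optimization of step S3, can be sketched as follows (a minimal sketch; the losses are taken as precomputed scalars):

```python
def branch_objective(cls_loss, reg_loss, gamma=2.0):
    """Objective for one branch network: classification loss plus the
    regularization-constraint loss weighted by the balance parameter
    gamma (gamma = 2 in the embodiment, offsetting the smaller gradient
    magnitude of the regularization term)."""
    return cls_loss + gamma * reg_loss

def joint_objective(cls_losses, reg_losses, gamma=2.0):
    """Joint objective over all N hierarchy levels: the sum of the
    per-level branch objectives."""
    return sum(branch_objective(c, r, gamma)
               for c, r in zip(cls_losses, reg_losses))
```

For the top level, which introduces no upper-level semantic knowledge, only the classification term would be used.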
It should be pointed out that, since the top-level classification introduces no upper-level semantic knowledge, its corresponding branch network needs only the classification loss term as the objective function for parameter optimization.
In the specific embodiment of the invention, training images are rescaled to 512 × 512. Data augmentation consists of randomly cropping 448 × 448 regions for training and applying horizontal flips to the training samples. For optimization, the invention uses the SGD algorithm with a mini-batch strategy, where the batch size is 8, the SGD momentum is 0.9, the weight decay factor is 0.00005, and the initial learning rate is 0.001; after roughly 300 traversals of the training set, the learning rate is divided by 10 and training continues.
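The training hyperparameters stated above can be collected into a configuration sketch with a piecewise-constant learning-rate schedule (illustrative only; the config keys are not from the patent):

```python
# Training hyperparameters as stated in the embodiment (sketch).
config = {
    "input_size": 512,         # images rescaled to 512x512
    "crop_size": 448,          # random 448x448 crops + horizontal flips
    "batch_size": 8,
    "momentum": 0.9,
    "weight_decay": 0.00005,
    "initial_lr": 0.001,
    "lr_decay_factor": 10,     # learning rate divided by 10
    "decay_after_epochs": 300  # after ~300 passes over the training set
}

def learning_rate(epoch, cfg=config):
    """Piecewise-constant schedule: keep the initial learning rate,
    then divide it by 10 once ~300 traversals are complete."""
    if epoch < cfg["decay_after_epochs"]:
        return cfg["initial_lr"]
    return cfg["initial_lr"] / cfg["lr_decay_factor"]
```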
Step S3: after all branch networks have received preliminary training, jointly optimize all parameters of the entire complete HSE model. The objective function of the joint optimization is:
L = Σ_{i=1..N} (L_cls^(i) + γ·L_reg^(i))
During this training, apart from using a smaller learning rate of 0.00001, the invention keeps the same data augmentation methods and the same hyperparameter configuration as in the previous step, which are not repeated here.
It should be noted here that the core network of the invention uses the ResNet-50 network structure; similarly, other general convolutional neural network structures, such as VGG16, may be used instead.
The network structure cited by the invention has 4 levels; in fact, the number of levels depends only on the number of levels in the dataset's taxonomic hierarchy, and the invention applies equally regardless of how many levels there are.
One of the loss functions used when training the model of the invention is the KL divergence; in fact, general distance metric functions, such as the Euclidean distance, are equally applicable.
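The two interchangeable distance measures mentioned above can be sketched for discrete score distributions (a minimal sketch; `eps` guards against log of zero and is an implementation detail, not from the patent):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions,
    used as the regularization constraint between levels."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))

def euclidean_distance(p, q):
    """Drop-in alternative distance metric, as the text notes."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
```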
In conclusion a kind of Layer semantics incorporation model finely identified for object of the present invention and its implementation use
The hierarchical structure of object classification is embedded into deep neural network model as a kind of semantic information, and by this semantic information
Feature representation, solve rely on additional information study-leading object fining identification technology scheme in additional information mark at
This high problem reduces the complexity of model.
The above-described embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Those skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. The scope of the invention should therefore be as listed in the claims.
Claims (10)
1. A hierarchical semantic embedding model for fine-grained object recognition, comprising:
a core network for extracting shallow features from an input image and outputting them, in the form of feature maps, to each branch network; and
several branch networks for performing further deep feature extraction on the shallow image feature maps output by the core network, so that the output feature maps suit the recognition task of the level corresponding to each branch network, and for realizing, by introducing a semantic knowledge embedding mechanism, the guidance of upper-level semantic knowledge over the feature learning of lower-level branch networks.
2. The hierarchical semantic embedding model for fine-grained object recognition of claim 1, wherein the branch network performs secondary feature expression on the feature map from the core network to generate a new branch feature map; an attention weight map is learned by combining the score vector predicted by the upper level with the branch feature map of its lower level; and the attention weight map is applied to the branch feature map to finally generate a weighted branch feature map, with which the label distribution of the level's categories is predicted.
3. The hierarchical semantic embedding model for fine-grained object recognition of claim 1, wherein the core network uses the layer4_x layer of the ResNet-50 network structure and the layers preceding it, 41 parameter layers in total, and the parameters of the core network are shared by the prediction networks of all levels.
4. The hierarchical semantic embedding model for fine-grained object recognition of claim 1, wherein the branch network comprises:
a deep feature extraction sub-module for performing further deep feature extraction on the feature map output by the core network, and for outputting both a feature representation guided by upper-level semantic knowledge and a feature representation without guidance;
an upper-level semantic knowledge embedding sub-module, which maps the score vector s_{i-1} predicted by the upper level, through a fully connected layer, into a semantic knowledge representation vector, concatenates this vector at every position in the W × H plane of the feature map output by the deep feature extraction sub-module, learns an attention coefficient vector from the concatenated feature map through an attention model, and applies the attention coefficient vector to the feature map output by the deep feature extraction sub-module to obtain a weighted feature map, where W and H denote width and height respectively; and
a score fusion sub-module for taking the feature maps output by the upper-level semantic knowledge embedding sub-module and the deep feature extraction sub-module and outputting the corresponding score vector through a score fusion operation.
5. The hierarchical semantic embedding model for fine-grained object recognition of claim 4, wherein the deep feature extraction sub-module uses the layer5_x layer structure of the ResNet-50 network; the layer5_x structure is composed of 3 residual modules and is reused twice, once oriented toward the upper-level semantic knowledge embedding sub-module and once toward the global feature representation.
6. The hierarchical semantic embedding model for fine-grained object recognition of claim 4, wherein the attention model successively maps, with two fully connected layers, each position in the W × H plane of the concatenated feature map to the corresponding dimension, finally obtaining the attention coefficient vector.
7. The hierarchical semantic embedding model for fine-grained object recognition of claim 4, wherein the score fusion process of the score fusion sub-module is as follows:
S = (fc_1 + fc_2 + fc_cat) / 3
where fc_1, fc_2, and fc_cat are c × 1 dimensional vectors; the first two are obtained by passing the feature maps output by the upper-level semantic knowledge embedding sub-module and the deep feature extraction sub-module each through a fully connected layer, and the last is obtained by concatenating fc_1 and fc_2 in series and passing the result through a fully connected layer fc_concate operation, yielding the same dimension as fc_1 and fc_2.
8. The hierarchical semantic embedding model for fine-grained object recognition of claim 4, wherein, in the network structure of the top-level branch network, except that the last fully connected layer corresponds to the number of categories at that level, the parameter settings of the other layers are consistent with the original ResNet-50 network.
9. An implementation method of a hierarchical semantic embedding model for fine-grained object recognition, comprising the following steps:
step S1: performing hierarchical annotation on each training sample;
step S2: using a weighted combination of the classification loss function and the regularization constraint loss function as the objective function for optimizing the HSE model, training the branch network corresponding to each level, proceeding level by level from the 1st hierarchy level to the N-th hierarchy level; and
step S3: after all branch networks have received preliminary training, jointly optimizing all parameters of the entire complete HSE model.
10. The implementation method of the HSE model for fine-grained object recognition of claim 9, wherein the optimization objective function of the branch network is:
L^(i) = L_cls^(i) + γ·L_reg^(i)
where γ is a balance parameter for balancing the influence of the classification loss term L_cls^(i) and the regularization constraint loss term L_reg^(i) on the network parameters.
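As an illustration of the score fusion in claim 7 (a minimal sketch, not part of the claims: in the model, fc_1, fc_2, and fc_cat would be produced by fully connected layers; here they are taken as plain c-dimensional score lists):

```python
def fuse_scores(fc_1, fc_2, fc_cat):
    """Element-wise average of the three c-dimensional score vectors:
    S = (fc_1 + fc_2 + fc_cat) / 3."""
    assert len(fc_1) == len(fc_2) == len(fc_cat)
    return [(a + b + c) / 3.0 for a, b, c in zip(fc_1, fc_2, fc_cat)]
```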
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810924288.XA CN109102024B (en) | 2018-08-14 | 2018-08-14 | Hierarchical semantic embedded model for fine object recognition and implementation method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109102024A true CN109102024A (en) | 2018-12-28 |
CN109102024B CN109102024B (en) | 2021-08-31 |
Family
ID=64849727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810924288.XA Active CN109102024B (en) | 2018-08-14 | 2018-08-14 | Hierarchical semantic embedded model for fine object recognition and implementation method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109102024B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100215277A1 (en) * | 2009-02-24 | 2010-08-26 | Huntington Stephen G | Method of Massive Parallel Pattern Matching against a Progressively-Exhaustive Knowledge Base of Patterns |
CN106682060A (en) * | 2015-11-11 | 2017-05-17 | 奥多比公司 | Structured Knowledge Modeling, Extraction and Localization from Images |
CN107979606A (en) * | 2017-12-08 | 2018-05-01 | 电子科技大学 | It is a kind of that there is adaptive distributed intelligence decision-making technique |
CN108229543A (en) * | 2017-12-22 | 2018-06-29 | 中国科学院深圳先进技术研究院 | Image classification design methods and device |
Non-Patent Citations (3)
Title |
---|
TIANSHUI CHEN ET AL: "Knowledge-Embedded Representation Learning for Fine-Grained Image Recognition", 《27TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE》 * |
TIANSHUI CHEN ET AL: "Recurrent Attentional Reinforcement Learning for Multi-label Image Recognition", 《AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE》 * |
YAO XIANG: "Cross-view action recognition based on non-linear knowledge transfer", 《JOURNAL OF CHONGQING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS (NATURAL SCIENCE EDITION)》 *
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109961107B (en) * | 2019-04-18 | 2022-07-19 | 北京迈格威科技有限公司 | Training method and device for target detection model, electronic equipment and storage medium |
CN109961107A (en) * | 2019-04-18 | 2019-07-02 | 北京迈格威科技有限公司 | Training method, device, electronic equipment and the storage medium of target detection model |
CN110097108B (en) * | 2019-04-24 | 2021-03-02 | 佳都新太科技股份有限公司 | Method, device, equipment and storage medium for identifying non-motor vehicle |
CN110097108A (en) * | 2019-04-24 | 2019-08-06 | 佳都新太科技股份有限公司 | Recognition methods, device, equipment and the storage medium of non-motor vehicle |
CN110288049B (en) * | 2019-07-02 | 2022-05-24 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating image recognition model |
CN110288049A (en) * | 2019-07-02 | 2019-09-27 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating image recognition model |
CN110321970A (en) * | 2019-07-11 | 2019-10-11 | 山东领能电子科技有限公司 | A kind of fine-grained objective classification method of multiple features based on branch neural network |
CN110837856A (en) * | 2019-10-31 | 2020-02-25 | 深圳市商汤科技有限公司 | Neural network training and target detection method, device, equipment and storage medium |
CN113095349A (en) * | 2020-01-09 | 2021-07-09 | 北京沃东天骏信息技术有限公司 | Image identification method and device |
CN111242222A (en) * | 2020-01-14 | 2020-06-05 | 北京迈格威科技有限公司 | Training method of classification model, image processing method and device |
CN111242222B (en) * | 2020-01-14 | 2023-12-19 | 北京迈格威科技有限公司 | Classification model training method, image processing method and device |
CN111711821A (en) * | 2020-06-15 | 2020-09-25 | 南京工程学院 | Information hiding method based on deep learning |
CN111814920A (en) * | 2020-09-04 | 2020-10-23 | 中国科学院自动化研究所 | Fine classification method and system for multi-granularity feature learning based on graph network |
CN112990147A (en) * | 2021-05-06 | 2021-06-18 | 北京远鉴信息技术有限公司 | Method and device for identifying administrative-related images, electronic equipment and storage medium |
CN113642415A (en) * | 2021-07-19 | 2021-11-12 | 南京南瑞信息通信科技有限公司 | Face feature expression method and face recognition method |
CN113642415B (en) * | 2021-07-19 | 2024-06-04 | 南京南瑞信息通信科技有限公司 | Face feature expression method and face recognition method |
Also Published As
Publication number | Publication date |
---|---|
CN109102024B (en) | 2021-08-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109102024A (en) | A kind of Layer semantics incorporation model finely identified for object and its implementation | |
CN108596882B (en) | The recognition methods of pathological picture and device | |
CN110287800A (en) | A kind of remote sensing images scene classification method based on SGSE-GAN | |
CN110377686A (en) | A kind of address information Feature Extraction Method based on deep neural network model | |
CN110532920A (en) | Smallest number data set face identification method based on FaceNet method | |
CN109299274A (en) | A kind of natural scene Method for text detection based on full convolutional neural networks | |
CN107330444A (en) | A kind of image autotext mask method based on generation confrontation network | |
CN104933428B (en) | A kind of face identification method and device based on tensor description | |
Weng et al. | Semi-supervised vision transformers | |
CN110533024A (en) | Biquadratic pond fine granularity image classification method based on multiple dimensioned ROI feature | |
CN110516530A (en) | A kind of Image Description Methods based on the enhancing of non-alignment multiple view feature | |
CN111444343A (en) | Cross-border national culture text classification method based on knowledge representation | |
CN106372597B (en) | CNN Vehicle Detection method based on adaptive contextual information | |
CN107526798A (en) | A kind of Entity recognition based on neutral net and standardization integrated processes and model | |
CN110334724A (en) | Remote sensing object natural language description and multiple dimensioned antidote based on LSTM | |
CN109800768A (en) | The hash character representation learning method of semi-supervised GAN | |
CN107577983A (en) | It is a kind of to circulate the method for finding region-of-interest identification multi-tag image | |
Zhang et al. | Knowledge amalgamation for object detection with transformers | |
CN109948628A (en) | A kind of object detection method excavated based on identification region | |
Qiu et al. | Semantic-visual guided transformer for few-shot class-incremental learning | |
CN109886105A (en) | Price tickets recognition methods, system and storage medium based on multi-task learning | |
Xu et al. | Pixdet: Prohibited item detection in x-ray image based on whole-process feature fusion and local-global semantic dependency interaction | |
Wang et al. | Detection of key structure of auroral images based on weakly supervised learning | |
Wang et al. | Generative Adversarial Networks Based on Dynamic Word-Level Update for Text-to-Image Synthesis | |
Lee et al. | Boundary-aware camouflaged object detection via deformable point sampling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||