CN105678340B - Automatic image annotation method based on an enhanced stacked autoencoder - Google Patents

Automatic image annotation method based on an enhanced stacked autoencoder

Info

Publication number
CN105678340B
CN105678340B (application CN201610035975.7A)
Authority
CN
China
Prior art keywords
model
training
stacked autoencoder
stack
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610035975.7A
Other languages
Chinese (zh)
Other versions
CN105678340A (en)
Inventor
柯逍
周铭柯
杜明智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN201610035975.7A priority Critical patent/CN105678340B/en
Publication of CN105678340A publication Critical patent/CN105678340A/en
Application granted granted Critical
Publication of CN105678340B publication Critical patent/CN105678340B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Abstract

The present invention relates to an automatic image annotation method based on an enhanced stacked autoencoder. To address the difficulty that the traditional SAE model in deep learning cannot be trained effectively on a skewed (label-imbalanced) data set, a balanced stacked autoencoder is proposed that raises the annotation accuracy of low-frequency tags and thereby improves their annotation quality. Then, to address the problem that a single B-SAE model is unstable, so that its annotation results vary widely as parameters change, an enhanced balanced stacked autoencoder for the image annotation task is proposed: submodels are trained sequentially in groups, and the optimal B-SAE submodel of each group is accumulated with weights, yielding stable annotation results. The method pre-trains the weights layer by layer and fine-tunes the whole network with the backpropagation algorithm, which mitigates the weak generalization ability of traditional shallow models and their difficulty in converging to a good extremum, and it reinforces the training of weakly labeled samples during training, improving the annotation performance of the whole model. The method is simple, flexible and highly practical.

Description

Automatic image annotation method based on an enhanced stacked autoencoder
Technical field
The present invention relates to the fields of pattern recognition and computer vision, and in particular to an automatic image annotation method based on an enhanced stacked autoencoder.
Background technique
With the rapid development of multimedia image technology, the amount of image information on the Internet is growing explosively. These digital images are used very widely, for example in business, news media, medicine and education. Helping users find the images they need quickly and accurately has therefore become one of the hot topics of multimedia research in recent years, and the key technologies for this task are image retrieval and automatic image annotation.
Automatic image annotation means automatically adding several keywords to an image to express its semantic content. It can use an already annotated image set to automatically learn a relational model between the semantic concept space and the visual feature space, and then use this model to annotate images with unknown semantics. On the one hand, automatic image annotation attempts to build a bridge between high-level semantic features and low-level visual features; it can therefore alleviate, to some extent, the semantic gap present in most content-based image retrieval methods, and it has good objectivity. On the other hand, automatic image annotation can generate textual information related to the image content, which gives better accuracy. If automatic image annotation can be achieved, the existing image retrieval problem can in fact be converted into the more mature text retrieval problem. Automatic image annotation therefore makes keyword-based image retrieval easy to realize and matches users' retrieval habits. In general, automatic image annotation involves computer vision, machine learning, information retrieval and other areas, and has strong research value and potential commercial applications, such as image classification, image retrieval, image understanding and intelligent image analysis.
According to the main implementation features of existing automatic image annotation methods, they can be divided into two classes: annotation methods based on probability statistics and annotation methods based on machine learning. Methods based on probability statistics can be extended to large data sets very conveniently, but their overall annotation performance is not satisfactory. With methods based on machine learning, once the model is trained, annotation can be performed quickly; however, most current classification and regression learning methods are shallow-structure algorithms whose generalization ability is limited for complex classification problems. In recent years deep learning, as an innovative direction of machine learning, has been widely applied to object recognition, image classification, speech recognition and other fields, but it has rarely been applied to the image annotation problem. Because deep learning can train deep, complex models, it has great advantages in handling big-data problems. The DBN and CNN models can achieve good results in recognition tasks with few labels and simple, complete features, whereas the image annotation problem involves numerous labels and varied, complex image features, and real-world images also contain noise such as large amounts of text of all kinds, web addresses, QR codes and image watermarks, which greatly limits the applicability of DBN and CNN. The SAE network, by contrast, emphasizes approximate representation of features; the model can easily be adjusted to express complex input as better output suited to a particular situation. This patent therefore selects the SAE model to solve the image annotation problem.
Summary of the invention
The purpose of the present invention is to provide an automatic image annotation method based on an enhanced stacked autoencoder, so as to overcome the defects of the prior art and solve the automatic image annotation problem for multi-object, multi-label images.
To achieve the above purpose, the technical scheme of the present invention is an automatic image annotation method based on an enhanced stacked autoencoder, realized according to the following steps:
Step S1: build a stacked autoencoder model, identify weakly labeled samples on the stacked autoencoder model, and add noise to increase the number of training iterations of the weakly labeled samples, thereby constructing a balanced stacked autoencoder model;
Step S2: use the balanced stacked autoencoder model to train balanced stacked autoencoder submodels in groups on the training images, and accumulate the optimal submodel of each group with weights to obtain an enhanced balanced stacked autoencoder model;
Step S3: input an unknown image into the enhanced balanced stacked autoencoder model and output the annotation result.
In an embodiment of the present invention, step S1 further includes the following steps:
Step S11: define an encoder f_θ and a decoder g_θ'. The encoder f_θ converts an input image x into a hidden-layer representation h, and the decoder g_θ' reconstructs the hidden-layer representation h into a vector x' with the same dimension as the input image x. Here f_θ(x) = σ(Wx + b), θ = {W, b}, W is the network weight matrix satisfying W' = W^T, b is the bias vector, σ is the activation function, and θ' = {W', b'};
Step S12: learn a function that makes the output x' = g_θ'(f_θ(x)) approximate the input image x. The loss function is defined as L(x, x') = (x - x')^2, and learning is performed by minimizing the loss function: θ*, θ'* = argmin_{θ,θ'} Σ_i L(x_i, g_θ'(f_θ(x_i)));
Step S13: assume the SAE model for image annotation has L layers, indexed by l ∈ {1, ..., L}. Let h^l denote the output vector of layer l, and let W^l and b^l denote the network weights and bias of layer l; the parameters {W^l, b^l}, l ∈ {1, ..., L}, are pre-trained layer by layer with autoencoders;
Step S14: execute the feedforward process and fine-tune with the backpropagation algorithm. The feedforward operation of the stacked autoencoder model is expressed as h^{l+1} = σ(W^{l+1} h^l + b^{l+1}), l ∈ {0, ..., L-1}. The backpropagation fine-tuning of the stacked autoencoder model is expressed as θ* = argmin_θ Σ_i L(F(x_i), y_i), where F = f_{θ_L} ∘ ... ∘ f_{θ_1} is the composite function of the multiple autoencoder models, θ_l denotes the parameters {W^l, b^l}, l ∈ {1, ..., L}, and the loss function is L(x, y) = (x - y)^2;
Step S15: define constrained variables. Let the vector C = (c_1, c_2, ..., c_M), where c_i denotes the number of times keyword y_i occurs in the training set P, and let c̄ denote the average occurrence count of the keywords. The element-wise product Y_{C,i} = C * Y_i then gives, for the i-th image x_i, the occurrence count in the training set of each of its keywords Y_i^j, j ∈ {1, 2, ..., M}; from this, the lowest occurrence count among the keywords of image x_i is obtained;
Step S16: define the function Φ(x), with which the stacked autoencoder model judges each training sample during training: if the input image x contains more than k low-frequency tags, appropriate noise is added to the input image x. Define the function Γ(x), which increases the training strength of the input image x: if the occurrence count of a label carried by the input image x is below a preset threshold, the number of training iterations is increased. In the function Γ(x):
α and β are constant coefficients: β determines which samples need heavier training, and α controls the training strength of the samples that need heavier training;
In the function Φ(x):
χ is a constant coefficient that controls the intensity of the added noise, d is the dimensionality of the feature of image x_i, x_i^j denotes the value of the j-th dimension of image x_i, and Ran() is a random number function;
Step S17: adjust the optimization equations to obtain the balanced stacked autoencoder model: the pre-training objective of step S12 and the fine-tuning objective of step S14 are adjusted to incorporate Φ(x) and Γ(x). After the model is trained, the output of the last layer of the balanced stacked autoencoder model for a prediction image is the prediction distribution D over the keywords.
In an embodiment of the present invention, step S2 further includes the following steps:
Step S21: train the balanced stacked autoencoder submodels in groups. The balanced stacked autoencoder models are divided into groups according to the noise-adding mode used; within each group, submodels B-SAE_t^k are distinguished by different numbers of hidden neurons, where t indicates that the balanced stacked autoencoder model uses the t-th noise-adding mode and k indexes the hidden-neuron number of the k-th B-SAE submodel;
Step S22: set the initial weights and calculate the classification error rate of each balanced stacked autoencoder submodel. The weights assigned to the training data are set as follows: W_1 = (w_{1,1}, ..., w_{1,i}, ..., w_{1,N}), with w_{1,i} = 1/N, i = 1, ..., N. The classification error rate is calculated as e_t^k = Σ_{i=1}^{N} w_{t,i} · I(Y_i ≠ Y_i*), where I(Y_i ≠ Y_i*) means: assume the true tag set Y_i of image x_i contains c keywords, and the tag set Y_i* obtained by model prediction also contains c keywords; if Y_i = Y_i*, the indicator is false (0), otherwise it is true (1);
Step S23: calculate the balanced stacked autoencoder model weight and update the training data weights. From the classification error rates of all submodels B-SAE_t^k within a group, the model B-SAE_t with the lowest classification error rate in the group and its classification error rate e_t are obtained, and the weight α_t of B-SAE_t is calculated from e_t. After the t-th group of models has been trained, the training data weights are updated so as to obtain the weights for the next group of models; the updated training data weights are W_{t+1} = {w_{t+1,1}, ..., w_{t+1,i}, ..., w_{t+1,N}};
Step S24: accumulate the balanced stacked autoencoder submodels with weights to obtain the enhanced balanced stacked autoencoder model. After all groups have been trained, the keyword prediction distribution is obtained by accumulating, with their weights α_t, the prediction distributions of the optimal submodels of all groups.
Compared with the prior art, the present invention has the following beneficial effects: the proposed automatic image annotation method based on an enhanced stacked autoencoder exploits the powerful feature representation ability of the SAE deep neural network. Based on an understanding of automatic image annotation, multi-label classification and stacked autoencoders, it provides an enhanced stacked autoencoder annotation method for problems such as label imbalance in image data sets and the difficulty of effectively training on large image data, and finally obtains a deep, complex automatic image annotation model. The method is simple, flexible to implement and highly practical.
Detailed description of the invention
Fig. 1 is a flow chart of the automatic image annotation method based on an enhanced stacked autoencoder according to the present invention.
Specific embodiment
The technical solution of the present invention is described in detail below with reference to the accompanying drawing.
The present invention proposes an automatic image annotation method based on an enhanced stacked autoencoder. First, to address the difficulty that the traditional SAE (Stacked Auto-Encoder) model in deep learning cannot be trained effectively on a skewed data set, a Balanced Stacked Auto-Encoder (B-SAE) is proposed that raises the annotation accuracy of low-frequency tags and thereby improves their annotation quality. Then, to address the problem that a single B-SAE model is unstable (the model is complex and has many parameters), so that the annotation results vary widely as parameters change, an Enhanced Balanced Stacked Auto-Encoder (EB-SAE) for the image annotation task is proposed: submodels are trained sequentially in groups, and the optimal B-SAE submodel of each group is accumulated with weights to obtain stable annotation results. The specific steps are as follows:
S1: first build an SAE model, then identify weakly labeled samples on the SAE model and add noise to increase their number of training iterations, thereby constructing the B-SAE model;
S2: using the B-SAE model obtained in step S1, train sub B-SAE models on the training images in groups and accumulate the optimal submodel of each group with weights to obtain the EB-SAE model, as shown in Fig. 1;
S3: input an unknown image into the EB-SAE model obtained in step S2 and output the annotation result.
Further, in the present embodiment, the B-SAE model in step S1 is built according to the following steps:
Step S11: define an encoder f_θ and a decoder g_θ'. The encoder f_θ converts an input image x into a hidden-layer representation h, and the decoder g_θ' reconstructs h into a vector x' with the same dimension as x. Here f_θ(x) = σ(Wx + b), where θ = {W, b}, W is the network weight matrix satisfying W' = W^T, b is the bias vector, σ is the activation function, and θ' = {W', b'}.
Step S12: learn a function that makes the output x' = g_θ'(f_θ(x)) approximate x. The loss function is defined as L(x, x') = (x - x')^2, and the model is learned by minimizing the loss function: θ*, θ'* = argmin_{θ,θ'} Σ_i L(x_i, g_θ'(f_θ(x_i))).
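As a concrete illustration of steps S11 and S12, the following is a minimal NumPy sketch (not part of the patent text) of a single tied-weight autoencoder: the encoder f_θ(x) = σ(Wx + b), the decoder g_θ'(h) = σ(W^T h + b'), the squared reconstruction loss, and one gradient-descent step on it. The sigmoid activation, the weight initialization and the learning rate are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class AutoEncoder:
    """Tied-weight autoencoder: h = sigmoid(W x + b), x' = sigmoid(W^T h + b')."""

    def __init__(self, d_in, d_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.01, size=(d_hidden, d_in))  # shared weights, W' = W^T
        self.b = np.zeros(d_hidden)   # encoder bias b
        self.b2 = np.zeros(d_in)      # decoder bias b'

    def encode(self, x):
        return sigmoid(self.W @ x + self.b)

    def decode(self, h):
        return sigmoid(self.W.T @ h + self.b2)

    def train_step(self, x, lr=0.1):
        """One gradient-descent step on L(x, x') = ||x - x'||^2."""
        h = self.encode(x)
        x_rec = self.decode(h)
        # Backpropagate the squared-error loss through decoder and encoder.
        delta_out = 2.0 * (x_rec - x) * x_rec * (1.0 - x_rec)     # at the decoder pre-activation
        delta_hid = (self.W @ delta_out) * h * (1.0 - h)          # at the encoder pre-activation
        self.W -= lr * (np.outer(delta_hid, x) + np.outer(h, delta_out))
        self.b -= lr * delta_hid
        self.b2 -= lr * delta_out
        return float(np.sum((x - x_rec) ** 2))                    # reconstruction loss
```

Repeatedly calling train_step on the feature vectors of the training images approximates the argmin objective of step S12.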
Step S13: execute the feedforward process and fine-tune with the backpropagation algorithm. Assume the SAE model for image annotation has L layers, indexed by l ∈ {1, ..., L}. Let h^l denote the output vector of layer l (h^0 = x denotes the input, h^L the output), and let W^l and b^l denote the network weights and bias of layer l. As described above, {W^l, b^l}, l ∈ {1, ..., L}, are pre-trained layer by layer with AEs. The feedforward operation of the SAE can be expressed as h^{l+1} = σ(W^{l+1} h^l + b^{l+1}), l ∈ {0, ..., L-1}, and the whole model is fine-tuned with the backpropagation algorithm: θ* = argmin_θ Σ_i L(F(x_i), y_i), where F = f_{θ_L} ∘ ... ∘ f_{θ_1} is the composite function of the multiple AE models, θ_l denotes the parameters {W^l, b^l}, l ∈ {1, ..., L}, and the loss function is defined as L(x, y) = (x - y)^2.
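Building on the AutoEncoder sketch above, the following is a sketch of step S13: each layer is pre-trained greedily on the hidden codes of the layer below, and the feedforward pass h^{l+1} = σ(W^{l+1} h^l + b^{l+1}) runs the stack. The end-to-end backpropagation fine-tuning against the label vectors y is omitted here; the number of epochs and the learning rate are assumptions.

```python
def pretrain_stack(X, hidden_sizes, epochs=10, lr=0.1):
    """Greedy layer-wise pre-training: layer l+1 is trained on the hidden codes of layer l."""
    layers, H = [], X
    for m in hidden_sizes:
        ae = AutoEncoder(d_in=H.shape[1], d_hidden=m)
        for _ in range(epochs):
            for x in H:
                ae.train_step(x, lr)
        layers.append(ae)
        H = np.array([ae.encode(x) for x in H])   # codes become the next layer's input
    return layers

def feedforward(layers, x):
    """h^{l+1} = sigmoid(W^{l+1} h^l + b^{l+1}); the last layer's output is the SAE output."""
    h = x
    for ae in layers:
        h = ae.encode(h)
    return h
```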
Step S14: define constrained variables. Let the vector C = (c_1, c_2, ..., c_M), where c_i denotes the number of times keyword y_i occurs in the training set P, and let c̄ denote the average occurrence count of the keywords. In this way we obtain the vector Y_{C,i} = C * Y_i (where * denotes element-wise multiplication of two vectors, giving a new vector), which records, for the i-th image x_i, the occurrence count in the training set of each of its keywords Y_i^j, j ∈ {1, 2, ..., M}; from this, the lowest occurrence count among the keywords of image x_i is obtained.
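The frequency statistics of step S14 can be computed directly from the binary label matrix, as in the short sketch below (the toy label matrix and variable names are illustrative only).

```python
import numpy as np

# Y is an N x M binary label matrix: Y[i, j] = 1 if keyword j is attached to image i.
Y = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 1]])

C = Y.sum(axis=0)          # c_j: occurrences of keyword j in the training set P
c_bar = C.mean()           # average occurrence count of the keywords
Y_C = Y * C                # Y_{C,i} = C * Y_i (element-wise product, per image)
# Occurrence count of the rarest keyword attached to each image (absent keywords ignored).
min_count = np.where(Y == 1, Y_C, np.inf).min(axis=1)
print(C, c_bar, min_count)
```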
Step S15: define the function Φ(x), which lets the model judge each training sample during training: if the sample x, i.e. the input image x, contains more than k low-frequency tags, appropriate noise is added to the sample. Define the function Γ(x), which increases the training strength of the sample x: if the occurrence count of a label carried by the sample falls below a certain threshold, its number of training iterations is increased; in the present embodiment, the threshold is typically taken as the average keyword frequency c̄.
Here α and β are constant coefficients: β determines which samples need heavier training, and α controls the training strength of the samples that need heavier training.
Here χ is a constant coefficient that controls the intensity of the added noise, d is the dimensionality of the feature of image x_i, x_i^j denotes the value of the j-th dimension of image x_i, and Ran() is a random number function; for example, Ran() may be a random function obeying a (0, 1) Gaussian distribution, or a uniform random function with values between 0 and 1.
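The exact formulas of Γ(x) and Φ(x) appear as drawing equations in the original filing and are not reproduced here. The sketch below only mimics the behaviour described in the text under assumed functional forms: Γ returns extra training repetitions for a sample whose rarest keyword is below the threshold, gated by β and scaled by α, and Φ adds χ-scaled uniform noise to every one of the d feature dimensions when the sample carries more than k low-frequency tags. The values and forms of α, β, χ and k are assumptions.

```python
def gamma(min_count_i, c_bar, alpha=2.0, beta=0.5):
    """Assumed form of Γ(x): extra repetitions for samples carrying rare keywords.
    beta decides which samples need heavier training, alpha controls how much heavier."""
    if min_count_i < beta * c_bar:
        return max(1, int(round(alpha * c_bar / max(min_count_i, 1))))
    return 1                                        # ordinary sample: train once

def phi(x, n_low_freq_tags, k=1, chi=0.05, rng=np.random.default_rng(0)):
    """Assumed form of Φ(x): perturb all d feature dimensions with chi-scaled noise
    when the image carries more than k low-frequency tags (Ran(): uniform in [0, 1))."""
    if n_low_freq_tags > k:
        return x + chi * rng.random(x.shape)
    return x
```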
Step S16: adjust the optimization equations to obtain the B-SAE model: the pre-training objective of step S12 and the fine-tuning objective of step S13 are adjusted to incorporate Φ(x) and Γ(x). After the model is trained, the output of the last layer of the B-SAE for a prediction image is the prediction distribution D over the keywords.
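As a sketch of how the two functions fold into B-SAE training in step S16, under the same assumptions: each training sample is replaced by Φ(x) and presented Γ(x) times, so rare-label samples are both noise-augmented and trained more often before the stack is pre-trained and fine-tuned as before.

```python
def make_balanced_training_set(X, min_count, n_low_freq, c_bar):
    """Expand the training set: rare-label samples get extra, noise-augmented copies."""
    X_bal = []
    for x, mc, nl in zip(X, min_count, n_low_freq):
        for _ in range(gamma(mc, c_bar)):   # Γ(x) repetitions
            X_bal.append(phi(x, nl))        # Φ(x) noise injection
    return np.array(X_bal)
```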
Further, in the present embodiment, the EB-SAE model in step S2 is trained according to the following steps:
Step S21: train the sub B-SAE models in groups. The B-SAE models are divided into groups according to the noise-adding mode used; within each group, submodels B-SAE_t^k are distinguished by different numbers of hidden neurons, where t indicates that the B-SAE model uses the t-th noise-adding mode and k indexes the hidden-neuron number of the k-th sub B-SAE model.
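The grouping of step S21 can be pictured as a small grid of submodels, one group per noise-adding mode and one submodel per hidden-layer size, as in the sketch below; the particular noise modes and hidden sizes are placeholders, not values prescribed by the patent.

```python
noise_modes = ["uniform", "gaussian", "masking"]   # groups t = 1..T (placeholder modes)
hidden_sizes = [128, 256, 512]                     # submodels k = 1..K (placeholder sizes)

submodels = {}
for t in noise_modes:
    for k in hidden_sizes:
        # B-SAE_t^k would be a B-SAE pre-trained on the t-noised balanced training set
        # with k hidden neurons per layer (see pretrain_stack above).
        submodels[(t, k)] = None                   # placeholder for the trained submodel
```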
Step S22: set the initial weights and calculate the classification error rate of each sub B-SAE model. The weights assigned to the training data are set as follows: W_1 = (w_{1,1}, ..., w_{1,i}, ..., w_{1,N}), with w_{1,i} = 1/N, i = 1, ..., N. The classification error rate can then be calculated as e_t^k = Σ_{i=1}^{N} w_{t,i} · I(Y_i ≠ Y_i*). Here I(Y_i ≠ Y_i*) means: assume the true tag set Y_i of image x_i contains c keywords, and the tag set Y_i* obtained by model prediction also contains c keywords; if Y_i = Y_i*, the indicator is false (0), otherwise it is true (1).
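A sketch of the error-rate computation of step S22: initial sample weights are uniform, each image's predicted tag set Y_i* is taken as the top-c keywords of the predicted distribution with c = |Y_i|, and the weighted error rate sums the weights of images whose predicted set differs from the true set. The top-c decoding rule is an assumption consistent with the text.

```python
def predicted_label_set(D_i, c):
    """Take the c highest-scoring keywords of the prediction distribution D_i as Y_i*."""
    return set(np.argsort(D_i)[::-1][:c])

def weighted_error_rate(D, Y, w):
    """e = sum_i w_i * I(Y_i != Y_i*), with |Y_i*| = |Y_i| = c for each image."""
    e = 0.0
    for D_i, Y_i, w_i in zip(D, Y, w):
        true_set = set(np.flatnonzero(Y_i))
        if predicted_label_set(D_i, len(true_set)) != true_set:
            e += w_i
    return e

N = len(Y)                      # Y as in the frequency sketch above
w1 = np.full(N, 1.0 / N)        # initial weights w_{1,i} = 1/N
```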
Step S23: calculate the B-SAE model weight and update the training data weights. From the classification error rates of all submodels B-SAE_t^k within a group, the model B-SAE_t with the lowest classification error rate in the group and its classification error rate e_t are obtained, and the weight α_t of B-SAE_t is calculated from e_t. After the t-th group of models has been trained, the training data weights need to be updated so as to obtain better weights for the next group of models; the updated training data weights are W_{t+1} = {w_{t+1,1}, ..., w_{t+1,i}, ..., w_{t+1,N}}.
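The submodel weight and the training-data re-weighting of step S23 are given as drawing equations in the original filing; the sketch below substitutes the standard AdaBoost-style forms (α_t = 0.5 ln((1 - e_t)/e_t) and a multiplicative, normalized weight update) as an explicitly assumed stand-in, not the patent's own formulas.

```python
def submodel_weight(e_t):
    """Assumed AdaBoost-style weight for the best submodel B-SAE_t of group t."""
    e_t = np.clip(e_t, 1e-6, 1.0 - 1e-6)    # guard against division by zero / log of zero
    return 0.5 * np.log((1.0 - e_t) / e_t)

def update_weights(w, miss, alpha_t):
    """Assumed AdaBoost-style update: up-weight misclassified samples, then normalize.
    `miss` is a boolean array with True where Y_i != Y_i*."""
    w_new = w * np.exp(alpha_t * np.where(miss, 1.0, -1.0))
    return w_new / w_new.sum()
```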
Step S24: accumulate the sub B-SAE models with weights to obtain the EB-SAE model. After all groups have been trained, the keyword prediction distribution is obtained by accumulating, with their weights α_t, the prediction distributions of the optimal submodels of all groups.
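Finally, a sketch of step S24: the EB-SAE prediction for an unknown image accumulates, with weights α_t, the keyword distributions produced by the optimal submodel of each group, and the annotation is read off the top-ranked keywords. The last SAE layer is assumed to have one unit per keyword; the normalization and the choice of five output keywords are assumptions.

```python
def eb_sae_predict(x, best_submodels, alphas, n_keywords=5):
    """D = sum_t alpha_t * D_t; annotate x with the highest-scoring keywords."""
    D = sum(a * feedforward(m, x) for m, a in zip(best_submodels, alphas))
    D = D / sum(alphas)                      # normalize the accumulated distribution
    return np.argsort(D)[::-1][:n_keywords]  # indices of the predicted keywords
```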
The above are preferred embodiments of the present invention. All changes made according to the technical solution of the present invention, whose functions and effects do not exceed the scope of the technical solution of the present invention, belong to the protection scope of the present invention.

Claims (1)

1. An automatic image annotation method based on an enhanced stacked autoencoder, characterized in that it is realized according to the following steps:
Step S1: build a stacked autoencoder model, identify weakly labeled samples on the stacked autoencoder model, and add noise to increase the number of training iterations of the weakly labeled samples, thereby constructing a balanced stacked autoencoder model;
Step S2: use the balanced stacked autoencoder model to train balanced stacked autoencoder submodels in groups on the training images, and accumulate the optimal submodel of each group with weights to obtain an enhanced balanced stacked autoencoder model;
Step S3: input an unknown image into the enhanced balanced stacked autoencoder model and output the annotation result;
Step S1 further includes the following steps:
Step S11: define an encoder f_θ and a decoder g_θ'. The encoder f_θ converts an input image x into a hidden-layer representation h, and the decoder g_θ' reconstructs the hidden-layer representation h into a vector x' with the same dimension as the input image x. Here f_θ(x) = σ(Wx + b), θ = {W, b}, W is the network weight matrix satisfying W' = W^T, b is the bias vector, σ is the activation function, and θ' = {W', b'};
Step S12: learn a function that makes the output x' = g_θ'(f_θ(x)) approximate the input image x. The loss function is defined as L(x, x') = (x - x')^2, and learning is performed by minimizing the loss function: θ*, θ'* = argmin_{θ,θ'} Σ_i L(x_i, g_θ'(f_θ(x_i)));
Step S13: assume the SAE model for image annotation has L layers, indexed by l ∈ {1, ..., L}. Let h^l denote the output vector of layer l, and let W^l and b^l denote the network weights and bias of layer l; the parameters {W^l, b^l}, l ∈ {1, ..., L}, are pre-trained layer by layer with autoencoders;
Step S14: execute the feedforward process and fine-tune with the backpropagation algorithm. The feedforward operation of the stacked autoencoder model is expressed as h^{l+1} = σ(W^{l+1} h^l + b^{l+1}), l ∈ {0, ..., L-1}. The backpropagation fine-tuning of the stacked autoencoder model is expressed as θ* = argmin_θ Σ_i L(F(x_i), y_i), where F = f_{θ_L} ∘ ... ∘ f_{θ_1} is the composite function of the multiple autoencoder models, θ_l denotes the parameters {W^l, b^l}, l ∈ {1, ..., L}, and the loss function is L(x, y) = (x - y)^2;
Step S15: define constrained variables. Let the vector C = (c_1, c_2, ..., c_M), where c_i denotes the number of times keyword y_i occurs in the training set P, and let c̄ denote the average occurrence count of the keywords. The element-wise product Y_{C,i} = C * Y_i then gives, for the i-th image x_i, the occurrence count in the training set of each of its keywords Y_i^j, j ∈ {1, 2, ..., M}; from this, the lowest occurrence count among the keywords of image x_i is obtained;
Step S16: define the function Φ(x), with which the stacked autoencoder model judges each training sample during training: if the input image x contains more than k low-frequency tags, appropriate noise is added to the input image x. Define the function Γ(x), which increases the training strength of the input image x: if the occurrence count of a label carried by the input image x is below a preset threshold, the number of training iterations is increased. In the function Γ(x):
α and β are constant coefficients: β determines which samples need heavier training, and α controls the training strength of the samples that need heavier training;
In the function Φ(x):
χ is a constant coefficient that controls the intensity of the added noise, d is the dimensionality of the feature of image x_i, x_i^j denotes the value of the j-th dimension of image x_i, and Ran() is a random number function;
Step S17: adjust the optimization equations to obtain the balanced stacked autoencoder model: the pre-training objective of step S12 and the fine-tuning objective of step S14 are adjusted to incorporate Φ(x) and Γ(x). After the model is trained, the output of the last layer of the balanced stacked autoencoder model for a prediction image is the prediction distribution D over the keywords;
Step S2 further includes the following steps:
Step S21: train the balanced stacked autoencoder submodels in groups. The balanced stacked autoencoder models are divided into groups according to the noise-adding mode used; within each group, submodels B-SAE_t^k are distinguished by different numbers of hidden neurons, where t indicates that the balanced stacked autoencoder model uses the t-th noise-adding mode and k indexes the hidden-neuron number of the k-th B-SAE submodel;
Step S22: set the initial weights and calculate the classification error rate of each balanced stacked autoencoder submodel. The weights assigned to the training data are set as follows: W_1 = (w_{1,1}, ..., w_{1,i}, ..., w_{1,N}), with w_{1,i} = 1/N, i = 1, ..., N. The classification error rate of B-SAE_t^k is calculated as e_t^k = Σ_{i=1}^{N} w_{t,i} · I(Y_i ≠ Y_i*), where I(Y_i ≠ Y_i*) means: assume the true tag set Y_i of image x_i contains c keywords, and the tag set Y_i* obtained by prediction with the model B-SAE_t^k also contains c keywords; if Y_i = Y_i*, the indicator is false (0), otherwise it is true (1);
Step S23: calculate the balanced stacked autoencoder model weight and update the training data weights. From the classification error rates of all submodels B-SAE_t^k within a group, the model B-SAE_t with the lowest classification error rate in the group and its classification error rate e_t are obtained, and the weight α_t of B-SAE_t is calculated from e_t. After the t-th group of models has been trained, the training data weights are updated so as to obtain the weights for the next group of models; the updated training data weights are W_{t+1} = {w_{t+1,1}, ..., w_{t+1,i}, ..., w_{t+1,N}};
Step S24: accumulate the balanced stacked autoencoder submodels with weights to obtain the enhanced balanced stacked autoencoder model. After all groups have been trained, the keyword prediction distribution is obtained by accumulating, with their weights α_t, the prediction distributions of the optimal submodels of all groups.
CN201610035975.7A 2016-01-20 2016-01-20 Automatic image annotation method based on an enhanced stacked autoencoder Active CN105678340B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610035975.7A CN105678340B (en) 2016-01-20 2016-01-20 Automatic image annotation method based on an enhanced stacked autoencoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610035975.7A CN105678340B (en) 2016-01-20 2016-01-20 Automatic image annotation method based on an enhanced stacked autoencoder

Publications (2)

Publication Number Publication Date
CN105678340A CN105678340A (en) 2016-06-15
CN105678340B true CN105678340B (en) 2018-12-25

Family

ID=56301673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610035975.7A Active CN105678340B (en) 2016-01-20 2016-01-20 Automatic image annotation method based on an enhanced stacked autoencoder

Country Status (1)

Country Link
CN (1) CN105678340B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250915B (en) * 2016-07-22 2019-08-09 福州大学 A kind of automatic image marking method of fusion depth characteristic and semantic neighborhood
CN109271539B (en) * 2018-08-31 2020-11-24 华中科技大学 Image automatic labeling method and device based on deep learning
CN114035098A (en) * 2021-12-14 2022-02-11 北京航空航天大学 Lithium battery health state prediction method integrating future working condition information and historical state information

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156736A (en) * 2014-09-05 2014-11-19 西安电子科技大学 Polarized SAR image classification method on basis of SAE and IDL
CN104166859A (en) * 2014-08-13 2014-11-26 西安电子科技大学 Polarization SAR image classification based on SSAE and FSALS-SVM
CN104679863A (en) * 2015-02-28 2015-06-03 武汉烽火众智数字技术有限责任公司 Method and system for searching images by images based on deep learning
CN105184303A (en) * 2015-04-23 2015-12-23 南京邮电大学 Image marking method based on multi-mode deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AUPQ684400A0 (en) * 2000-04-11 2000-05-11 Telstra R & D Management Pty Ltd A gradient based training method for a support vector machine

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104166859A (en) * 2014-08-13 2014-11-26 西安电子科技大学 Polarization SAR image classification based on SSAE and FSALS-SVM
CN104156736A (en) * 2014-09-05 2014-11-19 西安电子科技大学 Polarized SAR image classification method on basis of SAE and IDL
CN104679863A (en) * 2015-02-28 2015-06-03 武汉烽火众智数字技术有限责任公司 Method and system for searching images by images based on deep learning
CN105184303A (en) * 2015-04-23 2015-12-23 南京邮电大学 Image marking method based on multi-mode deep learning

Also Published As

Publication number Publication date
CN105678340A (en) 2016-06-15

Similar Documents

Publication Publication Date Title
CN109447140B (en) Image identification and cognition recommendation method based on neural network deep learning
CN109508375A (en) A kind of social affective classification method based on multi-modal fusion
CN112749608B (en) Video auditing method, device, computer equipment and storage medium
CN107657008B (en) Cross-media training and retrieval method based on deep discrimination ranking learning
CN109614842A (en) The machine learning of candidate video insertion object type for identification
CN110110323B (en) Text emotion classification method and device and computer readable storage medium
CN106960206A (en) Character identifying method and character recognition system
CN109684912A (en) A kind of video presentation method and system based on information loss function
CN109933664A (en) A kind of fine granularity mood analysis improved method based on emotion word insertion
CN110704601A (en) Method for solving video question-answering task requiring common knowledge by using problem-knowledge guided progressive space-time attention network
CN104572631B (en) The training method and system of a kind of language model
CN111460157B (en) Cyclic convolution multitask learning method for multi-field text classification
CN108228758A (en) A kind of file classification method and device
CN111488524B (en) Attention-oriented semantic-sensitive label recommendation method
CN105701516B (en) A kind of automatic image marking method differentiated based on attribute
CN110825850B (en) Natural language theme classification method and device
CN109598586A (en) A kind of recommended method based on attention model
CN109858505A (en) Classifying identification method, device and equipment
CN105678340B (en) Automatic image annotation method based on an enhanced stacked autoencoder
CN109740151A (en) Public security notes name entity recognition method based on iteration expansion convolutional neural networks
CN105701225A (en) Cross-media search method based on unification association supergraph protocol
CN109271513A (en) A kind of file classification method, computer-readable storage media and system
CN104077408B (en) Extensive across media data distributed semi content of supervision method for identifying and classifying and device
CN111985520A (en) Multi-mode classification method based on graph convolution neural network
CN113297387B (en) News detection method for image-text mismatching based on NKD-GNN

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant