CN105678340A - Automatic image marking method based on enhanced stack type automatic encoder - Google Patents

Automatic image marking method based on enhanced stack type automatic encoder

Info

Publication number
CN105678340A
CN105678340A CN201610035975.7A
Authority
CN
China
Prior art keywords
model
theta
training
sigma
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610035975.7A
Other languages
Chinese (zh)
Other versions
CN105678340B (en)
Inventor
柯逍
周铭柯
杜明智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN201610035975.7A priority Critical patent/CN105678340B/en
Publication of CN105678340A publication Critical patent/CN105678340A/en
Application granted granted Critical
Publication of CN105678340B publication Critical patent/CN105678340B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention relates to an automatic image annotation method based on an enhanced stacked auto-encoder. To solve the problem that the traditional SAE model in deep learning cannot effectively train on biased data sets, a balanced stacked auto-encoder that improves the accuracy of low-frequency labels is proposed, improving the annotation performance on low-frequency labels. To solve the problem that a single B-SAE model is unstable, so that its annotation performance changes greatly with its parameters, an enhanced stacked auto-encoder aimed at the image annotation task is proposed: groups of sub-models are trained in sequence, and the optimal B-SAE sub-model of each group is weighted and accumulated, yielding a stable annotation result. The method trains the weights layer by layer and uses the back-propagation algorithm for global fine-tuning, overcoming the weak generalization ability of traditional shallow models and their difficulty in converging to a good optimum; training of weakly labeled samples is strengthened during the training process, improving the annotation performance of the whole model. The method is simple, flexible, and highly practical.

Description

An automatic image annotation method based on an enhanced stacked auto-encoder
Technical field
The present invention relates to the fields of pattern recognition and computer vision, and in particular to an automatic image annotation method based on an enhanced stacked auto-encoder.
Background technology
With the rapid development of multimedia image technology, image information on the Internet is growing explosively. Digital images are widely used in fields such as commerce, news media, medicine, and education. Helping users find the images they need quickly and accurately has therefore become one of the hot topics of multimedia research in recent years, and the key technologies for solving this problem are image retrieval and automatic image annotation.
Automatic image annotation refers to automatically adding keywords to an image to represent its semantic content. It uses an annotated image collection to automatically learn a relational model between the semantic concept space and the visual feature space, and annotates images of unknown semantics with this model. On the one hand, automatic image annotation attempts to build a bridge between high-level semantic features and low-level visual features; it can therefore alleviate, to some degree, the semantic gap problem that most content-based image retrieval methods suffer from, and has good objectivity. On the other hand, automatic image annotation generates textual information relevant to the image content, with good accuracy. If automatic image annotation can be realized, the existing image retrieval problem can in fact be transformed into the more mature text retrieval problem. Automatic image annotation therefore enables convenient keyword-based image retrieval that matches people's retrieval habits. In general, automatic image annotation involves computer vision, machine learning, information retrieval, and other fields, and has strong research value and potential commercial applications, such as image classification, image retrieval, image understanding, and intelligent image analysis.
According to their main implementation features, existing automatic image annotation methods can be divided into two classes: methods based on probability and statistics, and methods based on machine learning. Although probabilistic-statistical methods scale easily to large data sets, their overall annotation performance is not ideal. Machine-learning methods can annotate quickly once the model is trained, but most current classification and regression learners are shallow-structure algorithms whose generalization ability is limited on complex classification problems. In recent years deep learning, as an innovative branch of machine learning, has been widely used in object recognition, image classification, speech recognition, and other fields, but it has rarely been applied to the image annotation problem. Because deep learning can train deep, complex models, it has great advantages in handling big-data problems. The DBN and CNN models achieve good results on recognition tasks with few labels and simple, complete features, whereas the image annotation problem has numerous labels and varied, complex image features, and real-world images also contain noise such as text of all kinds, web addresses, QR codes, and image watermarks, which greatly affects the performance of DBN and CNN. The SAE network, in contrast, focuses more on the approximate representation of features and can easily express complex inputs as desired outputs by adjusting the model for a particular situation; this patent therefore selects the SAE model to solve the image annotation problem.
Summary of the invention
It is an object of the present invention to provide an automatic image annotation method based on an enhanced stacked auto-encoder, which overcomes the defects of the prior art and solves the automatic image annotation problem for multi-object, multi-label images.
To achieve the above object, the technical scheme of the present invention is an automatic image annotation method based on an enhanced stacked auto-encoder, realized according to the following steps:
Step S1: build a stacked auto-encoder model, discriminate weakly labeled samples on the stacked auto-encoder model, and add noise to increase the number of training passes over the weakly labeled samples, thereby building a balanced stacked auto-encoder model;
Step S2: train sub-balanced stacked auto-encoder models on grouped training images using the balanced stacked auto-encoder model, and weight and accumulate the optimal sub-model of each group to obtain an enhanced balanced stacked auto-encoder model;
Step S3: input an unknown image into the enhanced balanced stacked auto-encoder model and output the annotation result.
In an embodiment of the present invention, step S1 further comprises the following steps:
Step S11: define an encoder $f_\theta$ and a decoder $g_{\theta'}$; the encoder $f_\theta$ maps an input image $x$ to a hidden representation $h$, and the decoder $g_{\theta'}$ reconstructs the hidden representation $h$ into a vector $x'$ with the same dimensionality as the input image $x$; here $f_\theta(x) = \sigma(Wx + b)$ with $\theta = \{W, b\}$, where $W$ is the network weight matrix satisfying $W' = W^T$, $b$ is the bias vector, $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the activation function, and $\theta' = \{W', b'\}$;
Step S12: learn a function such that the output $x' = g_{\theta'}(f_\theta(x))$ approximates the input image $x$; define the loss function $L(x, x') = (x - x')^2$ and learn by minimizing the loss: $$\theta^*, \theta'^* = \arg\min_{\theta, \theta'} \frac{1}{N} \sum_{i=1}^{N} L\big(x_i, g_{\theta'}(f_\theta(x_i))\big);$$
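As a minimal illustrative sketch of steps S11 and S12 (not the patented implementation; the layer sizes, sample count, and random initialization are assumed toy values), a single auto-encoder with tied weights $W' = W^T$ maps an input through the encoder and decoder and measures the reconstruction loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d, h, N = 8, 4, 32                      # input dim, hidden dim, sample count (assumed)
W = rng.normal(0, 0.1, (h, d))          # encoder weights; decoder uses W.T (tied, W' = W^T)
b, b2 = np.zeros(h), np.zeros(d)        # encoder bias b and decoder bias b'

def encode(x):                          # f_theta(x) = sigma(Wx + b)
    return sigmoid(W @ x + b)

def decode(hid):                        # g_theta'(h) = sigma(W'h + b')
    return sigmoid(W.T @ hid + b2)

X = rng.random((N, d))
# mean reconstruction loss L(x, x') = (x - x')^2 over the sample set;
# the objective of step S12 minimizes this quantity over theta, theta'
loss = np.mean([np.sum((x - decode(encode(x))) ** 2) for x in X])
```

Gradient descent on `loss` with respect to `W`, `b`, `b2` would complete the pre-training of this single layer.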
Step S13: suppose the SAE model used for image annotation has $L$ layers, indexed by $l \in \{1, \dots, L\}$; let $h_l$ denote the output vector of layer $l$, and $W_l$ and $b_l$ the network weights and biases of layer $l$; the parameters $\{W_l, b_l\}, l \in \{1, \dots, L\}$ are pre-trained layer by layer with auto-encoders;
Step S14: perform the feed-forward pass and fine-tune with the back-propagation algorithm; the feed-forward operation of the stacked auto-encoder model is expressed as $h_{l+1} = \sigma(W_{l+1} h_l + b_{l+1}), l \in \{0, \dots, L-1\}$; the back-propagation fine-tuning of the stacked auto-encoder model is expressed as $$\theta^* = \arg\min_{\theta} \sum_{i=1}^{N} L\big(F_\theta(x_i), Y_i\big),$$ where $F_\theta(x) = \sigma_{\theta_L}(\dots(\sigma_{\theta_1}(x)))$ is the composition of the auto-encoder layers, $\theta_l$ denotes the parameters $\{W_l, b_l\}, l \in \{1, \dots, L\}$, and the loss function is $L(x, y) = (x - y)^2$;
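Under the same illustrative assumptions (the layer sizes below are invented for the sketch), the feed-forward pass of step S14 simply chains the per-layer encoders, starting from $h_0 = x$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
sizes = [8, 6, 4, 3]                     # h0 = input, h3 = output layer (assumed sizes)
params = [(rng.normal(0, 0.1, (m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

def feedforward(x):
    """h_{l+1} = sigma(W_{l+1} h_l + b_{l+1}), starting from h0 = x."""
    h = x
    for W, b in params:
        h = sigmoid(W @ h + b)
    return h

y = feedforward(rng.random(8))
# y is the last-layer output F_theta(x); back-propagation fine-tuning would
# compare it against the label vector Y_i and minimize (F_theta(x_i) - Y_i)^2
```

After layer-wise pre-training fills in `params`, back-propagation adjusts all layers jointly against the labels.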
Step S15: define counting variables; let the vector $C = (c_1, c_2, \dots, c_M)$, where $c_i$ is the number of times keyword $y_i$ occurs in the training set $P$, and let $\Pi = \frac{1}{M}\sum_{j=1}^{M} c_j$ denote the average occurrence count of the keywords; for the $i$-th image $x_i$, the training-set occurrence counts of its keywords $Y_i^j, j \in \{1, 2, \dots, M\}$ form the vector $Y_{C,i} = C * Y_i$; the least-frequently occurring keyword in image $x_i$ is thus $\Lambda_{x_i} = \arg\min_j (Y_{C,i}^j)$;
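A small sketch of the counting in step S15, with toy keyword counts (all values below are assumed for illustration only):

```python
import numpy as np

C = np.array([50, 4, 30, 2])        # c_j: training-set frequency of each of M = 4 keywords
Y_i = np.array([1, 1, 0, 1])        # binary keyword vector of image x_i
Pi = C.mean()                        # average keyword frequency (the threshold Pi)

Y_Ci = C * Y_i                       # element-wise product: counts of x_i's own keywords
present = np.where(Y_i == 1)[0]      # only keywords actually present in x_i
Lambda_xi = Y_Ci[present].min()      # frequency of x_i's rarest keyword (Lambda_{x_i})
```

Here the rarest keyword of `x_i` occurs only twice, far below the average frequency, so this sample would qualify for strengthened training in step S16.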
Step S16: define the function $\Phi(x)$: the stacked auto-encoder model judges each learning sample during training, and if the input image $x$ contains more than $k$ low-frequency labels, suitable noise is added to the input image $x$; define the function $\Gamma(x)$: the training strength of the input image $x$ is increased, and if the occurrence count of the labels contained in the input image $x$ is lower than a predetermined threshold (typically taken as $\Pi = \frac{1}{M}\sum_{j=1}^{M} c_j$), its number of training passes is increased, where the function $\Gamma(x)$ is:
$$\Gamma(x_i) = \begin{cases} \alpha \cdot \dfrac{\Pi}{\Lambda_{x_i}} = \alpha \cdot \dfrac{\frac{1}{M}\sum_{j=1}^{M} c_j}{\arg\min_j (Y_{C,i}^j)}, & \Lambda_{x_i} \le \beta \cdot \Pi \\ 1, & \text{otherwise}, \end{cases}$$
where $\alpha$ and $\beta$ are constant coefficients: $\beta$ determines which samples need strengthened training, and $\alpha$ controls the training strength of the samples to be strengthened;
the function $\Phi(x)$ is:
$$\Phi(x_i) = \begin{cases} \chi \cdot \Big(\dfrac{1}{d}\sum_{j=1}^{d} x_i^j\Big) \cdot \mathrm{Ran}(\cdot), & \Lambda_{x_i} \le \beta \cdot \Pi \\ x_i, & \text{otherwise}, \end{cases}$$
where $\chi$ is a constant coefficient that controls the strength of the added noise, $d$ is the dimensionality of the features of image $x_i$, $x_i^j$ denotes the value of the $j$-th dimension of image $x_i$, and $\mathrm{Ran}(\cdot)$ is a random-number function;
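A hedged sketch of how $\Gamma$ and $\Phi$ from step S16 could be evaluated for one sample. The coefficients $\alpha$, $\beta$, $\chi$ and the per-image counts are assumed toy values, and the noise branch follows the piecewise definitions above literally (a Gaussian is one admissible choice of $\mathrm{Ran}(\cdot)$ per the embodiment):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, chi = 2.0, 0.5, 0.1       # constant coefficients (assumed)
Pi = 21.5                               # average keyword frequency (toy value)
Lambda_xi = 2.0                         # rarest-keyword frequency of x_i (toy value)
x_i = rng.random(8)                     # d = 8 feature vector

def Gamma(Lambda_xi):
    """Training passes: boosted in proportion to how rare the rarest label is."""
    if Lambda_xi <= beta * Pi:
        return alpha * Pi / Lambda_xi
    return 1.0

def Phi(x, Lambda_xi):
    """Noise injection, reading the piecewise formula literally: the rare-label
    branch returns the scaled random vector, otherwise the sample is unchanged."""
    if Lambda_xi <= beta * Pi:
        return chi * x.mean() * rng.standard_normal(x.shape)  # Ran(.) ~ N(0, 1)
    return x

passes = max(1, int(Gamma(Lambda_xi)))  # discrete number of strengthened passes
noisy = Phi(x_i, Lambda_xi)
```

With these toy numbers the rare-label sample is trained roughly twenty times instead of once, which is the intended rebalancing effect.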
Step S17: adjust the optimization equations to obtain the balanced stacked auto-encoder model; adjust $$\theta^*, \theta'^* = \arg\min_{\theta, \theta'} \frac{1}{N} \sum_{i=1}^{N} L\big(x_i, g_{\theta'}(f_\theta(x_i))\big)$$ to $$\theta^*, \theta'^* = \arg\min_{\theta, \theta'} \frac{1}{N} \sum_{i=1}^{N} \Big\{ \frac{1}{\Gamma(x_i)} \sum_{j=1}^{\Gamma(x_i)} L\big(\Phi(x_i), g_{\theta'}(f_\theta(\Phi(x_i)))\big) \Big\},$$ and adjust $$\theta^* = \arg\min_{\theta} \sum_{i=1}^{N} L\big(F_\theta(x_i), Y_i\big)$$ to $$\theta^* = \arg\min_{\theta} \sum_{i=1}^{N} \sum_{j=1}^{\Gamma(x_i)} L\big(F_\theta(\Phi(x_i)), Y_i\big);$$ after the model is trained, the output of the last layer of the balanced stacked auto-encoder model is the predicted keyword distribution $D$ for the image.
In an embodiment of the present invention, step S2 further comprises the following steps:
Step S21: train sub-balanced stacked auto-encoder models in groups; the balanced stacked auto-encoder models are divided into different groups according to the noise-injection scheme, and within each group the sub-models $\text{B-SAE}_k^t$ are distinguished by their number of hidden neurons, where $t$ indicates that the balanced stacked auto-encoder model adopts the $t$-th noise-injection scheme and $k$ denotes the number of hidden neurons set for the $k$-th B-SAE sub-model;
Step S22: set the initial weights and compute the classification error of each sub-balanced stacked auto-encoder model; the training data are weighted as follows:
$$W_1 = (w_{11}, \dots, w_{1i}, \dots, w_{1N}), \quad w_{1i} = \frac{1}{N}, \quad i = 1, 2, \dots, N;$$
the classification error of $\text{B-SAE}_k^t$ is computed as
$$e_k^t = \sum_{i=1}^{N} w_{ti} \cdot \mathrm{Sgn}\big(\text{B-SAE}_k^t(x_i) \ne Y_i\big),$$
where $\mathrm{Sgn}(x) = 1$ if $x$ is true and $0$ if $x$ is false; the predicate $\text{B-SAE}_k^t(x_i) \ne Y_i$ means: suppose the true label set $Y_i$ of image $x_i$ contains $c$ keywords, and the label set $Y_i^*$ predicted by the model $\text{B-SAE}_k^t$ also contains $c$ labels; if $Y_i = Y_i^*$, the predicate is false, otherwise it is true;
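A small sketch of the weighted error of step S22, treating each sub-model's prediction as its top-$c$ predicted labels (the label sets and weights below are toy values):

```python
import numpy as np

w = np.full(4, 0.25)                       # w_1i = 1/N for N = 4 training images
true_labels = [{0, 2}, {1}, {0, 3}, {2}]   # Y_i as sets of keyword indices
pred_labels = [{0, 2}, {3}, {0, 3}, {1}]   # top-c labels Y_i* predicted by one sub-model

def error(w, truth, pred):
    """e = sum_i w_i * Sgn(pred_i != truth_i): weight mass on mispredicted images."""
    return sum(wi for wi, t, p in zip(w, truth, pred) if t != p)

e = error(w, true_labels, pred_labels)     # two of four images mispredicted
```

Within a group, the sub-model minimizing this weighted error is the one retained in step S23.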
Step S23: compute the balanced stacked auto-encoder model weight, and update the training-data weights; from the classification errors of all sub-models $\text{B-SAE}_k^t$ in group $t$, the model $\text{B-SAE}_t$ with the minimum classification error in the group and its corresponding classification error $e_t$ are obtained, and the weight $\alpha_t$ of $\text{B-SAE}_t$ is computed; after the models of group $t$ have been trained, the training-data weights are updated to obtain the weights for the next group of models, as follows:
$$W_{t+1} = \{w_{t+1,1}, \dots, w_{t+1,i}, \dots, w_{t+1,N}\}, \quad w_{t+1,i} = \frac{w_{ti} \cdot e^{-\alpha_t \cdot Y_i \cdot \text{B-SAE}_t(x_i)}}{\sum_{i=1}^{N} w_{ti} \cdot e^{-\alpha_t \cdot Y_i \cdot \text{B-SAE}_t(x_i)}}, \quad i = 1, 2, \dots, N;$$
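This group-wise weighting mirrors an AdaBoost-style update; a hedged sketch follows. The model-weight formula $\alpha_t = \frac{1}{2}\ln\frac{1 - e_t}{e_t}$ is the standard AdaBoost choice and is an assumption here (the patent's own expression for $\alpha_t$ is not reproduced in the text above), and a $\pm 1$ agreement score stands in for $Y_i \cdot \text{B-SAE}_t(x_i)$:

```python
import numpy as np

w = np.full(4, 0.25)                      # current data weights w_ti
e_t = 0.25                                # minimum group error (toy value)
alpha_t = 0.5 * np.log((1 - e_t) / e_t)   # assumed AdaBoost-style model weight

agree = np.array([1, 1, -1, 1])           # +1 where B-SAE_t(x_i) matches Y_i, -1 otherwise
w_next = w * np.exp(-alpha_t * agree)     # shrink correct samples, grow wrong ones
w_next /= w_next.sum()                    # normalized weights for the next group
```

Mispredicted samples gain weight, so the next group of sub-models concentrates on the images the current group got wrong.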
Step S24: weight and accumulate the sub-balanced stacked auto-encoder models to obtain the enhanced balanced stacked auto-encoder model; after all groups have been trained, the keyword prediction distribution is obtained: $$D = \sum_{t=1}^{T} \alpha_t \cdot \text{B-SAE}_t(x).$$
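Finally, the weighted accumulation of step S24 can be sketched as a weighted sum of per-group score vectors (the group count, weights, and scores below are toy values):

```python
import numpy as np

alphas = [0.55, 0.35, 0.20]                      # alpha_t for T = 3 groups (assumed)
preds = [np.array([0.9, 0.1, 0.4]),              # B-SAE_t(x): per-keyword scores
         np.array([0.7, 0.3, 0.2]),
         np.array([0.8, 0.5, 0.1])]

D = sum(a * p for a, p in zip(alphas, preds))    # keyword prediction distribution D
top = int(np.argmax(D))                          # most confident keyword index
```

Annotation then amounts to reading off the highest-scoring entries of `D` as the image's keywords.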
Compared with the prior art, the present invention has the following beneficial effects: the proposed automatic image annotation method based on an enhanced stacked auto-encoder exploits the powerful feature-representation ability of the SAE deep neural network; based on an understanding of automatic image annotation, multi-label classification, and the stacked auto-encoder, it addresses problems such as label imbalance in image data sets and the difficulty of effectively training on large image data, and finally obtains a deep, complex automatic image annotation model. The method is simple, flexible to implement, and highly practical.
Accompanying drawing explanation
Fig. 1 is the flow chart of the automatic image annotation method based on an enhanced stacked auto-encoder of the present invention.
Embodiment
The technical scheme of the present invention is described in detail below with reference to the accompanying drawings.
The present invention proposes an automatic image annotation method based on an enhanced stacked auto-encoder. First, to address the problem that the traditional Stacked Auto-Encoder (SAE) model in deep learning has difficulty training effectively on biased data sets, a Balanced Stacked Auto-Encoder (B-SAE) that improves the accuracy of low-frequency labels is proposed, which improves the annotation performance on low-frequency labels. Then, to address the problem that a single B-SAE model is unstable (the model is complex and has many parameters), so that its annotation performance changes greatly as the parameters change, an Enhanced Balanced Stacked Auto-Encoder (EB-SAE) for the image annotation task is proposed: sub-models are trained group by group in sequence, and the optimal B-SAE sub-model of each group is weighted and accumulated, obtaining a stable annotation result. The concrete steps are as follows:
S1: first build the SAE model, then discriminate weakly labeled samples on the SAE model and add noise to increase their number of training passes, thereby building the B-SAE model;
S2: use the B-SAE model obtained in step S1 to train sub-B-SAE models on grouped training images, and weight and accumulate the optimal sub-model of each group to obtain the EB-SAE model, as shown in Fig. 1;
S3: input an unknown image into the EB-SAE model obtained in step S2 and output the annotation result.
Further, in the present embodiment, the B-SAE model is built in step S1 according to the following steps:
Step S11: define an encoder $f_\theta$ and a decoder $g_{\theta'}$; the encoder $f_\theta$ maps an input image $x$ to a hidden representation $h$, and the decoder $g_{\theta'}$ reconstructs $h$ into a vector $x'$ with the same dimensionality as $x$. Here $f_\theta(x) = \sigma(Wx + b)$, where $\theta = \{W, b\}$, $W$ is the network weight matrix satisfying $W' = W^T$, $b$ is the bias vector, $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the activation function, and $\theta' = \{W', b'\}$.
Step S12: learn a function such that the output $x' = g_{\theta'}(f_\theta(x))$ approximates $x$; define the loss function $L(x, x') = (x - x')^2$; the model then learns by minimizing the loss: $$\theta^*, \theta'^* = \arg\min_{\theta, \theta'} \frac{1}{N} \sum_{i=1}^{N} L\big(x_i, g_{\theta'}(f_\theta(x_i))\big).$$
Step S13: perform the feed-forward pass and fine-tune with the back-propagation algorithm; suppose the SAE model for image annotation has $L$ layers, indexed by $l \in \{1, \dots, L\}$. Let $h_l$ denote the output vector of layer $l$ ($h_0 = x$ is the input and $h_L$ is the output), and $W_l$ and $b_l$ the network weights and biases of layer $l$. As described above, $\{W_l, b_l\}, l \in \{1, \dots, L\}$ are pre-trained layer by layer using auto-encoders. The feed-forward operation of the SAE can be expressed as $h_{l+1} = \sigma(W_{l+1} h_l + b_{l+1}), l \in \{0, \dots, L-1\}$, and the whole model is fine-tuned with back-propagation: $$\theta^* = \arg\min_{\theta} \sum_{i=1}^{N} L\big(F_\theta(x_i), Y_i\big),$$ where $F_\theta(x) = \sigma_{\theta_L}(\dots(\sigma_{\theta_1}(x)))$ is the composition of the auto-encoder layers, $\theta_l$ denotes the parameters $\{W_l, b_l\}, l \in \{1, \dots, L\}$, and the loss function is defined as $L(x, y) = (x - y)^2$.
Step S14: define counting variables; let the vector $C = (c_1, c_2, \dots, c_M)$, where $c_i$ is the number of times keyword $y_i$ occurs in the training set $P$, and $\Pi = \frac{1}{M}\sum_{j=1}^{M} c_j$ is the average occurrence count of the keywords. From this we obtain, for the $i$-th image $x_i$, the vector of training-set occurrence counts of its keywords $Y_i^j, j \in \{1, 2, \dots, M\}$: $Y_{C,i} = C * Y_i$ (where $*$ denotes element-wise multiplication of the two vectors, yielding a new vector). The least-frequently occurring keyword in image $x_i$ is thus $\Lambda_{x_i} = \arg\min_j (Y_{C,i}^j)$.
Step S15: define the function $\Phi(x)$, which lets the model judge each learning sample during the training process; if a sample $x$ (that is, an input image $x$) contains more than $k$ low-frequency labels, suitable noise is added to it. Define the function $\Gamma(x)$, which increases the training strength of sample $x$; if the occurrence count of the labels contained in the sample is below a certain threshold, its number of training passes is increased. In the present embodiment this threshold is generally taken as $\Pi = \frac{1}{M}\sum_{j=1}^{M} c_j$.
$$\Gamma(x_i) = \begin{cases} \alpha \cdot \dfrac{\Pi}{\Lambda_{x_i}} = \alpha \cdot \dfrac{\frac{1}{M}\sum_{j=1}^{M} c_j}{\arg\min_j (Y_{C,i}^j)}, & \Lambda_{x_i} \le \beta \cdot \Pi \\ 1, & \text{otherwise}, \end{cases}$$
where $\alpha$ and $\beta$ are constant coefficients: $\beta$ is used to determine which samples need strengthened training, and $\alpha$ controls the training strength of the samples to be strengthened.
$$\Phi(x_i) = \begin{cases} \chi \cdot \Big(\dfrac{1}{d}\sum_{j=1}^{d} x_i^j\Big) \cdot \mathrm{Ran}(\cdot), & \Lambda_{x_i} \le \beta \cdot \Pi \\ x_i, & \text{otherwise}, \end{cases}$$
where $\chi$ is a constant coefficient that controls the strength of the added noise, $d$ is the dimensionality of the features of image $x_i$, $x_i^j$ denotes the value of the $j$-th dimension of image $x_i$, and $\mathrm{Ran}(\cdot)$ is a random-number function; for example, $\mathrm{Ran}(\cdot)$ may be a random function obeying a $(0, 1)$ Gaussian distribution, or a random function uniformly distributed on $[0, 1]$.
Step S16: adjust the optimization equations to obtain the B-SAE model; $$\theta^*, \theta'^* = \arg\min_{\theta, \theta'} \frac{1}{N} \sum_{i=1}^{N} L\big(x_i, g_{\theta'}(f_\theta(x_i))\big)$$ is adjusted to $$\theta^*, \theta'^* = \arg\min_{\theta, \theta'} \frac{1}{N} \sum_{i=1}^{N} \Big\{ \frac{1}{\Gamma(x_i)} \sum_{j=1}^{\Gamma(x_i)} L\big(\Phi(x_i), g_{\theta'}(f_\theta(\Phi(x_i)))\big) \Big\},$$ and $$\theta^* = \arg\min_{\theta} \sum_{i=1}^{N} L\big(F_\theta(x_i), Y_i\big)$$ is adjusted to $$\theta^* = \arg\min_{\theta} \sum_{i=1}^{N} \sum_{j=1}^{\Gamma(x_i)} L\big(F_\theta(\Phi(x_i)), Y_i\big).$$ After the model is trained, the output of the last layer of the B-SAE is the predicted keyword distribution $D$ for the image.
Further, in the present embodiment, the EB-SAE model is trained in step S2 according to the following steps:
Step S21: train sub-B-SAE models in groups; the B-SAE models are divided into different groups according to the noise-injection scheme, and within each group the sub-models $\text{B-SAE}_k^t$ are distinguished by their number of hidden neurons, where $t$ indicates that the B-SAE model adopts the $t$-th noise-injection scheme and $k$ denotes the number of hidden neurons set for the $k$-th B-SAE sub-model.
Step S22: set the initial weights and compute the classification error of each sub-B-SAE model; the training data are weighted as follows:
$$W_1 = (w_{11}, \dots, w_{1i}, \dots, w_{1N}), \quad w_{1i} = \frac{1}{N}, \quad i = 1, 2, \dots, N;$$
the classification error of $\text{B-SAE}_k^t$ can then be computed as
$$e_k^t = \sum_{i=1}^{N} w_{ti} \cdot \mathrm{Sgn}\big(\text{B-SAE}_k^t(x_i) \ne Y_i\big),$$
where $\mathrm{Sgn}(x) = 1$ if $x$ is true and $0$ if $x$ is false; the predicate $\text{B-SAE}_k^t(x_i) \ne Y_i$ means: suppose the true label set $Y_i$ of image $x_i$ contains $c$ keywords, and the label set $Y_i^*$ predicted by the model $\text{B-SAE}_k^t$ also contains $c$ labels; if $Y_i = Y_i^*$, the predicate is false, otherwise it is true.
Step S23: compute the B-SAE model weight and update the training-data weights; from the classification errors of all sub-models $\text{B-SAE}_k^t$ in group $t$, the model $\text{B-SAE}_t$ with the minimum classification error in the group and its corresponding classification error $e_t$ are obtained, and the weight $\alpha_t$ of $\text{B-SAE}_t$ is computed. After the models of group $t$ have been trained, the training-data weights must be updated so as to obtain better weights for the next group of models; they are updated as follows:
$$W_{t+1} = \{w_{t+1,1}, \dots, w_{t+1,i}, \dots, w_{t+1,N}\}, \quad w_{t+1,i} = \frac{w_{ti} \cdot e^{-\alpha_t \cdot Y_i \cdot \text{B-SAE}_t(x_i)}}{\sum_{i=1}^{N} w_{ti} \cdot e^{-\alpha_t \cdot Y_i \cdot \text{B-SAE}_t(x_i)}}, \quad i = 1, 2, \dots, N.$$
Step S24: weight and accumulate the sub-B-SAE models to obtain the EB-SAE model; after all groups have been trained, the keyword prediction distribution is obtained: $$D = \sum_{t=1}^{T} \alpha_t \cdot \text{B-SAE}_t(x).$$
The above are preferred embodiments of the present invention; all changes made according to the technical scheme of the present invention, where the effects produced do not exceed the scope of the technical scheme of the present invention, belong to the protection scope of the present invention.

Claims (3)

1. An automatic image annotation method based on an enhanced stacked auto-encoder, characterized in that it is realized according to the following steps:
Step S1: build a stacked auto-encoder model, discriminate weakly labeled samples on the stacked auto-encoder model, and add noise to increase the number of training passes over the weakly labeled samples, thereby building a balanced stacked auto-encoder model;
Step S2: train sub-balanced stacked auto-encoder models on grouped training images using the balanced stacked auto-encoder model, and weight and accumulate the optimal sub-model of each group to obtain an enhanced balanced stacked auto-encoder model;
Step S3: input an unknown image into the enhanced balanced stacked auto-encoder model and output the annotation result.
2. The automatic image annotation method based on an enhanced stacked auto-encoder according to claim 1, characterized in that step S1 further comprises the following steps:
Step S11: define an encoder $f_\theta$ and a decoder $g_{\theta'}$; the encoder $f_\theta$ maps an input image $x$ to a hidden representation $h$, and the decoder $g_{\theta'}$ reconstructs the hidden representation $h$ into a vector $x'$ with the same dimensionality as the input image $x$; here $f_\theta(x) = \sigma(Wx + b)$ with $\theta = \{W, b\}$, where $W$ is the network weight matrix satisfying $W' = W^T$, $b$ is the bias vector, $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the activation function, and $\theta' = \{W', b'\}$;
Step S12: learn a function such that the output $x' = g_{\theta'}(f_\theta(x))$ approximates the input image $x$; define the loss function $L(x, x') = (x - x')^2$ and learn by minimizing the loss: $$\theta^*, \theta'^* = \arg\min_{\theta, \theta'} \frac{1}{N} \sum_{i=1}^{N} L\big(x_i, g_{\theta'}(f_\theta(x_i))\big);$$
Step S13: suppose the SAE model used for image annotation has $L$ layers, indexed by $l \in \{1, \dots, L\}$; let $h_l$ denote the output vector of layer $l$, and $W_l$ and $b_l$ the network weights and biases of layer $l$; the parameters $\{W_l, b_l\}, l \in \{1, \dots, L\}$ are pre-trained layer by layer with auto-encoders;
Step S14: perform the feed-forward pass and fine-tune with the back-propagation algorithm; the feed-forward operation of the stacked auto-encoder model is expressed as $h_{l+1} = \sigma(W_{l+1} h_l + b_{l+1}), l \in \{0, \dots, L-1\}$; the back-propagation fine-tuning of the stacked auto-encoder model is expressed as $$\theta^* = \arg\min_{\theta} \sum_{i=1}^{N} L\big(F_\theta(x_i), Y_i\big),$$ where $F_\theta(x) = \sigma_{\theta_L}(\dots(\sigma_{\theta_1}(x)))$ is the composition of the auto-encoder layers, $\theta_l$ denotes the parameters $\{W_l, b_l\}, l \in \{1, \dots, L\}$, and the loss function is $L(x, y) = (x - y)^2$;
Step S15: define counting variables; let the vector $C = (c_1, c_2, \dots, c_M)$, where $c_i$ is the number of times keyword $y_i$ occurs in the training set $P$, and let $\Pi = \frac{1}{M}\sum_{j=1}^{M} c_j$ denote the average occurrence count of the keywords; for the $i$-th image $x_i$, the training-set occurrence counts of its keywords $Y_i^j, j \in \{1, 2, \dots, M\}$ form the vector $Y_{C,i} = C * Y_i$; the least-frequently occurring keyword in image $x_i$ is thus $\Lambda_{x_i} = \arg\min_j (Y_{C,i}^j)$;
Step S16: define the function $\Phi(x)$: the stacked auto-encoder model judges each learning sample during training, and if the input image $x$ contains more than $k$ low-frequency labels, suitable noise is added to the input image $x$; define the function $\Gamma(x)$: the training strength of the input image $x$ is increased, and if the occurrence count of the labels contained in the input image $x$ is lower than a predetermined threshold, its number of training passes is increased, where the function $\Gamma(x)$ is:
$$\Gamma(x_i) = \begin{cases} \alpha \cdot \dfrac{\Pi}{\Lambda_{x_i}} = \alpha \cdot \dfrac{\frac{1}{M}\sum_{j=1}^{M} c_j}{\arg\min_j (Y_{C,i}^j)}, & \Lambda_{x_i} \le \beta \cdot \Pi \\ 1, & \text{otherwise}, \end{cases}$$
where $\alpha$ and $\beta$ are constant coefficients: $\beta$ determines which samples need strengthened training, and $\alpha$ controls the training strength of those samples;
the function $\Phi(x)$ is:
$$\Phi(x_i) = \begin{cases} \chi \cdot \Big(\dfrac{1}{d}\sum_{j=1}^{d} x_i^j\Big) \cdot \mathrm{Ran}(\cdot), & \Lambda_{x_i} \le \beta \cdot \Pi \\ x_i, & \text{otherwise}, \end{cases}$$
where $\chi$ is a constant coefficient that controls the strength of the added noise, $d$ is the dimensionality of the features of image $x_i$, $x_i^j$ denotes the value of the $j$-th dimension of image $x_i$, and $\mathrm{Ran}(\cdot)$ is a random-number function;
Step S17: adjust the optimization equations to obtain the balanced stacked auto-encoder model; adjust $$\theta^*, \theta'^* = \arg\min_{\theta, \theta'} \frac{1}{N} \sum_{i=1}^{N} L\big(x_i, g_{\theta'}(f_\theta(x_i))\big)$$ to $$\theta^*, \theta'^* = \arg\min_{\theta, \theta'} \frac{1}{N} \sum_{i=1}^{N} \Big\{ \frac{1}{\Gamma(x_i)} \sum_{j=1}^{\Gamma(x_i)} L\big(\Phi(x_i), g_{\theta'}(f_\theta(\Phi(x_i)))\big) \Big\},$$ and adjust $$\theta^* = \arg\min_{\theta} \sum_{i=1}^{N} L\big(F_\theta(x_i), Y_i\big)$$ to $$\theta^* = \arg\min_{\theta} \sum_{i=1}^{N} \sum_{j=1}^{\Gamma(x_i)} L\big(F_\theta(\Phi(x_i)), Y_i\big);$$ after the model is trained, the output of the last layer of the balanced stacked auto-encoder model is the predicted keyword distribution $D$ for the image.
3. The automatic image annotation method based on an enhanced stacked auto-encoder according to claim 1, characterized in that step S2 further comprises the following steps:
Step S21: grouping training quantum balancing stack compiler model, makes an uproar balance stack compiler model the different group of model split by different adding, divides submodel according to different hidden neuron numbers in each groupT represents that balance stack compiler model adopts t kind to add the mode of making an uproar, and k represents the hidden neuron number that the sub-B-SAE model of kth is arranged;
Step S22: initial weight is set and calculates quantum balancing stack compiler model model classification specific inaccuracy, training data is arranged weights as follows:
W = ( w 11 , ... , w 1 i , ... , w 1 N ) , w 1 i = 1 N , i = 1 , 2 , ... , N ,
CalculateClassification specific inaccuracy: e k t = &Sigma; i = 1 N w t i &CenterDot; S g n ( B - SAE k t ( x i ) &NotEqual; Y i ) , Wherein, S g n ( x ) = 1 , x = t r u e 0 , x = f a l s e , Represent: assume image xiTrue tag collection YiComprise c keyword, and pass through modelPrediction obtains label collection Yi *Number be also c, if Yi=Yi *, thenFor false, otherwise it is true;
Step S23: calculated equilibrium stack compiler model weight, and upgrade training data weights; According to all sons in groupThe classification specific inaccuracy of model, it is possible to obtain the Model B-SAE that this component class specific inaccuracy is minimumtAnd the classification specific inaccuracy e of correspondencet, calculate B-SAEtWeight:After the model training of t group is complete, upgrading the weights of training data, to obtain the weight of next group model, the mode upgrading training data weights is as follows:
W t + 1 = { w t + 1 , 1 , ... , w t + 1 , i , ... , w t + 1 , N } , w t + 1 , i = w t i &CenterDot; e ( - &alpha; t &CenterDot; Y i &CenterDot; B - SAE t ( x i ) ) &Sigma; i = 1 N w t i &CenterDot; e ( - &alpha; t &CenterDot; Y i &CenterDot; B - SAE t ( x i ) ) , i = 1 , 2 , ... , N , ;
Step S24: accumulate the weighted sub B-SAE models into the enhanced balanced stacked autoencoder model. Once all groups have been trained, the keyword prediction distribution is obtained as: $D = \sum_{t=1}^{T} \alpha_t \cdot B\text{-}SAE_t(x)$.
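Steps S21 through S24 together describe an AdaBoost-style loop over groups of models. The sketch below, under stated assumptions, replaces the sub B-SAE neural networks with simple hypothetical threshold classifiers and uses binary +1/-1 labels so that the product $Y_i \cdot B\text{-}SAE_t(x_i)$ in the exponential weight update is well defined; the names `boost`, `predict`, and the stump lambdas are all illustrative, not from the patent.

```python
import math

def boost(groups, X, Y, T):
    """For each of T groups, pick the minimum-error model (Step S23),
    weight it by alpha_t = 0.5*ln((1-e_t)/e_t), and reweight the data."""
    N = len(X)
    w = [1.0 / N] * N                      # Step S22: w_1i = 1/N
    ensemble = []
    for t in range(T):
        # choose the model with minimum weighted error in this group
        best, e_t = None, float("inf")
        for m in groups[t]:
            e = sum(wi for wi, x, y in zip(w, X, Y) if m(x) != y)
            if e < e_t:
                best, e_t = m, e
        alpha = 0.5 * math.log((1 - e_t) / max(e_t, 1e-12))
        ensemble.append((alpha, best))
        # Step S23 update: w_{t+1,i} proportional to w_ti * exp(-alpha*y_i*model(x_i))
        w = [wi * math.exp(-alpha * y * best(x)) for wi, x, y in zip(w, X, Y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    # Step S24: D = sum_t alpha_t * B-SAE_t(x)
    return sum(alpha * m(x) for alpha, m in ensemble)

# Usage on toy data: one group of two threshold "stumps"
X = [0.0, 1.0, 2.0, 3.0]
Y = [-1, -1, 1, 1]
groups = [
    [lambda x: 1 if x > 1.5 else -1,   # perfect on this toy data
     lambda x: 1 if x > 2.5 else -1],  # misclassifies x = 2
]
ensemble = boost(groups, X, Y, T=1)
```

The `max(e_t, 1e-12)` guard is a practical detail for the zero-error case on toy data; the patent's claim does not specify how a zero error is handled.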
CN201610035975.7A 2016-01-20 2016-01-20 A kind of automatic image marking method based on enhanced stack autocoder Active CN105678340B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610035975.7A CN105678340B (en) 2016-01-20 2016-01-20 A kind of automatic image marking method based on enhanced stack autocoder


Publications (2)

Publication Number Publication Date
CN105678340A true CN105678340A (en) 2016-06-15
CN105678340B CN105678340B (en) 2018-12-25

Family

ID=56301673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610035975.7A Active CN105678340B (en) 2016-01-20 2016-01-20 A kind of automatic image marking method based on enhanced stack autocoder

Country Status (1)

Country Link
CN (1) CN105678340B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030158830A1 (en) * 2000-04-11 2003-08-21 Adam Kowalczyk Gradient based training method for a support vector machine
CN104156736A (en) * 2014-09-05 2014-11-19 西安电子科技大学 Polarized SAR image classification method on basis of SAE and IDL
CN104166859A (en) * 2014-08-13 2014-11-26 西安电子科技大学 Polarization SAR image classification based on SSAE and FSALS-SVM
CN104679863A (en) * 2015-02-28 2015-06-03 武汉烽火众智数字技术有限责任公司 Method and system for searching images by images based on deep learning
CN105184303A (en) * 2015-04-23 2015-12-23 南京邮电大学 Image marking method based on multi-mode deep learning


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250915A (en) * 2016-07-22 2016-12-21 福州大学 A kind of automatic image marking method merging depth characteristic and semantic neighborhood
CN106250915B (en) * 2016-07-22 2019-08-09 福州大学 A kind of automatic image marking method of fusion depth characteristic and semantic neighborhood
CN109271539A (en) * 2018-08-31 2019-01-25 华中科技大学 A kind of image automatic annotation method and device based on deep learning
CN111914617A (en) * 2020-06-10 2020-11-10 华南理工大学 Face attribute editing method based on balanced stack type generation countermeasure network
CN111914617B (en) * 2020-06-10 2024-05-07 华南理工大学 Face attribute editing method based on balanced stack type generation type countermeasure network
CN114035098A (en) * 2021-12-14 2022-02-11 北京航空航天大学 Lithium battery health state prediction method integrating future working condition information and historical state information

Also Published As

Publication number Publication date
CN105678340B (en) 2018-12-25

Similar Documents

Publication Publication Date Title
US11868724B2 (en) Generating author vectors
US20180293313A1 (en) Video content retrieval system
Tur et al. Combining active and semi-supervised learning for spoken language understanding
CN111563143B (en) Method and device for determining new words
CN111460157B (en) Cyclic convolution multitask learning method for multi-field text classification
CN110297888B (en) Domain classification method based on prefix tree and cyclic neural network
CN104572631B (en) The training method and system of a kind of language model
CN105678340A (en) Automatic image marking method based on enhanced stack type automatic encoder
CN109446420B (en) Cross-domain collaborative filtering method and system
CN106815310A (en) A kind of hierarchy clustering method and system to magnanimity document sets
CN110825850B (en) Natural language theme classification method and device
Chen et al. Progressive EM for latent tree models and hierarchical topic detection
CN110019779B (en) Text classification method, model training method and device
CN107526805B (en) ML-kNN multi-tag Chinese text classification method based on weight
CN113987187A (en) Multi-label embedding-based public opinion text classification method, system, terminal and medium
CN105701516B (en) A kind of automatic image marking method differentiated based on attribute
CN105701225A (en) Cross-media search method based on unification association supergraph protocol
Aziguli et al. A robust text classifier based on denoising deep neural network in the analysis of big data
Guan et al. Hierarchical neural network for online news popularity prediction
CN109670169B (en) Deep learning emotion classification method based on feature extraction
CN109190471B (en) Attention model method for video monitoring pedestrian search based on natural language description
Tian et al. Deep incremental hashing for semantic image retrieval with concept drift
CN112489689B (en) Cross-database voice emotion recognition method and device based on multi-scale difference countermeasure
CN113204975A (en) Sensitive character wind identification method based on remote supervision
Pathuri et al. Feature based sentimental analysis for prediction of mobile reviews using hybrid bag-boost algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant