CN103116893A

CN103116893A - Digital image labeling method based on multi-instance multi-label learning

Info

Publication number: CN103116893A
Application number: CN201310084956XA
Authority: CN
Inventors: 周志华; 黄圣君
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2013-03-15
Filing date: 2013-03-15
Publication date: 2013-05-22
Anticipated expiration: 2033-03-15
Also published as: CN103116893B

Abstract

The invention discloses a digital image labeling method based on multi-instance multi-label learning. It addresses the technical problem that digital images often carry complex semantics that single-instance techniques cannot effectively represent or learn. The method comprises: initializing a labeling model; randomly selecting an image and one of its relevant labels from a data set, and determining the representative instance for that label; obtaining, by random sampling, an irrelevant label that is ranked ahead of the relevant label, and determining the representative instance for that irrelevant label; and performing gradient descent on the triplet formed by the image, the relevant label and the irrelevant label to update the model. Because the method learns online with a stochastic gradient descent algorithm, it greatly reduces time and memory overhead while maintaining labeling accuracy and improving labeling efficiency.

Description

Digital image labeling method based on multi-instance multi-label learning

Technical field

The present invention relates to the technical field of digital image labeling, and in particular to a digital image labeling method based on multi-instance multi-label learning.

Background technology

With the spread of digital products and the popularity of social networking sites, a massive number of digital images are produced and shared every day. To provide services on image data of this scale, the most central and also the most difficult task is to make a computer understand the semantics of an image, and image labeling is a key technology for doing so.

The task of an automatic image labeling device is to predict the semantic labels of a digital image from its visual features. Specifically, the device first extracts visual features from digital images to represent them, and then, based on this feature representation, trains a labeling model from an image data set that already has semantic labels. Once the feature representation of an unlabeled digital image is fed into the labeling model, the model predicts its semantic labels.

Existing automatic image labeling techniques usually represent an image as a single instance. Images, however, often have complex semantics and contain several objects, so a single-instance representation loses information, cannot describe the semantics of the image accurately, and therefore cannot predict its labels accurately. A more effective approach is the input representation of multi-instance multi-label machine learning (MIML for short), which represents an image as a set of feature instances, each of which usually corresponds to a simpler entity and meaning. A few automatic image labeling techniques based on the MIML input representation exist, but their model complexity grows sharply as the representation space becomes larger, which makes them very inefficient and unusable for large-scale image labeling tasks. An efficient automatic image labeling technique based on the MIML input representation is therefore urgently needed.

Summary of the invention

Technical problem: digital images often have complex semantics that single-instance techniques cannot effectively represent and learn, and existing labeling is inefficient. To address these problems, the present invention proposes a digital image labeling method based on multi-instance multi-label learning.

Technical solution: a digital image labeling method based on multi-instance multi-label learning, comprising the following steps:

(1) initialize the labeling model;

(2) randomly select an image and one relevant label of that image from the data set, and determine the representative instance of this label;

(3) obtain, by random sampling, an irrelevant label that is ranked ahead of the relevant label, and determine the representative instance of this irrelevant label;

(4) perform gradient descent on the triplet formed by the image, the relevant label and the irrelevant label to update the model;

(5) judge whether the model meets the requirement; if not, return to step (2); otherwise finish and output the labeling model.

By adopting the above technical solution, the present invention has the following beneficial effects. The invention is based on the multi-instance multi-label learning framework: for each label of an image, it selects from the multi-instance multi-label input representation the representative instance that the current model scores highest, and uses that instance to represent the image, so that the additional information provided by the multi-instance multi-label representation is fully exploited. At the same time, online learning with a stochastic gradient descent algorithm greatly reduces time and memory overhead, so that labeling accuracy is maintained while labeling efficiency is improved.

Description of drawings

Fig. 1 is a flow chart of training the labeling model in an embodiment of the present invention;

Fig. 2 is a flow chart of the method in an embodiment of the present invention;

Fig. 3 is a flow chart of sampling an image and determining the representative instance of its relevant label in an embodiment of the present invention;

Fig. 4 is a flow chart of sampling an irrelevant label and determining its representative instance in an embodiment of the present invention;

Fig. 5 is a flow chart of updating the labeling model in an embodiment of the present invention.

Embodiment

The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments only illustrate the invention and do not limit its scope; after reading the present disclosure, modifications in various equivalent forms made by those skilled in the art all fall within the scope defined by the claims appended to this application.

Fig. 1 is a flow chart of how the automatic digital image labeling device of the embodiment trains the labeling model. Suppose the training image data set consists of N images, each of which has been labeled. The device extracts from the images in the data set features that conform to the multi-instance multi-label learning input: each image is represented by a set of feature vectors, and each feature vector is called an instance. Preferably, feature extraction uses a classical method from machine learning textbooks to generate suitable image features, for example first segmenting the image and then extracting features such as color, texture and shape from each image block. After the labeling model has been trained on this data set, the device performs the same feature extraction on an input unlabeled image, predicts with the labeling model, and outputs the relevant labels as the labeling result.
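
As a rough illustration of the set-of-instances representation, the following Python sketch splits a small grayscale image into fixed-size blocks and describes each block by two simple statistics. The grid split stands in for real image segmentation, and the function name `image_to_bag` and the mean and standard-deviation features are assumptions made for illustration; they are not the features prescribed by the embodiment.

```python
from statistics import mean, pstdev

def image_to_bag(pixels, block):
    """Turn a grayscale image (list of pixel rows) into a bag of instances.

    ASSUMPTION: a regular grid replaces real segmentation, and each block is
    described only by its mean intensity and intensity standard deviation.
    """
    height, width = len(pixels), len(pixels[0])
    bag = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = [pixels[r][c]
                    for r in range(top, min(top + block, height))
                    for c in range(left, min(left + block, width))]
            bag.append([mean(tile), pstdev(tile)])  # one instance per block
    return bag

# A 4 x 4 toy image becomes a bag of four 2-dimensional instances.
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [1, 3, 5, 5],
         [1, 3, 5, 5]]
bag = image_to_bag(image, block=2)
```

Each entry of `bag` plays the role of one instance, and the whole list plays the role of the image X in the steps that follow.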

Fig. 2 shows the flow chart of the method of the embodiment. Step S20 initializes the labeling model, which mainly consists of assigning initial values to two matrices W and V. Suppose the extracted image features have dimension d and there are L possible labels in total; then W is a matrix of size L × 100 and V is a matrix of size d × 100. The value 100 can be changed according to the user's needs: a larger value usually improves labeling accuracy, while a smaller value speeds up computation. W and V are assigned random values such that each of their columns has mean 0 and a prescribed standard deviation.
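
The initialization in step S20 could be sketched as follows in plain Python. The helper names `init_matrix` and `init_model` are hypothetical, the column centering is one possible reading of "each column has mean 0", and the standard deviation `1 / sqrt(d)` is an assumption, since the value is not available in this text.

```python
import random
from statistics import mean

def init_matrix(rows, cols, std, rng):
    """Draw Gaussian entries, then shift each column so its mean is 0."""
    m = [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]
    for c in range(cols):
        col_mean = mean(row[c] for row in m)
        for row in m:
            row[c] -= col_mean
    return m

def init_model(num_labels, feat_dim, embed_dim=100, seed=0):
    """Return W (num_labels x embed_dim) and V (feat_dim x embed_dim)."""
    rng = random.Random(seed)
    std = 1.0 / feat_dim ** 0.5  # ASSUMPTION: not specified in the text
    W = init_matrix(num_labels, embed_dim, std, rng)
    V = init_matrix(feat_dim, embed_dim, std, rng)
    return W, V

W, V = init_model(num_labels=5, feat_dim=8, embed_dim=4)
```

A smaller `embed_dim` than the default 100 is used in the example only to keep the matrices easy to inspect.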

Step S21 randomly samples an image X from the training data set; suppose its relevant label set is Y and its irrelevant label set is $\bar{Y}$. The device randomly selects a label from Y, denoted y, and selects for y a representative instance x from the set X; the detailed process is shown in Fig. 3. Step S22 randomly samples labels one by one from the irrelevant label set $\bar{Y}$ of this image until it encounters an irrelevant label $\bar{y}$ that is ranked ahead of y, and determines its representative instance $\bar{x}$; the detailed process is shown in Fig. 4. Step S23 updates the model by gradient descent; the detailed process is shown in Fig. 5. Step S24 judges whether the model meets the requirement: if so, training ends; otherwise the process returns to step S21. The criterion for judging whether the model meets the requirement can be any method commonly used in machine learning or pattern recognition textbooks, for example the number of iteration rounds reaching a user-specified number.
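
The control flow of steps S21 to S24 can be outlined as a short Python loop. This is only a skeleton under stated assumptions: the three callables `pick_relevant`, `sample_violator` and `update` are hypothetical hooks standing in for the processes of Figs. 3, 4 and 5, the data set is assumed to be a list of `(bag, relevant_labels, irrelevant_labels)` tuples, and a fixed round count is used as the stopping criterion.

```python
import random

def train(dataset, model, pick_relevant, sample_violator, update,
          max_rounds, seed=0):
    """Run the sampling and update loop for a fixed number of rounds.

    All three callables are hypothetical hooks, not part of the source text.
    """
    rng = random.Random(seed)
    for _ in range(max_rounds):  # S24: stop after a user-specified round count
        # S21: sample an image, one relevant label y, and its instance x
        bag, relevant, irrelevant = rng.choice(dataset)
        y, x = pick_relevant(model, bag, relevant, rng)
        # S22: look for an irrelevant label ranked ahead of y
        found = sample_violator(model, bag, irrelevant, y, x, rng)
        if found is not None:
            # S23: gradient step on the (image, y, violating label) triplet
            update(model, y, x, found)
    return model
```

Passing the three stages in as functions keeps the loop independent of how the scores are computed, which makes it easy to test each stage separately.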

Fig. 3 shows the detailed process of step S21 in Fig. 2. Step S210 is the start action. In step S211, an image X is drawn at random from the training image data set; suppose it contains n instances $\{x_1, \ldots, x_n\}$ in total, and the relevant label set Y of the image contains m labels $\{y_1, \ldots, y_m\}$ in total. Step S212 draws a label y at random from Y. Step S213 initializes i to 1 and begins counting the instances in X. Step S214 judges whether i is greater than n; if so, all instances in X have been traversed and the process jumps to step S217; otherwise it enters the next step S215. Step S215 computes, for instance $x_i$, the value given by the following formula:

$f_y(x_i) = W_y V^{T} x_i$ (1)

where $W_y$ denotes row y of W, and $f_y(x_i)$ can be understood as the degree of relevance between instance $x_i$ and label y. Step S216 increments the counter i by 1 and then returns to step S214. Step S217 compares all computed values $f_y(x_i)$ and selects the instance with the largest value as the representative instance, denoted x; the corresponding $f_y(x)$ can be understood as the degree of relevance between this image and label y. The whole process ends at step S218.
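
Formula (1) and the selection in step S217 can be written out as a small Python sketch. The function names `embed`, `score` and `representative` are assumptions introduced here; matrices are plain nested lists, with W of size L × k and V of size d × k.

```python
def embed(V, x):
    """Compute V^T x for a d x k matrix V and a d-dimensional instance x."""
    k = len(V[0])
    return [sum(V[j][c] * x[j] for j in range(len(x))) for c in range(k)]

def score(W, V, label, x):
    """Formula (1): f_y(x) = W_y V^T x, with W_y the row of W for the label."""
    return sum(w * e for w, e in zip(W[label], embed(V, x)))

def representative(W, V, label, bag):
    """Return (index, score) of the instance with the largest f_label value."""
    scores = [score(W, V, label, x) for x in bag]
    best = max(range(len(bag)), key=scores.__getitem__)
    return best, scores[best]

# Toy model with L = 2 labels, d = 2 features and embedding size k = 2.
W = [[1, 0], [0, 1]]
V = [[1, 0], [0, 1]]
bag = [[3, 1], [1, 4]]
print(representative(W, V, 0, bag))  # -> (0, 3)
print(representative(W, V, 1, bag))  # -> (1, 4)
```

The second element of the returned pair is the value the text interprets as the relevance of the whole image to the label.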

Fig. 4 shows the detailed process of step S23 in Fig. 2. Suppose the irrelevant label set $\bar{Y}$ of the image contains t labels $\{\bar{y}_1, \ldots, \bar{y}_t\}$ in total. Step S230 is the start action. Step S231 computes the value $f_y(x)$ for label y according to formula (1), where x is assumed to be the representative instance of y. Step S232 initializes i to 1 to begin counting, and introduces an indicator variable Q, initialized to 0. Step S233 judges whether all t irrelevant labels have been traversed: if i > t, the process jumps to step S238 and ends; otherwise it enters the next step. Step S234 computes $f_{\bar{y}_i}(\bar{x}_i)$ for label $\bar{y}_i$ according to formula (1), where $\bar{x}_i$ is assumed to be the representative instance of $\bar{y}_i$. Step S235 compares the order of the two labels y and $\bar{y}_i$ on the basis of $f_y(x)$ and $f_{\bar{y}_i}(\bar{x}_i)$. If the relevant label y is ranked ahead of $\bar{y}_i$, the process enters step S236, increments the counter i by 1, and returns to step S234. If the irrelevant label $\bar{y}_i$ is ranked ahead of the relevant label y, the process jumps to step S237, assigns i to Q to indicate that a label violating the order was found at the i-th sampling, and records the irrelevant label found and its representative instance as $\bar{y}$ and $\bar{x}$ respectively; it then enters step S238 and ends.
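
The search for a violating label can be sketched as follows. Several details are assumptions: labels are visited in a random order without repetition, "ranked ahead" is taken to mean a strictly larger score (the exact inequality is not available in this text), and `bag_score` is a hypothetical callable that returns the score and the representative instance for a label.

```python
import random

def sample_violator(irrelevant, f_y, bag_score, rng):
    """Return (Q, label, instance) for the first violating label, else (0, None, None).

    irrelevant: the irrelevant labels of the image.
    f_y: score of the relevant label y on its representative instance.
    bag_score: hypothetical callable, label -> (score, representative instance).
    """
    order = list(irrelevant)
    rng.shuffle(order)  # ASSUMPTION: sample without replacement
    for i, label in enumerate(order, start=1):
        f_bar, x_bar = bag_score(label)
        if f_bar > f_y:  # ASSUMPTION: violation means a strictly larger score
            return i, label, x_bar  # Q = i, the number of samples drawn
    return 0, None, None  # Q = 0: no violating label was found

# Toy scores: only label 2 outranks a relevant label with score 0.5.
toy = {1: (0.1, "a"), 2: (0.9, "b"), 3: (0.2, "c")}
Q, y_bar, x_bar = sample_violator([1, 2, 3], 0.5, toy.__getitem__, random.Random(0))
```

The returned Q is the quantity the update step later uses to estimate the rank of the relevant label.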

Fig. 5 shows the detailed process of updating the model by gradient descent. Step S240 is the start action. Step S241 judges whether a label violating the order was sampled in step S23, that is, whether Q is greater than 0. If Q = 0, no label violating the order was sampled, and the process goes directly to step S245 and ends; otherwise it enters step S242 and estimates the rank r of the relevant label y from the value of Q, using the following formula:

$r = \lfloor t / Q \rfloor$ (2)

where t is the number of labels in $\bar{Y}$ and $\lfloor \cdot \rfloor$ denotes rounding down. Step S243 records row y of the model variable W as a and row $\bar{y}$ as b. Step S244 updates row y of W according to formula (3), row $\bar{y}$ of W according to formula (4), and V according to formula (5).

$W_y = a + \gamma \sum_{i=1}^{r} \frac{1}{i} V^{T} x$ (3)

$W_{\bar{y}} = b - \gamma \sum_{i=1}^{r} \frac{1}{i} V^{T} \bar{x}$ (4)

$V = V - \gamma \sum_{i=1}^{r} \frac{1}{i} (\bar{x} b - x a)$ (5)

where γ is the step size of the stochastic gradient descent algorithm, whose value can be set according to methods commonly used in machine learning or pattern recognition textbooks. After the model variables have been updated, the whole process ends at step S245.
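
A minimal Python sketch of the update in formulas (3) to (5) is given below. It assumes the rank estimate r is t divided by Q and rounded down, and the names `rank_weight` and `sgd_update` are hypothetical. All three updates are computed from the values held before the step, which is why rows a and b are copied first.

```python
def rank_weight(t, Q):
    """Return r = floor(t / Q) and the sum of 1/i for i = 1..r (ASSUMED form)."""
    r = t // Q
    return r, sum(1.0 / i for i in range(1, r + 1))

def sgd_update(W, V, y, x, y_bar, x_bar, t, Q, gamma):
    """Apply formulas (3)-(5) in place; do nothing when Q == 0 (step S241)."""
    if Q == 0:
        return
    _, weight = rank_weight(t, Q)
    step = gamma * weight
    a, b = list(W[y]), list(W[y_bar])  # S243: record the two rows of W
    k, d = len(a), len(x)
    vx = [sum(V[j][c] * x[j] for j in range(d)) for c in range(k)]         # V^T x
    vx_bar = [sum(V[j][c] * x_bar[j] for j in range(d)) for c in range(k)]  # V^T x_bar
    W[y] = [a[c] + step * vx[c] for c in range(k)]          # formula (3)
    W[y_bar] = [b[c] - step * vx_bar[c] for c in range(k)]  # formula (4)
    for j in range(d):                                      # formula (5)
        for c in range(k):
            V[j][c] -= step * (x_bar[j] * b[c] - x[j] * a[c])

# Toy step: t = 4 irrelevant labels, violator found on the 2nd sample.
W = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
sgd_update(W, V, y=0, x=[1.0, 0.0], y_bar=1, x_bar=[0.0, 1.0], t=4, Q=2, gamma=0.1)
```

In this toy step r is 2 and the weight is 1 + 1/2, so the effective step is 0.15; the score of the relevant label on x rises while the score of the violating label on its instance falls.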

Claims

1. A digital image labeling method based on multi-instance multi-label learning, characterized by comprising the following steps:

(1) initializing a labeling model;

(4) updating the model on the triplet formed by the selected image, the relevant label and the irrelevant label;

2. The digital image labeling method based on multi-instance multi-label learning according to claim 1, characterized in that the labeling model is trained as follows: for the input labeled images, an automatic digital image labeling device extracts, from the images in the training image data set, features that conform to the multi-instance multi-label learning input, each image being represented by a set of feature vectors and each feature vector being called an instance; after the labeling model has been trained, prediction is performed with the labeling model and the relevant labels are output as the labeling result; and the device performs the same feature extraction and label prediction process on an input unlabeled image.

3. The digital image labeling method based on multi-instance multi-label learning according to claim 2, characterized in that, under the multi-instance multi-label learning input representation, the representative instance of each label is determined by the current labeling model.

4. The digital image labeling method based on multi-instance multi-label learning according to claim 1, characterized in that a stochastic gradient descent algorithm is used to update the model on the triplet formed by the image, the relevant label and the irrelevant label.

5. The digital image labeling method based on multi-instance multi-label learning according to claim 1, characterized in that the criterion for judging whether the model meets the requirement includes the number of iteration rounds reaching a user-specified number.