CN104216882A - File labeling method and device - Google Patents

File labeling method and device

Info

Publication number
CN104216882A
Authority
CN
China
Prior art keywords
file
vector
marked
label
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310206739.3A
Other languages
Chinese (zh)
Other versions
CN104216882B (en)
Inventor
邓志鸿
张�杰
赖博彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Huawei Technologies Co Ltd
Original Assignee
Peking University
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University and Huawei Technologies Co Ltd
Priority to CN201310206739.3A
Publication of CN104216882A
Application granted
Publication of CN104216882B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a file labeling method. The method comprises: obtaining a feature vector of a file to be labeled according to the file to be labeled; obtaining a label indicating vector of the file to be labeled according to the feature vector and the transformational relation between the feature vector and the label indicating vector of the file to be labeled; obtaining a classification indicating vector of the file to be labeled according to the label indicating vector and the transformational relation between the label indicating vector and the classification indicating vector of the file to be labeled; and labeling the file to be labeled according to its label indicating vector and its classification indicating vector. Because the classification indicating vector is normally well normalized and has low noise, labeling the file by combining the label indicating vector with the classification indicating vector reduces the adverse effect that noise in user-defined annotation data has on automatic labeling, and improves the accuracy and hit rate of automatic labeling.

Description

File labeling method and device
Technical field
The present invention relates to the field of file classification, and in particular to a file labeling method and device.
Background
At present, image labeling falls into two categories: manual labeling and automatic labeling. Manual labeling can label images effectively, but it is inefficient. With the sharp increase in the number of image files on the Internet, manual labeling can no longer meet demand, and automatic labeling has emerged as a result.
An existing automatic image labeling method extracts a feature vector (for example color, texture, or local features) for every picture in a training set. Once the feature vector of every picture has been obtained, the support vector machine (SVM) method is applied in the feature space formed by the feature vectors of all pictures to train one classifier per label, yielding multiple classifiers. The classifiers are then used to classify the feature vector of a picture to be labeled, which gives the label annotation result for that picture.
However, in existing automatic image labeling methods the image labels come from user-defined annotation data, which contains many irregular and meaningless labels (for example "qeiox" or "2000"). This data is very noisy, which degrades classifier performance and lowers the accuracy and hit rate of labeling. How to reduce the adverse effect of this noise on automatic labeling is therefore a problem that urgently needs to be solved.
Summary of the invention
Embodiments of the present invention provide a file labeling method and device that reduce the adverse effect of noise in user-defined annotation data on automatic labeling and improve the accuracy and hit rate of automatic labeling.
A first aspect of the present invention provides a file labeling method, comprising:
obtaining a file to be labeled;
obtaining a feature vector of the file to be labeled according to the file to be labeled;
obtaining a label indicating vector of the file to be labeled according to the feature vector of the file to be labeled and the transformational relation between the feature vector and the label indicating vector of the file to be labeled;
obtaining a classification indicating vector of the file to be labeled according to the label indicating vector of the file to be labeled and the transformational relation between the label indicating vector and the classification indicating vector of the file to be labeled;
labeling the file to be labeled according to the label indicating vector and the classification indicating vector of the file to be labeled.
With reference to the first aspect of the present invention, in a first possible implementation, before the step of obtaining the label indicating vector of the file to be labeled, the method further comprises:
obtaining training files in a file labeling training set;
obtaining, according to the training files, the feature vectors, the label indicating vectors, and the classification indicating vectors of the training files;
obtaining, according to the feature vectors, the label indicating vectors, and the classification indicating vectors of the training files, the transformational relation between the feature vector and the label indicating vector of the training files and the transformational relation between the label indicating vector and the classification indicating vector of the training files; using the former as the transformational relation between the feature vector and the label indicating vector of the file to be labeled; and using the latter as the transformational relation between the label indicating vector and the classification indicating vector of the file to be labeled.
With reference to the first possible implementation of the first aspect, in a second possible implementation, the step of obtaining the transformational relation between the feature vector and the label indicating vector of the training files and the transformational relation between the label indicating vector and the classification indicating vector of the training files comprises:
establishing an objective function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where $X$ denotes the feature vector matrix of the training files, $Y$ denotes the label indicating vector matrix of the training files, $T$ denotes the classification indicating vector matrix of the training files, $W$ denotes the transformation parameter matrix between $X$ and $Y$, $V$ denotes the transformation parameter matrix between $Y$ and $T$, $\min \|W\|$ denotes the minimum norm of $W$, $\lambda_1$ and $\lambda_2$ denote preset parameters, and $V^{T}$ denotes the transpose of $V$;
obtaining $W$ and $V$ according to the objective function, so that the transformational relation between the feature vector and the label indicating vector of the training files is $Y = XW$, and the transformational relation between the label indicating vector and the classification indicating vector of the training files is $T = YV^{T}$.
With reference to the first aspect, or the first or second possible implementation of the first aspect, in a third possible implementation, the step of labeling the file to be labeled comprises:
selecting a preset number of components in the label indicating vector of the file to be labeled as the label annotation result of the file to be labeled, and labeling the file to be labeled, wherein the value of each selected component is greater than the value of every unselected component.
With reference to the first aspect, or the first or second possible implementation of the first aspect, in a fourth possible implementation, the step of labeling the file to be labeled comprises: using the maximum-value component in the classification indicating vector of the file to be labeled as the classification annotation result of the file to be labeled, and labeling the file to be labeled.
With reference to the first aspect, or the first or second possible implementation of the first aspect, in a fifth possible implementation, the file to be labeled is a picture file, an audio file, a video file, or text.
A second aspect of the present invention provides a file labeling device, comprising:
a first acquiring unit, configured to obtain a file to be labeled;
a second acquiring unit, configured to obtain a feature vector of the file to be labeled according to the file to be labeled;
a first converting unit, configured to obtain a label indicating vector of the file to be labeled according to the feature vector of the file to be labeled and the transformational relation between the feature vector and the label indicating vector of the file to be labeled;
a second converting unit, configured to obtain a classification indicating vector of the file to be labeled according to the label indicating vector of the file to be labeled and the transformational relation between the label indicating vector and the classification indicating vector of the file to be labeled;
a labeling unit, configured to label the file to be labeled according to the label indicating vector and the classification indicating vector of the file to be labeled.
With reference to the second aspect of the present invention, in a first possible implementation, the file labeling device further comprises:
a third acquiring unit, configured to obtain training files in a file labeling training set;
a fourth acquiring unit, configured to obtain, according to the training files, the feature vectors, the label indicating vectors, and the classification indicating vectors of the training files;
a transformational relation solving unit, configured to obtain, according to the feature vectors, the label indicating vectors, and the classification indicating vectors of the training files, the transformational relation between the feature vector and the label indicating vector of the training files and the transformational relation between the label indicating vector and the classification indicating vector of the training files; to use the former as the transformational relation between the feature vector and the label indicating vector of the file to be labeled; and to use the latter as the transformational relation between the label indicating vector and the classification indicating vector of the file to be labeled.
With reference to the first possible implementation of the second aspect, in a second possible implementation, the transformational relation solving unit is further configured to establish an objective function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where $X$ denotes the feature vector matrix of the training files, $Y$ denotes the label indicating vector matrix of the training files, $T$ denotes the classification indicating vector matrix of the training files, $W$ denotes the transformation parameter matrix between $X$ and $Y$, $V$ denotes the transformation parameter matrix between $Y$ and $T$, $\min \|W\|$ denotes the minimum norm of $W$, $\lambda_1$ and $\lambda_2$ denote preset parameters, and $V^{T}$ denotes the transpose of $V$; and to obtain $W$ and $V$ according to the objective function, so that the transformational relation between the feature vector and the label indicating vector of the training files is $Y = XW$, and the transformational relation between the label indicating vector and the classification indicating vector of the training files is $T = YV^{T}$.
With reference to the second aspect, or the first or second possible implementation of the second aspect, in a third possible implementation, the labeling unit is further configured to select a preset number of components in the label indicating vector of the file to be labeled as the label annotation result of the file to be labeled and label the file to be labeled, wherein the value of each selected component is greater than the value of every unselected component; and to use the maximum-value component in the classification indicating vector of the file to be labeled as the classification annotation result of the file to be labeled and label the file to be labeled.
Embodiments of the present invention provide a file labeling method that labels a file to be labeled according to the label indicating vector and the classification indicating vector of that file. Because the classification indicating vector is generally well normalized and has little noise, labeling the file by combining the label indicating vector with the classification indicating vector reduces the adverse effect that noise in user-defined annotation data has on automatic labeling, and improves the accuracy and hit rate of automatic labeling.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a file labeling method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the steps performed before the label indicating vector of the file to be labeled is obtained;
Fig. 3 is a schematic structural diagram of a file labeling device provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of another file labeling device provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of yet another file labeling device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, a file labeling method according to an embodiment of the present invention comprises:
101. Obtain a file to be labeled.
The file to be labeled may be a picture file, an audio file, a video file, or text.
102. According to the file to be labeled, obtain the feature vector of the file to be labeled.
When the file to be labeled is a picture, the scale-invariant feature transform (SIFT) algorithm and the bag-of-features (BoF) method may be used to compute the feature vector of each picture.
103. According to the feature vector of the file to be labeled and the transformational relation between the feature vector and the label indicating vector of the file to be labeled, obtain the label indicating vector of the file to be labeled.
After the feature vector of the file to be labeled is obtained, the label indicating vector of the file to be labeled is computed according to the transformational relation between the feature vector and the label indicating vector of the file to be labeled.
104. According to the label indicating vector of the file to be labeled and the transformational relation between the label indicating vector and the classification indicating vector of the file to be labeled, obtain the classification indicating vector of the file to be labeled.
After the label indicating vector of the file to be labeled is obtained, the classification indicating vector of the file to be labeled can be computed according to the transformational relation between the label indicating vector and the classification indicating vector of the file to be labeled.
105. Label the file to be labeled according to the label indicating vector and the classification indicating vector of the file to be labeled.
After the label indicating vector and the classification indicating vector of the file to be labeled are obtained, both are annotated on the file to be labeled. Because the classification indicating vector is generally well normalized and has little noise, labeling the file by combining the label indicating vector with the classification indicating vector reduces the adverse effect that noise in user-defined annotation data has on automatic labeling, and improves the accuracy and hit rate of automatic labeling.
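The bag-of-features computation mentioned for step 102 can be illustrated with a minimal sketch. It is not the patent's implementation: it assumes that local descriptors (for example SIFT descriptors) have already been extracted from the picture and that a codebook of visual words is available, and the names `bof_histogram`, `descriptors`, and `codebook` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bof_histogram(descriptors, codebook):
    """Map local descriptors to a normalized visual-word histogram.

    Each descriptor votes for its nearest codebook entry; the normalized
    vote counts serve as the d-dimensional feature vector of the picture.
    """
    counts = [0] * len(codebook)
    for desc in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda j: squared_distance(desc, codebook[j]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy data (assumed): three visual words and four 2-D descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.1, 0.9]]
feature_vector = bof_histogram(descriptors, codebook)
```

In practice the codebook is usually learned by clustering descriptors from the training pictures; that step is outside this sketch.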
To further improve the accuracy and hit rate of automatic labeling, after the label indicating vector and the classification indicating vector of the file to be labeled are obtained, a preset number of components in the label indicating vector are selected as the label annotation result and the file is labeled accordingly, where the value of each selected component is greater than the value of every unselected component. In other words, the preset number of largest components in the label indicating vector are selected as the label annotation result of the file to be labeled, and/or the maximum-value component in the classification indicating vector is used as the classification annotation result of the file to be labeled, and the file is labeled accordingly.
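Steps 103 to 105, together with the component selection just described, can be sketched as follows. The sketch assumes that the transformation matrices `W` (d x m) and `V` (k x m) have already been obtained; the function name `label_file`, the toy matrices, and the label and classification names are hypothetical.

```python
import numpy as np


def label_file(x, W, V, label_names, class_names, top_n=2):
    """Return the top_n labels and the single classification for feature vector x."""
    y = x @ W        # label indicating vector, shape (m,)
    t = y @ V.T      # classification indicating vector, shape (k,)
    top = np.argsort(-y)[:top_n]             # indices of the largest label components
    labels = [label_names[i] for i in top]
    category = class_names[int(np.argmax(t))]  # maximum-value classification component
    return labels, category


# Toy transformation matrices (assumed values, d=2, m=3, k=2).
W = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.2, 0.8]])
V = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])
x = np.array([1.0, 0.5])

labels, category = label_file(
    x, W, V,
    label_names=["lotus", "landscape", "doggie"],
    class_names=["plant", "pet"],
)
```

Here `top_n` plays the role of the preset number of components; how it is chosen is not specified in the text.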
The transformational relation between the feature vector and the label indicating vector of the file to be labeled used in step 103, and the transformational relation between the label indicating vector and the classification indicating vector of the file to be labeled used in step 104, can be obtained by solving over the training files in a file labeling training set. Specifically, as shown in Fig. 2, before the step of obtaining the label indicating vector of the file to be labeled, the file labeling method further comprises:
201. Obtain the training files in the file labeling training set.
For example, if the file to be labeled is a picture, the training pictures $x_1, x_2, \ldots, x_n$ in the picture training set are obtained. As shown in Table 1, every picture has corresponding labels and a corresponding classification.
Table 1
Training picture    Labels                                          Classification
Picture 1           Lotus, landscape, summer                        Plant
Picture 2           Sunset clouds, setting sun, landscape, sea      Landscape
……                  ……                                              ……
Picture n           Sofa, doggie                                    Pet
202. According to the training files, obtain the feature vectors, the label indicating vectors, and the classification indicating vectors of the training files.
For example, the feature vector $x_i \in \mathbb{R}^{d}$ of each picture in the training set can be computed with the SIFT algorithm and the BoF method, where $\mathbb{R}$ denotes a vector space, giving the feature matrix of the $n$ pictures $X = [x_1, x_2, \ldots, x_n]^{T} \in \mathbb{R}^{n \times d}$, where $d$ is the feature dimension and $n$ is an integer greater than 0. At the same time, the sample label matrix $Y = [y_1, y_2, \ldots, y_n]^{T} \in \mathbb{R}^{n \times m}$ is obtained from the annotation and classification data of the pictures, where $y_i \in \{0,1\}^{m}$ denotes the label indicating vector of the $i$-th picture (a value of 1 in a given dimension means the picture is annotated with the corresponding label) and $m$ is the number of labels. The picture classification matrix $T = [t_1, t_2, \ldots, t_n]^{T} \in \mathbb{R}^{n \times k}$ is obtained in the same way, where $t_i \in \{0,1\}^{k}$ denotes the classification indicating vector of the $i$-th picture (a value of 1 in a given dimension means the picture is assigned to the corresponding classification) and $k$ is the number of classifications.
203. According to the feature vectors, the label indicating vectors, and the classification indicating vectors of the training files, obtain the transformational relation between the feature vector and the label indicating vector of the training files and the transformational relation between the label indicating vector and the classification indicating vector of the training files.
204. Use the transformational relation between the feature vector and the label indicating vector of the training files as the transformational relation between the feature vector and the label indicating vector of the file to be labeled, and use the transformational relation between the label indicating vector and the classification indicating vector of the training files as the transformational relation between the label indicating vector and the classification indicating vector of the file to be labeled.
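The 0/1 indicator matrices built in step 202 can be illustrated with the rows of Table 1. This is a minimal sketch under assumptions: the helper `indicator_matrix` is hypothetical, and ordering the label and classification vocabularies alphabetically is an arbitrary choice made only so the example is reproducible.

```python
def indicator_matrix(items_per_file, vocabulary):
    """Build one 0/1 row per file; column j is 1 if the file carries vocabulary[j]."""
    index = {name: j for j, name in enumerate(vocabulary)}
    rows = []
    for items in items_per_file:
        row = [0] * len(vocabulary)
        for name in items:
            row[index[name]] = 1
        rows.append(row)
    return rows


# Rows taken from Table 1 (labels, classification).
training_set = [
    (["lotus", "landscape", "summer"], "plant"),
    (["sunset clouds", "setting sun", "landscape", "sea"], "landscape"),
    (["sofa", "doggie"], "pet"),
]

label_vocab = sorted({name for labels, _ in training_set for name in labels})
class_vocab = sorted({category for _, category in training_set})

Y = indicator_matrix([labels for labels, _ in training_set], label_vocab)  # n x m
T = indicator_matrix([[category] for _, category in training_set], class_vocab)  # n x k
```

Each row of `Y` may contain several ones because a picture can carry several labels, while each row of `T` contains exactly one because every picture in Table 1 has a single classification.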
In embodiments of the present invention, the label indicating vector of a file can be obtained from its feature vector through one transformation matrix, and the classification indicating vector can be obtained from the label indicating vector through another transformation matrix. Specifically, the step of obtaining the transformational relation between the feature vector and the label indicating vector of the training files and the transformational relation between the label indicating vector and the classification indicating vector of the training files comprises:
Establish an objective function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where $X$ denotes the feature vector matrix of the training files, $Y$ denotes the label indicating vector matrix of the training files, $T$ denotes the classification indicating vector matrix of the training files, $W$ denotes the transformation parameter matrix between $X$ and $Y$, $V$ denotes the transformation parameter matrix between $Y$ and $T$, $\min \|W\|$ denotes the minimum norm of $W$, $V^{T}$ denotes the transpose of $V$, and $\lambda_1$ and $\lambda_2$ denote preset parameters. These two preset parameters are determined according to the data set formed by $X$, $Y$, and $T$, specifically by trying different values repeatedly. Within each training run, the two preset parameters are fixed values;
According to the objective function, obtain $W$ and $V$, so that the transformational relation between the feature vector and the label indicating vector of the training files is $Y = XW$, and the transformational relation between the label indicating vector and the classification indicating vector of the training files is $T = YV^{T}$.
It should be noted that the norm in the objective function may be the 2,1-norm or the 2,2-norm. The objective function can therefore be expressed as:
$\min \|W\|_{2,1} + \lambda_1 \|XW - Y\|_{2,1} + \lambda_2 \|(XW)V^{T} - T\|_{2,1}$, where $\|W\|_{2,1}$ denotes the 2,1-norm of $W$, $\|W\|_{2,1} = \sum_i \|w_i\|_2$, and $w_i$ denotes the $i$-th row vector of the matrix $W$;
The objective function can also be expressed as $\min \|W\|_{2,2} + \lambda_1 \|XW - Y\|_{2,2} + \lambda_2 \|(XW)V^{T} - T\|_{2,2}$, where $\|W\|_{2,2}$ denotes the 2,2-norm of $W$, also called the Frobenius norm, which in its standard form is $\|W\|_{2,2} = \sqrt{\operatorname{Tr}(W^{T}W)}$, where $\operatorname{Tr}$ denotes the trace function of a matrix.
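The two norm choices and the resulting objective value can be computed directly. The following is a minimal NumPy sketch; the function names are hypothetical, and the Frobenius norm is taken in its standard form, which is an assumption about the formula intended in the text.

```python
import numpy as np


def l21_norm(M):
    """2,1-norm: sum of the Euclidean norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))


def frobenius_norm(M):
    """2,2-norm (Frobenius norm) written through the trace: sqrt(Tr(M^T M))."""
    return float(np.sqrt(np.trace(M.T @ M)))


def objective(X, Y, T, W, V, lam1, lam2, norm=l21_norm):
    """||W|| + lam1 * ||XW - Y|| + lam2 * ||(XW)V^T - T|| for the chosen norm."""
    XW = X @ W
    return norm(W) + lam1 * norm(XW - Y) + lam2 * norm(XW @ V.T - T)


M = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [6.0, 8.0]])
row_sum_norm = l21_norm(M)     # row norms 5, 0, and 10
fro_norm = frobenius_norm(M)   # square root of 9 + 16 + 36 + 64
```

The 2,1-norm sums row lengths, so it tends to drive whole rows of $W$ toward zero, whereas the Frobenius norm penalizes all entries uniformly.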
After the objective function is established, the matrices $W$ and $V$ can be solved by taking partial derivatives of the objective function and applying gradient descent, where $W \in \mathbb{R}^{d \times m}$, $V \in \mathbb{R}^{k \times m}$, $d$ is the file feature dimension, $m$ is the number of labels, and $k$ is the number of classifications. Specifically, taking the first-order partial derivative of the objective function with respect to $W$ gives:
$D_w^{-1} W + \lambda_1 X^{T} D_u^{-1} u + \lambda_2 X^{T} D_v^{-1} v V$
where the temporary variables are $u = XW - Y$ and $v = (XW)V^{T} - T$, and $D_w = \operatorname{diag}(\|w_i\|_2)$, $D_u = \operatorname{diag}(\|u_i\|_2)$, $D_v = \operatorname{diag}(\|v_i\|_2)$, with $w_i$, $u_i$, and $v_i$ denoting the $i$-th rows of $W$, $u$, and $v$. $W$ and $V$ can then be solved by gradient descent; the pseudocode is as follows:
Input: picture feature matrix X = [x_1, x_2, ..., x_n]^T
       label indicator matrix Y = [y_1, y_2, ..., y_n]^T
       classification indicator matrix T = [t_1, t_2, ..., t_n]^T
       parameters λ1 and λ2
       critical value ε
Output: parameters W and V
The solving algorithm is as follows:
W = random(d, m), V = random(k, m)   // give the transformation matrices W and V initial random values
opt = ||W||_{2,1} + λ1 ||X*W - Y||_{2,1} + λ2 ||(X*W)*V^T - T||_{2,1}
do   // iterate until the change in the objective function falls to the critical value ε
    opt' = opt
    D_w = diag(||w_i||_2)
    u = X*W - Y, v = (X*W)*V^T - T
    D_u = diag(||u_i||_2), D_v = diag(||v_i||_2)
    Δw = D_w^{-1} W + λ1 X^T D_u^{-1} u + λ2 X^T D_v^{-1} v V   // gradients of the objective function in the W and V directions
    Δv = λ2 (D_v^{-1} v)^T * X * W
    W = W - Δw   // keep correcting W and V so that the objective function converges toward its minimum
    V = V - Δv
    opt = ||W||_{2,1} + λ1 ||X*W - Y||_{2,1} + λ2 ||(X*W)*V^T - T||_{2,1}
while |opt - opt'| > ε
output W, V
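The pseudocode can be turned into a runnable NumPy sketch. Several details here are assumptions that do not appear in the pseudocode: a step size `step` scales each update, a small constant `tiny` guards against division by zero for all-zero rows, `max_iter` bounds the number of iterations, and the loop stops once the change in the objective is at most `eps`. The function names and the random toy data are hypothetical.

```python
import numpy as np


def l21_norm(M):
    """Sum of the Euclidean norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))


def objective(X, Y, T, W, V, lam1, lam2):
    XW = X @ W
    return l21_norm(W) + lam1 * l21_norm(XW - Y) + lam2 * l21_norm(XW @ V.T - T)


def row_normalize(M, tiny=1e-12):
    """Compute D^{-1} M with D = diag(||m_i||_2); tiny avoids division by zero."""
    return M / (np.linalg.norm(M, axis=1, keepdims=True) + tiny)


def solve_w_v(X, Y, T, lam1=1.0, lam2=1.0, step=1e-3, eps=1e-6,
              max_iter=500, seed=0):
    """Gradient descent on the 2,1-norm objective; returns W, V, objective history."""
    rng = np.random.default_rng(seed)
    d, m, k = X.shape[1], Y.shape[1], T.shape[1]
    W = rng.random((d, m))   # initial random values, as in the pseudocode
    V = rng.random((k, m))
    history = [objective(X, Y, T, W, V, lam1, lam2)]
    for _ in range(max_iter):
        XW = X @ W
        u = XW - Y
        v = XW @ V.T - T
        grad_W = (row_normalize(W)
                  + lam1 * (X.T @ row_normalize(u))
                  + lam2 * (X.T @ row_normalize(v) @ V))
        grad_V = lam2 * (row_normalize(v).T @ XW)
        W = W - step * grad_W
        V = V - step * grad_V
        history.append(objective(X, Y, T, W, V, lam1, lam2))
        if abs(history[-1] - history[-2]) <= eps:   # converged
            break
    return W, V, history


# Random toy data (assumed): n=20 files, d=5 features, m=4 labels, k=3 classes.
rng = np.random.default_rng(0)
X = rng.random((20, 5))
Y = (rng.random((20, 4)) > 0.5).astype(float)
T = np.eye(3)[rng.integers(0, 3, size=20)]
W, V, history = solve_w_v(X, Y, T)
```

Because the 2,1-norm is not smooth where a row is zero, a fixed step size may oscillate near the minimum; a decaying step or a smaller `step` is a common remedy, though the text does not prescribe one.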
As can be seen, $W$ and $V$ can be solved according to the objective function. The transformational relation between the feature vector and the label indicating vector of the training files is therefore obtained, namely $Y = XW$, as is the transformational relation between the label indicating vector and the classification indicating vector of the training files, namely $T = YV^{T}$.
After the transformation matrices $W$ and $V$ have been obtained from the training set, a file to be labeled is labeled as follows. According to $Y = XW$, the feature vector of the file to be labeled is used to obtain its label indicating vector matrix. According to $T = YV^{T}$, the label indicating vector of the file to be labeled is used to obtain its classification indicating vector matrix. The several largest components of the label indicating vector matrix and the maximum-value component of the classification indicating vector matrix are then selected as the label annotation result and the classification result of the file, which realizes automatic labeling of the file to be labeled.
Embodiments of the present invention fuse file classification information and file features into a single optimization function, so that when file classification data is available, the accuracy and hit rate of the file labeling system can be improved. It should be noted that a file in the embodiments of the present invention may be a picture file, an audio file, a video file, or text. That is, the method of the present invention can label not only pictures but also audio, video, or text, improving the accuracy and hit rate of automatic labeling.
Corresponding to the file labeling method above, embodiments of the present invention also provide a file labeling device. As shown in Fig. 3, a file labeling device comprises:
a first acquiring unit 301, configured to obtain a file to be labeled;
a second acquiring unit 302, configured to obtain a feature vector of the file to be labeled according to the file to be labeled;
a first converting unit 303, configured to obtain a label indicating vector of the file to be labeled according to the feature vector of the file to be labeled and the transformational relation between the feature vector and the label indicating vector of the file to be labeled;
a second converting unit 304, configured to obtain a classification indicating vector of the file to be labeled according to the label indicating vector of the file to be labeled and the transformational relation between the label indicating vector and the classification indicating vector of the file to be labeled;
a labeling unit 305, configured to label the file to be labeled according to the label indicating vector and the classification indicating vector of the file to be labeled.
As shown in Figure 4, the embodiment of the present invention further provides another file labeling device, comprising:
A first acquiring unit 401, configured to obtain a file to be marked;
A second acquiring unit 402, configured to obtain the feature vector of the file to be marked according to the file to be marked;
A first converting unit 403, configured to obtain the label indicator vector of the file to be marked according to the feature vector of the file to be marked and the conversion relationship between the feature vector of the file to be marked and the label indicator vector of the file to be marked;
A second converting unit 404, configured to obtain the classification indicator vector of the file to be marked according to the label indicator vector of the file to be marked and the conversion relationship between the label indicator vector of the file to be marked and the classification indicator vector of the file to be marked;
A marking unit 408, configured to mark the file to be marked according to the label indicator vector of the file to be marked and the classification indicator vector of the file to be marked.
The file labeling device further comprises:
A third acquiring unit 405, configured to obtain a training file from a file labeling training set;
A fourth acquiring unit 406, configured to obtain, according to the training file, the feature vector of the training file, the label indicator vector of the training file, and the classification indicator vector of the training file;
A conversion relationship solving unit 407, configured to obtain, according to the feature vector of the training file, the label indicator vector of the training file, and the classification indicator vector of the training file, the conversion relationship between the feature vector of the training file and the label indicator vector of the training file and the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file; to use the former as the conversion relationship between the feature vector of the file to be marked and the label indicator vector of the file to be marked; and to use the latter as the conversion relationship between the label indicator vector of the file to be marked and the classification indicator vector of the file to be marked.
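As an illustration of the inputs handled by the fourth acquiring unit and the conversion relationship solving unit, the following Python sketch assembles the matrices X, Y, and T from a small training set. It assumes that an indicator vector uses 1 for a label or class attached to a file and 0 otherwise; this 0/1 encoding, the function name `build_indicator_matrices`, and the tuple layout of `samples` are assumptions made for the example, not details given in this section.

```python
import numpy as np

def build_indicator_matrices(samples, label_vocab, class_vocab):
    """Build X, Y, T from training files.

    samples: list of (feature_vector, labels, category) tuples (hypothetical layout).
    Each row of X, Y, T corresponds to one training file.
    """
    label_index = {name: j for j, name in enumerate(label_vocab)}
    class_index = {name: j for j, name in enumerate(class_vocab)}
    X = np.array([features for features, _, _ in samples], dtype=float)
    Y = np.zeros((len(samples), len(label_vocab)))
    T = np.zeros((len(samples), len(class_vocab)))
    for i, (_, labels, category) in enumerate(samples):
        for name in labels:
            Y[i, label_index[name]] = 1.0   # one 1 per label attached to the file
        T[i, class_index[category]] = 1.0    # a single 1 for the file's class
    return X, Y, T
```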
Preferably, the conversion relationship solving unit 407 is further configured to establish the objective function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where X denotes the feature vector matrix of the training files, Y denotes the label indicator vector matrix of the training files, T denotes the classification indicator vector matrix of the training files, W denotes the conversion parameter matrix between X and Y, V denotes the conversion parameter matrix between Y and T, $\min \|W\|$ denotes the minimum norm of W, $\lambda_1$ and $\lambda_2$ denote preset parameters, and $V^{T}$ denotes the transpose of V; and to obtain W and V according to the objective function, so that the conversion relationship between the feature vector of the training file and the label indicator vector of the training file is $Y = XW$, and the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file is $T = YV^{T}$.
Preferably, the marking unit 408 is further configured to select a predetermined number of components from the label indicator vector of the file to be marked as the label marking result of the file to be marked, and to mark the file to be marked, wherein the value of each selected component is greater than the value of every non-selected component; and to use the largest component of the classification indicator vector of the file to be marked as the classification marking result of the file to be marked, and to mark the file to be marked.
Preferably, the file to be marked is a picture file, an audio file, a video file, or a text file.
As shown in Figure 5, the embodiment of the present invention further provides another file labeling device. The device may be a server or another computer device, and comprises an input interface 501, a memory 502, and a processor 503. The input interface 501, the memory 502, and the processor 503 may be connected by a bus 504 or in other ways; Figure 5 takes the bus connection as an example. In the file labeling device, the input interface 501 is configured to obtain a file to be marked. The processor 503 is configured to obtain the feature vector of the file to be marked according to the file to be marked; to obtain the label indicator vector of the file to be marked according to the feature vector of the file to be marked and the conversion relationship between the feature vector of the file to be marked and the label indicator vector of the file to be marked; to obtain the classification indicator vector of the file to be marked according to the label indicator vector of the file to be marked and the conversion relationship between the label indicator vector of the file to be marked and the classification indicator vector of the file to be marked; and to mark the file to be marked according to the label indicator vector of the file to be marked and the classification indicator vector of the file to be marked.
Preferably, the input interface 501 is further configured to obtain a training file from a file labeling training set. The processor 503 is further configured to obtain, according to the training file, the feature vector of the training file, the label indicator vector of the training file, and the classification indicator vector of the training file; to obtain, according to these three vectors, the conversion relationship between the feature vector of the training file and the label indicator vector of the training file and the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file; to use the former as the conversion relationship between the feature vector of the file to be marked and the label indicator vector of the file to be marked; and to use the latter as the conversion relationship between the label indicator vector of the file to be marked and the classification indicator vector of the file to be marked.
Preferably, the processor 503 is further configured to establish the objective function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where X denotes the feature vector matrix of the training files, Y denotes the label indicator vector matrix of the training files, T denotes the classification indicator vector matrix of the training files, W denotes the conversion parameter matrix between X and Y, V denotes the conversion parameter matrix between Y and T, $\min \|W\|$ denotes the minimum norm of W, $\lambda_1$ and $\lambda_2$ denote preset parameters, and $V^{T}$ denotes the transpose of V; and to obtain W and V according to the objective function, so that the conversion relationship between the feature vector of the training file and the label indicator vector of the training file is $Y = XW$, and the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file is $T = YV^{T}$.
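This section does not state which norm is used or how the objective function is minimized. The following Python sketch shows one possible way to obtain W and V, under two explicit assumptions: every norm is taken as a squared Frobenius norm, and the minimization alternates between a gradient step on W and a least-squares fit of V. The function names `objective` and `solve_W_V`, the default values of `lam1`, `lam2` and `iters`, and the zero initialization are all hypothetical.

```python
import numpy as np

def objective(X, Y, T, W, V, lam1, lam2):
    # Squared Frobenius norms are an assumption; the text only writes ||.||.
    return (np.sum(W ** 2)
            + lam1 * np.sum((X @ W - Y) ** 2)
            + lam2 * np.sum((X @ W @ V.T - T) ** 2))

def solve_W_V(X, Y, T, lam1=1.0, lam2=1.0, iters=200):
    d, m, c = X.shape[1], Y.shape[1], T.shape[1]
    W = np.zeros((d, m))   # conversion parameters between X and Y
    V = np.zeros((c, m))   # conversion parameters between Y and T
    sx2 = np.linalg.norm(X, 2) ** 2   # squared largest singular value of X
    for _ in range(iters):
        # W step: one gradient step with V fixed. The step size 1/L uses an
        # upper bound L on the curvature in W, so the objective cannot increase.
        grad = (2 * W
                + 2 * lam1 * X.T @ (X @ W - Y)
                + 2 * lam2 * X.T @ (X @ W @ V.T - T) @ V)
        L = 2 + 2 * lam1 * sx2 + 2 * lam2 * sx2 * np.linalg.norm(V, 2) ** 2
        W = W - grad / L
        # V step: least-squares fit of T = (X W) V^T with W fixed.
        V = np.linalg.lstsq(X @ W, T, rcond=None)[0].T
    return W, V
```

Because the third term couples W and V, the problem is not generally jointly convex, so an alternating scheme of this kind should be expected to reach a stationary point rather than a guaranteed global minimum.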
Preferably, the processor 503 is further configured to select a predetermined number of components from the label indicator vector of the file to be marked as the label marking result of the file to be marked, and to mark the file to be marked, wherein the value of each selected component is greater than the value of every non-selected component; and to use the largest component of the classification indicator vector of the file to be marked as the classification marking result of the file to be marked, and to mark the file to be marked.
The file labeling method and the corresponding device provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, a person of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementation and the scope of application. In summary, the content of this description should not be construed as limiting the present invention.

Claims (10)

1. A file labeling method, characterized by comprising:
obtaining a file to be marked;
obtaining the feature vector of the file to be marked according to the file to be marked;
obtaining the label indicator vector of the file to be marked according to the feature vector of the file to be marked and the conversion relationship between the feature vector of the file to be marked and the label indicator vector of the file to be marked;
obtaining the classification indicator vector of the file to be marked according to the label indicator vector of the file to be marked and the conversion relationship between the label indicator vector of the file to be marked and the classification indicator vector of the file to be marked;
marking the file to be marked according to the label indicator vector of the file to be marked and the classification indicator vector of the file to be marked.
2. The file labeling method according to claim 1, characterized in that, before the step of obtaining the label indicator vector of the file to be marked, the method further comprises:
obtaining a training file from a file labeling training set;
obtaining, according to the training file, the feature vector of the training file, the label indicator vector of the training file, and the classification indicator vector of the training file;
obtaining, according to the feature vector of the training file, the label indicator vector of the training file, and the classification indicator vector of the training file, the conversion relationship between the feature vector of the training file and the label indicator vector of the training file and the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file; using the conversion relationship between the feature vector of the training file and the label indicator vector of the training file as the conversion relationship between the feature vector of the file to be marked and the label indicator vector of the file to be marked; and using the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file as the conversion relationship between the label indicator vector of the file to be marked and the classification indicator vector of the file to be marked.
3. The file labeling method according to claim 2, characterized in that the step of obtaining the conversion relationship between the feature vector of the training file and the label indicator vector of the training file and the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file comprises:
establishing the objective function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where X denotes the feature vector matrix of the training files, Y denotes the label indicator vector matrix of the training files, T denotes the classification indicator vector matrix of the training files, W denotes the conversion parameter matrix between X and Y, V denotes the conversion parameter matrix between Y and T, $\min \|W\|$ denotes the minimum norm of W, $\lambda_1$ and $\lambda_2$ denote preset parameters, and $V^{T}$ denotes the transpose of V;
obtaining W and V according to the objective function, so that the conversion relationship between the feature vector of the training file and the label indicator vector of the training file is $Y = XW$, and the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file is $T = YV^{T}$.
4. The file labeling method according to any one of claims 1 to 3, characterized in that the step of marking the file to be marked comprises:
selecting a predetermined number of components from the label indicator vector of the file to be marked as the label marking result of the file to be marked, and marking the file to be marked, wherein the value of each selected component is greater than the value of every non-selected component.
5. The file labeling method according to any one of claims 1 to 3, characterized in that the step of marking the file to be marked comprises:
using the largest component of the classification indicator vector of the file to be marked as the classification marking result of the file to be marked, and marking the file to be marked.
6. The file labeling method according to any one of claims 1 to 3, characterized in that the file to be marked is a picture file, an audio file, a video file, or a text file.
7. A file labeling device, characterized by comprising:
a first acquiring unit, configured to obtain a file to be marked;
a second acquiring unit, configured to obtain the feature vector of the file to be marked according to the file to be marked;
a first converting unit, configured to obtain the label indicator vector of the file to be marked according to the feature vector of the file to be marked and the conversion relationship between the feature vector of the file to be marked and the label indicator vector of the file to be marked;
a second converting unit, configured to obtain the classification indicator vector of the file to be marked according to the label indicator vector of the file to be marked and the conversion relationship between the label indicator vector of the file to be marked and the classification indicator vector of the file to be marked;
a marking unit, configured to mark the file to be marked according to the label indicator vector of the file to be marked and the classification indicator vector of the file to be marked.
8. The file labeling device according to claim 7, characterized by further comprising:
a third acquiring unit, configured to obtain a training file from a file labeling training set;
a fourth acquiring unit, configured to obtain, according to the training file, the feature vector of the training file, the label indicator vector of the training file, and the classification indicator vector of the training file;
a conversion relationship solving unit, configured to obtain, according to the feature vector of the training file, the label indicator vector of the training file, and the classification indicator vector of the training file, the conversion relationship between the feature vector of the training file and the label indicator vector of the training file and the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file; to use the conversion relationship between the feature vector of the training file and the label indicator vector of the training file as the conversion relationship between the feature vector of the file to be marked and the label indicator vector of the file to be marked; and to use the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file as the conversion relationship between the label indicator vector of the file to be marked and the classification indicator vector of the file to be marked.
9. The file labeling device according to claim 8, characterized in that the conversion relationship solving unit is further configured to establish the objective function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where X denotes the feature vector matrix of the training files, Y denotes the label indicator vector matrix of the training files, T denotes the classification indicator vector matrix of the training files, W denotes the conversion parameter matrix between X and Y, V denotes the conversion parameter matrix between Y and T, $\min \|W\|$ denotes the minimum norm of W, $\lambda_1$ and $\lambda_2$ denote preset parameters, and $V^{T}$ denotes the transpose of V; and to obtain W and V according to the objective function, so that the conversion relationship between the feature vector of the training file and the label indicator vector of the training file is $Y = XW$, and the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file is $T = YV^{T}$.
10. The file labeling device according to any one of claims 7 to 9, characterized in that the marking unit is further configured to select a predetermined number of components from the label indicator vector of the file to be marked as the label marking result of the file to be marked, and to mark the file to be marked, wherein the value of each selected component is greater than the value of every non-selected component; and to use the largest component of the classification indicator vector of the file to be marked as the classification marking result of the file to be marked, and to mark the file to be marked.
CN201310206739.3A 2013-05-29 2013-05-29 File labeling method and device Active CN104216882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310206739.3A CN104216882B (en) 2013-05-29 2013-05-29 File mask method and device

Publications (2)

Publication Number Publication Date
CN104216882A true CN104216882A (en) 2014-12-17
CN104216882B CN104216882B (en) 2018-05-11

Family

ID=52098388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310206739.3A Active CN104216882B (en) 2013-05-29 2013-05-29 File labeling method and device

Country Status (1)

Country Link
CN (1) CN104216882B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102207966A (en) * 2011-06-01 2011-10-05 华南理工大学 Video content quick retrieving method based on object tag
CN102737244A (en) * 2012-06-06 2012-10-17 哈尔滨工程大学 Method for determining corresponding relationships between areas and annotations in annotated image
US20120269436A1 (en) * 2011-04-20 2012-10-25 Xerox Corporation Learning structured prediction models for interactive image labeling
CN102819746A (en) * 2012-07-10 2012-12-12 电子科技大学 Method for automatically marking category of remote sensing image based on author-genre theme model

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407407A (en) * 2016-09-22 2017-02-15 江苏通付盾科技有限公司 A file tagging system and method
CN106407407B (en) * 2016-09-22 2019-10-15 江苏通付盾科技有限公司 A kind of file labeling system and method
CN106897424A (en) * 2017-02-24 2017-06-27 北京时间股份有限公司 Information labeling system and method
CN107330069A (en) * 2017-06-30 2017-11-07 北京金山安全软件有限公司 Multimedia data processing method and device, server and storage medium
CN107330069B (en) * 2017-06-30 2020-10-23 北京金山安全软件有限公司 Multimedia data processing method and device, server and storage medium
CN110555137A (en) * 2018-03-31 2019-12-10 华为技术有限公司 Label filling method and device

Also Published As

Publication number Publication date
CN104216882B (en) 2018-05-11


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant