CN104216882B - File annotation method and device - Google Patents

Publication number
CN104216882B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310206739.3A
Other languages
Chinese (zh)
Other versions
CN104216882A (en)
Inventor
邓志鸿
张�杰
赖博彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Huawei Technologies Co Ltd
Original Assignee
Peking University
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University, Huawei Technologies Co Ltd
Priority to CN201310206739.3A
Publication of CN104216882A
Application granted
Publication of CN104216882B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a file annotation method, comprising: obtaining a feature vector of a file to be annotated from the file; obtaining a label indicator vector of the file according to its feature vector and a transformation relation between the feature vector and the label indicator vector; obtaining a category indicator vector of the file according to its label indicator vector and a transformation relation between the label indicator vector and the category indicator vector; and annotating the file according to its label indicator vector and its category indicator vector. Because category indicator vectors are generally well standardized and contain little noise, annotating files by combining the label indicator vector with the category indicator vector reduces the adverse effect that noise in user-defined annotation data has on automatic annotation, and improves the accuracy and hit rate of automatic annotation.

Description

File annotation method and device
Technical field
The present invention relates to the field of document classification, and in particular to a file annotation method and device.
Background
At present, image annotation falls into two classes: manual annotation and automatic annotation. Manual annotation labels images effectively but is inefficient. With the sharp increase in the number of image files on the Internet, manual annotation can no longer meet demand, and automatic annotation has emerged as a result.
Existing automatic image annotation methods extract a feature vector (for example color, texture, or local features) from every picture in a training set. Once the feature vector of every picture has been obtained, a support vector machine (SVM) is used, in the feature space formed by the feature vectors of all pictures, to train one classifier per label, yielding multiple classifiers. The classifiers are then applied to the feature vector of a picture to be annotated to obtain its label annotation result.
However, in existing automatic image annotation methods the image labels come from user-defined annotation data. Such data contains a large number of irregular and meaningless labels (for example "qeiox" or "2000") and is therefore very noisy, which degrades classifier performance and lowers the accuracy and hit rate of annotation. How to reduce the adverse effect of this noise on automatic annotation is therefore a problem that urgently needs to be solved.
Summary of the invention
Embodiments of the present invention provide a file annotation method and device for reducing the adverse effect that noise in user-defined annotation data has on automatic annotation, and for improving the accuracy and hit rate of automatic annotation.
A first aspect of the present invention provides a file annotation method, including:
obtaining a file to be annotated;
obtaining a feature vector of the file to be annotated according to the file to be annotated;
obtaining a label indicator vector of the file to be annotated according to the feature vector of the file to be annotated and a transformation relation between the feature vector and the label indicator vector of the file to be annotated;
obtaining a category indicator vector of the file to be annotated according to the label indicator vector of the file to be annotated and a transformation relation between the label indicator vector and the category indicator vector of the file to be annotated; and
annotating the file to be annotated according to its label indicator vector and its category indicator vector.
With reference to the first aspect of the present invention, in a first possible implementation, before the step of obtaining the label indicator vector of the file to be annotated, the method further includes:
obtaining the training files in a file annotation training set;
obtaining, according to the training files, the feature vectors, the label indicator vectors, and the category indicator vectors of the training files; and
obtaining, according to the feature vectors, label indicator vectors, and category indicator vectors of the training files, the transformation relation between the feature vector and the label indicator vector of a training file and the transformation relation between the label indicator vector and the category indicator vector of a training file; using the former as the transformation relation between the feature vector and the label indicator vector of the file to be annotated; and using the latter as the transformation relation between the label indicator vector and the category indicator vector of the file to be annotated.
With reference to the first possible implementation of the first aspect of the present invention, in a second possible implementation, the step of obtaining the transformation relation between the feature vector and the label indicator vector of the training file and the transformation relation between the label indicator vector and the category indicator vector of the training file includes:
establishing an objective function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where $X$ denotes the feature vector matrix of the training files, $Y$ denotes the label indicator vector matrix of the training files, $T$ denotes the category indicator vector matrix of the training files, $W$ denotes the transformation parameter matrix between $X$ and $Y$, $V$ denotes the transformation parameter matrix between $Y$ and $T$, $\min \|W\|$ denotes minimizing the norm of $W$, $\lambda_1$ and $\lambda_2$ denote preset parameters, and $V^{T}$ denotes the transpose of $V$; and
obtaining $W$ and $V$ according to the objective function, so that the transformation relation between the feature vector and the label indicator vector of the training file is $Y = XW$, and the transformation relation between the label indicator vector and the category indicator vector of the training file is $T = YV^{T}$.
With reference to the first aspect of the present invention, or the first or second possible implementation of the first aspect, in a third possible implementation, the step of annotating the file to be annotated includes:
selecting a predetermined number of components of the label indicator vector of the file to be annotated as the label annotation result of the file, and annotating the file accordingly, where the value of every selected component is greater than the value of every unselected component.
With reference to the first aspect of the present invention, or the first or second possible implementation of the first aspect, in a fourth possible implementation, the step of annotating the file to be annotated includes: taking the maximum component of the category indicator vector of the file to be annotated as the category annotation result of the file, and annotating the file accordingly.
With reference to the first aspect of the present invention, or the first or second possible implementation of the first aspect, in a fifth possible implementation, the file to be annotated is a picture file, an audio file, a video file, or a text file.
A second aspect of the present invention provides a file annotation device, including:
a first acquiring unit, configured to obtain a file to be annotated;
a second acquiring unit, configured to obtain a feature vector of the file to be annotated according to the file to be annotated;
a first conversion unit, configured to obtain a label indicator vector of the file to be annotated according to the feature vector of the file to be annotated and a transformation relation between the feature vector and the label indicator vector of the file to be annotated;
a second conversion unit, configured to obtain a category indicator vector of the file to be annotated according to the label indicator vector of the file to be annotated and a transformation relation between the label indicator vector and the category indicator vector of the file to be annotated; and
an annotation unit, configured to annotate the file to be annotated according to its label indicator vector and its category indicator vector.
With reference to the second aspect of the present invention, in a first possible implementation, the file annotation device further includes:
a third acquiring unit, configured to obtain the training files in a file annotation training set;
a fourth acquiring unit, configured to obtain, according to the training files, the feature vectors, the label indicator vectors, and the category indicator vectors of the training files; and
a transformation-relation solving unit, configured to obtain, according to the feature vectors, label indicator vectors, and category indicator vectors of the training files, the transformation relation between the feature vector and the label indicator vector of a training file and the transformation relation between the label indicator vector and the category indicator vector of a training file, to use the former as the transformation relation between the feature vector and the label indicator vector of the file to be annotated, and to use the latter as the transformation relation between the label indicator vector and the category indicator vector of the file to be annotated.
With reference to the first possible implementation of the second aspect of the present invention, in a second possible implementation, the transformation-relation solving unit is further configured to establish an objective function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where $X$ denotes the feature vector matrix of the training files, $Y$ denotes the label indicator vector matrix of the training files, $T$ denotes the category indicator vector matrix of the training files, $W$ denotes the transformation parameter matrix between $X$ and $Y$, $V$ denotes the transformation parameter matrix between $Y$ and $T$, $\min \|W\|$ denotes minimizing the norm of $W$, $\lambda_1$ and $\lambda_2$ denote preset parameters, and $V^{T}$ denotes the transpose of $V$; and to obtain $W$ and $V$ according to the objective function, so that the transformation relation between the feature vector and the label indicator vector of the training file is $Y = XW$, and the transformation relation between the label indicator vector and the category indicator vector of the training file is $T = YV^{T}$.
With reference to the second aspect of the present invention, or the first or second possible implementation of the second aspect, in a third possible implementation, the annotation unit is further configured to select a predetermined number of components of the label indicator vector of the file to be annotated as the label annotation result of the file and annotate the file accordingly, where the value of every selected component is greater than the value of every unselected component, and to take the maximum component of the category indicator vector of the file to be annotated as the category annotation result of the file and annotate the file accordingly.
Embodiments of the present invention provide a file annotation method that annotates a file to be annotated according to the label indicator vector and the category indicator vector of the file. Because category indicator vectors are generally well standardized and contain little noise, annotating files by combining the label indicator vector with the category indicator vector reduces the adverse effect that noise in user-defined annotation data has on automatic annotation, and improves the accuracy and hit rate of automatic annotation.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a file annotation method provided in an embodiment of the present invention;
Fig. 2 is a flowchart of the steps performed before the label indicator vector of the file to be annotated is obtained;
Fig. 3 is a structural diagram of a file annotation device provided in an embodiment of the present invention;
Fig. 4 is a structural diagram of another file annotation device provided in an embodiment of the present invention;
Fig. 5 is a structural diagram of yet another file annotation device provided in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, a file annotation method according to an embodiment of the present invention includes:
101. Obtain a file to be annotated.
The file to be annotated may be a picture file, an audio file, a video file, or a text file.
102. Obtain the feature vector of the file to be annotated according to the file to be annotated.
When the file to be annotated is a picture, the feature vector of each picture can be computed using the scale-invariant feature transform (SIFT) algorithm together with the bag-of-features (BoF) method.
103. Obtain the label indicator vector of the file to be annotated according to the feature vector of the file to be annotated and the transformation relation between the feature vector and the label indicator vector of the file to be annotated.
After the feature vector of the file to be annotated is obtained, the label indicator vector of the file is computed from the transformation relation between the feature vector and the label indicator vector of the file to be annotated.
104. Obtain the category indicator vector of the file to be annotated according to the label indicator vector of the file to be annotated and the transformation relation between the label indicator vector and the category indicator vector of the file to be annotated.
After the label indicator vector of the file to be annotated is obtained, the category indicator vector of the file can be computed from the transformation relation between the label indicator vector and the category indicator vector of the file to be annotated.
105. Annotate the file to be annotated according to its label indicator vector and its category indicator vector.
After the label indicator vector and the category indicator vector of the file to be annotated are obtained, both are annotated on the file. Because category indicator vectors are generally well standardized and contain little noise, annotating files by combining the label indicator vector with the category indicator vector reduces the adverse effect that noise in user-defined annotation data has on automatic annotation, and improves the accuracy and hit rate of automatic annotation.
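The feature extraction of step 102 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes that local descriptors (for example SIFT descriptors) have already been extracted and that a codebook of visual words has already been learned (for example by k-means). The names `bof_histogram`, `descriptors`, and `codebook` are hypothetical, and the toy data stands in for real SIFT output.

```python
import numpy as np

def bof_histogram(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Quantise local descriptors against a codebook and return a normalised histogram.

    descriptors: (num_keypoints, desc_dim) local descriptors of one picture.
    codebook:    (d, desc_dim) visual words; d becomes the feature dimension.
    """
    # Squared Euclidean distance from every descriptor to every visual word.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Each descriptor votes for its nearest visual word.
    nearest = np.argmin(dists, axis=1)
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    # L1-normalise so pictures with different keypoint counts are comparable.
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy data (assumption): 128-dimensional descriptors lying close to
# visual words 0, 0, 3 and 5 of a random 8-word codebook.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 128))
descriptors = codebook[[0, 0, 3, 5]] + 0.01 * rng.normal(size=(4, 128))
x = bof_histogram(descriptors, codebook)   # feature vector of one picture, d = 8
```

The resulting vector `x` plays the role of the feature vector of one picture, with the codebook size acting as the feature dimension.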
To further improve the accuracy and hit rate of automatic annotation, after the label indicator vector and the category indicator vector of the file to be annotated are obtained, a predetermined number of components of the label indicator vector are selected as the label annotation result of the file, where the value of every selected component is greater than the value of every unselected component. In other words, the predetermined number of largest components of the label indicator vector are selected as the label annotation result of the file, and/or the maximum component of the category indicator vector is taken as the category annotation result of the file, and the file is annotated accordingly.
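Steps 103 to 105 and the selection rule above can be sketched as follows, assuming the two transformation relations are linear maps given by a matrix `W` (features to labels) and a matrix `V` (labels to categories). The function name `annotate`, the parameter `top_n`, and the hand-picked toy matrices are assumptions for illustration, not trained values.

```python
import numpy as np

def annotate(x, W, V, label_names, category_names, top_n=2):
    """Annotate one file from its feature vector x of shape (d,).

    W: (d, m) feature-to-label matrix; V: (k, m) label-to-category matrix.
    """
    y = x @ W        # label indicator vector, shape (m,)
    t = y @ V.T      # category indicator vector, shape (k,)
    # Indices of the top_n largest label components, largest first.
    top = np.argsort(y)[::-1][:top_n]
    labels = [label_names[i] for i in top]
    # The maximum component of t gives the category.
    category = category_names[int(np.argmax(t))]
    return labels, category

# Hand-picked toy matrices (assumption): d = 2 features, m = 3 labels, k = 2 categories.
W = np.array([[0.9, 0.1, 0.6],
              [0.0, 0.8, 0.2]])
V = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]])
labels, category = annotate(np.array([1.0, 0.0]), W, V,
                            ["lotus", "sofa", "landscape"], ["plant", "pet"])
```

With these toy values the label indicator vector is (0.9, 0.1, 0.6), so the two selected labels are "lotus" and "landscape", and the category indicator vector is (1.2, 0.4), so the category is "plant".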
The transformation relation between the feature vector and the label indicator vector of the file to be annotated in step 103, and the transformation relation between the label indicator vector and the category indicator vector of the file to be annotated in step 104, can be solved from the training files in a file annotation training set. Specifically, as shown in Fig. 2, before the step of obtaining the label indicator vector of the file to be annotated, the file annotation method further includes:
201. Obtain the training files in the file annotation training set.
For example, if the file to be annotated is a picture, the training pictures x1, x2, ..., xn in the picture training set are obtained. As shown in Table 1, every picture has corresponding labels and a corresponding category.
Table 1
Training picture    Label                                           Classification
Picture 1           Lotus, landscape, summer                        Plant
Picture 2           Sunset clouds, the setting sun, landscape, sea  Landscape
……                  ……                                              ……
Picture n           Sofa, doggie                                    Pet
202. Obtain, according to the training files, the feature vectors, the label indicator vectors, and the category indicator vectors of the training files.
For example, the SIFT algorithm and the BoF method can be used to compute the feature vector $x_i \in \mathbb{R}^{d}$ of every picture in the training set, where $\mathbb{R}$ denotes the real vector space, giving the feature matrix of the $n$ pictures $X = [x_1, x_2, \ldots, x_n]^{T} \in \mathbb{R}^{n \times d}$, where $d$ is the feature dimension and $n$ is an integer greater than 0. At the same time, the sample label matrix $Y = [y_1, y_2, \ldots, y_n]^{T} \in \mathbb{R}^{n \times m}$ is obtained from the annotation and category data of the pictures, where $y_i \in \{0,1\}^{m}$ is the label indicator vector of the $i$-th picture (its $j$-th component being 1 indicates that the picture is annotated with the $j$-th label) and $m$ is the number of labels. Likewise, the picture category matrix is $T = [t_1, t_2, \ldots, t_n]^{T} \in \mathbb{R}^{n \times k}$, where $t_i \in \{0,1\}^{k}$ is the category indicator vector of the $i$-th picture (its $j$-th component being 1 indicates that the picture belongs to the $j$-th category) and $k$ is the number of categories.
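Building the indicator matrices Y and T from annotation data such as Table 1 can be sketched as follows. The helper `indicator_matrix` and the alphabetical ordering of the vocabularies are assumptions made for illustration; only the three rows spelled out in Table 1 are used.

```python
import numpy as np

def indicator_matrix(items_per_file, vocabulary):
    """Return an (n, len(vocabulary)) 0/1 matrix; row i marks the items of file i."""
    index = {name: j for j, name in enumerate(vocabulary)}
    M = np.zeros((len(items_per_file), len(vocabulary)))
    for i, items in enumerate(items_per_file):
        for name in items:
            M[i, index[name]] = 1.0
    return M

# Rows of Table 1 that are spelled out in the text (pictures 1, 2 and n).
labels = [["lotus", "landscape", "summer"],
          ["sunset clouds", "setting sun", "landscape", "sea"],
          ["sofa", "doggie"]]
categories = [["plant"], ["landscape"], ["pet"]]

label_vocab = sorted({l for row in labels for l in row})          # m labels
category_vocab = sorted({c for row in categories for c in row})   # k categories

Y = indicator_matrix(labels, label_vocab)          # shape (n, m)
T = indicator_matrix(categories, category_vocab)   # shape (n, k)
```

Each row of `Y` can contain several ones because a picture may carry several labels, whereas each row of `T` here contains exactly one because every picture in Table 1 has a single category.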
203. Obtain, according to the feature vectors, label indicator vectors, and category indicator vectors of the training files, the transformation relation between the feature vector and the label indicator vector of a training file and the transformation relation between the label indicator vector and the category indicator vector of a training file.
204. Use the transformation relation between the feature vector and the label indicator vector of the training file as the transformation relation between the feature vector and the label indicator vector of the file to be annotated, and use the transformation relation between the label indicator vector and the category indicator vector of the training file as the transformation relation between the label indicator vector and the category indicator vector of the file to be annotated.
In the embodiments of the present invention, the label indicator vector of a file can be obtained from its feature vector through one transformation matrix, and the category indicator vector can be obtained from the label indicator vector through another transformation matrix. Specifically, the step of obtaining the transformation relation between the feature vector and the label indicator vector of the training file and the transformation relation between the label indicator vector and the category indicator vector of the training file includes:
establishing an objective function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where $X$ denotes the feature vector matrix of the training files, $Y$ denotes the label indicator vector matrix of the training files, $T$ denotes the category indicator vector matrix of the training files, $W$ denotes the transformation parameter matrix between $X$ and $Y$, $V$ denotes the transformation parameter matrix between $Y$ and $T$, $\min \|W\|$ denotes minimizing the norm of $W$, $V^{T}$ denotes the transpose of $V$, and $\lambda_1$ and $\lambda_2$ denote preset parameters. The two preset parameters depend on the data set formed by $X$, $Y$, and $T$, and can be determined by trying several different values. Within any one training run, the two preset parameters are fixed; and
obtaining $W$ and $V$ according to the objective function, so that the transformation relation between the feature vector and the label indicator vector of the training file is $Y = XW$, and the transformation relation between the label indicator vector and the category indicator vector of the training file is $T = YV^{T}$.
It should be noted that the norm in the objective function can be the 2,1-norm or the 2,2-norm. The objective function can therefore be written specifically as:
$\min \|W\|_{2,1} + \lambda_1 \|XW - Y\|_{2,1} + \lambda_2 \|(XW)V^{T} - T\|_{2,1}$, where $\|W\|_{2,1}$ denotes the 2,1-norm of $W$, $\|W\|_{2,1} = \sum_i \|w_i\|_2$, and $w_i$ denotes the $i$-th row vector of the matrix $W$.
The objective function can also be written as $\min \|W\|_{2,2} + \lambda_1 \|XW - Y\|_{2,2} + \lambda_2 \|(XW)V^{T} - T\|_{2,2}$, where $\|W\|_{2,2}$ denotes the 2,2-norm of $W$, also called the Frobenius norm, $\|W\|_{2,2} = \sqrt{\operatorname{Tr}(W^{T}W)}$, and $\operatorname{Tr}$ denotes the trace function of a matrix.
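The two norm choices and the resulting objective value can be sketched as follows. The function names `norm_21`, `norm_fro`, and `objective` are hypothetical, and the small matrix `A` is only a check value chosen so that both norms can be verified by hand.

```python
import numpy as np

def norm_21(A):
    """2,1-norm: sum of the Euclidean norms of the rows of A."""
    return float(np.sum(np.linalg.norm(A, axis=1)))

def norm_fro(A):
    """2,2-norm (Frobenius norm), computed as sqrt(Tr(A^T A))."""
    return float(np.sqrt(np.trace(A.T @ A)))

def objective(X, Y, T, W, V, lam1, lam2, norm=norm_21):
    """||W|| + lam1 * ||XW - Y|| + lam2 * ||(XW)V^T - T|| under the chosen norm."""
    XW = X @ W
    return norm(W) + lam1 * norm(XW - Y) + lam2 * norm(XW @ V.T - T)

# Rows (3, 4) and (0, 5) both have Euclidean norm 5.
A = np.array([[3.0, 4.0],
              [0.0, 5.0]])
value_21 = norm_21(A)     # 5 + 5
value_fro = norm_fro(A)   # sqrt(9 + 16 + 25)
```

The 2,1-norm sums row norms and therefore tends to drive whole rows of W toward zero, while the Frobenius norm penalises all entries uniformly.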
After the objective function is established, the matrices $W \in \mathbb{R}^{d \times m}$ and $V \in \mathbb{R}^{k \times m}$ can be solved for by taking partial derivatives of the objective function and applying gradient descent, where $d$ is the file feature dimension, $m$ is the number of labels, and $k$ is the number of categories. Specifically, taking the first-order partial derivative of the objective function with respect to $W$ gives:
$D_w^{-1} W + \lambda_1 X^{T} D_u^{-1} u + \lambda_2 X^{T} D_v^{-1} v V$
where the temporary variables are $u = XW - Y$ and $v = (XW)V^{T} - T$, and $D_w = \operatorname{diag}(\|w_i\|_2)$, $D_u = \operatorname{diag}(\|u_i\|_2)$, $D_v = \operatorname{diag}(\|v_i\|_2)$, with $w_i$, $u_i$, and $v_i$ the $i$-th rows of $W$, $u$, and $v$. $W$ and $V$ can then be solved using gradient descent; the pseudocode is as follows:
Input: picture feature matrix X = [x_1, x_2, ..., x_n]^T
       label indicator matrix Y = [y_1, y_2, ..., y_n]^T
       category indicator matrix T = [t_1, t_2, ..., t_n]^T
       parameters λ1 and λ2
       critical value ε
Output: parameters W and V
The solving algorithm is as follows:
W = random(d, m), V = random(k, m)   // assign random initial values to the transformation matrices W and V
opt = ||W||_{2,1} + λ1 ||X*W - Y||_{2,1} + λ2 ||(X*W)*V^T - T||_{2,1}
do   // iterate until the change in the objective function falls to the critical value ε
    opt' = opt
    D_w = diag(||w_i||_2)
    u = X*W - Y, v = (X*W)*V^T - T
    D_u = diag(||u_i||_2), D_v = diag(||v_i||_2)
    Δw = D_w^{-1} W + λ1 X^T D_u^{-1} u + λ2 X^T D_v^{-1} v V   // compute the gradients of the objective function with respect to W and V
    Δv = λ2 (D_v^{-1} v)^T * X * W
    W = W - Δw   // repeatedly correct W and V so that the objective function converges to its minimum
    V = V - Δv
    opt = ||W||_{2,1} + λ1 ||X*W - Y||_{2,1} + λ2 ||(X*W)*V^T - T||_{2,1}
while |opt - opt'| > ε
output W, V
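The solving procedure can be sketched in NumPy as follows. This is a hedged illustration, not the patent's implementation: the pseudocode gives no step size, so the sketch adds a step size `step` that is halved until the objective decreases, a small constant `tiny` to avoid division by zero, and an iteration cap `max_iter`. These three additions, the function names, and the synthetic data are all assumptions.

```python
import numpy as np

def norm_21(A):
    """2,1-norm: sum of the Euclidean norms of the rows of A."""
    return float(np.sum(np.linalg.norm(A, axis=1)))

def objective(X, Y, T, W, V, lam1, lam2):
    XW = X @ W
    return norm_21(W) + lam1 * norm_21(XW - Y) + lam2 * norm_21(XW @ V.T - T)

def row_normalise(A, tiny=1e-12):
    """Compute D^{-1} A with D = diag(||a_i||_2); `tiny` guards against zero rows."""
    return A / (np.linalg.norm(A, axis=1, keepdims=True) + tiny)

def solve(X, Y, T, lam1=1.0, lam2=1.0, eps=1e-6, step=0.1, max_iter=500, seed=0):
    """Gradient descent on the 2,1-norm objective; returns W (d, m), V (k, m), history."""
    rng = np.random.default_rng(seed)
    d, m, k = X.shape[1], Y.shape[1], T.shape[1]
    W, V = rng.normal(size=(d, m)), rng.normal(size=(k, m))   # random initial values
    history = [objective(X, Y, T, W, V, lam1, lam2)]
    for _ in range(max_iter):
        XW = X @ W
        u, v = XW - Y, XW @ V.T - T
        Dv_inv_v = row_normalise(v)
        grad_W = row_normalise(W) + lam1 * X.T @ row_normalise(u) + lam2 * X.T @ Dv_inv_v @ V
        grad_V = lam2 * Dv_inv_v.T @ XW
        # Halve the step until the objective decreases (assumption: not in the source).
        s = step
        while s > 1e-12:
            W_new, V_new = W - s * grad_W, V - s * grad_V
            opt = objective(X, Y, T, W_new, V_new, lam1, lam2)
            if opt < history[-1]:
                break
            s /= 2
        else:
            break                      # no improving step found: treat as converged
        W, V = W_new, V_new
        history.append(opt)
        if history[-2] - history[-1] <= eps:   # change in objective within the critical value
            break
    return W, V, history

# Synthetic data (assumption), used only to exercise the solver.
rng = np.random.default_rng(1)
X = rng.random((20, 5))                           # n = 20 files, d = 5 features
Y = (rng.random((20, 4)) > 0.5).astype(float)     # m = 4 labels
T = np.eye(3)[rng.integers(0, 3, size=20)]        # k = 3 categories, one per file
W, V, history = solve(X, Y, T)
```

Because a step is accepted only when it lowers the objective, `history` is strictly decreasing, which makes the stopping test on the change in the objective straightforward to check.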
It can be seen from the above that $W$ and $V$ can be solved from the objective function. The transformation relation between the feature vector and the label indicator vector of the training file is thus obtained, namely $Y = XW$, as is the transformation relation between the label indicator vector and the category indicator vector of the training file, namely $T = YV^{T}$.
After the transformation matrices $W$ and $V$ are obtained from the training set, a file to be annotated is annotated as follows: according to $Y = XW$, the label indicator vector matrix of the file is obtained from its feature vector; according to $T = YV^{T}$, the category indicator vector matrix of the file is obtained from its label indicator vector; the several largest components of the label indicator vector matrix and the maximum component of the category indicator vector matrix are then selected as the label annotation result and the category result of the file, thereby achieving automatic annotation of the file.
By fusing file category information and file features into a single optimization function, the embodiments of the present invention can improve the annotation accuracy and hit rate of a file annotation system when file category data is available. It should be noted in particular that the file in the embodiments of the present invention can be a picture file, an audio file, a video file, or a text file. That is, the method of the present invention can annotate not only pictures but also audio, video, or text, improving the accuracy and hit rate of automatic annotation.
Corresponding to the above file annotation method, an embodiment of the present invention further provides a file annotation device. As shown in Fig. 3, a file annotation device includes:
a first acquiring unit 301, configured to obtain a file to be annotated;
a second acquiring unit 302, configured to obtain the feature vector of the file to be annotated according to the file to be annotated;
a first conversion unit 303, configured to obtain the label indicator vector of the file to be annotated according to the feature vector of the file to be annotated and the transformation relation between the feature vector and the label indicator vector of the file to be annotated;
a second conversion unit 304, configured to obtain the category indicator vector of the file to be annotated according to the label indicator vector of the file to be annotated and the transformation relation between the label indicator vector and the category indicator vector of the file to be annotated; and
an annotation unit 305, configured to annotate the file to be annotated according to its label indicator vector and its category indicator vector.
As shown in Fig. 4, an embodiment of the present invention further provides another file annotation device, including:
A first acquisition unit 401, configured to obtain a file to be annotated;
A second acquisition unit 402, configured to obtain a feature vector of the file to be annotated according to the file to be annotated;
A first conversion unit 403, configured to obtain a label indicator vector of the file to be annotated according to the feature vector of the file to be annotated and a conversion relationship between the feature vector of the file to be annotated and the label indicator vector of the file to be annotated;
A second conversion unit 404, configured to obtain a classification indicator vector of the file to be annotated according to the label indicator vector of the file to be annotated and a conversion relationship between the label indicator vector of the file to be annotated and the classification indicator vector of the file to be annotated;
An annotation unit 408, configured to annotate the file to be annotated according to the label indicator vector of the file to be annotated and the classification indicator vector of the file to be annotated.
The file annotation device further includes:
A third acquisition unit 405, configured to obtain a training file in a file annotation training set;
A fourth acquisition unit 406, configured to obtain, according to the training file, a feature vector of the training file, a label indicator vector of the training file, and a classification indicator vector of the training file;
A conversion relationship solving unit 407, configured to obtain, according to the feature vector of the training file, the label indicator vector of the training file, and the classification indicator vector of the training file, a conversion relationship between the feature vector of the training file and the label indicator vector of the training file and a conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file; to use the conversion relationship between the feature vector of the training file and the label indicator vector of the training file as the conversion relationship between the feature vector of the file to be annotated and the label indicator vector of the file to be annotated; and to use the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file as the conversion relationship between the label indicator vector of the file to be annotated and the classification indicator vector of the file to be annotated.
Preferably, the conversion relationship solving unit 407 is further configured to establish an objective function: $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where X denotes the feature vector matrix of the training files, Y denotes the label indicator vector matrix of the training files, T denotes the classification indicator vector matrix of the training files, W denotes the conversion parameter matrix between X and Y, V denotes the conversion parameter matrix between Y and T, $\min \|W\|$ denotes taking the minimum of the norm of W, $\lambda_1$ and $\lambda_2$ denote preset parameters, and $V^{T}$ denotes the transpose of V; and is configured to obtain W and V according to the objective function, so that the conversion relationship between the feature vector of the training file and the label indicator vector of the training file is $Y = XW$, and the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file is $T = YV^{T}$.
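One possible way to obtain W and V from the objective function above is sketched below. The text does not specify which norm is used or how the minimization is carried out, so this sketch makes two assumptions that are not from the patent: every norm is taken as a squared Frobenius norm, and the objective is minimized by alternating exact updates of V (ordinary least squares) and W (a linear system derived by setting the gradient to zero). The names `objective` and `fit_conversions` are hypothetical.

```python
import numpy as np

def objective(X, Y, T, W, V, lam1, lam2):
    """||W||^2 + lam1*||XW - Y||^2 + lam2*||(XW)V^T - T||^2 (squared Frobenius norms, an assumption)."""
    return (np.sum(W ** 2)
            + lam1 * np.sum((X @ W - Y) ** 2)
            + lam2 * np.sum((X @ W @ V.T - T) ** 2))

def fit_conversions(X, Y, T, lam1=1.0, lam2=1.0, n_iter=20):
    """Alternating minimization sketch.

    X: (n, d) feature vectors, Y: (n, L) label indicator vectors,
    T: (n, C) classification indicator vectors of the training files.
    Returns W of shape (d, L), V of shape (C, L), and the objective value per iteration.
    """
    d, L = X.shape[1], Y.shape[1]
    A = X.T @ X
    # Initial W: ridge-style solution that ignores the classification term.
    W = np.linalg.solve(np.eye(d) + lam1 * A, lam1 * (X.T @ Y))
    history = []
    for _ in range(n_iter):
        # V-step: for fixed W, V^T is the least-squares solution of (XW) V^T = T.
        V = np.linalg.lstsq(X @ W, T, rcond=None)[0].T
        # W-step: for fixed V, the zero-gradient condition is
        #   W + lam1*A*W + lam2*A*W*(V^T V) = lam1*X^T Y + lam2*X^T T V,
        # solved here through column-major vectorization and Kronecker products.
        G = V.T @ V
        lhs = np.kron(np.eye(L), np.eye(d) + lam1 * A) + lam2 * np.kron(G, A)
        rhs = lam1 * (X.T @ Y) + lam2 * (X.T @ T @ V)
        W = np.linalg.solve(lhs, rhs.flatten(order="F")).reshape((d, L), order="F")
        history.append(objective(X, Y, T, W, V, lam1, lam2))
    return W, V, history
```

The Kronecker system has d·L unknowns, so this form is only practical for small feature and label dimensions; for larger problems an iterative solver or gradient-based method would be a more natural choice. Because each step minimizes the objective exactly over one matrix while the other is fixed, the recorded objective values should not increase from one iteration to the next.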
Preferably, the annotation unit 408 is further configured to select a predetermined number of components from the label indicator vector of the file to be annotated as the label annotation result of the file to be annotated, and to annotate the file to be annotated, where the value of each selected component is greater than the value of every unselected component; and is configured to use the maximum-value component of the classification indicator vector of the file to be annotated as the classification annotation result of the file to be annotated, and to annotate the file to be annotated.
Preferably, the file to be annotated is a picture file, an audio file, a video file, or a text file.
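The selection rule described for the annotation unit can be sketched as follows: take the k largest components of the label indicator vector as the label annotation result, and the largest component of the classification indicator vector as the classification annotation result. The function names, the returned dictionary layout, and the example values are hypothetical; how ties between equal component values should be broken is not stated in the text, so the stable ordering used here is an assumption.

```python
import numpy as np

def select_labels(y, k):
    """Indices of the k largest components of the label indicator vector, largest first."""
    order = np.argsort(-np.asarray(y, dtype=float), kind="stable")
    return order[:k].tolist()

def select_category(t):
    """Index of the maximum component of the classification indicator vector."""
    return int(np.argmax(t))

def annotate(y, t, k):
    """Combine both results into one annotation for the file to be annotated."""
    return {"labels": select_labels(y, k), "category": select_category(t)}

# Made-up indicator vectors: 4 candidate labels, 3 candidate categories.
result = annotate(np.array([0.1, 0.7, 0.3, 0.9]), np.array([0.2, 0.5, 0.3]), k=2)
```

The returned indices would still need to be mapped to label and category names through whatever vocabulary the training set defines.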
As shown in Fig. 5, an embodiment of the present invention further provides another file annotation device, which may be a server or another computer device. The device includes an input interface 501, a memory 502, and a processor 503. The input interface 501 and the memory 502 may be connected to the processor 503 through a bus 504 or in other ways; Fig. 5 takes the bus connection as an example. In the file annotation device, the input interface 501 is configured to obtain a file to be annotated. The processor 503 is configured to obtain a feature vector of the file to be annotated according to the file to be annotated; to obtain a label indicator vector of the file to be annotated according to the feature vector of the file to be annotated and a conversion relationship between the feature vector of the file to be annotated and the label indicator vector of the file to be annotated; to obtain a classification indicator vector of the file to be annotated according to the label indicator vector of the file to be annotated and a conversion relationship between the label indicator vector of the file to be annotated and the classification indicator vector of the file to be annotated; and to annotate the file to be annotated according to the label indicator vector of the file to be annotated and the classification indicator vector of the file to be annotated.
Preferably, the input interface 501 is further configured to obtain a training file in a file annotation training set. The processor 503 is further configured to obtain, according to the training file, a feature vector of the training file, a label indicator vector of the training file, and a classification indicator vector of the training file; to obtain, according to the feature vector of the training file, the label indicator vector of the training file, and the classification indicator vector of the training file, a conversion relationship between the feature vector of the training file and the label indicator vector of the training file and a conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file; to use the conversion relationship between the feature vector of the training file and the label indicator vector of the training file as the conversion relationship between the feature vector of the file to be annotated and the label indicator vector of the file to be annotated; and to use the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file as the conversion relationship between the label indicator vector of the file to be annotated and the classification indicator vector of the file to be annotated.
Preferably, the processor 503 is further configured to establish an objective function: $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where X denotes the feature vector matrix of the training files, Y denotes the label indicator vector matrix of the training files, T denotes the classification indicator vector matrix of the training files, W denotes the conversion parameter matrix between X and Y, V denotes the conversion parameter matrix between Y and T, $\min \|W\|$ denotes taking the minimum of the norm of W, $\lambda_1$ and $\lambda_2$ denote preset parameters, and $V^{T}$ denotes the transpose of V; and is configured to obtain W and V according to the objective function, so that the conversion relationship between the feature vector of the training file and the label indicator vector of the training file is $Y = XW$, and the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file is $T = YV^{T}$.
Preferably, the processor 503 is further configured to select a predetermined number of components from the label indicator vector of the file to be annotated as the label annotation result of the file to be annotated, and to annotate the file to be annotated, where the value of each selected component is greater than the value of every unselected component; and is configured to use the maximum-value component of the classification indicator vector of the file to be annotated as the classification annotation result of the file to be annotated, and to annotate the file to be annotated.
The file annotation method and the corresponding devices provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, a person of ordinary skill in the art may, based on the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

  1. A file annotation method, characterized by comprising:
    obtaining a file to be annotated;
    obtaining a feature vector of the file to be annotated according to the file to be annotated;
    obtaining a label indicator vector of the file to be annotated according to the feature vector of the file to be annotated and a conversion relationship between the feature vector of the file to be annotated and the label indicator vector of the file to be annotated;
    obtaining a classification indicator vector of the file to be annotated according to the label indicator vector of the file to be annotated and a conversion relationship between the label indicator vector of the file to be annotated and the classification indicator vector of the file to be annotated;
    annotating the file to be annotated according to the label indicator vector of the file to be annotated and the classification indicator vector of the file to be annotated, including: marking the label indicator vector of the file to be annotated and the classification indicator vector of the file to be annotated on the file to be annotated, so that the file to be annotated is annotated in a manner that combines the label indicator vector and the classification indicator vector.
  2. The file annotation method according to claim 1, characterized in that, before the step of obtaining the label indicator vector of the file to be annotated, the method further comprises:
    obtaining a training file in a file annotation training set;
    obtaining, according to the training file, a feature vector of the training file, a label indicator vector of the training file, and a classification indicator vector of the training file;
    obtaining, according to the feature vector of the training file, the label indicator vector of the training file, and the classification indicator vector of the training file, a conversion relationship between the feature vector of the training file and the label indicator vector of the training file and a conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file; using the conversion relationship between the feature vector of the training file and the label indicator vector of the training file as the conversion relationship between the feature vector of the file to be annotated and the label indicator vector of the file to be annotated; and using the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file as the conversion relationship between the label indicator vector of the file to be annotated and the classification indicator vector of the file to be annotated.
  3. The file annotation method according to claim 2, characterized in that the step of obtaining the conversion relationship between the feature vector of the training file and the label indicator vector of the training file and the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file comprises:
    establishing an objective function: $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where X denotes the feature vector matrix of the training files, Y denotes the label indicator vector matrix of the training files, T denotes the classification indicator vector matrix of the training files, W denotes the conversion parameter matrix between X and Y, V denotes the conversion parameter matrix between Y and T, $\min \|W\|$ denotes taking the minimum of the norm of W, $\lambda_1$ and $\lambda_2$ denote preset parameters, and $V^{T}$ denotes the transpose of V;
    obtaining W and V according to the objective function, so that the conversion relationship between the feature vector of the training file and the label indicator vector of the training file is $Y = XW$, and the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file is $T = YV^{T}$.
  4. The file annotation method according to any one of claims 1 to 3, characterized in that the step of annotating the file to be annotated comprises:
    selecting a predetermined number of components from the label indicator vector of the file to be annotated as the label annotation result of the file to be annotated, and annotating the file to be annotated, where the value of each selected component is greater than the value of every unselected component.
  5. The file annotation method according to any one of claims 1 to 3, characterized in that the step of annotating the file to be annotated comprises:
    using the maximum-value component of the classification indicator vector of the file to be annotated as the classification annotation result of the file to be annotated, and annotating the file to be annotated.
  6. The file annotation method according to any one of claims 1 to 3, characterized in that the file to be annotated is a picture file, an audio file, a video file, or a text file.
  7. A file annotation device, characterized by comprising:
    a first acquisition unit, configured to obtain a file to be annotated;
    a second acquisition unit, configured to obtain a feature vector of the file to be annotated according to the file to be annotated;
    a first conversion unit, configured to obtain a label indicator vector of the file to be annotated according to the feature vector of the file to be annotated and a conversion relationship between the feature vector of the file to be annotated and the label indicator vector of the file to be annotated;
    a second conversion unit, configured to obtain a classification indicator vector of the file to be annotated according to the label indicator vector of the file to be annotated and a conversion relationship between the label indicator vector of the file to be annotated and the classification indicator vector of the file to be annotated;
    an annotation unit, configured to annotate the file to be annotated according to the label indicator vector of the file to be annotated and the classification indicator vector of the file to be annotated, and specifically configured to mark the label indicator vector of the file to be annotated and the classification indicator vector of the file to be annotated on the file to be annotated, so that the file to be annotated is annotated in a manner that combines the label indicator vector and the classification indicator vector.
  8. The file annotation device according to claim 7, characterized by further comprising:
    a third acquisition unit, configured to obtain a training file in a file annotation training set;
    a fourth acquisition unit, configured to obtain, according to the training file, a feature vector of the training file, a label indicator vector of the training file, and a classification indicator vector of the training file;
    a conversion relationship solving unit, configured to obtain, according to the feature vector of the training file, the label indicator vector of the training file, and the classification indicator vector of the training file, a conversion relationship between the feature vector of the training file and the label indicator vector of the training file and a conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file; to use the conversion relationship between the feature vector of the training file and the label indicator vector of the training file as the conversion relationship between the feature vector of the file to be annotated and the label indicator vector of the file to be annotated; and to use the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file as the conversion relationship between the label indicator vector of the file to be annotated and the classification indicator vector of the file to be annotated.
  9. The file annotation device according to claim 8, characterized in that the conversion relationship solving unit is further configured to establish an objective function: $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where X denotes the feature vector matrix of the training files, Y denotes the label indicator vector matrix of the training files, T denotes the classification indicator vector matrix of the training files, W denotes the conversion parameter matrix between X and Y, V denotes the conversion parameter matrix between Y and T, $\min \|W\|$ denotes taking the minimum of the norm of W, $\lambda_1$ and $\lambda_2$ denote preset parameters, and $V^{T}$ denotes the transpose of V; and is configured to obtain W and V according to the objective function, so that the conversion relationship between the feature vector of the training file and the label indicator vector of the training file is $Y = XW$, and the conversion relationship between the label indicator vector of the training file and the classification indicator vector of the training file is $T = YV^{T}$.
  10. The file annotation device according to any one of claims 7 to 9, characterized in that the annotation unit is further configured to select a predetermined number of components from the label indicator vector of the file to be annotated as the label annotation result of the file to be annotated, and to annotate the file to be annotated, where the value of each selected component is greater than the value of every unselected component; and is configured to use the maximum-value component of the classification indicator vector of the file to be annotated as the classification annotation result of the file to be annotated, and to annotate the file to be annotated.
CN201310206739.3A 2013-05-29 2013-05-29 File mask method and device Active CN104216882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310206739.3A CN104216882B (en) 2013-05-29 2013-05-29 File mask method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310206739.3A CN104216882B (en) 2013-05-29 2013-05-29 File mask method and device

Publications (2)

Publication Number Publication Date
CN104216882A CN104216882A (en) 2014-12-17
CN104216882B true CN104216882B (en) 2018-05-11

Family

ID=52098388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310206739.3A Active CN104216882B (en) 2013-05-29 2013-05-29 File mask method and device

Country Status (1)

Country Link
CN (1) CN104216882B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407407B (en) * 2016-09-22 2019-10-15 江苏通付盾科技有限公司 A kind of file labeling system and method
CN106897424A (en) * 2017-02-24 2017-06-27 北京时间股份有限公司 Information labeling system and method
CN107330069B (en) * 2017-06-30 2020-10-23 北京金山安全软件有限公司 Multimedia data processing method and device, server and storage medium
CN110555137A (en) * 2018-03-31 2019-12-10 华为技术有限公司 Label filling method and device
CN113127638B (en) * 2019-12-31 2024-09-24 华为技术有限公司 Data labeling method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102207966A (en) * 2011-06-01 2011-10-05 华南理工大学 Video content quick retrieving method based on object tag
CN102737244A (en) * 2012-06-06 2012-10-17 哈尔滨工程大学 Method for determining corresponding relationships between areas and annotations in annotated image
CN102819746A (en) * 2012-07-10 2012-12-12 电子科技大学 Method for automatically marking category of remote sensing image based on author-genre theme model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8774515B2 (en) * 2011-04-20 2014-07-08 Xerox Corporation Learning structured prediction models for interactive image labeling

Also Published As

Publication number Publication date
CN104216882A (en) 2014-12-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant