CN104216882B - File annotation method and device - Google Patents
- Publication number
- CN104216882B, CN201310206739.3A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/5866—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
Abstract
The invention discloses a file annotation method comprising: obtaining the feature vector of a file to be annotated according to the file; obtaining the label indicator vector of the file according to its feature vector and the conversion relationship between the feature vector and the label indicator vector; obtaining the category indicator vector of the file according to its label indicator vector and the conversion relationship between the label indicator vector and the category indicator vector; and annotating the file according to its label indicator vector and its category indicator vector. Because category indicator vectors are generally well standardized and contain little noise, annotating files with the label indicator vector and the category indicator vector in combination reduces the adverse effect that noise in user-defined annotation data has on automatic annotation, and improves the accuracy and hit rate of automatic annotation.
Description
Technical field
The present invention relates to the field of file classification, and in particular to a file annotation method and device.
Background
At present, image annotation falls into two classes: manual annotation and automatic annotation. Manual annotation can annotate images effectively, but it is inefficient. With the sharp increase in the number of image files on the Internet, manual annotation can hardly meet demand, and automatic annotation has emerged in response.
An existing automatic image annotation method extracts a feature vector (for example color, texture or local features) from every picture in a training set. Then, in the feature space formed by the feature vectors of all pictures, it trains one classifier per label using the support vector machine (SVM) method, obtaining multiple classifiers. Finally, it classifies the feature vector of a picture to be annotated with these classifiers to obtain the label annotation result of that picture.
However, in existing automatic image annotation methods the image labels come from user-defined annotation data, which contains a large number of irregular and meaningless labels (for example, "qeiox" or "2000"). This data is very noisy, which degrades classifier performance and lowers the accuracy and hit rate of annotation. How to reduce the adverse effect of this noise on automatic annotation is therefore a problem that urgently needs to be solved.
Summary of the invention
Embodiments of the present invention provide a file annotation method and device for reducing the adverse effect that noise in user-defined annotation data has on automatic annotation, and for improving the accuracy and hit rate of automatic annotation.
A first aspect of the present invention provides a file annotation method, comprising:
obtaining a file to be annotated;
obtaining the feature vector of the file to be annotated according to the file to be annotated;
obtaining the label indicator vector of the file to be annotated according to the feature vector of the file to be annotated and the conversion relationship between the feature vector and the label indicator vector of the file to be annotated;
obtaining the category indicator vector of the file to be annotated according to the label indicator vector of the file to be annotated and the conversion relationship between the label indicator vector and the category indicator vector of the file to be annotated;
annotating the file to be annotated according to its label indicator vector and its category indicator vector.
With reference to the first aspect of the present invention, in a first possible implementation, before the step of obtaining the label indicator vector of the file to be annotated, the method further comprises:
obtaining the training files in a file annotation training set;
obtaining, according to the training files, the feature vectors, the label indicator vectors and the category indicator vectors of the training files;
obtaining, according to the feature vectors, label indicator vectors and category indicator vectors of the training files, the conversion relationship between the feature vectors and the label indicator vectors of the training files and the conversion relationship between the label indicator vectors and the category indicator vectors of the training files; using the former as the conversion relationship between the feature vector and the label indicator vector of the file to be annotated, and using the latter as the conversion relationship between the label indicator vector and the category indicator vector of the file to be annotated.
With reference to the first possible implementation of the first aspect of the present invention, in a second possible implementation, the step of obtaining the conversion relationship between the feature vectors and the label indicator vectors of the training files and the conversion relationship between the label indicator vectors and the category indicator vectors of the training files comprises:
establishing the objective function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where $X$ denotes the feature vector matrix of the training files, $Y$ denotes the label indicator vector matrix of the training files, $T$ denotes the category indicator vector matrix of the training files, $W$ denotes the conversion parameter matrix between $X$ and $Y$, $V$ denotes the conversion parameter matrix between $Y$ and $T$, $\min \|W\|$ denotes minimizing the norm of $W$, $\lambda_1$ and $\lambda_2$ are preset parameters, and $V^{T}$ is the transpose of $V$;
obtaining $W$ and $V$ according to the objective function, so that the conversion relationship between the feature vectors and the label indicator vectors of the training files is $Y = XW$, and the conversion relationship between the label indicator vectors and the category indicator vectors of the training files is $T = YV^{T}$.
With reference to the first aspect of the present invention, or the first or second possible implementation of the first aspect, in a third possible implementation, the step of annotating the file to be annotated comprises: selecting a predetermined number of components from the label indicator vector of the file to be annotated as the label annotation result of the file, and annotating the file accordingly, where the value of every selected component is greater than the value of every unselected component.
With reference to the first aspect of the present invention, or the first or second possible implementation of the first aspect, in a fourth possible implementation, the step of annotating the file to be annotated comprises: taking the maximum component of the category indicator vector of the file to be annotated as the category annotation result of the file, and annotating the file accordingly.
With reference to the first aspect of the present invention, or the first or second possible implementation of the first aspect, in a fifth possible implementation, the file to be annotated is a picture file, an audio file, a video file or a text file.
A second aspect of the present invention provides a file annotation device, comprising:
a first acquisition unit, configured to obtain a file to be annotated;
a second acquisition unit, configured to obtain the feature vector of the file to be annotated according to the file to be annotated;
a first conversion unit, configured to obtain the label indicator vector of the file to be annotated according to the feature vector of the file to be annotated and the conversion relationship between the feature vector and the label indicator vector of the file to be annotated;
a second conversion unit, configured to obtain the category indicator vector of the file to be annotated according to the label indicator vector of the file to be annotated and the conversion relationship between the label indicator vector and the category indicator vector of the file to be annotated;
an annotation unit, configured to annotate the file to be annotated according to its label indicator vector and its category indicator vector.
With reference to the second aspect of the present invention, in a first possible implementation, the file annotation device further comprises:
a third acquisition unit, configured to obtain the training files in a file annotation training set;
a fourth acquisition unit, configured to obtain, according to the training files, the feature vectors, the label indicator vectors and the category indicator vectors of the training files;
a conversion relationship solving unit, configured to obtain, according to the feature vectors, label indicator vectors and category indicator vectors of the training files, the conversion relationship between the feature vectors and the label indicator vectors of the training files and the conversion relationship between the label indicator vectors and the category indicator vectors of the training files, to use the former as the conversion relationship between the feature vector and the label indicator vector of the file to be annotated, and to use the latter as the conversion relationship between the label indicator vector and the category indicator vector of the file to be annotated.
With reference to the first possible implementation of the second aspect of the present invention, in a second possible implementation, the conversion relationship solving unit is further configured to establish the objective function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where $X$ denotes the feature vector matrix of the training files, $Y$ denotes the label indicator vector matrix of the training files, $T$ denotes the category indicator vector matrix of the training files, $W$ denotes the conversion parameter matrix between $X$ and $Y$, $V$ denotes the conversion parameter matrix between $Y$ and $T$, $\min \|W\|$ denotes minimizing the norm of $W$, $\lambda_1$ and $\lambda_2$ are preset parameters, and $V^{T}$ is the transpose of $V$; and to obtain $W$ and $V$ according to the objective function, so that the conversion relationship between the feature vectors and the label indicator vectors of the training files is $Y = XW$, and the conversion relationship between the label indicator vectors and the category indicator vectors of the training files is $T = YV^{T}$.
With reference to the second aspect of the present invention, or the first or second possible implementation of the second aspect, in a third possible implementation, the annotation unit is further configured to select a predetermined number of components from the label indicator vector of the file to be annotated as the label annotation result of the file and annotate the file accordingly, where the value of every selected component is greater than the value of every unselected component; and to take the maximum component of the category indicator vector of the file to be annotated as the category annotation result of the file and annotate the file accordingly.
Embodiments of the present invention provide a file annotation method that annotates a file to be annotated according to the label indicator vector and the category indicator vector of that file. Because category indicator vectors are generally well standardized and contain little noise, annotating files with the label indicator vector and the category indicator vector in combination reduces the adverse effect that noise in user-defined annotation data has on automatic annotation, and improves the accuracy and hit rate of automatic annotation.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a file annotation method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the steps performed before the label indicator vector of the file to be annotated is obtained;
Fig. 3 is a structural diagram of a file annotation device provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of another file annotation device provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of yet another file annotation device provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, a file annotation method according to an embodiment of the present invention comprises:
101. Obtain a file to be annotated.
The file to be annotated may be a picture file, an audio file, a video file or a text file.
102. Obtain the feature vector of the file to be annotated according to the file to be annotated.
When the file to be annotated is a picture, the feature vector of each picture can be computed using the scale-invariant feature transform (SIFT) algorithm together with the bag-of-features (BoF) method.
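The BoF half of step 102 can be illustrated with a minimal sketch. It assumes that local descriptors (for example SIFT descriptors) have already been extracted by some external tool and that a codebook of visual words has already been learned (for example by k-means); the function name `bof_feature` and the toy numbers are hypothetical and do not come from the patent.

```python
import numpy as np

def bof_feature(descriptors, codebook):
    """Quantise local descriptors against a codebook and return a
    normalised histogram of visual-word counts (the BoF feature vector)."""
    # Squared Euclidean distance from every descriptor to every codeword.
    diff = descriptors[:, None, :] - codebook[None, :, :]
    dist2 = np.sum(diff ** 2, axis=2)
    # Index of the nearest codeword for each descriptor.
    nearest = np.argmin(dist2, axis=1)
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy data: 3 codewords and 4 two-dimensional "descriptors" (assumed values).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]])
x = bof_feature(descriptors, codebook)
```

The resulting vector has one entry per codeword, so the feature dimension d equals the codebook size in this sketch.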
103. Obtain the label indicator vector of the file to be annotated according to the feature vector of the file to be annotated and the conversion relationship between the feature vector and the label indicator vector of the file to be annotated.
After the feature vector of the file to be annotated is obtained, the label indicator vector of the file is computed from that feature vector using the conversion relationship between the feature vector and the label indicator vector.
104. Obtain the category indicator vector of the file to be annotated according to the label indicator vector of the file to be annotated and the conversion relationship between the label indicator vector and the category indicator vector of the file to be annotated.
After the label indicator vector of the file to be annotated is obtained, the category indicator vector of the file can be computed from that label indicator vector using the conversion relationship between the label indicator vector and the category indicator vector.
105. Annotate the file to be annotated according to its label indicator vector and its category indicator vector.
After the label indicator vector and the category indicator vector of the file to be annotated are obtained, both are attached to the file as its annotation. Because category indicator vectors are generally well standardized and contain little noise, annotating files with the label indicator vector and the category indicator vector in combination reduces the adverse effect that noise in user-defined annotation data has on automatic annotation, and improves the accuracy and hit rate of automatic annotation.
To further improve the accuracy and hit rate of automatic annotation, after the label indicator vector and the category indicator vector of the file to be annotated are obtained, a predetermined number of components may be selected from the label indicator vector as the label annotation result of the file, where the value of every selected component is greater than the value of every unselected component (that is, the predetermined number of largest components are selected); and/or the maximum component of the category indicator vector may be taken as the category annotation result of the file. The file to be annotated is then annotated accordingly.
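Steps 103 to 105 can be sketched as follows. This is a minimal illustration that assumes the conversion matrices W and V have already been learned; the function name `annotate`, the `top_k` parameter, the label and category names and all numeric values are hypothetical.

```python
import numpy as np

def annotate(x, W, V, label_names, category_names, top_k=3):
    """Return the top_k labels and the single best category for feature x."""
    y = x @ W        # step 103: label indicator vector, shape (m,)
    t = y @ V.T      # step 104: category indicator vector, shape (k,)
    # Step 105: keep the top_k largest label components ...
    top = np.argsort(-y, kind="stable")[:top_k]
    labels = [label_names[j] for j in top]
    # ... and the maximum category component.
    category = category_names[int(np.argmax(t))]
    return labels, category

# Toy example with d = 2 features, m = 3 labels, k = 2 categories.
W = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]])          # shape (d, m)
V = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])          # shape (k, m)
x = np.array([0.2, 0.8])
labels, category = annotate(
    x, W, V,
    label_names=["lotus", "sunset", "landscape"],
    category_names=["plant", "scenery"],
    top_k=2,
)
```

Here the label indicator vector is [0.2, 0.8, 0.5] and the category indicator vector is [0.2, 1.3], so the two largest label components and the largest category component are selected.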
The conversion relationship between the feature vector and the label indicator vector of the file to be annotated used in step 103, and the conversion relationship between the label indicator vector and the category indicator vector of the file to be annotated used in step 104, can be solved from the training files in a file annotation training set. Specifically, as shown in Fig. 2, before the step of obtaining the label indicator vector of the file to be annotated, the file annotation method further comprises:
201. Obtain the training files in the file annotation training set.
For example, if the files to be annotated are pictures, the training pictures x1, x2, ..., xn are obtained from the picture training set. As shown in Table 1, each picture has corresponding labels and a corresponding category.
Table 1
Training picture | Labels | Category |
Picture 1 | lotus, landscape, summer | plant |
Picture 2 | sunset clouds, setting sun, landscape, sea | landscape |
... | ... | ... |
Picture n | sofa, puppy | pet |
202. According to the training files, obtain the feature vectors, the label indicator vectors and the category indicator vectors of the training files.
For example, the feature vector $x_i \in \mathbb{R}^{d}$ of every picture in the training set can be computed with the SIFT algorithm and the BoF method, where $\mathbb{R}^{d}$ is the d-dimensional vector space, giving the feature matrix of the n pictures $X = [x_1, x_2, \ldots, x_n]^{T} \in \mathbb{R}^{n \times d}$, where d is the feature dimension and n is an integer greater than 0. At the same time, the sample label matrix $Y = [y_1, y_2, \ldots, y_n]^{T} \in \mathbb{R}^{n \times m}$ is obtained from the annotation and category data of the pictures, where $y_i \in \{0,1\}^{m}$ is the label indicator vector of the i-th picture (its j-th component being 1 means that the picture is annotated with the j-th label) and m is the number of labels; likewise, the picture category matrix $T = [t_1, t_2, \ldots, t_n]^{T} \in \mathbb{R}^{n \times k}$ is obtained, where $t_i \in \{0,1\}^{k}$ is the category indicator vector of the i-th picture (its j-th component being 1 means that the picture belongs to the j-th category) and k is the number of categories.
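The construction of the indicator matrices Y and T can be sketched with toy data modelled on Table 1. The vocabulary ordering (alphabetical) and all variable names are assumptions made for illustration only.

```python
import numpy as np

# (labels, category) per training picture, modelled on Table 1.
training_set = [
    (["lotus", "landscape", "summer"], "plant"),
    (["sunset clouds", "setting sun", "landscape", "sea"], "landscape"),
    (["sofa", "puppy"], "pet"),
]

# Fix an ordering of the m labels and k categories (alphabetical here).
label_vocab = sorted({lab for labels, _ in training_set for lab in labels})
category_vocab = sorted({cat for _, cat in training_set})
label_index = {lab: j for j, lab in enumerate(label_vocab)}
category_index = {cat: j for j, cat in enumerate(category_vocab)}

n, m, k = len(training_set), len(label_vocab), len(category_vocab)
Y = np.zeros((n, m))   # label indicator matrix, one row y_i per picture
T = np.zeros((n, k))   # category indicator matrix, one row t_i per picture
for i, (labels, cat) in enumerate(training_set):
    for lab in labels:
        Y[i, label_index[lab]] = 1.0
    T[i, category_index[cat]] = 1.0
```

Each row of Y may contain several ones because a picture can carry several labels, whereas each row of T contains exactly one because a picture belongs to a single category in this example.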
203. According to the feature vectors, the label indicator vectors and the category indicator vectors of the training files, obtain the conversion relationship between the feature vectors and the label indicator vectors of the training files and the conversion relationship between the label indicator vectors and the category indicator vectors of the training files.
204. Use the conversion relationship between the feature vectors and the label indicator vectors of the training files as the conversion relationship between the feature vector and the label indicator vector of the file to be annotated, and use the conversion relationship between the label indicator vectors and the category indicator vectors of the training files as the conversion relationship between the label indicator vector and the category indicator vector of the file to be annotated.
In the embodiment of the present invention, the label indicator vector of a file can be obtained from its feature vector through one conversion matrix, and the category indicator vector can be obtained from the label indicator vector through another conversion matrix. Specifically, the step of obtaining the conversion relationship between the feature vectors and the label indicator vectors of the training files and the conversion relationship between the label indicator vectors and the category indicator vectors of the training files comprises:
Establish the objective function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, where $X$ denotes the feature vector matrix of the training files, $Y$ denotes the label indicator vector matrix of the training files, $T$ denotes the category indicator vector matrix of the training files, $W$ denotes the conversion parameter matrix between $X$ and $Y$, $V$ denotes the conversion parameter matrix between $Y$ and $T$, $\min \|W\|$ denotes minimizing the norm of $W$, $V^{T}$ is the transpose of $V$, and $\lambda_1$ and $\lambda_2$ are preset parameters. These two preset parameters depend on the data set formed by $X$, $Y$ and $T$, and can be determined by trying several different values. Within any single training run, both parameters are fixed;
According to the objective function, obtain $W$ and $V$, so that the conversion relationship between the feature vectors and the label indicator vectors of the training files is $Y = XW$, and the conversion relationship between the label indicator vectors and the category indicator vectors of the training files is $T = YV^{T}$.
It should be noted that the norm in the objective function may be the 2,1-norm or the 2,2-norm. Accordingly, the objective function can be written as:
$\min \|W\|_{2,1} + \lambda_1 \|XW - Y\|_{2,1} + \lambda_2 \|(XW)V^{T} - T\|_{2,1}$, where $\|W\|_{2,1}$ denotes the 2,1-norm of $W$, $\|W\|_{2,1} = \sum_i \|w_i\|_2$, and $w_i$ is the i-th row vector of the matrix $W$;
The objective function can also be written as $\min \|W\|_{2,2} + \lambda_1 \|XW - Y\|_{2,2} + \lambda_2 \|(XW)V^{T} - T\|_{2,2}$, where $\|W\|_{2,2}$ denotes the 2,2-norm of $W$, also called the Frobenius norm, $\|W\|_{2,2} = \sqrt{\mathrm{Tr}(W^{T}W)}$, and $\mathrm{Tr}$ is the matrix trace function.
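The two norm choices and the objective value can be evaluated with a few lines of NumPy. This is a minimal sketch; the helper names `norm_21`, `norm_fro` and `objective` are hypothetical, and the identity matrices are used only to make the result easy to verify by hand.

```python
import numpy as np

def norm_21(M):
    """2,1-norm: sum of the Euclidean norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

def norm_fro(M):
    """2,2-norm (Frobenius norm), written with the trace function."""
    return float(np.sqrt(np.trace(M.T @ M)))

def objective(X, Y, T, W, V, lam1, lam2, norm=norm_21):
    """||W|| + lam1 * ||XW - Y|| + lam2 * ||(XW)V^T - T|| for the given norm."""
    U = X @ W - Y
    R = (X @ W) @ V.T - T
    return norm(W) + lam1 * norm(U) + lam2 * norm(R)

# With identity matrices both residuals vanish and only ||W|| remains.
I2 = np.eye(2)
value_21 = objective(I2, I2, I2, I2, I2, lam1=1.0, lam2=1.0)
value_fro = objective(I2, I2, I2, I2, I2, lam1=1.0, lam2=1.0, norm=norm_fro)
```

The 2,1-norm sums row norms and therefore tends to drive whole rows of W towards zero, while the Frobenius norm penalises all entries uniformly.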
After the objective function is established, the matrices $W$ and $V$ can be obtained by taking partial derivatives of the objective function and applying gradient descent, where $W \in \mathbb{R}^{d \times m}$, $V \in \mathbb{R}^{k \times m}$, d is the file feature dimension, m is the number of labels and k is the number of categories. Specifically, taking the first-order partial derivative of the objective function with respect to $W$ gives:
$D_w^{-1} W + \lambda_1 X^{T} D_u^{-1} u + \lambda_2 X^{T} D_v^{-1} v V$
where the temporary variables are $u = XW - Y$ and $v = (XW)V^{T} - T$, and $D_w = \mathrm{diag}(\|w_i\|_2)$, $D_u = \mathrm{diag}(\|u_i\|_2)$, $D_v = \mathrm{diag}(\|v_i\|_2)$.
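For reference, both gradients of the 2,1-norm objective can be written out together. This is a sketch under stated assumptions: the symbol $J$ for the objective is introduced here for convenience, $D_u = \mathrm{diag}(\|u_i\|_2)$ is defined in the same way as $D_w$ and $D_v$, and the expressions hold only where every row norm is non-zero. The gradient with respect to $V$ agrees with the update $\Delta v$ used in the solving algorithm.

```latex
\begin{aligned}
J(W,V) &= \|W\|_{2,1} + \lambda_1 \|u\|_{2,1} + \lambda_2 \|v\|_{2,1},
\qquad u = XW - Y,\quad v = (XW)V^{T} - T,\\
\frac{\partial J}{\partial W} &= D_w^{-1} W + \lambda_1 X^{T} D_u^{-1} u
  + \lambda_2 X^{T} D_v^{-1} v V \;\in\; \mathbb{R}^{d \times m},\\
\frac{\partial J}{\partial V} &= \lambda_2 \,(D_v^{-1} v)^{T} X W \;\in\; \mathbb{R}^{k \times m}.
\end{aligned}
```

A quick shape check confirms consistency: $X^{T} D_v^{-1} v V$ is $(d \times n)(n \times n)(n \times k)(k \times m)$, and $(D_v^{-1} v)^{T} X W$ is $(k \times n)(n \times d)(d \times m)$.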
W and V can then be solved using gradient descent. The pseudocode is as follows:
Input: picture feature matrix X = [x1, x2, ..., xn]^T
       label indicator matrix Y = [y1, y2, ..., yn]^T
       category indicator matrix T = [t1, t2, ..., tn]^T
       parameters λ1 and λ2
       threshold ε
Output: parameters W and V
The solving algorithm is as follows:
W = random(d, m), V = random(k, m)   // give the conversion matrices W and V random initial values
opt = ||W||_{2,1} + λ1*||X*W-Y||_{2,1} + λ2*||(X*W)*V^T-T||_{2,1}
do   // iterate until the change in the objective function falls to the threshold ε
    opt' = opt
    Dw = diag(||wi||_2)
    u = X*W - Y, v = (X*W)*V^T - T
    Du = diag(||ui||_2), Dv = diag(||vi||_2)
    Δw = Dw^-1*W + λ1*X^T*Du^-1*u + λ2*X^T*Dv^-1*v*V   // gradients of the objective function with respect to W and V
    Δv = λ2*(Dv^-1*v)^T*X*W
    W = W - Δw   // keep correcting W and V so that the objective function converges towards its minimum
    V = V - Δv
    opt = ||W||_{2,1} + λ1*||X*W-Y||_{2,1} + λ2*||(X*W)*V^T-T||_{2,1}
while |opt - opt'| > ε
output W, V
It can be seen from the above that W and V can be solved from the objective function. This yields the conversion relationship between the feature vectors and the label indicator vectors of the training files, namely Y = X*W, and the conversion relationship between the label indicator vectors and the category indicator vectors of the training files, namely T = Y*V^T.
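The solving procedure can be sketched in NumPy as follows. This is an illustrative sketch rather than the patented implementation: the step size `step`, the small constant `eps` that guards against division by a zero row norm, the iteration cap `max_iter`, the small random initialisation and the synthetic data are all assumptions added here, and the function name `solve_W_V` is hypothetical.

```python
import numpy as np

def norm_21(M):
    """2,1-norm: sum of the Euclidean norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

def solve_W_V(X, Y, T, lam1=1.0, lam2=1.0, step=1e-3, tol=1e-6,
              max_iter=5000, seed=0):
    """Gradient descent on ||W|| + lam1*||XW-Y|| + lam2*||(XW)V^T-T|| (2,1-norms)."""
    rng = np.random.default_rng(seed)
    d, m, k = X.shape[1], Y.shape[1], T.shape[1]
    W = 0.01 * rng.standard_normal((d, m))   # random initial values
    V = 0.01 * rng.standard_normal((k, m))
    eps = 1e-8                               # avoids division by zero (assumption)

    def objective(W, V):
        return (norm_21(W) + lam1 * norm_21(X @ W - Y)
                + lam2 * norm_21(X @ W @ V.T - T))

    def row_normalise(M):
        # Dividing each row by its norm equals multiplying by D^{-1}.
        return M / (np.linalg.norm(M, axis=1, keepdims=True) + eps)

    opt = objective(W, V)
    history = [opt]
    for _ in range(max_iter):
        u = X @ W - Y
        v = X @ W @ V.T - T
        grad_W = (row_normalise(W) + lam1 * X.T @ row_normalise(u)
                  + lam2 * X.T @ row_normalise(v) @ V)
        grad_V = lam2 * row_normalise(v).T @ X @ W
        W = W - step * grad_W
        V = V - step * grad_V
        new_opt = objective(W, V)
        history.append(new_opt)
        if abs(opt - new_opt) <= tol:        # change fell to the threshold
            break
        opt = new_opt
    return W, V, history

# Synthetic data: n = 30 files, d = 5 features, m = 4 labels, k = 3 categories.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 5))
Y = X @ rng.standard_normal((5, 4))
T = Y @ rng.standard_normal((3, 4)).T
W, V, history = solve_W_V(X, Y, T, max_iter=2000)
```

Because the 2,1-norm is not differentiable at zero rows, a fixed step size may make the objective oscillate near the minimum; a decaying step size or a proximal method would be a natural refinement, but that goes beyond what the text describes.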
After the conversion matrices W and V have been obtained from the training set, a file to be annotated can be annotated as follows: according to Y = X*W, the label indicator vector matrix of the file is obtained from its feature vector; according to T = Y*V^T, the category indicator vector matrix of the file is obtained from its label indicator vector; the several largest components of the label indicator vector matrix and the maximum component of the category indicator vector matrix are then selected as the label annotation result and the classification result of the file, thereby achieving automatic annotation of the file.
By fusing file category information and file features into a single optimization function, the embodiment of the present invention can improve the annotation accuracy and hit rate of a file annotation system when file category data is available. It should be noted in particular that the file in the embodiments of the present invention may be a picture file, an audio file, a video file or a text file. That is, the method of the present invention can annotate not only pictures but also audio, video or text, improving the accuracy and hit rate of automatic annotation.
Corresponding to the file annotation method above, an embodiment of the present invention further provides a file annotation device. As shown in Fig. 3, a file annotation device comprises:
a first acquisition unit 301, configured to obtain a file to be annotated;
a second acquisition unit 302, configured to obtain the feature vector of the file to be annotated according to the file to be annotated;
a first conversion unit 303, configured to obtain the label indicator vector of the file to be annotated according to the feature vector of the file to be annotated and the conversion relationship between the feature vector and the label indicator vector of the file to be annotated;
a second conversion unit 304, configured to obtain the category indicator vector of the file to be annotated according to the label indicator vector of the file to be annotated and the conversion relationship between the label indicator vector and the category indicator vector of the file to be annotated;
an annotation unit 305, configured to annotate the file to be annotated according to its label indicator vector and its category indicator vector.
As shown in Fig. 4, an embodiment of the present invention further provides another file annotation device, comprising:
a first acquisition unit 401, configured to obtain a file to be annotated;
a second acquisition unit 402, configured to obtain the feature vector of the file to be annotated according to the file to be annotated;
a first conversion unit 403, configured to obtain the label indicator vector of the file to be annotated according to the feature vector of the file to be annotated and the conversion relationship between the feature vector and the label indicator vector of the file to be annotated;
a second conversion unit 404, configured to obtain the category indicator vector of the file to be annotated according to the label indicator vector of the file to be annotated and the conversion relationship between the label indicator vector and the category indicator vector of the file to be annotated;
an annotation unit 408, configured to annotate the file to be annotated according to its label indicator vector and its category indicator vector.
The file annotation device further comprises:
a third acquisition unit 405, configured to obtain the training files in a file annotation training set;
a fourth acquisition unit 406, configured to obtain, according to the training files, the feature vectors, the label indicator vectors and the category indicator vectors of the training files;
a conversion relationship solving unit 407, configured to obtain, according to the feature vectors, label indicator vectors and category indicator vectors of the training files, the conversion relationship between the feature vectors and the label indicator vectors of the training files and the conversion relationship between the label indicator vectors and the category indicator vectors of the training files, to use the former as the conversion relationship between the feature vector and the label indicator vector of the file to be annotated, and to use the latter as the conversion relationship between the label indicator vector and the category indicator vector of the file to be annotated.
Preferably, the transformational relation solving unit 407 is further used for establishing the object function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, wherein X represents the feature vector matrix of the training file, Y represents the label instruction vector matrix of the training file, T represents the classification instruction vector matrix of the training file, W represents the conversion parameter matrix between X and Y, V represents the conversion parameter matrix between Y and T, $\min \|W\|$ represents the minimum norm of W, $\lambda_1$ and $\lambda_2$ represent preset parameters, and $V^{T}$ represents the transposed matrix of V; and for obtaining W and V according to the object function, so that the transformational relation between the feature vector and the label instruction vector of the training file is $Y = XW$, and the transformational relation between the label instruction vector and the classification instruction vector of the training file is $T = YV^{T}$.
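One possible way to obtain W and V from the object function is sketched below in Python with NumPy. Several points are assumptions rather than part of the description: the norms are taken as squared Frobenius norms (the text does not name the norm), the solver alternates an exact update of W with a least-squares update of V (the text does not say how the minimum is found), and the names `objective` and `fit_conversions` are hypothetical.

```python
import numpy as np


def objective(X, Y, T, W, V, lam1, lam2):
    # Assumed squared Frobenius norms; the source leaves the norm unspecified.
    fit_labels = X @ W - Y
    fit_classes = X @ W @ V.T - T
    return (np.sum(W ** 2)
            + lam1 * np.sum(fit_labels ** 2)
            + lam2 * np.sum(fit_classes ** 2))


def fit_conversions(X, Y, T, lam1=1.0, lam2=1.0, iters=20):
    """Alternate exact updates of W (V fixed) and V (W fixed)."""
    d, k, c = X.shape[1], Y.shape[1], T.shape[1]
    W = np.zeros((d, k))   # conversion parameters between X and Y
    V = np.zeros((c, k))   # conversion parameters between Y and T
    A = X.T @ X
    history = [objective(X, Y, T, W, V, lam1, lam2)]
    for _ in range(iters):
        # W-step: setting the gradient to zero gives W + A W C = B,
        # which is solved through column-major vectorisation.
        C = lam1 * np.eye(k) + lam2 * (V.T @ V)
        B = lam1 * (X.T @ Y) + lam2 * (X.T @ T @ V)
        M = np.eye(d * k) + np.kron(C.T, A)
        W = np.linalg.solve(M, B.flatten(order="F")).reshape((d, k), order="F")
        # V-step: least squares for (X W) V^T ≈ T.
        V = np.linalg.lstsq(X @ W, T, rcond=None)[0].T
        history.append(objective(X, Y, T, W, V, lam1, lam2))
    return W, V, history
```

Because each step minimises the assumed objective exactly over one matrix while the other is held fixed, the recorded objective values should not increase from one iteration to the next. The Kronecker-product solve is only practical for small d and k; a larger problem would need a different solver.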
Preferably, the marking unit 408 is further used for selecting a predetermined number of components from the label instruction vector of the file to be marked as the label marking result of the file to be marked and marking the file to be marked accordingly, wherein the value of each selected component is greater than the value of each non-selected component; and for taking the maximum component of the classification instruction vector of the file to be marked as the classification marking result of the file to be marked and marking the file to be marked accordingly.
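The selection rule described for the marking unit can be illustrated with a short Python sketch. The function name `select_marks`, the name lists and the parameter `top_k` (standing for the predetermined number) are hypothetical; the sketch also assumes that each vector component corresponds to one named label or class, and it does not treat ties specially.

```python
import numpy as np


def select_marks(y, t, label_names, class_names, top_k):
    """Pick the top_k largest label components and the single largest class component."""
    order = np.argsort(y)[::-1]                      # indices by descending component value
    labels = [label_names[i] for i in order[:top_k]]  # label marking result
    category = class_names[int(np.argmax(t))]         # classification marking result
    return labels, category


labels, category = select_marks(
    np.array([0.9, 0.1, 0.5]),      # label instruction vector (made-up values)
    np.array([0.7, 0.2]),           # classification instruction vector (made-up values)
    ["sea", "car", "beach"],
    ["scenery", "traffic"],
    top_k=2,
)
# labels == ["sea", "beach"], category == "scenery"
```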
Preferably, the file to be marked is a picture file, an audio file, a video file or a text file.
As shown in Fig. 5, an embodiment of the present invention additionally provides another file annotation equipment, which may be a server or another computer device. The equipment includes an input interface 501, a memory 502 and a processor 503. The input interface 501 and the memory 502 can be connected with the processor 503 by a bus 504 or in other ways; Fig. 5 takes the bus connection as an example. In the file annotation equipment, the input interface 501 is used for obtaining the file to be marked, and the processor 503 is used for obtaining the feature vector of the file to be marked according to the file to be marked; for obtaining the label instruction vector of the file to be marked according to the feature vector of the file to be marked and the transformational relation between the feature vector and the label instruction vector of the file to be marked; for obtaining the classification instruction vector of the file to be marked according to the label instruction vector of the file to be marked and the transformational relation between the label instruction vector and the classification instruction vector of the file to be marked; and for marking the file to be marked according to the label instruction vector and the classification instruction vector of the file to be marked.
Preferably, the input interface 501 is also used for obtaining the training file in a file marking training set. The processor 503 is also used for obtaining, according to the training file, the feature vector, the label instruction vector and the classification instruction vector of the training file; for obtaining, according to these three vectors, the transformational relation between the feature vector and the label instruction vector of the training file and the transformational relation between the label instruction vector and the classification instruction vector of the training file; for using the transformational relation between the feature vector and the label instruction vector of the training file as the transformational relation between the feature vector and the label instruction vector of the file to be marked; and for using the transformational relation between the label instruction vector and the classification instruction vector of the training file as the transformational relation between the label instruction vector and the classification instruction vector of the file to be marked.
Preferably, the processor 503 is also used for establishing the object function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, wherein X represents the feature vector matrix of the training file, Y represents the label instruction vector matrix of the training file, T represents the classification instruction vector matrix of the training file, W represents the conversion parameter matrix between X and Y, V represents the conversion parameter matrix between Y and T, $\min \|W\|$ represents the minimum norm of W, $\lambda_1$ and $\lambda_2$ represent preset parameters, and $V^{T}$ represents the transposed matrix of V; and for obtaining W and V according to the object function, so that the transformational relation between the feature vector and the label instruction vector of the training file is $Y = XW$, and the transformational relation between the label instruction vector and the classification instruction vector of the training file is $T = YV^{T}$.
Preferably, the processor 503 is also used for selecting a predetermined number of components from the label instruction vector of the file to be marked as the label marking result of the file to be marked and marking the file to be marked accordingly, wherein the value of each selected component is greater than the value of each non-selected component; and for taking the maximum component of the classification instruction vector of the file to be marked as the classification marking result of the file to be marked and marking the file to be marked accordingly.
The file mask method and the corresponding device provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and embodiments of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, according to the idea of the present invention, make changes to the specific embodiments and the scope of application. In conclusion, the content of this specification should not be construed as limiting the present invention.
Claims (10)
- 1. A file mask method, characterized by comprising: obtaining a file to be marked; obtaining the feature vector of the file to be marked according to the file to be marked; obtaining the label instruction vector of the file to be marked according to the feature vector of the file to be marked and the transformational relation between the feature vector and the label instruction vector of the file to be marked; obtaining the classification instruction vector of the file to be marked according to the label instruction vector of the file to be marked and the transformational relation between the label instruction vector and the classification instruction vector of the file to be marked; and marking the file to be marked according to the label instruction vector and the classification instruction vector of the file to be marked, including: marking the label instruction vector and the classification instruction vector of the file to be marked on the file to be marked, so that the file to be marked is marked in a manner that combines the label instruction vector and the classification instruction vector.
- 2. The file mask method according to claim 1, characterized in that, before the step of obtaining the label instruction vector of the file to be marked, the method further includes: obtaining the training file in a file marking training set; obtaining, according to the training file, the feature vector of the training file, the label instruction vector of the training file and the classification instruction vector of the training file; obtaining, according to the feature vector, the label instruction vector and the classification instruction vector of the training file, the transformational relation between the feature vector and the label instruction vector of the training file and the transformational relation between the label instruction vector and the classification instruction vector of the training file; using the transformational relation between the feature vector and the label instruction vector of the training file as the transformational relation between the feature vector and the label instruction vector of the file to be marked; and using the transformational relation between the label instruction vector and the classification instruction vector of the training file as the transformational relation between the label instruction vector and the classification instruction vector of the file to be marked.
- 3. The file mask method according to claim 2, characterized in that the step of obtaining the transformational relation between the feature vector and the label instruction vector of the training file and the transformational relation between the label instruction vector and the classification instruction vector of the training file includes: establishing the object function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, wherein X represents the feature vector matrix of the training file, Y represents the label instruction vector matrix of the training file, T represents the classification instruction vector matrix of the training file, W represents the conversion parameter matrix between X and Y, V represents the conversion parameter matrix between Y and T, $\min \|W\|$ represents the minimum norm of W, $\lambda_1$ and $\lambda_2$ represent preset parameters, and $V^{T}$ represents the transposed matrix of V; and obtaining W and V according to the object function, so that the transformational relation between the feature vector and the label instruction vector of the training file is $Y = XW$, and the transformational relation between the label instruction vector and the classification instruction vector of the training file is $T = YV^{T}$.
- 4. The file mask method according to any one of claims 1 to 3, characterized in that the step of marking the file to be marked includes: selecting a predetermined number of components from the label instruction vector of the file to be marked as the label marking result of the file to be marked, and marking the file to be marked accordingly, wherein the value of each selected component is greater than the value of each non-selected component.
- 5. The file mask method according to any one of claims 1 to 3, characterized in that the step of marking the file to be marked includes: taking the maximum component of the classification instruction vector of the file to be marked as the classification marking result of the file to be marked, and marking the file to be marked accordingly.
- 6. The file mask method according to any one of claims 1 to 3, characterized in that the file to be marked is a picture file, an audio file, a video file or a text file.
- 7. A file annotation equipment, characterized by comprising: a first acquisition unit, for obtaining a file to be marked; a second acquisition unit, for obtaining the feature vector of the file to be marked according to the file to be marked; a first converting unit, for obtaining the label instruction vector of the file to be marked according to the feature vector of the file to be marked and the transformational relation between the feature vector and the label instruction vector of the file to be marked; a second converting unit, for obtaining the classification instruction vector of the file to be marked according to the label instruction vector of the file to be marked and the transformational relation between the label instruction vector and the classification instruction vector of the file to be marked; and a marking unit, for marking the file to be marked according to the label instruction vector and the classification instruction vector of the file to be marked, wherein the marking unit is specifically used for marking the label instruction vector and the classification instruction vector of the file to be marked on the file to be marked, so that the file to be marked is marked in a manner that combines the label instruction vector and the classification instruction vector.
- 8. The file annotation equipment according to claim 7, characterized by further including: a third acquiring unit, for obtaining the training file in a file marking training set; a fourth acquiring unit, for obtaining, according to the training file, the feature vector of the training file, the label instruction vector of the training file and the classification instruction vector of the training file; and a transformational relation solving unit, for obtaining, according to the feature vector, the label instruction vector and the classification instruction vector of the training file, the transformational relation between the feature vector and the label instruction vector of the training file and the transformational relation between the label instruction vector and the classification instruction vector of the training file, for using the transformational relation between the feature vector and the label instruction vector of the training file as the transformational relation between the feature vector and the label instruction vector of the file to be marked, and for using the transformational relation between the label instruction vector and the classification instruction vector of the training file as the transformational relation between the label instruction vector and the classification instruction vector of the file to be marked.
- 9. The file annotation equipment according to claim 8, characterized in that the transformational relation solving unit is further used for establishing the object function $\min \|W\| + \lambda_1 \|XW - Y\| + \lambda_2 \|(XW)V^{T} - T\|$, wherein X represents the feature vector matrix of the training file, Y represents the label instruction vector matrix of the training file, T represents the classification instruction vector matrix of the training file, W represents the conversion parameter matrix between X and Y, V represents the conversion parameter matrix between Y and T, $\min \|W\|$ represents the minimum norm of W, $\lambda_1$ and $\lambda_2$ represent preset parameters, and $V^{T}$ represents the transposed matrix of V; and for obtaining W and V according to the object function, so that the transformational relation between the feature vector and the label instruction vector of the training file is $Y = XW$, and the transformational relation between the label instruction vector and the classification instruction vector of the training file is $T = YV^{T}$.
- 10. The file annotation equipment according to any one of claims 7 to 9, characterized in that the marking unit is further used for selecting a predetermined number of components from the label instruction vector of the file to be marked as the label marking result of the file to be marked and marking the file to be marked accordingly, wherein the value of each selected component is greater than the value of each non-selected component; and for taking the maximum component of the classification instruction vector of the file to be marked as the classification marking result of the file to be marked and marking the file to be marked accordingly.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310206739.3A CN104216882B (en) | 2013-05-29 | 2013-05-29 | File mask method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104216882A (en) | 2014-12-17 |
CN104216882B (en) | 2018-05-11 |
Family
ID=52098388
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310206739.3A Active CN104216882B (en) | 2013-05-29 | 2013-05-29 | File mask method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104216882B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106407407B (en) * | 2016-09-22 | 2019-10-15 | 江苏通付盾科技有限公司 | A kind of file labeling system and method |
CN106897424A (en) * | 2017-02-24 | 2017-06-27 | 北京时间股份有限公司 | Information labeling system and method |
CN107330069B (en) * | 2017-06-30 | 2020-10-23 | 北京金山安全软件有限公司 | Multimedia data processing method and device, server and storage medium |
CN110555137A (en) * | 2018-03-31 | 2019-12-10 | 华为技术有限公司 | Label filling method and device |
CN113127638B (en) * | 2019-12-31 | 2024-09-24 | 华为技术有限公司 | Data labeling method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102207966A (en) * | 2011-06-01 | 2011-10-05 | 华南理工大学 | Video content quick retrieving method based on object tag |
CN102737244A (en) * | 2012-06-06 | 2012-10-17 | 哈尔滨工程大学 | Method for determining corresponding relationships between areas and annotations in annotated image |
CN102819746A (en) * | 2012-07-10 | 2012-12-12 | 电子科技大学 | Method for automatically marking category of remote sensing image based on author-genre theme model |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8774515B2 (en) * | 2011-04-20 | 2014-07-08 | Xerox Corporation | Learning structured prediction models for interactive image labeling |
Also Published As
Publication number | Publication date |
---|---|
CN104216882A (en) | 2014-12-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |