CN103425666B - Information processor and information processing method - Google Patents


Publication number: CN103425666B
Authority: CN (China)
Prior art keywords: initial data, neighbor, label
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN201210152699.4A
Other languages: Chinese (zh)
Other versions: CN103425666A
Inventors: 刘曦, 刘汝杰
Assignee (current and original): Fujitsu Ltd
Application filed by Fujitsu Ltd
Priority to CN201210152699.4A
Publication of CN103425666A
Application granted
Publication of CN103425666B


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an information processing apparatus and an information processing method for performing a feature transformation on multiple original data items that carry multi-dimensional labels. The information processing method includes: computing, for each original data item, its label similarity to every other original data item and determining on that basis whether each other item is related data of the item; selecting several neighbor related data items of the item; forming a neighbor-relatedness graph; and solving for a target transformation matrix used to perform the feature transformation, this target transformation matrix representing the linear transformation that maximizes an objective function which is negatively correlated with the sum of the weighted lengths, in the feature space produced by the linear transformation, of all edges of the neighbor-relatedness graph. According to the technical scheme of the disclosure, a locality preserving projection (LPP) feature transformation can be applied to multiple original data items with multi-dimensional labels, so that the original data can be better classified.

Description

Information processor and information processing method
Technical field
The present disclosure relates to an information processing apparatus and an information processing method, and in particular to an information processing apparatus and an information processing method for performing a feature transformation on multiple original data items that carry multi-dimensional labels.
Background technology
In a classification process, the data usually must first undergo a feature transformation, which makes feature transformation a key technique in data classification. There are two reasons for this. On the one hand, in data classification tasks, data such as images or text are typically obtained by sampling on a sub-manifold embedded in Euclidean space; that is, the data are not distributed over a "flat" Euclidean space, so the original features of the data are not well suited to analysis in Euclidean space and need to be transformed. On the other hand, the original features of such data often have a high dimensionality, so classifying the data directly runs into the curse of dimensionality (see R. Bellman and R. Kalaba, "On adaptive control processes", IRE Transactions on Automatic Control, vol. 4, 1959).
Currently, the locality preserving projection (LPP) feature transformation is a very commonly used locality-preserving transformation method (see X. F. He and P. Niyogi, "Locality preserving projections", Advances in Neural Information Processing Systems, vol. 16, 2004). In this method, an adjacency (undirected) graph is first constructed over all the data according to their original features and categories, and the Laplacian term of this graph is then minimized to obtain the projection matrix (the linear transformation matrix). Because LPP performs a linear transformation and preserves the local structure of the data, the computation required for an LPP feature transformation is relatively small; it can be executed quickly and is well suited to data sampled from a manifold structure.
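For orientation, classical LPP as summarized above can be sketched in a few lines of Python. This is a generic illustration under assumed parameter choices (symmetric k-nearest-neighbor graph, heat-kernel edge weights, a small ridge for numerical stability), not the patent's multi-label method:

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, n_neighbors=5, t=1.0, n_components=2):
    """Classical Locality Preserving Projections (He & Niyogi, 2004).

    X: (n_samples, n_features) data matrix.
    Returns a (n_features, n_components) projection matrix A
    such that the embedded points are X @ A.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    np.fill_diagonal(d2, np.inf)
    # k-nearest-neighbor adjacency with heat-kernel weights.
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:n_neighbors]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / t)
    W = np.maximum(W, W.T)           # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                        # graph Laplacian
    # Generalized eigenproblem: X^T L X a = lambda X^T D X a.
    # A small ridge keeps the right-hand matrix positive definite.
    lhs = X.T @ L @ X
    rhs = X.T @ D @ X + 1e-8 * np.eye(X.shape[1])
    vals, vecs = eigh(lhs, rhs)
    return vecs[:, :n_components]    # smallest eigenvalues preserve locality
```

Minimizing the Laplacian term keeps graph neighbors close after projection, which is the property the disclosure below extends to label-aware graphs.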
Summary of the invention
However, a shortcoming of LPP is that it only preserves the local neighborhood structure of the data (that is, the local neighbor information in the original features) and cannot exploit the label information that the data carry. Moreover, the method cannot handle data with multi-dimensional labels.
Therefore, the present disclosure proposes an information processing apparatus and an information processing method for performing a feature transformation on multiple original data items with multi-dimensional labels, which can exploit the label information of the data while preserving the local neighbor information in their original features. In addition, the information processing apparatus and information processing method according to the disclosure can optionally also take into account the correlations that exist among the multi-dimensional labels of the data when performing the feature transformation.
According to an embodiment of the disclosure, there is provided an information processing apparatus for performing a feature transformation on multiple original data items with multi-dimensional labels. The information processing apparatus includes: an original-feature-vector generating unit configured to generate, for each original data item, an original feature vector representing the original features of the item; a label-vector generating unit configured to generate, for each original data item, a label vector representing the multi-dimensional labels the item carries; a label-similarity determining unit configured to compute, for each original data item, the label similarity between the item and every other original data item in the label vector space; a related-data determining unit configured to determine, for each original data item and based on the label similarity between every other original data item and the item, whether that other item is related data of the item; a feature-similarity determining unit configured to compute, for each original data item, the feature similarity between the item and every other original data item in the original feature vector space; a neighbor-related-data selecting unit configured to select, for each original data item and based on the feature similarities between the item and each of its related data items, several neighbor related data items from among the related data of the item; a neighbor-relatedness-graph generating unit configured to take each original data item and its neighbor related data as nodes, form an edge between the node of each original data item and the node of each of its neighbor related data items, and set for each edge a weight greater than or equal to zero, thereby forming a neighbor-relatedness graph; and a feature-transformation unit configured to solve for a target transformation matrix and to perform the feature transformation on the multiple original data items according to the target transformation matrix, wherein the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space produced by the linear transformation, of all edges of the neighbor-relatedness graph.
According to an embodiment of the disclosure, there is also provided an information processing apparatus for performing a feature transformation on multiple original data items with multi-dimensional labels. The information processing apparatus includes: an original-feature-vector generating unit configured to generate, for each original data item, an original feature vector representing the original features of the item; a label-vector generating unit configured to generate, for each original data item, a label vector representing the multi-dimensional labels the item carries; a label-similarity determining unit configured to compute, for each original data item, the label similarity between the item and every other original data item in the label vector space; a non-related-data determining unit configured to determine, for each original data item and based on the label similarity between every other original data item and the item, whether that other item is non-related data of the item; a feature-similarity determining unit configured to compute, for each original data item, the feature similarity between the item and every other original data item in the original feature vector space; a neighbor-non-related-data selecting unit configured to select, for each original data item and based on the feature similarities between the item and each of its non-related data items, several neighbor non-related data items from among the non-related data of the item; a neighbor-unrelatedness-graph generating unit configured to take each original data item and its neighbor non-related data as nodes, form an edge between the node of each original data item and the node of each of its neighbor non-related data items, and set for each edge a weight greater than or equal to zero, thereby forming a neighbor-unrelatedness graph; and a feature-transformation unit configured to solve for a target transformation matrix and to perform the feature transformation on the multiple original data items according to the target transformation matrix, wherein the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is positively correlated with the sum of the weighted lengths, in the feature space produced by the linear transformation, of all edges of the neighbor-unrelatedness graph.
According to an embodiment of the disclosure, there is also provided an information processing method for performing a feature transformation on multiple original data items with multi-dimensional labels. The information processing method includes: for each original data item, generating an original feature vector representing the original features of the item; for each original data item, generating a label vector representing the multi-dimensional labels the item carries; for each original data item, computing the label similarity between the item and every other original data item in the label vector space; for each original data item, determining, based on the label similarity between every other original data item and the item, whether that other item is related data of the item; for each original data item, computing the feature similarity between the item and every other original data item in the original feature vector space; for each original data item, selecting, based on the feature similarities between the item and each of its related data items, several neighbor related data items from among the related data of the item; taking each original data item and its neighbor related data as nodes, forming an edge between the node of each original data item and the node of each of its neighbor related data items, and setting for each edge a weight greater than or equal to zero, thereby forming a neighbor-relatedness graph; and solving for a target transformation matrix and performing the feature transformation on the multiple original data items according to the target transformation matrix, wherein the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space produced by the linear transformation, of all edges of the neighbor-relatedness graph.
According to an embodiment of the disclosure, there is also provided a program for causing a computing device to execute the above information processing method, so as to perform a feature transformation on multiple original data items with multi-dimensional labels.
According to an embodiment of the disclosure, there is also provided a corresponding computer-readable storage medium on which a program executable by a computing device is stored, the program being capable, when executed, of causing the computing device to execute the above information processing method.
The information processing apparatus and information processing method proposed by the disclosure can exploit the label information of the data while preserving the local neighbor information in the original features. In addition, the information processing apparatus and information processing method according to the disclosure can optionally take into account the correlations that exist among the multi-dimensional labels of the data when performing the feature transformation.
The brief summary of the technical scheme of the disclosure given above is intended to provide a basic understanding of some aspects of it. It should be understood that the summary above is not an exhaustive overview of the technical scheme; it is not intended to identify key or essential parts of the scheme, nor to limit its scope. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description given later.
These and other advantages of the technical scheme of the disclosure will become apparent from the following detailed description of preferred embodiments, taken in conjunction with the accompanying drawings.
Accompanying drawing explanation
The technical scheme of the disclosure can be better understood by referring to the description given below in conjunction with the accompanying drawings, in which the same or similar reference numerals are used throughout to denote the same or similar components. The drawings, together with the following detailed description, are incorporated in and form part of this specification, and serve to further illustrate the preferred embodiments of the disclosure and to explain its principles and advantages. In the drawings:
Fig. 1 is a block diagram schematically showing the structure of the information processing apparatus according to the first embodiment of the disclosure;
Fig. 2 is a flow chart schematically showing the information processing method according to the first embodiment of the disclosure;
Fig. 3 is a block diagram schematically showing the structure of the information processing apparatus according to the second embodiment of the disclosure;
Fig. 4 is a flow chart schematically showing the information processing method according to the second embodiment of the disclosure;
Fig. 5 is a block diagram schematically showing the structure of the information processing apparatus according to the third embodiment of the disclosure;
Fig. 6 is a flow chart schematically showing the information processing method according to the third embodiment of the disclosure;
Fig. 7 is a structural diagram schematically showing a possible hardware configuration of an information processing device that can be used to implement the information processing method and information processing apparatus according to the embodiments of the disclosure.
Those skilled in the art will appreciate that the components in the drawings are shown only for simplicity and clarity and are not necessarily drawn to scale. For example, the sizes of some components may be exaggerated relative to others in order to aid understanding of the embodiments of the disclosure.
Detailed description of the invention
Preferred embodiments of the disclosure are described below in conjunction with the accompanying drawings. For clarity and conciseness, the description does not cover all the features of an actual implementation. It should be understood, however, that in developing any such practical embodiment, many implementation-specific decisions must be made in order to achieve the developer's particular goals, such as compliance with those constraints related to the system and to the business, and these constraints may vary from one implementation to another. Moreover, it should also be appreciated that, although the development work may be very complex and time-consuming, it is merely a routine task for those skilled in the art having the benefit of this disclosure.
It should also be noted that, in order to avoid obscuring the technical scheme of the disclosure with unnecessary detail, the drawings show only the device structures and/or process steps closely related to the scheme, and other details bearing little relation to the technical scheme of the disclosure are omitted.
According to a first aspect of the disclosure, there is provided an information processing apparatus for performing a feature transformation on multiple original data items with multi-dimensional labels. The information processing apparatus includes: an original-feature-vector generating unit configured to generate, for each original data item, an original feature vector representing the original features of the item; a label-vector generating unit configured to generate, for each original data item, a label vector representing the multi-dimensional labels the item carries; a label-similarity determining unit configured to compute, for each original data item, the label similarity between the item and every other original data item in the label vector space; a related-data determining unit configured to determine, for each original data item and based on the label similarity between every other original data item and the item, whether that other item is related data of the item; a feature-similarity determining unit configured to compute, for each original data item, the feature similarity between the item and every other original data item in the original feature vector space; a neighbor-related-data selecting unit configured to select, for each original data item and based on the feature similarities between the item and each of its related data items, several neighbor related data items from among the related data of the item; a neighbor-relatedness-graph generating unit configured to take each original data item and its neighbor related data as nodes, form an edge between the node of each original data item and the node of each of its neighbor related data items, and set for each edge a weight greater than or equal to zero, thereby forming a neighbor-relatedness graph; and a feature-transformation unit configured to solve for a target transformation matrix and to perform the feature transformation on the multiple original data items according to the target transformation matrix, wherein the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space produced by the linear transformation, of all edges of the neighbor-relatedness graph.
According to the first aspect of the disclosure, there is also provided an information processing method for performing a feature transformation on multiple original data items with multi-dimensional labels. The information processing method includes: for each original data item, generating an original feature vector representing the original features of the item; for each original data item, generating a label vector representing the multi-dimensional labels the item carries; for each original data item, computing the label similarity between the item and every other original data item in the label vector space; for each original data item, determining, based on the label similarity between every other original data item and the item, whether that other item is related data of the item; for each original data item, computing the feature similarity between the item and every other original data item in the original feature vector space; for each original data item, selecting, based on the feature similarities between the item and each of its related data items, several neighbor related data items from among the related data of the item; taking each original data item and its neighbor related data as nodes, forming an edge between the node of each original data item and the node of each of its neighbor related data items, and setting for each edge a weight greater than or equal to zero, thereby forming a neighbor-relatedness graph; and solving for a target transformation matrix and performing the feature transformation on the multiple original data items according to the target transformation matrix, wherein the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space produced by the linear transformation, of all edges of the neighbor-relatedness graph.
According to a second aspect of the disclosure, there is provided an information processing apparatus for performing a feature transformation on multiple original data items with multi-dimensional labels. The information processing apparatus includes: an original-feature-vector generating unit configured to generate, for each original data item, an original feature vector representing the original features of the item; a label-vector generating unit configured to generate, for each original data item, a label vector representing the multi-dimensional labels the item carries; a label-similarity determining unit configured to compute, for each original data item, the label similarity between the item and every other original data item in the label vector space; a non-related-data determining unit configured to determine, for each original data item and based on the label similarity between every other original data item and the item, whether that other item is non-related data of the item; a feature-similarity determining unit configured to compute, for each original data item, the feature similarity between the item and every other original data item in the original feature vector space; a neighbor-non-related-data selecting unit configured to select, for each original data item and based on the feature similarities between the item and each of its non-related data items, several neighbor non-related data items from among the non-related data of the item; a neighbor-unrelatedness-graph generating unit configured to take each original data item and its neighbor non-related data as nodes, form an edge between the node of each original data item and the node of each of its neighbor non-related data items, and set for each edge a weight greater than or equal to zero, thereby forming a neighbor-unrelatedness graph; and a feature-transformation unit configured to solve for a target transformation matrix and to perform the feature transformation on the multiple original data items according to the target transformation matrix, wherein the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is positively correlated with the sum of the weighted lengths, in the feature space produced by the linear transformation, of all edges of the neighbor-unrelatedness graph.
According to the second aspect of the disclosure, there is also provided an information processing method for performing a feature transformation on multiple original data items with multi-dimensional labels. The information processing method includes: for each original data item, generating an original feature vector representing the original features of the item; for each original data item, generating a label vector representing the multi-dimensional labels the item carries; for each original data item, computing the label similarity between the item and every other original data item in the label vector space; for each original data item, determining, based on the label similarity between every other original data item and the item, whether that other item is non-related data of the item; for each original data item, computing the feature similarity between the item and every other original data item in the original feature vector space; for each original data item, selecting, based on the feature similarities between the item and each of its non-related data items, several neighbor non-related data items from among the non-related data of the item; taking each original data item and its neighbor non-related data as nodes, forming an edge between the node of each original data item and the node of each of its neighbor non-related data items, and setting for each edge a weight greater than or equal to zero, thereby forming a neighbor-unrelatedness graph; and solving for a target transformation matrix and performing the feature transformation on the multiple original data items according to the target transformation matrix, wherein the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is positively correlated with the sum of the weighted lengths, in the feature space produced by the linear transformation, of all edges of the neighbor-unrelatedness graph.
(first embodiment)
First, an information processing apparatus 100 according to the first embodiment of the disclosure will be described with reference to Fig. 1, as an example of the information processing apparatus provided according to the first aspect of the disclosure.
The information processing apparatus 100 includes an original-feature-vector generating unit 101, a label-vector generating unit 102, a label-similarity determining unit 103, a related-data determining unit 104, a feature-similarity determining unit 105, a neighbor-related-data selecting unit 106, a neighbor-relatedness-graph generating unit 107, a feature-transformation unit 108, a non-related-data determining unit 114, a neighbor-non-related-data selecting unit 116, and a neighbor-unrelatedness-graph generating unit 117.
The original-feature-vector generating unit 101 generates, for each received original data item with multi-dimensional labels, an original feature vector representing the original features of the item, and supplies it to the feature-similarity determining unit 105. For example, the original-feature-vector generating unit 101 gives the original data items a_1, a_2, ..., a_n the vectors x_1, x_2, ..., x_n as their original feature vectors, where i is a natural number less than or equal to the total number n of original data items, a_i denotes the i-th original data item, and x_i denotes the feature vector of a_i; for example, x_i is a d-dimensional vector in the d-dimensional original feature vector space. The d-dimensional original feature vector space is the vector space representing all the original features of the original data and usually has a high dimensionality.
The label-vector generating unit 102 generates, for each received original data item with multi-dimensional labels, a label vector representing the multi-dimensional labels the item carries, and supplies it to the label-similarity determining unit 103. For example, the label-vector generating unit 102 gives the original data items a_1, a_2, ..., a_n the vectors y_1, y_2, ..., y_n as their label vectors, where y_i denotes the label vector of a_i; for example, y_i is a k-dimensional vector in the k-dimensional label vector space. This k-dimensional vector may be a k-dimensional 0-1 vector, where a value of 0 in the j-th dimension of y_i indicates that a_i does not carry the j-th of the k labels, and a value of 1 in the j-th dimension indicates that a_i does carry the j-th of the k labels, j being a natural number less than or equal to k. Of course, y_i may also be a k-dimensional vector other than a 0-1 vector; for example, if each original data item is a photograph containing one person, one label of the item is a height value, and another label is a weight value, then the label vector of each item is a two-dimensional vector whose value in every dimension is a positive number.
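As a minimal illustrative sketch (the function name and inputs are hypothetical, not from the disclosure), the k-dimensional 0-1 label vectors described above can be built from per-item label sets as follows:

```python
import numpy as np

def label_vectors(label_sets, all_labels):
    """Build k-dimensional 0-1 label vectors: Y[i, j] = 1 iff
    item i carries the label all_labels[j]."""
    index = {lab: j for j, lab in enumerate(all_labels)}
    Y = np.zeros((len(label_sets), len(all_labels)))
    for i, labs in enumerate(label_sets):
        for lab in labs:
            Y[i, index[lab]] = 1.0
    return Y
```

For instance, with labels ["cat", "outdoor"], an item tagged only "cat" gets the vector [1, 0] and an item tagged with both gets [1, 1].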
The label-similarity determining unit 103, based on the received label vectors of the data, computes for each original data item its label similarity to every other original data item in the label vector space, and supplies the result to the related-data determining unit 104, the neighbor-relatedness-graph generating unit 107, the non-related-data determining unit 114, and the neighbor-unrelatedness-graph generating unit 117. The label similarity can be computed from the positions of two original data items in the label vector space together with a label correlation matrix. For example, the label similarity between original data items a_i and a_j can be computed according to the following formula (1):
S_{l,ij} = y_i' C y_j    (1)
where C is the k-by-k label correlation matrix. It can be given manually (e.g. the identity matrix I, meaning there is no correlation between labels), or it can be computed, for example, using the following formula (2):
C_{a,b} = <Y_a, Y_b> / (||Y_a|| · ||Y_b||),  1 ≤ a, b ≤ k    (2)
where Y_a and Y_b are n-dimensional vectors related to y_i as described by the following formulas (3) and (4):
Y_{a,i} = y_{i,a}    (3)
Y_{b,i} = y_{i,b}    (4)
In other words, the value of Y_a in the i-th dimension is the value of y_i in the a-th dimension, and the value of Y_b in the i-th dimension is the value of y_i in the b-th dimension.
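The computation of formulas (1) and (2) can be sketched as follows; this is an illustrative reading of the formulas, with C estimated as the cosine similarity between the label columns Y_a and Y_b when it is not given manually:

```python
import numpy as np

def label_similarity(Y, C=None):
    """Label similarities S_l with S_l[i, j] = y_i' C y_j (formula (1)).

    Y: (n, k) matrix whose rows are the label vectors y_i
       (so its columns are the vectors Y_a of formulas (3)-(4)).
    C: (k, k) label correlation matrix; if None, it is estimated by
       formula (2) as the cosine similarity between label columns.
    """
    if C is None:
        norms = np.linalg.norm(Y, axis=0)
        norms[norms == 0] = 1.0   # avoid division by zero for unused labels
        C = (Y.T @ Y) / np.outer(norms, norms)
    return Y @ C @ Y.T
```

Passing C = np.eye(k) reproduces the no-correlation case, where the similarity reduces to the plain inner product y_i' y_j.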
The foregoing merely illustrates one way of determining label similarity. Those skilled in the art will appreciate that label similarity can be determined in other ways; for example, it can be determined solely from the distance between original data items in label space, and the distance used may be the cosine distance, the Euclidean distance, or a distance of another suitable type.
The related-data determining unit 104 determines, for each original data item and every other original data item, based on the received label similarities, whether that other item is related data of the item, and supplies the determination result to the neighbor-related-data selecting unit 106. Whether original data items are related data of each other can be determined in various ways.
A kind of feasible mode is, if aiWith ajLabel similarity be aiWith label phase in every other initial data One of m that seemingly degree is the highest, is also a simultaneouslyjOne of the m the highest with label similarity in every other initial data, then aiWith ajBeing relatively related data, wherein m is less than the natural number of total n of initial data, and m can be previously given, it is possible to Being according to the distribution character of such as initial data or other are suitable because usually determining.
Another kind of feasible mode is, if aiWith ajLabel similarity greater than or equal to predetermined the first label threshold value Thr, then aiWith ajIt is relatively related data.Similar to above-mentioned natural number m, the first label threshold value Thr can be previously given , it is also possible to it is according to the distribution character of such as initial data or other are suitable because usually determining.
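The first feasible way above (mutual top-m label similarity) can be sketched as follows. The function name and the input convention (label similarities already collected in an n×n matrix, with m < n) are assumptions for illustration only.

```python
import numpy as np

def related_pairs(S_l, m):
    """Mutual top-m relation: a_i and a_j are related data of each other
    iff S_l[i, j] is among the m highest similarities of row i AND of
    row j (the data item itself is excluded from its own ranking)."""
    n = S_l.shape[0]
    S = S_l.astype(float).copy()
    np.fill_diagonal(S, -np.inf)                   # never compare a_i with itself
    top_m = np.argsort(-S, axis=1)[:, :m]          # indices of the m most similar items
    in_top = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), m)
    in_top[rows, top_m.ravel()] = True
    return in_top & in_top.T                       # require the relation in both directions
```

The symmetrizing `&` at the end is what makes the relation mutual, matching the "and at the same time" condition in the text.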
The non-related data determination unit 114 is similar to the related data determination unit 104, the difference being that the non-related data determination unit 114 determines, for each original data and each other original data, whether that other original data is non-related data of this original data, based on the received label similarity, and supplies the determination result to the neighbour non-related data selection unit 116. Likewise, whether two original data are non-related data of each other may be determined in various ways.

One feasible way is: if the label similarity between a_i and a_j is one of the r lowest label similarities between a_i and all other original data, and at the same time is one of the r lowest label similarities between a_j and all other original data, then a_i and a_j are non-related data of each other, where r is a natural number less than the total number n of the original data; r may be given in advance, or may be determined according to factors such as the distribution characteristics of the original data or other suitable factors.

Another feasible way is: if the label similarity between a_i and a_j is less than a predetermined second label threshold Thir, then a_i and a_j are non-related data of each other. Similarly to the above natural number r, the second label threshold Thir may be given in advance, or may be determined according to factors such as the distribution characteristics of the original data or other suitable factors.
Preferably, when the second label threshold Thir is used to determine the non-related data relation while the first label threshold Thr is used to determine the related data relation, the first label threshold Thr is greater than or equal to the second label threshold Thir. This ensures that two original data cannot simultaneously be both related data and non-related data of each other.
The feature similarity determination unit 105, based on the received original feature vectors of the original data, calculates, for each original data, the feature similarity between this original data and each other original data in the original feature vector space, and supplies it to the neighbour related data selection unit 106, the neighbour-related graph generating unit 107, the neighbour non-related data selection unit 116 and the neighbour non-related graph generating unit 117. The feature similarity may be calculated from the distance between two original data in the original feature vector space. For example, the feature similarity S_{v,ij} between original data a_i and a_j may be calculated according to the following formula (5):

S_{v,ij} = exp(−‖x_i − x_j‖² / σ²)    (5)

Where σ = mean(‖x_i − x_j‖², 1 ≤ i ≠ j ≤ n) is the average pairwise distance between all original data in the original feature vector space.

Those skilled in the art will understand that the distance between original data in the original feature vector space may be the Euclidean distance, the Manhattan distance, the chi-square distance, or a distance of another suitable type.
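Formula (5), with σ taken as the mean squared pairwise distance as defined above, can be sketched as follows; the function name is an assumption for illustration.

```python
import numpy as np

def feature_similarity(X):
    """Gaussian feature similarity per formula (5).

    X : (n, d) matrix whose i-th row is the original feature vector x_i.
    sigma is the mean of ||x_i - x_j||^2 over all pairs i != j,
    following the definition given in the text.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # ||x_i - x_j||^2
    n = X.shape[0]
    sigma = sq[~np.eye(n, dtype=bool)].mean()                    # average over i != j
    return np.exp(-sq / sigma ** 2)
```

Swapping the squared Euclidean distance for a Manhattan or chi-square distance, as the text allows, only changes the `sq` line.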
The neighbour related data selection unit 106, based on the received related data relations between the original data and the received feature similarities, selects, for each original data, a plurality of neighbour related data of this original data from among the related data of this original data, and supplies them to the neighbour-related graph generating unit 107. The neighbour related data may be selected for each original data in various ways.

One feasible way is, for each original data, to select from among its related data the q related data having the highest feature similarity with this original data as the neighbour related data of this original data. Here q is a natural number less than the total number n of the original data; q may be given in advance, or may be determined according to factors such as the distribution characteristics of the original data or other suitable factors.

Another feasible way is, for each original data, to take as neighbour related data each related data whose feature similarity with this original data is greater than a first neighbour threshold Th1. Similarly to the above natural number q, the first neighbour threshold Th1 may be given in advance, or may be determined according to factors such as the distribution characteristics of the original data or other suitable factors.

Those of ordinary skill in the art will appreciate that there may be other ways of selecting neighbour related data, for example removing, from the neighbour related data selected according to the first way, the related data whose feature similarity with the targeted original data is less than the first neighbour threshold Th1.
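The first feasible way above (top-q feature similarity among the related data) can be sketched as follows. The function name is hypothetical, and `related` is assumed to be a boolean n×n matrix of the related-data relation with a False diagonal.

```python
import numpy as np

def neighbour_related(S_v, related, q):
    """For each item i, pick up to q of its related data with the
    highest feature similarity S_v[i, j] (first feasible way)."""
    n = S_v.shape[0]
    neighbours = []
    for i in range(n):
        cand = np.where(related[i])[0]             # indices of related data of a_i
        order = cand[np.argsort(-S_v[i, cand])]    # candidates sorted by similarity
        neighbours.append(order[:q].tolist())
    return neighbours
```

The threshold-based second way would simply replace the `[:q]` slice by a filter `S_v[i, cand] > Th1`.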
The neighbour-related graph generating unit 107, based on the received neighbour related data of each original data, takes each original data and the neighbour related data of this original data as nodes, forms an edge between this original data and each of its neighbour related data, and sets for each edge a weight greater than or equal to zero, thereby forming the neighbour-related graph, which it supplies to the feature transformation unit 108. When setting the weight for each edge, at least one of the feature similarity and the label similarity between the two original data corresponding to the two nodes connected by that edge may be used. For example, the weight may be set for each edge in the following manner.

If the neighbour related data selection unit 106 selects, for each original data, from among all the related data of this original data, the q related data having the highest feature similarity with this original data as the neighbour related data of this original data, then the neighbour-related graph generating unit 107 may, for each edge, set the weight by distinguishing different cases based on the received feature similarity between the two original data corresponding to the two nodes connected by that edge, in the following way.

If the q original data having the highest feature similarity with the original data corresponding to one node connected by this edge include the original data corresponding to the other node connected by this edge, the weight of this edge is set to 1; otherwise the weight of this edge is set positively correlated with at least one of the feature similarity and the label similarity between the two original data corresponding to the two nodes connected by this edge, and less than or equal to 1, for example set to a linear combination of the feature similarity and the label similarity between the two original data corresponding to the two nodes connected by this edge, as shown in the following formula (6):

W_{r,ij} = 1, if a_j ∈ N_rq(i) ∩ N_q(i) or a_i ∈ N_rq(j) ∩ N_q(j);
W_{r,ij} = α·S_{v,ij} + (1−α)·S_{l,ij}, for every other edge of the neighbour-related graph;
W_{r,ij} = 0, if there is no edge between the nodes corresponding to a_i and a_j.    (6)

Where W_{r,ij} is the weight of the edge connecting the nodes corresponding to original data a_i and a_j in the neighbour-related graph, and α is an adjustment parameter, a real number between 0 and 1.

In formula (6), for each original data a_i, the q original data having the highest feature similarity with a_i are found from the set N_r(i) composed of its related data, and N_rq(i) is defined as the set composed of these q original data; meanwhile, for each original data a_i, the q original data having the highest feature similarity with a_i are found from among all original data not including a_i itself, and N_q(i) is defined as the set composed of these q data.
According to the weight setting manner shown in formula (6):

For original data a_j belonging simultaneously to the sets N_rq(i) and N_q(i), the weight of the edge connecting the nodes corresponding to original data a_i and a_j is set to the maximum value 1;

For original data a_i belonging simultaneously to the sets N_rq(j) and N_q(j), the weight of the edge connecting the nodes corresponding to original data a_i and a_j is likewise set to the maximum value 1;

For original data a_i and a_j other than the above between whose corresponding nodes an edge exists, the weight of this edge is set to a linear combination of the feature similarity and the label similarity between a_i and a_j, where the adjustment parameter α adjusts the respective proportions of the feature similarity and the label similarity in the weight;

For original data a_i and a_j between whose corresponding nodes no edge exists, W_{r,ij} is set to 0, which can be understood as: the weight of any non-existent edge is necessarily 0.
It should be noted that, since the neighbour-related graph here is an undirected graph, each edge in the graph has no direction; therefore W_{r,ij} is necessarily equal to W_{r,ji}.
The foregoing merely illustrates one way of setting the weights; those of ordinary skill in the art may conceive of other ways. For example, the weight W_{r,ij} of each edge may be set positively correlated with at least one of the feature similarity and the label similarity between the two original data a_i and a_j corresponding to the two nodes connected by this edge. More specifically, for example, the weight W_{r,ij} of each edge may be set to a linear combination of the feature similarity and the label similarity between the two original data corresponding to the two nodes connected by this edge.

Of course, the weight of each edge may also be set to 1, that is, each edge has the same weight.
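The case analysis described around formula (6) can be sketched as follows. This is a hedged reading of the text: the function name, the input conventions (neighbour lists and a boolean related matrix), and the symmetric assignment are assumptions for illustration.

```python
import numpy as np

def related_graph_weights(S_v, S_l, neighbours, related, q, alpha):
    """Weight matrix W_r of the neighbour-related graph, following the
    case analysis the text gives for formula (6).

    neighbours[i] : list of neighbour related data of a_i (the edges).
    related       : boolean n x n matrix of the related-data relation.
    """
    n = S_v.shape[0]
    S = S_v.astype(float).copy()
    np.fill_diagonal(S, -np.inf)
    # N_q(i): the q items most feature-similar to a_i among all others
    N_q = [set(np.argsort(-S[i])[:q]) for i in range(n)]
    # N_rq(i): the q most feature-similar items among the related data of a_i
    N_rq = []
    for i in range(n):
        cand = np.where(related[i])[0]
        N_rq.append(set(cand[np.argsort(-S[i, cand])][:q]))
    W = np.zeros((n, n))
    for i in range(n):
        for j in neighbours[i]:
            if (j in N_rq[i] and j in N_q[i]) or (i in N_rq[j] and i in N_q[j]):
                W[i, j] = W[j, i] = 1.0            # first two cases: weight 1
            else:                                  # other existing edges: linear combination
                W[i, j] = W[j, i] = alpha * S_v[i, j] + (1 - alpha) * S_l[i, j]
    return W
```

Edges never formed stay at 0, matching the last case of the analysis; symmetry of W reflects the graph being undirected.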
The neighbour non-related data selection unit 116 is similar to the neighbour related data selection unit 106, the difference being that the neighbour non-related data selection unit 116, based on the received non-related data relations between the original data and the received feature similarities, selects, for each original data, a plurality of neighbour non-related data of this original data from among the non-related data of this original data, and supplies them to the neighbour non-related graph generating unit 117. The neighbour non-related data may be selected for each original data in various ways.

One feasible way is, for each original data, to select from among its non-related data the p non-related data having the highest feature similarity with this original data as the neighbour non-related data of this original data. Here p is a natural number less than the total number n of the original data; p may be given in advance, or may be determined according to factors such as the distribution characteristics of the original data or other suitable factors.

Another feasible way is, for each original data, to take as neighbour non-related data each non-related data whose feature similarity with this original data is greater than a second neighbour threshold Th2. Similarly to the above natural number p, the second neighbour threshold Th2 may be given in advance, or may be determined according to factors such as the distribution characteristics of the original data or other suitable factors.

Those of ordinary skill in the art will appreciate that there may be other ways of selecting neighbour non-related data, for example removing, from the neighbour non-related data selected according to the first way, the non-related data whose feature similarity with the targeted original data is less than the second neighbour threshold Th2.
The neighbour non-related graph generating unit 117 is similar to the neighbour-related graph generating unit 107, the difference being that the neighbour non-related graph generating unit 117, based on the received neighbour non-related data of each original data, takes each original data and the neighbour non-related data of this original data as nodes, forms an edge between this original data and each of its neighbour non-related data, and sets for each edge a weight greater than or equal to zero, thereby forming the neighbour non-related graph, which it supplies to the feature transformation unit 108. When setting the weight for each edge, at least one of the feature similarity and the label similarity between the two original data corresponding to the two nodes connected by that edge may be used. For example, the weight may be set for each edge in the following manner.

If the neighbour non-related data selection unit 116 selects, for each original data, from among all the non-related data of this original data, the p non-related data having the highest feature similarity with this original data as the neighbour non-related data of this original data, then the neighbour non-related graph generating unit 117 may, for each edge, set the weight in the following way based on the received feature similarity between the two original data corresponding to the two nodes connected by that edge.

If the feature similarity between the two original data corresponding to the two nodes connected by this edge is greater than the maximum of the feature similarities between one of these two original data and all related data of that original data, then the weight of this edge is set positively correlated with the feature similarity between the two original data corresponding to the two nodes connected by this edge, for example set to the feature similarity between these two original data; otherwise the weight of this edge is set to 0.

If, for each original data a_i, the p non-related data having the highest feature similarity with a_i are found from the set N_ir(i) composed of its non-related data, and N_irk(i) is defined as the set composed of these p non-related data, and if, for each original data a_i, the maximum of the feature similarities between a_i and all of its related data is calculated and defined as MaxRS(i), then the above weight setting method can be expressed as: for original data a_j belonging to the set N_irk(i) and whose feature similarity with a_i is greater than MaxRS(i), the weight of the edge between the two nodes corresponding to a_j and a_i is set to the feature similarity between a_i and a_j. The above setting method may also be represented by the following formula (7):

W_{ir,ij} = S_{v,ij}, if a_j ∈ N_irk(i) and S_{v,ij} > MaxRS(i), or a_i ∈ N_irk(j) and S_{v,ij} > MaxRS(j);
W_{ir,ij} = 0, otherwise.    (7)

Where W_{ir,ij} is the weight of the edge connecting the nodes corresponding to original data a_i and a_j in the neighbour non-related graph.
It should be noted that, since the neighbour non-related graph here is an undirected graph, each edge in the graph has no direction; therefore W_{ir,ij} is necessarily equal to W_{ir,ji}.

Those of ordinary skill in the art should also understand that the weight of each edge may be set positively correlated with at least one of the feature similarity and the label similarity between the two original data corresponding to the two nodes connected by this edge, for example set to a linear combination of the feature similarity and the label similarity between these two original data.

Of course, the weight of each edge may also be set to 1, that is, each edge has the same weight.
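The rule of formula (7) can be sketched in the same spirit. The function name and input conventions are hypothetical, and MaxRS(i) is taken as −∞ when a_i has no related data, an assumed treatment of that edge case.

```python
import numpy as np

def non_related_graph_weights(S_v, non_related, related, p):
    """Weight matrix W_ir of the neighbour non-related graph, following
    the rule the text expresses as formula (7)."""
    n = S_v.shape[0]
    # MaxRS(i): highest feature similarity between a_i and its related data
    max_rs = np.array([S_v[i, related[i]].max() if related[i].any() else -np.inf
                       for i in range(n)])
    # N_irk(i): the p non-related items most feature-similar to a_i
    N_irk = []
    for i in range(n):
        cand = np.where(non_related[i])[0]
        N_irk.append(set(cand[np.argsort(-S_v[i, cand])][:p]))
    W = np.zeros((n, n))
    for i in range(n):
        for j in N_irk[i]:
            if S_v[i, j] > max_rs[i]:              # closer than any related data
                W[i, j] = W[j, i] = S_v[i, j]      # weight = feature similarity
    return W
```

Only non-related neighbours that sit closer in feature space than any related data of a_i receive a positive weight, which is exactly what makes them worth pushing apart.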
The feature transformation unit 108, based on the received neighbour-related graph and neighbour non-related graph, solves for a target transformation matrix, performs feature transformation on the plurality of original data according to this target transformation matrix, and outputs the original data after feature transformation. Here, the target transformation matrix represents the linear transformation that maximizes an objective function, and this objective function is negatively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges in the neighbour-related graph, and positively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges in the neighbour non-related graph. It should be noted that the length of an edge here refers to the distance in the space between the two points connected by the edge. Those of ordinary skill in the art will recognize that a distance of a suitable type may be selected here as the length of an edge.

The purpose of the feature transformation performed by the feature transformation unit 108 is to pull the weighted distances between the neighbour related data in the neighbour-related graph as close as possible after the feature transformation, while pushing the weighted distances between the neighbour non-related data in the neighbour non-related graph as far apart as possible after the feature transformation, that is, to realize the objective functions in formulas (8) and (9):
min Σ_{i,j} (a^T x_i − a^T x_j)² W_{r,ij}    (8)

max Σ_{i,j} (a^T x_i − a^T x_j)² W_{ir,ij}    (9)

Where a^T is the linear transformation (that is, the feature transformation) applied to the original data.
As is common practice in locality preserving projection feature transformation methods, using the Laplacian terms of the neighbour-related graph and the neighbour non-related graph, the problem of realizing the objective functions in formulas (8) and (9) can be converted into the optimization problem shown in the following formula (10):

arg max_a  a^T X (β L_ir − (1−β) L_r) X^T a    (10)
s.t.  a^T X D_r X^T a = 1

Where D_r = diag(sum(W_r)), D_ir = diag(sum(W_ir)), the Laplacian term of the neighbour-related graph is L_r = D_r − W_r, the Laplacian term of the neighbour non-related graph is L_ir = D_ir − W_ir, and β is a scale parameter adjusting the respective weights of the neighbour-related graph and the neighbour non-related graph, 0 ≤ β ≤ 1.
Because L_r = D_r − W_r, the optimization problem shown in formula (10) is equivalent to the optimization problem shown in formula (11); and due to the constraint condition a^T X D_r X^T a = 1, the optimization problem shown in formula (11) is in turn equivalent to the optimization problem shown in formula (12):

arg max_a  a^T X (β L_ir − (1−β)(D_r − W_r)) X^T a    (11)
s.t.  a^T X D_r X^T a = 1

arg max_a  a^T X (β L_ir + (1−β) W_r) X^T a    (12)
s.t.  a^T X D_r X^T a = 1
Solving formula (12) is equivalent to solving the generalized eigenvalue problem shown in formula (13):

X (β L_ir + (1−β) W_r) X^T a = λ X D_r X^T a    (13)

If a_1, a_2, ..., a_m are the eigenvectors corresponding to the eigenvalues in formula (13) ordered as λ_1 > λ_2 > ... > λ_m, then the feature transformation matrix A = (a_1, a_2, ..., a_m) can be obtained, where y_i = A^T x_i is the feature after transformation.
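Solving formula (13) can be sketched by reducing the generalized problem to a standard one. This is a sketch, not the patent's prescribed solver: the function name is hypothetical, and the small ridge added to X D_r X^T is an assumption to keep that matrix invertible.

```python
import numpy as np

def target_transformation(X, W_r, W_ir, beta, m):
    """Solve the generalized eigenvalue problem of formula (13) and
    return A = (a_1, ..., a_m), the eigenvectors of the m largest
    eigenvalues. X is (d, n) with the original feature vectors as columns."""
    D_r = np.diag(W_r.sum(axis=0))
    D_ir = np.diag(W_ir.sum(axis=0))
    L_ir = D_ir - W_ir                               # Laplacian of the non-related graph
    M = X @ (beta * L_ir + (1.0 - beta) * W_r) @ X.T # left-hand side of formula (13)
    B = X @ D_r @ X.T + 1e-9 * np.eye(X.shape[0])    # right-hand side, with assumed ridge
    # M a = lambda B a  ->  standard eigenproblem for inv(B) M
    vals, vecs = np.linalg.eig(np.linalg.solve(B, M))
    order = np.argsort(-vals.real)                   # descending eigenvalues
    return vecs[:, order[:m]].real
```

Since M is symmetric and B is symmetric positive definite after the ridge, the generalized eigenvalues are real, so taking the real parts is safe.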
It should be noted that the objective function here is negatively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges in the neighbour-related graph, and positively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges in the neighbour non-related graph.
Thus, the information processing apparatus 100 has obtained a linear transformation that pulls the weighted distances between the neighbour related data in the neighbour-related graph as close as possible after feature transformation, while pushing the weighted distances between the neighbour non-related data in the neighbour non-related graph as far apart as possible after feature transformation, and can then obtain the original data after this linear transformation. The information processing apparatus 100 may also classify the original data based on the original data after this linear transformation. In particular, the information processing apparatus 100 can process original data such as images or text, so as to classify them according to the multi-dimensional labels they have and the original features of the images or text themselves.

Moreover, compared with traditional locality preserving projection feature transformation methods, classifying based on the original data after this linear transformation can, while retaining the local information of the data, also make use of the label information that the data have.

Furthermore, compared with most existing feature transformation methods, classifying based on the original data after this linear transformation is applicable to data with multi-dimensional labels.

In addition, since classification based on the original data after this linear transformation can both retain the local neighbour information of the data and make use of the label information that the data have, it is better suited to nearest-neighbour classification algorithms such as the k-nearest neighbour (K-Nearest Neighbor, KNN) algorithm.
Hereinafter, the information process 120 according to the first embodiment of the present disclosure, performed on a plurality of original data having multi-dimensional labels, will be described with reference to Fig. 2, as an example of the information processing method provided according to the first aspect of the disclosure. The information process 120 may, for example, be performed by the information processing apparatus 100.

After the information process 120 starts, it first enters step S101. In step S101, according to the original data having multi-dimensional labels, an original feature vector representing the original features of each original data is generated for each original data, and the process proceeds to step S102. Step S101 may, for example, be performed by the original feature vector generating unit 101, and its details are not repeated here.

In step S102, according to the original data having multi-dimensional labels, a label vector representing the multi-dimensional labels of each original data is generated for each original data, and the process proceeds to step S103. Step S102 may, for example, be performed by the label vector generating unit 102, and its details are not repeated here.
In step S103, based on the label vector of each data, the label similarity between each original data and each other original data in the label vector space is calculated for each original data, and the process proceeds to step S104. Step S103 may, for example, be performed by the label similarity determination unit 103, and its details are not repeated here.

In step S104, for each original data and each other original data, whether that other original data is related data of this original data is determined based on the label similarity, and the process proceeds to step S105. Step S104 may, for example, be performed by the related data determination unit 104, and its details are not repeated here.

In step S105, for each original data and each other original data, whether that other original data is non-related data of this original data is determined based on the label similarity, and the process proceeds to step S106. Step S105 may, for example, be performed by the non-related data determination unit 114, and its details are not repeated here.

In step S106, based on the original feature vectors of the original data, the feature similarity between each original data and each other original data in the original feature vector space is calculated for each original data, and the process proceeds to step S107. Step S106 may, for example, be performed by the feature similarity determination unit 105, and its details are not repeated here.
In step S107, based on the related data relations between the original data and the feature similarities, a plurality of neighbour related data of each original data are selected from among the related data of this original data, and the process proceeds to step S108. Step S107 may, for example, be performed by the neighbour related data selection unit 106, and its details are not repeated here.

In step S108, based on the neighbour related data of each original data, each original data and the neighbour related data of this original data are taken as nodes, an edge is formed between this original data and each of its neighbour related data, and a weight greater than or equal to zero is set for each edge, thereby forming the neighbour-related graph, and the process proceeds to step S109. Step S108 may, for example, be performed by the neighbour-related graph generating unit 107, and its details are not repeated here.

In step S109, based on the non-related data relations between the original data and the feature similarities, a plurality of neighbour non-related data of each original data are selected from among the non-related data of this original data, and the process proceeds to step S110. Step S109 may, for example, be performed by the neighbour non-related data selection unit 116, and its details are not repeated here.

In step S110, based on the neighbour non-related data of each original data, each original data and the neighbour non-related data of this original data are taken as nodes, an edge is formed between this original data and each of its neighbour non-related data, and a weight greater than or equal to zero is set for each edge, thereby forming the neighbour non-related graph, and the process proceeds to step S111. Step S110 may, for example, be performed by the neighbour non-related graph generating unit 117, and its details are not repeated here.

In step S111, based on the neighbour-related graph and the neighbour non-related graph, the target transformation matrix is solved for, feature transformation is performed on the plurality of original data according to this target transformation matrix, and the process ends. Here, the target transformation matrix represents the linear transformation that maximizes the objective function, which is negatively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges in the neighbour-related graph, and positively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges in the neighbour non-related graph. Step S111 may, for example, be performed by the feature transformation unit 108, and its details are not repeated here.
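Steps S101 through S111 can be strung together in a deliberately simplified end-to-end sketch. The neighbour selection and edge-weight rules here are simplifications stated in the docstring, not the patent's full procedure, and all names are assumptions.

```python
import numpy as np

def lpp_multilabel(X, Y, m=1, q=1, p=1, beta=0.5):
    """End-to-end sketch of steps S101-S111 under simplified assumptions
    (not the patent's full rules): label similarity uses C = I, the
    related data of a_i are its q most label-similar items, the
    non-related data its p least label-similar items, and every edge
    weight is simply the feature similarity of its endpoints.

    X : (d, n) original feature vectors as columns.
    Y : (n, k) label vectors as rows.
    Returns the transformed features A^T X of shape (m, n)."""
    n = X.shape[1]
    S_l = Y @ Y.T                                     # label similarity with C = I
    sq = np.sum((X.T[:, None, :] - X.T[None, :, :]) ** 2, axis=-1)
    sigma = sq[~np.eye(n, dtype=bool)].mean()
    S_v = np.exp(-sq / sigma ** 2)                    # feature similarity, formula (5)
    W_r = np.zeros((n, n))
    W_ir = np.zeros((n, n))
    L = S_l.astype(float).copy()
    np.fill_diagonal(L, np.nan)                       # NaN sorts last, excluding self
    for i in range(n):
        order = np.argsort(-L[i])
        for j in order[:q]:                           # most label-similar: related edge
            W_r[i, j] = W_r[j, i] = S_v[i, j]
        for j in order[n - 1 - p:n - 1]:              # least label-similar: non-related edge
            W_ir[i, j] = W_ir[j, i] = S_v[i, j]
    D_r = np.diag(W_r.sum(axis=0))
    D_ir = np.diag(W_ir.sum(axis=0))
    M = X @ (beta * (D_ir - W_ir) + (1.0 - beta) * W_r) @ X.T
    B = X @ D_r @ X.T + 1e-9 * np.eye(X.shape[0])     # assumed ridge for invertibility
    vals, vecs = np.linalg.eig(np.linalg.solve(B, M)) # formula (13)
    A = vecs[:, np.argsort(-vals.real)[:m]].real
    return A.T @ X                                    # transformed features y_i = A^T x_i
```

On two well-separated clusters with matching labels, the learned one-dimensional projection keeps same-label points closer together than different-label points, which is the behaviour the objective of step S111 is designed to produce.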
Thus, by means of the information process 120, a linear transformation has been obtained that pulls the weighted distances between the neighbour related data in the neighbour-related graph as close as possible after feature transformation, while pushing the weighted distances between the neighbour non-related data in the neighbour non-related graph as far apart as possible after feature transformation, and the original data after this linear transformation can then be obtained.

Further, classification may also be carried out based on the original data after this linear transformation. In particular, original data such as images or text can be processed so as to be classified according to the multi-dimensional labels they have and the original features of the images or text themselves.

Moreover, compared with traditional locality preserving projection feature transformation methods, classifying based on the original data after this linear transformation can, while retaining the local information of the data, also make use of the label information that the data have.

Furthermore, compared with most existing feature transformation methods, classifying based on the original data after this linear transformation is applicable to data with multi-dimensional labels.

In addition, since classification based on the original data after this linear transformation can both retain the local neighbour information of the data and make use of the label information that the data have, it is better suited to nearest-neighbour classification algorithms such as the k-nearest neighbour algorithm.
(Second embodiment)
First, the information processing apparatus 200 according to the second embodiment of the present disclosure will be described with reference to Fig. 3, as another example of the information processing apparatus provided according to the first aspect of the disclosure.

The information processing apparatus 200 includes an original feature vector generating unit 201, a label vector generating unit 202, a label similarity determination unit 203, a related data determination unit 204, a feature similarity determination unit 205, a neighbour related data selection unit 206, a neighbour-related graph generating unit 207 and a feature transformation unit 208. These component units are functionally similar to the original feature vector generating unit 101, label vector generating unit 102, label similarity determination unit 103, related data determination unit 104, feature similarity determination unit 105, neighbour related data selection unit 106, neighbour-related graph generating unit 107 and feature transformation unit 108 of the information processing apparatus 100, respectively. Therefore, for each component unit of the information processing apparatus 200, the functions and operations similar to those of the component units of the information processing apparatus 100 will not be described in detail below.
Original feature vector signal generating unit 201 is according to the initial data with many dimension labels received, for each former Beginning data generate the original feature vector of the primitive character representing this initial data, and provide it to characteristic similarity and determine Unit 205.
Label vector signal generating unit 202 is according to the initial data with many dimension labels received, for each original number According to generating the label vector representing many dimension labels that this initial data is had, and provide it to label similarity and determine list Unit 203.
Label similarity determines that unit 203 is vectorial based on the label about each data received, for each original Data, calculate this initial data and other initial datas each label similarity in label vector space, and are provided Unit 204 figure signal generating unit 207 relevant with neighbour is determined to related data
Related data determine unit 204 for each initial data and other initial datas each, based on received Label similarity, determines that whether these other initial datas are the related datas of this initial data, and will determine that result is supplied to closely Adjacent related data selects unit 206.
The feature similarity determining unit 205 calculates, based on the received original feature vectors, the feature similarity in the original feature vector space between each piece of original data and each other piece of original data, and provides the result to the neighbour related data selecting unit 206 and the neighbour related graph generating unit 207.
The neighbour related data selecting unit 206 selects, for each piece of original data, a plurality of pieces of neighbour related data from among the related data of the piece of original data, based on the received related-data relations between the pieces of original data and the received feature similarities, and provides the selection to the neighbour related graph generating unit 207.
The neighbour related graph generating unit 207 takes, based on the received neighbour related data of each piece of original data, each piece of original data and its neighbour related data as nodes, forms an edge between the node of each piece of original data and the node of each piece of its neighbour related data, and assigns each edge a weight greater than or equal to zero, thereby forming the neighbour related graph, which it provides to the feature transformation unit 208.
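The selection of neighbour related data and the construction of the neighbour related graph might be sketched as below. The random features, the "every other sample is related" mask, the choice k = 2, and the constant edge weight 1 are assumptions for the example; the patent only requires non-negative weights, and its technical solutions also allow weights tied to feature or label similarity.

```python
import numpy as np

# 6 samples with 5 original features each; the features, the "all pairs are
# related" mask, k, and the constant weight 1 are assumptions for the example.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))
related = ~np.eye(6, dtype=bool)         # stand-in for the related-data decision
k = 2                                    # number of neighbour related data per sample

# Feature similarity taken as negative Euclidean distance in feature space.
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
feat_sim = -dist

W_r = np.zeros((6, 6))
for i in range(6):
    cand = np.where(related[i])[0]                   # related data of sample i
    nn = cand[np.argsort(feat_sim[i, cand])[-k:]]    # k most feature-similar
    W_r[i, nn] = 1.0                                 # edge weight >= 0
W_r = np.maximum(W_r, W_r.T)             # an edge joins both of its nodes

print(W_r.sum())                         # total weight of the neighbour related graph
```

The symmetrisation reflects that an edge belongs to both of its endpoints; each node therefore ends up with at least k incident edges.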
The feature transformation unit 208 solves, based on the received neighbour related graph, an objective transformation matrix, performs feature transformation on the plurality of pieces of original data according to the objective transformation matrix, and outputs the feature-transformed original data. Here, the objective transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all the edges in the neighbour related graph.
The purpose of the feature transformation performed by the feature transformation unit 208 is to draw the weighted distances between the pieces of neighbour related data in the neighbour related graph as close together as possible after the transformation, that is, to realize the objective function in formula (14):
min Σ_{i,j} (a^T x_i - a^T x_j)^2 W_{r,ij}    (14)
where a^T is the linear transformation (that is, the feature transformation) applied to the original data.
As is commonly done in locality preserving projection feature transformation methods, by using the Laplacian of the neighbour related graph, realizing the objective function in formula (14) can be converted into the optimization problem shown in formula (15):
arg min_a  a^T X L_r X^T a    (15)
s.t.  a^T X D_r X^T a = 1
where D_r = diag(sum(W_r)), and the Laplacian of the neighbour related graph is L_r = D_r - W_r.
Solving formula (15) is equivalent to solving the generalized eigenvalue problem shown in formula (16):
X L_r X^T a = λ X D_r X^T a    (16)
If a_1, a_2, ..., a_m are the eigenvectors corresponding, in order, to the eigenvalues 0 < λ_1 < λ_2 < ... < λ_m of formula (16), then the feature transformation matrix A = (a_1, a_2, ..., a_m) is obtained, where y_i = A^T x_i is the transformed feature.
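Formulas (15) and (16) can be exercised numerically on toy data. The sketch below builds the Laplacian L_r = D_r - W_r of a small assumed neighbour related graph and solves the generalized eigenvalue problem with SciPy's symmetric solver, keeping the eigenvectors of the smallest eigenvalues as the columns of A. The tiny ridge term is only an implementation convenience to keep X D_r X^T numerically positive definite and is not part of the patent's formulation.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 8))              # columns are the samples x_i (4 features, 8 samples)

# Assumed toy neighbour related graph: a ring over the 8 samples.
W_r = np.zeros((8, 8))
for i in range(8):
    W_r[i, (i + 1) % 8] = W_r[(i + 1) % 8, i] = 1.0

D_r = np.diag(W_r.sum(axis=1))           # D_r = diag(sum(W_r))
L_r = D_r - W_r                          # Laplacian of the neighbour related graph

lhs = X @ L_r @ X.T                      # X L_r X^T from formula (16)
rhs = X @ D_r @ X.T + 1e-9 * np.eye(4)   # X D_r X^T, plus a tiny stabilising ridge

vals, vecs = eigh(lhs, rhs)              # generalized symmetric eigenproblem, ascending

m = 2
A = vecs[:, :m]                          # eigenvectors of the m smallest eigenvalues
Y = A.T @ X                              # y_i = A^T x_i, the transformed features

print(Y.shape)                           # (2, 8)
```

`scipy.linalg.eigh` returns eigenvalues in ascending order and eigenvectors normalised so that a^T (X D_r X^T) a = 1, which matches the constraint in formula (15).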
It should be noted that, here, the objective function for which a minimum is sought needs to be positively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all the edges in the neighbour related graph. In other words, if the objective function is instead required to attain a maximum, then that objective function is negatively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all the edges in the neighbour related graph.
Thus, the information processing apparatus 200 obtains a linear transformation that draws the weighted distances between the pieces of neighbour related data in the neighbour related graph as close together as possible after the feature transformation, and can then obtain the original data after the linear transformation. The information processing apparatus 200 may further classify the original data based on the result of the linear transformation. In particular, the information processing apparatus 200 may process original data such as images or text, so as to classify them according to both the multi-dimensional labels they have and the original features of the images or text themselves.
Moreover, compared with conventional locality preserving projection feature transformation methods, classification based on the original data after this linear transformation can make use of the label information of the data while still preserving the local information of the data.
Furthermore, compared with most existing feature transformation methods, classification based on the original data after this linear transformation is applicable to data having multi-dimensional labels.
In addition, since classification based on the original data after this linear transformation can both preserve the local neighbour information of the data and make use of the label information of the data, it is particularly suitable for nearest-neighbour classification algorithms such as the k-nearest neighbour algorithm.
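As an illustration of the last point, a plain k-nearest-neighbour vote can be run directly in the transformed space y = A^T x. Everything below, including the two well-separated Gaussian groups and the identity-like stand-in for the learned transformation A, is invented for the example; it only shows where a learned A would plug into a nearest-neighbour classifier.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two tight, well-separated groups of 5-dimensional data (invented).
X = np.vstack([rng.normal(0.0, 0.3, size=(10, 5)),
               rng.normal(3.0, 0.3, size=(10, 5))])
y = np.array([0] * 10 + [1] * 10)
A = np.eye(5)[:, :3]                     # stand-in for a learned transformation matrix

def knn_predict(query, X, y, A, k=3):
    """Vote among the k nearest training samples in the transformed space."""
    Z = X @ A                            # transformed training data, y_i = A^T x_i
    q = query @ A
    d = np.linalg.norm(Z - q, axis=1)
    votes = y[np.argsort(d)[:k]]
    return int(np.bincount(votes).argmax())

print(knn_predict(np.full(5, 3.0), X, y, A))   # query near the second group -> 1
```

A real use would substitute the transformation matrix obtained from formula (16) for the placeholder A.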
Hereinafter, information processing 220 performed on a plurality of pieces of original data having multi-dimensional labels according to the second embodiment of the present disclosure will be described with reference to Fig. 4, as another example of the information processing method provided according to the first aspect of the present disclosure. The information processing 220 may be performed, for example, by the information processing apparatus 200.
After the information processing 220 starts, the process first enters step S201. In step S201, from the original data having multi-dimensional labels, an original feature vector representing the original features of each piece of original data is generated, and the process proceeds to step S202. Step S201 may be performed, for example, by the original feature vector generating unit 201.
In step S202, from the original data having multi-dimensional labels, a label vector representing the multi-dimensional labels of each piece of original data is generated, and the process proceeds to step S203. Step S202 may be performed, for example, by the label vector generating unit 202.
In step S203, based on the label vector of each piece of data, the label similarity in the label vector space between each piece of original data and each other piece of original data is calculated, and the process proceeds to step S204. Step S203 may be performed, for example, by the label similarity determining unit 203.
In step S204, for each piece of original data and each other piece of original data, whether the other piece of original data is related data of the piece of original data is determined based on the label similarity, and the process proceeds to step S205. Step S204 may be performed, for example, by the related data determining unit 204.
In step S205, based on the original feature vectors of the original data, the feature similarity in the original feature vector space between each piece of original data and each other piece of original data is calculated, and the process proceeds to step S206. Step S205 may be performed, for example, by the feature similarity determining unit 205.
In step S206, based on the related-data relations between the pieces of original data and the feature similarities, a plurality of pieces of neighbour related data of each piece of original data are selected from among the related data of that piece of original data, and the process proceeds to step S207. Step S206 may be performed, for example, by the neighbour related data selecting unit 206.
In step S207, based on the neighbour related data of each piece of original data, each piece of original data and its neighbour related data are taken as nodes, an edge is formed between the node of each piece of original data and the node of each piece of its neighbour related data, and each edge is assigned a weight greater than or equal to zero, thereby forming the neighbour related graph; the process then proceeds to step S208. Step S207 may be performed, for example, by the neighbour related graph generating unit 207.
In step S208, based on the neighbour related graph, an objective transformation matrix is solved, feature transformation is performed on the plurality of pieces of original data according to the objective transformation matrix, and the process ends. Here, the objective transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all the edges in the neighbour related graph. Step S208 may be performed, for example, by the feature transformation unit 208.
Thus, the information processing 220 obtains a linear transformation that draws the weighted distances between the pieces of neighbour related data in the neighbour related graph as close together as possible after the feature transformation, and the original data after the linear transformation can then be obtained.
Further, classification may be performed based on the original data after the linear transformation. In particular, original data such as images or text may be processed so as to be classified according to both the multi-dimensional labels they have and the original features of the images or text themselves.
Moreover, compared with conventional locality preserving projection feature transformation methods, classification based on the original data after this linear transformation can make use of the label information of the data while still preserving the local information of the data.
Furthermore, compared with most existing feature transformation methods, classification based on the original data after this linear transformation is applicable to data having multi-dimensional labels.
In addition, since classification based on the original data after this linear transformation can both preserve the local neighbour information of the data and make use of the label information of the data, it is particularly suitable for nearest-neighbour classification algorithms such as the k-nearest neighbour (KNN) algorithm.
(Third Embodiment)
First, the information processing apparatus 300 according to the third embodiment of the present disclosure will be described with reference to Fig. 5, as another example of the information processing apparatus provided according to the second aspect of the present disclosure.
The information processing apparatus 300 includes an original feature vector generating unit 301, a label vector generating unit 302, a label similarity determining unit 303, an irrelevant data determining unit 314, a feature similarity determining unit 305, a neighbour irrelevant data selecting unit 316, a neighbour irrelevant graph generating unit 317 and a feature transformation unit 308. These constituent units are functionally similar, respectively, to the original feature vector generating unit 101, label vector generating unit 102, label similarity determining unit 103, irrelevant data determining unit 114, feature similarity determining unit 105, neighbour irrelevant data selecting unit 116, neighbour irrelevant graph generating unit 117 and feature transformation unit 108 of the information processing apparatus 100; therefore, for the constituent units of the information processing apparatus 300, functions and operations similar to those of the constituent units of the information processing apparatus 100 will not be described in detail below.
The original feature vector generating unit 301 generates, from the received original data having multi-dimensional labels, an original feature vector representing the original features of each piece of original data, and provides it to the feature similarity determining unit 305.
The label vector generating unit 302 generates, from the received original data having multi-dimensional labels, a label vector representing the multi-dimensional labels of each piece of original data, and provides it to the label similarity determining unit 303.
The label similarity determining unit 303 calculates, based on the received label vectors, the label similarity in the label vector space between each piece of original data and each other piece of original data, and provides the result to the irrelevant data determining unit 314 and the neighbour irrelevant graph generating unit 317.
The irrelevant data determining unit 314 determines, for each piece of original data and each other piece of original data, whether the other piece of original data is irrelevant data of the piece of original data, based on the received label similarity, and provides the determination result to the neighbour irrelevant data selecting unit 316.
The feature similarity determining unit 305 calculates, based on the received original feature vectors, the feature similarity in the original feature vector space between each piece of original data and each other piece of original data, and provides the result to the neighbour irrelevant data selecting unit 316 and the neighbour irrelevant graph generating unit 317.
The neighbour irrelevant data selecting unit 316 selects, for each piece of original data, a plurality of pieces of neighbour irrelevant data from among the irrelevant data of the piece of original data, based on the received irrelevant-data relations between the pieces of original data and the received feature similarities, and provides the selection to the neighbour irrelevant graph generating unit 317.
The neighbour irrelevant graph generating unit 317 takes, based on the received neighbour irrelevant data of each piece of original data, each piece of original data and its neighbour irrelevant data as nodes, forms an edge between the node of each piece of original data and the node of each piece of its neighbour irrelevant data, and assigns each edge a weight greater than or equal to zero, thereby forming the neighbour irrelevant graph, which it provides to the feature transformation unit 308.
The feature transformation unit 308 solves, based on the received neighbour irrelevant graph, an objective transformation matrix, performs feature transformation on the plurality of pieces of original data according to the objective transformation matrix, and outputs the feature-transformed original data. Here, the objective transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is positively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all the edges in the neighbour irrelevant graph.
The purpose of the feature transformation performed by the feature transformation unit 308 is to push the weighted distances between the pieces of neighbour irrelevant data in the neighbour irrelevant graph as far apart as possible after the transformation, that is, to realize the objective function in formula (17):
max Σ_{i,j} (a^T x_i - a^T x_j)^2 W_{ir,ij}    (17)
where a^T is the linear transformation (that is, the feature transformation) applied to the original data.
As is commonly done in locality preserving projection feature transformation methods, by using the Laplacian of the neighbour irrelevant graph, realizing the objective function in formula (17) can be converted into the optimization problem shown in formula (18):
arg max_a  a^T X L_ir X^T a    (18)
s.t.  a^T X D_ir X^T a = 1
where D_ir = diag(sum(W_ir)), and the Laplacian of the neighbour irrelevant graph is L_ir = D_ir - W_ir.
Solving formula (18) is equivalent to solving the generalized eigenvalue problem shown in formula (19):
X W_ir X^T a = λ X D_ir X^T a    (19)
If a_1, a_2, ..., a_m are the eigenvectors corresponding, in order, to the eigenvalues 0 < λ_1 < λ_2 < ... < λ_m of formula (19), then the feature transformation matrix A = (a_1, a_2, ..., a_m) is obtained, where y_i = A^T x_i is the transformed feature.
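The neighbour irrelevant counterpart can be exercised in the same way. Under the constraint a^T X D_ir X^T a = 1, maximizing a^T X L_ir X^T a with L_ir = D_ir - W_ir amounts to minimizing a^T X W_ir X^T a, which is why formula (19) is solved for its smallest eigenvalues. The toy graph and the ridge term below are assumptions made for the example.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 8))              # columns are the samples x_i

# Assumed toy neighbour irrelevant graph: four disjoint irrelevant pairs.
W_ir = np.zeros((8, 8))
for i in range(0, 8, 2):
    W_ir[i, i + 1] = W_ir[i + 1, i] = 1.0

D_ir = np.diag(W_ir.sum(axis=1))         # D_ir = diag(sum(W_ir))
L_ir = D_ir - W_ir                       # Laplacian of the neighbour irrelevant graph

lhs = X @ W_ir @ X.T                     # left-hand side of formula (19)
rhs = X @ D_ir @ X.T + 1e-9 * np.eye(4)  # right-hand side, plus a tiny stabilising ridge

vals, vecs = eigh(lhs, rhs)              # ascending eigenvalues of formula (19)
A = vecs[:, :2]                          # eigenvectors of the smallest eigenvalues

# Sanity check: for each eigenpair, a^T X L_ir X^T a = 1 - lambda under the
# constraint, so the kept directions score highest on the max objective (18).
obj = np.diag(A.T @ X @ L_ir @ X.T @ A)
print(obj[0] >= obj[1])                  # True
```

The check makes the min/max duality concrete: the smaller the eigenvalue of (19), the larger the edge-stretching objective (18) for that direction.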
It should be noted that the objective function here is positively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all the edges in the neighbour irrelevant graph.
Thus, the information processing apparatus 300 obtains a linear transformation that pushes the weighted distances between the pieces of neighbour irrelevant data in the neighbour irrelevant graph as far apart as possible after the feature transformation, and can then obtain the original data after the linear transformation. The information processing apparatus 300 may further classify the original data based on the result of the linear transformation. In particular, the information processing apparatus 300 may process original data such as images or text, so as to classify them according to both the multi-dimensional labels they have and the original features of the images or text themselves.
Moreover, compared with conventional locality preserving projection feature transformation methods, classification based on the original data after this linear transformation can make use of the label information of the data while still preserving the local information of the data.
Furthermore, compared with most existing feature transformation methods, classification based on the original data after this linear transformation is applicable to data having multi-dimensional labels.
In addition, since classification based on the original data after this linear transformation can both preserve the local neighbour information of the data and make use of the label information of the data, it is particularly suitable for nearest-neighbour classification algorithms such as the k-nearest neighbour (KNN) algorithm.
Hereinafter, information processing 320 performed on a plurality of pieces of original data having multi-dimensional labels according to the third embodiment of the present disclosure will be described with reference to Fig. 6, as an example of the information processing method provided according to the second aspect of the present disclosure. The information processing 320 may be performed, for example, by the information processing apparatus 300.
After the information processing 320 starts, the process first enters step S301. In step S301, from the original data having multi-dimensional labels, an original feature vector representing the original features of each piece of original data is generated, and the process proceeds to step S302. Step S301 may be performed, for example, by the original feature vector generating unit 301.
In step S302, from the original data having multi-dimensional labels, a label vector representing the multi-dimensional labels of each piece of original data is generated, and the process proceeds to step S303. Step S302 may be performed, for example, by the label vector generating unit 302.
In step S303, based on the label vector of each piece of data, the label similarity in the label vector space between each piece of original data and each other piece of original data is calculated, and the process proceeds to step S304. Step S303 may be performed, for example, by the label similarity determining unit 303.
In step S304, for each piece of original data and each other piece of original data, whether the other piece of original data is irrelevant data of the piece of original data is determined based on the label similarity, and the process proceeds to step S305. Step S304 may be performed, for example, by the irrelevant data determining unit 314.
In step S305, based on the original feature vectors of the original data, the feature similarity in the original feature vector space between each piece of original data and each other piece of original data is calculated, and the process proceeds to step S306. Step S305 may be performed, for example, by the feature similarity determining unit 305.
In step S306, based on the irrelevant-data relations between the pieces of original data and the feature similarities, a plurality of pieces of neighbour irrelevant data of each piece of original data are selected from among the irrelevant data of that piece of original data, and the process proceeds to step S307. Step S306 may be performed, for example, by the neighbour irrelevant data selecting unit 316.
In step S307, based on the neighbour irrelevant data of each piece of original data, each piece of original data and its neighbour irrelevant data are taken as nodes, an edge is formed between the node of each piece of original data and the node of each piece of its neighbour irrelevant data, and each edge is assigned a weight greater than or equal to zero, thereby forming the neighbour irrelevant graph; the process then proceeds to step S308. Step S307 may be performed, for example, by the neighbour irrelevant graph generating unit 317.
In step S308, based on the neighbour irrelevant graph, an objective transformation matrix is solved, feature transformation is performed on the plurality of pieces of original data according to the objective transformation matrix, and the process ends. Here, the objective transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is positively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all the edges in the neighbour irrelevant graph. Step S308 may be performed, for example, by the feature transformation unit 308.
Thus, the information processing 320 obtains a linear transformation that pushes the weighted distances between the pieces of neighbour irrelevant data in the neighbour irrelevant graph as far apart as possible after the feature transformation, and the original data after the linear transformation can then be obtained.
Further, classification may be performed based on the original data after the linear transformation. In particular, original data such as images or text may be processed so as to be classified according to both the multi-dimensional labels they have and the original features of the images or text themselves.
Moreover, compared with conventional locality preserving projection feature transformation methods, classification based on the original data after this linear transformation can make use of the label information of the data while still preserving the local information of the data.
Furthermore, compared with most existing feature transformation methods, classification based on the original data after this linear transformation is applicable to data having multi-dimensional labels.
In addition, since classification based on the original data after this linear transformation can both preserve the local neighbour information of the data and make use of the label information of the data, it is particularly suitable for nearest-neighbour classification algorithms such as the k-nearest neighbour algorithm.
(Hardware Configuration Embodiment)
Each constituent unit, subunit and the like in the information processing apparatuses according to the above embodiments of the present disclosure may be configured by means of software, firmware, hardware, or any combination thereof. In the case of realization by software or firmware, a program constituting the software or firmware may be installed, from a storage medium or a network, onto a machine having a dedicated hardware structure (for example, the general-purpose machine 700 shown in Fig. 7), and the machine, when installed with the various programs, is capable of performing the various functions of the above constituent units and subunits.
Fig. 7 is a structural diagram schematically showing one possible hardware configuration of an information processing device that may be used to realize the information processing method and the information processing apparatus according to the embodiments of the present disclosure.
In Fig. 7, a central processing unit (CPU) 701 performs various processes according to programs stored in a read-only memory (ROM) 702 or programs loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores, as needed, the data required when the CPU 701 performs the various processes. The CPU 701, the ROM 702 and the RAM 703 are connected to one another via a bus 704. An input/output interface 705 is also connected to the bus 704.
The following components are also connected to the input/output interface 705: an input section 706 (including a keyboard, a mouse and the like), an output section 707 (including a display, such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like), the storage section 708 (including a hard disk and the like), and a communication section 709 (including a network interface card such as a local area network (LAN) card, a modem and the like). The communication section 709 performs communication processes via a network such as the Internet. A driver 710 may also be connected to the input/output interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disc, a magneto-optical disc or a semiconductor memory, may be mounted on the driver 710 as needed, so that a computer program read therefrom can be installed into the storage section 708 as needed.
In the case where the above series of processes is realized by software, the program constituting the software may be installed from a network such as the Internet, or from a storage medium such as the removable medium 711.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 711 shown in Fig. 7, in which the program is stored and which is distributed separately from the device so as to provide the program to the user. Examples of the removable medium 711 include a magnetic disk (including a floppy disk), an optical disc (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disc (including a MiniDisc (MD) (registered trademark)) and a semiconductor memory. Alternatively, the storage medium may be the ROM 702, a hard disk included in the storage section 708, or the like, in which a program is stored and which is distributed to the user together with the device containing it.
In addition, the present disclosure further proposes a program product storing machine-readable instruction codes. When the instruction codes are read and executed by a machine, the above information processing method according to the embodiments of the present disclosure can be performed. Accordingly, the various storage media, such as magnetic disks, optical discs, magneto-optical discs and semiconductor memories, for carrying such a program product are also included in the technical solutions of the present disclosure.
It should further be understood that each operational process of the information processing methods according to the embodiments of the present disclosure may also be realized in the form of computer-executable programs stored in various machine-readable storage media.
It should be noted that each constituent unit of the information processing apparatuses according to the embodiments of the present disclosure may be an individual component, or the functions of several constituent units may be realized by a single component.
Furthermore, it should be noted that the steps of the information processing methods according to the present disclosure need not be performed in the order described in the present disclosure, but may be performed in parallel or as invoked. For example, in the information processing 120, step S102 need not necessarily be performed after step S101, step S103 need not necessarily be performed after step S102, step S106 need not necessarily be performed after any of steps S101 and S103 to S105, steps S107 and S108 need not necessarily be performed after step S106, and steps S109 and S110 need not necessarily be performed after step S107 or S108. The same applies to the information processing 220 and 320.
Although the preferred embodiments of the present disclosure have been shown and described, those skilled in the art may devise various modifications to the present invention within the spirit and scope of the appended claims.
As can be understood from the above description, the embodiments of the present disclosure disclose, but are not limited to, the following technical solutions:
Technical solution 1. An information processing apparatus for performing feature transformation on a plurality of pieces of original data having multi-dimensional labels, the information processing apparatus including:
an original feature vector generating unit configured to generate, for each piece of original data, an original feature vector representing the original features of the piece of original data;
a label vector generating unit configured to generate, for each piece of original data, a label vector representing the multi-dimensional labels of the piece of original data;
a label similarity determining unit configured to calculate, for each piece of original data, the label similarity in the label vector space between the piece of original data and each other piece of original data;
a related data determining unit configured to determine, for each piece of original data, whether each other piece of original data is related data of the piece of original data, based on the label similarity between the other piece of original data and the piece of original data;
a feature similarity determining unit configured to calculate, for each piece of original data, the feature similarity in the original feature vector space between the piece of original data and each other piece of original data;
a neighbour related data selecting unit configured to select, for each piece of original data, a plurality of pieces of neighbour related data of the piece of original data from among the related data of the piece of original data, based on the feature similarity between each piece of related data of the piece of original data and the piece of original data;
a neighbour related graph generating unit configured to take each piece of original data and its neighbour related data as nodes, form an edge between the nodes corresponding to each piece of original data and each piece of its neighbour related data, and assign each edge a weight greater than or equal to zero, thereby forming the neighbour related graph; and
a feature transformation unit configured to solve an objective transformation matrix and perform feature transformation on the plurality of pieces of original data according to the objective transformation matrix, wherein the objective transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all the edges in the neighbour related graph.
Technical solution 2. The information processing apparatus according to technical solution 1, wherein
the label similarity determining unit is further configured to calculate the label similarity according to the distance in the label vector space between each piece of original data and each other piece of original data and a label correlation matrix.
Technical solution 3. The information processing apparatus according to technical solution 2, wherein
the distance in the label vector space between each piece of original data and each other piece of original data is a cosine distance or a Euclidean distance.
Technical solution 4. The information processing apparatus according to any one of technical solutions 1 to 3, wherein
the related data determining unit is further configured to determine, for each piece of original data and each other piece of original data, that the other piece of original data is related data of the piece of original data when the label similarity between the piece of original data and the other piece of original data is greater than or equal to a first label threshold.
Technical solution 5. The information processing apparatus according to any one of technical solutions 1 to 3, wherein
the feature similarity determining unit is further configured to calculate the feature similarity according to the distance in the original feature vector space between each piece of original data and each other piece of original data.
Technical solution 6. The information processing apparatus according to technical solution 5, wherein
the distance in the feature vector space between each piece of original data and each other piece of original data is a Euclidean distance, a Manhattan distance or a chi-square distance.
Technical solution 7. The information processing apparatus according to any one of technical solutions 1 to 6, wherein
the neighbour related graph generating unit is further configured to, for each edge, set the weight of the edge to be positively correlated with at least one of the feature similarity and the label similarity between the two pieces of original data corresponding to the two nodes connected by the edge.
Technical scheme 8. The information processor according to any one of technical schemes 1 to 7, wherein
the neighbour related data selection unit is further configured to, for each piece of initial data, select, from among all related data of the initial data, a first predetermined number of pieces of related data having the greatest feature similarity with the initial data as the neighbour related data of the initial data.
Technical scheme 9. The information processor according to technical scheme 8, wherein
the neighbour relevance graph generating unit is further configured to, for each edge, set the weight of the edge to 1 if the first predetermined number of other pieces of initial data having the greatest feature similarity with the initial data corresponding to one node connected by the edge include the initial data corresponding to the other node connected by the edge, and otherwise set the weight of the edge to a value less than or equal to 1 that is positively correlated with at least one of the feature similarity and the label similarity between the two pieces of initial data corresponding to the two nodes connected by the edge.
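One reasonable reading of technical schemes 8 and 9 can be sketched as follows: an edge receives weight 1 when each endpoint is among the other's k nearest related data, and otherwise the capped feature similarity. The symmetric-weight convention and the function name are assumptions of this sketch.

```python
import numpy as np

def neighbour_relevance_graph(feat_sim, related, k):
    """Build the weighted adjacency matrix of the neighbour relevance graph.
    feat_sim: (n, n) symmetric feature similarities in [0, 1];
    related:  (n, n) boolean mask of related pairs (from the label threshold);
    k:        the first predetermined number of neighbours."""
    n = feat_sim.shape[0]
    # k nearest related data of each piece of initial data
    knn = []
    for i in range(n):
        cand = [j for j in range(n) if j != i and related[i, j]]
        knn.append(set(sorted(cand, key=lambda j: feat_sim[i, j],
                              reverse=True)[:k]))
    W = np.zeros((n, n))
    for i in range(n):
        for j in knn[i]:
            # weight 1 for mutual neighbours, otherwise capped similarity
            w = 1.0 if i in knn[j] else min(feat_sim[i, j], 1.0)
            W[i, j] = W[j, i] = max(W[i, j], w)
    return W
```

With three mutually related points whose pairwise similarities are 0.9, 0.8, and 0.2 and k = 1, this yields weight 1 on the mutually-nearest pair, 0.8 on the one-sided pair, and no edge elsewhere.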
Technical scheme 10. The information processor according to any one of technical schemes 1 to 9, further comprising:
a non-related data determination unit configured to, for each piece of initial data, determine, based on the label similarity between the initial data and each other piece of initial data, whether each other piece of initial data is non-related data of the initial data;
a neighbour non-related data selection unit configured to, for each piece of initial data, select, based on the feature similarity between each piece of non-related data of the initial data and the initial data, a plurality of pieces of neighbour non-related data of the initial data from among the non-related data of the initial data; and
a neighbour irrelevance graph generating unit configured to take each piece of initial data and the neighbour non-related data of the initial data as nodes, form an edge between the node corresponding to the initial data and the node corresponding to each piece of neighbour non-related data of the initial data, and set a weight greater than or equal to zero for each edge, thereby forming a neighbour irrelevance graph; and wherein
the objective function is negatively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour relevance graph, and positively correlated with the sum of the weighted lengths, in the same feature space, of all edges in the neighbour irrelevance graph.
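One way to realize technical scheme 10's two-term objective is to pull neighbour related data together while pushing neighbour non-related data apart via a difference of graph Laplacians. The trade-off parameter beta and the orthonormality constraint are assumptions of this sketch; the scheme does not fix how the two terms are combined.

```python
import numpy as np

def combined_transform(X, W_rel, W_irr, dim, beta=1.0):
    """Sketch of a two-graph objective. X: (d, n) original feature vectors
    as columns; W_rel / W_irr: (n, n) weights of the neighbour relevance /
    irrelevance graphs; dim: target dimensionality. Maximizing
    tr(A^T X (L_irr - beta * L_rel) X^T A) subject to A^T A = I is solved
    by the top eigenvectors of the symmetric matrix M below."""
    def laplacian(W):
        return np.diag(W.sum(axis=1)) - W

    M = X @ (laplacian(W_irr) - beta * laplacian(W_rel)) @ X.T
    vals, vecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    A = vecs[:, ::-1][:, :dim]          # objective transformation matrix
    return A, A.T @ X                   # transformed features
```

Because the Laplacian identity sum_ij W_ij ||y_i - y_j||^2 = 2 tr(Y L Y^T) holds for both graphs, the maximized quantity is exactly "irrelevance edge lengths minus beta times relevance edge lengths" in the transformed space.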
Technical scheme 11. The information processor according to technical scheme 10, wherein
the related data determination unit is further configured to, for each piece of initial data and each other piece of initial data, determine that the other initial data is related data of the initial data when the label similarity between the initial data and the other initial data is greater than or equal to a first label threshold;
the non-related data determination unit is further configured to, for each piece of initial data and each other piece of initial data, determine that the other initial data is non-related data of the initial data when the label similarity between the initial data and the other initial data is less than a second label threshold; and
the first label threshold is greater than or equal to the second label threshold.
Technical scheme 12. The information processor according to technical scheme 10 or 11, wherein
the neighbour non-related data selection unit is further configured to, for each piece of initial data, select, from among the non-related data of the initial data, a second predetermined number of pieces of non-related data having the greatest feature similarity with the initial data as the neighbour non-related data of the initial data.
Technical scheme 13. The information processor according to any one of technical schemes 10 to 12, wherein
the neighbour irrelevance graph generating unit is further configured to, for each edge, set the weight of the edge to be positively correlated with at least one of the feature similarity and the label similarity between the two pieces of initial data corresponding to the two nodes connected by the edge.
Technical scheme 14. The information processor according to any one of technical schemes 10 to 13, wherein
the neighbour irrelevance graph generating unit is further configured to, for each edge, set the weight of the edge to be positively correlated with the feature similarity between the two pieces of initial data corresponding to the two nodes connected by the edge if that feature similarity is greater than the maximum of the feature similarities between one of those two pieces of initial data and all related data of that initial data, and otherwise set the weight of the edge to 0.
Technical scheme 15. The information processor according to any one of technical schemes 1 to 14, which classifies the initial data by performing the eigentransformation.
Technical scheme 16. The information processor according to technical scheme 15, wherein the plurality of pieces of initial data are a plurality of pieces of image data or a plurality of pieces of text data.
Technical scheme 17. An information processor for performing eigentransformation on a plurality of pieces of initial data having multi-dimensional labels, the information processor comprising:
an original feature vector generating unit configured to, for each piece of initial data, generate an original feature vector representing the original features of the initial data;
a label vector generating unit configured to, for each piece of initial data, generate a label vector representing the multi-dimensional labels of the initial data;
a label similarity determination unit configured to, for each piece of initial data, calculate the label similarity between the initial data and each other piece of initial data in the label vector space, and determine, based on the label similarity, whether the other initial data is non-related data of the initial data;
a feature similarity determination unit configured to, for each piece of initial data, calculate the feature similarity between the initial data and each other piece of initial data in the original feature vector space;
a neighbour non-related data selection unit configured to, for each piece of initial data, select, from among all non-related data of the initial data, a plurality of pieces of neighbour non-related data of the initial data based on the feature similarity with the initial data;
a neighbour irrelevance graph generating unit configured to take each piece of initial data and the neighbour non-related data of the initial data as nodes, form an edge between the node corresponding to the initial data and the node corresponding to each piece of neighbour non-related data of the initial data, and set a weight greater than or equal to zero for each edge, thereby forming a neighbour irrelevance graph; and
an eigentransformation unit configured to solve an objective transformation matrix and perform eigentransformation on the plurality of pieces of initial data according to the objective transformation matrix, wherein the objective transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is positively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour irrelevance graph.
Technical scheme 18. An information processing method for performing eigentransformation on a plurality of pieces of initial data having multi-dimensional labels, the information processing method comprising:
for each piece of initial data, generating an original feature vector representing the original features of the initial data;
for each piece of initial data, generating a label vector representing the multi-dimensional labels of the initial data;
for each piece of initial data, calculating the label similarity between the initial data and each other piece of initial data in the label vector space;
for each piece of initial data, determining, based on the label similarity between each other piece of initial data and the initial data, whether the other initial data is related data of the initial data;
for each piece of initial data, calculating the feature similarity between the initial data and each other piece of initial data in the original feature vector space;
for each piece of initial data, selecting, based on the feature similarity between each piece of related data of the initial data and the initial data, a plurality of pieces of neighbour related data of the initial data from among the related data of the initial data;
taking each piece of initial data and the neighbour related data of the initial data as nodes, forming an edge between the node corresponding to the initial data and the node corresponding to each piece of neighbour related data of the initial data, and setting a weight greater than or equal to zero for each edge, thereby forming a neighbour relevance graph; and
solving an objective transformation matrix and performing eigentransformation on the plurality of pieces of initial data according to the objective transformation matrix, wherein the objective transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour relevance graph.
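The abstract describes the transformation as a locality preserving projections (LPP) style eigentransformation. A minimal numerical sketch of the solving step under the standard LPP objective follows: minimizing the weighted edge lengths sum_ij W_ij ||A^T x_i - A^T x_j||^2 = 2 tr(A^T X L X^T A) with L = D - W, subject to A^T X D X^T A = I, leads to a generalized eigenproblem. The regularization constant and function name are assumptions of this sketch.

```python
import numpy as np

def lpp_transform(X, W, dim):
    """LPP-style solve of the objective transformation matrix.
    X: (d, n) original feature vectors as columns;
    W: (n, n) neighbour relevance graph weights; dim: target dimension."""
    D = np.diag(W.sum(axis=1))
    Lap = D - W
    reg = 1e-8 * np.eye(X.shape[0])          # small ridge for stability
    A_mat = X @ Lap @ X.T + reg              # numerator matrix
    B_mat = X @ D @ X.T + reg                # constraint matrix (SPD)
    # reduce the generalized problem A a = lam B a to a standard one
    # via the Cholesky factor of B: B = Lc Lc^T, a = Lc^{-T} v
    Lc = np.linalg.cholesky(B_mat)
    Li = np.linalg.inv(Lc)
    vals, V = np.linalg.eigh(Li @ A_mat @ Li.T)
    A = Li.T @ V[:, :dim]   # smallest eigenvalues: neighbours drawn closest
    return A, A.T @ X       # objective transformation matrix, new features
```

Taking the eigenvectors with the smallest eigenvalues is exactly the "negatively correlated with the weighted edge lengths" requirement: the transformed neighbour related data end up as close together as the constraint allows.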
Technical scheme 19. An information processing method for performing eigentransformation on a plurality of pieces of initial data having multi-dimensional labels, the information processing method comprising:
for each piece of initial data, generating an original feature vector representing the original features of the initial data;
for each piece of initial data, generating a label vector representing the multi-dimensional labels of the initial data;
for each piece of initial data, calculating the label similarity between the initial data and each other piece of initial data in the label vector space;
for each piece of initial data, determining, based on the label similarity between each other piece of initial data and the initial data, whether the other initial data is non-related data of the initial data;
for each piece of initial data, calculating the feature similarity between the initial data and each other piece of initial data in the original feature vector space;
for each piece of initial data, selecting, based on the feature similarity between each piece of non-related data of the initial data and the initial data, a plurality of pieces of neighbour non-related data of the initial data from among the non-related data of the initial data;
taking each piece of initial data and the neighbour non-related data of the initial data as nodes, forming an edge between the node corresponding to the initial data and the node corresponding to each piece of neighbour non-related data of the initial data, and setting a weight greater than or equal to zero for each edge, thereby forming a neighbour irrelevance graph; and
solving an objective transformation matrix and performing eigentransformation on the plurality of pieces of initial data according to the objective transformation matrix, wherein the objective transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is positively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour irrelevance graph.
Technical scheme 20. A computer program executable by a computing device, the computer program, when executed, causing the computing device to perform an information processing method for performing eigentransformation on a plurality of pieces of initial data having multi-dimensional labels, the information processing method comprising:
for each piece of initial data, generating an original feature vector representing the original features of the initial data;
for each piece of initial data, generating a label vector representing the multi-dimensional labels of the initial data;
for each piece of initial data, calculating the label similarity between the initial data and each other piece of initial data in the label vector space;
for each piece of initial data, determining, based on the label similarity between each other piece of initial data and the initial data, whether the other initial data is related data of the initial data;
for each piece of initial data, calculating the feature similarity between the initial data and each other piece of initial data in the original feature vector space;
for each piece of initial data, selecting, based on the feature similarity between each piece of related data of the initial data and the initial data, a plurality of pieces of neighbour related data of the initial data from among the related data of the initial data;
taking each piece of initial data and the neighbour related data of the initial data as nodes, forming an edge between the node corresponding to the initial data and the node corresponding to each piece of neighbour related data of the initial data, and setting a weight greater than or equal to zero for each edge, thereby forming a neighbour relevance graph; and
solving an objective transformation matrix and performing eigentransformation on the plurality of pieces of initial data according to the objective transformation matrix, wherein the objective transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour relevance graph.
Technical scheme 21. A computer-readable storage medium storing a computer program executable by a computing device, the computer program, when executed, causing the computing device to perform an information processing method for performing eigentransformation on a plurality of pieces of initial data having multi-dimensional labels, the information processing method comprising:
for each piece of initial data, generating an original feature vector representing the original features of the initial data;
for each piece of initial data, generating a label vector representing the multi-dimensional labels of the initial data;
for each piece of initial data, calculating the label similarity between the initial data and each other piece of initial data in the label vector space;
for each piece of initial data, determining, based on the label similarity between each other piece of initial data and the initial data, whether the other initial data is related data of the initial data;
for each piece of initial data, calculating the feature similarity between the initial data and each other piece of initial data in the original feature vector space;
for each piece of initial data, selecting, based on the feature similarity between each piece of related data of the initial data and the initial data, a plurality of pieces of neighbour related data of the initial data from among the related data of the initial data;
taking each piece of initial data and the neighbour related data of the initial data as nodes, forming an edge between the node corresponding to the initial data and the node corresponding to each piece of neighbour related data of the initial data, and setting a weight greater than or equal to zero for each edge, thereby forming a neighbour relevance graph; and
solving an objective transformation matrix and performing eigentransformation on the plurality of pieces of initial data according to the objective transformation matrix, wherein the objective transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour relevance graph.
Although the technical schemes of the present disclosure and their advantages have been described in detail, it should be understood that various changes, substitutions, and alterations can be made without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the scope of the present disclosure is not limited to the specific embodiments of the processes, devices, manufactures, compositions of matter, means, methods, and steps described in the specification. One of ordinary skill in the art will readily appreciate from the disclosure that existing or later-developed processes, devices, manufactures, compositions of matter, means, methods, or steps that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be used according to the present invention. Accordingly, the appended claims are intended to include such processes, devices, manufactures, compositions of matter, means, methods, or steps within their scope.
Although the embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, it should be understood that the embodiments described above are merely intended to illustrate the technical schemes of the present disclosure and do not constitute a limitation on them. Those skilled in the art may make various changes and modifications to the above embodiments without departing from the spirit and scope of the present invention. Accordingly, the scope of the present disclosure is defined only by the appended claims and their equivalents.

Claims (10)

1. An information processor for performing eigentransformation on a plurality of pieces of initial data having multi-dimensional labels, the information processor comprising:
an original feature vector generating unit configured to, for each piece of initial data, generate an original feature vector representing the original features of the initial data;
a label vector generating unit configured to, for each piece of initial data, generate a label vector representing the multi-dimensional labels of the initial data;
a label similarity determination unit configured to, for each piece of initial data, calculate the label similarity between the initial data and each other piece of initial data in the label vector space;
a related data determination unit configured to, for each piece of initial data, determine, based on the label similarity between each other piece of initial data and the initial data, whether the other initial data is related data of the initial data;
a feature similarity determination unit configured to, for each piece of initial data, calculate the feature similarity between the initial data and each other piece of initial data in the original feature vector space;
a neighbour related data selection unit configured to, for each piece of initial data, select, based on the feature similarity between each piece of related data of the initial data and the initial data, a plurality of pieces of neighbour related data of the initial data from among the related data of the initial data;
a neighbour relevance graph generating unit configured to take each piece of initial data and the neighbour related data of the initial data as nodes, form an edge between the node corresponding to the initial data and the node corresponding to each piece of neighbour related data of the initial data, and set a weight greater than or equal to zero for each edge, thereby forming a neighbour relevance graph; and
an eigentransformation unit configured to solve an objective transformation matrix and perform eigentransformation on the plurality of pieces of initial data according to the objective transformation matrix, wherein the objective transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour relevance graph, so that, after the eigentransformation, the weighted distances between pieces of neighbour related data in the neighbour relevance graph are reduced.
2. The information processor according to claim 1, wherein
the label similarity determination unit is further configured to calculate the label similarity according to the distance between each piece of initial data and each other piece of initial data in the label vector space and a label correlation matrix.
3. The information processor according to claim 1 or 2, wherein
the neighbour relevance graph generating unit is further configured to, for each edge, set the weight of the edge to be positively correlated with at least one of the feature similarity and the label similarity between the two pieces of initial data corresponding to the two nodes connected by the edge.
4. The information processor according to claim 1 or 2, wherein
the neighbour relevance graph generating unit is further configured to, for each edge, set the weight of the edge to 1 if a first predetermined number of other pieces of initial data having the greatest feature similarity with the initial data corresponding to one node connected by the edge include the initial data corresponding to the other node connected by the edge, and otherwise set the weight of the edge to a value less than or equal to 1 that is positively correlated with at least one of the feature similarity and the label similarity between the two pieces of initial data corresponding to the two nodes connected by the edge.
5. The information processor according to claim 1 or 2, further comprising:
a non-related data determination unit configured to, for each piece of initial data, determine, based on the label similarity between the initial data and each other piece of initial data, whether each other piece of initial data is non-related data of the initial data;
a neighbour non-related data selection unit configured to, for each piece of initial data, select, based on the feature similarity between each piece of non-related data of the initial data and the initial data, a plurality of pieces of neighbour non-related data of the initial data from among the non-related data of the initial data; and
a neighbour irrelevance graph generating unit configured to take each piece of initial data and the neighbour non-related data of the initial data as nodes, form an edge between the node corresponding to the initial data and the node corresponding to each piece of neighbour non-related data of the initial data, and set a weight greater than or equal to zero for each edge, thereby forming a neighbour irrelevance graph; and wherein
the objective function is negatively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour relevance graph, and positively correlated with the sum of the weighted lengths, in the same feature space, of all edges in the neighbour irrelevance graph.
6. The information processor according to claim 5, wherein
the neighbour irrelevance graph generating unit is further configured to, for each edge, set the weight of the edge to be positively correlated with the feature similarity between the two pieces of initial data corresponding to the two nodes connected by the edge if that feature similarity is greater than the maximum of the feature similarities between one of those two pieces of initial data and all related data of that initial data, and otherwise set the weight of the edge to 0.
7. The information processor according to claim 1 or 2, which classifies the initial data by performing the eigentransformation.
8. The information processor according to claim 7, wherein the plurality of pieces of initial data are a plurality of pieces of image data or a plurality of pieces of text data.
9. An information processor for performing eigentransformation on a plurality of pieces of initial data having multi-dimensional labels, the information processor comprising:
an original feature vector generating unit configured to, for each piece of initial data, generate an original feature vector representing the original features of the initial data;
a label vector generating unit configured to, for each piece of initial data, generate a label vector representing the multi-dimensional labels of the initial data;
a label similarity determination unit configured to, for each piece of initial data, calculate the label similarity between the initial data and each other piece of initial data in the label vector space, and determine, based on the label similarity, whether the other initial data is non-related data of the initial data;
a feature similarity determination unit configured to, for each piece of initial data, calculate the feature similarity between the initial data and each other piece of initial data in the original feature vector space;
a neighbour non-related data selection unit configured to, for each piece of initial data, select, from among all non-related data of the initial data, a plurality of pieces of neighbour non-related data of the initial data based on the feature similarity with the initial data;
a neighbour irrelevance graph generating unit configured to take each piece of initial data and the neighbour non-related data of the initial data as nodes, form an edge between the node corresponding to the initial data and the node corresponding to each piece of neighbour non-related data of the initial data, and set a weight greater than or equal to zero for each edge, thereby forming a neighbour irrelevance graph; and
an eigentransformation unit configured to solve an objective transformation matrix and perform eigentransformation on the plurality of pieces of initial data according to the objective transformation matrix, wherein the objective transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is positively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour irrelevance graph, so that, after the eigentransformation, the weighted distances between pieces of neighbour non-related data in the neighbour irrelevance graph are increased.
10. An information processing method for performing eigentransformation on a plurality of pieces of initial data having multi-dimensional labels, the information processing method comprising:
for each piece of initial data, generating an original feature vector representing the original features of the initial data;
for each piece of initial data, generating a label vector representing the multi-dimensional labels of the initial data;
for each piece of initial data, calculating the label similarity between the initial data and each other piece of initial data in the label vector space;
for each piece of initial data, determining, based on the label similarity between each other piece of initial data and the initial data, whether the other initial data is related data of the initial data;
for each piece of initial data, calculating the feature similarity between the initial data and each other piece of initial data in the original feature vector space;
for each piece of initial data, selecting, based on the feature similarity between each piece of related data of the initial data and the initial data, a plurality of pieces of neighbour related data of the initial data from among the related data of the initial data;
taking each piece of initial data and the neighbour related data of the initial data as nodes, forming an edge between the node corresponding to the initial data and the node corresponding to each piece of neighbour related data of the initial data, and setting a weight greater than or equal to zero for each edge, thereby forming a neighbour relevance graph; and
solving an objective transformation matrix and performing eigentransformation on the plurality of pieces of initial data according to the objective transformation matrix, wherein the objective transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour relevance graph, so that, after the eigentransformation, the weighted distances between pieces of neighbour related data in the neighbour relevance graph are reduced.
CN201210152699.4A 2012-05-16 2012-05-16 Information processor and information processing method Active CN103425666B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210152699.4A CN103425666B (en) 2012-05-16 2012-05-16 Information processor and information processing method


Publications (2)

Publication Number Publication Date
CN103425666A CN103425666A (en) 2013-12-04
CN103425666B true CN103425666B (en) 2016-12-14

Family

ID=49650424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210152699.4A Active CN103425666B (en) 2012-05-16 2012-05-16 Information processor and information processing method

Country Status (1)

Country Link
CN (1) CN103425666B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069483B (en) * 2015-08-21 2019-01-01 中国地质大学(武汉) The method that a kind of pair of categorized data set is tested
CN107305543B (en) * 2016-04-22 2021-05-11 富士通株式会社 Method and device for classifying semantic relation of entity words
CN111428251B (en) * 2020-03-18 2023-04-28 北京明略软件系统有限公司 Data processing method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101320370B (en) * 2008-05-16 2011-06-01 苏州普达新信息技术有限公司 Deep layer web page data source sort management method based on query interface connection drawing
CN101515328B (en) * 2008-12-18 2012-05-09 东华大学 Local projection preserving method for identification of statistical noncorrelation
CN102024262B (en) * 2011-01-06 2012-07-04 西安电子科技大学 Method for performing image segmentation by using manifold spectral clustering

Also Published As

Publication number Publication date
CN103425666A (en) 2013-12-04


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant