CN103425666B - Information processor and information processing method - Google Patents
- Publication number: CN103425666B (application CN201210152699.4A)
- Authority: CN (China)
- Prior art keywords
- initial data
- data
- initial
- neighbour
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an information processing apparatus and an information processing method for performing feature transformation on multiple raw data items having multi-dimensional labels. The information processing method includes: computing the label similarity between each raw data item and every other raw data item, and determining on that basis whether each other item is related data of the item; selecting multiple nearest-neighbor related data items of the item to form a neighbor relation graph; and solving for a target transformation matrix used to perform the feature transformation, where the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space produced by the linear transformation, of all edges in the neighbor relation graph. According to the technical scheme of the disclosure, locality preserving projections (LPP) feature transformation can be applied to multiple raw data items with multi-dimensional labels, so that the raw data can be classified more effectively.
Description
Technical field
The present disclosure relates to an information processing apparatus and an information processing method, and in particular to an information processing apparatus and an information processing method for performing feature transformation on multiple raw data items having multi-dimensional labels.
Background technology
In classification, data generally must first undergo feature transformation, which makes feature transformation a key technique for data classification. There are two reasons for this. On one hand, in data classification tasks, data such as images or text are typically sampled on a sub-manifold embedded in Euclidean space; that is, the data are not distributed over a "flat" Euclidean space, so their original features are not well suited to analysis directly in Euclidean space, and the data therefore need feature transformation. On the other hand, the original features of such data often have high dimensionality, and classifying the data directly runs into the curse of dimensionality (see R. Bellman and R. Kalaba, "On adaptive control processes", IRE Transactions on Automatic Control, vol. 4, 1959).
Currently, the locality preserving projections (LPP) feature transformation method is a widely used local-structure-preserving feature transformation method (see X. F. He and P. Niyogi, "Locality preserving projections", Advances in Neural Information Processing Systems, vol. 16, 2004). In this method, an adjacency undirected graph is first constructed for all data items from their original features and categories, and the Laplacian term of this graph is then minimized to obtain the projection transformation matrix (linear transformation matrix). Because LPP performs a linear transformation and preserves the local structure of the data, the computation required for LPP feature transformation is relatively small; it can be executed quickly and is suitable for processing data sampled on a manifold structure.
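As a rough illustration of the standard LPP procedure just described (not the patent's method, which is introduced later), the following sketch builds a k-nearest-neighbor graph with heat-kernel weights and solves the resulting generalized eigenproblem; all names and parameter choices are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, n_neighbors=5, t=1.0, n_components=2):
    """Standard LPP sketch. X is (n_samples, d); returns (d, n_components)
    projection matrix whose columns are the transformation directions."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # k-nearest-neighbor adjacency with heat-kernel weights.
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:n_neighbors + 1]  # skip self (distance 0)
        W[i, nbrs] = np.exp(-d2[i, nbrs] / t)
    W = np.maximum(W, W.T)               # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                            # graph Laplacian
    # Generalized eigenproblem  X' L X a = lam X' D X a ;
    # the eigenvectors of the smallest eigenvalues minimize the Laplacian term.
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-8 * np.eye(X.shape[1])  # regularize for stability
    vals, vecs = eigh(A, B)
    return vecs[:, :n_components]
```

Projecting the data is then simply `X @ lpp(X)`, which keeps neighboring points close in the transformed space.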
Summary of the invention
However, a shortcoming of LPP is that it preserves only the local neighbor structure of the data (that is, the local neighbor information in the data's original features) and cannot exploit the label information the data carry. Moreover, the method cannot process data with multi-dimensional labels.
Therefore, the present disclosure proposes an information processing apparatus and an information processing method for performing feature transformation on multiple raw data items having multi-dimensional labels, which can exploit the label information carried by the data while preserving the local neighbor information in the data's original features. In addition, the information processing apparatus and method according to the disclosure can optionally take into account the correlations that may exist among the multi-dimensional labels carried by the data when performing the feature transformation.
According to an embodiment of the disclosure, an information processing apparatus is provided for performing feature transformation on multiple raw data items having multi-dimensional labels. The apparatus includes: an original feature vector generating unit configured to generate, for each raw data item, an original feature vector representing the item's original features; a label vector generating unit configured to generate, for each raw data item, a label vector representing the multi-dimensional labels the item carries; a label similarity determining unit configured to compute, for each raw data item, the label similarity between the item and every other raw data item in the label vector space; a related data determining unit configured to determine, for each raw data item, whether each other raw data item is related data of the item, based on the label similarity between them; a feature similarity determining unit configured to compute, for each raw data item, the feature similarity between the item and every other raw data item in the original feature vector space; a nearest-neighbor related data selecting unit configured to select, for each raw data item, multiple nearest-neighbor related data items of the item from among the item's related data, based on the feature similarity between the item and each of its related data; a neighbor relation graph generating unit configured to take each raw data item and its nearest-neighbor related data as nodes, form an edge between the node of each item and the node of each of its nearest-neighbor related data, and assign each edge a weight greater than or equal to zero, thereby forming a neighbor relation graph; and a feature transformation unit configured to solve for a target transformation matrix and perform feature transformation on the multiple raw data items according to the target transformation matrix, where the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space produced by the linear transformation, of all edges in the neighbor relation graph.
According to an embodiment of the disclosure, an information processing apparatus is also provided for performing feature transformation on multiple raw data items having multi-dimensional labels. The apparatus includes: an original feature vector generating unit configured to generate, for each raw data item, an original feature vector representing the item's original features; a label vector generating unit configured to generate, for each raw data item, a label vector representing the multi-dimensional labels the item carries; a label similarity determining unit configured to compute, for each raw data item, the label similarity between the item and every other raw data item in the label vector space; a non-related data determining unit configured to determine, for each raw data item, whether each other raw data item is non-related data of the item, based on the label similarity between them; a feature similarity determining unit configured to compute, for each raw data item, the feature similarity between the item and every other raw data item in the original feature vector space; a nearest-neighbor non-related data selecting unit configured to select, for each raw data item, multiple nearest-neighbor non-related data items of the item from among the item's non-related data, based on the feature similarity between the item and each of its non-related data; a neighbor non-relation graph generating unit configured to take each raw data item and its nearest-neighbor non-related data as nodes, form an edge between the node of each item and the node of each of its nearest-neighbor non-related data, and assign each edge a weight greater than or equal to zero, thereby forming a neighbor non-relation graph; and a feature transformation unit configured to solve for a target transformation matrix and perform feature transformation on the multiple raw data items according to the target transformation matrix, where the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is positively correlated with the sum of the weighted lengths, in the feature space produced by the linear transformation, of all edges in the neighbor non-relation graph.
According to an embodiment of the disclosure, an information processing method is also provided for performing feature transformation on multiple raw data items having multi-dimensional labels. The method includes: for each raw data item, generating an original feature vector representing the item's original features; for each raw data item, generating a label vector representing the multi-dimensional labels the item carries; for each raw data item, computing the label similarity between the item and every other raw data item in the label vector space; for each raw data item, determining whether each other raw data item is related data of the item, based on the label similarity between them; for each raw data item, computing the feature similarity between the item and every other raw data item in the original feature vector space; for each raw data item, selecting multiple nearest-neighbor related data items of the item from among the item's related data, based on the feature similarity between the item and each of its related data; taking each raw data item and its nearest-neighbor related data as nodes, forming an edge between the node of each item and the node of each of its nearest-neighbor related data, and assigning each edge a weight greater than or equal to zero, thereby forming a neighbor relation graph; and solving for a target transformation matrix and performing feature transformation on the multiple raw data items according to the target transformation matrix, where the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space produced by the linear transformation, of all edges in the neighbor relation graph.
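The method steps above can be sketched end to end as follows. This is a hedged illustration under simplifying assumptions — plain inner-product label similarity, a mutual top-m relatedness rule, and unit edge weights — not the patent's exact formulation; all names are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def multilabel_lpp(X, Y, m=5, k_nbrs=3, n_components=2):
    """Sketch of the disclosed method. X (n, d): original feature vectors;
    Y (n, k): multi-dimensional label vectors. Returns transformed features."""
    n = X.shape[0]
    # Label similarity in label-vector space (inner product, assumed).
    S = (Y @ Y.T).astype(float)
    np.fill_diagonal(S, -np.inf)                 # exclude self
    # Feature (dis)similarity in the original feature space.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Related data: mutual top-m by label similarity.
    top = np.argsort(-S, axis=1)[:, :m]
    related = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in top[i]:
            if i in top[j]:
                related[i, j] = related[j, i] = True
    # Among each item's related data, keep the k_nbrs nearest in feature space
    # and connect them with nonnegative (here unit) edge weights.
    W = np.zeros((n, n))
    for i in range(n):
        rel = np.flatnonzero(related[i])
        if rel.size:
            W[i, rel[np.argsort(d2[i, rel])[:k_nbrs]]] = 1.0
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    L = D - W
    # An objective negatively correlated with the weighted edge lengths is
    # maximized by minimizing a' X' L X a under an X' D X scaling constraint.
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-8 * np.eye(X.shape[1])  # regularized for stability
    vals, vecs = eigh(A, B)
    T = vecs[:, :n_components]                   # target transformation matrix
    return X @ T
```

The key difference from plain LPP is that candidate neighbors are filtered by label similarity before the feature-space nearest-neighbor selection, so the graph edges connect items that are close both in label space and in feature space.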
According to an embodiment of the disclosure, a program is also provided for causing a computer device to execute the above information processing method, for performing feature transformation on multiple raw data items having multi-dimensional labels.
According to an embodiment of the disclosure, a corresponding computer-readable storage medium is also provided, on which is stored a program executable by a computing device; when executed, the program causes the computing device to execute the above information processing method.
According to the information processing apparatus and information processing method proposed in this disclosure, the label information carried by the data can be exploited while preserving the local neighbor information in the data's original features. In addition, the information processing apparatus and method according to the disclosure can optionally take into account the correlations that may exist among the multi-dimensional labels carried by the data when performing the feature transformation.
The brief overview of the technical scheme of the disclosure given above is intended to provide a basic understanding of some aspects of the scheme. It should be understood that the above overview is not an exhaustive summary of the technical scheme of the disclosure; it is not intended to identify key or important parts of the scheme, nor to limit its scope. Its sole purpose is to present some concepts in simplified form as a prelude to the more detailed description given later.
These and other advantages of the technical scheme of the disclosure will become apparent from the following detailed description of preferred embodiments of the disclosure in conjunction with the accompanying drawings.
Brief description of the drawings
The technical scheme of the disclosure can be better understood by reference to the description given below in conjunction with the accompanying drawings, in which the same or similar reference signs are used throughout to denote the same or similar components. The drawings, together with the detailed description below, are included in and form part of this specification, and serve to further illustrate preferred embodiments of the disclosure and to explain its principles and advantages. In the drawings:
Fig. 1 is a block diagram schematically showing the structure of an information processing apparatus according to a first embodiment of the disclosure;
Fig. 2 is a flowchart schematically showing an information processing method according to the first embodiment of the disclosure;
Fig. 3 is a block diagram schematically showing the structure of an information processing apparatus according to a second embodiment of the disclosure;
Fig. 4 is a flowchart schematically showing an information processing method according to the second embodiment of the disclosure;
Fig. 5 is a block diagram schematically showing the structure of an information processing apparatus according to a third embodiment of the disclosure;
Fig. 6 is a flowchart schematically showing an information processing method according to the third embodiment of the disclosure;
Fig. 7 is a structural diagram schematically showing a possible hardware configuration of an information processing device that can be used to implement the information processing method and information processing apparatus according to embodiments of the disclosure.
Those skilled in the art will appreciate that the components in the drawings are shown only for the sake of simplicity and clarity and are not necessarily drawn to scale. For example, the sizes of some components may be exaggerated relative to other components in order to improve understanding of the embodiments of the disclosure.
Detailed description of the invention
Preferred embodiments of the disclosure are described below in conjunction with the accompanying drawings. For the sake of clarity and conciseness, not all features of an actual embodiment are described in this specification. It should be understood, however, that in developing any such actual embodiment, many implementation-specific decisions must be made in order to achieve the developer's specific goals, for example compliance with system-related and business-related constraints, and that these constraints may vary from one implementation to another. Moreover, it should be understood that, although such development work might be complex and time-consuming, it would nevertheless be a routine undertaking for those skilled in the art having the benefit of this disclosure.
Here, it should also be noted that, to avoid obscuring the technical scheme of the disclosure with unnecessary detail, the drawings show only the device structures and/or process steps closely related to the scheme according to the disclosure, while other details of little relevance to the scheme are omitted.
According to a first aspect of the disclosure, an information processing apparatus is provided for performing feature transformation on multiple raw data items having multi-dimensional labels. The apparatus includes: an original feature vector generating unit configured to generate, for each raw data item, an original feature vector representing the item's original features; a label vector generating unit configured to generate, for each raw data item, a label vector representing the multi-dimensional labels the item carries; a label similarity determining unit configured to compute, for each raw data item, the label similarity between the item and every other raw data item in the label vector space; a related data determining unit configured to determine, for each raw data item, whether each other raw data item is related data of the item, based on the label similarity between them; a feature similarity determining unit configured to compute, for each raw data item, the feature similarity between the item and every other raw data item in the original feature vector space; a nearest-neighbor related data selecting unit configured to select, for each raw data item, multiple nearest-neighbor related data items of the item from among the item's related data, based on the feature similarity between the item and each of its related data; a neighbor relation graph generating unit configured to take each raw data item as a node, form an edge between each item and each of its nearest-neighbor related data, and assign each edge a weight greater than or equal to zero, thereby forming a neighbor relation graph; and a feature transformation unit configured to solve for a target transformation matrix and perform feature transformation on the multiple raw data items according to the target transformation matrix, where the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space produced by the linear transformation, of all edges in the neighbor relation graph.
According to the first aspect of the disclosure, an information processing method is also provided for performing feature transformation on multiple raw data items having multi-dimensional labels. The method includes: for each raw data item, generating an original feature vector representing the item's original features; for each raw data item, generating a label vector representing the multi-dimensional labels the item carries; for each raw data item, computing the label similarity between the item and every other raw data item in the label vector space; for each raw data item, determining whether each other raw data item is related data of the item, based on the label similarity between them; for each raw data item, computing the feature similarity between the item and every other raw data item in the original feature vector space; for each raw data item, selecting multiple nearest-neighbor related data items of the item from among the item's related data, based on the feature similarity between the item and each of its related data; taking each raw data item and its nearest-neighbor related data as nodes, forming an edge between the node of each item and the node of each of its nearest-neighbor related data, and assigning each edge a weight greater than or equal to zero, thereby forming a neighbor relation graph; and solving for a target transformation matrix and performing feature transformation on the multiple raw data items according to the target transformation matrix, where the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space produced by the linear transformation, of all edges in the neighbor relation graph.
According to a second aspect of the disclosure, an information processing apparatus is provided for performing feature transformation on multiple raw data items having multi-dimensional labels. The apparatus includes: an original feature vector generating unit configured to generate, for each raw data item, an original feature vector representing the item's original features; a label vector generating unit configured to generate, for each raw data item, a label vector representing the multi-dimensional labels the item carries; a label similarity determining unit configured to compute, for each raw data item, the label similarity between the item and every other raw data item in the label vector space; a non-related data determining unit configured to determine, for each raw data item, whether each other raw data item is non-related data of the item, based on the label similarity between them; a feature similarity determining unit configured to compute, for each raw data item, the feature similarity between the item and every other raw data item in the original feature vector space; a nearest-neighbor non-related data selecting unit configured to select, for each raw data item, multiple nearest-neighbor non-related data items of the item from among the item's non-related data, based on the feature similarity between the item and each of its non-related data; a neighbor non-relation graph generating unit configured to take each raw data item as a node, form an edge between each item and each of its nearest-neighbor non-related data, and assign each edge a weight greater than or equal to zero, thereby forming a neighbor non-relation graph; and a feature transformation unit configured to solve for a target transformation matrix and perform feature transformation on the multiple raw data items according to the target transformation matrix, where the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is positively correlated with the sum of the weighted lengths, in the feature space produced by the linear transformation, of all edges in the neighbor non-relation graph.
According to the second aspect of the disclosure, an information processing method is also provided for performing feature transformation on multiple raw data items having multi-dimensional labels. The method includes: for each raw data item, generating an original feature vector representing the item's original features; for each raw data item, generating a label vector representing the multi-dimensional labels the item carries; for each raw data item, computing the label similarity between the item and every other raw data item in the label vector space; for each raw data item, determining whether each other raw data item is non-related data of the item, based on the label similarity between them; for each raw data item, computing the feature similarity between the item and every other raw data item in the original feature vector space; for each raw data item, selecting multiple nearest-neighbor non-related data items of the item from among the item's non-related data, based on the feature similarity between the item and each of its non-related data; taking each raw data item and its nearest-neighbor non-related data as nodes, forming an edge between the node of each item and the node of each of its nearest-neighbor non-related data, and assigning each edge a weight greater than or equal to zero, thereby forming a neighbor non-relation graph; and solving for a target transformation matrix and performing feature transformation on the multiple raw data items according to the target transformation matrix, where the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is positively correlated with the sum of the weighted lengths, in the feature space produced by the linear transformation, of all edges in the neighbor non-relation graph.
(First embodiment)
First, an information processing apparatus 100 according to a first embodiment of the disclosure will be described with reference to Fig. 1, as an example of the information processing apparatus provided according to the first aspect of the disclosure.
The information processing apparatus 100 includes an original feature vector generating unit 101, a label vector generating unit 102, a label similarity determining unit 103, a related data determining unit 104, a feature similarity determining unit 105, a nearest-neighbor related data selecting unit 106, a neighbor relation graph generating unit 107, a feature transformation unit 108, a non-related data determining unit 114, a nearest-neighbor non-related data selecting unit 116, and a neighbor non-relation graph generating unit 117.
The original feature vector generating unit 101 generates, from the received raw data items with multi-dimensional labels, an original feature vector representing the original features of each raw data item, and provides it to the feature similarity determining unit 105. For example, the original feature vector generating unit 101 assigns the raw data items a_1, a_2, ..., a_n the vectors x_1, x_2, ..., x_n respectively as their original feature vectors, where i is a natural number less than or equal to the total number n of raw data items, a_i denotes the i-th raw data item, and x_i denotes the feature vector of a_i; for example, x_i is a d-dimensional vector in the d-dimensional original feature vector space. The d-dimensional original feature vector space is the vector space representing all original features of the raw data, and it usually has high dimensionality.
The label vector generating unit 102 generates, from the received raw data items with multi-dimensional labels, a label vector representing the multi-dimensional labels of each raw data item, and provides it to the label similarity determining unit 103. For example, the label vector generating unit 102 assigns the raw data items a_1, a_2, ..., a_n the vectors y_1, y_2, ..., y_n respectively as their label vectors, where y_i denotes the label vector of a_i; for example, y_i is a k-dimensional vector in the k-dimensional label vector space. This k-dimensional vector can be a k-dimensional 0-1 vector: if the value of y_i in the j-th dimension is 0, then a_i does not have the j-th of the k labels, and if the value of y_i in the j-th dimension is 1, then a_i has the j-th of the k labels, where j is a natural number less than or equal to k. Of course, y_i can also be a k-dimensional vector other than a 0-1 vector. For example, if each raw data item is a photo containing one person, one label of the item is a height value and another label is a weight value, then the label vector of each item is a two-dimensional vector whose value in each dimension is a positive number.
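The 0-1 label vector described above can be built from an item's tag set in a few lines; the vocabulary and tags below are hypothetical examples, not from the disclosure.

```python
def label_vector(item_tags, vocabulary):
    """Build a k-dimensional 0-1 label vector: dimension j is 1 if and only
    if the item carries the j-th label of the k-label vocabulary."""
    return [1 if tag in item_tags else 0 for tag in vocabulary]

# Hypothetical vocabulary of k = 3 labels and an item tagged with two of them.
vocab = ["person", "outdoor", "car"]
y = label_vector({"person", "car"}, vocab)   # -> [1, 0, 1]
```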
The label similarity determining unit 103, based on the received label vectors of the data items, computes for each raw data item the label similarity between the item and every other raw data item in the label vector space, and provides it to the related data determining unit 104, the neighbor relation graph generating unit 107, the non-related data determining unit 114, and the neighbor non-relation graph generating unit 117. The label similarity can be computed from the distance between two raw data items in the label vector space together with a label correlation matrix. For example, the label similarity S_l,ij between raw data items a_i and a_j can be computed according to the following formula (1):
S_l,ij = y_i' C y_j   (1)
where C is a k-by-k label correlation matrix. C can be given manually (for example, the identity matrix I, meaning no correlation between labels), or it can be computed, for example, using the following formula (2):
where Y_a and Y_b are n-dimensional vectors related to y_i as described by the following formulas (3) and (4):
Y_a,i = y_i,a   (3)
Y_b,i = y_i,b   (4)
In other words, the value of Y_a in the i-th dimension is the value of y_i in the a-th dimension, and the value of Y_b in the i-th dimension is the value of y_i in the b-th dimension.
The foregoing merely illustrates one way of determining label similarity. Those skilled in the art will understand that label similarity can be determined in other ways; for example, it can be determined based only on the distance between raw data items in label space, and the distance used can be cosine distance, Euclidean distance, or another suitable type of distance.
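Formula (1) above is a bilinear form in the two label vectors; a minimal sketch, assuming C defaults to the identity matrix I (no correlation between labels), as the text mentions:

```python
import numpy as np

def label_similarity(y_i, y_j, C=None):
    """S_l,ij = y_i' C y_j (formula (1)). C is the k-by-k label correlation
    matrix; it defaults to the identity I, i.e. no correlation between labels."""
    y_i = np.asarray(y_i, dtype=float)
    y_j = np.asarray(y_j, dtype=float)
    if C is None:
        C = np.eye(y_i.size)
    return float(y_i @ C @ y_j)

# With C = I and 0-1 label vectors, the similarity counts shared labels:
s = label_similarity([1, 0, 1], [1, 1, 1])   # -> 2.0
```

A non-identity C lets two different but correlated labels (for example two related topic tags) contribute to the similarity even when the items share no label exactly.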
Related data determine unit 104 for each initial data and other initial datas each, based on received
Label similarity, determines that whether these other initial datas are the related datas of this initial data, and will determine that result is supplied to closely
Adjacent related data selects unit 106.Wherein it is possible to determine in various ways between initial data the most each other for dependency number
According to.
One feasible way is as follows: if the label similarity between a_i and a_j is among the m highest label similarities between a_i and all other initial data, and is at the same time among the m highest label similarities between a_j and all other initial data, then a_i and a_j are related data of each other. Here m is a natural number smaller than the total number n of initial data; m may be given in advance, or may be determined according to factors such as the distribution characteristics of the initial data.
Another feasible way is as follows: if the label similarity between a_i and a_j is greater than or equal to a predetermined first label threshold Thr, then a_i and a_j are related data of each other. Similarly to the natural number m above, the first label threshold Thr may be given in advance, or may be determined according to factors such as the distribution characteristics of the initial data.
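As a non-limiting sketch of the first feasible way described above (the mutual top-m rule), the following assumes an n-by-n matrix of precomputed label similarities; the function name and the example values are hypothetical, not taken from the patent:

```python
import numpy as np

def related_pairs(label_sim, m):
    """Mark a_i and a_j as related data iff each is among the other's
    m most label-similar initial data (the mutual top-m rule)."""
    n = label_sim.shape[0]
    sim = label_sim.astype(float).copy()
    np.fill_diagonal(sim, -np.inf)          # exclude self-similarity
    # top_m[i] holds the indices of the m data most label-similar to a_i
    top_m = np.argsort(-sim, axis=1)[:, :m]
    in_top = np.zeros((n, n), dtype=bool)
    for i in range(n):
        in_top[i, top_m[i]] = True
    return in_top & in_top.T                # symmetric: mutual membership

label_sim = np.array([[1.0, 0.9, 0.2],
                      [0.9, 1.0, 0.8],
                      [0.2, 0.8, 1.0]])
R = related_pairs(label_sim, m=1)
```

With m = 1, only a_0 and a_1 are each other's single most label-similar datum, so only that pair is marked as related.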
The non-related data determining unit 114 is similar to the related data determining unit 104, except that the non-related data determining unit 114 determines, for each initial data and each other initial data, whether that other initial data is non-related data of this initial data, based on the received label similarity, and supplies the determination result to the neighbor non-related data selecting unit 116. Likewise, whether two initial data are non-related data of each other can be determined in various ways.
One feasible way is as follows: if the label similarity between a_i and a_j is among the r lowest label similarities between a_i and all other initial data, and is at the same time among the r lowest label similarities between a_j and all other initial data, then a_i and a_j are non-related data of each other. Here r is a natural number smaller than the total number n of initial data; r may be given in advance, or may be determined according to factors such as the distribution characteristics of the initial data.
Another feasible way is as follows: if the label similarity between a_i and a_j is less than a predetermined second label threshold Thir, then a_i and a_j are non-related data of each other. Similarly to the natural number r above, the second label threshold Thir may be given in advance, or may be determined according to factors such as the distribution characteristics of the initial data.
Preferably, when the first label threshold Thr is used to determine the related data relation and the second label threshold Thir is used to determine the non-related data relation, the first label threshold Thr is greater than or equal to the second label threshold Thir. This guarantees that two initial data cannot simultaneously be both related data and non-related data of each other.
The characteristic similarity determining unit 105 calculates, based on the received original feature vectors of the initial data, for each initial data, the characteristic similarity between this initial data and each other initial data in the original feature vector space, and supplies it to the neighbor related data selecting unit 106, the neighbor correlation graph generating unit 107, the neighbor non-related data selecting unit 116, and the neighbor non-correlation graph generating unit 117. The characteristic similarity can be calculated according to the distance between two initial data in the original feature vector space. For example, the characteristic similarity S_{v,ij} between initial data a_i and a_j can be calculated according to the following formula (5):
wherein σ = mean(||x_i − x_j||_2, 1 ≤ i ≠ j ≤ n) is the average distance between any two original feature vectors of the initial data in the original feature vector space.
Those skilled in the art will understand that the distance between initial data in the original feature vector space may be a Euclidean distance, a Manhattan distance, a chi-square distance, or a distance of another suitable type.
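The body of formula (5) is not reproduced in the text above; a heat-kernel similarity exp(−||x_i − x_j||² / σ²), with σ set to the mean pairwise Euclidean distance as defined after formula (5), is a common choice in locality preserving projection methods and is assumed in the following sketch (function name hypothetical):

```python
import numpy as np

def feature_similarity(X):
    """Pairwise characteristic similarity in the original feature-vector
    space. Rows of X are the original feature vectors. A heat kernel
    exp(-||xi - xj||^2 / sigma^2) is assumed, with sigma equal to the
    mean pairwise distance as stated after formula (5)."""
    n = X.shape[0]
    # squared Euclidean distances between all pairs of rows
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    d = np.sqrt(d2)
    off = ~np.eye(n, dtype=bool)
    sigma = d[off].mean()                   # mean distance over i != j
    return np.exp(-d2 / sigma ** 2)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
S = feature_similarity(X)
```

The resulting matrix is symmetric with ones on the diagonal, and closer points receive larger similarity.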
The neighbor related data selecting unit 106 selects, for each initial data, based on the received related data relations between the initial data and the received characteristic similarities, multiple neighbor related data of this initial data from among the related data of this initial data, and provides them to the neighbor correlation graph generating unit 107. The neighbor related data can be selected for each initial data in various ways.
One feasible way is, for each initial data, to select from its related data the q related data with the largest characteristic similarity to this initial data as the neighbor related data of this initial data. Here q is a natural number smaller than the total number n of initial data; q may be given in advance, or may be determined according to factors such as the distribution characteristics of the initial data.
Another feasible way is, for each initial data, to take every related data whose characteristic similarity to this initial data is greater than a first neighbor threshold Th1 as a neighbor related data of this initial data. Similarly to the natural number q above, the first neighbor threshold Th1 may be given in advance, or may be determined according to factors such as the distribution characteristics of the initial data.
Those of ordinary skill in the art will appreciate that there may be other ways of selecting the neighbor related data, for example, starting from the neighbor related data selected according to the first way and then removing the related data whose characteristic similarity to the targeted initial data is less than the first neighbor threshold Th1.
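The first selection way above (the q related data with the largest characteristic similarity) can be sketched as follows, assuming the related data relations and characteristic similarities have already been computed; all names are hypothetical:

```python
import numpy as np

def neighbor_related(feat_sim, related, q):
    """For each initial data a_i, pick from its related data the q with
    the largest characteristic similarity as its neighbor related data."""
    n = feat_sim.shape[0]
    neighbors = []
    for i in range(n):
        cand = np.flatnonzero(related[i])             # indices of related data
        order = cand[np.argsort(-feat_sim[i, cand])]  # most similar first
        neighbors.append(order[:q].tolist())
    return neighbors

feat_sim = np.array([[1.0, 0.7, 0.3, 0.9],
                     [0.7, 1.0, 0.6, 0.2],
                     [0.3, 0.6, 1.0, 0.5],
                     [0.9, 0.2, 0.5, 1.0]])
related = np.array([[False, True,  False, True],
                    [True,  False, True,  False],
                    [False, True,  False, True],
                    [True,  False, True,  False]])
nbrs = neighbor_related(feat_sim, related, q=1)
```

Each initial data keeps only its most feature-similar related datum here, since q = 1.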
The neighbor correlation graph generating unit 107, based on the received neighbor related data of each initial data, takes each initial data and its neighbor related data as nodes, forms an edge between the node of each initial data and the node of each of its neighbor related data, and sets for each edge a weight greater than or equal to zero, thereby forming the neighbor correlation graph, which it provides to the eigentransformation unit 108. When setting the weight for each edge, this can be done based on at least one of the characteristic similarity and the label similarity between the two initial data corresponding to the two nodes connected by that edge. For example, the weight for each edge may be set in the following manner.
Suppose the neighbor related data selecting unit 106 selects, for each initial data, from all related data of this initial data, the q related data with the largest characteristic similarity to this initial data as the neighbor related data of this initial data. The neighbor correlation graph generating unit 107 can then, for each edge, set the weight by distinguishing different cases according to the received characteristic similarity between the two initial data corresponding to the two nodes connected by that edge, as follows.
If the q initial data with the largest characteristic similarity to the initial data corresponding to one node connected by this edge include the initial data corresponding to the other node connected by this edge, the weight of this edge is set to 1; otherwise the weight of this edge is set to a value that is positively correlated with at least one of the characteristic similarity and the label similarity between the two initial data corresponding to the two nodes connected by this edge and is less than or equal to 1, for example a linear combination of the characteristic similarity and the label similarity between these two initial data, as shown in the following formula (6):
W_{r,ij} = 1, if a_j ∈ N_rq(i) ∩ N_q(i) or a_i ∈ N_rq(j) ∩ N_q(j);
W_{r,ij} = α·S_{v,ij} + (1 − α)·S_{l,ij}, if an edge otherwise connects the nodes of a_i and a_j, where S_{v,ij} and S_{l,ij} denote the characteristic similarity and the label similarity between a_i and a_j;
W_{r,ij} = 0, if no edge connects the nodes of a_i and a_j;
wherein W_{r,ij} is the weight of the edge connecting the nodes corresponding to initial data a_i and a_j in the neighbor correlation graph, and α is an adjustment parameter, a real number between 0 and 1.
In formula (6), for each initial data a_i, the first q initial data with the largest characteristic similarity to a_i are found from the set N_r(i) composed of its related data, and N_rq(i) is defined as the set composed of these q initial data. Meanwhile, for each initial data a_i, the first q initial data with the largest characteristic similarity to a_i are found from all initial data not including a_i itself, and N_q(i) is defined as the set composed of these q data.
According to the weight setting manner shown in formula (6):
for an initial data a_j belonging to both N_rq(i) and N_q(i), the weight of the edge connecting the nodes corresponding to initial data a_i and a_j is set to the maximum value 1;
for an initial data a_i belonging to both N_rq(j) and N_q(j), the weight of the edge connecting the nodes corresponding to initial data a_i and a_j is likewise set to the maximum value 1;
for any other pair of initial data a_i and a_j between whose corresponding nodes an edge exists, the weight of this edge is set to a linear combination of the characteristic similarity and the label similarity between a_i and a_j, where the adjustment parameter α adjusts the respective proportions of the characteristic similarity and the label similarity in the weight;
for initial data a_i and a_j between whose corresponding nodes no edge exists, W_{r,ij} is set to 0, which can be understood as the weight of any non-existent edge necessarily being 0.
It should be noted that since the neighbor correlation graph here is an undirected graph, each edge in the graph has no direction, and therefore W_{r,ij} is necessarily equal to W_{r,ji}.
The foregoing merely illustrates one way of setting the weights; those of ordinary skill in the art may conceive of other ways. For example, the weight W_{r,ij} of each edge may be set to a value positively correlated with at least one of the characteristic similarity and the label similarity between the two initial data a_i and a_j corresponding to the two nodes connected by this edge. More specifically, for example, the weight W_{r,ij} of each edge may be set to a linear combination of the characteristic similarity and the label similarity between the two initial data corresponding to the two nodes connected by this edge.
Of course, the weight of every edge may also be set to 1, so that every edge has the same weight.
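The weighting scheme around formula (6) can be sketched as follows, assuming the edge list and the sets N_rq(i) and N_q(i) have already been computed; all names and the example values are hypothetical:

```python
import numpy as np

def correlation_graph_weights(feat_sim, label_sim, edges, N_rq, N_q, alpha):
    """Edge weights for the neighbor correlation graph, following the
    weighting scheme around formula (6): weight 1 when a_j lies in both
    N_rq(i) and N_q(i) (or symmetrically for a_i), otherwise a linear
    combination of characteristic and label similarity; 0 where no edge
    exists (unset entries of W stay 0)."""
    n = feat_sim.shape[0]
    W = np.zeros((n, n))
    for i, j in edges:                      # undirected edges as (i, j) pairs
        if (j in N_rq[i] and j in N_q[i]) or (i in N_rq[j] and i in N_q[j]):
            w = 1.0
        else:
            w = alpha * feat_sim[i, j] + (1 - alpha) * label_sim[i, j]
        W[i, j] = W[j, i] = w               # undirected: W is symmetric
    return W

feat_sim = np.array([[1.0, 0.9, 0.4],
                     [0.9, 1.0, 0.6],
                     [0.4, 0.6, 1.0]])
label_sim = np.array([[1.0, 0.8, 0.3],
                      [0.8, 1.0, 0.8],
                      [0.3, 0.8, 1.0]])
W = correlation_graph_weights(feat_sim, label_sim,
                              edges=[(0, 1), (1, 2)],
                              N_rq={0: {1}, 1: {0}, 2: {1}},
                              N_q={0: {1}, 1: {2}, 2: {0}},
                              alpha=0.5)
```

The edge (0, 1) satisfies the mutual-membership condition and gets weight 1; the edge (1, 2) falls back to the linear combination 0.5·0.6 + 0.5·0.8 = 0.7.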
The neighbor non-related data selecting unit 116 is similar to the neighbor related data selecting unit 106, except that the neighbor non-related data selecting unit 116 selects, for each initial data, based on the received non-related data relations between the initial data and the received characteristic similarities, multiple neighbor non-related data of this initial data from among the non-related data of this initial data, and provides them to the neighbor non-correlation graph generating unit 117. The neighbor non-related data can be selected for each initial data in various ways.
One feasible way is, for each initial data, to select from its non-related data the p non-related data with the largest characteristic similarity to this initial data as the neighbor non-related data of this initial data. Here p is a natural number smaller than the total number n of initial data; p may be given in advance, or may be determined according to factors such as the distribution characteristics of the initial data.
Another feasible way is, for each initial data, to take every non-related data whose characteristic similarity to this initial data is greater than a second neighbor threshold Th2 as a neighbor non-related data of this initial data. Similarly to the natural number p above, the second neighbor threshold Th2 may be given in advance, or may be determined according to factors such as the distribution characteristics of the initial data.
Those of ordinary skill in the art will appreciate that there may be other ways of selecting the neighbor non-related data, for example, starting from the neighbor non-related data selected according to the first way and then removing the non-related data whose characteristic similarity to the targeted initial data is less than the second neighbor threshold Th2.
The neighbor non-correlation graph generating unit 117 is similar to the neighbor correlation graph generating unit 107, except that the neighbor non-correlation graph generating unit 117, based on the received neighbor non-related data of each initial data, takes each initial data and its neighbor non-related data as nodes, forms an edge between the node of each initial data and the node of each of its neighbor non-related data, and sets for each edge a weight greater than or equal to zero, thereby forming the neighbor non-correlation graph, which it provides to the eigentransformation unit 108. When setting the weight for each edge, this can be done based on at least one of the characteristic similarity and the label similarity between the two initial data corresponding to the two nodes connected by that edge. For example, the weight for each edge may be set in the following manner.
Suppose the neighbor non-related data selecting unit 116 selects, for each initial data, from all non-related data of this initial data, the p non-related data with the largest characteristic similarity to this initial data as the neighbor non-related data of this initial data. The neighbor non-correlation graph generating unit 117 can then, for each edge, set the weight based on the received characteristic similarity between the two initial data corresponding to the two nodes connected by this edge, as follows.
If the characteristic similarity between the two initial data corresponding to the two nodes connected by this edge is greater than the maximum of the characteristic similarities between one of these two initial data and all related data of that initial data, the weight of this edge is set to a value positively correlated with the characteristic similarity between the two initial data corresponding to the two nodes connected by this edge, for example set to the characteristic similarity between these two initial data; otherwise the weight of this edge is set to 0.
Specifically, suppose for each initial data a_i, the first p non-related data with the largest characteristic similarity to a_i are found from the set N_ir(i) composed of its non-related data, and N_irp(i) is defined as the set composed of these p non-related data; and for each initial data a_i, the maximum of the characteristic similarities between a_i and all of its related data is calculated and defined as MaxRS(i). The above weight setting method can then be expressed as: for an initial data a_j belonging to N_irp(i) and whose characteristic similarity to a_i is greater than MaxRS(i), the weight of the edge between the two corresponding nodes is set to the characteristic similarity between a_i and a_j. The above setting method can also be expressed by the following formula (7):
W_{ir,ij} = S_{v,ij}, if a_j ∈ N_irp(i) and S_{v,ij} > MaxRS(i);
W_{ir,ij} = 0, otherwise;
wherein W_{ir,ij} is the weight of the edge connecting the nodes corresponding to initial data a_i and a_j in the neighbor non-correlation graph.
It should be noted that since the neighbor non-correlation graph here is an undirected graph, each edge in the graph has no direction, and therefore W_{ir,ij} is necessarily equal to W_{ir,ji}.
Those of ordinary skill in the art should also understand that the weight of each edge may be set to a value positively correlated with at least one of the characteristic similarity and the label similarity between the two initial data corresponding to the two nodes connected by this edge, for example set to a linear combination of the characteristic similarity and the label similarity between these two initial data.
Of course, the weight of every edge may also be set to 1, so that every edge has the same weight.
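The weight setting around formula (7) for the neighbor non-correlation graph can be sketched as follows, again assuming precomputed inputs; all names and example values are hypothetical:

```python
import numpy as np

def irrelevant_graph_weights(feat_sim, related, nbr_irrel):
    """Edge weights for the neighbor non-correlation graph, following the
    rule around formula (7): an edge from a_i to a neighbor non-related
    datum a_j gets weight S_v,ij only when S_v,ij exceeds MaxRS(i), the
    largest characteristic similarity between a_i and any of its related
    data; otherwise the weight is 0."""
    n = feat_sim.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        rel = np.flatnonzero(related[i])
        max_rs = feat_sim[i, rel].max() if rel.size else 0.0
        for j in nbr_irrel[i]:
            if feat_sim[i, j] > max_rs:
                W[i, j] = W[j, i] = feat_sim[i, j]   # undirected graph
    return W

feat_sim = np.array([[1.0, 0.5, 0.8],
                     [0.5, 1.0, 0.3],
                     [0.8, 0.3, 1.0]])
related = np.array([[False, True,  False],
                    [True,  False, False],
                    [False, False, False]])
nbr_irrel = {0: [2], 1: [2], 2: []}
W = irrelevant_graph_weights(feat_sim, related, nbr_irrel)
```

Here a_2 is more feature-similar to a_0 (0.8) than any related datum of a_0 (0.5), so that edge keeps its similarity as weight, while the edge from a_1 is suppressed.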
The eigentransformation unit 108, based on the received neighbor correlation graph and neighbor non-correlation graph, solves for the target transformation matrix, performs the eigentransformation on the multiple initial data according to this target transformation matrix, and outputs the initial data after the eigentransformation. Here, the target transformation matrix represents the linear transformation that maximizes an objective function which is negatively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges in the neighbor correlation graph, and positively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges in the neighbor non-correlation graph. It should be noted that the length of an edge here refers to the distance in the space between the two points connected by the edge. Those of ordinary skill in the art will recognize that a distance of a suitable type can be selected here as the length of an edge.
The purpose of the eigentransformation performed by the eigentransformation unit 108 is to draw the weighted distances between the neighbor related data in the neighbor correlation graph after the eigentransformation as close as possible, while pushing the weighted distances between the neighbor non-related data in the neighbor non-correlation graph after the eigentransformation as far apart as possible, that is, to realize the objective functions in formulas (8) and (9):
min Σ_{i,j} (a^T x_i − a^T x_j)² W_{r,ij}   (8)
max Σ_{i,j} (a^T x_i − a^T x_j)² W_{ir,ij}   (9)
wherein a^T is the linear transformation (that is, the eigentransformation) applied to the initial data.
As is common practice in locality preserving projection eigentransformation methods, using the Laplacian terms of the neighbor correlation graph and the neighbor non-correlation graph, the objective functions in formulas (8) and (9) can be converted into the optimization problem shown in the following formula (10):
max_a  a^T X (β·L_ir − (1 − β)·L_r) X^T a   (10)
s.t.  a^T X D_r X^T a = 1
wherein D_r = diag(sum(W_r)), D_ir = diag(sum(W_ir)), the Laplacian term of the neighbor correlation graph is L_r = D_r − W_r, the Laplacian term of the neighbor non-correlation graph is L_ir = D_ir − W_ir, and β is a scale parameter, 0 ≤ β ≤ 1, adjusting the respective weights of the neighbor correlation graph and the neighbor non-correlation graph.
Because L_r = D_r − W_r, the optimization problem shown in formula (10) can be equivalently written as the optimization problem shown in formula (11), and due to the boundary condition a^T X D_r X^T a = 1, the optimization problem shown in formula (11) can in turn be equivalently written as the optimization problem shown in formula (12):
max_a  a^T X (β·L_ir + (1 − β)·W_r) X^T a − (1 − β)·a^T X D_r X^T a   (11)
s.t.  a^T X D_r X^T a = 1
max_a  a^T X (β·L_ir + (1 − β)·W_r) X^T a   (12)
s.t.  a^T X D_r X^T a = 1
Solving formula (12) is equivalent to solving the generalized eigenvalue problem shown in formula (13):
X (β·L_ir + (1 − β)·W_r) X^T a = λ X D_r X^T a   (13)
If a_1, a_2, …, a_m are the eigenvectors corresponding to the eigenvalues of formula (13) arranged in the order λ_1 > λ_2 > … > λ_m, then the eigentransformation matrix A = (a_1, a_2, …, a_m) is obtained, where y_i = A^T x_i is the feature after transformation.
It should be noted that the objective function here is negatively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges in the neighbor correlation graph, and positively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges in the neighbor non-correlation graph.
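Under the convention that the columns of X are the original feature vectors x_i, the generalized eigenvalue problem of formula (13) can be solved, for example, with SciPy's symmetric eigensolver. The following is a sketch under that assumption, not the patented implementation; the function names and example graphs are hypothetical:

```python
import numpy as np
from scipy.linalg import eigh

def target_transform(X, W_r, W_ir, beta, m):
    """Solve the generalized eigenvalue problem of formula (13),
    X (beta*L_ir + (1-beta)*W_r) X^T a = lambda * X D_r X^T a,
    keeping the m eigenvectors of largest eigenvalue as columns of A."""
    D_r = np.diag(W_r.sum(axis=1))
    D_ir = np.diag(W_ir.sum(axis=1))
    L_ir = D_ir - W_ir                       # Laplacian of the non-correlation graph
    M = X @ (beta * L_ir + (1 - beta) * W_r) @ X.T
    B = X @ D_r @ X.T
    # eigh solves M a = lambda B a for symmetric M, positive-definite B;
    # eigenvalues come back ascending, so keep the last m columns
    vals, vecs = eigh(M, B)
    return vecs[:, ::-1][:, :m]

def transform(A, X):
    """y_i = A^T x_i for each column x_i of X."""
    return A.T @ X

X = np.array([[0., 1., 0., 1.],
              [0., 0., 1., 1.]])             # columns are the x_i
W_r = np.array([[0, 1, 0, 0], [1, 0, 0, 0],
                [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
W_ir = np.array([[0, 0, 1, 0], [0, 0, 0, 1],
                 [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
A = target_transform(X, W_r, W_ir, beta=0.5, m=1)
```

The returned eigenvectors are normalized so that a^T X D_r X^T a = 1, matching the constraint of formulas (10) to (12).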
Thus, the information processor 100 has obtained a linear transformation that draws the weighted distances between the neighbor related data in the neighbor correlation graph after the eigentransformation as close as possible, while pushing the weighted distances between the neighbor non-related data in the neighbor non-correlation graph after the eigentransformation as far apart as possible, and can then obtain the initial data after this linear transformation. The information processor 100 can also classify the initial data based on the result of this linear transformation. In particular, the information processor 100 can process initial data such as images or text, so as to classify them according to both the multi-dimensional labels possessed by the images or text and the original features of the images or text themselves.
Moreover, compared with traditional locality preserving projection eigentransformation methods, classification based on the initial data after this linear transformation can, while retaining the local information of the data, also make use of the label information that the data possess.
Furthermore, compared with most existing eigentransformation methods, classification based on the initial data after this linear transformation is applicable to data with multi-dimensional labels.
In addition, since classification based on the initial data after this linear transformation can both retain the local neighbor information of the data and make use of the label information that the data possess, it is particularly suitable for nearest neighbor classification algorithms such as the k-nearest-neighbor (K-Nearest Neighbor, KNN) algorithm.
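A minimal k-nearest-neighbor classifier operating on the transformed features y_i = A^T x_i might look as follows (majority vote over Euclidean nearest neighbors; all names and values are hypothetical):

```python
import numpy as np

def knn_predict(Y_train, labels, Y_test, k=3):
    """k-nearest-neighbor classification in the transformed feature space:
    each test point takes the majority label among its k nearest training
    points, using Euclidean distance on the transformed features.
    Columns of Y_train and Y_test are samples; labels are integer classes."""
    preds = []
    for y in Y_test.T:
        d = np.linalg.norm(Y_train.T - y, axis=1)   # distance to each training point
        nearest = np.argsort(d)[:k]
        preds.append(np.bincount(labels[nearest]).argmax())
    return np.array(preds)

Y_train = np.array([[0.0, 0.1, 1.0, 1.1]])          # 1-D transformed features
labels = np.array([0, 0, 1, 1])
Y_test = np.array([[0.05, 1.05]])
preds = knn_predict(Y_train, labels, Y_test, k=3)
```

Each test point is assigned the class of the cluster it falls nearest to in the transformed space.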
Hereinafter, an information processing 120 according to the first embodiment of the present disclosure, performed on multiple initial data with multi-dimensional labels, will be described with reference to Fig. 2, as an example of the information processing method provided according to the first aspect of the present disclosure. The information processing 120 can be performed, for example, by the information processor 100.
After the information processing 120 starts, it first enters step S101. In step S101, according to the initial data with multi-dimensional labels, the original feature vector representing the original features of each initial data is generated, and the process proceeds to step S102. Step S101 can be performed, for example, by the original feature vector generating unit 101, and its details are not repeated here.
In step S102, according to the initial data with multi-dimensional labels, the label vector representing the multi-dimensional labels possessed by each initial data is generated, and the process proceeds to step S103. Step S102 can be performed, for example, by the label vector generating unit 102, and its details are not repeated here.
In step S103, based on the label vector of each data, for each initial data, the label similarity between this initial data and each other initial data in the label vector space is calculated, and the process proceeds to step S104. Step S103 can be performed, for example, by the label similarity determining unit 103, and its details are not repeated here.
In step S104, for each initial data and each other initial data, based on the label similarity, it is determined whether that other initial data is related data of this initial data, and the process proceeds to step S105. Step S104 can be performed, for example, by the related data determining unit 104, and its details are not repeated here.
In step S105, for each initial data and each other initial data, based on the label similarity, it is determined whether that other initial data is non-related data of this initial data, and the process proceeds to step S106. Step S105 can be performed, for example, by the non-related data determining unit 114, and its details are not repeated here.
In step S106, based on the original feature vectors of the initial data, for each initial data, the characteristic similarity between this initial data and each other initial data in the original feature vector space is calculated, and the process proceeds to step S107. Step S106 can be performed, for example, by the characteristic similarity determining unit 105, and its details are not repeated here.
In step S107, based on the related data relations between the initial data and the characteristic similarities, for each initial data, multiple neighbor related data of this initial data are selected from the related data of this initial data, and the process proceeds to step S108. Step S107 can be performed, for example, by the neighbor related data selecting unit 106, and its details are not repeated here.
In step S108, based on the neighbor related data of each initial data, each initial data and its neighbor related data are taken as nodes, an edge is formed between the node of each initial data and the node of each of its neighbor related data, and a weight greater than or equal to zero is set for each edge, thereby forming the neighbor correlation graph, and the process proceeds to step S109. Step S108 can be performed, for example, by the neighbor correlation graph generating unit 107, and its details are not repeated here.
In step S109, based on the non-related data relations between the initial data and the characteristic similarities, for each initial data, multiple neighbor non-related data of this initial data are selected from the non-related data of this initial data, and the process proceeds to step S110. Step S109 can be performed, for example, by the neighbor non-related data selecting unit 116, and its details are not repeated here.
In step S110, based on the neighbor non-related data of each initial data, each initial data and its neighbor non-related data are taken as nodes, an edge is formed between the node of each initial data and the node of each of its neighbor non-related data, and a weight greater than or equal to zero is set for each edge, thereby forming the neighbor non-correlation graph, and the process proceeds to step S111. Step S110 can be performed, for example, by the neighbor non-correlation graph generating unit 117, and its details are not repeated here.
In step S111, based on the neighbor correlation graph and the neighbor non-correlation graph, the target transformation matrix is solved for, the eigentransformation is performed on the multiple initial data according to this target transformation matrix, and the process ends. Here, the target transformation matrix represents the linear transformation that maximizes an objective function which is negatively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges in the neighbor correlation graph, and positively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges in the neighbor non-correlation graph. Step S111 can be performed, for example, by the eigentransformation unit 108, and its details are not repeated here.
Thus, through the information processing 120, a linear transformation has been obtained that draws the weighted distances between the neighbor related data in the neighbor correlation graph after the eigentransformation as close as possible, while pushing the weighted distances between the neighbor non-related data in the neighbor non-correlation graph after the eigentransformation as far apart as possible, and the initial data after this linear transformation can then be obtained.
Further, classification can additionally be performed based on the initial data after this linear transformation. In particular, initial data such as images or text can be processed, so as to classify them according to both the multi-dimensional labels possessed by the images or text and the original features of the images or text themselves.
Moreover, compared with traditional locality preserving projection eigentransformation methods, classification based on the initial data after this linear transformation can, while retaining the local information of the data, also make use of the label information that the data possess.
Furthermore, compared with most existing eigentransformation methods, classification based on the initial data after this linear transformation is applicable to data with multi-dimensional labels.
In addition, since classification based on the initial data after this linear transformation can both retain the local neighbor information of the data and make use of the label information that the data possess, it is particularly suitable for nearest neighbor classification algorithms such as the k-nearest-neighbor algorithm.
(Second embodiment)
First, an information processor 200 according to the second embodiment of the present disclosure will be described with reference to Fig. 3, as another example of the information processor provided according to the first aspect of the present disclosure.
The information processor 200 includes an original feature vector generating unit 201, a label vector generating unit 202, a label similarity determining unit 203, a related data determining unit 204, a characteristic similarity determining unit 205, a neighbor related data selecting unit 206, a neighbor correlation graph generating unit 207, and an eigentransformation unit 208. These component units are functionally similar to the original feature vector generating unit 101, the label vector generating unit 102, the label similarity determining unit 103, the related data determining unit 104, the characteristic similarity determining unit 105, the neighbor related data selecting unit 106, the neighbor correlation graph generating unit 107, and the eigentransformation unit 108 of the information processor 100, respectively. Therefore, for each component unit of the information processor 200 below, functions and operations similar to those of the component units of the information processor 100 will not be described in detail.
The original feature vector generating unit 201, according to the received initial data with multi-dimensional labels, generates for each initial data the original feature vector representing the original features of this initial data, and provides it to the characteristic similarity determining unit 205.
The label vector generating unit 202, according to the received initial data with multi-dimensional labels, generates for each initial data the label vector representing the multi-dimensional labels possessed by this initial data, and provides it to the label similarity determining unit 203.
The label similarity determining unit 203, based on the received label vector of each data, calculates for each initial data the label similarity between this initial data and each other initial data in the label vector space, and provides it to the related data determining unit 204 and the neighbor correlation graph generating unit 207.
The related data determining unit 204 determines, for each initial data and each other initial data, whether that other initial data is related data of this initial data, based on the received label similarity, and supplies the determination result to the neighbor related data selecting unit 206.
The characteristic similarity determining unit 205 calculates, based on the received original feature vectors of the initial data, for each initial data, the characteristic similarity between this initial data and each other initial data in the original feature vector space, and provides it to the neighbor related data selecting unit 206 and the neighbor correlation graph generating unit 207.
The neighbor related data selecting unit 206 selects, for each initial data, based on the received related data relations between the initial data and the received characteristic similarities, multiple neighbor related data of this initial data from among the related data of this initial data, and provides them to the neighbor correlation graph generating unit 207.
The neighbor correlation graph generating unit 207, based on the received neighbor related data of each initial data, takes each initial data and its neighbor related data as nodes, forms an edge between the node of each initial data and the node of each of its neighbor related data, and sets for each edge a weight greater than or equal to zero, thereby forming the neighbor correlation graph, which it provides to the eigentransformation unit 208.
The feature transformation unit 208, based on the received neighbor-related graph, solves for a target transformation matrix, performs the feature transformation on the plurality of original data according to this target transformation matrix, and outputs the feature-transformed original data. Here, the target transformation matrix represents the linear transformation that maximizes an objective function, and this objective function is negatively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges of the neighbor-related graph.
The purpose of the feature transformation performed by the feature transformation unit 208 is to pull together, as far as possible, the weighted distances between the neighbor related data in the neighbor-related graph after the feature transformation, i.e., to realize the objective function of formula (14):

min Σ_{i,j} (a^T x_i − a^T x_j)^2 W_{r,ij}    (14)

where a^T is the linear transformation (i.e., the feature transformation) applied to the original data.
As is common practice in locality preserving projection feature transformation methods, using the Laplacian of the neighbor-related graph, realizing the objective function of formula (14) can be converted into the optimization problem shown in formula (15):

min_a a^T X L_r X^T a
s.t. a^T X D_r X^T a = 1    (15)

where D_r = diag(sum(W_r)) and the Laplacian of the neighbor-related graph is L_r = D_r − W_r.
Solving formula (15) is equivalent to solving the generalized eigenvalue problem of formula (16):

X L_r X^T a = λ X D_r X^T a    (16)

Let a_1, a_2, …, a_m be the eigenvectors corresponding to the eigenvalues of formula (16) in the order 0 < λ_1 < λ_2 < … < λ_m; then the feature transformation matrix A = (a_1, a_2, …, a_m) is obtained, and y_i = A^T x_i is the transformed feature.
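Formulas (14) to (16) can be illustrated with a small numerical sketch, assuming NumPy and SciPy are available; the ridge term added to the right-hand-side matrix is a numerical convenience of this sketch, not part of the embodiment:

```python
import numpy as np
from scipy.linalg import eigh

def lpp_transform(X, Wr, m):
    """Solve X Lr X^T a = lambda X Dr X^T a (formula (16)) and return
    the transformation matrix A whose columns are the eigenvectors of
    the m smallest eigenvalues, together with the transformed data.

    X is a d x n matrix whose columns are the original feature vectors.
    """
    Dr = np.diag(Wr.sum(axis=1))         # Dr = diag(sum(Wr))
    Lr = Dr - Wr                         # Laplacian of the neighbor-related graph
    M1 = X @ Lr @ X.T
    # small ridge keeps the right-hand side positive definite
    M2 = X @ Dr @ X.T + 1e-9 * np.eye(X.shape[0])
    vals, vecs = eigh(M1, M2)            # generalized problem, ascending eigenvalues
    A = vecs[:, :m]                      # smallest eigenvalues minimize formula (14)
    Y = A.T @ X                          # y_i = A^T x_i, column-wise
    return A, Y
```

SciPy normalizes each eigenvector so that a^T (X Dr X^T) a = 1, which is exactly the constraint of formula (15).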
It should be noted here that the objective function to be minimized is positively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges of the neighbor-related graph; in other words, if the objective function is instead required to attain a maximum, then it is negatively correlated with that weighted length sum.
Thus, the information processing apparatus 200 obtains the linear transformation that pulls the weighted distances between the neighbor related data in the neighbor-related graph together as far as possible after the feature transformation, and can then obtain the original data after this linear transformation. The information processing apparatus 200 may further classify the original data based on the result of this linear transformation. In particular, the information processing apparatus 200 can process original data such as images or text and classify them according to both the multi-dimensional labels they carry and the original features of the images or text themselves.

Moreover, compared with traditional locality preserving projection feature transformation methods, classifying based on the original data after this linear transformation can exploit the label information of the data while still preserving its local information.

Furthermore, compared with most existing feature transformation methods, classifying based on the original data after this linear transformation is applicable to data with multi-dimensional labels.

In addition, since classification based on the original data after this linear transformation can use both the local neighbor information of the data and its label information, it is especially suited to nearest-neighbor classification algorithms such as the k-nearest-neighbor algorithm.
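The nearest-neighbor classification mentioned above can be illustrated, for the simple single-label voting case, by the following sketch; the embodiment does not prescribe any particular classifier, so this is only one possible use of the transformed features y_i:

```python
import numpy as np

def knn_classify(Y_train, labels, y_query, k=3):
    """Classify a query in the transformed feature space by the
    k-nearest-neighbor rule: take the k training points closest to
    the query and vote their labels. Y_train holds the transformed
    training features column-wise; labels holds one label per column.
    """
    # Euclidean distance from the query to every training column
    d = np.linalg.norm(Y_train - y_query[:, None], axis=0)
    nearest = np.argsort(d)[:k]
    votes = {}
    for i in nearest:
        votes[labels[i]] = votes.get(labels[i], 0) + 1
    return max(votes, key=votes.get)
```

Because the transformation both preserves local neighborhoods and pulls label-related data together, such a distance-based vote benefits from both sources of information.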
Hereinafter, the information processing 220 performed on a plurality of original data with multi-dimensional labels according to the second embodiment of the present disclosure will be described with reference to Fig. 4, as another example of the information processing method provided by the first aspect of the disclosure. The information processing 220 can be performed, for example, by the information processing apparatus 200.
After the information processing 220 starts, it first enters step S201. In step S201, for each original datum with multi-dimensional labels, an original feature vector representing the original features of that datum is generated, and the process proceeds to step S202. Step S201 can be performed, for example, by the original feature vector generation unit 201.

In step S202, for each original datum with multi-dimensional labels, a label vector representing the multi-dimensional labels of that datum is generated, and the process proceeds to step S203. Step S202 can be performed, for example, by the label vector generation unit 202.

In step S203, based on the label vector of each datum, the label similarity between each original datum and each other original datum in the label vector space is computed, and the process proceeds to step S204. Step S203 can be performed, for example, by the label similarity determination unit 203.

In step S204, for each original datum and each other original datum, whether the other original datum is related data of this original datum is determined based on the label similarity, and the process proceeds to step S205. Step S204 can be performed, for example, by the related-data determination unit 204.
In step S205, based on the original feature vectors of the original data, the feature similarity between each original datum and each other original datum in the original feature vector space is computed, and the process proceeds to step S206. Step S205 can be performed, for example, by the feature similarity determination unit 205.

In step S206, based on the related-data relations between the original data and on the feature similarities, a plurality of neighbor related data of each original datum are selected from among the related data of that datum, and the process proceeds to step S207. Step S206 can be performed, for example, by the neighbor-related-data selection unit 206.

In step S207, based on the neighbor related data of each original datum, each original datum and its neighbor related data are taken as nodes, an edge is formed between the node of each original datum and the node of each of its neighbor related data, and each edge is assigned a weight greater than or equal to zero, thereby forming the neighbor-related graph; the process then proceeds to step S208. Step S207 can be performed, for example, by the neighbor-related-graph generation unit 207.

In step S208, based on the neighbor-related graph, the target transformation matrix is solved for and the feature transformation is performed on the plurality of original data according to this matrix, after which the process ends. Here, the target transformation matrix represents the linear transformation that maximizes an objective function that is negatively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges of the neighbor-related graph. Step S208 can be performed, for example, by the feature transformation unit 208.
Thus, through the information processing 220, the linear transformation that pulls the weighted distances between the neighbor related data in the neighbor-related graph together as far as possible after the feature transformation is obtained, and the original data after this linear transformation can then be obtained.

Further, classification can be based on the original data after this linear transformation. In particular, original data such as images or text can be processed and classified according to both the multi-dimensional labels they carry and the original features of the images or text themselves.

Moreover, compared with traditional locality preserving projection feature transformation methods, classifying based on the original data after this linear transformation can exploit the label information of the data while still preserving its local information.

Furthermore, compared with most existing feature transformation methods, classifying based on the original data after this linear transformation is applicable to data with multi-dimensional labels.

In addition, since classification based on the original data after this linear transformation can use both the local neighbor information of the data and its label information, it is especially suited to nearest-neighbor classification algorithms such as the k-nearest-neighbor (KNN) algorithm.
(Third embodiment)

First, the information processing apparatus 300 according to the third embodiment of the present disclosure will be described with reference to Fig. 5, as another example of the information processing apparatus provided by the second aspect of the disclosure.
The information processing apparatus 300 includes an original feature vector generation unit 301, a label vector generation unit 302, a label similarity determination unit 303, an unrelated-data determination unit 314, a feature similarity determination unit 305, a neighbor-unrelated-data selection unit 316, a neighbor-unrelated-graph generation unit 317, and a feature transformation unit 308. These component units are functionally similar to the original feature vector generation unit 101, label vector generation unit 102, label similarity determination unit 103, unrelated-data determination unit 114, feature similarity determination unit 105, neighbor-unrelated-data selection unit 116, neighbor-unrelated-graph generation unit 117, and feature transformation unit 108 of the information processing apparatus 100, respectively. Therefore, for each component unit of the information processing apparatus 300, functions and operations similar to those of the corresponding component unit of the information processing apparatus 100 will not be described in detail below.
The original feature vector generation unit 301 generates, from the received original data with multi-dimensional labels, an original feature vector representing the original features of each original datum, and supplies it to the feature similarity determination unit 305.

The label vector generation unit 302 generates, from the received original data with multi-dimensional labels, a label vector representing the multi-dimensional labels of each original datum, and supplies it to the label similarity determination unit 303.
The label similarity determination unit 303, based on the received label vector of each datum, computes for each original datum the label similarity between this original datum and each other original datum in the label vector space, and supplies it to the unrelated-data determination unit 314 and the neighbor-unrelated-graph generation unit 317.

The unrelated-data determination unit 314, for each original datum and each other original datum, determines on the basis of the received label similarity whether that other original datum is unrelated data of this original datum, and supplies the determination result to the neighbor-unrelated-data selection unit 316.
The feature similarity determination unit 305, based on the received original feature vectors of the original data, computes for each original datum the feature similarity between this original datum and each other original datum in the original feature vector space, and supplies it to the neighbor-unrelated-data selection unit 316 and the neighbor-unrelated-graph generation unit 317.

The neighbor-unrelated-data selection unit 316, based on the received unrelated-data relations between the original data and on the received feature similarities, selects for each original datum a plurality of neighbor unrelated data from among the unrelated data of this original datum, and supplies them to the neighbor-unrelated-graph generation unit 317.
The neighbor-unrelated-graph generation unit 317, based on the received neighbor unrelated data of each original datum, takes each original datum and its neighbor unrelated data as nodes, forms an edge between the node of each original datum and the node of each of its neighbor unrelated data, and sets for each edge a weight greater than or equal to zero, thereby forming the neighbor-unrelated graph, which it supplies to the feature transformation unit 308.
The feature transformation unit 308, based on the received neighbor-unrelated graph, solves for a target transformation matrix, performs the feature transformation on the plurality of original data according to this target transformation matrix, and outputs the feature-transformed original data. Here, the target transformation matrix represents the linear transformation that maximizes an objective function, and this objective function is positively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges of the neighbor-unrelated graph.
The purpose of the feature transformation performed by the feature transformation unit 308 is to push apart, as far as possible, the weighted distances between the neighbor unrelated data in the neighbor-unrelated graph after the feature transformation, i.e., to realize the objective function of formula (17):

max Σ_{i,j} (a^T x_i − a^T x_j)^2 W_{ir,ij}    (17)

where a^T is the linear transformation (i.e., the feature transformation) applied to the original data.
As is common practice in locality preserving projection feature transformation methods, using the Laplacian of the neighbor-unrelated graph, realizing the objective function of formula (17) can be converted into the optimization problem shown in formula (18):

min_a a^T X W_ir X^T a
s.t. a^T X D_ir X^T a = 1    (18)

where D_ir = diag(sum(W_ir)) and the Laplacian of the neighbor-unrelated graph is L_ir = D_ir − W_ir; under the constraint of formula (18), maximizing a^T X L_ir X^T a is equivalent to minimizing a^T X W_ir X^T a.
Solving formula (18) is equivalent to solving the generalized eigenvalue problem of formula (19):

X W_ir X^T a = λ X D_ir X^T a    (19)

Let a_1, a_2, …, a_m be the eigenvectors corresponding to the eigenvalues of formula (19) in the order 0 < λ_1 < λ_2 < … < λ_m; then the feature transformation matrix A = (a_1, a_2, …, a_m) is obtained, and y_i = A^T x_i is the transformed feature.
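Formulas (17) to (19) admit the same kind of numerical sketch as the neighbor-related case, again assuming SciPy is available and with the same illustrative ridge term; the smallest eigenvalues of formula (19) are taken because minimizing a^T X W_ir X^T a under the constraint maximizes formula (17):

```python
import numpy as np
from scipy.linalg import eigh

def lpp_unrelated_transform(X, Wir, m):
    """Solve X Wir X^T a = lambda X Dir X^T a (formula (19)); the
    eigenvectors of the m smallest eigenvalues push the weighted edge
    lengths of the neighbor-unrelated graph apart under the constraint
    a^T X Dir X^T a = 1. X is d x n, columns are feature vectors.
    """
    Dir = np.diag(Wir.sum(axis=1))       # Dir = diag(sum(Wir))
    M1 = X @ Wir @ X.T
    # ridge keeps the right-hand side positive definite (sketch only)
    M2 = X @ Dir @ X.T + 1e-9 * np.eye(X.shape[0])
    vals, vecs = eigh(M1, M2)            # ascending eigenvalues
    A = vecs[:, :m]
    return A, A.T @ X                    # y_i = A^T x_i
```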
It should be noted that the objective function here is positively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges of the neighbor-unrelated graph.
Thus, the information processing apparatus 300 obtains the linear transformation that pushes the weighted distances between the neighbor unrelated data in the neighbor-unrelated graph apart as far as possible after the feature transformation, and can then obtain the original data after this linear transformation. The information processing apparatus 300 may further classify the original data based on the result of this linear transformation. In particular, the information processing apparatus 300 can process original data such as images or text and classify them according to both the multi-dimensional labels they carry and the original features of the images or text themselves.

Moreover, compared with traditional locality preserving projection feature transformation methods, classifying based on the original data after this linear transformation can exploit the label information of the data while still preserving its local information.

Furthermore, compared with most existing feature transformation methods, classifying based on the original data after this linear transformation is applicable to data with multi-dimensional labels.

In addition, since classification based on the original data after this linear transformation can use both the local neighbor information of the data and its label information, it is especially suited to nearest-neighbor classification algorithms such as the k-nearest-neighbor (KNN) algorithm.
Hereinafter, the information processing 320 performed on a plurality of original data with multi-dimensional labels according to the third embodiment of the present disclosure will be described with reference to Fig. 6, as an example of the information processing method provided by the second aspect of the disclosure. The information processing 320 can be performed, for example, by the information processing apparatus 300.
After the information processing 320 starts, it first enters step S301. In step S301, for each original datum with multi-dimensional labels, an original feature vector representing the original features of that datum is generated, and the process proceeds to step S302. Step S301 can be performed, for example, by the original feature vector generation unit 301.

In step S302, for each original datum with multi-dimensional labels, a label vector representing the multi-dimensional labels of that datum is generated, and the process proceeds to step S303. Step S302 can be performed, for example, by the label vector generation unit 302.

In step S303, based on the label vector of each datum, the label similarity between each original datum and each other original datum in the label vector space is computed, and the process proceeds to step S304. Step S303 can be performed, for example, by the label similarity determination unit 303.

In step S304, for each original datum and each other original datum, whether the other original datum is unrelated data of this original datum is determined based on the label similarity, and the process proceeds to step S305. Step S304 can be performed, for example, by the unrelated-data determination unit 314.
In step S305, based on the original feature vectors of the original data, the feature similarity between each original datum and each other original datum in the original feature vector space is computed, and the process proceeds to step S306. Step S305 can be performed, for example, by the feature similarity determination unit 305.

In step S306, based on the unrelated-data relations between the original data and on the feature similarities, a plurality of neighbor unrelated data of each original datum are selected from among the unrelated data of that datum, and the process proceeds to step S307. Step S306 can be performed, for example, by the neighbor-unrelated-data selection unit 316.

In step S307, based on the neighbor unrelated data of each original datum, each original datum and its neighbor unrelated data are taken as nodes, an edge is formed between the node of each original datum and the node of each of its neighbor unrelated data, and each edge is assigned a weight greater than or equal to zero, thereby forming the neighbor-unrelated graph; the process then proceeds to step S308. Step S307 can be performed, for example, by the neighbor-unrelated-graph generation unit 317.

In step S308, based on the neighbor-unrelated graph, the target transformation matrix is solved for and the feature transformation is performed on the plurality of original data according to this matrix, after which the process ends. Here, the target transformation matrix represents the linear transformation that maximizes an objective function that is positively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges of the neighbor-unrelated graph. Step S308 can be performed, for example, by the feature transformation unit 308.
Thus, through the information processing 320, the linear transformation that pushes the weighted distances between the neighbor unrelated data in the neighbor-unrelated graph apart as far as possible after the feature transformation is obtained, and the original data after this linear transformation can then be obtained.

Further, classification can be based on the original data after this linear transformation. In particular, original data such as images or text can be processed and classified according to both the multi-dimensional labels they carry and the original features of the images or text themselves.

Moreover, compared with traditional locality preserving projection feature transformation methods, classifying based on the original data after this linear transformation can exploit the label information of the data while still preserving its local information.

Furthermore, compared with most existing feature transformation methods, classifying based on the original data after this linear transformation is applicable to data with multi-dimensional labels.

In addition, since classification based on the original data after this linear transformation can use both the local neighbor information of the data and its label information, it is especially suited to nearest-neighbor classification algorithms such as the k-nearest-neighbor algorithm.
(Hardware configuration embodiment)

The component units, sub-units, etc. of the information processing apparatus according to the embodiments of the present disclosure described above can be configured by software, firmware, hardware, or any combination thereof. In the case of implementation by software or firmware, a program constituting that software or firmware can be installed, from a storage medium or a network, onto a machine having a dedicated hardware structure (such as the general-purpose machine 700 shown in Fig. 7); that machine, when the various programs are installed, is capable of performing the various functions of the component units and sub-units described above.
Fig. 7 is a structural diagram schematically showing one possible hardware configuration of an information processing device that can be used to implement the information processing method and the information processing apparatus according to the embodiments of the present disclosure.
In Fig. 7, a central processing unit (CPU) 701 performs various processing according to programs stored in a read-only memory (ROM) 702 or programs loaded from a storage portion 708 into a random access memory (RAM) 703. The RAM 703 also stores, as needed, the data required when the CPU 701 performs the various processing. The CPU 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output interface 705 is also connected to the bus 704.
The following components are also connected to the input/output interface 705: an input portion 706 (including a keyboard, a mouse, etc.), an output portion 707 (including a display, such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, etc.), the storage portion 708 (including a hard disk, etc.), and a communication portion 709 (including a network interface card such as a local area network (LAN) card, a modem, etc.). The communication portion 709 performs communication processing via a network such as the Internet. A drive 710 can also be connected to the input/output interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, can be mounted on the drive 710 as needed, so that a computer program read therefrom is installed into the storage portion 708 as needed.
In the case where the above series of processing is implemented by software, a program constituting the software can be installed from a network such as the Internet or from a storage medium such as the removable medium 711.

Those skilled in the art will understand that this storage medium is not limited to the removable medium 711 shown in Fig. 7 in which the program is stored and which is distributed separately from the device in order to provide the program to the user. Examples of the removable medium 711 include magnetic disks (including floppy disks), optical disks (including compact disc read-only memories (CD-ROMs) and digital versatile discs (DVDs)), magneto-optical disks (including MiniDiscs (MDs) (registered trademark)), and semiconductor memories. Alternatively, the storage medium can be the ROM 702, a hard disk contained in the storage portion 708, or the like, in which the program is stored and which is distributed to the user together with the device containing it.
In addition, the present disclosure also proposes a program product storing machine-readable instruction code. When the instruction code is read and executed by a machine, the information processing method according to the embodiments of the present disclosure described above can be performed. Accordingly, the various storage media for carrying such a program product, such as magnetic disks, optical disks, magneto-optical disks, and semiconductor memories, are also included in the technical scheme of the present disclosure.

It should further be understood that each operation of the information processing method according to the embodiments of the present disclosure can also be realized in the form of a computer-executable program stored in various machine-readable storage media.
It should be noted that each component unit of the information processing apparatus according to the embodiments of the present disclosure may be a separate component, or the functions of several component units may be realized by a single component.

It should further be noted that the steps of the information processing method according to the present disclosure need not be performed in the order described in this disclosure, but may be performed in parallel or on demand. For example, in the information processing 120, step S102 need not necessarily be performed after step S101, step S103 need not necessarily be performed after step S102, step S106 need not necessarily be performed after any of steps S101 and S103 to S105, steps S107 and S108 need not necessarily be performed after step S106, and steps S109 and S110 need not necessarily be performed after step S107 or S108. The same applies to the information processing 220 and 320.
While preferred embodiments of the present disclosure have been shown and described, it is contemplated that those skilled in the art may devise various modifications to the present invention within the spirit and scope of the appended claims.

From the above description, the embodiments of the present disclosure disclose, but are not limited to, the following technical schemes:
Technical scheme 1. An information processing apparatus for performing a feature transformation on a plurality of original data with multi-dimensional labels, the information processing apparatus including:

an original feature vector generation unit configured to generate, for each original datum, an original feature vector representing the original features of this original datum;

a label vector generation unit configured to generate, for each original datum, a label vector representing the multi-dimensional labels of this original datum;

a label similarity determination unit configured to compute, for each original datum, the label similarity between this original datum and each other original datum in the label vector space;

a related-data determination unit configured to determine, for each original datum, whether each other original datum is related data of this original datum, based on the label similarity between the other original datum and this original datum;

a feature similarity determination unit configured to compute, for each original datum, the feature similarity between this original datum and each other original datum in the original feature vector space;

a neighbor-related-data selection unit configured to select, for each original datum, a plurality of neighbor related data of this original datum from among the related data of this original datum, based on the feature similarity between each related datum and this original datum;

a neighbor-related-graph generation unit configured to take each original datum and its neighbor related data as nodes, form an edge between the nodes corresponding to each original datum and each of its neighbor related data, and set for each edge a weight greater than or equal to zero, thereby forming a neighbor-related graph; and

a feature transformation unit configured to solve for a target transformation matrix and perform the feature transformation on the plurality of original data according to the target transformation matrix, wherein the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space after the linear transformation, of all edges of the neighbor-related graph.
Technical scheme 2. The information processing apparatus according to technical scheme 1, wherein the label similarity determination unit is further configured to compute the label similarity according to the distance between each original datum and each other original datum in the label vector space and a label correlation matrix.
Technical scheme 3. The information processing apparatus according to technical scheme 2, wherein the distance between each original datum and each other original datum in the label vector space is a cosine distance or a Euclidean distance.
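As an illustrative reading of technical schemes 2 and 3, a label similarity built from the cosine form and an optional label correlation matrix C might be computed as follows; the exact way C enters the computation is not fixed by the schemes, so the bilinear form used here is only one plausible choice, and C = I recovers the plain cosine similarity:

```python
import numpy as np

def label_similarity(t_i, t_j, C=None):
    """Cosine-style similarity of two multi-dimensional label vectors,
    optionally weighted through a label correlation matrix C so that
    related but distinct labels can still contribute."""
    if C is None:
        C = np.eye(len(t_i))        # no cross-label correlation
    num = t_i @ C @ t_j
    den = np.sqrt(t_i @ C @ t_i) * np.sqrt(t_j @ C @ t_j)
    return 0.0 if den == 0 else float(num / den)
```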
Technical scheme 4. The information processing apparatus according to any one of technical schemes 1 to 3, wherein the related-data determination unit is further configured to determine, for each original datum and each other original datum, that the other original datum is related data of this original datum when the label similarity between this original datum and the other original datum is greater than or equal to a first label threshold.
Technical scheme 5. The information processing apparatus according to any one of technical schemes 1 to 3, wherein the feature similarity determination unit is further configured to compute the feature similarity according to the distance between each original datum and each other original datum in the original feature vector space.
Technical scheme 6. The information processing apparatus according to technical scheme 5, wherein the distance between each original datum and each other original datum in the feature vector space is a Euclidean distance, a Manhattan distance, or a chi-square distance.
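The three candidate distances of technical scheme 6 can be written compactly as below; the chi-square variant shown uses the common histogram convention, summing (x−y)²/(x+y) over nonzero bins, which the scheme itself does not spell out:

```python
import numpy as np

def feature_distance(x, y, metric="euclidean"):
    """Distance between two original feature vectors under the three
    candidate metrics named in technical scheme 6."""
    if metric == "euclidean":
        return float(np.linalg.norm(x - y))
    if metric == "manhattan":
        return float(np.abs(x - y).sum())
    if metric == "chi-square":
        s = x + y
        mask = s > 0                     # skip empty bins
        return float(((x - y)[mask] ** 2 / s[mask]).sum())
    raise ValueError(metric)
```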
Technical scheme 7. The information processing apparatus according to any one of technical schemes 1 to 6, wherein the neighbor-related-graph generation unit is further configured to set, for each edge, the weight of the edge to be positively correlated with at least one of the feature similarity and the label similarity between the two original data corresponding to the two nodes connected by the edge.
Technical scheme 8. The information processing apparatus according to any one of technical schemes 1 to 7, wherein the neighbor-related-data selection unit is further configured to select, for each original datum, from among all related data of this original datum, the first predetermined number of related data with the greatest feature similarity to this original datum as the neighbor related data of this original datum.
Technical scheme 9. The information processor according to technical scheme 8, wherein
the neighbour-related graph generating unit is further configured to, for each edge: if the initial data corresponding to the other node connected by the edge is among the first predetermined number of other initial data having the largest characteristic similarity with the initial data corresponding to one node connected by the edge, set the weight of the edge to 1; otherwise, set the weight of the edge to a value that is positively correlated with at least one of the characteristic similarity and the label similarity between the two initial data corresponding to the two nodes connected by the edge and that is less than or equal to 1.
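A minimal sketch of one plausible reading of this weight rule, under the assumption that "being among each other's first predetermined number of most similar data" amounts to a mutual nearest-neighbour test; the function name and data layout are hypothetical:

```python
import numpy as np

def neighbour_related_weights(feat_sim, neighbours):
    """Hypothetical edge weights for the neighbour-related graph.

    feat_sim   : (n, n) symmetric characteristic-similarity matrix
                 with values in [0, 1]
    neighbours : dict mapping each index i to the indices of i's
                 neighbour related data
    """
    n = feat_sim.shape[0]
    W = np.zeros((n, n))
    for i, nbrs in neighbours.items():
        for j in nbrs:
            if i in neighbours.get(j, []):
                w = 1.0                       # mutual neighbours: weight 1
            else:
                w = min(feat_sim[i, j], 1.0)  # similarity-correlated, <= 1
            W[i, j] = W[j, i] = w
    return W
```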
Technical scheme 10. The information processor according to any one of technical schemes 1 to 9, further including:
a non-relevant data determining unit configured to, for each initial data, determine, based on the label similarity between this initial data and each other initial data, whether each other initial data is non-relevant data of this initial data;
a neighbour non-relevant data selecting unit configured to, for each initial data, select a plurality of neighbour non-relevant data of this initial data from the non-relevant data of this initial data, based on the characteristic similarity between each non-relevant data of this initial data and this initial data; and
a neighbour-unrelated graph generating unit configured to take each initial data and the neighbour non-relevant data of this initial data as nodes, form an edge between the nodes corresponding to this initial data and each neighbour non-relevant data of this initial data, and set a weight greater than or equal to zero for each edge, thereby forming a neighbour-unrelated graph; and wherein
the objective function is negatively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour-related graph, and positively correlated with the sum of the weighted lengths, in that feature space, of all edges in the neighbour-unrelated graph.
Technical scheme 11. The information processor according to technical scheme 10, wherein
the related data determining unit is further configured to, for each initial data and each other initial data, determine that the other initial data is related data of this initial data when the label similarity between this initial data and the other initial data is greater than or equal to a first label threshold;
the non-relevant data determining unit is further configured to, for each initial data and each other initial data, determine that the other initial data is non-relevant data of this initial data when the label similarity between this initial data and the other initial data is less than a second label threshold; and
the first label threshold is greater than or equal to the second label threshold.
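The two-threshold rule above can be sketched as follows; the function name and the matrix representation of pairwise label similarities are assumptions for illustration:

```python
import numpy as np

def partition_by_label_similarity(label_sim, t1, t2):
    """Two-threshold rule of technical scheme 11 (t1 >= t2).

    Pairs with label similarity >= t1 are related, pairs with
    similarity < t2 are non-relevant, and pairs in between are
    neither.  Self-pairs are excluded.
    """
    assert t1 >= t2, "first threshold must not be below the second"
    related = label_sim >= t1
    non_relevant = label_sim < t2
    np.fill_diagonal(related, False)
    np.fill_diagonal(non_relevant, False)
    return related, non_relevant
```

Because `t1 >= t2`, no pair can be both related and non-relevant, which keeps the two graphs disjoint on each edge.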
Technical scheme 12. The information processor according to technical scheme 10 or 11, wherein
the neighbour non-relevant data selecting unit is further configured to, for each initial data, select, from the non-relevant data of this initial data, a second predetermined number of non-relevant data having the largest characteristic similarity with this initial data as the neighbour non-relevant data of this initial data.
Technical scheme 13. The information processor according to any one of technical schemes 10 to 12, wherein
the neighbour-unrelated graph generating unit is further configured to, for each edge, set the weight of the edge to be positively correlated with at least one of the characteristic similarity and the label similarity between the two initial data corresponding to the two nodes connected by the edge.
Technical scheme 14. The information processor according to any one of technical schemes 10 to 13, wherein
the neighbour-unrelated graph generating unit is further configured to, for each edge: if the characteristic similarity between the two initial data corresponding to the two nodes connected by the edge is greater than the maximum of the characteristic similarities between one of the two initial data and all related data of that initial data, set the weight of the edge to be positively correlated with the characteristic similarity between the two initial data corresponding to the two nodes connected by the edge; otherwise, set the weight of the edge to 0.
Technical scheme 15. The information processor according to any one of technical schemes 1 to 14, which classifies the initial data by performing the feature transformation.
Technical scheme 16. The information processor according to technical scheme 15, wherein the plurality of initial data are a plurality of image data or a plurality of text data.
Technical scheme 17. An information processor for performing feature transformation on a plurality of initial data having multi-dimensional labels, the information processor including:
an original feature vector generating unit configured to, for each initial data, generate an original feature vector representing the original features of this initial data;
a label vector generating unit configured to, for each initial data, generate a label vector representing the multi-dimensional labels of this initial data;
a label similarity determining unit configured to, for each initial data, calculate the label similarity between this initial data and each other initial data in the label vector space, and determine, based on the label similarity, whether the other initial data is non-relevant data of this initial data;
a characteristic similarity determining unit configured to, for each initial data, calculate the characteristic similarity between this initial data and each other initial data in the original feature vector space;
a neighbour non-relevant data selecting unit configured to, for each initial data, select a plurality of neighbour non-relevant data of this initial data from all non-relevant data of this initial data, based on the characteristic similarity with this initial data;
a neighbour-unrelated graph generating unit configured to take each initial data and the neighbour non-relevant data of this initial data as nodes, form an edge between the nodes corresponding to this initial data and each neighbour non-relevant data of this initial data, and set a weight greater than or equal to zero for each edge, thereby forming a neighbour-unrelated graph; and
a feature transformation unit configured to solve a target transformation matrix and perform feature transformation on the plurality of initial data according to the target transformation matrix, wherein the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is positively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour-unrelated graph.
Technical scheme 18. An information processing method for performing feature transformation on a plurality of initial data having multi-dimensional labels, the information processing method including:
for each initial data, generating an original feature vector representing the original features of this initial data;
for each initial data, generating a label vector representing the multi-dimensional labels of this initial data;
for each initial data, calculating the label similarity between this initial data and each other initial data in the label vector space;
for each initial data, determining, based on the label similarity between each other initial data and this initial data, whether the other initial data is related data of this initial data;
for each initial data, calculating the characteristic similarity between this initial data and each other initial data in the original feature vector space;
for each initial data, selecting a plurality of neighbour related data of this initial data from the related data of this initial data, based on the characteristic similarity between each related data of this initial data and this initial data;
taking each initial data and the neighbour related data of this initial data as nodes, forming an edge between the nodes corresponding to this initial data and each neighbour related data of this initial data, and setting a weight greater than or equal to zero for each edge, thereby forming a neighbour-related graph; and
solving a target transformation matrix and performing feature transformation on the plurality of initial data according to the target transformation matrix, wherein the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour-related graph.
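The method steps above have the shape of locality preserving projections (LPP). A minimal solver is sketched below under the standard LPP formulation: minimizing the weighted edge-length sum tr(AᵀXLXᵀA) subject to AᵀXDXᵀA = I, which is one way to maximize an objective negatively correlated with those lengths. This is an assumption about the formulation; the patent's exact normalization may differ:

```python
import numpy as np

def lpp_transform(X, W, dim, reg=1e-9):
    """LPP-style target transformation matrix (illustrative sketch).

    X   : (d, n) matrix of original feature vectors, one per column
    W   : (n, n) neighbour-related graph weight matrix
    dim : target dimensionality
    Returns a (d, dim) matrix A; transformed features are A.T @ X.
    """
    D = np.diag(W.sum(axis=1))
    L = D - W                                  # graph Laplacian
    M1 = X @ L @ X.T                           # edge-length term
    M2 = X @ D @ X.T + reg * np.eye(X.shape[0])  # constraint term, regularized
    # Solve the generalized eigenproblem M1 a = lambda M2 a by
    # Cholesky whitening; the smallest eigenvalues minimize the
    # weighted edge lengths, drawing neighbour related data closer.
    C = np.linalg.cholesky(M2)
    Cinv = np.linalg.inv(C)
    B = Cinv @ M1 @ Cinv.T
    B = (B + B.T) / 2                          # enforce symmetry numerically
    vals, U = np.linalg.eigh(B)                # ascending eigenvalues
    return Cinv.T @ U[:, :dim]
```

The returned columns satisfy AᵀXDXᵀA ≈ I, the usual LPP scale constraint.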
Technical scheme 19. An information processing method for performing feature transformation on a plurality of initial data having multi-dimensional labels, the information processing method including:
for each initial data, generating an original feature vector representing the original features of this initial data;
for each initial data, generating a label vector representing the multi-dimensional labels of this initial data;
for each initial data, calculating the label similarity between this initial data and each other initial data in the label vector space;
for each initial data, determining, based on the label similarity between each other initial data and this initial data, whether the other initial data is non-relevant data of this initial data;
for each initial data, calculating the characteristic similarity between this initial data and each other initial data in the original feature vector space;
for each initial data, selecting a plurality of neighbour non-relevant data of this initial data from the non-relevant data of this initial data, based on the characteristic similarity between each non-relevant data of this initial data and this initial data;
taking each initial data and the neighbour non-relevant data of this initial data as nodes, forming an edge between the nodes corresponding to this initial data and each neighbour non-relevant data of this initial data, and setting a weight greater than or equal to zero for each edge, thereby forming a neighbour-unrelated graph; and
solving a target transformation matrix and performing feature transformation on the plurality of initial data according to the target transformation matrix, wherein the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is positively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour-unrelated graph.
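When the related and unrelated graphs are used together, as in technical scheme 10, one plausible combined objective (an assumption for illustration, not the patent's exact formula) is the unrelated-graph edge-length sum minus the related-graph edge-length sum, maximized under an orthonormality constraint on A:

```python
import numpy as np

def combined_transform(X, W_rel, W_unrel, dim):
    """Illustrative combined objective over both graphs.

    Maximizes tr(A^T X (L_unrel - L_rel) X^T A) with A^T A = I, so the
    transformation pushes neighbour non-relevant data apart while
    drawing neighbour related data closer.  All names are hypothetical.
    """
    def laplacian(W):
        return np.diag(W.sum(axis=1)) - W

    M = X @ (laplacian(W_unrel) - laplacian(W_rel)) @ X.T
    M = (M + M.T) / 2                  # enforce symmetry numerically
    vals, vecs = np.linalg.eigh(M)     # ascending eigenvalues
    return vecs[:, ::-1][:, :dim]      # top eigenvectors maximize the trace
```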
Technical scheme 20. A computer program executable by a computing device, the computer program, when executed, causing the computing device to perform an information processing method for performing feature transformation on a plurality of initial data having multi-dimensional labels, the information processing method including:
for each initial data, generating an original feature vector representing the original features of this initial data;
for each initial data, generating a label vector representing the multi-dimensional labels of this initial data;
for each initial data, calculating the label similarity between this initial data and each other initial data in the label vector space;
for each initial data, determining, based on the label similarity between each other initial data and this initial data, whether the other initial data is related data of this initial data;
for each initial data, calculating the characteristic similarity between this initial data and each other initial data in the original feature vector space;
for each initial data, selecting a plurality of neighbour related data of this initial data from the related data of this initial data, based on the characteristic similarity between each related data of this initial data and this initial data;
taking each initial data and the neighbour related data of this initial data as nodes, forming an edge between the nodes corresponding to this initial data and each neighbour related data of this initial data, and setting a weight greater than or equal to zero for each edge, thereby forming a neighbour-related graph; and
solving a target transformation matrix and performing feature transformation on the plurality of initial data according to the target transformation matrix, wherein the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour-related graph.
Technical scheme 21. A computer-readable storage medium storing a computer program executable by a computing device, the computer program, when executed, causing the computing device to perform an information processing method for performing feature transformation on a plurality of initial data having multi-dimensional labels, the information processing method including:
for each initial data, generating an original feature vector representing the original features of this initial data;
for each initial data, generating a label vector representing the multi-dimensional labels of this initial data;
for each initial data, calculating the label similarity between this initial data and each other initial data in the label vector space;
for each initial data, determining, based on the label similarity between each other initial data and this initial data, whether the other initial data is related data of this initial data;
for each initial data, calculating the characteristic similarity between this initial data and each other initial data in the original feature vector space;
for each initial data, selecting a plurality of neighbour related data of this initial data from the related data of this initial data, based on the characteristic similarity between each related data of this initial data and this initial data;
taking each initial data and the neighbour related data of this initial data as nodes, forming an edge between the nodes corresponding to this initial data and each neighbour related data of this initial data, and setting a weight greater than or equal to zero for each edge, thereby forming a neighbour-related graph; and
solving a target transformation matrix and performing feature transformation on the plurality of initial data according to the target transformation matrix, wherein the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour-related graph.
Although the technical schemes of the disclosure and their advantages have been described in detail, it should be understood that various changes, substitutions, and alterations can be made without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the scope of the disclosure is not limited to the specific embodiments of the processes, devices, manufactures, compositions of matter, means, methods, and steps described in the specification. One of ordinary skill in the art will readily appreciate from the disclosure that processes, devices, manufactures, compositions of matter, means, methods, or steps, presently existing or later to be developed, that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be used according to the invention. Accordingly, the appended claims are intended to include within their scope such processes, devices, manufactures, compositions of matter, means, methods, or steps.
Although embodiments of the disclosure have been described in detail above in conjunction with the accompanying drawings, it should be understood that the embodiments described above are merely intended to illustrate the technical schemes of the disclosure and do not constitute a limitation on them. Those skilled in the art can make various changes and modifications to the above embodiments without departing from the spirit and scope of the invention. Accordingly, the scope of the disclosure is defined only by the appended claims and their equivalents.
Claims (10)
1. An information processor for performing feature transformation on a plurality of initial data having multi-dimensional labels, the information processor including:
an original feature vector generating unit configured to, for each initial data, generate an original feature vector representing the original features of this initial data;
a label vector generating unit configured to, for each initial data, generate a label vector representing the multi-dimensional labels of this initial data;
a label similarity determining unit configured to, for each initial data, calculate the label similarity between this initial data and each other initial data in the label vector space;
a related data determining unit configured to, for each initial data, determine, based on the label similarity between each other initial data and this initial data, whether the other initial data is related data of this initial data;
a characteristic similarity determining unit configured to, for each initial data, calculate the characteristic similarity between this initial data and each other initial data in the original feature vector space;
a neighbour related data selecting unit configured to, for each initial data, select a plurality of neighbour related data of this initial data from the related data of this initial data, based on the characteristic similarity between each related data of this initial data and this initial data;
a neighbour-related graph generating unit configured to take each initial data and the neighbour related data of this initial data as nodes, form an edge between the nodes corresponding to this initial data and each neighbour related data of this initial data, and set a weight greater than or equal to zero for each edge, thereby forming a neighbour-related graph; and
a feature transformation unit configured to solve a target transformation matrix and perform feature transformation on the plurality of initial data according to the target transformation matrix, wherein the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour-related graph, so that, after the feature transformation, the weighted distances between neighbour related data in the neighbour-related graph are drawn closer.
2. The information processor according to claim 1, wherein
the label similarity determining unit is further configured to calculate the label similarity according to the distance between each initial data and each other initial data in the label vector space and a label correlation matrix.
3. The information processor according to claim 1 or 2, wherein
the neighbour-related graph generating unit is further configured to, for each edge, set the weight of the edge to be positively correlated with at least one of the characteristic similarity and the label similarity between the two initial data corresponding to the two nodes connected by the edge.
4. The information processor according to claim 1 or 2, wherein
the neighbour-related graph generating unit is further configured to, for each edge: if the first predetermined number of other initial data having the largest characteristic similarity with the initial data corresponding to one node connected by the edge include the initial data corresponding to the other node connected by the edge, set the weight of the edge to 1; otherwise, set the weight of the edge to a value that is positively correlated with at least one of the characteristic similarity and the label similarity between the two initial data corresponding to the two nodes connected by the edge and that is less than or equal to 1.
5. The information processor according to claim 1 or 2, further including:
a non-relevant data determining unit configured to, for each initial data, determine, based on the label similarity between this initial data and each other initial data, whether each other initial data is non-relevant data of this initial data;
a neighbour non-relevant data selecting unit configured to, for each initial data, select a plurality of neighbour non-relevant data of this initial data from the non-relevant data of this initial data, based on the characteristic similarity between each non-relevant data of this initial data and this initial data; and
a neighbour-unrelated graph generating unit configured to take each initial data and the neighbour non-relevant data of this initial data as nodes, form an edge between the nodes corresponding to this initial data and each neighbour non-relevant data of this initial data, and set a weight greater than or equal to zero for each edge, thereby forming a neighbour-unrelated graph; and wherein
the objective function is negatively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour-related graph, and positively correlated with the sum of the weighted lengths, in that feature space, of all edges in the neighbour-unrelated graph.
6. The information processor according to claim 5, wherein
the neighbour-unrelated graph generating unit is further configured to, for each edge: if the characteristic similarity between the two initial data corresponding to the two nodes connected by the edge is greater than the maximum of the characteristic similarities between one of the two initial data and all related data of that initial data, set the weight of the edge to be positively correlated with the characteristic similarity between the two initial data corresponding to the two nodes connected by the edge; otherwise, set the weight of the edge to 0.
7. The information processor according to claim 1 or 2, which classifies the initial data by performing the feature transformation.
8. The information processor according to claim 7, wherein the plurality of initial data are a plurality of image data or a plurality of text data.
9. An information processor for performing feature transformation on a plurality of initial data having multi-dimensional labels, the information processor including:
an original feature vector generating unit configured to, for each initial data, generate an original feature vector representing the original features of this initial data;
a label vector generating unit configured to, for each initial data, generate a label vector representing the multi-dimensional labels of this initial data;
a label similarity determining unit configured to, for each initial data, calculate the label similarity between this initial data and each other initial data in the label vector space, and determine, based on the label similarity, whether the other initial data is non-relevant data of this initial data;
a characteristic similarity determining unit configured to, for each initial data, calculate the characteristic similarity between this initial data and each other initial data in the original feature vector space;
a neighbour non-relevant data selecting unit configured to, for each initial data, select a plurality of neighbour non-relevant data of this initial data from all non-relevant data of this initial data, based on the characteristic similarity with this initial data;
a neighbour-unrelated graph generating unit configured to take each initial data and the neighbour non-relevant data of this initial data as nodes, form an edge between the nodes corresponding to this initial data and each neighbour non-relevant data of this initial data, and set a weight greater than or equal to zero for each edge, thereby forming a neighbour-unrelated graph; and
a feature transformation unit configured to solve a target transformation matrix and perform feature transformation on the plurality of initial data according to the target transformation matrix, wherein the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is positively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour-unrelated graph, so that, after the feature transformation, the weighted distances between neighbour non-relevant data in the neighbour-unrelated graph are pushed farther apart.
10. An information processing method for performing feature transformation on a plurality of initial data having multi-dimensional labels, the information processing method including:
for each initial data, generating an original feature vector representing the original features of this initial data;
for each initial data, generating a label vector representing the multi-dimensional labels of this initial data;
for each initial data, calculating the label similarity between this initial data and each other initial data in the label vector space;
for each initial data, determining, based on the label similarity between each other initial data and this initial data, whether the other initial data is related data of this initial data;
for each initial data, calculating the characteristic similarity between this initial data and each other initial data in the original feature vector space;
for each initial data, selecting a plurality of neighbour related data of this initial data from the related data of this initial data, based on the characteristic similarity between each related data of this initial data and this initial data;
taking each initial data and the neighbour related data of this initial data as nodes, forming an edge between the nodes corresponding to this initial data and each neighbour related data of this initial data, and setting a weight greater than or equal to zero for each edge, thereby forming a neighbour-related graph; and
solving a target transformation matrix and performing feature transformation on the plurality of initial data according to the target transformation matrix, wherein the target transformation matrix represents the linear transformation that maximizes an objective function, and the objective function is negatively correlated with the sum of the weighted lengths, in the feature space subjected to the linear transformation, of all edges in the neighbour-related graph, so that, after the feature transformation, the weighted distances between neighbour related data in the neighbour-related graph are drawn closer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210152699.4A CN103425666B (en) | 2012-05-16 | 2012-05-16 | Information processor and information processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103425666A CN103425666A (en) | 2013-12-04 |
CN103425666B true CN103425666B (en) | 2016-12-14 |
Family
ID=49650424
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210152699.4A Active CN103425666B (en) | 2012-05-16 | 2012-05-16 | Information processor and information processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103425666B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105069483B (en) * | 2015-08-21 | 2019-01-01 | 中国地质大学(武汉) | The method that a kind of pair of categorized data set is tested |
CN107305543B (en) * | 2016-04-22 | 2021-05-11 | 富士通株式会社 | Method and device for classifying semantic relation of entity words |
CN111428251B (en) * | 2020-03-18 | 2023-04-28 | 北京明略软件系统有限公司 | Data processing method and device |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101320370B (en) * | 2008-05-16 | 2011-06-01 | 苏州普达新信息技术有限公司 | Deep layer web page data source sort management method based on query interface connection drawing |
CN101515328B (en) * | 2008-12-18 | 2012-05-09 | 东华大学 | Local projection preserving method for identification of statistical noncorrelation |
CN102024262B (en) * | 2011-01-06 | 2012-07-04 | 西安电子科技大学 | Method for performing image segmentation by using manifold spectral clustering |
- 2012-05-16 CN CN201210152699.4A patent/CN103425666B/en active Active
Legal Events
Code | Title
---|---
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
C14 | Grant of patent or utility model
GR01 | Patent grant