CN105447001B - Methods of Dimensionality Reduction in High-dimensional Data and device - Google Patents

Publication number
CN105447001B
Authority
CN
China
Prior art keywords
data
dominance relation
node
relation
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410379941.0A
Other languages
Chinese (zh)
Other versions
CN105447001A (en)
Inventor
张世明
袁明轩
曾嘉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201410379941.0A
Publication of CN105447001A
Publication of CN105447001B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for dimensionality reduction of high-dimensional data, applied in the technical field of data processing. In embodiments of the present invention, the dominance relation between any two data items in a high-dimensional data object is first determined to obtain a dominance relation graph; a graph coloring operation is then performed according to the dominance relation graph to obtain the intrinsic dimension of the intrinsic low-dimensional space; finally, according to the dominance relation graph, the priority sequence groups are determined, their data are encoded, and the data vectors of the intrinsic low-dimensional space are formed. The high-dimensional data object is thereby mapped to the intrinsic low-dimensional space. Because both the intrinsic dimension and the priority sequence groups are derived from the dominance relation graph, the resulting intrinsic low-dimensional space reflects well the dominance relation features that the high-dimensional data has in the high-dimensional space.

Description

Methods of Dimensionality Reduction in High-dimensional Data and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and device for dimensionality reduction of high-dimensional data.
Background
Dimensionality reduction of high-dimensional data uses a mapping method to map data points from the original high-dimensional space to a low-dimensional space while preserving certain original characteristics of the data, and usually serves as a preprocessing step for machine learning. The technique is widely used, for example in image recognition, text mining, gene data analysis, text classification, image retrieval and customer relationship management. As the volume and dimensionality of data increase sharply, especially with the arrival of the big data era, the sheer volume of data and the curse of dimensionality cause serious problems for many machine learning algorithms in terms of measurability and learning performance, so dimensionality reduction of high-dimensional data has become an important analysis tool.
However, existing methods for dimensionality reduction of high-dimensional data are all lossy and cannot reflect the intrinsic low-dimensional space of the high-dimensional data; that is, certain features of the high-dimensional space cannot be preserved in the low-dimensional space. Today's high-dimensional data contains many redundant features or dimension attributes. These not only affect the high-dimensional characteristics of the data but also hinder its effective analysis. How to remove these redundant features or attributes effectively, so that the data reaches its intrinsic low-dimensional space, is one of the key tasks of big data analysis.
Summary of the invention
Embodiments of the present invention provide a method and device for dimensionality reduction of high-dimensional data, such that the resulting intrinsic low-dimensional space reflects well the features of the high-dimensional data in the high-dimensional space.
A first aspect of the embodiments of the present invention provides a method for dimensionality reduction of high-dimensional data, comprising:
determining a dominance relation graph of a high-dimensional data object, the dominance relation graph representing the dominance relation between any two data items in the high-dimensional data object;
performing a graph coloring operation according to the dominance relation graph, and taking the resulting number of colors as the intrinsic dimension of the intrinsic low-dimensional space of the high-dimensional data object;
determining multiple priority sequence groups of the high-dimensional data object according to the dominance relation graph, the number of priority sequence groups being equal to the intrinsic dimension, and each priority sequence group consisting of the data of the high-dimensional data object sorted according to the dominance relation;
encoding the data in each of the priority sequence groups;
forming one data vector of the intrinsic low-dimensional space from the codes that one data item of the high-dimensional data object has in each of the priority sequence groups.
In a first possible implementation of the first aspect of the embodiments of the present invention, the dominance relation graph includes nodes and the dominance relations between nodes;
wherein, if one node is better than another node, the one node is the parent node of the other node and the other node is the child node of the one node; a node represents a data item in the high-dimensional data object; and the dominance relation between two nodes represents the dominance relation between the data items that the two nodes represent.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, after determining the dominance relation graph of the high-dimensional data object, the method further comprises:
merging the dominance relation graph according to a preset strategy to obtain an optimized dominance relation graph, wherein the preset strategy comprises: merging multiple nodes that have the same parent node and the same child node into one node; and/or, if a first node is the parent node of a second node and the second node is the parent node of a third node, merging these into the first node being the parent node of the third node;
accordingly, performing the graph coloring operation according to the dominance relation graph comprises: performing the graph coloring operation according to the optimized dominance relation graph.
With reference to the first aspect of the embodiments of the present invention, or either of the first and second possible implementations of the first aspect, in a third possible implementation of the first aspect, determining the dominance relation graph of the high-dimensional data object specifically comprises:
receiving a preference relation set input by a user, the preference relation set including N dimensions of data relation information, wherein the data relation information of any one dimension includes dominance relation information between any two data items in the high-dimensional data object;
if the dominance relation between first data and second data, being any two data items in the high-dimensional data object, meets a first preset condition, determining that the dominance relation between the first data and the second data is a first-kind dominance relation, the first-kind dominance relation being that one data item is better than another data item;
determining the dominance relation graph according to the determined first-kind dominance relations;
wherein the first preset condition comprises: in M dimensions of the data relation information, the dominance relation between the first data and the second data is the same in every dimension and is in each case a first-kind dominance relation, M being greater than a preset value and less than or equal to N.
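The first preset condition can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the source: each data item is a numeric tuple, "better" within one dimension means a strictly larger value, and the names `first_kind_dominates`, `dominance_edges` and `preset` are hypothetical. In the source the per-dimension relation information is supplied by the user and need not be numeric. The preset value is assumed to be at least N/2, so that two items cannot each be better than the other.

```python
from itertools import permutations
from typing import Dict, Sequence, Set, Tuple


def first_kind_dominates(a: Sequence[float], b: Sequence[float], preset: int) -> bool:
    """True if `a` is better than `b` in M dimensions with preset < M <= N."""
    if len(a) != len(b):
        raise ValueError("data items must have the same number of dimensions")
    # Assumption: within one dimension, "better" means a strictly larger value.
    m = sum(1 for x, y in zip(a, b) if x > y)
    return m > preset


def dominance_edges(data: Dict[str, Sequence[float]], preset: int) -> Set[Tuple[str, str]]:
    """Collect (better, worse) pairs that satisfy the first preset condition."""
    return {
        (i, j)
        for i, j in permutations(data, 2)
        if first_kind_dominates(data[i], data[j], preset)
    }


# Hypothetical data: N = 3 dimensions, so preset = 2 requires M = 3.
data = {"a": (3, 2, 5), "b": (1, 1, 4), "c": (2, 3, 1)}
edges = dominance_edges(data, preset=2)
```

Pairs that end up in neither direction of `edges` would be treated as not having a first-kind dominance relation.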
With reference to the first or second possible implementation of the first aspect of the embodiments of the present invention, in a fourth possible implementation of the first aspect, determining the dominance relation graph of the high-dimensional data object specifically comprises:
determining, according to a first part of the data in the high-dimensional data object, an initial dominance relation graph among the first part of the data;
randomly selecting any data item in the first part of the data as reference data;
traversing a second part of the data, being the data in the high-dimensional data object other than the first part, and determining, according to the dominance relation between the reference data and the second part of the data, the position at which the second part of the data is to be added to the initial dominance relation graph;
adding the second part of the data to the initial dominance relation graph at the determined position.
With reference to the fourth possible implementation of the first aspect of the embodiments of the present invention, in a fifth possible implementation of the first aspect, determining, according to the dominance relation between the reference data and the second part of the data, the position at which the second part of the data is to be added to the initial dominance relation graph specifically comprises:
if the dominance relation between third data in the second part of the data and the reference data meets a second preset condition, then: if, in the initial dominance relation graph, fourth data, being parent node data of the reference data, is better than the third data, determining that the third data is child node data of the fourth data, and determining that those child node data of the fourth data that are inferior to the third data are child node data of the third data; if the third data is better than the fourth data and the fourth data has no parent node data, determining that the third data is parent node data of the fourth data;
if the dominance relation between the third data and the reference data meets a third preset condition, then: if, in the initial dominance relation graph, fifth data, being child node data of the reference data, is better than the third data and the fifth data has no child node data, determining that the third data is child node data of the fifth data; if the third data is better than the fifth data, determining that the third data is parent node data of the fifth data, and determining that those parent node data of the fifth data that are better than the third data are parent node data of the third data;
wherein the second preset condition comprises the third data being better than the reference data, and the third preset condition comprises the reference data being better than the third data.
With reference to the first aspect of the embodiments of the present invention, or any one of the first to fifth possible implementations of the first aspect, in a sixth possible implementation of the first aspect, performing the graph coloring operation according to the dominance relation graph specifically comprises:
finding, in the dominance relation graph or the optimized dominance relation graph, multiple data pairs whose dominance relation is a second-kind dominance relation, the second-kind dominance relation being that first data is not better than second data and the second data is not better than the first data either;
establishing the first-kind dominance relation between the two data items of one of the data pairs, such that the relation between the two data items of another of the data pairs becomes a first-kind dominance relation;
performing the graph coloring operation with, as nodes, the data pairs for which the first-kind dominance relation has been established and those of the multiple data pairs for which it has not been established;
accordingly, determining the multiple priority sequence groups of the data in the high-dimensional data object according to the dominance relation graph specifically comprises:
determining the multiple priority sequence groups according to the first-kind dominance relations between data in the dominance relation graph and the established first-kind dominance relations.
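The first step above, finding the data pairs with a second-kind dominance relation, can be sketched as follows. This is only an illustration: the function name `second_kind_pairs` and the representation of first-kind relations as a set of `(better, worse)` tuples are assumptions, and the set is assumed to already contain every first-kind relation, including those implied by transitivity.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def second_kind_pairs(nodes: Iterable[str],
                      better_than: Set[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the pairs in which neither item is better than the other."""
    return [
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if (a, b) not in better_than and (b, a) not in better_than
    ]


# Hypothetical relations: A is better than B, C and D; B and C are better than D.
better_than = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")}
pairs = second_kind_pairs({"A", "B", "C", "D"}, better_than)
```

In this example only B and C are left without a first-kind relation, so they form the single second-kind pair.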
With reference to the first aspect of the embodiments of the present invention, or any one of the first to sixth possible implementations of the first aspect, in a seventh possible implementation of the first aspect, encoding the data in each of the priority sequence groups specifically comprises:
for two data items that are adjacent in a first priority sequence group among the multiple priority sequence groups: if, in the priority sequence groups other than the first priority sequence group, the two data items are not adjacent and their dominance relation is the same as their relative order in the first priority sequence group, the two data items have the same code in the first priority sequence group;
if the two data items are not adjacent and their dominance relation differs from their relative order in the first priority sequence group, the two data items have different codes in the first priority sequence group.
A second aspect of the embodiments of the present invention provides a device for dimensionality reduction of high-dimensional data, comprising:
a graph determination unit, configured to determine a dominance relation graph of a high-dimensional data object, the dominance relation graph representing the dominance relation between any two data items in the high-dimensional data object;
a coloring unit, configured to perform a graph coloring operation according to the dominance relation graph determined by the graph determination unit, and to take the resulting number of colors as the intrinsic dimension of the intrinsic low-dimensional space of the high-dimensional data object;
a sequence group determination unit, configured to determine multiple priority sequence groups of the high-dimensional data object according to the dominance relation graph determined by the graph determination unit, the number of priority sequence groups being equal to the intrinsic dimension, and each priority sequence group consisting of the data of the high-dimensional data object sorted according to the dominance relation;
an encoding unit, configured to encode the data in each of the priority sequence groups determined by the sequence group determination unit;
a low-dimension forming unit, configured to form, according to the encoding by the encoding unit, one data vector of the intrinsic low-dimensional space from the codes that one data item of the high-dimensional data object has in each of the priority sequence groups.
In a first possible implementation of the second aspect of the embodiments of the present invention, the dominance relation graph includes nodes and the dominance relations between nodes;
wherein, if one node is better than another node, the one node is the parent node of the other node and the other node is the child node of the one node; a node represents a data item in the high-dimensional data object; and the dominance relation between two nodes represents the dominance relation between the data items that the two nodes represent.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the device further comprises:
an optimization unit, configured to merge the dominance relation graph determined by the graph determination unit according to a preset strategy to obtain an optimized dominance relation graph;
wherein the preset strategy comprises: merging multiple nodes that have the same parent node and the same child node into one node; and/or, if a first node is the parent node of a second node and the second node is the parent node of a third node, merging these into the first node being the parent node of the third node;
the coloring unit is specifically configured to perform the graph coloring operation according to the dominance relation graph optimized by the optimization unit.
With reference to the second aspect of the embodiments of the present invention, or either of the first and second possible implementations of the second aspect, in a third possible implementation of the second aspect, the graph determination unit specifically comprises:
a receiving unit, configured to receive a preference relation set input by a user, the preference relation set including N dimensions of data relation information, wherein the data relation information of any one dimension includes dominance relation information between any two data items in the high-dimensional data object;
a relation determination unit, configured to determine, if the dominance relation between first data and second data, being any two data items in the high-dimensional data object, meets a first preset condition, that the dominance relation between the first data and the second data is a first-kind dominance relation, and to determine the dominance relation graph of the high-dimensional data object according to the determined first-kind dominance relations, the first-kind dominance relation being that one data item is better than another data item;
wherein the first preset condition comprises: in M dimensions of the data relation information, the dominance relation between the first data and the second data is the same in every dimension and is in each case a first-kind dominance relation, M being greater than a preset value and less than or equal to N.
With reference to the first or second possible implementation of the second aspect of the embodiments of the present invention, in a fourth possible implementation of the second aspect, the graph determination unit further comprises:
an initial determination unit, configured to determine, according to a first part of the data in the high-dimensional data object, an initial dominance relation graph among the first part of the data;
a reference determination unit, configured to randomly select any data item in the first part of the data as reference data;
a position determination unit, configured to traverse a second part of the data, being the data in the high-dimensional data object other than the first part, and to determine, according to the dominance relation between the reference data and the second part of the data, the position at which the second part of the data is to be added to the initial dominance relation graph;
an adding unit, configured to add the second part of the data to the initial dominance relation graph at the position determined by the position determination unit.
With reference to the fourth possible implementation of the second aspect of the embodiments of the present invention, in a fifth possible implementation of the second aspect, the position determination unit specifically comprises:
a first determination unit, configured so that, if the dominance relation between third data in the second part of the data and the reference data meets a second preset condition, then: if, in the initial dominance relation graph among the first part of the data, fourth data, being parent node data of the reference data, is better than the third data, the third data is determined to be child node data of the fourth data, and those child node data of the fourth data that are inferior to the third data are determined to be child node data of the third data; if the third data is better than the fourth data and the fourth data has no parent node data, the third data is determined to be parent node data of the fourth data;
a second determination unit, configured so that, if the dominance relation between the third data and the reference data meets a third preset condition, then: if, in the initial dominance relation graph among the first part of the data, fifth data, being child node data of the reference data, is better than the third data and the fifth data has no child node data, the third data is determined to be child node data of the fifth data; if the third data is better than the fifth data, the third data is determined to be parent node data of the fifth data, and those parent node data of the fifth data that are better than the third data are determined to be parent node data of the third data;
wherein the second preset condition comprises the third data being better than the reference data, and the third preset condition comprises the reference data being better than the third data.
With reference to the second aspect of the embodiments of the present invention, or any one of the first to fifth possible implementations of the second aspect, in a sixth possible implementation of the second aspect, the coloring unit specifically comprises:
a data pair searching unit, configured to find, in the dominance relation graph or the optimized dominance relation graph, multiple data pairs whose dominance relation is a second-kind dominance relation, the second-kind dominance relation being that first data is not better than second data and the second data is not better than the first data either;
a relation establishing unit, configured to establish the first-kind dominance relation between the two data items of one of the data pairs, such that the relation between the two data items of another of the found data pairs becomes a first-kind dominance relation;
a graph coloring unit, configured to perform the graph coloring operation with, as nodes, the data pairs for which the first-kind dominance relation has been established and those of the multiple data pairs for which it has not been established;
the sequence group determination unit is then specifically configured to determine the multiple priority sequence groups according to the first-kind dominance relations between data in the dominance relation graph determined by the graph determination unit and the first-kind dominance relations established by the relation establishing unit.
With reference to the second aspect of the embodiments of the present invention, or any one of the first to sixth possible implementations of the second aspect, in a seventh possible implementation of the second aspect:
the encoding unit is specifically configured so that, for two data items that are adjacent in a first priority sequence group among the multiple priority sequence groups: if, in the priority sequence groups other than the first priority sequence group, the two data items are not adjacent and their dominance relation is the same as their relative order in the first priority sequence group, the two data items have the same code in the first priority sequence group; if the two data items are not adjacent and their dominance relation differs from their relative order in the first priority sequence group, the two data items have different codes in the first priority sequence group.
In embodiments of the present invention, the dominance relation between any two data items in a high-dimensional data object is first determined to obtain a dominance relation graph; a graph coloring operation is then performed according to the dominance relation graph to obtain the intrinsic dimension of the intrinsic low-dimensional space; finally, according to the dominance relation graph, the priority sequence groups are determined, their data are encoded, and the data vectors of the intrinsic low-dimensional space are formed. The high-dimensional data object is thereby mapped to the intrinsic low-dimensional space. Because both the intrinsic dimension and the priority sequence groups are derived from the dominance relation graph, the resulting intrinsic low-dimensional space reflects well the dominance relation features that the high-dimensional data has in the high-dimensional space.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for dimensionality reduction of high-dimensional data provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a method for determining the dominance relation graph in an embodiment of the present invention;
Fig. 3a is a flowchart of another method for determining the dominance relation graph in an embodiment of the present invention;
Fig. 3b is a flowchart of another method for determining the dominance relation graph in an embodiment of the present invention;
Fig. 4 is a flowchart of a method for performing the graph coloring operation according to the dominance relation graph in an embodiment of the present invention;
Fig. 5a is a schematic diagram of the dominance relation graph determined in an application embodiment of the present invention;
Fig. 5b is a schematic diagram of the optimized dominance relation graph in an application embodiment of the present invention;
Fig. 5c is a schematic diagram of the relations between the data pairs for which a first-kind dominance relation is established and the other data pairs in an application embodiment of the present invention;
Fig. 5d is a schematic diagram of the result of the graph coloring operation in an application embodiment of the present invention;
Fig. 5e is a schematic diagram of the result of encoding the data in the multiple priority sequence groups in an application embodiment of the present invention;
Fig. 5f is a schematic diagram of the data vectors in the intrinsic low-dimensional space formed in an application embodiment of the present invention;
Fig. 6 is a structural schematic diagram of a device for dimensionality reduction of high-dimensional data provided by an embodiment of the present invention;
Fig. 7 is a structural schematic diagram of another device for dimensionality reduction of high-dimensional data provided by an embodiment of the present invention;
Fig. 8 is a structural schematic diagram of another device for dimensionality reduction of high-dimensional data provided by an embodiment of the present invention;
Fig. 9 is a structural schematic diagram of a data processing device provided by an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Each is described in detail below.
The terms "first", "second", "third", "fourth" and so on in the description, the claims and the drawings are used to distinguish different objects, not to describe a particular order. In addition, the terms "comprise" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that contains a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product or device.
An embodiment of the present invention provides a method for dimensionality reduction of high-dimensional data, which maps the data of a high-dimensional space into an intrinsic low-dimensional space while preserving the characteristics the data has in the high-dimensional space. The method of this embodiment is performed by a high-dimensional data dimensionality reduction device. Its flowchart is shown in Fig. 1 and comprises:
Step 101: determine the dominance relation graph of the high-dimensional data object. The dominance relation graph represents the dominance relation between any two data items in the high-dimensional data object. The dominance relation may be a first-kind dominance relation, in which one data item is better than another, or a second-kind dominance relation, in which one data item is not better than another and the other is not better than it either. One data item being better than another may mean that it is greater in index value, or that the value obtained from it by a certain calculation is numerically greater than the value obtained from the other, and so on.
The dominance relation graph may include nodes and the dominance relations between nodes. If one node is better than another node, the former is the parent node of the latter and the latter is the child node of the former. A node represents any data item in the high-dimensional data object, and the dominance relation between two nodes represents the dominance relation between the data items that the two nodes represent. If two nodes are in a parent-child relationship, the dominance relation between the two data items they represent is a first-kind dominance relation; if they are not, the dominance relation between the data items they represent is a second-kind dominance relation.
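One possible in-memory form of such a graph is sketched below in Python. The class `DominanceGraph` and its method names are hypothetical and chosen only for illustration; the source does not prescribe a data structure. Following the text literally, the kind of relation is decided from direct parent-child links only.

```python
from collections import defaultdict


class DominanceGraph:
    """Nodes are data items; an edge parent -> child means 'parent is better'."""

    def __init__(self):
        self.nodes = set()
        self.children = defaultdict(set)  # node -> its child nodes
        self.parents = defaultdict(set)   # node -> its parent nodes

    def add_node(self, node):
        self.nodes.add(node)

    def add_dominance(self, better, worse):
        """Record that `better` is the parent node of `worse`."""
        self.nodes.update((better, worse))
        self.children[better].add(worse)
        self.parents[worse].add(better)

    def relation(self, a, b):
        """'first' if a and b are parent and child, otherwise 'second'."""
        if b in self.children.get(a, ()) or a in self.children.get(b, ()):
            return "first"
        return "second"


graph = DominanceGraph()
graph.add_dominance("A", "B")  # A is better than B
graph.add_node("C")            # C has no recorded relation to A or B
```

Here `graph.relation("A", "B")` reports a first-kind relation and `graph.relation("A", "C")` a second-kind relation.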
As needed, a user may input the dominance relations between any two data items in the high-dimensional data object through the input interface that the high-dimensional data dimensionality reduction device provides, thereby setting the dominance relation characteristics of the high-dimensional data object.
Step 102: perform a graph coloring operation on the dominance relation graph, and use the resulting number of colors as the intrinsic dimension of the intrinsic low-dimensional space of the high-dimensional data object. The graph coloring operation colors the nodes of the dominance relation graph using m colors, such that each node receives exactly one color and any two adjacent nodes receive different colors; m here is the number of colors.
In this embodiment, the dimensionality reduction device may first preprocess the nodes of the dominance relation graph and then perform the graph coloring operation.
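As a rough illustration of the coloring constraint in step 102, the sketch below colors an undirected adjacency map greedily. The text does not say which coloring algorithm is used, so the greedy strategy is an assumption; a greedy coloring only gives an upper bound on the minimum number of colors. The function names and the sample graph are hypothetical.

```python
def greedy_coloring(adjacency):
    """Give each node the smallest color index not used by its colored neighbors."""
    colors = {}
    # Assumption: visit nodes in order of decreasing degree.
    order = sorted(adjacency, key=lambda node: len(adjacency[node]), reverse=True)
    for node in order:
        used = {colors[nb] for nb in adjacency[node] if nb in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


def number_of_colors(colors):
    """The number of distinct colors, used as the intrinsic dimension."""
    return len(set(colors.values()))


# Hypothetical graph: a triangle x-y-z with one extra node w attached to z.
adjacency = {
    "x": {"y", "z"},
    "y": {"x", "z"},
    "z": {"x", "y", "w"},
    "w": {"z"},
}
colors = greedy_coloring(adjacency)
intrinsic_dimension = number_of_colors(colors)
print(colors, intrinsic_dimension)
```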
Step 103: determine multiple priority sequence groups of the high-dimensional data object according to the dominance relation graph. The number of priority sequence groups equals the intrinsic dimension obtained in step 102, and each priority sequence group consists of the data items of the high-dimensional data object ordered according to the dominance relations.
Step 104: encode the data in each of the priority sequence groups. Because the same data item has a different priority in each priority sequence group, its code in each priority sequence group is also different.
Step 105: take the codes of one data item of the high-dimensional data object in the respective priority sequence groups as one data vector of the intrinsic low-dimensional space.
It should be noted that in a specific embodiment, after above-mentioned steps 101, high dimensional data dimensionality reduction device Can first to dominance relation figure carry out centainly pre-process after, just to pretreated dominance relation figure carry out step 102 after Operations.Under normal circumstances, pretreatment here refers to the optimization to dominance relation figure, when specific implementation, high dimensional data drop Dimension device can merge the dominance relation figure after dominance relation figure is optimized according to preset strategy.Wherein, preset strategy can To include: that father node is identical and the identical multiple nodes of child node merge into a node;And/or if first node is The father node of second node, and second node is the father node of third node, then merges into the father that first node is third node Node.So accordingly, when executing above-mentioned steps 102, library professor behaviour is mainly carried out according to the dominance relation figure after optimization Make, and is the multiple groups for determining data in high-dimensional data-object according to the dominance relation figure after optimization when executing above-mentioned steps 103 Priority sequence group.
It can be seen that, in the high-dimensional data dimensionality reduction method of this embodiment, the dominance relation between any two data items in the high-dimensional data object is first determined to obtain the dominance relation graph; a graph coloring operation is then performed on the dominance relation graph to obtain the intrinsic dimension of the intrinsic low-dimensional space; finally, the priority sequence groups are determined, the data are encoded, and the data vectors of the intrinsic low-dimensional space are formed, all according to the dominance relation graph. The high-dimensional data object can thus be mapped to the intrinsic low-dimensional space. Because both the intrinsic dimension and the priority sequence groups are derived from the dominance relation graph, the resulting intrinsic low-dimensional space reflects well the dominance relation characteristics that the high-dimensional data has in the high-dimensional space.
In a specific embodiment, when determining the dominance relation graph in step 101, the dimensionality reduction device may use any of the following approaches:
(1) As shown in Fig. 2, compare the data pairwise, one pair at a time.
A1: receive a preference relation set input by the user. The preference relation set includes N dimensions of data relation information, and the data relation information of any one dimension includes the dominance relation information between any two data items in the high-dimensional data object. N may equal the dimension of the high-dimensional data object or be smaller than it. Specifically, the preference relation set may be information set by the user, and the data relation information of each dimension may be a dominance relation graph corresponding to the high-dimensional data object, or information such as a priority ordering of the data in the high-dimensional data object.
B1: for any two data items in the high-dimensional data object, a first data item and a second data item, judge whether the dominance relation between them meets a first preset condition. If it does, perform step C1 and then step E1; if it does not, perform step D1, that is, determine that the dominance relation between the first data item and the second data item is a second-type dominance relation. The first preset condition is: in M dimensions of the data relation information, the dominance relation between the first data item and the second data item is the same in every one of those dimensions and is a first-type dominance relation, where M is greater than a preset value and less than or equal to N.
C1: accordingly, determine that the dominance relation between the first data item and the second data item is a first-type dominance relation.
By repeating steps B1 and C1 over every pair of data items in the high-dimensional data object, the dominance relation between any two data items in the high-dimensional data object is finally obtained.
E1: determine the dominance relation graph of the high-dimensional data object from the first-type dominance relations finally obtained. Specifically, take each data item as a node, and define the nodes of two data items that have a first-type dominance relation as parent and child nodes; this yields the dominance relation graph of the high-dimensional data object.
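Steps A1 to E1 can be sketched as below. The sketch assumes that each dimension's data relation information is given as a set of ordered pairs (x, y) meaning "x is better than y in this dimension", and that the first preset condition reduces to counting the dimensions that agree; the function name and the parameter `preset` are hypothetical.

```python
from itertools import permutations


def build_dominance_edges(items, per_dimension_relations, preset):
    """Return (better, worse) edges that satisfy the first preset condition.

    per_dimension_relations: list of N sets, one per dimension, each holding
        ordered pairs (x, y) meaning x is better than y in that dimension.
    preset: the preset value that M must exceed (M is at most N).
    """
    edges = set()
    for x, y in permutations(items, 2):
        # M = number of dimensions in which x is better than y.
        m = sum((x, y) in relation for relation in per_dimension_relations)
        if m > preset:
            edges.add((x, y))  # first-type relation: x is the parent of y
        # otherwise the pair is left without an edge (second-type relation)
    return edges


# Hypothetical two-dimensional preference relation set.
f1 = {("a", "b"), ("a", "c"), ("b", "c")}
f2 = {("a", "b"), ("a", "c"), ("c", "b")}
edges = build_dominance_edges(["a", "b", "c"], [f1, f2], preset=1)
print(sorted(edges))
```

In this sample, b and c disagree across the two dimensions, so no edge is created between them and their relation stays second-type.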
(2) As shown in Fig. 3a, divide the data in the high-dimensional data object into two parts to determine the dominance relation graph.
A2: determine, from a first part of the data in the high-dimensional data object, an initial dominance relation graph among that first part of the data.
B2: randomly select any data item in the first part of the data as reference data.
C2: traverse the second part of the data, that is, the data in the high-dimensional data object other than the first part, and determine from the dominance relation between the reference data and each second-part data item the position at which that item is added to the initial dominance relation graph. Specifically, if the reference data is better than the second-part data item, the reference data is determined to be the parent node data of that item; if the second-part data item is better than the reference data, that item is determined to be the parent node data of the reference data; and so on.
D2: add the second-part data to the initial dominance relation graph at the positions determined in step C2.
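Steps C2 and D2 can be sketched as follows. The predicate `pareto_better` is a hypothetical stand-in for "better than" (every coordinate at least as large and one strictly larger); the text leaves the actual definition open. The sketch only records edges relative to the reference data.

```python
def pareto_better(x, y):
    """Hypothetical 'better than': x is >= y everywhere and > y somewhere."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def insert_by_reference(edges, reference, new_items, better=pareto_better):
    """Place each second-part item relative to the reference data.

    edges: set of (parent, child) tuples of the initial graph.
    Returns a new edge set; items with a second-type relation get no edge.
    """
    result = set(edges)
    for item in new_items:
        if better(reference, item):
            result.add((reference, item))  # reference is the parent node data
        elif better(item, reference):
            result.add((item, reference))  # the new item is the parent node data
    return result


# Hypothetical data vectors.
reference = (2, 2)
second_part = [(1, 1), (3, 3), (1, 3)]
print(sorted(insert_by_reference(set(), reference, second_part)))
```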
(3) As shown in Fig. 3b, use a reverse comparison method on top of the pairwise comparison of the data. This approach is a specific refinement of approach (2) above.
A3: following the pairwise comparison method above (steps A1 to E1), use the first preset condition to determine the initial dominance relation graph among the first part of the data in the high-dimensional data object. For the second part of the data in the high-dimensional data object, the reverse comparison method may be used, implemented by the following steps:
B3: determine one reference data item v among the first part of the data, whose initial dominance relations have been determined by the first preset condition.
C3: judge whether the dominance relation between a third data item o, taken from the second part of the data (the data in the high-dimensional data object other than the first part), and the reference data v meets a second preset condition. If it does, perform step D3; if it does not, perform step E3. The second preset condition is that the third data o is better than the reference data v. Here, o being better than v may mean that o is numerically greater than v, may be defined by the user, or may mean that the value computed from o by some function is greater, and so on.
D3: if, in the initial dominance relation graph determined among the first part of the data, a fourth data item p that is parent node data of the reference data v is better than the third data o, add the third data o to the dominance relation graph among the above part of the data as child node data of the fourth data p, and take as child node data of the third data o every child node data q of the fourth data p that is inferior to the third data o; this expresses that the third data o is better than the child node data q.
If the third data o is better than the fourth data p and the fourth data p has no parent node data, add the third data o to the dominance relation graph among the above part of the data as parent node data of the fourth data p.
E3: judge whether the dominance relation between the third data o in the second part of the data and the reference data v meets a third preset condition. If it does, perform step F3; if it does not, end the procedure. The third preset condition is that the reference data v is better than the third data o.
F3: if, in the initial dominance relation graph among the first part of the data, a fifth data item r that is child node data of the reference data v is better than the third data o, and the fifth data r has no child node data, add the third data o to the dominance relation graph among the above part of the data as child node data of the fifth data r;
If the third data o is better than the fifth data r, take the third data o as parent node data of the fifth data r, and take as parent node data of the third data o every parent node data of the fifth data r that is better than the third data o.
As shown in Fig. 4, in another specific embodiment, the dimensionality reduction device may perform the graph coloring operation in step 102 through the following steps:
A4: find, in the dominance relation graph or the optimized dominance relation graph, the multiple data pairs whose dominance relation is a second-type dominance relation, that is, the first data item is not better than the second and the second is not better than the first. When searching for these data pairs, if two data items in the dominance relation graph (or the optimized dominance relation graph) have a second-type dominance relation, two data pairs are found for them: pairs with the same data but in different orders. For example, for b1 and b2, the data pairs (b1, b2) and (b2, b1) are found.
B4: establish a first-type dominance relation between the two data items of one of the data pairs found in step A4, such that the two data items of another of the data pairs found in step A4 thereby also come to have a first-type dominance relation. For a data pair in which a first-type dominance relation has been established, the dimensionality reduction device may also establish a correspondence between that data pair and the other data pair that comes to have a first-type dominance relation as a result.
When establishing a first-type dominance relation between the two data items of a data pair, the dimensionality reduction device follows the order of the two items in the pair, so that the first item of the pair is better than the second; for example, for the data pair (a1, a2), a1 is established as better than a2. The first-type dominance relation that thereby arises between the two items of the other data pair likewise makes the first item of that pair better than the second.
C4: take the data pairs in which a first-type dominance relation has been established as nodes and perform the graph coloring operation so that adjacent nodes have different colors. The dimensionality reduction device also takes as nodes for the graph coloring operation those of the above data pairs in which no first-type dominance relation has been established (that is, the data pairs for which no correspondence has been established). Among all nodes on which the graph coloring operation is performed, the first-type dominance relations of the data pairs of two adjacent nodes would together cause the nodes of the dominance relation graph, or of the optimized dominance relation graph, to form a cycle.
In this case, when determining the priority sequence groups in step 103, the dimensionality reduction device needs to determine the multiple priority sequence groups not only from the first-type dominance relations in the dominance relation graph but also from the first-type dominance relations established in step B4.
It can be understood that steps A4 to C4 first process some of the data that have second-type dominance relations in the dominance relation graph or the optimized dominance relation graph, and then perform the graph coloring operation on the new nodes thus formed.
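Steps A4 and B4 can be sketched as below, under the assumption that dominance carries along chains of edges (if d is better than a and a is better than b, then d is better than b). The function names are hypothetical, and the graph is given as a set of (parent, child) edges.

```python
from itertools import permutations


def descendants(edges, start):
    """All nodes reachable from `start` by following parent -> child edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for parent, child in edges:
            if parent == node and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def second_type_pairs(nodes, edges):
    """Step A4: every ordered pair (x, y) where neither item dominates the other."""
    reach = {n: descendants(edges, n) for n in nodes}
    return [(x, y) for x, y in permutations(sorted(nodes), 2)
            if y not in reach[x] and x not in reach[y]]


def induced_pairs(nodes, edges, established):
    """Step B4: pairs that become first-type once `established` is added."""
    candidates = second_type_pairs(nodes, edges)
    extended = set(edges) | {established}
    reach = {n: descendants(extended, n) for n in nodes}
    return [(x, y) for x, y in candidates
            if (x, y) != established and y in reach[x]]


# Hypothetical graph: a is better than b, and d is unrelated to both.
nodes, edges = {"a", "b", "d"}, {("a", "b")}
print(second_type_pairs(nodes, edges))
print(induced_pairs(nodes, edges, ("d", "a")))
```

The returned induced pairs are the ones the device would link to the established pair through a correspondence.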
In other specific embodiments, when performing the encoding in step 104, the dimensionality reduction device needs to take the other priority sequence groups into account. Specifically, consider two data items that are adjacent in a first priority sequence group among the multiple priority sequence groups. In the priority sequence groups other than the first priority sequence group: if the two data items are not adjacent and their dominance relation is the same as their relative order in the first priority sequence group, the two data items have the same code in the first priority sequence group; if the two data items are not adjacent and their dominance relation differs from their relative order in the first priority sequence group, the two data items have different codes in the first priority sequence group.
The high-dimensional data dimensionality reduction method of the embodiments of the present invention is illustrated below with a specific example. In this example, the dimensionality reduction of high-dimensional data is mainly implemented by the following steps:
(1) Obtain the dominance relation graph
The high-dimensional data object in this embodiment is a set of d-dimensional data vectors S = {a1, a2, a3, ...}, where a1 = <a1.d0, a1.d1, a1.d2, ..., a1.dd> and a1.di is the value of data vector a1 in the i-th dimension. For convenience of description, the data in the high-dimensional data object are denoted a, b, c, ... below.
The user first inputs a preference relation set F = {f1, f2, ..., fd} into the dimensionality reduction device. The set includes d dimensions of data relation information, and the data relation information fi (i = 1, 2, ..., d) of any dimension includes the dominance relation information between any two data items in S; for example, the data relation information of each dimension may be a dominance relation graph.
The dimensionality reduction device can then determine the dominance relation graph corresponding to the high-dimensional data object according to the flowchart shown in Fig. 2, Fig. 3a, or Fig. 3b, which is not repeated here. For example, as shown in Fig. 5a, each node in the figure represents one data item in the high-dimensional data object. A first-type dominance relation between the data represented by two nodes is indicated by a directed arrow between the two nodes; for example, in the figure data a is better than data b, and b is better than g. If the data represented by two nodes have a second-type dominance relation, there is no directed arrow between the two nodes; for example, for data b and c in the figure, b is not better than c and c is not better than b.
(2) Optimize the dominance relation graph
The dominance relation graph determined above is optimized according to the preset strategy to obtain the optimized dominance relation graph shown in Fig. 5b. The preset strategy includes: merging multiple nodes that have the same parent node and the same child node into one node. For example, data b and h in Fig. 5a have the same parent node data a and the same child node data g, so they are merged into the single node of data b in Fig. 5b;
If first data is better than second data and the second data is better than third data, these are merged into the first data being better than the third data. For example, in Fig. 5a data a is better than h and h is better than e, which is merged into data a being better than e in Fig. 5b, and so on.
(3) Graph coloring operation
Following the flowchart shown in Fig. 4, the dimensionality reduction device first establishes first-type dominance relations in the optimized dominance relation graph and then performs the graph coloring operation. For example, the data pairs (d, a) and (d, b) shown in Fig. 5c have second-type dominance relations in the optimized dominance relation graph of Fig. 5b. If a first-type dominance relation is established between the two data items of the data pair (d, a), that is, d is better than a, then, since a is better than b in Fig. 5b, d is better than b, so the two data items of the other data pair (d, b) in Fig. 5c also have a first-type dominance relation. The device can then establish a correspondence between the data pair in which the first-type dominance relation was established and the other data pair that comes to have a first-type dominance relation as a result. The directed arrows added in Fig. 5c indicate such correspondences; for example, a directed arrow is added between the data pairs (d, a) and (d, b).
Next, the data pairs in which a first-type dominance relation has been established, together with the data pairs in which no first-type dominance relation has been established (the data pairs in Fig. 5c that are not connected by a directed arrow), are taken as nodes, and the graph coloring operation is performed. As shown in Fig. 5c, the data pairs in which a first-type dominance relation has been established include (d, a), (e, b), (f, c), (b, e), and (c, f), and the data pair in which no first-type dominance relation has been established is (a, d). Taking each of these data pairs as a node gives the nodes on which the graph coloring operation is to be performed, as shown in Fig. 5d. Among all nodes on which the graph coloring operation is performed, the first-type dominance relations of the data pairs of two adjacent nodes would cause the nodes of the optimized dominance relation graph (shown in Fig. 5b) to form a cycle. For example, take the adjacent data pairs (e, b) and (c, e) in Fig. 5d: because a first-type dominance relation has been established between the two data items of each of these pairs, in the optimized dominance relation graph of Fig. 5b this appears as e pointing to b and c pointing to e; then, with b pointing to a in the optimized dominance relation graph (that is, b being better than a), a directed cycle is formed among the nodes of data c, b, and e.
The graph coloring operation is then performed on each node to obtain the result shown in Fig. 5d, in which adjacent nodes have different colors; for example, node (d, a) and node (a, d) have different colors, 0 and 1 respectively. The resulting number of colors is 2, which is used as the intrinsic dimension.
(4) Encoding
According to the first-type dominance relations in the dominance relation graph of Fig. 5b and the first-type dominance relations established between data pairs in Fig. 5c, two priority sequence groups C0 and C1 are determined: C0 is (a b d c e f) and C1 is (d a e b f c). The data in each priority sequence group are then encoded to obtain the codes shown in Fig. 5e. For example, data a and b, which are adjacent in priority sequence group C0, are not adjacent in priority sequence group C1 and have the same order there as in C0 (a before b), so they keep the same code in C0, both 0. Data b and d, which are adjacent in C0, are not adjacent in C1, and their order in C1 (d before b) differs from their order in C0 (b before d), so they take different codes in C0, 0 and 1 respectively.
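The encoding rule can be sketched on the sequences C0 and C1 given above. Several points are assumptions: the code is incremented by one whenever two neighbors must differ, the non-adjacency condition is not checked, and with more than one other sequence a reversal in any of them counts as "different". The function name `encode_sequence` is hypothetical, and the codes in Fig. 5e are only confirmed by the text for a, b, and d in C0 and for a in C1.

```python
def encode_sequence(target, others):
    """Assign codes along `target`; neighbors share a code unless another
    sequence orders them the other way round (assumed interpretation)."""
    positions = [{item: i for i, item in enumerate(seq)} for seq in others]
    code = 0
    codes = {target[0]: code}
    for previous, current in zip(target, target[1:]):
        reversed_elsewhere = any(pos[previous] > pos[current] for pos in positions)
        if reversed_elsewhere:
            code += 1  # assumption: "different code" means the next integer
        codes[current] = code
    return codes


C0 = ["a", "b", "d", "c", "e", "f"]
C1 = ["d", "a", "e", "b", "f", "c"]
codes_c0 = encode_sequence(C0, [C1])
codes_c1 = encode_sequence(C1, [C0])
print(codes_c0)  # a and b share code 0, d gets code 1
print(codes_c1)  # a gets code 1
```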
(5) Form the intrinsic low-dimensional space
The codes of one data item of the high-dimensional data object in the respective priority sequence groups are taken as one data vector of the intrinsic low-dimensional space, giving the data vectors of the intrinsic low-dimensional space shown in Fig. 5f. For example, data a in Fig. 5e has codes 0 and 1 in the two priority sequence groups, so 0 and 1 form one data vector of the intrinsic low-dimensional space; the same applies to the other data and is not repeated here.
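Forming the low-dimensional vectors amounts to collecting, for each data item, its code from every priority sequence group. A minimal sketch follows; the function name `compose_vectors` is hypothetical, and apart from the codes of a (0 and 1), of b in the first group (0), and of d in the first group (1), the code values in the tables are illustrative assumptions rather than values read from Fig. 5e.

```python
def compose_vectors(code_tables):
    """Build one vector per data item from its code in each sequence group.

    code_tables: list of dicts (one per priority sequence group), item -> code.
    """
    items = code_tables[0].keys()
    return {item: tuple(table[item] for table in code_tables) for item in items}


# Illustrative code tables for two priority sequence groups.
codes_group0 = {"a": 0, "b": 0, "d": 1}
codes_group1 = {"a": 1, "b": 2, "d": 0}
vectors = compose_vectors([codes_group0, codes_group1])
print(vectors["a"])  # the vector for data a
```

The length of every vector equals the number of priority sequence groups, that is, the intrinsic dimension.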
An embodiment of the present invention further provides a high-dimensional data dimensionality reduction device, whose schematic structural diagram is shown in Fig. 6, including:
A dominance graph determination unit 10, configured to determine the dominance relation graph of a high-dimensional data object, where the dominance relation graph represents the dominance relation between any two data items in the high-dimensional data object. The dominance relation graph includes nodes and the dominance relations between nodes: if one node is better than another node, the former is the parent node of the latter and the latter is the child node of the former. A node represents any data item in the high-dimensional data object, and the dominance relation between two nodes represents the dominance relation between the data items that the two nodes represent.
Specifically, if two nodes are parent and child of each other, the dominance relation between the data items they represent is a first-type dominance relation; otherwise, the dominance relation between the data items they represent is a second-type dominance relation.
A coloring unit 11, configured to perform a graph coloring operation on the dominance relation graph determined by the dominance graph determination unit 10, and to use the resulting number of colors as the intrinsic dimension of the intrinsic low-dimensional space of the high-dimensional data object.
A sequence group determination unit 12, configured to determine multiple priority sequence groups of the high-dimensional data object according to the dominance relation graph determined by the dominance graph determination unit 10, where the number of priority sequence groups equals the intrinsic dimension and each priority sequence group consists of the data of the high-dimensional data object ordered according to the dominance relations.
An encoding unit 13, configured to encode the data in each of the priority sequence groups determined by the sequence group determination unit 12. Specifically, for two data items that are adjacent in a first priority sequence group among the multiple priority sequence groups, the encoding unit 13 may proceed as follows with respect to the priority sequence groups other than the first priority sequence group: if the two data items are not adjacent and their dominance relation is the same as their relative order in the first priority sequence group, the two data items have the same code in the first priority sequence group; if the two data items are not adjacent and their dominance relation differs from their relative order in the first priority sequence group, the two data items have different codes in the first priority sequence group.
A low-dimensional composition unit 14, configured to form, according to the codes produced by the encoding unit 13, one data vector of the intrinsic low-dimensional space from the codes of one data item of the high-dimensional data object in the respective priority sequence groups, where the dimension of each data vector is the intrinsic dimension.
In the device of this embodiment, the dominance graph determination unit 10 first determines the dominance relation between any two data items in the high-dimensional data object to obtain the dominance relation graph; the coloring unit 11 then performs a graph coloring operation on the dominance relation graph to obtain the intrinsic dimension of the intrinsic low-dimensional space; finally, the sequence group determination unit 12, the encoding unit 13, and the low-dimensional composition unit 14 respectively determine the priority sequence groups, encode the data, and form the data vectors of the intrinsic low-dimensional space according to the dominance relation graph. The high-dimensional data object can thus be mapped to the intrinsic low-dimensional space. Because both the intrinsic dimension and the priority sequence groups are derived from the dominance relation graph, the resulting intrinsic low-dimensional space reflects well the dominance relation characteristics that the high-dimensional data has in the high-dimensional space.
As shown in Fig. 7, in a specific embodiment, in addition to the structure shown in Fig. 6, the dimensionality reduction device may further include an optimization unit 15, and the dominance graph determination unit 10 may be implemented by a receiving unit 110 and a relation determination unit 120, where:
The optimization unit 15 is configured to merge the dominance relation graph determined by the dominance graph determination unit according to a preset strategy to obtain an optimized dominance relation graph, where the preset strategy includes: merging multiple nodes that have the same parent node and the same child node into one node; and/or, if a first node is the parent node of a second node and the second node is the parent node of a third node, merging these so that the first node is the parent node of the third node;
In this way, the coloring unit 11 performs the graph coloring operation on the dominance relation graph optimized by the optimization unit 15, and the sequence group determination unit 12 determines the multiple priority sequence groups of the data in the high-dimensional data object according to the dominance relation graph optimized by the optimization unit 15.
The receiving unit 110 is configured to receive a preference relation set input by the user, where the preference relation set includes N dimensions of data relation information, the data relation information of any one dimension includes the dominance relation information between any two data items in the high-dimensional data object, and N equals the dimension of the high-dimensional data object or is smaller than it;
The relation determination unit 120 is configured to: if the dominance relation between a first data item and a second data item, being any two data items in the high-dimensional data object, meets the first preset condition, determine that the dominance relation between the first data item and the second data item is a first-type dominance relation; and determine the dominance relation graph of the high-dimensional data object according to the first-type dominance relations so determined.
Here, a first-type dominance relation means that one data item is better than another, and the first preset condition is: in M dimensions of the data relation information received by the receiving unit 110, the dominance relation between the first data item and the second data item is the same in every one of those dimensions and is a first-type dominance relation, where M is greater than a preset value and less than or equal to N.
It can be understood that, in this embodiment, the receiving unit 110 and the relation determination unit 120 can directly obtain the final dominance relation graph by comparing the data pairwise, one pair at a time.
In other specific embodiments, the dominance graph determination unit 10 may be implemented by an initial determination unit, a reference determination unit, a position determination unit, and an adding unit, where:
The initial determination unit is configured to determine, from a first part of the data in the high-dimensional data object, an initial dominance relation graph among that first part of the data. Specifically, the initial determination unit may, in the manner of the relation determination unit 120 above, use the first preset condition to determine the initial dominance relation graph among the first part of the data, that is, obtain it by pairwise comparison of the data.
The reference determination unit is configured to randomly select any data item in the first part of the data as reference data;
The position determination unit is configured to traverse the second part of the data, that is, the data in the high-dimensional data object other than the first part, and to determine, according to the dominance relation between the reference data determined by the reference determination unit and the second-part data, the position at which the second-part data is added to the initial dominance relation graph;
The adding unit is configured to add the second-part data, at the position determined by the position determination unit, to the initial dominance relation graph determined by the initial determination unit.
Specifically, the position determination unit may be implemented by the following first determination unit and second determination unit, which determine the dominance relations between the second-part data and the first-part data using the reverse comparison method, where:
First determination unit, if determined for third data in second part data and the reference determination unit 130 Reference data between dominance relation meet the second prerequisite, then if determined in the relation determination unit 120 the In initial priority relational graph between a part of data, the 4th data of the father node data as the reference data are better than institute State third data, it is determined that son node number evidence of the third data as the 4th data, and determine the 4th data All son node numbers according in be inferior to the son node number of the third data according to the son node number evidence as the third data;Such as Third data described in fruit are better than the 4th data, and father node data are not present in the 4th data, it is determined that the third data Father node data as the 4th data;
Second determination unit, if referring to determination unit 130 with described for third data in the second part data Dominance relation between determining reference data meets third prerequisite, then if determined in the relation determination unit 120 First part's data between initial priority relational graph in, the 5th data of the son node number evidence as the reference data are excellent In the third data, and son node number evidence is not present in the 5th data, it is determined that the third data are as the described 5th The son node number evidence of data;If the third data are better than the 5th data, it is determined that the third data are as the described 5th The father node data of data, and it is better than the father node of the third data in all father node data of determining 5th data Father node data of the data as the third data;
Wherein, second prerequisite includes the third data better than reference data, the third prerequisite packet The reference data is included better than third data.
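One case of the first determination unit, where the fourth data is better than the third data, might look like the sketch below. It covers only that single case, and it assumes that the children handed to the third data are detached from the fourth data, which the text does not state explicitly. The names `place_below`, `children`, `values` and `better` are hypothetical.

```python
def place_below(children, fourth, third, better):
    """Make `third` a child of `fourth` and hand over the children it beats.

    children: dict mapping a node to the set of its child nodes.
    better:   callable, better(a, b) is True when a is better than b.
    Assumption: children moved under `third` are detached from `fourth`.
    """
    current = children.get(fourth, set())
    moved = {c for c in current if better(third, c)}
    children[fourth] = (current - moved) | {third}
    children[third] = children.get(third, set()) | moved
    return children

# Hypothetical two-dimensional data; "better" means at least as large in
# every coordinate and not identical.
values = {"P": (5, 5), "R": (2, 2), "X": (1, 4), "T": (3, 3)}

def better(a, b):
    return values[a] != values[b] and all(
        x >= y for x, y in zip(values[a], values[b]))

children = {"P": {"R", "X"}}  # initial graph: P is the parent of R and X
children = place_below(children, fourth="P", third="T", better=better)
print(children)
```

Here T is better than R but not comparable with X, so T takes over R as its child while X stays directly under P.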
Referring to Fig. 8, in a specific embodiment the high-dimensional data dimensionality reduction device may, in addition to the structure shown in Fig. 6 above, further include an optimization unit 15, and its coloring unit 11 can be implemented by a data pair searching unit 111, a relationship establishing unit 112 and a graph coloring unit 113, where:
The data pair searching unit 111 is configured to find, in the dominance relation graph or the optimized dominance relation graph, the groups of data pairs whose dominance relation is a second-type dominance relation. The second-type dominance relation means that the first data is not better than the second data and the second data is not better than the first data;
The relationship establishing unit 112 is configured to establish a first-type dominance relation between the two data of one data pair among the data pairs found by the data pair searching unit 111, so that the two data of another of the found data pairs also come into a first-type dominance relation. The relationship establishing unit 112 can also establish a correspondence between the data pair for which the first-type dominance relation is established and the other data pair that acquires a first-type dominance relation as a result.
The graph coloring unit 113 is configured to take the data pairs for which the relationship establishing unit 112 has established a first-type dominance relation as nodes and to perform the graph coloring operation. The graph coloring unit 113 also takes as nodes for the graph coloring operation the data pairs for which the relationship establishing unit 112 has not established a first-type dominance relation (that is, the data pairs for which no correspondence has been established). Among all nodes that take part in the graph coloring operation, the first-type dominance relations of the data pairs of two adjacent nodes can cause nodes in the dominance relation graph or the optimized dominance relation graph to form a cycle.
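The two building blocks of the coloring step can be sketched as below. The sketch only lists data pairs with a second-type relation and colors a conflict graph that the caller supplies. It does not derive the conflicts (the cycle test) itself, and it uses a greedy heuristic, which gives an upper bound on the minimum number of colors rather than the exact value. All names are hypothetical.

```python
from itertools import combinations, count

def incomparable_pairs(items, better):
    """Pairs with a second-type relation: neither datum is better.

    better: set of (better, worse) tuples, assumed already complete.
    """
    return [(a, b) for a, b in combinations(items, 2)
            if (a, b) not in better and (b, a) not in better]

def greedy_coloring(conflicts):
    """Give each node the smallest color not used by its colored neighbors."""
    color = {}
    for node in conflicts:
        used = {color[n] for n in conflicts[node] if n in color}
        color[node] = next(c for c in count() if c not in used)
    return color

pairs = incomparable_pairs(["A", "B", "C"], better={("A", "B")})

# Hypothetical conflict graph over three pair-nodes p1, p2, p3.
conflicts = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p2"]}
colors = greedy_coloring(conflicts)
num_colors = len(set(colors.values()))  # candidate intrinsic dimension
print(pairs, colors, num_colors)
```

In this toy conflict graph two colors suffice, so the sketch would report an intrinsic dimension of 2.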
In this embodiment, the sequence group determination unit 12 determines the multiple priority sequence groups specifically according to the first-type dominance relations among the data contained in the dominance relation graph determined by the priority graph determination unit 10 and the first-type dominance relations established by the relationship establishing unit 112 in the coloring unit 11.
An embodiment of the present invention further provides a data processing device, that is, a device to which the above high-dimensional data dimensionality reduction method is applied. Its structure is shown schematically in Fig. 9 and includes a memory 21 and a processor 22, each connected to a bus, where:
The memory 21 can store information such as the files the processor 22 needs in order to process data, for example the code files with which the processor 22 executes a given program step.
The processor 22 is configured to: determine the dominance relation graph of the high-dimensional data object, the dominance relation graph representing the dominance relation between any two data in the high-dimensional data object; perform a graph coloring operation according to the dominance relation graph and take the number of colors obtained as the intrinsic dimension of the intrinsic low-dimensional space of the high-dimensional data object; determine multiple priority sequence groups of the high-dimensional data object according to the dominance relation graph, the number of priority sequence groups being equal to the intrinsic dimension and each priority sequence group consisting of the data of the high-dimensional data object ordered according to the dominance relation; and encode the data in each priority sequence group, forming one data vector of the intrinsic low-dimensional space from the codes of one datum of the high-dimensional data object in each priority sequence group, so that the dimension of each data vector equals the intrinsic dimension. In this way the high-dimensional data object is mapped to the intrinsic low-dimensional space. Because both the intrinsic dimension and the priority sequence groups are obtained from the dominance relation graph, the resulting intrinsic low-dimensional space reflects well the dominance relations of the high-dimensional data in the high-dimensional space.
Specifically, the dominance relation graph contains nodes and the dominance relations between nodes. If one node is better than another node, the one node is the parent node of the other node and the other node is the child node of the one node. A node represents any datum in the high-dimensional data object, and the dominance relation between two nodes represents the dominance relation between the data that the two nodes represent. When encoding, the processor 22 can specifically proceed as follows for two data that are adjacent in a first priority sequence group among the multiple priority sequence groups: in the other priority sequence groups, that is, those other than the first priority sequence group, if the two data are not adjacent and their dominance relation is the same as their relative order in the first priority sequence group, the two data have the same code in the first priority sequence group; if the two data are not adjacent and their dominance relation differs from their relative order in the first priority sequence group, the two data have different codes in the first priority sequence group.
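How the per-sequence codes combine into one data vector can be sketched as follows. This simplified illustration assumes that the code of a datum is just its zero-based position in each priority sequence group (best first); the adjacency-based rule for equal and different codes described above is not modeled. The name `to_vectors` is hypothetical.

```python
def to_vectors(sequences):
    """Map each datum to a vector with one code per priority sequence group.

    sequences: list of k orderings of the same data, best datum first.
    Assumption: the code of a datum is its position in the ordering.
    """
    codes = [{d: rank for rank, d in enumerate(seq)} for seq in sequences]
    return {d: tuple(c[d] for c in codes) for d in sequences[0]}

# Two hypothetical priority sequence groups, so the vectors have dimension 2.
sequences = [["A", "B", "C"], ["C", "A", "B"]]
vectors = to_vectors(sequences)
print(vectors)
```

In this toy input A precedes B in both orderings, so every coordinate of A's vector is smaller than B's, while A and C swap places between the two orderings and therefore remain incomparable in the low-dimensional space.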
In a specific embodiment, the processor 22 is further configured to, after the dominance relation graph has been determined, first merge the determined dominance relation graph according to a preset strategy to obtain an optimized dominance relation graph. The preset strategy includes: merging multiple nodes that have the same parent node and the same child node into one node; and/or, if a first node is the parent node of a second node and the second node is the parent node of a third node, merging these so that the first node is the parent node of the third node. The processor 22 then performs the graph coloring operation according to the optimized dominance relation graph, and can determine the multiple priority sequence groups of the data in the high-dimensional data object according to the optimized dominance relation graph.
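The first merging strategy (nodes with the same parents and the same children become one node) can be sketched as below. The second strategy is not covered, and the representation of a merged node as a tuple of its members is an assumption made only for illustration; `merge_equivalent` is a hypothetical name.

```python
from collections import defaultdict

def merge_equivalent(nodes, edges):
    """Merge nodes that share exactly the same parents and children.

    edges: set of (parent, child) tuples. A merged node is represented
    as a tuple of its original members (an illustrative choice).
    """
    parents, children = defaultdict(set), defaultdict(set)
    for p, c in edges:
        parents[c].add(p)
        children[p].add(c)
    groups = defaultdict(list)
    for n in nodes:
        groups[(frozenset(parents[n]), frozenset(children[n]))].append(n)
    rep = {n: tuple(members) for members in groups.values() for n in members}
    merged_edges = {(rep[p], rep[c]) for p, c in edges}
    return set(rep.values()), merged_edges

nodes = ["A", "B", "C", "D"]
edges = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")}
merged_nodes, merged_edges = merge_equivalent(nodes, edges)
print(merged_nodes, merged_edges)
```

B and C both have A as their only parent and D as their only child, so they collapse into one node and the diamond becomes a chain of three nodes.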
In another specific embodiment, the processor 22 can determine the dominance relation graph in any of the following ways:
(1) Pairwise comparison: the processor 22 is specifically configured to receive a preference relation set input by a user. The preference relation set contains N dimensions of data relation information, where each dimension of data relation information contains the dominance relation information between any two data in the high-dimensional data object, and N is equal to or smaller than the dimension of the high-dimensional data object;
If the dominance relation between the first data and the second data of any two data in the high-dimensional data object meets the first preset condition, the dominance relation between the first data and the second data is determined to be a first-type dominance relation, and the dominance relation graph of the high-dimensional data object is determined from the first-type dominance relations so determined. The first-type dominance relation means that one datum is better than another. The first preset condition is: within M dimensions of the data relation information, the dominance relation between the first data and the second data is the same in every one of those dimensions and is in each case a first-type dominance relation, where M is greater than a preset value and less than or equal to N.
(2) The processor 22 is configured to determine, from a first part of the data in the high-dimensional data object, an initial dominance relation graph among the first part of the data. Specifically, following the method of mode (1) above, it can use the first preset condition to determine the initial dominance relation graph among the first part of the data in the high-dimensional data object, that is, obtain it by pairwise comparison. It then randomly selects any datum from the first part of the data as the reference data; next it traverses the second part of the data in the high-dimensional data object, that is, the data other than the first part, and determines, according to the dominance relation between the reference data and the second part of the data, the position at which the second part of the data is added to the initial dominance relation graph; finally it adds the second part of the data to the initial dominance relation graph at the determined position.
(3) The processor 22 is configured to first determine, by the first preset condition, the initial dominance relation graph among the first part of the data in the high-dimensional data object, that is, obtain it by pairwise comparison;
For the second part of the data in the high-dimensional data object, the processor 22 first determines a reference datum among the first part of the data, whose initial dominance relation graph has been determined by the first preset condition. If the dominance relation between third data in the second part of the data and the reference data meets the second preset condition, then: if, in the determined initial dominance relation graph among the first part of the data, fourth data that is the parent node data of the reference data is better than the third data, the third data is taken as child node data of the fourth data, and those child node data of the fourth data that are inferior to the third data are taken as child node data of the third data; if the third data is better than the fourth data and the fourth data has no parent node data, the third data is taken as the parent node data of the fourth data;
If the dominance relation between the third data in the second part of the data and the reference data meets the third preset condition, then: if, in the determined initial dominance relation graph among the first part of the data, fifth data that is child node data of the reference data is better than the third data and the fifth data has no child node data, the third data is taken as child node data of the fifth data; if the third data is better than the fifth data, the third data is taken as parent node data of the fifth data, and those parent node data of the fifth data that are better than the third data are taken as parent node data of the third data;
The second preset condition is that the third data is better than the reference data, and the third preset condition is that the reference data is better than the third data.
In another specific embodiment, when performing the graph coloring operation, the processor 22 specifically: finds, in the dominance relation graph or the optimized dominance relation graph, the groups of data pairs whose dominance relation is a second-type dominance relation, which means that the first data is not better than the second data and the second data is not better than the first data; establishes a first-type dominance relation between the two data of one data pair among these data pairs, so that the two data of another of the found data pairs also come into a first-type dominance relation; and takes the data pairs for which the first-type dominance relation is established as nodes and performs the graph coloring operation. The processor 22 is further configured to establish a correspondence between the data pair for which the first-type dominance relation is established and the other data pair that acquires a first-type dominance relation as a result, and additionally takes as nodes for the graph coloring operation the data pairs for which no first-type dominance relation has been established (that is, the data pairs for which no correspondence has been established). Among all nodes that take part in the graph coloring operation, the first-type dominance relations of the data pairs of two adjacent nodes can cause nodes in the dominance relation graph or the optimized dominance relation graph to form a cycle.
In this case, the processor 22 can determine the multiple priority sequence groups according to the first-type dominance relations among the data in the determined dominance relation graph and the established first-type dominance relations.
An embodiment of the present invention further provides a computer storage medium that can store a program; when executed, the program performs some or all of the steps of the high-dimensional data dimensionality reduction method described in the above method embodiments.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the order of actions described, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device can be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or some of the steps of the methods of the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk or an optical disc.
The above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Those of ordinary skill in the art will understand that all or some of the steps of the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may include read-only memory, random access memory, a magnetic disk, an optical disc or the like.
The high-dimensional data dimensionality reduction method and device provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. For those skilled in the art, the specific implementation and the scope of application may change in accordance with the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (16)

1. A high-dimensional data dimensionality reduction method, characterized by comprising:
determining a dominance relation graph of a high-dimensional data object, the dominance relation graph representing the dominance relation between any two data in the high-dimensional data object;
performing a graph coloring operation according to the dominance relation graph, and taking the number of colors obtained as the intrinsic dimension of the intrinsic low-dimensional space of the high-dimensional data object;
determining multiple priority sequence groups of the high-dimensional data object according to the dominance relation graph, the number of priority sequence groups being equal to the intrinsic dimension, and each priority sequence group consisting of the data of the high-dimensional data object ordered according to the dominance relation;
encoding the data in each of the multiple priority sequence groups;
forming one data vector of the intrinsic low-dimensional space from the codes of one datum of the high-dimensional data object in each of the priority sequence groups.
2. The method according to claim 1, characterized in that the dominance relation graph contains nodes and the dominance relations between nodes;
wherein, if one node is better than another node, the one node is the parent node of the other node and the other node is the child node of the one node; a node represents a datum in the high-dimensional data object; and the dominance relation between two nodes represents the dominance relation between the data that the two nodes represent.
3. The method according to claim 2, characterized in that, after determining the dominance relation graph of the high-dimensional data object, the method further comprises:
merging the dominance relation graph according to a preset strategy to obtain an optimized dominance relation graph, wherein the preset strategy includes: merging multiple nodes that have the same parent node and the same child node into one node; and/or, if a first node is the parent node of a second node and the second node is the parent node of a third node, merging these so that the first node is the parent node of the third node;
correspondingly, performing the graph coloring operation according to the dominance relation graph comprises: performing the graph coloring operation according to the optimized dominance relation graph.
4. The method according to any one of claims 1 to 3, characterized in that determining the dominance relation graph of the high-dimensional data object specifically comprises:
receiving a preference relation set input by a user, the preference relation set containing N dimensions of data relation information, wherein each dimension of data relation information contains the dominance relation information between any two data in the high-dimensional data object;
if the dominance relation between the first data and the second data of any two data in the high-dimensional data object meets a first preset condition, determining that the dominance relation between the first data and the second data is a first-type dominance relation, the first-type dominance relation meaning that one datum is better than another;
determining the dominance relation graph according to the determined first-type dominance relations;
wherein the first preset condition is: within M dimensions of the data relation information, the dominance relation between the first data and the second data is the same in every one of those dimensions and is in each case a first-type dominance relation, M being greater than a preset value and less than or equal to N.
5. The method according to claim 2 or 3, characterized in that determining the dominance relation graph of the high-dimensional data object specifically comprises:
determining, from a first part of the data in the high-dimensional data object, an initial dominance relation graph among the first part of the data;
randomly selecting any datum from the first part of the data as reference data;
traversing the second part of the data in the high-dimensional data object, that is, the data other than the first part, and determining, according to the dominance relation between the reference data and the second part of the data, the position at which the second part of the data is added to the initial dominance relation graph;
adding the second part of the data to the initial dominance relation graph at the determined position.
6. The method according to claim 5, characterized in that determining, according to the dominance relation between the reference data and the second part of the data, the position at which the second part of the data is added to the initial dominance relation graph specifically comprises:
if the dominance relation between third data in the second part of the data and the reference data meets a second preset condition, then: if, in the initial dominance relation graph, fourth data that is the parent node data of the reference data is better than the third data, determining that the third data is child node data of the fourth data, and determining that those child node data of the fourth data that are inferior to the third data are child node data of the third data; if the third data is better than the fourth data and the fourth data has no parent node data, determining that the third data is the parent node data of the fourth data;
if the dominance relation between the third data and the reference data meets a third preset condition, then: if, in the initial dominance relation graph, fifth data that is child node data of the reference data is better than the third data and the fifth data has no child node data, determining that the third data is child node data of the fifth data; if the third data is better than the fifth data, determining that the third data is parent node data of the fifth data, and determining that those parent node data of the fifth data that are better than the third data are parent node data of the third data;
wherein the second preset condition is that the third data is better than the reference data, and the third preset condition is that the reference data is better than the third data.
7. The method according to claim 3, characterized in that performing the graph coloring operation according to the optimized dominance relation graph specifically comprises:
finding, in the optimized dominance relation graph, the groups of data pairs whose dominance relation is a second-type dominance relation, the second-type dominance relation meaning that the first data is not better than the second data and the second data is not better than the first data;
establishing a first-type dominance relation between the two data of one data pair among the data pairs, so that the two data of another of the data pairs also come into a first-type dominance relation, the first-type dominance relation meaning that one datum is better than another;
taking as nodes the data pairs for which the first-type dominance relation is established and those of the data pairs for which no first-type dominance relation has been established, and performing the graph coloring operation;
correspondingly, determining the multiple priority sequence groups of the data in the high-dimensional data object according to the dominance relation graph specifically comprises:
determining the multiple priority sequence groups according to the first-type dominance relations among the data in the dominance relation graph and the established first-type dominance relations.
8. The method according to any one of claims 1, 2, 3, 6 or 7, characterized in that encoding the data in each of the multiple priority sequence groups specifically comprises:
for two data that are adjacent in a first priority sequence group among the multiple priority sequence groups: in the other priority sequence groups, that is, those other than the first priority sequence group, if the two data are not adjacent and their dominance relation is the same as their relative order in the first priority sequence group, the two data have the same code in the first priority sequence group;
if the two data are not adjacent and their dominance relation differs from their relative order in the first priority sequence group, the two data have different codes in the first priority sequence group.
9. A high-dimensional data dimensionality reduction device, characterized by comprising:
a priority graph determination unit, configured to determine a dominance relation graph of a high-dimensional data object, the dominance relation graph representing the dominance relation between any two data in the high-dimensional data object;
a coloring unit, configured to perform a graph coloring operation according to the dominance relation graph determined by the priority graph determination unit, and to take the number of colors obtained as the intrinsic dimension of the intrinsic low-dimensional space of the high-dimensional data object;
a sequence group determination unit, configured to determine multiple priority sequence groups of the high-dimensional data object according to the dominance relation graph determined by the priority graph determination unit, the number of priority sequence groups being equal to the intrinsic dimension, and each priority sequence group consisting of the data of the high-dimensional data object ordered according to the dominance relation;
a coding unit, configured to encode the data in each of the multiple priority sequence groups determined by the sequence group determination unit;
a low-dimensional forming unit, configured to form, according to the encoding by the coding unit, one data vector of the intrinsic low-dimensional space from the codes of one datum of the high-dimensional data object in each of the priority sequence groups.
10. The device according to claim 9, characterized in that the dominance relation graph contains nodes and the dominance relations between nodes;
wherein, if one node is better than another node, the one node is the parent node of the other node and the other node is the child node of the one node; a node represents a datum in the high-dimensional data object; and the dominance relation between two nodes represents the dominance relation between the data that the two nodes represent.
11. The device according to claim 10, wherein the device further comprises:
An optimization unit, configured to merge, according to a preset strategy, the dominance relation graph determined by the dominance relation graph determining unit, to obtain an optimized dominance relation graph;
Wherein the preset strategy comprises: merging multiple nodes that have the same parent node and the same child node into one node; and/or, if a first node is the parent node of a second node and the second node is the parent node of a third node, merging them so that the first node is the parent node of the third node;
The coloring unit is specifically configured to perform the coloring ("library professor") operation according to the dominance relation graph optimized by the optimization unit.
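The first merging strategy of claim 11 (nodes with identical parents and identical children collapse into one node) can be sketched as below. The sketch covers only that first strategy, and the function name `merge_equivalent_nodes` and the dictionary-based graph representation are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set


def merge_equivalent_nodes(
    parents: Dict[Hashable, Set],
    children: Dict[Hashable, Set],
) -> List[List[Hashable]]:
    """Group nodes that share exactly the same parent set and child set.

    Each returned list is one merged node; singleton lists are nodes that
    had no equivalent partner.
    """
    groups = defaultdict(list)
    for node in parents:
        key = (frozenset(parents[node]), frozenset(children[node]))
        groups[key].append(node)
    return list(groups.values())


# "b" and "c" both have parent "a" and child "d", so they are merged.
parents = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
children = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}
merged = merge_equivalent_nodes(parents, children)
```

Merging in this way reduces the number of nodes that a later coloring step has to process without changing which data are comparable.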
12. The device according to any one of claims 9 to 11, wherein the dominance relation graph determining unit specifically comprises:
A receiving unit, configured to receive a preference relation set input by a user, the preference relation set comprising N dimensions of data relation information, wherein the data relation information of any one dimension comprises dominance relation information between any two data in the high-dimensional data object;
A relation determining unit, configured to: if the dominance relation between first data and second data of any two data in the high-dimensional data object meets a first preset condition, determine that the dominance relation between the first data and the second data is a first-class dominance relation, and determine the dominance relation graph of the high-dimensional data object according to the determined first-class dominance relation; the first-class dominance relation comprising one piece of data being better than another piece of data;
Wherein the first preset condition comprises: in M dimensions of the data relation information, the dominance relation between the first data and the second data is the same in every one of those dimensions and is the first-class dominance relation in each, where M is greater than a preset value and less than or equal to N.
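The first preset condition of claim 12 can be sketched as a vote over the N dimensions of data relation information. This is an illustrative reading, not a definitive one: each dimension is modeled as a set of ordered pairs (x, y) meaning "x is better than y", the name `first_class_relation` is hypothetical, and the handling of the case where both directions exceed the preset value (no relation is reported) is an assumption the claim does not address.

```python
from typing import Hashable, Optional, Sequence, Set, Tuple

Pair = Tuple[Hashable, Hashable]  # (x, y) means "x is better than y"


def first_class_relation(
    a: Hashable,
    b: Hashable,
    dims: Sequence[Set[Pair]],
    preset: int,
) -> Optional[Pair]:
    """Return the first-class relation between a and b, or None if there is none.

    M is the number of dimensions agreeing on one direction; the relation is
    accepted only when M is greater than the preset value (M <= N holds by
    construction, since M counts dimensions out of the N supplied).
    """
    a_wins = sum((a, b) in dim for dim in dims)
    b_wins = sum((b, a) in dim for dim in dims)
    # Assumption: if both directions pass the threshold, report no relation.
    if a_wins > preset and b_wins <= preset:
        return (a, b)
    if b_wins > preset and a_wins <= preset:
        return (b, a)
    return None


# N = 3 dimensions: two say x is better than y, one says the opposite.
dims = [{("x", "y")}, {("x", "y")}, {("y", "x")}]
strict = first_class_relation("x", "y", dims, preset=2)
loose = first_class_relation("x", "y", dims, preset=1)
```

With a preset value of 1 the two agreeing dimensions are enough to establish that x is better than y; with a preset value of 2 no first-class relation is established.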
13. The device according to claim 10 or 11, wherein the dominance relation graph determining unit further comprises:
An initial determining unit, configured to determine, according to a first part of the data in the high-dimensional data object, an initial dominance relation graph among the first part of the data;
A reference determining unit, configured to randomly select any piece of data in the first part of the data as reference data;
A position determining unit, configured to traverse a second part of the data in the high-dimensional data object other than the first part of the data, and to determine, according to the dominance relation between the reference data and the second part of the data, the position at which the second part of the data is added to the initial dominance relation graph;
An adding unit, configured to add the second part of the data to the initial dominance relation graph at the position determined by the position determining unit.
14. The device according to claim 13, wherein the position determining unit specifically comprises:
A first determining unit, configured to: if the dominance relation between third data in the second part of the data and the reference data meets a second preset condition, then, if in the initial dominance relation graph among the first part of the data, fourth data that is parent node data of the reference data is better than the third data, determine the third data to be child node data of the fourth data, and determine, among all child node data of the fourth data, the child node data inferior to the third data to be child node data of the third data; and if the third data is better than the fourth data and the fourth data has no parent node data, determine the third data to be parent node data of the fourth data;
A second determining unit, configured to: if the dominance relation between the third data and the reference data meets a third preset condition, then, if in the initial dominance relation graph among the first part of the data, fifth data that is child node data of the reference data is better than the third data and the fifth data has no child node data, determine the third data to be child node data of the fifth data; and if the third data is better than the fifth data, determine the third data to be parent node data of the fifth data, and determine, among all parent node data of the fifth data, the parent node data better than the third data to be parent node data of the third data;
Wherein the second preset condition comprises the third data being better than the reference data, and the third preset condition comprises the reference data being better than the third data.
15. The device according to claim 11, wherein the coloring unit specifically comprises:
A data pair searching unit, configured to find, in the optimized dominance relation graph, multiple data pairs whose dominance relation is a second-class dominance relation, the second-class dominance relation being that the first data is not better than the second data and the second data is also not better than the first data;
A relation establishing unit, configured to establish a first-class dominance relation between the two data of one data pair among the multiple data pairs, so that a first-class dominance relation holds between the two data of another data pair among the multiple data pairs found, the first-class dominance relation comprising one piece of data being better than another piece of data;
A coloring ("library professor") subunit, configured to perform the coloring ("library professor") operation using, as nodes, the data pairs for which the first-class dominance relation has been established and the data pairs among the multiple data pairs for which the first-class dominance relation has not been established;
The sequence group determining unit is then specifically configured to determine the multiple priority sequence groups according to the first-class dominance relations between data in the dominance relation graph determined by the dominance relation graph determining unit and the first-class dominance relations established by the relation establishing unit.
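The search for data pairs in a second-class dominance relation (neither piece of data is better than the other) can be sketched as follows. The sketch assumes that the dominance relation is transitive, so that "better than" corresponds to reachability along child links; the names `descendants` and `incomparable_pairs` are hypothetical, and only the search step is covered, not the later coloring step.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def descendants(children: Dict[Hashable, Set], node: Hashable) -> Set:
    """All nodes reachable from `node` by following child links."""
    seen: Set = set()
    stack = list(children.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen


def incomparable_pairs(
    children: Dict[Hashable, Set],
    nodes: Sequence[Hashable],
) -> List[Tuple[Hashable, Hashable]]:
    """Pairs (a, b) where a is not better than b and b is not better than a."""
    reach = {node: descendants(children, node) for node in nodes}
    return [
        (a, b)
        for a, b in combinations(nodes, 2)
        if b not in reach[a] and a not in reach[b]
    ]


# "a" is better than everything; "b" and "c" are both better than "d".
children = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}}
nodes = ["a", "b", "c", "d"]
pairs = incomparable_pairs(children, nodes)
```

In this small graph the only pair in a second-class dominance relation is ("b", "c"), since every other pair is connected by a chain of child links.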
16. The device according to any one of claims 9, 10, 11, 14 or 15, wherein
The coding unit is specifically configured to: for two data that are adjacent in a first priority sequence group among the multiple priority sequence groups, in the other priority sequence groups among the multiple priority sequence groups other than the first priority sequence group, if the two data are not adjacent and the dominance relation of the two data is the same as their relative order in the first priority sequence group, give the two data the same code in the first priority sequence group; and if the two data are not adjacent and the dominance relation of the two data differs from their relative order in the first priority sequence group, give the two data different codes in the first priority sequence group.
CN201410379941.0A 2014-08-04 2014-08-04 Methods of Dimensionality Reduction in High-dimensional Data and device Active CN105447001B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410379941.0A CN105447001B (en) 2014-08-04 2014-08-04 Methods of Dimensionality Reduction in High-dimensional Data and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410379941.0A CN105447001B (en) 2014-08-04 2014-08-04 Methods of Dimensionality Reduction in High-dimensional Data and device

Publications (2)

Publication Number Publication Date
CN105447001A CN105447001A (en) 2016-03-30
CN105447001B true CN105447001B (en) 2018-12-14

Family

ID=55557197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410379941.0A Active CN105447001B (en) 2014-08-04 2014-08-04 Methods of Dimensionality Reduction in High-dimensional Data and device

Country Status (1)

Country Link
CN (1) CN105447001B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101546332A (en) * 2009-05-07 2009-09-30 Harbin Engineering University (哈尔滨工程大学) Manifold dimension-reducing medical image search method based on quantum genetic optimization
CN103608834A (en) * 2011-04-11 2014-02-26 Google Inc. (谷歌公司) Priority dimensional data conversion path reporting
CN103605734A (en) * 2013-11-19 2014-02-26 Electric Power Research Institute of Guangdong Power Grid Corporation (广东电网公司电力科学研究院) Characteristic vector based data transmission compression method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU5926299A (en) * 1998-09-17 2000-04-03 Catholic University Of America, The Data decomposition/reduction method for visualizing data clusters/sub-clusters

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
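Zhang Chuan (张川) et al. A decoupling algorithm for determining unit start-up and shutdown schedules. Proceedings of the CSEE (《中国电机工程学报》). 1988, Vol. 8 (No. 6), pp. 26-31. *
Li Huanqin (李焕勤) et al. Assembly line balancing problem based on a multi-objective genetic algorithm. Research and Exploration in Laboratory (《实验室研究与探索》). 2011, Vol. 30 (No. 8), pp. 36-40. *
Wang Jing (王菁) et al. FCAN: a content access network based on fast mapping. Journal of System Simulation (《系统仿真学报》). 2007, Vol. 19 (No. 17), pp. 3955-3960. *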

Also Published As

Publication number Publication date
CN105447001A (en) 2016-03-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant