CN110309139A - High-dimensional neighbour-pair search method and system - Google Patents

High-dimensional neighbour-pair search method and system

Info

Publication number
CN110309139A
Authority
CN
China
Prior art keywords
sample
neighbour
dimension
signature
vector
Legal status
Granted
Application number
CN201810179962.6A
Other languages
Chinese (zh)
Other versions
CN110309139B (en)
Inventor
童毅轩
张佳师
姜珊珊
郑继川
董滨
Current Assignee
Liguang Software Research Institute (beijing) Co Ltd
Original Assignee
Liguang Software Research Institute (beijing) Co Ltd
Application filed by Liguang Software Research Institute (beijing) Co Ltd
Priority to CN201810179962.6A
Publication of CN110309139A
Application granted
Publication of CN110309139B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries

Abstract

The invention proposes a high-dimensional neighbour-pair search method and system. The method comprises the following steps: generating a corresponding sample signature from the values of each sample vector; generating neighbour candidate sets from the sample signatures; and computing the distance between every two samples in each neighbour candidate set, and taking the sample pairs whose distance meets a preset requirement as the neighbour search result. This achieves effective search for high-dimensional neighbour pairs and meets users' search needs, and the method is simple and easy to implement.

Description

High-dimensional neighbour-pair search method and system
Technical field
The present invention relates to the field of computer technology, and more particularly to a high-dimensional neighbour-pair search method and a high-dimensional neighbour-pair search system.
Background art
With the development of science and technology, large-scale search engines must have fast and effective search capability. Common search methods at present include k-d trees, R-trees and the like. However, these data structures and their derived structures are only suitable for searching low-dimensional data. To increase search precision, the feature vectors characterizing search targets such as images are often high-dimensional, with a dimensionality that can reach the order of 10⁵. When the dimensionality of the data exceeds 100, or even reaches thousands, the search capability of the above data structures degrades rapidly. How to achieve effective search for high-dimensional neighbour pairs therefore still has high research value.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present invention is to propose a high-dimensional neighbour-pair search method, so as to achieve effective search for high-dimensional neighbour pairs and meet users' search needs.
A second object of the present invention is to propose a non-transitory computer-readable storage medium.
A third object of the present invention is to propose a computer program product.
A fourth object of the present invention is to propose a high-dimensional neighbour-pair search system.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes a high-dimensional neighbour-pair search method, comprising the following steps: generating corresponding sample signatures from the values of the sample vectors; generating neighbour candidate sets from the sample signatures; and computing the distance between every two samples in each neighbour candidate set, and taking the sample pairs whose distance meets a preset requirement as the neighbour search result.
In the high-dimensional neighbour-pair search method according to the embodiment of the present invention, sample signatures are first generated from the values of the sample vectors; neighbour candidate sets are then generated from the sample signatures; finally, the distance between every two samples in each neighbour candidate set is computed, and the sample pairs whose distance meets the preset requirement are taken as the neighbour search result. This achieves effective search for high-dimensional neighbour pairs and meets users' search needs, and the method is simple and easy to implement.
In addition, the high-dimensional neighbour-pair search method according to the above embodiment of the present invention may also have the following additional technical features:
According to one embodiment of the present invention, the sample signature is a binary vector.
According to one embodiment of the present invention, generating the sample signatures from the values of the sample vectors comprises: mapping each sample vector from the original vector to a target vector through a projection matrix R(k×d), where d is the dimension of the original vector, k is the dimension of the target vector, and d is greater than k; if a component of the target vector is not less than zero, assigning 1 to the corresponding position of the sample signature; and if it is less than zero, assigning 0 to the corresponding position.
According to one embodiment of the present invention, the projection matrix R(k×d) is randomly generated from the Gaussian distribution N(0, 1/k).
According to one embodiment of the present invention, generating the neighbour candidate sets from the sample signatures comprises: S21, constructing a binary tree of depth N and storing the sample signatures in its leaf nodes, where the path from the root node to a leaf node corresponds to the values of the first N bits of a sample signature, so that samples whose signatures agree on the first N bits are stored in the same leaf node, N being less than the signature length M; S22, when the number of sample signatures in a leaf node is greater than a first preset value T, dividing the signatures in that leaf node into two different child nodes according to the value of their (N+1)-th bit; S23, pruning the tree by cutting off the leaf nodes at the N-th layer; S24, repeating steps S22 and S23 until no leaf node needs to be split.
According to one embodiment of the present invention, taking the sample pairs whose distance meets the preset requirement as the neighbour search result comprises: for each neighbour candidate set, sorting the computed distances between its sample pairs and obtaining the K sample pairs with the smallest distances; and sorting the K sample pairs obtained from each of the different neighbour candidate sets, and taking the K pairs with the smallest distances as the neighbour search result.
To achieve the above objects, an embodiment of the second aspect of the present invention proposes a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the above high-dimensional neighbour-pair search method.
By executing the stored program corresponding to the above high-dimensional neighbour-pair search method, the non-transitory computer-readable storage medium of the embodiment of the present invention achieves effective search for high-dimensional neighbour pairs and meets users' search needs.
To achieve the above objects, an embodiment of the third aspect of the present invention proposes a computer program product; when the instructions in the computer program product are executed by a processor, the above high-dimensional neighbour-pair search method is performed.
By executing the program corresponding to the above high-dimensional neighbour-pair search method, the computer program product of the embodiment of the present invention achieves effective search for high-dimensional neighbour pairs and meets users' search needs.
To achieve the above objects, an embodiment of the fourth aspect of the present invention proposes a high-dimensional neighbour-pair search system, comprising: a first generation module for generating corresponding sample signatures from the values of the sample vectors; a second generation module for generating neighbour candidate sets from the sample signatures; and a processing module for computing the distance between every two samples in each neighbour candidate set and taking the sample pairs whose distance meets a preset requirement as the neighbour search result.
In the high-dimensional neighbour-pair search system according to the embodiment of the present invention, the first generation module first generates sample signatures from the values of the sample vectors; the second generation module then generates neighbour candidate sets from the sample signatures; and the processing module computes the distance between every two samples in each neighbour candidate set and takes the sample pairs whose distance meets the preset requirement as the neighbour search result. This achieves effective search for high-dimensional neighbour pairs and meets users' search needs, and the system is simple and easy to implement.
According to one embodiment of the present invention, the first generation module is configured to map each sample vector from the original vector to a target vector through a projection matrix R(k×d), to assign 1 to the corresponding position of the sample signature when a component of the target vector is not less than zero, and to assign 0 to that position when the component is less than zero, where d is the dimension of the original vector, k is the dimension of the target vector, and d > k.
According to one embodiment of the present invention, the second generation module performs the following steps: S21, constructing a binary tree of depth N and storing the sample signatures in its leaf nodes, where the path from the root node to a leaf node corresponds to the values of the first N bits of a sample signature, so that samples whose signatures agree on the first N bits are stored in the same leaf node, N being less than the signature length M; S22, when the number of signatures in a leaf node is greater than a first preset value T, dividing the signatures in that leaf node into two different child nodes according to the value of their (N+1)-th bit; S23, pruning the tree by cutting off the leaf nodes at the N-th layer; S24, repeating steps S22 and S23 until no leaf node needs to be split.
According to one embodiment of the present invention, when taking the sample pairs whose distance meets the preset requirement as the neighbour search result, the processing module is specifically configured to: for each neighbour candidate set, sort the computed distances between its sample pairs and obtain the K sample pairs with the smallest distances; and sort the K sample pairs obtained from each of the different neighbour candidate sets, and take the K pairs with the smallest distances as the neighbour search result.
Additional aspects and advantages of the present invention will be set forth in part in the following description, will in part become apparent from it, or will be learned through practice of the invention.
Description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken with reference to the accompanying drawings, in which:
Fig. 1 is a flow chart of a high-dimensional neighbour-pair search method according to an embodiment of the present invention;
Fig. 2 is a flow chart of step S2 of the high-dimensional neighbour-pair search method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a binary tree according to an example of the present invention;
Fig. 4 is a flow chart of step S3 of the high-dimensional neighbour-pair search method according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a system for executing the high-dimensional neighbour-pair search method of the present invention; and
Fig. 6 is a structural schematic diagram of a high-dimensional neighbour-pair search system according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, where the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
The high-dimensional neighbour-pair search method and system of the embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flow chart of the high-dimensional neighbour-pair search method of an embodiment of the present invention. As shown in Fig. 1, the high-dimensional neighbour-pair search method comprises the following steps:
S1, generating a corresponding sample signature from the values of each sample vector.
In one embodiment of the present invention, the sample signature is a binary vector.
Specifically, each sample vector can first be mapped from the original vector to a target vector through the projection matrix R(k×d). When a component of the target vector is not less than zero, 1 is assigned to the corresponding position of the sample signature, and when it is less than zero, 0 is assigned to that position. Here d is the dimension of the (high-dimensional) original vector, k is the dimension of the (low-dimensional) target vector, and d > k.
Optionally, the projection matrix R(k×d) can be randomly generated from the Gaussian distribution N(0, 1/k).
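Step S1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `make_projection` and `signature` are hypothetical, the matrix is stored as plain lists, and the signature is represented as a string of '0'/'1' characters so that prefixes can be taken directly when grouping. `random.gauss` takes a standard deviation, so N(0, 1/k) is drawn with sigma = 1/sqrt(k); the scale does not change the signs in any case.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def make_projection(k: int, d: int, seed: int = 0) -> Matrix:
    """Hypothetical helper: k x d matrix with i.i.d. N(0, 1/k) entries (std dev 1/sqrt(k))."""
    if not 0 < k < d:
        raise ValueError("expected 0 < k < d")
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(k)]


def signature(x: List[float], R: Matrix) -> str:
    """Hypothetical helper: project x with R, then keep one bit per component.

    A bit is '1' if the projected component is >= 0 and '0' otherwise.
    """
    if any(len(row) != len(x) for row in R):
        raise ValueError("x must have length d")
    bits = []
    for row in R:
        y = sum(r * v for r, v in zip(row, x))  # one component of R x
        bits.append("1" if y >= 0 else "0")
    return "".join(bits)
```

For example, `signature(x, make_projection(64, 1000))` would turn a 1000-dimensional vector into a 64-bit signature, so the signature length M equals k.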
S2, generating neighbour candidate sets from the sample signatures.
Specifically, the neighbour candidate sets can be generated from the value at each position of the sample signatures, with signatures whose values are closer placed in the same candidate set. For example, given samples whose signatures are 00001111, 00001010, 00001110, 00010010, 00011100 and 00011101, the samples 00001111, 00001010 and 00001110 can be grouped into one neighbour candidate set, and 00010010, 00011100 and 00011101 into another.
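A minimal sketch of the grouping in this example, assuming (as in steps S21 to S24) that "closer" signatures are those sharing a common prefix. With a 4-bit prefix it reproduces the two candidate sets above. `group_by_prefix` is a hypothetical helper, not part of the patent.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_prefix(signatures: List[str], n_bits: int) -> Dict[str, List[str]]:
    """Put signatures that agree on their first n_bits positions into the same group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sig in signatures:
        groups[sig[:n_bits]].append(sig)
    return dict(groups)


sigs = ["00001111", "00001010", "00001110", "00010010", "00011100", "00011101"]
print(group_by_prefix(sigs, 4))
# {'0000': ['00001111', '00001010', '00001110'], '0001': ['00010010', '00011100', '00011101']}
```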
S3, computing the distance between every two samples in each neighbour candidate set, and taking the sample pairs whose distance meets the preset requirement as the neighbour search result.
Specifically, if a total of a neighbour candidate sets are generated and each contains b samples, the distance (e.g. the Euclidean distance) between every two samples in each candidate set can be computed and sorted in ascending order, and the K top-ranked sample pairs are selected from each group. The resulting a×K sample pairs are then sorted again in ascending order, and the K top-ranked pairs are selected, giving the neighbour search result.
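As an illustration of why the candidate sets reduce work (simple counting, not a figure from the original text), assume n = a·b samples split into a candidate sets of exactly b samples each. Exhaustive comparison of all samples and comparison only within candidate sets require, respectively:

$$\binom{n}{2} = \frac{ab(ab-1)}{2}, \qquad a\binom{b}{2} = \frac{ab(b-1)}{2}$$

so computing distances only within candidate sets is cheaper by a factor of (ab − 1)/(b − 1), which is roughly a when b is large; for example, a = 100 and b = 1000 give a factor of about 100.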
In one embodiment of the present invention, as shown in Fig. 2, step S2 may further comprise the following steps:
S21, constructing a binary tree of depth N and storing the sample signatures in its leaf nodes, where the path from the root node to a leaf node corresponds to the values of the first N bits of a sample signature, so that samples whose signatures agree on the first N bits are stored in the same leaf node; N is less than the signature length M.
S22, when the number of sample signatures in a leaf node is greater than a first preset value T, dividing the signatures in that leaf node into its two child nodes according to the value of their (N+1)-th bit.
For example, referring to Fig. 3, suppose leaf node C needs to be split and the current depth is 2. The signature 110 goes to the left child because its (N+1)-th bit, i.e. the 3rd bit, is 0, and the signature 111 goes to the right child because its 3rd bit is 1. The depth of the tree then becomes 3, i.e. N+1.
S23, pruning the tree by cutting off the leaf nodes remaining at the N-th layer.
Specifically, during the splitting of step S22, possibly only some of the leaf nodes are split. In the leaf nodes that were not split, the common prefix of the signatures is one bit shorter than in the nodes that were split; that is, the samples in these nodes are less close to one another than the samples in the split leaf nodes. The unsplit leaf nodes at the N-th layer can therefore be cut off, so that the closer sample sets are retained.
S24, repeating steps S22 and S23 until no leaf node needs to be split.
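Steps S21 to S24 can be sketched as below. The assumptions are: a depth-N binary tree is represented by a dictionary keyed by the N-bit root-to-leaf path (the signature prefix), leaves store sample indices, a child that would be empty is simply not created, the loop also stops once the depth reaches the signature length M, and leaves cut off in S23 are discarded. `candidate_sets` is a hypothetical name, not the patent's code.

```python
from collections import defaultdict
from typing import Dict, List


def candidate_sets(sigs: List[str], N: int, T: int) -> Dict[str, List[int]]:
    """Hypothetical sketch of S21-S24 on equal-length bit-string signatures.

    Returns a mapping from each surviving leaf's path (signature prefix) to the
    indices of the samples stored in that leaf.
    """
    if not sigs:
        return {}
    M = len(sigs[0])  # signature length
    if not 0 < N < M:
        raise ValueError("the starting depth N must satisfy 0 < N < M")

    # S21: the root-to-leaf path is the first N bits, so samples sharing
    # those N bits land in the same leaf.
    leaves: Dict[str, List[int]] = defaultdict(list)
    for i, s in enumerate(sigs):
        leaves[s[:N]].append(i)

    depth = N
    # S24: repeat S22/S23 while some leaf holds more than T signatures.
    # Assumption: also stop once every bit has been used (depth == M).
    while depth < M and any(len(ids) > T for ids in leaves.values()):
        next_leaves: Dict[str, List[int]] = defaultdict(list)
        for prefix, ids in leaves.items():
            if len(ids) > T:
                # S22: split by bit N+1 (0-based index `depth`) into two children.
                for i in ids:
                    next_leaves[prefix + sigs[i][depth]].append(i)
            # S23: a leaf that was not split is pruned, i.e. not carried over.
        leaves = next_leaves
        depth += 1
    return dict(leaves)
```

With the six example signatures from step S2, N = 4 and T = 3, the two 4-bit groups are returned unchanged. With T = 2 the loop keeps splitting, and under this literal reading of S23 only leaves under the prefix 00001 survive, which illustrates how strongly T shapes the output.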
In one embodiment of the present invention, as shown in Fig. 4, step S3 comprises the following steps:
S31, for each neighbour candidate set, sorting the computed distances between its sample pairs and obtaining the K sample pairs with the smallest distances.
S32, sorting the K sample pairs obtained from each of the different neighbour candidate sets, and taking the K pairs with the smallest distances as the neighbour search result.
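Steps S31 and S32 can be sketched as follows, assuming Euclidean distance (the example distance given for step S3) and candidate sets given as lists of sample indices. When the candidate sets are disjoint, as the leaves of a single tree are, no pair is counted twice. `top_k_pairs` is a hypothetical name; `math.dist` requires Python 3.8 or later.

```python
import heapq
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[float, int, int]  # (distance, i, j) with sample indices i < j


def top_k_pairs(vectors: Sequence[Sequence[float]],
                cand_sets: Dict[str, List[int]], K: int) -> List[Pair]:
    """Hypothetical sketch of S31/S32: a per-set top-K, then a global top-K."""
    per_set: List[Pair] = []
    for ids in cand_sets.values():
        # S31: Euclidean distance for every pair inside one candidate set,
        # keeping only the K smallest.
        pairs = ((math.dist(vectors[i], vectors[j]), i, j)
                 for i, j in combinations(sorted(ids), 2))
        per_set.extend(heapq.nsmallest(K, pairs))
    # S32: sort the pooled per-set winners and keep the K smallest overall.
    return heapq.nsmallest(K, per_set)
```

`heapq.nsmallest` avoids fully sorting each set's distances; a full `sorted(...)[:K]` would give the same result at a higher cost.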
It should be noted that the signature length M must be greater than the depth N of the binary tree once splitting is complete. If samples are concentrated too heavily in one neighbour candidate set, the algorithm will spend too much time iterating; in that case, the sample vectors can be normalized before the neighbour search.
T is the threshold that decides whether a leaf node is split. The larger T is, the fewer times steps S22 and S23 are repeated, but the larger the generated neighbour candidate sets become (i.e. each set contains more samples); given limited computing resources, overly large candidate sets make the subsequent processing harder to carry out. The smaller T is, the more times S22 and S23 are repeated, which consumes more time.
N is the starting depth of the binary tree for the neighbour search. If N is too small, many iterations are needed before the neighbour candidate groups are obtained; if N is too large, the precision of the neighbour search suffers.
Therefore, for different data distributions, the parameters M, N and T can be adjusted experimentally so that the number of iterations stays at a preset value, e.g. 9, 10 or 11.
It should also be noted that, according to the Johnson-Lindenstrauss lemma, the above random projection maps high-dimensional vectors into low-dimensional vectors while preserving the positional information of the samples. The method rests on the following assumptions:
1) The sample signatures preserve the positional information of the samples and thus indirectly preserve the distance information between them; that is, for most samples, the distance between two sample signatures approximates the distance between the two samples. The generated signatures should satisfy the following: the closer two samples are, the more positions have equal values in their two signatures.
2) The more samples a neighbour candidate set contains, the more likely its samples are to be pairwise close.
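Assumption 1) can be made concrete with a well-known property of sign random projections (often called SimHash); this is standard background, not a statement from the original text. For a row r of R with i.i.d. zero-mean Gaussian entries (such as N(0, 1/k)) and two nonzero vectors x and y separated by the angle θ(x, y):

$$\Pr\left[\operatorname{sign}(r^\top x) = \operatorname{sign}(r^\top y)\right] = 1 - \frac{\theta(x, y)}{\pi}$$

so with k independent rows the expected number of equal signature positions is

$$\mathbb{E}\left[\#\{\text{equal bits}\}\right] = k\left(1 - \frac{\theta(x, y)}{\pi}\right)$$

The signature therefore tracks the angle between samples; for unit-norm vectors the Euclidean distance 2 sin(θ/2) is monotone in θ, so closer samples are expected to share more bits.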
For example, the high-dimensional neighbour-pair search method of the above embodiments can be implemented by the system shown in Fig. 5. As shown in Fig. 5, the system comprises: a network interface for connecting to the Internet or another form of communication network to obtain the sample vectors; an input device for collecting input signals from the system user, including the parameters M, T, N, K, etc.; a hard disk for storing information such as user logs; a central processing unit for running the program that executes the above high-dimensional neighbour-pair search method; a storage unit for storing temporary variables during program execution, such as the number of iterations; and a display for showing relevant information, i.e. the neighbour search result, to the system user.
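The parameters collected by the input device can be bundled as in the hypothetical sketch below. The check N < M encodes the constraint stated for step S21; requiring N, T and K to be positive is an added assumption. Because step S1 produces one signature bit per component of the k-dimensional target vector, the signature length M coincides with k.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchParams:
    """Hypothetical container for the user-supplied parameters M, N, T, K."""
    M: int  # signature length (one bit per row of R, so M equals k)
    N: int  # starting depth of the binary tree
    T: int  # split threshold: leaves holding more than T signatures are split
    K: int  # number of closest pairs to report

    def __post_init__(self) -> None:
        # N < M comes from step S21; positivity of N, T, K is an assumption.
        if not 0 < self.N < self.M:
            raise ValueError("the starting depth N must satisfy 0 < N < M")
        if self.T < 1 or self.K < 1:
            raise ValueError("T and K must be positive")
```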
In summary, in the high-dimensional neighbour-pair search method according to the embodiment of the present invention, sample signatures are first generated from the values of the sample vectors; neighbour candidate sets are then generated from the sample signatures; finally, the distance between every two samples in each neighbour candidate set is computed, and the sample pairs whose distance meets the preset requirement are taken as the neighbour search result. This achieves effective search for high-dimensional neighbour pairs and meets users' search needs, and the method is simple and easy to implement.
Further, the present invention proposes a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the above high-dimensional neighbour-pair search method.
By executing the stored program corresponding to the above high-dimensional neighbour-pair search method, the non-transitory computer-readable storage medium of the embodiment of the present invention achieves effective search for high-dimensional neighbour pairs and meets users' search needs.
Further, the present invention proposes a computer program product; when the instructions in the computer program product are executed by a processor, the above high-dimensional neighbour-pair search method is performed.
By executing the program corresponding to the above high-dimensional neighbour-pair search method, the computer program product of the embodiment of the present invention achieves effective search for high-dimensional neighbour pairs and meets users' search needs.
Fig. 6 is a structural schematic diagram of a high-dimensional neighbour-pair search system according to an embodiment of the present invention. As shown in Fig. 6, the high-dimensional neighbour-pair search system 100 comprises a first generation module 110, a second generation module 120 and a processing module 130.
The first generation module 110 is configured to generate corresponding sample signatures from the values of the sample vectors. The second generation module 120 is configured to generate neighbour candidate sets from the sample signatures. The processing module 130 is configured to compute the distance between every two samples in each neighbour candidate set and take the sample pairs whose distance meets the preset requirement as the neighbour search result.
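The division of the system into three modules can be sketched as a thin wrapper that pipes each module's output into the next. This is a hypothetical structure, not the patent's code: each module is injected as a callable, and the data types are assumptions (signatures as bit strings, candidate sets as a dictionary mapping a prefix to sample indices, results as (distance, i, j) tuples).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[float, int, int]  # (distance, i, j)


@dataclass
class NeighbourPairSearchSystem:
    """Hypothetical wiring of modules 110, 120 and 130 as injected callables."""
    # Module 110: sample vectors -> sample signatures.
    first_generation: Callable[[List[Vector]], List[str]]
    # Module 120: sample signatures -> neighbour candidate sets.
    second_generation: Callable[[List[str]], Dict[str, List[int]]]
    # Module 130: vectors + candidate sets -> neighbour search result.
    processing: Callable[[List[Vector], Dict[str, List[int]]], List[Pair]]

    def search(self, vectors: List[Vector]) -> List[Pair]:
        signatures = self.first_generation(vectors)
        candidate_sets = self.second_generation(signatures)
        return self.processing(vectors, candidate_sets)
```

Keeping the modules as separate callables mirrors Fig. 6 and lets each stage be swapped or tested in isolation.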
In one embodiment of the present invention, the first generation module 110 is configured to map each sample vector from the original vector to a target vector through the projection matrix R(k×d), to assign 1 to the corresponding position of the sample signature when a component of the target vector is not less than zero, and to assign 0 to that position when the component is less than zero, where d is the dimension of the original vector, k is the dimension of the target vector, and d > k.
In one embodiment of the present invention, the second generation module 120 performs the following steps:
S21, constructing a binary tree of depth N and storing the sample signatures in its leaf nodes, where the path from the root node to a leaf node corresponds to the values of the first N bits of a sample signature, so that samples whose signatures agree on the first N bits are stored in the same leaf node, N being less than the signature length M;
S22, when the number of signatures in a leaf node is greater than the first preset value T, dividing the signatures in that leaf node into two different child nodes according to the value of their (N+1)-th bit;
S23, pruning the tree by cutting off the leaf nodes at the N-th layer;
S24, repeating steps S22 and S23 until no leaf node needs to be split.
In one embodiment of the present invention, when taking the sample pairs whose distance meets the preset requirement as the neighbour search result, the processing module 130 is specifically configured to: for each neighbour candidate set, sort the computed distances between its sample pairs and obtain the K sample pairs with the smallest distances; and sort the K sample pairs obtained from each of the different neighbour candidate sets, and take the K pairs with the smallest distances as the neighbour search result.
It should be noted that the foregoing explanation of the embodiments of the high-dimensional neighbour-pair search method also applies to the high-dimensional neighbour-pair search system of this embodiment and is not repeated here.
In the high-dimensional neighbour-pair search system according to the embodiment of the present invention, the first generation module first generates sample signatures from the values of the sample vectors; the second generation module then generates neighbour candidate sets from the sample signatures; and the processing module computes the distance between every two samples in each neighbour candidate set and takes the sample pairs whose distance meets the preset requirement as the neighbour search result. This achieves effective search for high-dimensional neighbour pairs and meets users' search needs, and the system is simple and easy to implement.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict each other, those skilled in the art may join and combine the different embodiments or examples described in this specification, as well as the features of those different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, e.g. two, three, etc., unless otherwise specifically defined.
Any process or method description in a flow chart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flow chart or otherwise described herein, for example an ordered list of executable instructions that may be considered to implement logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and portable compact disc read-only memory (CD-ROM). Furthermore, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or, if necessary, otherwise processing it in a suitable manner, and then stored in a computer memory.
It should be understood that parts of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system. If, as in another embodiment, they are implemented in hardware, any one or a combination of the following techniques known in the art may be used: discrete logic circuits having logic gate circuits for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gate circuits, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), etc.
Those skilled in the art will understand that all or some of the steps carried by the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically alone, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, etc. Although embodiments of the present invention have been shown and described above, it should be understood that they are exemplary and should not be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace and vary the above embodiments within the scope of the present invention.

Claims (12)

1. A high-dimensional neighbor pair search method, characterized by comprising the following steps:
generating a corresponding sample signature from the values of each sample vector;
generating neighbor candidate sets according to the sample signatures;
calculating the distance between every two samples in each neighbor candidate set, and taking the sample pairs whose distances meet a preset requirement as the neighbor search result.
2. The high-dimensional neighbor pair search method according to claim 1, characterized in that the sample signature is a binary vector.
3. The high-dimensional neighbor pair search method according to claim 2, characterized in that generating the sample signature from the values of the sample vector comprises:
mapping the sample vector from an original vector to a target vector through a projection matrix $R_{k\times d}$, where d is the dimension of the original vector, k is the dimension of the target vector, and d is greater than k;
if a value of the target vector is not less than zero, assigning 1 to the corresponding position of the sample signature;
if a value of the target vector is less than zero, assigning 0 to the corresponding position of the sample signature.
4. The high-dimensional neighbor pair search method according to claim 3, characterized in that the projection matrix $R_{k\times d}$ is randomly generated from the Gaussian distribution $\mathcal{N}(0, 1/k)$.
5. The high-dimensional neighbor pair search method according to claim 3, characterized in that generating the neighbor candidate sets according to the sample signatures comprises:
S21: constructing a binary tree of depth N and storing the sample signatures in the leaf nodes of the binary tree, wherein the path from the root node of the binary tree to a leaf node corresponds to the values of the first N bits of the sample signature, samples whose first N signature bits are identical are stored in the same leaf node, and N is less than the sample signature length M;
S22: when the number of sample signatures in a leaf node is greater than a first preset value T, dividing those sample signatures, according to the value of bit N+1, between the two different child nodes under that leaf node;
S23: pruning the tree by removing the leaf nodes of the N-th layer;
S24: repeating steps S22 and S23 until no leaf node needs to be split.
6. The high-dimensional neighbor pair search method according to claim 1, characterized in that taking the sample pairs whose distances meet the preset requirement as the neighbor search result comprises:
sorting, separately for each neighbor candidate set, the calculated distances between the sample pairs in that set, and obtaining the top K sample pairs with the smallest distances;
sorting the K sample pairs obtained from each of the different neighbor candidate sets, and taking the top K sample pairs with the smallest distances as the neighbor search result.
7. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, the high-dimensional neighbor pair search method according to any one of claims 1 to 6 is implemented.
8. A computer program product, characterized in that, when instructions in the computer program product are executed by a processor, the high-dimensional neighbor pair search method according to any one of claims 1 to 6 is performed.
9. A high-dimensional neighbor pair search system, characterized by comprising:
a first generation module, configured to generate a corresponding sample signature from the values of each sample vector;
a second generation module, configured to generate neighbor candidate sets according to the sample signatures;
a processing module, configured to calculate the distance between every two samples in each neighbor candidate set and to take the sample pairs whose distances meet a preset requirement as the neighbor search result.
10. The high-dimensional neighbor pair search system according to claim 9, characterized in that the first generation module is configured to:
map the sample vector from an original vector to a target vector through a projection matrix $R_{k\times d}$; when a value of the target vector is not less than zero, assign 1 to the corresponding position of the sample signature; and when a value of the target vector is less than zero, assign 0 to the corresponding position of the sample signature, where d is the dimension of the original vector, k is the dimension of the target vector, and d > k.
11. The high-dimensional neighbor pair search system according to claim 10, characterized in that the second generation module performs the following steps:
S21: constructing a binary tree of depth N and storing the sample signatures in the leaf nodes of the binary tree, wherein the path from the root node of the binary tree to a leaf node corresponds to the values of the first N bits of the sample signature, samples whose first N signature bits are identical are stored in the same leaf node, and N is less than the sample signature length M;
S22: when the number of signatures in a leaf node is greater than a first preset value T, dividing those sample signatures, according to the value of bit N+1, between the two different child nodes under that leaf node;
S23: pruning the tree by removing the leaf nodes of the N-th layer;
S24: repeating steps S22 and S23 until no leaf node needs to be split.
12. The high-dimensional neighbor pair search system according to claim 9, characterized in that, when taking the sample pairs whose distances meet the preset requirement as the neighbor search result, the processing module is specifically configured to:
sort, separately for each neighbor candidate set, the calculated distances between the sample pairs in that set, and obtain the top K sample pairs with the smallest distances;
sort the K sample pairs obtained from each of the different neighbor candidate sets, and take the top K sample pairs with the smallest distances as the neighbor search result.
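The method of claims 1 to 6 can be illustrated with a short Python sketch. It is not the applicant's implementation, and several details are assumptions: the function names (`make_projection`, `signature`, `candidate_sets`, `top_k_pairs`, `search`) are hypothetical; Euclidean distance is used because the claims do not name a metric; 1/k in $\mathcal{N}(0, 1/k)$ is read as a variance (the sign bits do not depend on this, since scaling R by a positive constant changes no signs); and the tree of steps S21 to S24 is modelled as buckets keyed by the first N bits that are split on the next bit while larger than T, which is one reading of the splitting and pruning steps. The sign-of-projection signature resembles random-hyperplane (SimHash-style) locality-sensitive hashing.

```python
import heapq
import math
import random
from collections import defaultdict
from itertools import combinations


def make_projection(k, d, seed=0):
    """Claim 4: k x d matrix of i.i.d. N(0, 1/k) entries (1/k taken as the variance)."""
    rng = random.Random(seed)
    std = math.sqrt(1.0 / k)
    return [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(k)]


def signature(x, R):
    """Claim 3: project x to k dimensions, then 1 where the value >= 0, else 0."""
    return tuple(1 if sum(r * v for r, v in zip(row, x)) >= 0 else 0 for row in R)


def candidate_sets(sigs, N, T):
    """Claim 5 (one reading): group sample ids into leaves of a signature-prefix tree."""
    M = len(sigs[0])
    buckets = defaultdict(list)
    for i, s in enumerate(sigs):
        buckets[s[:N]].append(i)  # S21: samples sharing the first N bits share a leaf
    stack = [(N, ids) for ids in buckets.values()]
    leaves = []
    while stack:  # S24: repeat until no leaf needs splitting
        depth, ids = stack.pop()
        if len(ids) <= T or depth >= M:  # small enough, or no bits left to split on
            leaves.append(ids)
            continue
        # S22: split the over-full leaf on the next signature bit;
        # S23 (as read here): the split node is replaced by its non-empty children.
        for bit in (0, 1):
            child = [i for i in ids if sigs[i][depth] == bit]
            if child:
                stack.append((depth + 1, child))
    return leaves


def top_k_pairs(vectors, sets, K):
    """Claims 1 and 6: K closest pairs within each set, then K closest overall."""
    per_set = []
    for ids in sets:
        pairs = ((math.dist(vectors[i], vectors[j]), i, j)
                 for i, j in combinations(sorted(ids), 2))
        per_set.extend(heapq.nsmallest(K, pairs))  # per-set top K
    return heapq.nsmallest(K, per_set)  # merged top K as (distance, i, j)


def search(vectors, k, N, T, K, seed=0):
    """Claim 1 end to end: signatures -> candidate sets -> closest pairs."""
    R = make_projection(k, len(vectors[0]), seed)
    sigs = [signature(x, R) for x in vectors]
    return top_k_pairs(vectors, candidate_sets(sigs, N, T), K)
```

Only samples that end up in the same leaf are ever compared, so the quadratic pairwise distance computation is confined to each candidate set, and T bounds the size of every set except those whose signatures are entirely identical. The trade-off is that a close pair whose signatures differ in one of the bits used for bucketing or splitting is never examined.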
CN201810179962.6A 2018-03-05 2018-03-05 High-dimensional neighbor pair searching method and system Active CN110309139B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810179962.6A CN110309139B (en) 2018-03-05 2018-03-05 High-dimensional neighbor pair searching method and system

Publications (2)

Publication Number Publication Date
CN110309139A (en) 2019-10-08
CN110309139B (en) 2024-02-13

Family

ID=68073598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810179962.6A Active CN110309139B (en) 2018-03-05 2018-03-05 High-dimensional neighbor pair searching method and system

Country Status (1)

Country Link
CN (1) CN110309139B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1477563A (en) * 2003-07-03 2004-02-25 复旦大学 High-dimensional vector data quick similar search method
CN101334786A (en) * 2008-08-01 2008-12-31 浙江大学 Formulae neighborhood based data dimensionality reduction method
CN101556601A (en) * 2009-03-12 2009-10-14 华为技术有限公司 Method and device for searching k neighbor
US20100232701A1 (en) * 2009-03-12 2010-09-16 Siemens Product Lifecycle Management Software Inc. System and method for identifying wall faces in an object model
CN103377237A (en) * 2012-04-27 2013-10-30 常州市图佳网络科技有限公司 High dimensional data neighbor search method and fast approximate image search method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308122A (en) * 2020-10-20 2021-02-02 中国刑事警察学院 High-dimensional vector space sample fast searching method and device based on double trees
CN112308122B (en) * 2020-10-20 2024-03-01 中国刑事警察学院 High-dimensional vector space sample rapid searching method and device based on double trees

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant