CN107122370A - A distributed search method and device - Google Patents
- Publication number
- CN107122370A (application CN201610105198.9A)
- Authority
- CN
- China
- Prior art keywords
- segmentation
- fingerprint
- mark
- server
- retrieval request
- Prior art date
- Legal status
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Fuzzy Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A distributed search method and apparatus. The method includes: after receiving a retrieval request, searching the stored fingerprints for similar fingerprints according to the fingerprint of the information to be retrieved carried in the retrieval request; for each similar fingerprint found, comparing the segments of the similar fingerprint with the corresponding segments of the fingerprint of the information to be retrieved in turn, in a predefined order, and stopping at the first identical segment; comparing the identifier of that identical segment with the identifier carried in the retrieval request and, if they are the same, including the similar fingerprint in the count result, where the division into segments and the identifier of each segment are determined according to a first predefined rule; and returning the count result. The application solves the problems of unstable counts and of an extra count-deduplication operation when similar-information identification is performed in a distributed system.
Description
Technical field
The present invention relates to the field of information retrieval, and in particular to a distributed search method and device.
Background
Similar-information identification is now widely used. One typical application is detecting whether similar information exists in a massive amount of information, for example removing duplicate web pages in a search-engine crawler system. Another typical application is detecting how often similar information occurs, for example counting similar mails in an anti-spam system.
SIMHASH is a commonly used duplicate-information recognition algorithm. It converts information such as a document into a 64-bit value, herein called a fingerprint. If the Hamming distance between the fingerprints computed for two pieces of information is less than n (empirically, n is generally 3), the two pieces of information are considered similar. The Hamming distance is the number of bit positions at which two bit strings (such as the fingerprints) differ. For example, if fingerprints FP1 and FP2 differ only at bits 27 and 55 and agree on the other 62 bits, then FP1 is a similar fingerprint of FP2, and FP2 is also a similar fingerprint of FP1.
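For illustration only, the SimHash fingerprint and the Hamming-distance test can be sketched in Python. The whitespace tokenization, the use of the first 8 bytes of MD5 as the 64-bit token hash, and all function names are assumptions rather than details from this document; only the 64-bit fingerprint and the "distance less than n, with n = 3" rule come from the text.

```python
import hashlib

FINGERPRINT_BITS = 64


def simhash(tokens: list[str]) -> int:
    """Minimal SimHash sketch: every token votes +1/-1 on each bit of its 64-bit hash."""
    votes = [0] * FINGERPRINT_BITS
    for token in tokens:
        # Assumption: the first 8 bytes of MD5 serve as the 64-bit token hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")
        for bit in range(FINGERPRINT_BITS):
            votes[bit] += 1 if (token_hash >> bit) & 1 else -1
    # A fingerprint bit is set wherever the positive votes win.
    fingerprint = 0
    for bit, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions at which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_similar(a: int, b: int, n: int = 3) -> bool:
    """Similar when the Hamming distance is less than n (n = 3 by rule of thumb)."""
    return hamming_distance(a, b) < n


fp1 = simhash("the same mail body sent many times".split())
fp2 = fp1 ^ (1 << 26) ^ (1 << 54)  # flip two bits, as in the FP1/FP2 example
print(hamming_distance(fp1, fp2), is_similar(fp1, fp2))  # 2 True
```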
At present, there are two schemes for using the SIMHASH algorithm in a distributed system:
First scheme: when information is stored or retrieved, SIMHASH is used to compute the 64-bit fingerprint of the information to be stored or retrieved; a server is chosen at random from multiple servers; the computed fingerprint is sent to the selected server; after receiving the fingerprint, the server uses the single-machine SIMHASH scheme to store the fingerprint of the information to be stored, or to retrieve and count the similar fingerprints of the fingerprint of the information to be retrieved.
The main drawback of the first scheme is that the count of similar fingerprints is inaccurate. During storage, the fingerprint of the information to be stored is placed at random on one of N servers, so a retrieval result is only the count of similar fingerprints on one server. The resulting count is unstable: retrieving on different servers yields different results.
Second scheme: fingerprints are stored on multiple servers; at retrieval time, the fingerprint of the information to be retrieved is sent to multiple servers; each server separately retrieves and counts the similar fingerprints of that fingerprint. Because storage and retrieval are spread over multiple servers, duplicate counting can occur. For example, if fingerprint FP1 above is stored on two servers and both servers receive fingerprint FP2 of the information to be retrieved, similar fingerprint FP1 is counted by both. To deduplicate the count, the servers in the second scheme must return the similar fingerprints and their count values to the client that initiated the retrieval, and the client deduplicates by collecting and comparing the similar fingerprints.
The main drawback of this scheme is that it needs an extra deduplication operation, so processing is slow. In addition, if different servers find the same similar fingerprints, sending those identical fingerprints to the client from several servers wastes network traffic.
Summary
This application provides a distributed search method and device that solve the problems of unstable counts and of an extra count-deduplication operation when similar-information identification is performed in a distributed system.
The application adopts the following technical solutions.
A distributed search method, applied to a server, including:
after receiving a retrieval request, searching the stored fingerprints for similar fingerprints according to the fingerprint of the information to be retrieved carried in the retrieval request;
for each similar fingerprint found, performing the following: comparing the segments of the similar fingerprint with the corresponding segments of the fingerprint of the information to be retrieved in turn, in a predefined order, and stopping at the first identical segment; comparing the identifier of that identical segment with the identifier carried in the retrieval request and, if they are the same, including the similar fingerprint in the count result; where the division into segments and the identifier of each segment are determined according to a first predefined rule;
returning the count result.
Optionally, the method further includes:
if the identifier of the identical segment differs from the identifier carried in the retrieval request, not including the corresponding similar fingerprint in the count result.
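A minimal Python sketch of this server-side counting rule, assuming the first predefined rule splits a 64-bit fingerprint into K = 4 equal 16-bit segments identified 1 to 4 from left to right, and that the predefined comparison order is also left to right. The function names are hypothetical. Because a fingerprint within Hamming distance 2 of the query differs in at most 2 of the 4 segments, it always has an identical segment, so across the requests carrying identifiers 1 to 4 the rule counts each such fingerprint exactly once.

```python
from typing import Optional

K, BITS = 4, 64  # assumed first rule: 4 equal 16-bit segments, identifiers 1..4


def split_fingerprint(fp: int) -> list[int]:
    """Segments of a 64-bit fingerprint, left to right; identifier = position + 1."""
    width = BITS // K
    mask = (1 << width) - 1
    return [(fp >> (BITS - width * i)) & mask for i in range(1, K + 1)]


def first_identical_segment_id(similar_fp: int, query_fp: int) -> Optional[int]:
    """Compare segments in the (assumed left-to-right) order; stop at the first match."""
    pairs = zip(split_fingerprint(similar_fp), split_fingerprint(query_fp))
    for seg_id, (a, b) in enumerate(pairs, start=1):
        if a == b:
            return seg_id
    return None


def count_result(query_fp: int, request_id: int, similar_fps: list[int]) -> int:
    """Count a similar fingerprint only if its first identical segment has request_id."""
    count = 0
    for fp in similar_fps:
        if first_identical_segment_id(fp, query_fp) == request_id:
            count += 1
        # Otherwise the fingerprint is ignored here; the request carrying the
        # identifier of its first identical segment is the one that counts it.
    return count


query = 0x1111_2222_3333_4444
similar = [query ^ 1, query ^ (1 << 63), query ^ (1 << 63) ^ (1 << 47)]
print([count_result(query, i, similar) for i in range(1, K + 1)])  # [1, 1, 1, 0]
```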
Optionally, the identifier of a segment is the sequence number of that segment.
Optionally, searching the stored fingerprints for similar fingerprints according to the fingerprint of the information to be retrieved carried in the retrieval request includes:
obtaining, according to the first predefined rule, the segment corresponding to the identifier carried in the retrieval request from the fingerprint of the information to be retrieved carried in the retrieval request;
searching the stored segments for segments identical to the obtained segment, where each stored segment belongs to at least one of the stored fingerprints;
filtering out, from the stored fingerprints, the fingerprints corresponding to the segments found, and comparing each filtered fingerprint with the fingerprint of the information to be retrieved to find the similar fingerprints.
Optionally, filtering out, from the stored fingerprints, the fingerprints corresponding to the segments found includes:
looking up the corresponding key value using the value of the segment found as the key name, where the key value corresponding to the value of a segment is all stored fingerprints that contain that segment.
Optionally, searching the stored segments for segments identical to the obtained segment includes:
searching, among the stored segments indexed by the identifier carried in the retrieval request, for segments identical to the obtained segment.
A distributed search method, applied to a client, including:
determining, according to a second predefined rule, the server corresponding to each segment;
sending a retrieval request to the server corresponding to each segment, the retrieval request carrying the fingerprint of the information to be retrieved and the identifier of the segment corresponding to that server, where the division into segments and the identifier of each segment are determined according to the first predefined rule;
adding up the count results returned by the servers for the retrieval requests to obtain the retrieval result.
Optionally, the identifier of a segment is the sequence number of that segment.
Optionally, determining, according to the second predefined rule, the server corresponding to each segment includes:
dividing the fingerprint of the information to be retrieved into K segments according to the first predefined rule;
performing a hash operation on the value of each segment with respect to the number of servers, and determining the server corresponding to the segment from the result.
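The client side might look like the following sketch. It assumes the same four 16-bit segments and uses a plain modulo of the segment value by the server count as a stand-in for the unspecified hash operation; plan_requests and retrieval_result are hypothetical names. Identifiers that map to the same server are grouped into one request, which that server can then handle as a request carrying several identifiers.

```python
BITS, K = 64, 4  # assumed first rule: 4 equal segments, identifiers 1..4


def split_fingerprint(fp: int) -> dict[int, int]:
    """Segment identifier -> segment value, left to right."""
    width = BITS // K
    mask = (1 << width) - 1
    return {i: (fp >> (BITS - width * i)) & mask for i in range(1, K + 1)}


def plan_requests(query_fp: int, num_servers: int) -> dict[int, list[int]]:
    """Assumed second rule: server index = segment value mod num_servers.

    Returns server index -> identifiers to carry in the request sent to that server.
    """
    plan: dict[int, list[int]] = {}
    for seg_id, value in split_fingerprint(query_fp).items():
        plan.setdefault(value % num_servers, []).append(seg_id)
    return plan


def retrieval_result(count_results: list[int]) -> int:
    """The client simply adds up the count results returned by the servers."""
    return sum(count_results)


print(plan_requests(0x1111_2222_3333_4444, 3))  # {1: [1, 4], 2: [2], 0: [3]}
```

Because routing depends only on a segment's value, a stored fingerprint that shares a segment value with the query is routed, for that segment, to the same server as the query.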
A distributed search device, arranged at a server, including:
a similar-fingerprint search module, configured to, after a retrieval request is received, search the stored fingerprints for similar fingerprints according to the fingerprint of the information to be retrieved carried in the retrieval request;
a counting module, configured to perform the following for each similar fingerprint found: compare the segments of the similar fingerprint with the corresponding segments of the fingerprint of the information to be retrieved in turn, in a predefined order, stopping at the first identical segment; compare the identifier of that identical segment with the identifier carried in the retrieval request and, if they are the same, include the similar fingerprint in the count result; where the division into segments and the identifier of each segment are determined according to the first predefined rule;
a response module, configured to return the count result.
Optionally, the counting module is further configured not to include the corresponding similar fingerprint in the count result when the identifier of the identical segment differs from the identifier carried in the retrieval request.
Optionally, the identifier of a segment is the sequence number of that segment.
Optionally, the similar-fingerprint search module includes:
an acquiring unit, configured to obtain, according to the first predefined rule, the segment corresponding to the identifier carried in the retrieval request from the fingerprint of the information to be retrieved carried in the retrieval request;
a segment comparing unit, configured to search the stored segments for segments identical to the obtained segment, where each stored segment belongs to at least one of the stored fingerprints;
a fingerprint comparing unit, configured to filter out, from the stored fingerprints, the fingerprints corresponding to the segments found, and to compare each filtered fingerprint with the fingerprint of the information to be retrieved to find the similar fingerprints.
Optionally, the fingerprint comparing unit filtering out, from the stored fingerprints, the fingerprints corresponding to the segments found includes:
the fingerprint comparing unit looks up the corresponding key value using the value of the segment found as the key name, where the key value corresponding to the value of a segment is all stored fingerprints that contain that segment.
Optionally, the segment comparing unit searching the stored segments for segments identical to the obtained segment includes:
the segment comparing unit searches, among the stored segments indexed by the identifier carried in the retrieval request, for segments identical to the obtained segment.
A distributed search device, arranged at a client, including:
a determining module, configured to determine, according to the second predefined rule, the server corresponding to each segment;
a request module, configured to send a retrieval request to the server corresponding to each segment, the retrieval request carrying the fingerprint of the information to be retrieved and the identifier of the segment corresponding to that server, where the division into segments and the identifier of each segment are determined according to the first predefined rule;
a computing module, configured to add up the count results returned by the servers for the retrieval requests to obtain the retrieval result.
Optionally, the identifier of a segment is the sequence number of that segment.
Optionally, the determining module includes:
a division unit, configured to divide the fingerprint of the information to be retrieved into K segments according to the first predefined rule;
a hash operation unit, configured to perform a hash operation on the value of each segment with respect to the number of servers, and to determine the server corresponding to the segment from the result.
The application has the following advantages.
In at least one alternative of the application, retrieval is performed on multiple servers, so the retrieval result is more comprehensive and, compared with retrieval on a single server, more stable and accurate.
In at least one alternative of the application, the server deduplicates while counting similar fingerprints. The server can therefore return only the count result to the client, and the client obtains the retrieval result by summing the count results returned by all servers, so processing is fast and network traffic is saved. Even if the client needs the concrete similar fingerprints, the deduplication already done on the servers avoids sending the same similar fingerprint repeatedly, which also reduces unnecessary network traffic.
In at least one alternative of the application, a fingerprint is divided into multiple segments that are mapped to different servers for storage. Retrieval can then first perform a preliminary screening by comparing segments and search for similar fingerprints only within the screening result, which speeds up processing. In one implementation of this alternative, when segments are stored indexed by their identifiers, the server compares only the segments indexed by the identifier carried in the retrieval request, which reduces the number of comparisons and speeds up processing further.
Of course, a product implementing the application need not achieve all of the above advantages at the same time.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the distributed search method of Embodiment 1;
Fig. 2 is a schematic flowchart of the distributed search method of Embodiment 2;
Fig. 3 is a schematic diagram of the distributed search device of Embodiment 3;
Fig. 4 is a schematic diagram of the determining module in Embodiment 3;
Fig. 5 is a schematic diagram of the distributed search device of Embodiment 4;
Fig. 6 is a schematic diagram of a module in Embodiment 4.
Detailed description
The technical solutions of the application are described in detail below with reference to the drawings and embodiments.
It should be noted that, provided they do not conflict, the embodiments of the application and the features in them may be combined with each other, and all such combinations fall within the scope of protection of the application. In addition, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
In a typical configuration, the computing device of the client or the verification system may include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. The memory may include module 1, module 2, ..., module N (N is an integer greater than 2).
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassette tape, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Embodiment 1: a distributed search method, applied to a server, which, as shown in Fig. 1, includes steps S110 to S130.
S110: after receiving a retrieval request, search the stored fingerprints for similar fingerprints according to the fingerprint of the information to be retrieved carried in the retrieval request;
S120: for each similar fingerprint found, perform the following: compare the segments of the similar fingerprint with the corresponding segments of the fingerprint of the information to be retrieved in turn, in a predefined order, stopping at the first identical segment; compare the identifier of that identical segment with the identifier carried in the retrieval request and, if they are the same, include the similar fingerprint in the count result; where the division into segments and the identifier of each segment are determined according to the first predefined rule;
S130: return the count result.
In this embodiment, two segments are identical when every bit of one equals the corresponding bit of the other, for example both first bits are "1", both second bits are "0", and so on; that is, the two segments have the same value.
In this embodiment, step S120 may further include: if the identifier of the identical segment differs from the identifier carried in the retrieval request, not including the corresponding similar fingerprint in the count result, for example by simply ignoring it.
In this embodiment, the returned count result can be given different meanings depending on the scenario, requirements, and other conditions, for example the following two:
First meaning: the count result is the sum, over all similar fingerprints included in the count result, of their count values on this server.
The same fingerprint may occur many times. In a spam scenario, for example, if a mail with identical content is sent many times, its fingerprint may be stored on the same server repeatedly (that is, there are multiple copies of one kind of fingerprint). To save storage space, the server does not store identical fingerprints repeatedly: each kind of fingerprint is stored only once, and storing it again only increments that fingerprint's count value. For each kind of fingerprint stored, the server also keeps its count value (equivalent to the number of copies of that fingerprint on this server, or the number of times it was stored). Fingerprints and their count values may be, but are not limited to being, stored as a fingerprint table.
When the count result has the first meaning, including a similar fingerprint in the count result means adding that similar fingerprint's count value (which may be, but is not limited to being, looked up in the fingerprint table) to the count result; not including it means leaving the count result unchanged. Correspondingly, counting a given kind of fingerprint means adding its count value to the count result.
Second meaning: the count result is the number of kinds of similar fingerprints included in the count result. For example, suppose the server being searched holds five kinds of similar fingerprints of fingerprint FP1: FP2, FP3, FP4, FP5 and FP6. If the identifier carried in the retrieval request is 1 and, for each of FP2 to FP6, the first segment identical to FP1 has identifier 1, the count result is 5. If instead the identifier carried in the retrieval request is 2 and only FP3's first segment identical to FP1 has identifier 2, the count result is 1; and so on.
When the count result has the second meaning, including a similar fingerprint in the count result means adding 1 to the count result; not including it means leaving the count result unchanged. Correspondingly, counting a given kind of fingerprint means adding 1 to the count result.
In both meanings, "the same fingerprint" means fingerprints that agree in every bit; fingerprints computed with the same algorithm from information with identical content are the same fingerprint. Thus, under the first meaning, the count result is the number of similar fingerprints on the server after deduplication, and under the second meaning it is the number of kinds of similar fingerprints on the server after deduplication.
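As a sketch under stated assumptions (an in-memory Counter standing in for the fingerprint table, hypothetical class and method names), the two meanings differ only in what is added for each similar fingerprint that passes the identifier check:

```python
from collections import Counter


class FingerprintTable:
    """Per-server fingerprint table: each distinct fingerprint is kept once, with a count value."""

    def __init__(self) -> None:
        self.count_values: Counter = Counter()

    def store(self, fp: int) -> None:
        # Storing a fingerprint that is already present only increments its count value.
        self.count_values[fp] += 1

    def count_result(self, included: list[int], meaning: int) -> int:
        """Count result over the similar fingerprints that pass the identifier check.

        meaning=1: sum of their count values; meaning=2: number of kinds.
        """
        kinds = set(included)
        if meaning == 1:
            return sum(self.count_values[fp] for fp in kinds)
        if meaning == 2:
            return len(kinds)
        raise ValueError("meaning must be 1 or 2")


table = FingerprintTable()
for _ in range(3):
    table.store(0xFA11)  # the same spam mail stored three times
table.store(0xFA12)
print(table.count_result([0xFA11, 0xFA12], 1), table.count_result([0xFA11, 0xFA12], 2))  # 4 2
```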
In this embodiment, if the server receives two or more retrieval requests that carry the same fingerprint (but different identifiers), or receives a retrieval request that carries two or more identifiers, they can be processed together. For example, if one retrieval request carries fingerprint FP2 and two identifiers, the merged processing is as follows: in step S110, similar fingerprints are searched for according to FP2; in step S120, when the segments are compared in turn, a similar fingerprint is included in the count result as long as the identifier of its first segment identical to FP2 equals any one of the identifiers carried in the retrieval request, and is not included if it differs from all of them. The case of two retrieval requests that both carry FP2 but different identifiers is handled similarly.
Of course, steps S110 to S130 may also be performed separately for each of the different retrieval requests carrying the same fingerprint, or a retrieval request carrying two or more identifiers may be treated as that many retrieval requests, each processed through steps S110 to S130.
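The merged processing can be sketched as a single pass that checks membership in the set of carried identifiers. This sketch again assumes four 16-bit segments compared left to right, and merged_count is a hypothetical name. Its result equals the sum obtained by processing each identifier separately, which is why the two approaches are interchangeable.

```python
from typing import Optional

K, BITS = 4, 64  # assumed first rule: 4 equal 16-bit segments, identifiers 1..4


def split_fingerprint(fp: int) -> list[int]:
    """Segments of a 64-bit fingerprint, left to right."""
    width = BITS // K
    return [(fp >> (BITS - width * i)) & ((1 << width) - 1) for i in range(1, K + 1)]


def first_identical_segment_id(similar_fp: int, query_fp: int) -> Optional[int]:
    """Identifier of the first identical segment in left-to-right order, if any."""
    pairs = zip(split_fingerprint(similar_fp), split_fingerprint(query_fp))
    for seg_id, (a, b) in enumerate(pairs, start=1):
        if a == b:
            return seg_id
    return None


def merged_count(query_fp: int, request_ids: set[int], similar_fps: list[int]) -> int:
    """One pass for a request (or merged requests) carrying several identifiers:
    a similar fingerprint counts if its first identical segment id is any of them."""
    return sum(1 for fp in similar_fps
               if first_identical_segment_id(fp, query_fp) in request_ids)
```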
In one alternative of this embodiment, the first predefined rule may indicate the number of segments K of a fingerprint and the identifier of each segment, where K is an integer greater than or equal to 2 and less than or equal to the number of bits of the fingerprint; for a 64-bit fingerprint, K is preferably 4. The first predefined rule may be pre-stored in the client, and every client and server uses the same first predefined rule.
In other alternatives, the first predefined rule may take other forms, for example indicating the start bit of each segment, which indicates the number of segments K indirectly, or giving a naming rule for identifiers, which indicates the identifier of each segment.
In one alternative of this embodiment, the first predefined rule may also indicate the division method, such as equal division or a specified length for each segment. In one example, the fingerprint is 64 bits and the first predefined rule divides it into 4 segments of 16 bits each. In another example, the fingerprint may be divided into segments of unequal length. In other alternatives, equal division may be the default (in which case the number of fingerprint bits divided by K must be an integer), and the first predefined rule need not indicate the division method.
In this embodiment, all servers that belong to the same similar-information retrieval system, or that provide the same similar-information retrieval service, use the same first predefined rule.
In one alternative of this embodiment, the identifier of a segment may be, but is not limited to, the sequence number of the segment. For example, if a 64-bit fingerprint is divided into 4 segments, the identifiers of the 4 segments from left to right may be 1, 2, 3 and 4. The identifiers may also be set to other numbers or orderings according to the first predefined rule, provided that segments and identifiers correspond one to one within a fingerprint; for example, the identifiers of the 4 segments above could instead be a, b, c and d from left to right.
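One hypothetical way to encode the first predefined rule is as a list of segment widths plus a list of identifiers. This covers equal division, unequal division, and non-numeric identifiers such as a to d. The encoding and the name split_by_rule are assumptions for illustration only.

```python
def split_by_rule(fp: int, widths: list[int], ids: list, bits: int = 64) -> dict:
    """Split a fingerprint left to right into segments of the given widths.

    widths and ids stand in for a hypothetical encoding of the first predefined rule.
    """
    if sum(widths) != bits or len(widths) != len(ids):
        raise ValueError("widths must cover all bits and match the identifiers")
    segments = {}
    shift = bits
    for seg_id, width in zip(ids, widths):
        shift -= width  # move the window right past this segment
        segments[seg_id] = (fp >> shift) & ((1 << width) - 1)
    return segments


fp = 0x1111_2222_3333_4444
print(split_by_rule(fp, [16] * 4, [1, 2, 3, 4]))  # {1: 4369, 2: 8738, 3: 13107, 4: 17476}
```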
In one alternative of this embodiment, similar fingerprints may be searched for using, but not limited to, the SIMHASH algorithm, with similarity judged by Hamming distance; in other alternatives, other similar-information recognition algorithms may be used.
In one alternative of this embodiment, searching the stored fingerprints for similar fingerprints according to the fingerprint of the information to be retrieved carried in the retrieval request includes:
obtaining, according to the first predefined rule, the segment corresponding to the identifier carried in the retrieval request from the fingerprint of the information to be retrieved carried in the retrieval request;
searching the stored segments for segments identical to the obtained segment, where each stored segment belongs to at least one of the stored fingerprints;
filtering out, from the stored fingerprints, the fingerprints corresponding to the segments found, and comparing each filtered fingerprint with the fingerprint of the information to be retrieved to find the similar fingerprints.
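The search steps above can be sketched as follows, assuming a server keeps a map from segment value to the set of stored fingerprints containing that segment, and four 16-bit segments numbered from the left. SegmentStore and find_similar are hypothetical names, and similarity is judged with the Hamming-distance rule from the background.

```python
from collections import defaultdict

K, BITS = 4, 64
WIDTH = BITS // K


def segment(fp: int, seg_id: int) -> int:
    """Segment seg_id (1-based, left to right) of a 64-bit fingerprint."""
    return (fp >> (BITS - WIDTH * seg_id)) & ((1 << WIDTH) - 1)


class SegmentStore:
    def __init__(self) -> None:
        # segment value -> fingerprints that contain that segment
        self.by_value: dict[int, set[int]] = defaultdict(set)

    def add(self, fp: int, seg_id: int) -> None:
        self.by_value[segment(fp, seg_id)].add(fp)

    def find_similar(self, query_fp: int, seg_id: int, n: int = 3) -> set[int]:
        # Step 1: take the segment named by the identifier in the request.
        wanted = segment(query_fp, seg_id)
        # Step 2: fingerprints stored under an identical segment are the candidates.
        candidates = self.by_value.get(wanted, set())
        # Step 3: full Hamming-distance comparison, but only on the candidates.
        return {fp for fp in candidates if bin(fp ^ query_fp).count("1") < n}
```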
In this alternative, besides fingerprints, the server also stores at least one segment of each fingerprint. The segment may be sent by the client to the server together with the fingerprint to be stored in a storage request, or the server may extract it from the fingerprint to be stored according to the segment identifier carried in the storage request. Correspondingly, when sending the fingerprint of the information to be stored to a server, the client may send the identifier of a segment obtained according to the first predefined rule, or send the segment itself; and the client may determine the server corresponding to each segment according to the second predefined rule, and send the segment, or its identifier, to that server.
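Putting the storage and retrieval paths together, the following sketch simulates the whole scheme in one process. Assumptions: four 16-bit segments with sequence-number identifiers, segment value modulo the server count as the second predefined rule, a per-server index keyed by (identifier, segment value), and the second meaning of the count result. Client and Server are hypothetical classes, not an interface defined by this document. Although each fingerprint is stored on up to four servers, the summed count equals the number of distinct similar fingerprints, which is the deduplication property the scheme aims for.

```python
from collections import defaultdict

K, BITS = 4, 64          # assumed first rule: 4 equal 16-bit segments, ids 1..4
WIDTH = BITS // K
MASK = (1 << WIDTH) - 1


def segments(fp: int) -> dict[int, int]:
    """Segment identifier -> segment value, left to right."""
    return {i: (fp >> (BITS - WIDTH * i)) & MASK for i in range(1, K + 1)}


def server_for(seg_value: int, num_servers: int) -> int:
    """Hypothetical second rule: segment value modulo the number of servers."""
    return seg_value % num_servers


class Server:
    def __init__(self) -> None:
        # (segment identifier, segment value) -> fingerprints stored under it
        self.index: dict[tuple[int, int], set[int]] = defaultdict(set)

    def store(self, fp: int, seg_id: int) -> None:
        # The server extracts the named segment from the fingerprint itself.
        self.index[(seg_id, segments(fp)[seg_id])].add(fp)

    def search(self, query_fp: int, seg_ids: set[int], n: int = 3) -> int:
        q = segments(query_fp)
        candidates: set[int] = set()
        for seg_id in seg_ids:
            candidates |= self.index.get((seg_id, q[seg_id]), set())
        count = 0
        for fp in candidates:
            if bin(fp ^ query_fp).count("1") >= n:
                continue  # not a similar fingerprint
            s = segments(fp)
            first = next((i for i in range(1, K + 1) if s[i] == q[i]), None)
            if first in seg_ids:
                count += 1  # second meaning: one per kind of similar fingerprint
        return count


class Client:
    def __init__(self, num_servers: int) -> None:
        self.servers = [Server() for _ in range(num_servers)]

    def _route(self, fp: int) -> dict[int, set[int]]:
        routes: dict[int, set[int]] = defaultdict(set)
        for seg_id, value in segments(fp).items():
            routes[server_for(value, len(self.servers))].add(seg_id)
        return routes

    def store(self, fp: int) -> None:
        for server, seg_ids in self._route(fp).items():
            for seg_id in seg_ids:
                self.servers[server].store(fp, seg_id)

    def search(self, query_fp: int) -> int:
        return sum(self.servers[server].search(query_fp, seg_ids)
                   for server, seg_ids in self._route(query_fp).items())


client = Client(num_servers=3)
q = 0x1111_2222_3333_4444
for fp in [q ^ 1, q ^ 1, q ^ (1 << 63), q ^ (1 << 63) ^ (1 << 47), 0xAAAA_BBBB_CCCC_DDDD]:
    client.store(fp)
print(client.search(q))  # 3
```

The result stays 3 for other server counts as well, because a similar fingerprint is found and counted only on the server that receives the identifier of its first identical segment.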
In this alternative, the server may store at least one segment for every stored fingerprint, or store segments for only some of the fingerprints. In other alternatives, no fingerprint segments are stored at all, only the fingerprints themselves.
This alternative amounts to a two-step search: first filter out the fingerprints that share a segment with the fingerprint of the information to be queried, then search among them for similar fingerprints. This reduces the number of fingerprints that must be compared when searching for similar fingerprints, and therefore improves search efficiency.
In this alternative, one way to store the correspondence between fingerprints and segments is to use the segment value as the key name (key) and all fingerprints containing that segment as the corresponding key value (value). The key value corresponding to a segment value is thus one or more fingerprints, and the fingerprints in the key value are the fingerprints corresponding to the filtered segment.
In this implementation, filtering out, from the stored fingerprints, the fingerprints corresponding to the segments found may include:
looking up the corresponding key value using the value of the segment found as the key name, where the key value corresponding to the value of a segment is all stored fingerprints that contain that segment.
In this implementation, a server stores only one copy of each distinct segment value, and all fingerprints containing that segment are stored under it. For example, suppose a 64-bit fingerprint is divided into four 16-bit segments, and one segment stored on a server has the value "1010101010101010". Then every fingerprint stored on that server that contains the segment "1010101010101010" can be stored as the value for that segment.
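The key-value layout described above can be pictured as a dictionary from segment value to fingerprint list. The sketch below is illustrative only: it assumes fingerprints are 64-bit Python integers split into four 16-bit segments, most significant bits first, and the names `split` and `SegmentIndex` are hypothetical.

```python
from collections import defaultdict


def split(fp: int, k: int = 4, bits: int = 64) -> list[int]:
    """Split a fingerprint into k equal segments, leftmost (most significant) first."""
    width = bits // k
    mask = (1 << width) - 1
    return [(fp >> (bits - width * (i + 1))) & mask for i in range(k)]


class SegmentIndex:
    """Key: segment value. Value: every stored fingerprint that contains that segment."""

    def __init__(self) -> None:
        self.table = defaultdict(list)

    def add(self, fp: int, segment: int) -> None:
        # Only one key per distinct segment value; fingerprints accumulate under it.
        if fp not in self.table[segment]:
            self.table[segment].append(fp)

    def candidates(self, segment: int) -> list[int]:
        # .get() avoids creating empty entries for segments that were never stored.
        return list(self.table.get(segment, []))


seg = 0b1010101010101010
fp_a = (seg << 48) | 0x123456789ABC               # segment in position 1
fp_b = (0xFFFF << 48) | (seg << 32) | 0x11112222  # segment in position 2
index = SegmentIndex()
for fp in (fp_a, fp_b):
    index.add(fp, seg)
print(index.candidates(seg) == [fp_a, fp_b])  # True
```

Keeping one key per distinct segment value is what lets a single lookup return every fingerprint that shares the segment, wherever it appears in the fingerprint.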
Preferably, segments may also be stored grouped by identifier; for example, all segments with identifier "1" are stored under the index "1", and so on. In that case, each fingerprint in the value associated with a segment not only contains that segment, but also contains it at the position whose identifier equals the segment's index. For example, if the identifier of a segment is its sequence number, and a 16-bit segment under index "1" has the value "1010101010101010", then the first segment of every fingerprint in that segment's value is "1010101010101010"; if "1010101010101010" is only the second, third, or fourth segment of a fingerprint, that fingerprint does not belong to the segment's value. As a result, fewer fingerprints are selected when screening for fingerprints that share a segment with the fingerprint of the information to be queried, which further improves lookup efficiency.
Other implementations may establish and store the correspondence between segments and fingerprints in other ways, for example by storing each segment paired one-to-one with the fingerprint it belongs to.
In one implementation of this alternative, the server may also store segments classified by segment identifier, that is, it uses each segment's identifier as the index when storing it: segments with identifier "1" are stored together under the index "1", segments with identifier "2" are stored together under the index "2", and so on.
In this implementation, searching the stored segments for a segment identical to the acquired segment may include:
searching, among the stored segments indexed by the identifier carried in the retrieval request, for a segment identical to the acquired segment.
This implementation narrows the range of segments to be searched: only fingerprints whose corresponding segment is identical to that of the fingerprint of the information to be checked are compared further to decide whether they are similar fingerprints, which further speeds up the similar-fingerprint lookup.
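Indexing by identifier adds one level to the same structure. In this hypothetical sketch (same 64-bit, four-segment assumption, with identifiers taken to be 1-based sequence numbers), a fingerprint is filed only under the segment at the position named by its identifier, so a fingerprint that contains the query segment at a different position is not returned.

```python
from collections import defaultdict


def split(fp: int, k: int = 4, bits: int = 64) -> list[int]:
    width = bits // k
    mask = (1 << width) - 1
    return [(fp >> (bits - width * (i + 1))) & mask for i in range(k)]


class IdentifierIndexedStore:
    """identifier -> {segment value -> [fingerprints]}; identifiers are 1-based sequence numbers."""

    def __init__(self, k: int = 4, bits: int = 64) -> None:
        self.k, self.bits = k, bits
        self.tables = defaultdict(lambda: defaultdict(list))

    def store(self, fp: int, ident: int) -> None:
        # File the fingerprint only under the segment at position `ident`.
        seg = split(fp, self.k, self.bits)[ident - 1]
        bucket = self.tables[ident][seg]
        if fp not in bucket:
            bucket.append(fp)

    def candidates(self, query_fp: int, ident: int) -> list[int]:
        # Search only the table indexed by the identifier carried in the request.
        seg = split(query_fp, self.k, self.bits)[ident - 1]
        return list(self.tables.get(ident, {}).get(seg, []))


seg = 0b1010101010101010
fp_first = (seg << 48) | 0x1                    # seg is segment 1
fp_second = (0xFFFF << 48) | (seg << 32) | 0x1  # seg is only segment 2
store = IdentifierIndexedStore()
store.store(fp_first, 1)
store.store(fp_second, 1)  # filed under its own segment 1, i.e. 0xFFFF
print(store.candidates(seg << 48, 1) == [fp_first])  # True
```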
In one alternative of this embodiment, when the segments of the similar fingerprint are compared in turn with the corresponding segments of the fingerprint of the information to be retrieved in the predetermined order, the similar fingerprint and the fingerprint of the information to be retrieved may first be divided into segments and then compared one by one. For example, 64-bit fingerprints FP1 and FP2 are each divided into four segments, giving FP1-1, FP1-2, FP1-3, FP1-4 for FP1 and FP2-1, FP2-2, FP2-3, FP2-4 for FP2. Assuming the predetermined order is left to right, FP1-1 is compared with FP2-1 first; if they are identical, the comparison stops, that is, no further segment-by-segment comparison is made for this similar fingerprint; if they are not identical, FP1-2 is compared with FP2-2, and so on. Alternatively, division and comparison can be interleaved: assuming the order is left to right and each segment is 16 bits, bits 1 to 16 of FP1 and FP2 are compared first; if they are identical the comparison stops, otherwise bits 17 to 32 of FP1 and FP2 are compared, and so on.
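The interleaved "divide while comparing" variant can be sketched by extracting each 16-bit window only when it is needed. This assumes 64-bit integer fingerprints, four segments, a left-to-right predetermined order, and sequence-number identifiers; `first_identical_segment` and `flip_bits` are illustrative names.

```python
from typing import Optional


def first_identical_segment(fp1: int, fp2: int, k: int = 4, bits: int = 64) -> Optional[int]:
    """Compare segments left to right; return the 1-based identifier of the first identical one."""
    width = bits // k
    mask = (1 << width) - 1
    for ident in range(1, k + 1):
        shift = bits - width * ident  # ident 1 covers bits 1-16, ident 2 bits 17-32, ...
        if ((fp1 >> shift) & mask) == ((fp2 >> shift) & mask):
            return ident  # stop at the first identical segment
    return None  # no segment in common


def flip_bits(fp: int, *positions: int, bits: int = 64) -> int:
    """Flip bits given as 1-based positions counted from the left."""
    for p in positions:
        fp ^= 1 << (bits - p)
    return fp


fp1 = 0x0123456789ABCDEF
print(first_identical_segment(fp1, flip_bits(fp1, 27, 55)))     # 1
print(first_identical_segment(fp1, flip_bits(fp1, 5, 27, 55)))  # 3
```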
In one alternative of this embodiment, if the client also requests the similar fingerprints themselves, the server sends only the similar fingerprints that were counted (that is, those included in the count result) and does not send the similar fingerprints that were not counted (that is, those not included in the count result).
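Combining the segment comparison with the counting rule, a server's handling of one retrieval request might look like the following. This is a sketch under assumptions: `candidates` stands for the fingerprints returned by the lookup, similarity is judged by a Hamming distance of at most 3 (a commonly used SimHash threshold that this section does not fix), and the function and field names are hypothetical.

```python
from typing import Optional


def first_identical_segment(a: int, b: int, k: int = 4, bits: int = 64) -> Optional[int]:
    width = bits // k
    mask = (1 << width) - 1
    for ident in range(1, k + 1):
        shift = bits - width * ident
        if ((a >> shift) & mask) == ((b >> shift) & mask):
            return ident
    return None


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def handle_retrieval(candidates: list[int], query_fp: int, ident: int,
                     max_distance: int = 3, return_fingerprints: bool = False) -> dict:
    """Count similar fingerprints whose first identical segment carries the request's identifier."""
    counted = []
    for fp in candidates:
        if hamming(fp, query_fp) > max_distance:
            continue  # not a similar fingerprint
        if first_identical_segment(fp, query_fp) == ident:
            counted.append(fp)
        # Otherwise the request carrying the matching identifier counts it instead.
    result = {"count": len(counted)}
    if return_fingerprints:
        result["fingerprints"] = counted  # only counted similar fingerprints are sent
    return result


fp1 = 0x0123456789ABCDEF
fp2 = fp1 ^ (1 << (64 - 27)) ^ (1 << (64 - 55))  # differs at bits 27 and 55
print(handle_retrieval([fp1], fp2, 1, return_fingerprints=True))
print(handle_retrieval([fp1], fp2, 3))
```

With four 16-bit segments and a threshold of 3, two similar fingerprints differ in at most three segments, so at least one segment is always identical and every similar fingerprint is counted by exactly one identifier.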
Embodiment 2: a distributed search method, applied to a client, as shown in Fig. 2, comprising steps S210 to S230.
S210: determine, according to a second predetermined rule, the server corresponding to each segment;
S220: send a retrieval request to the server corresponding to each segment; the retrieval request carries the fingerprint of the information to be retrieved and the identifier of the segment corresponding to that server, where the division into segments and the identifier of each segment are determined according to the first predetermined rule;
S230: add up the count results returned by the servers for the retrieval requests to obtain the retrieval result.
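Steps S210 to S230 can be simulated end to end with in-process servers. The following is a minimal sketch, not the application's implementation: the `Server` class, the modulo routing rule standing in for the second predetermined rule, and the Hamming threshold of 3 are all assumptions made for illustration.

```python
from typing import Optional

K, BITS = 4, 64
WIDTH = BITS // K
MASK = (1 << WIDTH) - 1


def segment(fp: int, ident: int) -> int:
    """Segment with 1-based identifier `ident`, counted from the left."""
    return (fp >> (BITS - WIDTH * ident)) & MASK


def first_identical_segment(a: int, b: int) -> Optional[int]:
    for ident in range(1, K + 1):
        if segment(a, ident) == segment(b, ident):
            return ident
    return None


class Server:
    """In-process stand-in for a retrieval server (hypothetical)."""

    def __init__(self) -> None:
        self.entries: list[tuple[int, int]] = []  # (fingerprint, identifier)

    def store(self, fp: int, ident: int) -> None:
        self.entries.append((fp, ident))

    def retrieve(self, fp: int, ident: int, max_distance: int = 3) -> int:
        count = 0
        for stored, stored_ident in self.entries:
            # Only entries stored under this identifier with the same segment value.
            if stored_ident != ident or segment(stored, ident) != segment(fp, ident):
                continue
            if bin(stored ^ fp).count("1") <= max_distance and first_identical_segment(stored, fp) == ident:
                count += 1
        return count


def route(fp: int, num_servers: int) -> dict[int, int]:
    """S210 (assumed rule): server index = segment value mod number of servers."""
    return {ident: segment(fp, ident) % num_servers for ident in range(1, K + 1)}


def client_store(servers: list[Server], fp: int) -> None:
    for ident, s in route(fp, len(servers)).items():
        servers[s].store(fp, ident)


def client_retrieve(servers: list[Server], fp: int) -> int:
    # S220: one request per segment; S230: add up the returned counts.
    return sum(servers[s].retrieve(fp, ident) for ident, s in route(fp, len(servers)).items())


servers = [Server() for _ in range(4)]
fp1 = 0x0123456789ABCDEF
client_store(servers, fp1)
fp2 = fp1 ^ (1 << (BITS - 27)) ^ (1 << (BITS - 55))
print(client_retrieve(servers, fp2))  # 1
```

Because each server counts a similar fingerprint only for the request whose identifier matches the first identical segment, the client can simply add the returned counts without a separate deduplication pass.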
In this embodiment, the same fingerprint is stored on multiple servers. When the same fingerprint is stored on multiple servers, the servers corresponding to the different segments of the fingerprint can likewise be determined on the basis of the first predetermined rule, and each server also stores the corresponding segment or identifier when it stores the fingerprint. If a fingerprint is stored on only one server, the method of this embodiment still applies, but in that case the fingerprint can only be counted once, so the count-deduplication problem does not arise.
In one alternative of this embodiment, the fingerprint of the information to be retrieved may be, but is not limited to being, computed with the SIMHASH algorithm; other duplicate-information recognition algorithms that can compute a fingerprint (or feature words) are also applicable. The computation may be performed by the client, or by another device that then sends the result to the client.
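This section names SIMHASH without specifying it. One common formulation is sketched below for reference; the MD5-derived 64-bit token hash, the default weight of 1 per token, and the whitespace tokenization in the usage lines are illustrative assumptions.

```python
import hashlib
from collections.abc import Iterable
from typing import Optional


def token_hash(token: str, bits: int = 64) -> int:
    """64-bit token hash taken from the first 8 bytes of MD5 (an illustrative choice)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(tokens: Iterable[str], bits: int = 64, weights: Optional[dict[str, int]] = None) -> int:
    """Add +w or -w per bit position over all token hashes, then keep the sign as the fingerprint bit."""
    totals = [0] * bits
    for tok in tokens:
        w = 1 if weights is None else weights.get(tok, 1)
        h = token_hash(tok, bits)
        for i in range(bits):
            totals[i] += w if (h >> i) & 1 else -w
    fp = 0
    for i, total in enumerate(totals):
        if total > 0:
            fp |= 1 << i
    return fp


doc_a = "distributed search method with segmented fingerprints".split()
doc_b = "distributed search method using segmented fingerprints".split()
fa, fb = simhash(doc_a), simhash(doc_b)
print(f"{fa:016x} {fb:016x} distance={bin(fa ^ fb).count('1')}")
```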
In this embodiment, the details of the first predetermined rule are as described in Embodiment 1; the first predetermined rule used by the client is the same as that used by the server.
In one alternative of this embodiment, the identifier of a segment may be, but is not limited to, the segment's sequence number. For example, if a 64-bit fingerprint is divided into four segments, the identifiers of the four segments from left to right may be 1, 2, 3, 4. The identifiers may also be other numbers or orderings set according to the first predetermined rule, provided that within one fingerprint segments and identifiers correspond one to one; for example, the identifiers of the four segments above could instead be a, b, c, d from left to right.
In one alternative of this embodiment, determining the server corresponding to each segment according to the second predetermined rule may include:
dividing the fingerprint of the information to be retrieved into K segments according to the first predetermined rule;
for each segment, performing a hash (HASH) operation on the number of servers using the segment's value, and determining the server corresponding to the segment from the result.
In this alternative, the hash results of the K segments take J distinct values, where 1 ≤ J ≤ K. That is, segments may map to servers one to one, or two or more segments may map to the same server. Because segments and their identifiers correspond one to one, the correspondence between segments and servers is equivalent to the correspondence between segment identifiers and servers.
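The section says only that a hash operation is performed on the number of servers using each segment's value. Reading that as a modulo operation (an assumption), the routing and the resulting value of J can be sketched as follows; `servers_for_segments` is a hypothetical name.

```python
def split(fp: int, k: int = 4, bits: int = 64) -> list[int]:
    width = bits // k
    mask = (1 << width) - 1
    return [(fp >> (bits - width * (i + 1))) & mask for i in range(k)]


def servers_for_segments(fp: int, num_servers: int, k: int = 4, bits: int = 64) -> dict[int, int]:
    """Map each segment identifier (1..k) to a server index; assumed hash = value mod num_servers."""
    return {ident: seg % num_servers for ident, seg in enumerate(split(fp, k, bits), start=1)}


fp = 0x0001_0002_0003_0004          # segment values 1, 2, 3, 4
four = servers_for_segments(fp, 4)  # {1: 1, 2: 2, 3: 3, 4: 0} -> J = 4
two = servers_for_segments(fp, 2)   # {1: 1, 2: 0, 3: 1, 4: 0} -> J = 2
print(len(set(four.values())), len(set(two.values())))  # 4 2
```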
In other alternatives, the second predetermined rule may take other forms. For example, the server for each segment may be obtained by another computation or rule; or it may be determined from a correspondence between segments and server identifiers prestored on the client (different clients may prestore different correspondences, which spreads the retrieval load across different servers); or the corresponding server may be determined from the segment's identifier. If determining a segment's server does not require the segment itself, the client need not segment the fingerprint at all; it only needs to obtain the segment identifiers.
In one alternative of this embodiment, if the number of servers determined is less than K, i.e. two or more segments map to the same server, then in step S220, when sending retrieval requests to a server that corresponds to two or more segments, the client may send them as separate retrieval requests, or it may place the identifiers of those segments in a single retrieval request.
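When J < K, both request shapes described above can be built from the segment-to-server assignment. In this hypothetical sketch, `assignment` maps identifier to server index, and the request dictionaries are placeholders rather than a defined wire format.

```python
from collections import defaultdict


def build_requests(fp: int, assignment: dict[int, int], combine: bool = True) -> list[dict]:
    """assignment maps segment identifier -> server index (output of the second rule).

    combine=True puts all identifiers bound for one server into a single request;
    combine=False sends one request per identifier.
    """
    if not combine:
        return [{"server": s, "fingerprint": fp, "identifiers": [ident]}
                for ident, s in sorted(assignment.items())]
    grouped = defaultdict(list)
    for ident, s in sorted(assignment.items()):
        grouped[s].append(ident)
    return [{"server": s, "fingerprint": fp, "identifiers": ids}
            for s, ids in sorted(grouped.items())]


assignment = {1: 1, 2: 0, 3: 1, 4: 0}  # K = 4 segments, only J = 2 servers
for req in build_requests(0x0001000200030004, assignment):
    print(req["server"], req["identifiers"])
# 0 [2, 4]
# 1 [1, 3]
```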
In one alternative of this embodiment, if the client needs the actual content of the similar fingerprints, it may, when sending the identifier and the fingerprint of the information to be retrieved to the server, also ask the server to return the similar fingerprints that were counted.
The above embodiments are illustrated below with a specific example. In this example, the number of segments K is 4 and the fingerprint is divided into equal segments; 64-bit fingerprints are obtained with SIMHASH. Assume that fingerprints FP1 and FP2 are similar fingerprints that differ only at bits 27 and 55. The identifier of a segment is its sequence number, so the identifiers of the four segments are 1, 2, 3, 4 from left to right. The predetermined order is left to right.
Storage phase:
The client divides the fingerprint FP1 of the information to be stored into four segments FP1-1, FP1-2, FP1-3, FP1-4, with identifiers 1, 2, 3, 4 respectively. It performs a HASH operation on each segment and, from the results, determines that the four segments map to server A, server B, server C and server D respectively. The client sends a storage request carrying fingerprint FP1 and identifier 1 to server A, a storage request carrying FP1 and identifier 2 to server B, a storage request carrying FP1 and identifier 3 to server C, and a storage request carrying FP1 and identifier 4 to server D.
Taking server A as an example: it first searches the set indexed by identifier 1 for a segment identical to FP1-1, the first segment of fingerprint FP1; if there is none, it stores FP1-1 in the set indexed by identifier 1. The set may be, but is not limited to, a segment table; server A stores all segments with identifier 1 in that segment table.
Server A also stores fingerprint FP1 in the value keyed by the value of FP1-1; this value may be, but is not limited to, a fingerprint list.
If server A later receives a storage request carrying fingerprint FP3 and identifier 1, and the first segment FP3-1 of FP3 is identical to FP1-1, then when server A searches the segment table indexed by identifier 1 for a segment identical to FP3-1, it finds FP1-1. Server A then adds FP3 to the value keyed by the value of FP1-1, that is, to the fingerprint list keyed by FP1-1's value.
The other servers operate in the same way, which is not repeated here.
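The storage phase on a single server can be modelled with a per-identifier segment table plus fingerprint lists. This sketch keys each fingerprint list by (identifier, segment value) rather than by segment value alone, an assumption made so that equal segment values under different identifiers stay separate; `StorageServer` and its methods are hypothetical.

```python
class StorageServer:
    """Hypothetical model of one server's storage phase (e.g. server A in the example)."""

    def __init__(self, k: int = 4, bits: int = 64) -> None:
        self.k, self.bits = k, bits
        self.segment_tables: dict[int, set[int]] = {}                  # identifier -> segment table
        self.fingerprint_lists: dict[tuple[int, int], list[int]] = {}  # (identifier, segment) -> fingerprint list

    def segment(self, fp: int, ident: int) -> int:
        width = self.bits // self.k
        return (fp >> (self.bits - width * ident)) & ((1 << width) - 1)

    def handle_store(self, fp: int, ident: int) -> None:
        seg = self.segment(fp, ident)
        table = self.segment_tables.setdefault(ident, set())
        if seg not in table:  # no identical segment yet: add it to the segment table
            table.add(seg)
        fps = self.fingerprint_lists.setdefault((ident, seg), [])
        if fp not in fps:     # the value keyed by this segment is a fingerprint list
            fps.append(fp)


server_a = StorageServer()
FP1 = 0x0123456789ABCDEF
FP3 = 0x0123FFFFFFFFFFFF  # FP3-1 equals FP1-1 (0x0123)
server_a.handle_store(FP1, 1)
server_a.handle_store(FP3, 1)
print(server_a.segment_tables[1] == {0x0123})                # True
print(server_a.fingerprint_lists[(1, 0x0123)] == [FP1, FP3])  # True
```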
Note that even without the storage scheme used in this example, for instance if servers store only the fingerprints themselves, count deduplication during retrieval can still be achieved on the servers in the same way. The storage method in this example improves efficiency when retrieving similar fingerprints but has no effect on count deduplication.
Retrieval phase:
The client divides the fingerprint FP2 of the information to be retrieved into four segments FP2-1, FP2-2, FP2-3, FP2-4, with identifiers 1, 2, 3, 4 respectively. It performs a HASH operation on each segment and, from the results, determines that the four segments map to four servers, assumed here to be server A, server B, server C and server D again. The client sends a retrieval request carrying fingerprint FP2 and identifier 1 to server A, a retrieval request carrying FP2 and identifier 2 to server B, a retrieval request carrying FP2 and identifier 3 to server C, and a retrieval request carrying FP2 and identifier 4 to server D.
Taking server C as an example: it searches the segment table indexed by identifier 3 for a segment fully identical to FP2-3 and finds FP1-3 (or a segment identical to FP1-3; here it is assumed to be FP1-3). It searches for similar fingerprints of FP2 in the fingerprint list keyed by the value of FP1-3 and obtains one or more fingerprints, including fingerprint FP1. It compares FP1-1 with FP2-1 first and finds them identical; but the identifier of that segment, 1, differs from identifier 3 carried in the retrieval request, so server C does not include fingerprint FP1 in its count result.
Server A performs the same operations, except that here identifier 1 matches identifier 1 in the retrieval request, so server A includes fingerprint FP1 in its count result.
Because FP1 and FP2 differ at bits 27 and 55, i.e. FP1-2 differs from FP2-2 and FP1-4 differs from FP2-4, servers B and D find no identical segment in their corresponding segment tables, so FP1 does not appear among the similar fingerprints they obtain.
Thus, with this scheme, count deduplication is achieved on the server side.
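The retrieval walk-through can be checked numerically. The sketch assumes an arbitrary value for FP1, derives FP2 by flipping bits 27 and 55 (counted from the left, starting at 1), and, as in the example, assumes servers A to D receive identifiers 1 to 4; the helper names are illustrative.

```python
K, BITS, WIDTH = 4, 64, 16
MASK = (1 << WIDTH) - 1


def seg(fp: int, ident: int) -> int:
    """Segment with 1-based identifier `ident`, counted from the left."""
    return (fp >> (BITS - WIDTH * ident)) & MASK


def first_identical(a: int, b: int):
    return next((i for i in range(1, K + 1) if seg(a, i) == seg(b, i)), None)


FP1 = 0x0123456789ABCDEF                             # assumed stored fingerprint
FP2 = FP1 ^ (1 << (BITS - 27)) ^ (1 << (BITS - 55))  # differs only at bits 27 and 55

# As assumed in the example, servers A..D each hold FP1 under identifier 1..4
# and receive the FP2 retrieval request carrying the same identifier.
assignment = {"A": 1, "B": 2, "C": 3, "D": 4}
counts = {}
for name, ident in assignment.items():
    found = seg(FP1, ident) == seg(FP2, ident)  # identical segment in this segment table?
    counts[name] = int(found and first_identical(FP1, FP2) == ident)
print(counts, sum(counts.values()))  # {'A': 1, 'B': 0, 'C': 0, 'D': 0} 1
```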
In this example, suppose the storage request and the retrieval request with identifier 3 are instead both sent to server A. When retrieving similar fingerprints for FP2, server A will then select FP1 twice; but since the first identical segment has identifier 1, FP1 is not included in the count result for the retrieval request with identifier 3, so FP1 is still counted only once. Thus, even if multiple retrieval requests carrying the same fingerprint of the information to be retrieved but different identifiers are sent to the same server, count deduplication can still be performed on that server.
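The case in which both the identifier-1 and identifier-3 requests land on server A can be sketched in the same way. Here a hypothetical `Server` keeps one table per identifier, and a Hamming threshold of 3 is assumed for similarity.

```python
K, BITS, WIDTH = 4, 64, 16
MASK = (1 << WIDTH) - 1


def seg(fp, ident):
    return (fp >> (BITS - WIDTH * ident)) & MASK


def first_identical(a, b):
    return next((i for i in range(1, K + 1) if seg(a, i) == seg(b, i)), None)


class Server:
    def __init__(self):
        self.tables = {}  # identifier -> {segment value: [fingerprints]}

    def store(self, fp, ident):
        self.tables.setdefault(ident, {}).setdefault(seg(fp, ident), []).append(fp)

    def retrieve(self, fp, ident, max_distance=3):
        candidates = self.tables.get(ident, {}).get(seg(fp, ident), [])
        return sum(1 for c in candidates
                   if bin(c ^ fp).count("1") <= max_distance and first_identical(c, fp) == ident)


FP1 = 0x0123456789ABCDEF
FP2 = FP1 ^ (1 << (BITS - 27)) ^ (1 << (BITS - 55))
server_a = Server()
server_a.store(FP1, 1)
server_a.store(FP1, 3)  # the identifier-3 storage request is also routed to server A
per_request = [server_a.retrieve(FP2, 1), server_a.retrieve(FP2, 3)]
print(per_request, sum(per_request))  # [1, 0] 1
```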
In this example, storage and retrieval follow the same segmentation rule and the same server-determination rule (a hash operation on the segment value, with the server determined from the hash result). Therefore a storage request carrying FP1 with identifier 1 (or 3) and a retrieval request carrying FP2 with identifier 1 (or 3) are sent to the same server, which avoids missed detections.
Embodiment 3: a distributed search device, arranged at a server, as shown in Fig. 3, comprising:
Similar-fingerprint lookup module 31, configured to, after a retrieval request is received, search the stored fingerprints for similar fingerprints according to the fingerprint of the information to be retrieved carried in the retrieval request;
Counting module 32, configured to perform the following for each similar fingerprint found: compare the segments of the similar fingerprint in turn with the corresponding segments of the fingerprint of the information to be retrieved according to the predetermined order, stopping once the first identical segment is found; compare the identifier of that identical segment with the identifier carried in the retrieval request, and if they are the same, include the similar fingerprint in the count result; where the division into segments and the identifier of each segment are determined according to the first predetermined rule;
Response module 33, configured to return the count result.
The similar-fingerprint lookup module 31 is the part of the device responsible for retrieving similar fingerprints, and may be software, hardware, or a combination of both.
The counting module 32 is the part of the device responsible for counting and count deduplication, and may be software, hardware, or a combination of both.
The response module 33 is the part of the device responsible for returning the result, and may be software, hardware, or a combination of both.
In one alternative of this embodiment, the counting module may further be configured, when the identifier of the identical segment differs from the identifier carried in the retrieval request, not to include the corresponding similar fingerprint in the count result, for example by simply ignoring that similar fingerprint.
In one alternative of this embodiment, the identifier of a segment may be, but is not limited to, the segment's sequence number.
In one alternative of this embodiment, the similar-fingerprint lookup module 31, as shown in Fig. 4, may include:
Acquisition unit 311, configured to obtain, according to the first predetermined rule, the segment corresponding to the identifier carried in the retrieval request from the fingerprint of the information to be retrieved carried in that request;
Segment comparison unit 312, configured to search the stored segments for a segment identical to the acquired segment, where each stored segment belongs to at least one of the stored fingerprints;
Fingerprint comparison unit 313, configured to select, from the stored fingerprints, the fingerprints corresponding to the found segment, and to compare each selected fingerprint with the fingerprint of the information to be retrieved to find the similar fingerprints.
The acquisition unit 311 is the part of the similar-fingerprint lookup module 31 responsible for obtaining the segment to be compared, and may be software, hardware, or a combination of both.
The segment comparison unit 312 is the part of the similar-fingerprint lookup module 31 responsible for finding the identical segment, and may be software, hardware, or a combination of both.
The fingerprint comparison unit 313 is the part of the similar-fingerprint lookup module 31 responsible for selecting the similar fingerprints, and may be software, hardware, or a combination of both.
In one alternative of this embodiment, the fingerprint comparison unit 313 selecting, from the stored fingerprints, the fingerprints corresponding to the found segment includes:
the fingerprint comparison unit 313 looks up the corresponding value using the value of the found segment as the key; the value corresponding to a segment's value is the set of all stored fingerprints that contain that segment.
In one alternative of this embodiment, the segment comparison unit 312 searching the stored segments for a segment identical to the acquired segment includes:
the segment comparison unit 312 searches, among the stored segments indexed by the identifier carried in the retrieval request, for a segment identical to the acquired segment.
For other implementation details, see Embodiment 1.
Embodiment 4: a distributed search device, arranged at a client, as shown in Fig. 5, comprising:
Determining module 41, configured to determine the server corresponding to each segment according to the second predetermined rule;
Request module 42, configured to send a retrieval request to the server corresponding to each segment; the retrieval request carries the fingerprint of the information to be retrieved and the identifier of the segment corresponding to that server, where the division into segments and the identifier of each segment are determined according to the first predetermined rule;
Computing module 43, configured to add up the count results returned by the servers for the retrieval requests to obtain the retrieval result.
The determining module 41 is the part of the device responsible for determining the segment identifiers and the corresponding servers, and may be software, hardware, or a combination of both.
The request module 42 is the part of the device responsible for sending retrieval requests, and may be software, hardware, or a combination of both.
The computing module 43 is the part of the device responsible for calculating the retrieval result, and may be software, hardware, or a combination of both.
In one alternative of this embodiment, the identifier of a segment may be, but is not limited to, the segment's sequence number.
In one alternative of this embodiment, the determining module 41, as shown in Fig. 6, may include:
Division unit 411, configured to divide the fingerprint of the information to be retrieved into K segments according to the first predetermined rule;
Hash operation unit 412, configured to perform, for each segment, a hash operation on the number of servers using the segment's value, and to determine the server corresponding to the segment from the result.
The division unit 411 is the part of the determining module 41 responsible for dividing the fingerprint, and may be software, hardware, or a combination of both. The division unit 411 may also be arranged in the determining module 41; the hash operation unit 412 performs the hash operation on the segments produced by the division unit 411.
The hash operation unit 412 is the part of the determining module 41 responsible for performing the hash operation, and may be software, hardware, or a combination of both.
For other implementation details, see Embodiment 2.
Embodiment 5: a distributed search method, comprising the method applied to the server in Embodiment 1 and the method applied to the client in Embodiment 2.
Embodiment 6: a distributed search system, comprising the device arranged at the server in Embodiment 3 and the device arranged at the client in Embodiment 4.
Those of ordinary skill in the art will understand that all or some of the steps of the above methods may be carried out by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented with one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in hardware or as a software functional module. The present application is not limited to any particular combination of hardware and software.
Of course, the present application may have various other embodiments. Without departing from the spirit and essence of the present application, those skilled in the art may make various corresponding changes and modifications in accordance with the present application, but all such corresponding changes and modifications shall fall within the scope of protection of the claims of the present application.
Claims (18)
1. A distributed search method, applied to a server, comprising:
after receiving a retrieval request, searching the stored fingerprints for similar fingerprints according to the fingerprint of the information to be retrieved carried in the retrieval request;
for each similar fingerprint found, performing the following: comparing the segments of the similar fingerprint in turn with the corresponding segments of the fingerprint of the information to be retrieved according to a predetermined order, and stopping the comparison once the first identical segment is found; comparing the identifier of the identical segment with the identifier carried in the retrieval request, and, if they are the same, including the similar fingerprint in a count result; wherein the division into segments and the identifier of each segment are determined according to a first predetermined rule;
returning the count result.
2. The method according to claim 1, further comprising:
if the identifier of the identical segment differs from the identifier carried in the retrieval request, not including the corresponding similar fingerprint in the count result.
3. The method according to claim 1, wherein:
the identifier of the segment is the sequence number of the segment.
4. The method according to any one of claims 1 to 3, wherein searching the stored fingerprints for similar fingerprints according to the fingerprint of the information to be retrieved carried in the retrieval request comprises:
obtaining, according to the first predetermined rule, from the fingerprint of the information to be retrieved carried in the retrieval request, the segment corresponding to the identifier carried in the retrieval request;
searching the stored segments for a segment identical to the obtained segment, wherein each stored segment belongs to at least one of the stored fingerprints;
selecting, from the stored fingerprints, the fingerprints corresponding to the found segment, and comparing each selected fingerprint with the fingerprint of the information to be retrieved to find the similar fingerprints.
5. The method according to claim 4, wherein selecting, from the stored fingerprints, the fingerprints corresponding to the found segment comprises:
looking up the corresponding value using the value of the found segment as the key, wherein the value corresponding to the value of a segment is all of the stored fingerprints that contain the segment.
6. The method according to claim 4, wherein searching the stored segments for a segment identical to the obtained segment comprises:
searching, among the stored segments indexed by the identifier carried in the retrieval request, for a segment identical to the obtained segment.
7. A distributed search method, applied to a client, comprising:
determining, according to a second predetermined rule, the server corresponding to each segment;
sending a retrieval request to the server corresponding to each segment, the retrieval request carrying the fingerprint of the information to be retrieved and the identifier of the segment corresponding to that server, wherein the division into segments and the identifier of each segment are determined according to a first predetermined rule;
adding up the count results returned by the servers for the retrieval requests to obtain a retrieval result.
8. The method according to claim 7, wherein:
the identifier of the segment is the sequence number of the segment.
9. The method according to claim 7 or 8, wherein determining, according to the second predetermined rule, the server corresponding to each segment comprises:
dividing the fingerprint of the information to be retrieved into K segments according to the first predetermined rule;
performing, for each segment, a hash operation on the number of servers using the value of the segment, and determining the server corresponding to the segment according to the result of the operation.
10. A distributed search device, arranged at a server, comprising:
a similar-fingerprint lookup module, configured to, after a retrieval request is received, search the stored fingerprints for similar fingerprints according to the fingerprint of the information to be retrieved carried in the retrieval request;
a counting module, configured to perform the following for each similar fingerprint found: comparing the segments of the similar fingerprint in turn with the corresponding segments of the fingerprint of the information to be retrieved according to a predetermined order, and stopping the comparison once the first identical segment is found; comparing the identifier of the identical segment with the identifier carried in the retrieval request, and, if they are the same, including the similar fingerprint in a count result; wherein the division into segments and the identifier of each segment are determined according to a first predetermined rule;
a response module, configured to return the count result.
11. The device according to claim 10, wherein:
the counting module is further configured, when the identifier of the identical segment differs from the identifier carried in the retrieval request, not to include the corresponding similar fingerprint in the count result.
12. The device according to claim 10, wherein:
the identifier of the segment is the sequence number of the segment.
13. The device according to any one of claims 10 to 12, wherein the similar-fingerprint lookup module comprises:
an acquisition unit, configured to obtain, according to the first predetermined rule, from the fingerprint of the information to be retrieved carried in the retrieval request, the segment corresponding to the identifier carried in the retrieval request;
a segment comparison unit, configured to search the stored segments for a segment identical to the obtained segment, wherein each stored segment belongs to at least one of the stored fingerprints;
a fingerprint comparison unit, configured to select, from the stored fingerprints, the fingerprints corresponding to the found segment, and to compare each selected fingerprint with the fingerprint of the information to be retrieved to find the similar fingerprints.
14. The device according to claim 13, wherein the fingerprint comparison unit selecting, from the stored fingerprints, the fingerprints corresponding to the found segment comprises:
the fingerprint comparison unit looking up the corresponding value using the value of the found segment as the key, wherein the value corresponding to the value of a segment is all of the stored fingerprints that contain the segment.
15. The device according to claim 13, wherein the segment comparison unit searching the stored segments for a segment identical to the obtained segment comprises:
the segment comparison unit searching, among the stored segments indexed by the identifier carried in the retrieval request, for a segment identical to the obtained segment.
16. A distributed search device, arranged at a client, characterized by comprising:
a determining module, configured to determine the server corresponding to each segment according to a second predefined rule;
a request module, configured to send a retrieval request to the server corresponding to each segment, the retrieval request carrying the fingerprint of the information to be retrieved and the identifier of the segment corresponding to that server; wherein the division into segments and the identifier of each segment are determined according to a first predefined rule;
a computing module, configured to add up the count results returned by the servers for the retrieval requests to obtain the retrieval result.
17. The device according to claim 16, wherein the identifier of a segment is the sequence number of that segment.
18. The device according to claim 16 or 17, wherein the determining module comprises:
a division unit, configured to divide the fingerprint of the information to be retrieved into K segments according to the first predefined rule;
a hash operation unit, configured to perform, for each segment, a hash operation on the value of the segment over the number of servers, and to determine the server corresponding to that segment according to the operation result.
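The claims leave the fingerprint format, the two predefined rules and the similarity test open, so the following is only a minimal Python sketch under stated assumptions: 64-bit simhash-style fingerprints, K = 4 equal 16-bit segments whose identifier is their sequence number (claim 17), Python's built-in `hash` modulo the server count as the second predefined rule, and Hamming distance of at most 3 as "similar". All names (`split`, `route`, `Server`, `Cluster`, `MAX_DISTANCE`) are hypothetical. It combines the server side (claims 13 to 15: an index keyed by segment identifier and value, candidate filtering, and the first-identical-segment counting described in the abstract) with the client side (claims 16 and 18: split, route, add up the counts).

```python
from collections import defaultdict
from dataclasses import dataclass, field

# All parameters below are illustrative assumptions, not values from the patent.
BITS = 64          # fingerprint width (e.g. a 64-bit simhash)
K = 4              # number of segments per fingerprint
SEG_BITS = BITS // K
MAX_DISTANCE = 3   # Hamming-distance threshold for "similar"


def split(fp: int) -> list[int]:
    """Assumed first predefined rule: K equal bit slices, most significant first.
    A segment's identifier is its position (sequence number) in this list."""
    mask = (1 << SEG_BITS) - 1
    return [(fp >> (SEG_BITS * (K - 1 - i))) & mask for i in range(K)]


def route(seg_value: int, num_servers: int) -> int:
    """Assumed second predefined rule: hash of the segment value modulo server count."""
    return hash(seg_value) % num_servers


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


@dataclass
class Server:
    # (segment identifier, segment value) -> saved fingerprints containing that segment
    index: defaultdict = field(default_factory=lambda: defaultdict(set))

    def store(self, seg_id: int, seg_value: int, fp: int) -> None:
        self.index[(seg_id, seg_value)].add(fp)

    def count(self, query_fp: int, seg_id: int) -> int:
        """Count similar fingerprints for one retrieval request (one segment identifier)."""
        q_segs = split(query_fp)
        result = 0
        # Candidate filtering: only fingerprints sharing this segment are compared.
        for cand in self.index.get((seg_id, q_segs[seg_id]), set()):
            if hamming(cand, query_fp) > MAX_DISTANCE:
                continue
            # Compare segments in order and stop at the first identical one;
            # count only if that segment's identifier matches the request.
            first = next(i for i, (a, b) in enumerate(zip(split(cand), q_segs)) if a == b)
            if first == seg_id:
                result += 1
        return result


class Cluster:
    def __init__(self, num_servers: int) -> None:
        self.servers = [Server() for _ in range(num_servers)]

    def _server_for(self, seg_value: int) -> Server:
        return self.servers[route(seg_value, len(self.servers))]

    def save(self, fp: int) -> None:
        for seg_id, value in enumerate(split(fp)):
            self._server_for(value).store(seg_id, value, fp)

    def search(self, query_fp: int) -> int:
        """Client side: one request per segment, then add up the returned counts."""
        return sum(
            self._server_for(value).count(query_fp, seg_id)
            for seg_id, value in enumerate(split(query_fp))
        )


if __name__ == "__main__":
    base = 0x0123456789ABCDEF
    cluster = Cluster(num_servers=3)
    for fp in (base, base ^ 1, base ^ (1 << 63) ^ 1, base ^ 0xFFFF):
        cluster.save(fp)
    # 3: the first three are within distance 3; base ^ 0xFFFF is 16 bits away
    print(cluster.search(base))
```

With K = 4 and a threshold of 3, the pigeonhole principle guarantees that any fingerprint within the threshold shares at least one identical segment with the query, so no similar fingerprint is missed. The first-identical-segment rule then lets exactly one server count it (here `base ^ (1 << 63) ^ 1` matches on segments 1 and 2 but is counted only by the request for segment 1), which is why the client can simply add the counts without a separate deduplication step.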
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610105198.9A CN107122370A (en) | 2016-02-25 | 2016-02-25 | A kind of distributed search method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610105198.9A CN107122370A (en) | 2016-02-25 | 2016-02-25 | A kind of distributed search method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107122370A (en) | 2017-09-01 |
Family ID: 59717519
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610105198.9A Pending CN107122370A (en) | 2016-02-25 | 2016-02-25 | A kind of distributed search method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107122370A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109471921A (en) * | 2018-11-23 | 2019-03-15 | 深圳市元征科技股份有限公司 | A kind of text duplicate checking method, device and equipment |
CN109582674A (en) * | 2018-11-28 | 2019-04-05 | 亚信科技(南京)有限公司 | A kind of date storage method and system |
CN110135353A (en) * | 2019-05-17 | 2019-08-16 | 北京海鑫高科指纹技术有限公司 | A kind of method and system excluding victim and relevant people scene fingers and palms line |
CN110149529A (en) * | 2018-11-01 | 2019-08-20 | 腾讯科技(深圳)有限公司 | Processing method, server and the storage medium of media information |
CN116467481A (en) * | 2022-12-14 | 2023-07-21 | 喜鹊科技(广州)有限公司 | Information processing method and system based on cloud computing |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102495894A (en) * | 2011-12-12 | 2012-06-13 | 成都市华为赛门铁克科技有限公司 | Method, device and system for searching repeated data |
CN103248609A (en) * | 2012-02-06 | 2013-08-14 | 同方股份有限公司 | System, device and method for detecting data from end to end |
CN103646080A (en) * | 2013-12-12 | 2014-03-19 | 北京京东尚科信息技术有限公司 | Microblog duplication-eliminating method and system based on reverse-order index |
CN110399464A (en) * | 2019-07-30 | 2019-11-01 | 广州吉信网络科技开发有限公司 | A kind of similar news method of discrimination, system and electronic equipment |
Application events: 2016-02-25, CN201610105198.9A filed in CN (publication CN107122370A), status: active, pending
Non-Patent Citations (2)
Title |
---|
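Wang Yuan (王源): "A Fast Text Deduplication Algorithm Based on Simhash" (一种基于Simhash的文本快速去重算法), China Master's Theses Full-text Database, Information Science and Technology * |
观澜而索源: "Similarity Computation for Massive Data: Simhash Short-Text Lookup" (海量数据相似度计算之simhash短文本查找), CSDN * |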
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110149529A (en) * | 2018-11-01 | 2019-08-20 | 腾讯科技(深圳)有限公司 | Processing method, server and the storage medium of media information |
CN109471921A (en) * | 2018-11-23 | 2019-03-15 | 深圳市元征科技股份有限公司 | A kind of text duplicate checking method, device and equipment |
CN109582674A (en) * | 2018-11-28 | 2019-04-05 | 亚信科技(南京)有限公司 | A kind of date storage method and system |
CN109582674B (en) * | 2018-11-28 | 2023-12-22 | 亚信科技(南京)有限公司 | Data storage method and system |
CN110135353A (en) * | 2019-05-17 | 2019-08-16 | 北京海鑫高科指纹技术有限公司 | A kind of method and system excluding victim and relevant people scene fingers and palms line |
CN116467481A (en) * | 2022-12-14 | 2023-07-21 | 喜鹊科技(广州)有限公司 | Information processing method and system based on cloud computing |
CN116467481B (en) * | 2022-12-14 | 2023-12-01 | 要务(深圳)科技有限公司 | Information processing method and system based on cloud computing |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107122370A (en) | A kind of distributed search method and device | |
CN106033416B (en) | Character string processing method and device | |
JP5328808B2 (en) | Data clustering method, system, apparatus, and computer program for applying the method | |
CN114389834B (en) | Method, device, equipment and product for identifying abnormal call of API gateway | |
EP2095277B1 (en) | Fuzzy database matching | |
CN104636349B (en) | A kind of index data compression and the method and apparatus of index data search | |
CN109062936B (en) | Data query method, computer readable storage medium and terminal equipment | |
US20160147867A1 (en) | Information matching apparatus, information matching method, and computer readable storage medium having stored information matching program | |
CN104142946A (en) | Method and system for aggregating and searching service objects of same type | |
CN110728526A (en) | Address recognition method, apparatus and computer readable medium | |
US7584173B2 (en) | Edit distance string search | |
CN116631561B (en) | Patient identity information matching method and device based on feature division and electronic equipment | |
CN105138912A (en) | Method and device for generating phishing website detection rules automatically | |
CN112035621A (en) | Enterprise name similarity detection method based on statistics | |
CN109286622B (en) | Network intrusion detection method based on learning rule set | |
CN115189914A (en) | Application Programming Interface (API) identification method and device for network traffic | |
US20190130034A1 (en) | Fingerprint clustering for content-based audio recognition | |
US8370390B1 (en) | Method and apparatus for identifying near-duplicate documents | |
CN108319626B (en) | Object classification method and device based on name information | |
CN114124484A (en) | Network attack identification method, system, device, terminal equipment and storage medium | |
CN101414299B (en) | Method and apparatus for repairing composite document | |
CN113821630A (en) | Data clustering method and device | |
CN114943285B (en) | Intelligent auditing system for internet news content data | |
CN109460407A (en) | A kind of information storage means and system | |
CN111428482B (en) | Information identification method and device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170901 |