[Summary of the Invention]
In view of this, it is necessary to provide a fuzzy query method that can improve search efficiency.
A fuzzy query method comprises the following steps:
building a global index on database tables according to preset mapping relations between index values and data items;
sorting the global index;
obtaining a query keyword, and using range binary search to locate the index interval that matches the query keyword;
obtaining the index values within the index interval, obtaining the data items corresponding to those index values according to the mapping relations, and returning the data items as the query result.
Preferably, the mapping relation between the index values and the data items is one in which a plurality of index values correspond to one data item.
Preferably, the step of using range binary search to locate the index interval that matches the query keyword specifically comprises:
locating, by binary search, a matching index that matches the query keyword;
obtaining the boundaries of the index interval, taking the matching index as an endpoint.
Preferably, after building the global index on the database tables according to the preset mapping relations between index values and data items, the method further comprises:
storing the query count of each index value in the global index;
caching the index values whose query count exceeds a threshold, together with the data items corresponding to those index values.
Preferably, after obtaining the query keyword, the method further comprises:
searching the cache for an index value that matches the query keyword, returning the data item corresponding to the index value found in the cache as a query result, and incrementing by 1 the query count of the index value that matches the query keyword.
In addition, it is also necessary to provide a fuzzy query system that can improve search efficiency.
A fuzzy query system comprises the following modules:
an index module, configured to build a global index on database tables according to preset mapping relations between index values and data items;
a sorting module, configured to sort the global index;
a search module, configured to obtain a query keyword and to use range binary search to locate the index interval that matches the query keyword;
a value module, configured to obtain the index values within the index interval, obtain the data items corresponding to those index values according to the mapping relations, and return the data items as the query result.
Preferably, the mapping relation between the index values and the data items is one in which a plurality of index values correspond to one data item.
Preferably, the search module comprises a first submodule configured to locate, by binary search, a matching index that matches the query keyword, and a second submodule configured to obtain the boundaries of the index interval, taking the matching index as an endpoint.
Preferably, the system further comprises a cache module configured to store the query count of each index value in the global index, and to cache the index values whose query count exceeds a threshold together with the data items corresponding to those index values.
Preferably, the search module is further configured to search the cache for an index value that matches the query keyword, return the data item corresponding to the index value found in the cache as a query result, and increment by 1 the query count of the index value that matches the query keyword.
The above fuzzy query method and system first build a unified global index over the data tables, and then use binary search, which has low time complexity, to locate the index interval formed by the index entries that match the keyword. This makes efficient use of the continuity and ordering of the index itself. The time complexity of a query is reduced at the cost of some space complexity (for storing the index), which raises query speed and thereby improves the efficiency of fuzzy queries.
[Detailed Description of the Embodiments]
As shown in Figure 1, in one embodiment, a fuzzy query method comprises the following steps:
Step S102: build a global index on database tables according to preset mapping relations between index values and data items.
A database consists of a plurality of database tables. Each database table stores a number of data records, and one data record is one data item. For example, a database stores two tables: a user information table and an organizational structure table. The user information table stores information about users; the organizational structure table stores information about each department in the company's organizational structure.
In one embodiment, the database tables that are queried frequently but are rarely subject to insert, delete and update operations are determined first, and keywords are then defined for the data items in these frequently queried tables. Preferably, a plurality of keywords may be defined for one data item. The keywords are then used as index values to build the global index in the database. Index values and data items have a mapping relation, preferably a many-to-one relation, i.e., a plurality of index values correspond to one data item. Where several data items share the same keyword, several identical index values may be mapped to the several different data items respectively.
For example, the index value "lm" can be the abbreviation of "Li Ming" ("dawn") and also the abbreviation of "Li Meng". In this case, it suffices to create two index values "lm", one corresponding to "Li Ming" and the other to "Li Meng".
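The mapping described above can be sketched as a flat list of (index value, data item) pairs. This is a minimal illustration, not the patented implementation: the function name `build_global_index`, the dictionary input format and the extra keywords "liming" and "limeng" are assumptions made for the example.

```python
def build_global_index(items_with_keywords):
    """Flatten {data_item: [keywords]} into (index_value, data_item) pairs.

    Several keywords may point to the same data item (many-to-one), and the
    same keyword may appear once per data item that shares it.
    """
    entries = []
    for data_item, keywords in items_with_keywords.items():
        for keyword in keywords:
            entries.append((keyword, data_item))
    return entries


# Hypothetical sample data: both names abbreviate to "lm".
entries = build_global_index({
    "Li Ming": ["lm", "liming"],   # "dawn" in the literal translation
    "Li Meng": ["lm", "limeng"],
})
print(entries)
```

Each shared keyword simply appears once per data item, so the two "lm" entries remain distinguishable by the data item they point to.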
Step S104: sort the global index.
In one embodiment, after the global index has been built, it is sorted according to a preset ordering rule. Preferably, the preset ordering rule is lexicographic order: the index values are first converted into character strings (Chinese characters are first converted into pinyin), and the strings are then sorted lexicographically. For example, the index value sequence [ab, a, abe, b, abc] becomes [a, ab, abc, abe, b] after sorting in ascending lexicographic order.
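As a small illustration of the sorting step, Python's built-in string comparison already orders ASCII strings lexicographically, with a prefix sorting before any longer string that extends it. The sketch assumes the index values have already been converted to ASCII strings (for example pinyin); the conversion itself is not shown, and the `entries` data is hypothetical.

```python
index_values = ["ab", "a", "abe", "b", "abc"]
sorted_values = sorted(index_values)
print(sorted_values)  # ['a', 'ab', 'abc', 'abe', 'b']

# When the index holds (index_value, data_item) pairs, sort on the index value only.
entries = [("lm", "Li Meng"), ("ab", "item 1"), ("a", "item 2")]
sorted_entries = sorted(entries, key=lambda entry: entry[0])
print(sorted_entries)
```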
Step S106: obtain a query keyword, and use range binary search to locate the index interval that matches the query keyword.
Step S108: obtain the index values within the index interval, obtain the data items corresponding to those index values according to the mapping relations, and return the data items as the query result.
Range binary search (RBS) is a data search method based on conventional binary search. Conventional binary search determines the position of the query keyword in a sequence, whereas range binary search locates the continuous interval formed by all values that match the query keyword.
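The difference between the two kinds of search can be illustrated with Python's standard `bisect` module. Note that this is a substitute technique chosen for brevity (two library bisections against a computed upper bound), not the step-by-step procedure of this specification; the function names are hypothetical, and the sketch assumes a non-empty key whose last character is not the maximum code point.

```python
from bisect import bisect_left


def point_search(E, key):
    """Conventional binary search: subscript of key in sorted list E, or None."""
    i = bisect_left(E, key)
    return i if i < len(E) and E[i] == key else None


def prefix_range(E, key):
    """Range search: (left, right) subscripts of all values starting with key, or None."""
    # Upper bound that sorts after every string beginning with key.
    upper = key[:-1] + chr(ord(key[-1]) + 1)
    left = bisect_left(E, key)
    right = bisect_left(E, upper) - 1
    return (left, right) if left <= right else None


E = ["a", "ab", "abc", "abe", "b"]
print(point_search(E, "ab"))  # 1
print(prefix_range(E, "ab"))  # (1, 3)
```

The point search reports only where "ab" itself sits, while the range search reports the whole run "ab", "abc", "abe".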
In one embodiment, the step of obtaining a query keyword and using range binary search to locate the index interval that matches the query keyword comprises: (1) locating, by binary search, a matching index that matches the query keyword; and (2) obtaining the boundaries of the index interval, taking the matching index as an endpoint. For example, when the index value sequence is stored in an array, the boundaries of the index interval are the index values at the left and right endpoints of a continuous interval within the consecutive integer sequence formed by the array subscripts; for a continuous interval [a, b] of array subscripts, a is the left endpoint of the interval and b is its right endpoint.
Specifically, the step of obtaining the boundaries of the index interval with the matching index as an endpoint is preferably: taking the matching index as an endpoint, using binary search to obtain separately the left endpoint and the right endpoint of the index interval that matches the query keyword.
The detailed process of step S106 is illustrated below with a concrete example. In this example, the index value sequence of the global index is stored in an array E, arranged in ascending lexicographic order. The keyword to be searched for is key. In a fuzzy search, the matching relation between key and an index value is as follows: if the index value string begins with key (including the case where the index value string is identical to key), the index value matches key; otherwise, it does not match.
Two character strings are compared in lexicographic order as follows: first compare the ASCII values of the first characters of the two strings (Chinese characters are converted into pinyin); the string whose first character has the larger ASCII value is the larger string. If the ASCII values of the first characters are equal, continue with the next character, and so on up to the last character. If one string ends at the position being compared, its ASCII value at that position is taken to be 0 (i.e., necessarily the smallest). For example, in lexicographic order, a<b, a<ab, and abcde<abcf.
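The comparison rule just described can be written out directly. The following is a minimal sketch assuming ASCII input; the function name `compare` and its -1/0/1 return convention are choices made for this example.

```python
def compare(s, t):
    """Compare two strings character by character.

    Returns -1 if s < t, 0 if s == t, 1 if s > t. A string that has already
    ended is treated as having character value 0 at that position.
    """
    for i in range(max(len(s), len(t))):
        a = ord(s[i]) if i < len(s) else 0
        b = ord(t[i]) if i < len(t) else 0
        if a != b:
            return -1 if a < b else 1
    return 0


print(compare("a", "b"))         # -1
print(compare("a", "ab"))        # -1
print(compare("abcde", "abcf"))  # -1
```

For ASCII strings this agrees with Python's built-in `<` on `str`, which is why the other sketches can rely on the built-in operators.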
The detailed process of locating, in step S106, the index interval that matches the query keyword is:
(1) Initialize integer cursor variables low, mid and high, setting low=0 and high=E.length-1, where E.length denotes the length of array E. Let LOW be the subscript of the left endpoint of the index interval and HIGH the subscript of the right endpoint, with initial values LOW=HIGH=-1.
(2) While low<=high, execute (3); otherwise, execute (4).
(3) Assign mid=(low+high)/2 (integer division) and compare key with the index value E[mid], where E[mid] is the element of array E with subscript mid:
If, by the lexicographic comparison described above, key<E[mid] and key does not match E[mid], then E[mid] lies to the right of the right endpoint of the index interval, i.e., E[mid] is greater than the right endpoint of the index interval, so all index values with subscript greater than mid-1 can be eliminated; assign high=mid-1 and return to (2) to continue comparing;
If, by the lexicographic comparison described above, key>E[mid] and key does not match E[mid], then E[mid] lies to the left of the left endpoint of the index interval, i.e., E[mid] is less than the left endpoint of the index interval, so all index values with subscript less than mid+1 can be eliminated; assign low=mid+1 and return to (2) to continue comparing;
If, by the lexicographic comparison described above, key<=E[mid] and key matches E[mid], then E[mid] lies within the index interval, i.e., E[mid] is the matching index.
In that case, execute subprocesses S3 and S4, assign the return value of S3 to LOW and the return value of S4 to HIGH, and then execute (4). Subprocesses S3 and S4 locate the left endpoint and the right endpoint of the index interval, respectively.
(4) Examine the value of LOW: if LOW is -1, return a null value, i.e., no matching index was found; if LOW<=HIGH, return the subscript LOW of the left endpoint and the subscript HIGH of the right endpoint of the index interval.
Subprocess S3 obtains the left endpoint of the index interval. Its detailed process is:
(a) Initialize integer cursor variables s3_low=low, s3_high=mid, s3_mid=0 (low and mid here are the local cursor variables of step S106 above).
(b) While s3_low<s3_high, execute (c); otherwise, execute (d).
(c) Assign s3_mid=(s3_low+s3_high)/2 and compare key with E[s3_mid]:
If key does not match E[s3_mid], eliminate the elements with subscript less than s3_mid+1, i.e., s3_low=s3_mid+1, and execute (b).
If key matches E[s3_mid], eliminate the elements with subscript greater than s3_mid-1, i.e., s3_high=s3_mid-1, and execute (b) (eventually (c) ends because s3_low<s3_high no longer holds, and (d) is entered).
(d) Compare key with E[s3_low]: if key matches E[s3_low], return s3_low; otherwise, return s3_low+1.
Subprocess S4 obtains the right endpoint of the index interval. Its detailed process is:
(I) Initialize integer cursor variables s4_low=mid, s4_high=high, s4_mid=0 (mid and high here are the local cursor variables of step S106 above).
(II) While s4_low<s4_high, execute (III); otherwise, execute (IV).
(III) Assign s4_mid=(s4_low+s4_high)/2 and compare key with E[s4_mid]: if key does not match E[s4_mid], eliminate all elements of array E with subscript greater than s4_mid-1, i.e., s4_high=s4_mid-1, and execute (II).
If key matches E[s4_mid], eliminate the elements with subscript less than s4_mid+1, i.e., s4_low=s4_mid+1, and execute (II) (eventually (III) ends because s4_low<s4_high no longer holds, and (IV) is executed).
(IV) Compare key with E[s4_low]: if key matches E[s4_low], return s4_low; otherwise, return s4_low-1.
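Steps (1) to (4) and subprocesses S3 and S4 can be sketched in Python as follows. This is an illustrative reading of the procedure rather than a definitive implementation: the function names are hypothetical, matching is taken to be a prefix test via `str.startswith`, the built-in `<` stands in for the lexicographic comparison, and `None` stands in for the null return value.

```python
def matches(key, value):
    """Fuzzy match rule: the index value begins with key."""
    return value.startswith(key)


def left_endpoint(E, key, low, mid):
    """Subprocess S3: subscript of the first matching value in E[low..mid]."""
    s3_low, s3_high = low, mid
    while s3_low < s3_high:
        s3_mid = (s3_low + s3_high) // 2
        if matches(key, E[s3_mid]):
            s3_high = s3_mid - 1   # everything above s3_high is known to match
        else:
            s3_low = s3_mid + 1    # everything below s3_low is known not to match
    return s3_low if matches(key, E[s3_low]) else s3_low + 1


def right_endpoint(E, key, mid, high):
    """Subprocess S4: subscript of the last matching value in E[mid..high]."""
    s4_low, s4_high = mid, high
    while s4_low < s4_high:
        s4_mid = (s4_low + s4_high) // 2
        if matches(key, E[s4_mid]):
            s4_low = s4_mid + 1    # everything below s4_low is known to match
        else:
            s4_high = s4_mid - 1   # everything above s4_high is known not to match
    return s4_low if matches(key, E[s4_low]) else s4_low - 1


def range_binary_search(E, key):
    """Steps (1)-(4): return (LOW, HIGH) for the matching interval, or None."""
    low, high = 0, len(E) - 1
    while low <= high:
        mid = (low + high) // 2
        if matches(key, E[mid]):           # E[mid] is the matching index
            return (left_endpoint(E, key, low, mid),
                    right_endpoint(E, key, mid, high))
        if key < E[mid]:                   # interval lies to the left of mid
            high = mid - 1
        else:                              # interval lies to the right of mid
            low = mid + 1
    return None


E = ["a", "ab", "abc", "abe", "b"]
print(range_binary_search(E, "ab"))  # (1, 3)
```

For the key "ab", the first probe lands on "abc" at subscript 2, after which S3 narrows down to subscript 1 and S4 to subscript 3.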
It should be noted that, in other embodiments, the above method of locating the index interval that matches the query keyword may preferably be adjusted according to the length of the query keyword. That is, if a query keyword is long (relative to the index values), the matching interval can be expected to be small (the longer the query keyword, the more index values in the global index it exceeds in length, so relatively few index values match). In that case, once the matching index has been determined, sequential search (comparing entries one by one in order) is used directly to locate the left and right endpoints around the matching index.
For example, if in a given query the matching interval contains only two index values, sequential search can determine the left and right endpoints of the matching interval with three comparisons. If binary search were still used to determine the endpoints in this case, multiple bisection steps would be needed to obtain the result when the index value sequence is long. Using sequential search to locate the left and right endpoints around the matching index when the query keyword is long can therefore speed up the query.
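The sequential variant can be sketched as a simple outward scan from the matching index. The function name `expand_sequentially` is hypothetical, and the decision of when a keyword counts as "long" is left to the caller, since the specification does not fix a threshold.

```python
def expand_sequentially(E, key, match_index):
    """Scan outward from a known matching subscript to find both endpoints."""
    left = match_index
    while left > 0 and E[left - 1].startswith(key):
        left -= 1
    right = match_index
    while right < len(E) - 1 and E[right + 1].startswith(key):
        right += 1
    return left, right


E = ["a", "ab", "abc", "abe", "b"]
print(expand_sequentially(E, "abc", 2))  # (2, 2)
print(expand_sequentially(E, "ab", 2))   # (1, 3)
```

The scan costs one comparison per matching neighbour plus one failed comparison on each side, so it is cheap exactly when the matching interval is short.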
After step S106 has located the boundaries of the index interval that matches the keyword, all data items corresponding to all index values within the index interval are found according to the mapping relations described above, a query result list is generated, and the query result list is returned as the query result.
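Turning the located interval into a result list might look like the sketch below. It assumes the sorted global index is held as (index value, data item) pairs; removing duplicates when several index values map to the same data item is an added assumption, not something the specification requires. The sample data is hypothetical.

```python
def collect_results(entries, left, right):
    """Return the data items for entries[left..right], each item listed once."""
    results = []
    for _, data_item in entries[left:right + 1]:
        if data_item not in results:   # assumption: drop repeated data items
            results.append(data_item)
    return results


entries = [
    ("lm", "Li Meng"),
    ("lm", "Li Ming"),
    ("lmeng", "Li Meng"),
    ("scb", "market department"),
]
print(collect_results(entries, 0, 2))  # ['Li Meng', 'Li Ming']
```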
When a fuzzy query yields many results and the interface that displays them is small, it is usually not possible to show all query results to the user at once. Moreover, in some cases the user only cares about the most common results or those closest to the fuzzy search. Therefore, after the global index is built on the database tables according to the preset mapping relations, the method may further comprise: storing the query count of each index value in the global index; and caching the index values whose query count exceeds a threshold, together with the data items corresponding to those index values.
In one embodiment, after the query keyword is obtained, the method may further comprise: searching the cache for an index value that matches the query keyword, returning the data item corresponding to the index value found in the cache as a query result, and incrementing by 1 the query count of the index value that matches the query keyword.
For example, the keyword "scb" matches "market department" (pinyin "shichangbu") most often, and matches the personal name "Shen Changbin" less often. Therefore, when a query request with the keyword "scb" is received, the data item "market department", which has the highest query count for "scb", is first found in the cache and returned first. Meanwhile, step S106 described above continues to be executed on the global index, and all fuzzy query results are returned at the end. This approach makes effective use of the time the user spends browsing the first results, so the user waits less for query results, which improves the user experience.
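A minimal sketch of the query-count cache follows. The specification does not say how counts are stored or when exactly they are incremented, so the class name `QueryCountCache`, the per-entry counter keyed by (index value, data item), and the `record_query` and `lookup` methods are all assumptions made for illustration.

```python
from collections import defaultdict


class QueryCountCache:
    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = defaultdict(int)  # (index_value, data_item) -> query count
        self.cache = {}                 # index_value -> cached data items

    def record_query(self, index_value, data_item):
        """Add 1 to the query count; cache the entry once it exceeds the threshold."""
        entry = (index_value, data_item)
        self.counts[entry] += 1
        if self.counts[entry] > self.threshold:
            items = self.cache.setdefault(index_value, [])
            if data_item not in items:
                items.append(data_item)

    def lookup(self, key):
        """Return cached data items whose index value starts with key, most queried first."""
        hits = [
            (self.counts[(value, item)], item)
            for value, items in self.cache.items()
            if value.startswith(key)
            for item in items
        ]
        hits.sort(key=lambda hit: -hit[0])
        return [item for _, item in hits]


cache = QueryCountCache(threshold=2)
for _ in range(3):
    cache.record_query("scb", "market department")
cache.record_query("scb", "Shen Changbin")
print(cache.lookup("scb"))  # ['market department']
```

In this sketch the cached answer can be returned immediately while the full search over the global index continues, mirroring the "scb" example above.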
As shown in Figure 2, in one embodiment, a fuzzy query system comprises an index module 102, a sorting module 104, a search module 106 and a value module 108, wherein:
The index module 102 is configured to build a global index on database tables according to preset mapping relations between index values and data items.
A database consists of a plurality of database tables. Each database table stores a number of data records, and one data record is one data item. For example, a database stores two tables: a user information table and an organizational structure table. The user information table stores information about users; the organizational structure table stores information about each department in the company's organizational structure.
In one embodiment, the index module 102 first determines the database tables that are queried frequently but are rarely subject to insert, delete and update operations, and then defines keywords for the data items in these frequently queried tables. Preferably, the index module 102 may define a plurality of keywords for one data item. The keywords are then used as index values to build the global index in the database. Index values and data items have a mapping relation, preferably a many-to-one relation, i.e., a plurality of index values correspond to one data item. Where several data items share the same keyword, several identical index values may be mapped to the several different data items respectively.
For example, the index value "lm" can be the abbreviation of "Li Ming" ("dawn") and also the abbreviation of "Li Meng". In this case, it suffices to create two index values "lm", one corresponding to "Li Ming" and the other to "Li Meng".
The sorting module 104 is configured to sort the global index.
In one embodiment, after the global index has been built, it is sorted according to a preset ordering rule. Preferably, the preset ordering rule is lexicographic order: the index values are first converted into character strings (Chinese characters are first converted into pinyin), and the strings are then sorted lexicographically. For example, the index value sequence [ab, a, abe, b, abc] becomes [a, ab, abc, abe, b] after sorting in ascending lexicographic order.
The search module 106 is configured to obtain a query keyword and to use range binary search to locate the index interval that matches the query keyword.
The value module 108 is configured to obtain the index values within the index interval, obtain the data items corresponding to those index values according to the mapping relations, and return the data items as the query result.
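The division into four modules can be sketched as four small classes wired together. Everything here is illustrative: the class and method names are assumptions, and the search module uses the standard `bisect_left` to find the first candidate followed by a sequential scan to the right, which is a simplified stand-in for the range binary search of step S106.

```python
from bisect import bisect_left


class IndexModule:
    def build(self, table):
        """table: {data_item: [keywords]} -> list of (index_value, data_item)."""
        return [(kw, item) for item, kws in table.items() for kw in kws]


class SortModule:
    def sort(self, entries):
        return sorted(entries, key=lambda entry: entry[0])


class SearchModule:
    def find_interval(self, entries, key):
        """Return (left, right) subscripts of entries matching key, or None."""
        values = [value for value, _ in entries]
        left = bisect_left(values, key)   # first value >= key
        right = left
        while right < len(values) and values[right].startswith(key):
            right += 1
        return (left, right - 1) if right > left else None


class ValueModule:
    def fetch(self, entries, interval):
        if interval is None:
            return []
        left, right = interval
        return [item for _, item in entries[left:right + 1]]


class FuzzyQuerySystem:
    def __init__(self, table):
        self.entries = SortModule().sort(IndexModule().build(table))
        self.search_module = SearchModule()
        self.value_module = ValueModule()

    def query(self, key):
        interval = self.search_module.find_interval(self.entries, key)
        return self.value_module.fetch(self.entries, interval)


system = FuzzyQuerySystem({
    "market department": ["scb"],
    "Shen Changbin": ["scb", "shenchangbin"],
    "Li Meng": ["lm"],
})
print(system.query("scb"))  # ['market department', 'Shen Changbin']
```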
Range binary search (RBS) is a data search method based on conventional binary search. Conventional binary search determines the position of the query keyword in a sequence, whereas range binary search locates the continuous interval formed by all values that match the query keyword.
In one embodiment, the search module comprises a first submodule (not shown) configured to locate, by binary search, a matching index that matches the query keyword, and a second submodule (not shown) configured to obtain the boundaries of the index interval, taking the matching index as an endpoint. For example, when the index value sequence is stored in an array, the boundaries of the index interval are the index values at the left and right endpoints of a continuous interval within the consecutive integer sequence formed by the array subscripts; for a continuous interval [a, b] of array subscripts, a is the left endpoint of the interval and b is its right endpoint.
Specifically, the second submodule takes the matching index as an endpoint and uses binary search to obtain separately the left endpoint and the right endpoint of the index interval that matches the query keyword.
For the detailed process by which the search module 106 uses range binary search to locate the index interval that matches the query keyword, refer to the query process described for step S106 above; it is not repeated here.
It should be noted that, in other embodiments, the way the second submodule locates the index interval that matches the query keyword may preferably be adjusted according to the length of the query keyword. That is, if a query keyword is long (relative to the index values), the matching interval can be expected to be small (the longer the query keyword, the more index values in the global index it exceeds in length, so relatively few index values match). In that case, once the matching index has been determined, sequential search (comparing entries one by one in order) is used directly to locate the left and right endpoints around the matching index.
For example, if in a given query the matching interval contains only two index values, sequential search can determine the left and right endpoints of the matching interval with three comparisons. If binary search were still used to determine the endpoints in this case, multiple bisection steps would be needed to obtain the result when the index value sequence is long. Using sequential search to locate the left and right endpoints around the matching index when the query keyword is long can therefore speed up the query.
After the search module 106 has located the boundaries of the index interval that matches the keyword, the value module 108 finds, according to the mapping relations described above, all data items corresponding to all index values within the index interval, generates a query result list, and returns the query result list as the query result.
When a fuzzy query yields many results and the interface that displays them is small, it is usually not possible to show all query results to the user at once. Moreover, in some cases the user only cares about the most common results or those closest to the fuzzy search. Therefore, preferably, in one embodiment, the system further comprises a cache module (not shown in the figure) configured to store the query count of each index value in the global index, and to cache the index values whose query count exceeds a threshold together with the data items corresponding to those index values.
In one embodiment, the search module is further configured to search the cache for an index value that matches the query keyword, return the data item corresponding to the index value found in the cache as a query result, and increment by 1 the query count of the index value that matches the query keyword.
For example, the keyword "scb" matches "market department" (pinyin "shichangbu") most often, and matches the personal name "Shen Changbin" less often. Therefore, when a query request with the keyword "scb" is received, the data item "market department", which has the highest query count for "scb", is first found in the cache and returned first. Meanwhile, the search module 106 continues to search the global index for all index values that match the query keyword, and the value module 108 returns all fuzzy query results at the end. This approach makes effective use of the time the user spends browsing the first results, so the user waits less for query results, which improves the user experience.
The embodiments described above express only several implementations of the present invention. Their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the claims of the present invention. It should be noted that a person of ordinary skill in the art may make a number of variations and improvements without departing from the inventive concept, all of which fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be determined by the appended claims.