CN108846016A - Chinese word segmentation oriented search algorithm - Google Patents

Chinese word segmentation oriented search algorithm

Info

Publication number
CN108846016A
Authority
CN
China
Prior art keywords
suffix
node
string
index
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810422499.3A
Other languages
Chinese (zh)
Other versions
CN108846016B (en)
Inventor
金城
陶仕谦
唐士芳
吴渊
张玥杰
冯瑞
薛向阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN201810422499.3A
Publication of CN108846016A
Application granted
Publication of CN108846016B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention belongs to the technical field of text search engines and specifically concerns a search algorithm oriented to Chinese word segmentation. The algorithm has two stages: an offline index-building stage and an online lookup stage. In the offline index-building stage, the suffix string sets of all original strings are extracted first, and improved suffix trees are then generated from the suffix string sets. In the online lookup stage, the query results for a keyword are first obtained from the suffix-tree-based index model, the matching degree between the keyword and each query result is then quantified, and the query results are finally sorted by matching degree from high to low and returned. Through an improved suffix-tree-based index structure, the invention balances index construction time against space usage; searching with this index structure is far more efficient than computing the matching degree over the result set by brute force and then sorting.

Description

Chinese word segmentation oriented search algorithm
Technical field
The invention belongs to the technical field of text search engines, and in particular relates to a search algorithm oriented to Chinese word segmentation.
Background
A search engine is an online information retrieval tool that returns to the user a series of search results matching the user's search keywords. Today is an age of information explosion; faced with a vast amount of information, quickly and accurately locating what a user wants is one of the most pressing needs, so information search technology has developed rapidly and is widely applied.
The most common form of search is text search: whether the user's target resource is text, images, audio, or even video, as long as the input is text, it falls within the scope of search considered by the present invention. Besides the whole-web search offered by Google, Bing, Yahoo and others, demand for search in specific domains is also growing. In a specific domain (for example, one restricted to TV programmes), the types of resource are limited, so the search conditions can generally be made very explicit, and the size of the data set is also within an acceptable range; under these premises, many targeted optimizations can be made to the search engine.
The main technologies currently used in Chinese search systems include the inverted index, the forward index, document signatures (DS), and the suffix tree. Among these, the inverted index has the best overall performance and is the most widely used; in practice, however, processing a large text collection with the inverted index model places heavy demands on CPU resources, memory, and I/O.
Summary of the invention
The object of the invention is to propose a search algorithm oriented to Chinese word segmentation for use in intelligent Chinese search engine systems, so that search results can be returned quickly for a keyword and presented to the user sorted by matching degree from high to low.
The proposed search algorithm has two main stages: an offline index-building stage and an online lookup stage. In the offline index-building stage, the suffix string sets of all original strings are extracted first, and improved suffix trees are then generated from the suffix string sets. In the online lookup stage, the query results for a keyword are first obtained from the suffix-tree-based index model, the matching degree between the keyword and each query result is then quantified, and the query results are finally sorted by matching degree from high to low and returned.
I. The offline index-building stage. The specific steps are:
(1) Generate the suffix string sets from the original data set
T(S) denotes the original data set, made up of strings S that contain separators ($) and an end marker (#), where the index ID of the i-th string is i (1≤i≤n). Let WBS denote a suffix string that starts at a separator position, and NWBS a suffix string that does not. The suffix string sets T(WBS) and T(NWBS), tagged with index IDs, are generated from T(S) as follows:
First step: Traverse all strings in T(S) and extract all suffix strings of each string sᵢ, forming the sets T*(s₁), T*(s₂), …, T*(sₙ) [1]. A suffix string is a substring of S that runs from some position i to the end marker at the end of S; that is, if S is written C₁C₂…Cₙ, then CᵢCᵢ₊₁…Cₙ is called a suffix string of S (1≤i≤n);
Second step: Remove from T*(s₁), T*(s₂), …, T*(sₙ) every suffix string whose first character is a separator ($) or the end marker (#);
Third step: Traverse all suffix strings in T*(sᵢ). If the first character of a suffix string is the same as the first character of the original string, or the same as the first character following a separator ($) in the original string, append the index ID to the end of the suffix string and add it to T(WBS); otherwise, append the index ID to the end of the suffix string and add it to T(NWBS).
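The three steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `suffix_sets` is hypothetical, and each suffix is paired with its index ID as a tuple rather than having the ID appended to the string itself.

```python
SEP, END = "$", "#"

def suffix_sets(strings):
    """Split the suffixes of each string into T(WBS) and T(NWBS).

    strings: non-empty segmented strings such as "AB$CD#"; the index ID of
    the i-th string is i (1-based). Returns two lists of (suffix, index_id).
    """
    wbs, nwbs = [], []
    for index_id, s in enumerate(strings, start=1):
        # Characters that begin the string or directly follow a separator.
        heads = {s[0]} | {s[k + 1] for k in range(len(s) - 1) if s[k] == SEP}
        for pos in range(len(s)):            # first step: every suffix of s
            suffix = s[pos:]
            if suffix[0] in (SEP, END):      # second step: drop these suffixes
                continue
            # Third step: the comparison is on the first character, as in the text.
            target = wbs if suffix[0] in heads else nwbs
            target.append((suffix, index_id))
    return wbs, nwbs

wbs, nwbs = suffix_sets(["AB$CD#"])
```

For the single string "AB$CD#", the suffixes "AB$CD#" and "CD#" land in T(WBS), while "B$CD#" and "D#" land in T(NWBS).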
(2) Build an improved suffix tree for each of the suffix string sets T(WBS) and T(NWBS)
The improved suffix tree builds on the traditional suffix tree [1] by storing the label of each edge in a node instead. That is, each node serves as a storage unit, with the structure shown in Fig. 1. The information stored in a node comprises the node identifier, an end-marker child node pointer, a separator child node pointer, a set of general child node pointers, and a matching index-ID sequence, where the node identifier is an end marker, a separator, or a general string.
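One possible in-memory layout for such a node is sketched below. The class name `Node` and its field names are hypothetical, and keying the general children by the first character of their identifier is an assumption made for fast lookup; Fig. 1 is not available in this text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Node:
    label: str = ""                         # node identifier: "#", "$", or a general string
    end_child: Optional["Node"] = None      # end-marker child node pointer
    sep_child: Optional["Node"] = None      # separator child node pointer
    # General child node pointers, keyed by the first character of each child's label.
    children: Dict[str, "Node"] = field(default_factory=dict)
    ids: List[int] = field(default_factory=list)   # matching index-ID sequence

root = Node()    # an empty root: no identifier, no children, no index IDs
other = Node()
```

Using `default_factory` gives every node its own dictionary and list, so index IDs added to one node never leak into another.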
The steps for building an improved suffix tree from any suffix string set T are as follows:
First step: Create an improved suffix tree containing a single node whose node identifier, child node pointers, and matching index-ID sequence are all empty; this node is the root node (root) of the improved suffix tree.
Second step: Insert all elements of the suffix string set T into the improved suffix tree one by one. The insertion of each suffix string starts from the root node and searches for the insertion position.
Taking the improved suffix tree in Fig. 2 as an example, inserting a suffix string falls into one of three cases:
Case ①: If the suffix string to insert already appears in the current tree, the index number is simply added to the node's matching index-ID sequence. For example, if the suffix string to insert is "student#2", the node "student" already appears in the current tree, so the index number is added directly to the node's matching index-ID sequence; the result is shown in Fig. 3(a).
Case ②: If a prefix of the suffix string to insert is identical to an existing node, the remaining nodes are simply added. For example, if the suffix string to insert is "Fudan University$student#3", the node "Fudan University" already exists, so the nodes "student" and "#" are added directly; the result is shown in Fig. 3(b).
Case ③: If the suffix string to insert shares only a prefix with the current node, the current node is split first and the other nodes are then inserted. For example, suppose the suffix string to insert is "big student#4" (a literal rendering of a Chinese string whose first character is also the first character of the word for "university"). Because the prefix "big" of the suffix string matches a prefix of the current node "university", the current node must be split first and the other nodes inserted afterwards; the result is shown in Fig. 3(c).
Third step: Recursively construct the matching index-ID sequence of every node. From the above, the matching index-ID sequences of the end-marker nodes are already complete once all suffix strings have been inserted. Therefore only the matching index-ID sequence Q(N(s)) of each non-end-marker node N(s) needs to be constructed, as shown in formula (1):
Q(N(s)) = Q(N(s#)) Q(N(s$)) Q(N(s*))    (1)
where N(s#), N(s$), and N(s*) denote the end-marker child node, the separator child node, and all the general child nodes of N(s), respectively; that is, the sequence of N(s) is assembled from the sequences of these child nodes.
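The tree-building steps can be sketched as below. This is an illustration under stated assumptions rather than the patent's own code: `Node`, `split_node`, `insert`, and `build_q` are hypothetical names, the example strings are invented ASCII stand-ins for the Chinese examples, and formula (1) is read as combining the children's sequences in the listed order while skipping IDs already present (the operator is not recoverable from the text).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SEP, END = "$", "#"

@dataclass
class Node:
    label: str = ""
    end_child: Optional["Node"] = None
    sep_child: Optional["Node"] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    ids: List[int] = field(default_factory=list)

def split_node(node: Node, k: int) -> None:
    """Case 3: keep label[:k] in node; move the rest and its children to a new child."""
    tail = Node(node.label[k:], node.end_child, node.sep_child, node.children, node.ids)
    node.label, node.end_child, node.sep_child = node.label[:k], None, None
    node.children, node.ids = {tail.label[0]: tail}, []

def insert(root: Node, suffix: str, index_id: int) -> None:
    node, i = root, 0
    while i < len(suffix):
        ch = suffix[i]
        if ch == END:                      # case 1 when the end-marker node already exists
            if node.end_child is None:
                node.end_child = Node(END)
            node.end_child.ids.append(index_id)
            return
        if ch == SEP:
            if node.sep_child is None:
                node.sep_child = Node(SEP)
            node, i = node.sep_child, i + 1
            continue
        j = i                              # general text runs up to the next marker
        while j < len(suffix) and suffix[j] not in (SEP, END):
            j += 1
        text = suffix[i:j]
        child = node.children.get(ch)
        if child is None:                  # case 2: add a new general node
            node.children[ch] = child = Node(text)
            node, i = child, j
            continue
        k = 0                              # length of the common prefix
        while k < min(len(text), len(child.label)) and text[k] == child.label[k]:
            k += 1
        if k < len(child.label):           # case 3: split the existing node
            split_node(child, k)
        node, i = child, i + k

def build_q(node: Node) -> List[int]:
    """Assumed reading of formula (1): ordered merge of the children's sequences."""
    if node.label == END:
        return node.ids
    kids = [c for c in (node.end_child, node.sep_child) if c is not None]
    kids += list(node.children.values())
    merged: List[int] = []
    for kid in kids:
        for x in build_q(kid):
            if x not in merged:
                merged.append(x)
    node.ids = merged
    return merged

root = Node()
for suffix, index_id in [("univ$stu#", 1), ("stu#", 1), ("stu#", 2),
                         ("univ$abc#", 3), ("unstu#", 4)]:
    insert(root, suffix, index_id)
build_q(root)
```

In this run, the second "stu#" exercises case 1, "univ$abc#" exercises case 2, and "unstu#" splits the node "univ" into "un" and "iv" as in case 3.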
II. The online lookup stage. The specific steps are:
(1) Query the matching nodes
For any node N(s), the process of finding, starting from N(s), the nodes that match the query string c₁…cₙ is given by formula (2):
[formula (2) is not reproduced in this text]
where R(N(s)) denotes the query result, N(s) is a matching node, and s is the node identifier.
Given a query string c₁…cₙ, first examine all child nodes of the root node and find the child node N(s) whose node identifier begins with c₁; then execute R(N(s), c₁…cₙ) to find all matching nodes, finally obtaining the search result R(N(s)) = (S, Q(N(s))), where Q(N(s)) is the matching index-ID sequence of N(s).
(2) Sort the result set
A negentropy is defined to measure the matching degree between the query string c₁…cₙ and a search result string s: the smaller the negentropy, the lower the matching degree; conversely, the larger the negentropy, the higher the matching degree.
The negentropy value is computed as follows (its initial value is 0):
(a) Obtain the position i of c₁ in s;
(b) Traverse s onward from i until a separator $, the end marker #, or the end of s is encountered; let m be the number of characters traversed;
(c) If the end of s was reached, check whether the last character is the end marker #. If so, the negentropy value increases by m² and the algorithm ends; otherwise, the negentropy value increases by m and the algorithm ends;
(d) If a separator $ was encountered, the negentropy value increases by m² and the algorithm ends;
(e) Update i to the position of the character following the separator just encountered, and return to (b).
The segmentation negentropy value of every s in the result set is computed by the above steps, and the result set is sorted in descending order of this value.
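Steps (a) to (e) can be sketched as a short function. Two points are assumptions rather than statements from the text: the name `negentropy` is hypothetical, and step (d) is taken to add m² and then continue with step (e), since ending the algorithm at (d) would leave step (e) unreachable.

```python
SEP, END = "$", "#"

def negentropy(query: str, s: str) -> int:
    """Segmentation negentropy of result string s for the given query."""
    i = s.find(query[0])              # (a) position of c1 in s
    if i < 0:
        return 0                      # c1 does not occur in s
    score = 0
    while True:
        m = 0                         # (b) count characters up to a marker or the end
        while i + m < len(s) and s[i + m] not in (SEP, END):
            m += 1
        stop = i + m
        if stop == len(s):            # (c) end of s reached without an end marker
            return score + m
        if s[stop] == END:            # (c) s finishes with the end marker
            return score + m * m
        score += m * m                # (d) a separator was reached (assumed to continue)
        i = stop + 1                  # (e) resume after the separator

results = ["AB$CD#", "ABCD#", "ABC"]
ranked = sorted(results, key=lambda s: negentropy("A", s), reverse=True)
```

Under this reading, an unsegmented match such as "ABCD#" scores 4² = 16, while "AB$CD#" scores 2² + 2² = 8, so results in which the query falls inside longer words rank higher.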
(3) Eliminate duplicates from the result set and generate the search result sequence
Take the Q(N(s)) of each entry of the sorted result set in turn, apply the operation below, and place the outcome in the search result sequence, whose initial value is empty. Formula (3) gives the operation applied to Q(N(sᵢ)):
SR(i) = (D(Q(N(sᵢ))) - SR(i-1)) ∩ SR(i-1), 1≤i≤n    (3)
where SR(i) denotes the search result sequence after the matching index-ID sequence Q(N(sᵢ)) of the i-th node has been merged in, and SR(1) and SR(n) are respectively the initial and final states of the search result sequence; D(Q(N(sᵢ))) denotes de-duplicating Q(N(sᵢ)); (D - SR) denotes removing from the de-duplicated Q(N(sᵢ)) the index numbers that already appear in the search result sequence; and (D - SR) ∩ SR denotes appending (D - SR) to the end of the current search result sequence SR.
After the above steps, the final search result sequence is SR(n).
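The merge described by formula (3) amounts to an order-preserving union, which can be sketched as follows. The function name `merge_results` is hypothetical, and the sketch assumes the search result sequence starts out empty before the first Q(N(s)) is merged.

```python
from typing import Iterable, List

def merge_results(id_sequences: Iterable[List[int]]) -> List[int]:
    """Merge ranked index-ID sequences into one duplicate-free result sequence."""
    sr: List[int] = []                          # search result sequence, initially empty
    for q in id_sequences:                      # Q(N(s_i)), in ranked order
        d = list(dict.fromkeys(q))              # D(.): drop duplicates, keep first occurrences
        new = [x for x in d if x not in sr]     # D - SR: drop IDs already in the results
        sr = sr + new                           # append what is left to the end of SR
    return sr

final = merge_results([[3, 1, 3], [1, 2], [2, 4]])
```

Because earlier sequences come from better-ranked matches, an index ID keeps the position of its first (best) appearance.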
Through an improved suffix-tree-based index structure, the invention strikes a good balance between index construction time and space usage. Searching with this index structure is far more efficient than computing the matching degree over the result set by brute force and then sorting. Compared with fuzzy search implemented on other full-text index structures, the index structure of the invention achieves very high search efficiency while requiring less construction time and less memory.
Brief description of the drawings
Fig. 1: Structure of an improved suffix tree node.
Fig. 2: Example of an improved suffix tree.
Fig. 3: Comparison of the different cases when inserting a suffix string.
Detailed description of embodiments
To study the search performance of the invention on data sets of different sizes, five data sets were constructed, containing 10,000, 20,000, 50,000, 100,000, and 200,000 records respectively, and multiple groups of comparative experiments were run on each data set against the Lucene engine, which is based on inverted lists.
Search strings of lengths 2 to 4 were generated at random, 25 for each length, giving 75 search strings in total. Each search string was searched 100,000 times and, provided the search results were correct, the time consumed by each search was recorded.
So that Lucene could perform the same task as the index of the invention, a space was inserted between the characters of each original sequence when the initial index was built, so that every character is treated as a word; spaces were likewise inserted between the characters of each search string, giving the same search functionality as the invention.
The experimental results are shown in Table 1:
Table 1: Comparison of search time between the index of the invention and the Lucene index
As the table shows, the algorithm of the invention achieves better search efficiency than Lucene on every data set, and the advantage is more pronounced on small data sets: when the data set is smaller than 50,000, the search efficiency of the algorithm can reach 7 to 10 times that of Lucene.
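The preprocessing applied for the Lucene baseline can be sketched in one line. The helper name `space_out` is hypothetical, and how the experiment handled the separator and end-marker symbols is not stated, so the sketch covers plain text only.

```python
def space_out(text: str) -> str:
    """Put a space between every pair of adjacent characters."""
    return " ".join(text)

indexed = space_out("复旦大学")   # text as it would be fed to the index
query = space_out("大学")         # the search string gets the same treatment
```

With both sides spaced out this way, a whitespace-based tokenizer sees each character as its own word.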
References:
[1] E. Ukkonen, On-Line Construction of Suffix Trees, Algorithmica, 14 (1995), 249-260.

Claims (3)

1. A search algorithm oriented to Chinese word segmentation, characterized in that it is divided into two stages: an offline index-building stage and an online lookup stage;
(I) The offline index-building stage, with the following specific steps:
(1) Generate the suffix string sets from the original data set
T(S) denotes the original data set, made up of strings S that contain separators ($) and an end marker (#), where the index ID of the i-th string is i, 1≤i≤n; let WBS denote a suffix string that starts at a separator position, and NWBS a suffix string that does not; the suffix string sets T(WBS) and T(NWBS), tagged with index IDs, are generated from T(S) as follows:
First step: Traverse all strings in T(S) and extract all suffix strings of each string sᵢ, forming the sets T*(s₁), T*(s₂), …, T*(sₙ), where a suffix string is a substring of S that runs from some position i to the end marker at the end of S; that is, if S is written C₁C₂…Cₙ, then CᵢCᵢ₊₁…Cₙ is called a suffix string of S, 1≤i≤n;
Second step: Remove from the sets T*(s₁), T*(s₂), …, T*(sₙ) every suffix string whose first character is a separator ($) or the end marker (#);
Third step: Traverse all suffix strings in T*(sᵢ); if the first character of a suffix string is the same as the first character of the original string, or the same as the first character following a separator ($) in the original string, append the index ID to the end of the suffix string and add it to T(WBS); otherwise, append the index ID to the end of the suffix string and add it to T(NWBS);
(2) Build an improved suffix tree for each of the suffix string sets T(WBS) and T(NWBS)
The improved suffix tree builds on the traditional suffix tree by storing the label of each edge in a node instead, i.e. each node serves as a storage unit; the information stored in a node comprises the node identifier, an end-marker child node pointer, a separator child node pointer, a set of general child node pointers, and a matching index-ID sequence, where the node identifier is an end marker, a separator, or a general string;
The steps for building an improved suffix tree from any suffix string set T are as follows:
First step: Create an improved suffix tree containing a single node whose node identifier, child node pointers, and matching index-ID sequence are all empty; this node is the root node (root) of the improved suffix tree;
Second step: Insert all elements of the suffix string set T into the improved suffix tree one by one; the insertion of each suffix string starts from the root node and searches for the insertion position;
Third step: Recursively construct the matching index-ID sequence of every node; from the above, the matching index-ID sequences of the end-marker nodes are already complete once all suffix strings have been inserted; only the matching index-ID sequence Q(N(s)) of each non-end-marker node N(s) needs to be constructed, by formula (1):
Q(N(s)) = Q(N(s#)) Q(N(s$)) Q(N(s*))    (1)
where N(s#), N(s$), and N(s*) denote the end-marker child node, the separator child node, and all the general child nodes of N(s), respectively;
(2) stage is searched online, the specific steps are:
(1) match point is inquired
To arbitrary node N (s), from N (s), by formula (2) inquiry string c1…cnMatched node:
Wherein, R (N (s)) indicates that query result, N (s) are matched node, and s is node identification;
Provide inquiry string c1…cn, all child nodes of root node are first looked for, the initial character for finding node identification is equal to c1 Child node N (s), then execute R (N (s), c1…cn), find all match points, finally obtain search result R (N (s))= (S,Q(N(s)));Wherein, Q (N (s)) is the match index ID sequence of N (s);
(2) Sort the result set
A negentropy is defined to measure the matching degree between the query string c₁…cₙ and a search result string s: the smaller the negentropy, the lower the matching degree; conversely, the larger the negentropy, the higher the matching degree;
The segmentation negentropy value of every s is computed, and the result set is sorted in descending order of this value;
(3) Eliminate duplicates from the result set and generate the search result sequence
Take the Q(N(s)) of each entry of the sorted result set in turn, apply the corresponding operation, and place the outcome in the search result sequence, whose initial value is empty; the corresponding operation applied to Q(N(sᵢ)) is given by formula (3):
SR(i) = (D(Q(N(sᵢ))) - SR(i-1)) ∩ SR(i-1), 1≤i≤n    (3)
where SR(i) denotes the search result sequence after the matching index-ID sequence Q(N(sᵢ)) of the i-th node has been merged in, and SR(1) and SR(n) are respectively the initial and final states of the search result sequence; D(Q(N(sᵢ))) denotes de-duplicating Q(N(sᵢ)); (D - SR) denotes removing from the de-duplicated Q(N(sᵢ)) the index numbers that already appear in the search result sequence; (D - SR) ∩ SR denotes appending (D - SR) to the end of the current search result sequence SR;
The final search result sequence is SR(n).
2. The search algorithm oriented to Chinese word segmentation according to claim 1, characterized in that the insertion of each suffix string starts from the root node and searches for the insertion position, and falls into one of the following three cases:
Case ①: If the suffix string to insert already appears in the current tree, the index number is simply added to the node's matching index-ID sequence;
Case ②: If a prefix of the suffix string to insert is identical to an existing node, the remaining nodes are simply added;
Case ③: If the suffix string to insert shares only a prefix with the current node, the current node is split first and the other nodes are then inserted.
3. The search algorithm oriented to Chinese word segmentation according to claim 1, characterized in that the segmentation negentropy value of s is computed by the following steps, with the initial negentropy value set to 0:
(a) Obtain the position i of c₁ in s;
(b) Traverse s onward from i until a separator $, the end marker #, or the end of s is encountered; let m be the number of characters traversed;
(c) If the end of s was reached, check whether the last character is the end marker #; if so, the negentropy value increases by m² and the algorithm ends; otherwise, the negentropy value increases by m and the algorithm ends;
(d) If a separator $ was encountered, the negentropy value increases by m² and the algorithm ends;
(e) Update i to the position of the character following the separator just encountered, and return to step (b).
CN201810422499.3A 2018-05-05 2018-05-05 Chinese word segmentation oriented search algorithm Active CN108846016B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810422499.3A CN108846016B (en) 2018-05-05 2018-05-05 Chinese word segmentation oriented search algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810422499.3A CN108846016B (en) 2018-05-05 2018-05-05 Chinese word segmentation oriented search algorithm

Publications (2)

Publication Number Publication Date
CN108846016A true CN108846016A (en) 2018-11-20
CN108846016B CN108846016B (en) 2021-08-20

Family

ID=64212741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810422499.3A Active CN108846016B (en) 2018-05-05 2018-05-05 Chinese word segmentation oriented search algorithm

Country Status (1)

Country Link
CN (1) CN108846016B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109686413A (en) * 2018-12-24 2019-04-26 杭州费尔斯通科技有限公司 A kind of chemical molecular formula search method based on es inverted index
CN110597855A (en) * 2019-08-14 2019-12-20 中山大学 Data storage method, terminal equipment and computer readable storage medium
CN112232903A (en) * 2020-09-27 2021-01-15 北京五八信息技术有限公司 Business object display method and device
CN112802553A (en) * 2020-12-29 2021-05-14 北京优迅医疗器械有限公司 Method for comparing genome sequencing sequence and reference genome based on suffix tree algorithm
CN112966505A (en) * 2021-01-21 2021-06-15 哈尔滨工业大学 Method, device and storage medium for extracting persistent hot phrases from text corpus
WO2021139154A1 (en) * 2020-01-10 2021-07-15 百度在线网络技术(北京)有限公司 Data prefetching method and apparatus, electronic device, and computer-readable storage medium
CN113450028A (en) * 2021-08-31 2021-09-28 深圳格隆汇信息科技有限公司 Behavior fund analysis method and system
CN115244539A (en) * 2020-05-18 2022-10-25 谷歌有限责任公司 Word or word segment lemmatization inference method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103631929A (en) * 2013-12-09 2014-03-12 江苏金智教育信息技术有限公司 Intelligent prompt method, module and system for search
CN103838783A (en) * 2012-11-27 2014-06-04 大连灵动科技发展有限公司 Suffix tree clustering method suitable for Chinese web page documents
CN103838785A (en) * 2012-11-27 2014-06-04 大连灵动科技发展有限公司 Vertical search engine in patent field
CN107844731A (en) * 2016-09-17 2018-03-27 复旦大学 Long-term sequence δ abnormal point detecting methods based on probabilistic suffix tree

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZENG D等: "Domain-specific Chinese word segmentation using suffix tree and mutual information", 《INFORMATION SYSTEMS FRONTIERS》 *
袁津生等: "改进后缀树的中文检索结果聚类研究", 《计算机工程与应用》 *
韦美峰等: "基于后缀树聚类的主题搜索引擎研究", 《情报理论与实践》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109686413A (en) * 2018-12-24 2019-04-26 杭州费尔斯通科技有限公司 A kind of chemical molecular formula search method based on es inverted index
CN110597855A (en) * 2019-08-14 2019-12-20 中山大学 Data storage method, terminal equipment and computer readable storage medium
CN110597855B (en) * 2019-08-14 2022-03-29 中山大学 Data query method, terminal device and computer readable storage medium
WO2021139154A1 (en) * 2020-01-10 2021-07-15 百度在线网络技术(北京)有限公司 Data prefetching method and apparatus, electronic device, and computer-readable storage medium
CN115244539A (en) * 2020-05-18 2022-10-25 谷歌有限责任公司 Word or word segment lemmatization inference method
US11763083B2 (en) 2020-05-18 2023-09-19 Google Llc Inference methods for word or wordpiece tokenization
CN112232903A (en) * 2020-09-27 2021-01-15 北京五八信息技术有限公司 Business object display method and device
CN112802553A (en) * 2020-12-29 2021-05-14 北京优迅医疗器械有限公司 Method for comparing genome sequencing sequence and reference genome based on suffix tree algorithm
CN112802553B (en) * 2020-12-29 2024-03-15 北京优迅医疗器械有限公司 Suffix tree algorithm-based genome sequencing sequence and reference genome comparison method
CN112966505A (en) * 2021-01-21 2021-06-15 哈尔滨工业大学 Method, device and storage medium for extracting persistent hot phrases from text corpus
CN112966505B (en) * 2021-01-21 2021-10-15 哈尔滨工业大学 Method, device and storage medium for extracting persistent hot phrases from text corpus
CN113450028A (en) * 2021-08-31 2021-09-28 深圳格隆汇信息科技有限公司 Behavior fund analysis method and system

Also Published As

Publication number Publication date
CN108846016B (en) 2021-08-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant