CN108846016A - A search algorithm oriented to Chinese word segmentation - Google Patents
A search algorithm oriented to Chinese word segmentation
- Publication number
- CN108846016A
- Authority
- CN
- China
- Prior art keywords
- suffix
- node
- string
- index
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention belongs to the technical field of text search engines and specifically concerns a search algorithm oriented to Chinese word segmentation. The algorithm has two stages: an offline index-building stage and an online search stage. In the offline index-building stage, the set of suffix strings of all original character strings is extracted first, and an improved suffix tree is then generated from that set. In the online search stage, the query results for a keyword are first obtained from the suffix-tree-based index model, the matching degree between the keyword and each query result is then quantified, and the query results are finally returned sorted by matching degree from high to low. By means of an improved suffix-tree-based index structure, the invention balances index construction time against occupied space, and searching with this index structure is far more efficient than brute-force computation of matching degrees over the result set followed by sorting.
Description
Technical field
The invention belongs to the technical field of text search engines, and in particular relates to a search algorithm oriented to Chinese word segmentation.
Background art
A search engine is an online information retrieval tool that returns to the user a series of search results matching the user's search keywords. Today is an age of information explosion, and quickly and accurately locating the information a user wants among countless sources is one of the most urgent needs, so information search technology has developed and been applied rapidly.
The most common form of search is text search. Whether the user's target resource is text, an image, audio or even video, as long as the input is text, it falls within the scope of search considered by the present invention. Besides the whole-web search offered by Google, Bing, Yahoo and others, the demand for search within specific domains is also growing. In a specific domain (for example, one limited to television programmes), the types of resources are restricted, so the search conditions can generally be made very clear, and the data set size also stays within an acceptable range. Under these premises, many targeted optimizations can be made to the search engine.
The main technologies currently used in Chinese search systems include the inverted index, the forward index, document signatures (DS) and the suffix tree. Of these, the inverted index offers good overall performance and is the most commonly used, but in practice, processing a large text collection with an inverted-index model places heavy demands on CPU resources, memory space and I/O.
Summary of the invention
The purpose of the present invention is to propose a search algorithm oriented to Chinese word segmentation, for use in an intelligent Chinese search engine system, so that search results can be returned quickly for a keyword and presented to the user sorted by matching degree from high to low.
The search algorithm proposed by the present invention is divided into two main stages: an offline index-building stage and an online search stage. In the offline index-building stage, the set of suffix strings of all original character strings is extracted first, and an improved suffix tree is then generated from that set. In the online search stage, the query results for a keyword are first obtained from the suffix-tree-based index model, the matching degree between the keyword and each query result is then quantified, and the query results are finally returned sorted by matching degree from high to low.
I. Offline index-building stage. The specific steps are:
(1) Generate the suffix string sets from the original data set
T(S) denotes the original data set composed of character strings S containing separators ($) and an end mark (#), where the index ID of the i-th character string is i (1≤i≤n). Let WBS denote a suffix string that starts from a separator position, and NWBS a suffix string that does not start from a separator position. The suffix string sets T(WBS) and T(NWBS), with index IDs attached, are generated from T(S) as follows:
Step 1: Traverse all character strings in T(S) and extract all suffix strings of each character string si, forming the sets T*(s1), T*(s2) … T*(sn) [1]. Here a suffix string is the substring of a character string S from position i to the end mark at the end of S; that is, if S is written C1C2…Cn, then CiCi+1…Cn is called a suffix string of S (1≤i≤n);
Step 2: Remove from T*(s1), T*(s2) … T*(sn) all suffix strings that begin with a separator ($) or an end mark (#);
Step 3: Traverse all suffix strings in T*(si). If the first character of a suffix string is the same as the first character of the original character string, or the same as the first character following a separator ($) in the original character string, append the index ID to the end of the suffix string and add it to T(WBS); otherwise, append the index ID to the end of the suffix string and add it to T(NWBS).
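The three steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `build_suffix_sets` is hypothetical, and each suffix is stored as a `(suffix, index_id)` pair rather than with the ID appended to the string, which is an assumption made for readability. Step 3 is applied literally, by comparing characters rather than positions.

```python
from typing import List, Tuple

Entry = Tuple[str, int]  # (suffix string, index ID); the pair form is an assumption


def build_suffix_sets(strings: List[str], sep: str = "$", end: str = "#"):
    """Split all suffixes of each string into T(WBS) and T(NWBS)."""
    wbs: List[Entry] = []
    nwbs: List[Entry] = []
    for index_id, s in enumerate(strings, start=1):
        # Characters that begin the string or directly follow a separator.
        starts = {s[0]} | {s[k + 1] for k in range(len(s) - 1) if s[k] == sep}
        for i in range(len(s)):          # Step 1: every suffix s[i:]
            suffix = s[i:]
            if suffix[0] in (sep, end):  # Step 2: drop suffixes headed by $ or #
                continue
            # Step 3: route by the first character of the suffix.
            target = wbs if suffix[0] in starts else nwbs
            target.append((suffix, index_id))
    return wbs, nwbs


wbs, nwbs = build_suffix_sets(["ab$cd#"])
print(wbs)   # suffixes that start at a word beginning
print(nwbs)  # all other suffixes
```

For the single input string `ab$cd#`, the suffixes `ab$cd#` and `cd#` go to T(WBS), while `b$cd#` and `d#` go to T(NWBS).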
(2) Build an improved suffix tree for each of the suffix string sets T(WBS) and T(NWBS)
The improved suffix tree is based on the traditional suffix tree [1], but the label carried on each edge is stored in a node instead. That is, each node serves as a storage unit, with the structure shown in Fig. 1. The information stored in a node comprises the node label, an end-mark child node pointer, a separator child node pointer, a set of general child node pointers and a matching index ID sequence, where the node label is an end mark, a separator or a general character string.
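As an illustration of the node layout just described, the following Python dataclass holds the five pieces of stored information. The class and field names are hypothetical, and keying the general children by the first character of their label is an assumption chosen for convenient lookup; Fig. 1 may arrange the fields differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    label: str = ""                      # node label: "#", "$" or a general string
    end_child: Optional["Node"] = None   # end-mark child node pointer
    sep_child: Optional["Node"] = None   # separator child node pointer
    # General child node pointers, keyed by the first character of the child label.
    children: Dict[str, "Node"] = field(default_factory=dict)
    ids: List[int] = field(default_factory=list)  # matching index ID sequence


# An empty root node: every field starts out empty.
root = Node()
print(root)
```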
The specific steps for building an improved suffix tree for any suffix string set T are as follows:
Step 1: Create an improved suffix tree containing only one node, whose node label, child node pointers and matching index ID sequence are all empty. This node is the root node (root) of the improved suffix tree.
Step 2: Insert all elements of the suffix string set T into the improved suffix tree in turn. The insertion of each suffix string starts from the root node and searches for the insertion position.
Taking the improved suffix tree in Fig. 2 as an example, the insertion of a suffix string falls into the following three cases:
Case 1: If the suffix string to be inserted already appears in the current tree, the index number is added directly to the matching index ID sequence of the node. For example, when the suffix string to be inserted is "学生#2" ("student"), the "学生" node already appears in the current tree, so the index number is added directly to the node's matching index ID sequence; the result is shown in Fig. 3(a).
Case 2: If a prefix of the suffix string to be inserted is identical to an existing node, the remaining nodes are simply added. For example, when the suffix string to be inserted is "复旦大学$学生#3" ("Fudan University$student"), the "复旦大学" node already exists, so the nodes "学生" and "#" are added directly; the result is shown in Fig. 3(b).
Case 3: If the suffix string to be inserted shares only a prefix with the current node, the current node is split first and the other nodes are then inserted. For example, when the suffix string to be inserted is "大学生#4" ("university student"), its prefix "大" is the same as a prefix of the current node "大学" ("university"), so the current node is split first and the other nodes are then inserted; the result is shown in Fig. 3(c).
Step 3: Recursively construct the matching index ID sequence of each node. From the above, the matching index ID sequences of the end-mark nodes are already complete once all suffix strings have been inserted. Therefore, only the matching index ID sequence Q(N(s)) of every non-end-mark node N(s) needs to be constructed, as given by formula (1):
Q(N(s)) = Q(N(s#)) Q(N(s$)) Q(N(s*))    (1)
where N(s#), N(s$) and N(s*) denote, respectively, the end-mark child node, the separator child node and all general child nodes of node N(s).
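The tree-building steps can be sketched as below. This is an illustrative sketch under several assumptions rather than the patented code: the names `Node`, `insert` and `build_ids` are hypothetical, the index ID is passed as a separate argument instead of being parsed off the end of the suffix string, general children are keyed by their first character, and formula (1) is read as concatenating the children's sequences in the order written.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    label: str = ""
    end_child: Optional["Node"] = None
    sep_child: Optional["Node"] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    ids: List[int] = field(default_factory=list)


def _descend(node: Node, text: str) -> Node:
    """Follow or create general-string nodes for `text`; return the last one."""
    while text:
        child = node.children.get(text[0])
        if child is None:
            # Case 2: nothing matches from here on, so add a new node.
            child = Node(label=text)
            node.children[text[0]] = child
            return child
        k = 0
        while k < min(len(child.label), len(text)) and child.label[k] == text[k]:
            k += 1
        if k < len(child.label):
            # Case 3: only a prefix of the node label matches, so split the node.
            upper = Node(label=child.label[:k])
            child.label = child.label[k:]
            upper.children[child.label[0]] = child
            node.children[text[0]] = upper
            child = upper
        node, text = child, text[k:]
    return node


def insert(root: Node, suffix: str, index_id: int) -> None:
    """Step 2: insert one suffix string, starting from the root."""
    node = root
    for token in re.findall(r"[$#]|[^$#]+", suffix):
        if token == "$":
            node.sep_child = node.sep_child or Node(label="$")
            node = node.sep_child
        elif token == "#":
            # Case 1 when the end-mark node already exists: only record the ID.
            node.end_child = node.end_child or Node(label="#")
            node = node.end_child
            node.ids.append(index_id)
        else:
            node = _descend(node, token)


def build_ids(node: Node) -> List[int]:
    """Step 3: fill Q(N(s)) for non-end-mark nodes from their children."""
    if node.label != "#":
        seq: List[int] = []
        for child in (node.end_child, node.sep_child, *node.children.values()):
            if child is not None:
                seq.extend(build_ids(child))
        node.ids = seq
    return node.ids


root = Node()
for suffix, index_id in [("abc$de#", 1), ("abc$de#", 2), ("abd#", 3)]:
    insert(root, suffix, index_id)
print(build_ids(root))
```

In this usage example the third insertion splits the node `abc` into `ab` and `c`, mirroring case 3, and the second insertion only appends an ID to an existing end-mark node, mirroring case 1.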
II. Online search stage. The specific steps are:
(1) Query the matching nodes
For any node N(s), the process of querying, starting from N(s), the nodes that match the query string c1…cn is given by formula (2):
where R(N(s)) denotes the query result, N(s) is the matching node and s is the node label.
Given a query string c1…cn, first examine all child nodes of the root node to find the child node N(s) whose node label begins with c1, then execute R(N(s), c1…cn) to find all matching nodes, and finally obtain the search result R(N(s)) = (S, Q(N(s))), where Q(N(s)) is the matching index ID sequence of N(s).
(2) Sort the result set
A negentropy is defined to measure the matching degree between the query string c1…cn and a search result character string s: the smaller the value, the lower the matching degree; conversely, the larger the value, the higher the matching degree.
The negentropy value is computed as follows (the initial value is 0):
(a) Obtain the position i of c1 in s;
(b) Traverse s forward from i until a separator $, an end mark # or the end of s is encountered; let m be the number of characters traversed in the process;
(c) If the end of s is encountered, check whether the last character is the end mark #; if so, the negentropy value increases by m² and the algorithm terminates; otherwise, the negentropy value increases by m and the algorithm terminates;
(d) If a separator $ is encountered, the negentropy value increases by m²;
(e) Update i to the position of the character following the separator encountered, and return to (b).
The word-segmentation negentropy value of every s in the result set is computed by the above steps, and the result set is sorted by this value in descending order.
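Steps (a) to (e) can be sketched in Python as follows. Several points are assumptions of this sketch rather than statements of the source: the function name `negentropy` is hypothetical, m counts the characters before the terminating symbol (the separator or end mark itself is not counted), the scan continues through step (e) after a separator is scored, and a result string that does not contain c1 scores 0.

```python
def negentropy(s: str, c1: str, sep: str = "$", end: str = "#") -> int:
    """Score how well result string s matches a query whose first character is c1."""
    score = 0
    i = s.find(c1)                                    # step (a)
    if i < 0:
        return score                                  # assumed: no occurrence scores 0
    while True:
        j = i
        while j < len(s) and s[j] not in (sep, end):  # step (b)
            j += 1
        m = j - i                                     # characters traversed
        if j < len(s) and s[j] == sep:                # step (d): separator reached
            score += m * m
            i = j + 1                                 # step (e): continue after it
            continue
        # Step (c): the scan stopped at the end mark or ran off the end of s.
        score += m * m if j < len(s) else m
        return score


results = ["cd#", "ab$cd#", "ab$cd"]
ranked = sorted(results, key=lambda s: negentropy(s, s[0]), reverse=True)
print(ranked)
```

Under these assumptions a result that ends with complete words scores higher than one that is cut off mid-word, because complete words contribute m² while a trailing fragment contributes only m.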
(3) Eliminate duplicates in the result set and generate the search result sequence
The Q(N(s)) of the sorted result set are taken out in turn and placed into the search result sequence after the corresponding operation has been applied; the search result sequence is initially empty. Formula (3) gives the operation applied to Q(N(si)):
SR(i) = (D(Q(N(si))) - SR(i-1)) ∩ SR(i-1), 1≤i≤n    (3)
where SR(i) denotes the search result sequence after the matching index ID sequence Q(N(si)) of the i-th node has been merged, and SR(1) and SR(n) are respectively the initial state and the final state of the search result sequence; D(Q(N(si))) denotes deduplicating Q(N(si)); (D-SR) denotes removing from the deduplicated Q(N(si)) the index numbers that already appear in the search result sequence; and (D-SR) ∩ SR denotes appending (D-SR) to the end of the current search result sequence SR.
After the above steps, the search result sequence finally obtained is SR(n).
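The merge described by formula (3) amounts to an order-preserving union of the ID sequences. A minimal Python sketch, in which the function name `merge_results` and the use of a helper set for membership tests are assumptions of this example:

```python
from typing import Iterable, List


def merge_results(id_sequences: Iterable[List[int]]) -> List[int]:
    """Append each sequence's unseen IDs to the result, keeping rank order."""
    result: List[int] = []   # the search result sequence SR, initially empty
    seen = set()             # IDs already present in SR
    for q in id_sequences:   # Q(N(s_i)), taken in sorted order
        for index_id in q:
            # Skipping seen IDs covers both D(...) and the "- SR(i-1)" part.
            if index_id not in seen:
                seen.add(index_id)
                result.append(index_id)
    return result


print(merge_results([[3, 1, 3], [1, 2], [2, 4]]))  # [3, 1, 2, 4]
```

IDs contributed by higher-ranked nodes keep their earlier positions, so the final sequence stays ordered by matching degree.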
By means of an improved suffix-tree-based index structure, the present invention strikes a good balance between index construction time and occupied space. Searching with this index structure is far more efficient than brute-force computation of matching degrees over the result set followed by sorting, and, compared with fuzzy search implemented on other full-text index structures, the index structure of the present invention achieves very high search efficiency while requiring less construction time and memory.
Description of the drawings
Fig. 1: Structure of an improved suffix tree node.
Fig. 2: Example of an improved suffix tree.
Fig. 3: Comparison of the different cases when inserting a suffix string.
Detailed description of the embodiments
To study the search performance of the present invention on data sets of different sizes, five data sets with data volumes of 10,000, 20,000, 50,000, 100,000 and 200,000 were constructed, and several groups of comparative experiments against the inverted-list-based Lucene engine were run on each data set.
Search strings of lengths 2 to 4 were generated at random, 25 for each length, giving 75 distinct search strings in total. Each search string was searched 100,000 times and, provided that the search results were correct, the time consumed by each search was recorded.
So that Lucene could perform the same indexing task as the present invention, a space was inserted between every pair of adjacent characters of the original sequences when building the initial index, so that each character was treated as a word; spaces were likewise inserted between the characters of each search string, giving the same search functionality as the present invention.
The experimental results are shown in Table 1:
Table 1: Comparison of search time between the index of the present invention and the Lucene index
As the table shows, the algorithm of the present invention achieves better search efficiency than Lucene on every data set, and the advantage is more pronounced on small data sets: when the data set is smaller than 50,000, the search efficiency of the algorithm can reach 7 to 10 times that of Lucene.
References:
[1] E. Ukkonen, On-Line Construction of Suffix Trees, Algorithmica, 14 (1995), 249-260.
Claims (3)
1. A search algorithm oriented to Chinese word segmentation, characterized in that it is divided into two stages: an offline index-building stage and an online search stage;
(I) Offline index-building stage, the specific steps of which are:
(1) Generate the suffix string sets from the original data set
T(S) denotes the original data set composed of character strings S containing separators ($) and an end mark (#), where the index ID of the i-th character string is i, 1≤i≤n; let WBS denote a suffix string that starts from a separator position, and NWBS a suffix string that does not start from a separator position; the suffix string sets T(WBS) and T(NWBS), with index IDs attached, are generated from T(S) as follows:
Step 1: Traverse all character strings in T(S) and extract all suffix strings of each character string si, forming the sets T*(s1), T*(s2) … T*(sn), where a suffix string is the substring of a character string S from position i to the end mark at the end of S, that is, if S is written C1C2…Cn, then CiCi+1…Cn is called a suffix string of S, 1≤i≤n;
Step 2: Remove from the sets T*(s1), T*(s2) … T*(sn) all suffix strings that begin with a separator ($) or an end mark (#);
Step 3: Traverse all suffix strings in T*(si); if the first character of a suffix string is the same as the first character of the original character string, or the same as the first character following a separator ($) in the original character string, append the index ID to the end of the suffix string and add it to T(WBS); otherwise, append the index ID to the end of the suffix string and add it to T(NWBS);
(2) Build an improved suffix tree for each of the suffix string sets T(WBS) and T(NWBS)
The improved suffix tree is based on the traditional suffix tree, but the label carried on each edge is stored in a node instead, that is, each node serves as a storage unit; the information stored in a node comprises the node label, an end-mark child node pointer, a separator child node pointer, a set of general child node pointers and a matching index ID sequence, where the node label is an end mark, a separator or a general character string;
The specific steps for building an improved suffix tree for any suffix string set T are as follows:
Step 1: Create an improved suffix tree containing only one node, whose node label, child node pointers and matching index ID sequence are all empty; this node is the root node (root) of the improved suffix tree;
Step 2: Insert all elements of the suffix string set T into the improved suffix tree in turn; the insertion of each suffix string starts from the root node and searches for the insertion position;
Step 3: Recursively construct the matching index ID sequence of each node; from the above, the matching index ID sequences of the end-mark nodes are already complete once all suffix strings have been inserted; only the matching index ID sequence Q(N(s)) of every non-end-mark node N(s) needs to be constructed, by formula (1):
Q(N(s)) = Q(N(s#)) Q(N(s$)) Q(N(s*))    (1)
where N(s#), N(s$) and N(s*) denote, respectively, the end-mark child node, the separator child node and all general child nodes of node N(s);
(II) Online search stage, the specific steps of which are:
(1) Query the matching nodes
For any node N(s), starting from N(s), the nodes that match the query string c1…cn are queried by formula (2):
where R(N(s)) denotes the query result, N(s) is the matching node and s is the node label;
Given a query string c1…cn, first examine all child nodes of the root node to find the child node N(s) whose node label begins with c1, then execute R(N(s), c1…cn) to find all matching nodes, and finally obtain the search result R(N(s)) = (S, Q(N(s))), where Q(N(s)) is the matching index ID sequence of N(s);
(2) Sort the result set
A negentropy is defined to measure the matching degree between the query string c1…cn and a search result character string s: the smaller the value, the lower the matching degree; conversely, the larger the value, the higher the matching degree;
The word-segmentation negentropy value of every s is computed, and the result set is sorted by this value in descending order;
(3) Eliminate duplicates in the result set and generate the search result sequence
The Q(N(s)) of the sorted result set are taken out in turn and placed into the search result sequence after the corresponding operation has been applied, the search result sequence being initially empty; the corresponding operation is the following operation applied to Q(N(si)) by formula (3):
SR(i) = (D(Q(N(si))) - SR(i-1)) ∩ SR(i-1), 1≤i≤n    (3)
where SR(i) denotes the search result sequence after the matching index ID sequence Q(N(si)) of the i-th node has been merged, and SR(1) and SR(n) are respectively the initial state and the final state of the search result sequence; D(Q(N(si))) denotes deduplicating Q(N(si)); (D-SR) denotes removing from the deduplicated Q(N(si)) the index numbers that already appear in the search result sequence; (D-SR) ∩ SR denotes appending (D-SR) to the end of the current search result sequence SR;
The search result sequence finally obtained is SR(n).
2. The search algorithm oriented to Chinese word segmentation according to claim 1, characterized in that the insertion of each suffix string starts from the root node and searches for the insertion position, and falls into the following three cases:
Case 1: If the suffix string to be inserted already appears in the current tree, the index number is added directly to the matching index ID sequence of the node;
Case 2: If a prefix of the suffix string to be inserted is identical to an existing node, the remaining nodes are added directly;
Case 3: If the suffix string to be inserted shares only a prefix with the current node, the current node is split first and the other nodes are then inserted.
3. The search algorithm oriented to Chinese word segmentation according to claim 1, characterized in that the word-segmentation negentropy value of s is computed by the following steps, with an initial value of 0:
(a) Obtain the position i of c1 in s;
(b) Traverse s forward from i until a separator $, an end mark # or the end of s is encountered; let m be the number of characters traversed in the process;
(c) If the end of s is encountered, check whether the last character is the end mark #; if so, the negentropy value increases by m² and the algorithm terminates; otherwise, the negentropy value increases by m and the algorithm terminates;
(d) If a separator $ is encountered, the negentropy value increases by m²;
(e) Update i to the position of the character following the separator encountered, and return to step (b).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810422499.3A CN108846016B (en) | 2018-05-05 | 2018-05-05 | Chinese word segmentation oriented search algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810422499.3A CN108846016B (en) | 2018-05-05 | 2018-05-05 | Chinese word segmentation oriented search algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108846016A true CN108846016A (en) | 2018-11-20 |
CN108846016B CN108846016B (en) | 2021-08-20 |
Family
ID=64212741
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810422499.3A Active CN108846016B (en) | 2018-05-05 | 2018-05-05 | Chinese word segmentation oriented search algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108846016B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109686413A (en) * | 2018-12-24 | 2019-04-26 | 杭州费尔斯通科技有限公司 | A kind of chemical molecular formula search method based on es inverted index |
CN110597855A (en) * | 2019-08-14 | 2019-12-20 | 中山大学 | Data storage method, terminal equipment and computer readable storage medium |
CN112232903A (en) * | 2020-09-27 | 2021-01-15 | 北京五八信息技术有限公司 | Business object display method and device |
CN112802553A (en) * | 2020-12-29 | 2021-05-14 | 北京优迅医疗器械有限公司 | Method for comparing genome sequencing sequence and reference genome based on suffix tree algorithm |
CN112966505A (en) * | 2021-01-21 | 2021-06-15 | 哈尔滨工业大学 | Method, device and storage medium for extracting persistent hot phrases from text corpus |
WO2021139154A1 (en) * | 2020-01-10 | 2021-07-15 | 百度在线网络技术(北京)有限公司 | Data prefetching method and apparatus, electronic device, and computer-readable storage medium |
CN113450028A (en) * | 2021-08-31 | 2021-09-28 | 深圳格隆汇信息科技有限公司 | Behavior fund analysis method and system |
CN115244539A (en) * | 2020-05-18 | 2022-10-25 | 谷歌有限责任公司 | Word or word segment lemmatization inference method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103631929A (en) * | 2013-12-09 | 2014-03-12 | 江苏金智教育信息技术有限公司 | Intelligent prompt method, module and system for search |
CN103838783A (en) * | 2012-11-27 | 2014-06-04 | 大连灵动科技发展有限公司 | Suffix tree clustering method suitable for Chinese web page documents |
CN103838785A (en) * | 2012-11-27 | 2014-06-04 | 大连灵动科技发展有限公司 | Vertical search engine in patent field |
CN107844731A (en) * | 2016-09-17 | 2018-03-27 | 复旦大学 | Long-term sequence δ abnormal point detecting methods based on probabilistic suffix tree |
-
2018
- 2018-05-05 CN CN201810422499.3A patent/CN108846016B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103838783A (en) * | 2012-11-27 | 2014-06-04 | 大连灵动科技发展有限公司 | Suffix tree clustering method suitable for Chinese web page documents |
CN103838785A (en) * | 2012-11-27 | 2014-06-04 | 大连灵动科技发展有限公司 | Vertical search engine in patent field |
CN103631929A (en) * | 2013-12-09 | 2014-03-12 | 江苏金智教育信息技术有限公司 | Intelligent prompt method, module and system for search |
CN107844731A (en) * | 2016-09-17 | 2018-03-27 | 复旦大学 | Long-term sequence δ abnormal point detecting methods based on probabilistic suffix tree |
Non-Patent Citations (3)
Title |
---|
ZENG D等: "Domain-specific Chinese word segmentation using suffix tree and mutual information", 《INFORMATION SYSTEMS FRONTIERS》 * |
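YUAN Jinsheng et al.: "Research on clustering of Chinese retrieval results based on an improved suffix tree", 《COMPUTER ENGINEERING AND APPLICATIONS》 * |
WEI Meifeng et al.: "Research on a topic search engine based on suffix tree clustering", 《INFORMATION STUDIES: THEORY & APPLICATION》 * |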
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109686413A (en) * | 2018-12-24 | 2019-04-26 | 杭州费尔斯通科技有限公司 | A kind of chemical molecular formula search method based on es inverted index |
CN110597855A (en) * | 2019-08-14 | 2019-12-20 | 中山大学 | Data storage method, terminal equipment and computer readable storage medium |
CN110597855B (en) * | 2019-08-14 | 2022-03-29 | 中山大学 | Data query method, terminal device and computer readable storage medium |
WO2021139154A1 (en) * | 2020-01-10 | 2021-07-15 | 百度在线网络技术(北京)有限公司 | Data prefetching method and apparatus, electronic device, and computer-readable storage medium |
CN115244539A (en) * | 2020-05-18 | 2022-10-25 | 谷歌有限责任公司 | Word or word segment lemmatization inference method |
US11763083B2 (en) | 2020-05-18 | 2023-09-19 | Google Llc | Inference methods for word or wordpiece tokenization |
CN112232903A (en) * | 2020-09-27 | 2021-01-15 | 北京五八信息技术有限公司 | Business object display method and device |
CN112802553A (en) * | 2020-12-29 | 2021-05-14 | 北京优迅医疗器械有限公司 | Method for comparing genome sequencing sequence and reference genome based on suffix tree algorithm |
CN112802553B (en) * | 2020-12-29 | 2024-03-15 | 北京优迅医疗器械有限公司 | Suffix tree algorithm-based genome sequencing sequence and reference genome comparison method |
CN112966505A (en) * | 2021-01-21 | 2021-06-15 | 哈尔滨工业大学 | Method, device and storage medium for extracting persistent hot phrases from text corpus |
CN112966505B (en) * | 2021-01-21 | 2021-10-15 | 哈尔滨工业大学 | Method, device and storage medium for extracting persistent hot phrases from text corpus |
CN113450028A (en) * | 2021-08-31 | 2021-09-28 | 深圳格隆汇信息科技有限公司 | Behavior fund analysis method and system |
Also Published As
Publication number | Publication date |
---|---|
CN108846016B (en) | 2021-08-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108846016A (en) | A kind of searching algorithm towards Chinese word segmentation | |
US11048966B2 (en) | Method and device for comparing similarities of high dimensional features of images | |
US9020951B2 (en) | Methods for indexing and searching based on language locale | |
CN106503223B (en) | online house source searching method and device combining position and keyword information | |
CN111868710A (en) | Random extraction forest index structure for searching large-scale unstructured data | |
KR20090065130A (en) | Indexing and searching method for high-demensional data using signature file and the system thereof | |
JP2012533819A (en) | Method and system for document indexing and data querying | |
CN105404677A (en) | Tree structure based retrieval method | |
CN107229714B (en) | Full-text search engine based on distributed database | |
CN102915381B (en) | Visual network retrieval based on multi-dimensional semantic presents system and presents control method | |
Grossi et al. | Encodings for range selection and top-k queries | |
CN116431837B (en) | Document retrieval method and device based on large language model and graph network model | |
CN108776705B (en) | Text full-text accurate query method, device, equipment and readable medium | |
Zhao et al. | Exploiting structured reference data for unsupervised text segmentation with conditional random fields | |
CN116860991A (en) | API recommendation-oriented intent clarification method based on knowledge graph driving path optimization | |
CN105426490A (en) | Tree structure based indexing method | |
Yadav et al. | Wavelet tree based hybrid geo-textual indexing technique for geographical search | |
JP6495206B2 (en) | Document concept base generation device, document concept search device, method, and program | |
CN110909128B (en) | Method, equipment and storage medium for carrying out data query by using root list | |
CN112199461A (en) | Document retrieval method, device, medium and equipment based on block index structure | |
CN116089599B (en) | Information query method, system and storage medium | |
Li et al. | Grouping www image search results by novel inhomogeneous clustering method | |
Belkasmi et al. | 2: A Human-Centric Skyline Relaxation Approach | |
Huang et al. | A Partition-Based Bi-directional Filtering Method for String Similarity JOINs | |
Kocoń et al. | Heterogeneous named entity similarity function |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||