WO2012151781A1 - Inverted index intersection method - Google Patents

Publication number
WO2012151781A1
Authority
WO
WIPO (PCT)
Prior art keywords
search
result
inverted
docid
inverted index
Prior art date
Application number
PCT/CN2011/076841
Other languages
French (fr)
Chinese (zh)
Inventor
刘晓光
王刚
敖耐勇
吴迪
张帆
Original Assignee
南开大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南开大学
Publication of WO2012151781A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/328 Management therefor
    • G06F16/319 Inverted lists

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An inverted index intersection method. The method includes offline pre-processing: for each inverted list, a two-dimensional scatter plot is constructed with the index of each docID as the abscissa and its value as the ordinate; a linear regression line is fitted by the least squares method, minimizing the sum of squared vertical deviations from all points in the plot to the line; a left safe search distance and a right safe search distance are derived; and the resulting linear regression information is saved. During intersection, the safe search range of the docID to be found in an inverted list is determined from the saved linear regression information for that list, and an existing search method is then applied within that range. The method narrows the search range, reduces search time, shortens the response time of the search engine, and improves the user experience.

Description

Inverted index intersection method
[Technical Field]
The invention belongs to the technical field of inverted indexes, and particularly relates to a method for intersecting inverted index lists.
[Background Art]
The most widely used data structure in search engines is the inverted index, which consists of two parts: a dictionary and inverted lists. The dictionary establishes a one-to-one correspondence between keywords and inverted lists, and each inverted list consists of a series of basic units called postings. Each posting holds the document identifier (called the docID) of a web page containing the corresponding keyword, together with information such as frequency and position. In the present invention, we assume that each inverted list consists only of a series of docIDs.
Referring to Figure 1, the processing flow of an existing search engine is shown. The specific steps are as follows:
Step S101: Acquire the user query request. The search engine continuously receives user query requests and segments each query into words to obtain the corresponding keywords.
Step S102: Intersect the inverted lists corresponding to the query request. The inverted lists corresponding to the query keywords are found through the dictionary of the inverted index and are then intersected.
Step S103: Return the intersection result to the user in some form.
Binary search, interpolation search, and skip-list-based search are the most commonly used search methods in step S102. Step S102 takes up a relatively large share of the time in the whole processing flow and is therefore the main target of our optimization.
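For orientation, the conventional form of step S102 can be sketched as follows. This is a minimal illustration, not code from the patent: it assumes each inverted list is a sorted Python list of distinct integer docIDs, and the names `contains` and `intersect_baseline` are hypothetical. Every lookup runs a binary search over the whole list, which is the cost the invention aims to reduce.

```python
from bisect import bisect_left


def contains(sorted_ids, target):
    """Binary search over the full list: the search range is the whole list."""
    pos = bisect_left(sorted_ids, target)
    return pos < len(sorted_ids) and sorted_ids[pos] == target


def intersect_baseline(lists):
    """Intersect sorted docID lists by probing every longer list
    with each docID of the shortest list."""
    ordered = sorted(lists, key=len)  # shortest list first
    shortest, rest = ordered[0], ordered[1:]
    return [y for y in shortest if all(contains(other, y) for other in rest)]
```

Probing from the shortest list keeps the number of lookups as small as possible; the size of each lookup's search range is what the linear regression method narrows.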
[Summary of the Invention]
The object of the present invention is to address the shortcoming that existing inverted index intersection methods are time-consuming by providing a novel inverted index intersection method based on linear regression.
The inverted index intersection method provided by the present invention includes:
1. Offline preprocessing:
For each inverted list $l(t)$, draw a two-dimensional scatter plot with the docID index $i$ as the abscissa and the docID value $y_i$ as the ordinate, where $i = 1, 2, \ldots, |l(t)|$, $|l(t)|$ denotes the number of docIDs contained in $l(t)$ with $|l(t)| \ge 2$, and each $y_i$ is a non-negative integer. Based on the least squares method, generate the linear regression line $f_t(i) = \alpha_t + \beta_t i$ such that the sum of squares $\sum_i (y_i - f_t(i))^2$ of the vertical deviations $y_i - f_t(i)$ from all points $(i, y_i)$ in the plot to the line is minimized. Compute the left safe search distance $L_t = \max_i \{ f_t^{-1}(y_i) - i \}$ and the right safe search distance $R_t = \max_i \{ i - f_t^{-1}(y_i) \}$, and save the resulting linear regression information $\alpha_t$, $\beta_t$, $L_t$ and $R_t$ (step S201).
2. Inverted index intersection, with the following specific steps:
(1) For a query containing $k$ keywords $t_1, t_2, \ldots, t_k$, where $k$ is a positive integer with $k \ge 2$ and the corresponding inverted lists $l(t_1), l(t_2), \ldots, l(t_k)$ are ordered so that the number of docIDs they contain is non-descending, initialize the docID index $i = 1$, the keyword index $j = 2$, and the result set $A = \emptyset$, where $i = 1, 2, \ldots, |l(t_1)|$ and $2 \le j \le k$ (step S401);
(2) Using the linear regression information of $l(t_j)$ saved by the offline preprocessing of step 1, determine the safe search range of the $i$-th element $y_i$ of $l(t_1)$ in $l(t_j)$: $[\max(0, f_{t_j}^{-1}(y_i) - L_{t_j}), \min(|l(t_j)|, f_{t_j}^{-1}(y_i) + R_{t_j})]$ (step S402);
(3) Using some existing search method, determine whether $y_i$ is in the safe search range determined in step S402 (step S403);
(4) If the result of step S403 is yes, check whether $j < k$ holds (step S404);
(5) If the result of step S404 is yes, increment $j$ and return to step S402 (step S405);
(6) If the result of step S404 is no, save $y_i$ into the set $A$ and perform step S407 (step S406);
(7) If the result of step S403 is no, perform step S407;
(8) Check whether $i < |l(t_1)|$ holds (step S407);
(9) If the result of step S407 is yes, increment $i$, set $j = 2$, and return to step S402 (step S408);
(10) If the result of step S407 is no, end the search and take $A$ as the final result set (step S409).
Advantages and beneficial effects of the present invention:
The inverted index intersection method based on linear regression narrows the search range, reduces search time, and improves the user experience.
[Description of the Drawings]
Figure 1 is a flow chart of search engine processing.
Figure 2 is a diagram of an embodiment of the preprocessing method of the inverted index intersection method of the present invention.
Figure 3 is a schematic diagram of the principle of the inverted index intersection method of the present invention.
Figure 4 is a flow chart of an embodiment of the inverted index intersection method of the present invention.
Figure 5 shows the average goodness of fit and average shrinkage rate on different inverted index data sets.
Figure 6 is a response time chart of binary search and the inverted index intersection method of the present invention on the GOV data set.
[Detailed Description]
To make the above objects, features and advantages of the present invention easier to understand, the invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment 1
Referring to Figure 2, a diagram of an embodiment of the preprocessing method of the inverted index intersection method of the present invention is shown. The specific step is as follows:
Step S201: For each inverted list $l(t)$, draw a two-dimensional scatter plot with the docID index $i$ as the abscissa and the docID value $y_i$ as the ordinate, where $i = 1, 2, \ldots, |l(t)|$, $|l(t)|$ denotes the number of docIDs contained in $l(t)$ with $|l(t)| \ge 2$, and each $y_i$ is a non-negative integer. Based on the least squares method, generate the linear regression line $f_t(i) = \alpha_t + \beta_t i$ such that the sum of squares $\sum_i (y_i - f_t(i))^2$ of the vertical deviations $y_i - f_t(i)$ from all points $(i, y_i)$ in the plot to the line is minimized. Compute the left safe search distance $L_t = \max_i \{ f_t^{-1}(y_i) - i \}$ and the right safe search distance $R_t = \max_i \{ i - f_t^{-1}(y_i) \}$, and save the resulting linear regression information $\alpha_t$, $\beta_t$, $L_t$ and $R_t$.
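The preprocessing of step S201 can be sketched in a few lines. The sketch below is an illustration under stated assumptions rather than the patent's implementation: it uses 0-based indices instead of the 1-based indices of the text, assumes the docIDs are strictly increasing (so the slope is positive and the line can be inverted), and the name `fit_list` is hypothetical.

```python
from math import fsum


def fit_list(doc_ids):
    """Fit y = alpha + beta * i to the points (i, doc_ids[i]) by least squares
    and derive the left/right safe search distances.

    Assumes 0-based indices and strictly increasing docIDs.
    Returns (alpha, beta, left_dist, right_dist).
    """
    n = len(doc_ids)
    if n < 2:
        raise ValueError("an inverted list needs at least two docIDs")
    mean_i = (n - 1) / 2
    mean_y = fsum(doc_ids) / n
    # Closed-form simple linear regression.
    sxy = fsum((i - mean_i) * (y - mean_y) for i, y in enumerate(doc_ids))
    sxx = fsum((i - mean_i) ** 2 for i in range(n))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_i
    # Horizontal offset of every point from the line: f^{-1}(y_i) - i.
    offsets = [(y - alpha) / beta - i for i, y in enumerate(doc_ids)]
    left_dist = max(offsets)                # L_t = max_i (f^{-1}(y_i) - i)
    right_dist = max(-d for d in offsets)   # R_t = max_i (i - f^{-1}(y_i))
    return alpha, beta, left_dist, right_dist
```

Only four numbers are stored per list, so the extra index space is constant per inverted list regardless of its length.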
Define $R^2 = \dfrac{\sum_i (f_t(i) - \bar{Y})^2}{\sum_i (y_i - \bar{Y})^2}$, where $\bar{Y}$ is the mean of the $y_i$. $R^2$ is called the goodness of fit, and clearly $0 \le R^2 \le 1$. It is the square of the sample correlation coefficient between the regression dependent variable $Y$ and the independent variable $I$. Because the correlation coefficient measures the degree of linear correlation between two quantities, the closer $R^2$ is to 1, the better the regression equation fits the data and the more linear the test data are.
Referring to Figure 3, the basic principle of the inverted index intersection method based on the above preprocessing method is shown. Given an inverted list $l(t)$ and its linear regression line $y = f_t(i) = \alpha_t + \beta_t i$, draw two lines parallel to the regression line, one at horizontal distance $L_t$ to its left and one at horizontal distance $R_t$ to its right. Clearly, all points $(i, y_i)$ of $l(t)$ lie between the two parallel lines. In other words, to search for a docID $y$ in $l(t)$, it suffices to examine the range extending no more than $L_t$ to the left and no more than $R_t$ to the right of $f_t^{-1}(y)$ in order to determine whether $y$ is in $l(t)$. Taking into account the left and right boundaries $0$ and $|l(t)|$ of $l(t)$ itself, the final search range is $[\max(0, f_t^{-1}(y) - L_t), \min(|l(t)|, f_t^{-1}(y) + R_t)]$.
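The two quantities just described, the goodness of fit and the safe search range, can be computed as sketched below. This is a hedged illustration: the function names are hypothetical, positions are 0-based (so the right boundary is `n - 1` rather than $|l(t)|$), and the range is widened to whole positions with `floor` and `ceil` so that rounding can only enlarge it.

```python
from math import ceil, floor


def safe_range(alpha, beta, left_dist, right_dist, n, y):
    """Inclusive 0-based positions (lo, hi) that may hold docID y in a list of
    length n, or None when the range is empty (y cannot be in the list)."""
    p = (y - alpha) / beta                    # predicted position f^{-1}(y)
    lo = max(0, floor(p - left_dist))
    hi = min(n - 1, ceil(p + right_dist))
    return (lo, hi) if lo <= hi else None


def r_squared(doc_ids, alpha, beta):
    """Goodness of fit: explained sum of squares over total sum of squares.
    Equals the squared correlation when alpha, beta are the least-squares fit."""
    n = len(doc_ids)
    mean_y = sum(doc_ids) / n
    explained = sum((alpha + beta * i - mean_y) ** 2 for i in range(n))
    total = sum((y - mean_y) ** 2 for y in doc_ids)
    return explained / total
```

For a list of length 4 with the illustrative parameters `alpha = 10.0`, `beta = 10.0` and both distances `0.5`, `safe_range(10.0, 10.0, 0.5, 0.5, 4, 30)` yields `(1, 3)`: the search is confined to three positions around the predicted position 2.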
Embodiment 2
Referring to Figure 4, a flow chart of an embodiment of the inverted index intersection method of the present invention is shown. The specific steps are as follows:
(1) For a query containing $k$ keywords $t_1, t_2, \ldots, t_k$, where $k$ is a positive integer with $k \ge 2$ and the corresponding inverted lists $l(t_1), l(t_2), \ldots, l(t_k)$ are ordered so that the number of docIDs they contain is non-descending, initialize the docID index $i = 1$, the keyword index $j = 2$, and the result set $A = \emptyset$, where $i = 1, 2, \ldots, |l(t_1)|$ and $2 \le j \le k$ (step S401);
(2) Using the linear regression information of $l(t_j)$ saved by the offline preprocessing of step S201, determine the safe search range of the $i$-th element $y_i$ of $l(t_1)$ in $l(t_j)$: $[\max(0, f_{t_j}^{-1}(y_i) - L_{t_j}), \min(|l(t_j)|, f_{t_j}^{-1}(y_i) + R_{t_j})]$ (step S402);
(3) Using some existing search method, determine whether $y_i$ is in the safe search range determined in step S402 (step S403);
(4) If the result of step S403 is yes, check whether $j < k$ holds (step S404);
(5) If the result of step S404 is yes, increment $j$ and return to step S402 (step S405);
(6) If the result of step S404 is no, save $y_i$ into the set $A$ and perform step S407 (step S406);
(7) If the result of step S403 is no, perform step S407;
(8) Check whether $i < |l(t_1)|$ holds (step S407);
(9) If the result of step S407 is yes, increment $i$, set $j = 2$, and return to step S402 (step S408);
(10) If the result of step S407 is no, end the search and take $A$ as the final result set (step S409).
For existing search methods, the size of the search range in an inverted list $l(t)$ is $|l(t)|$; for the linear-regression-based inverted index intersection method of the present invention, the size of the search range in $l(t)$ is $(L_t + R_t)$.
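Steps S401 to S409 can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the patented implementation: indices are 0-based, docIDs are strictly increasing, every list has at least two entries, binary search stands in for the "existing search method" of step S403, and the regression models are fitted inline although in practice they would be loaded from the offline step. All function names are hypothetical.

```python
from bisect import bisect_left
from math import ceil, floor


def fit(ids):
    """Offline step: least-squares line plus left/right safe search distances."""
    n = len(ids)
    mean_i, mean_y = (n - 1) / 2, sum(ids) / n
    sxy = sum((i - mean_i) * (y - mean_y) for i, y in enumerate(ids))
    sxx = sum((i - mean_i) ** 2 for i in range(n))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_i
    offsets = [(y - alpha) / beta - i for i, y in enumerate(ids)]
    return alpha, beta, max(offsets), max(-d for d in offsets)


def found_in_safe_range(ids, model, y):
    """S402 + S403: binary search restricted to the safe search range."""
    alpha, beta, left_dist, right_dist = model
    p = (y - alpha) / beta
    lo = max(0, floor(p - left_dist))
    hi = min(len(ids) - 1, ceil(p + right_dist))
    if lo > hi:
        return False
    pos = bisect_left(ids, y, lo, hi + 1)
    return pos <= hi and ids[pos] == y


def intersect(lists):
    """Online step: probe each docID of the shortest list in the longer lists."""
    ordered = sorted(lists, key=len)        # S401: non-descending lengths
    models = [fit(ids) for ids in ordered]  # assumed precomputed offline
    first, k = ordered[0], len(ordered)
    result = []
    for y in first:                         # S407/S408: advance i
        j = 1                               # keyword index j = 2 in the text
        while j < k and found_in_safe_range(ordered[j], models[j], y):
            j += 1                          # S404/S405: try the next list
        if j == k:                          # found in every list
            result.append(y)                # S406
    return result                           # S409
```

The only change relative to a conventional intersection is the `lo` and `hi` arguments passed to the binary search, so any other range-restricted search could be substituted without touching the outer loops.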
The shrinkage rate of the inverted list $l(t)$ is defined as $cr = (L_t + R_t) / |l(t)|$. The smaller $cr$ is, the smaller the safe search range and the faster the linear-regression-based inverted index intersection method of the present invention.
Referring to Figure 5, the average goodness of fit $R^2$ and the average shrinkage rate $cr$ of inverted lists in different length ranges are shown for various inverted index data sets. The inverted index data sets used are described as follows:
(1) GOV and GOV2 denote data sets crawled from the .gov domain in 2002 and 2004, respectively, and BD denotes a data set obtained from Baidu;
(2) GOVPR denotes the data set obtained by reordering the GOV data set according to PageRank;
(3) GOVR, GOV2R and BDR denote the data sets obtained by randomly reordering GOV, GOV2 and BD, respectively, using the Fisher-Yates method.
It can be seen that the goodness of fit and the shrinkage rate are very good on all of these data sets. That is, the linear-regression-based inverted index intersection method of the present invention works well on a variety of data sets.
Referring to Figure 6, a response time chart is shown for inverted index intersection on the GOV data set on the NVIDIA GTX480 platform, using the traditional binary search method and the linear-regression-based inverted index intersection method of the present invention. It can be seen that, when the computation threshold is large, the inverted index intersection method of the present invention has a smaller response time.
The inverted index intersection method of the present invention has been described in detail above. Specific examples have been used herein to explain the principle and embodiments of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. A person of ordinary skill in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims

1. An inverted index intersection method, characterized in that it comprises:
Step 1, offline preprocessing:
For each inverted list $l(t)$, draw a two-dimensional scatter plot with the docID index $i$ as the abscissa and the docID value $y_i$ as the ordinate, where $i = 1, 2, \ldots, |l(t)|$, $|l(t)|$ denotes the number of docIDs contained in $l(t)$ with $|l(t)| \ge 2$, and each $y_i$ is a non-negative integer; based on the least squares method, generate the linear regression line $f_t(i) = \alpha_t + \beta_t i$ such that the sum of squares $\sum_i (y_i - f_t(i))^2$ of the vertical deviations $y_i - f_t(i)$ from all points $(i, y_i)$ in the plot to the line is minimized; compute the left safe search distance $L_t = \max_i \{ f_t^{-1}(y_i) - i \}$ and the right safe search distance $R_t = \max_i \{ i - f_t^{-1}(y_i) \}$; and save the resulting linear regression information $\alpha_t$, $\beta_t$, $L_t$ and $R_t$;
Step 2, inverted index intersection, with the following specific steps:
2.1. For a query containing $k$ keywords $t_1, t_2, \ldots, t_k$, where $k$ is a positive integer with $k \ge 2$ and the corresponding inverted lists $l(t_1), l(t_2), \ldots, l(t_k)$ are ordered so that the number of docIDs they contain is non-descending, initialize the docID index $i = 1$, the keyword index $j = 2$, and the result set $A = \emptyset$, where $i = 1, 2, \ldots, |l(t_1)|$ and $2 \le j \le k$;
2.2. Using the linear regression information of $l(t_j)$ saved by the offline preprocessing of step 1, determine the safe search range of the $i$-th element $y_i$ of $l(t_1)$ in $l(t_j)$: $[\max(0, f_{t_j}^{-1}(y_i) - L_{t_j}), \min(|l(t_j)|, f_{t_j}^{-1}(y_i) + R_{t_j})]$;
2.3. Using some existing search method, determine whether $y_i$ is in the safe search range determined in step 2.2;
2.4. If the result of step 2.3 is yes, check whether $j < k$ holds;
2.5. If the result of step 2.4 is yes, increment $j$ and return to step 2.2;
2.6. If the result of step 2.4 is no, save $y_i$ into the set $A$ and perform step 2.8;
2.7. If the result of step 2.3 is no, perform step 2.8;
2.8. Check whether $i < |l(t_1)|$ holds;
2.9. If the result of step 2.8 is yes, increment $i$, set $j = 2$, and return to step 2.2;
2.10. If the result of step 2.8 is no, end the search and take $A$ as the final result set.
PCT/CN2011/076841 2011-05-09 2011-08-01 Inverted index intersection method WO2012151781A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2011101181617A CN102136011A (en) 2011-05-09 2011-05-09 Reverse index intersection method
CN201110118161.7 2011-05-09

Publications (1)

Publication Number Publication Date
WO2012151781A1 true WO2012151781A1 (en) 2012-11-15

Family

ID=44295797

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/076841 WO2012151781A1 (en) 2011-05-09 2011-08-01 Inverted index intersection method

Country Status (2)

Country Link
CN (1) CN102136011A (en)
WO (1) WO2012151781A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102136011A (en) * 2011-05-09 2011-07-27 南开大学 Reverse index intersection method
CN106156000B (en) * 2015-04-28 2020-03-17 腾讯科技(深圳)有限公司 Search method and search system based on intersection algorithm
CN110083679B (en) * 2019-03-18 2020-08-18 北京三快在线科技有限公司 Search request processing method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1503163A (en) * 2002-11-22 2004-06-09 �Ҵ���˾ International information search and deivery system providing search results personalized to a particular natural language
US20040205044A1 (en) * 2003-04-11 2004-10-14 International Business Machines Corporation Method for storing inverted index, method for on-line updating the same and inverted index mechanism
US20080133473A1 (en) * 2006-11-30 2008-06-05 Broder Andrei Z Efficient multifaceted search in information retrieval systems
CN102023985A (en) * 2009-09-17 2011-04-20 日电(中国)有限公司 Method and device for generating blind mixed invert index table as well as method and device for searching joint keywords
CN102136011A (en) * 2011-05-09 2011-07-27 南开大学 Reverse index intersection method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100454907C (en) * 2006-08-07 2009-01-21 华为技术有限公司 Method and device for realizing elastic sectionalization ring guiding protective inverting
CN101242430B (en) * 2008-02-22 2012-03-28 华中科技大学 Fixed data pre-access method in peer network order system
CN101930473A (en) * 2010-09-14 2010-12-29 何吴迪 Method for constructing cloud computing window search system with executable structure

Also Published As

Publication number Publication date
CN102136011A (en) 2011-07-27

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE