CN100495400C - Indexes on-line updating method of full text retrieval system - Google Patents


Info

Publication number
CN100495400C
Authority
CN
China
Prior art keywords
index
document
secondary index
storehouse
indexes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2006101128008A
Other languages
Chinese (zh)
Other versions
CN101136016A (en
Inventor
杨建武
刘缙
李月敏
吴於茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Founder Holdings Development Co ltd
Peking University
Peking University Founder Research and Development Center
Original Assignee
BEIDA FANGZHENG TECHN INST Co Ltd BEIJING
Peking University
Peking University Founder Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIDA FANGZHENG TECHN INST Co Ltd BEIJING, Peking University, Peking University Founder Group Co Ltd filed Critical BEIDA FANGZHENG TECHN INST Co Ltd BEIJING
Priority to CNB2006101128008A priority Critical patent/CN100495400C/en
Publication of CN101136016A publication Critical patent/CN101136016A/en
Application granted granted Critical
Publication of CN100495400C publication Critical patent/CN100495400C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The method uses a secondary index to update the index of a full-text retrieval system online: newly added documents are indexed in the secondary index, and deleted documents are flagged in a Boolean vector, so that the index is updated in real time and remains consistent.

Description

An online index update method for a full-text retrieval system
Technical field
The invention belongs to the field of intelligent information processing and specifically relates to an online index update method for a full-text retrieval system.
Background
With the rapid development of computer and network technology, the number of electronic documents has grown sharply. How to find the needed information quickly, comprehensively, and accurately within this mass of data has become a question of common concern and a hot research topic. Most electronic documents are unstructured text written in natural language, and full-text search is currently an important means of processing text data.
Full-text search can be implemented in several ways, including inverted indexes, suffix arrays, and signature files.
An ordinary (forward) index maps a document number to all the words in that document. An inverted index reverses this relation, mapping each word to the numbers of all documents in which it occurs, so that all documents containing a word can be found quickly from the word itself. In practice, an inverted index usually also records how many times the word occurs in each document and at which positions. To make retrieval easier, inverted lists are normally kept sorted.
The following is an example of an inverted index.
Suppose there are two articles, 1 and 2:
The content of article 1 is: Tom lives in Guangzhou, I live in Guangzhou too.
The content of article 2 is: He once lived in Shanghai.
1) First we obtain the keywords of the two articles, which usually involves the following processing:
a. What we have is the article content, i.e. a character string, so we first find all the words in the string, i.e. perform word segmentation. English words are separated by spaces and are relatively easy to handle; Chinese words run together and need dedicated segmentation.
b. Words such as "in", "once", and "too" in the articles carry no practical meaning, and common Chinese function words such as "是" ("is") likewise carry no specific meaning. Words that do not represent a concept can be filtered out.
c. A user who searches for "He" usually also wants articles containing "he" and "HE", so all words must be converted to a single case.
d. A user who searches for "live" usually also wants articles containing "lives" and "lived", so "lives" and "lived" must be reduced to "live".
e. Punctuation marks in the articles usually do not represent a concept and can also be filtered out.
After this processing, the keywords of article 1 are: [tom] [live] [guangzhou] [i] [live] [guangzhou].
The keywords of article 2 are: [he] [live] [shanghai].
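The processing steps a to e can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the stop-word set and the stemming table are hypothetical and cover only the two example articles, and real systems would use a proper stemmer and, for Chinese, a word segmenter.

```python
import re

# Hypothetical stop-word list and stemming table, sized for the two example articles only.
STOP_WORDS = {"in", "once", "too"}
STEMS = {"lives": "live", "lived": "live"}

def extract_keywords(text):
    """Return the keyword sequence of an English text, following steps a to e."""
    # Steps a and e: split into words; punctuation is dropped because only letters match.
    words = re.findall(r"[A-Za-z]+", text)
    keywords = []
    for word in words:
        word = word.lower()                      # step c: unify case
        if word in STOP_WORDS:                   # step b: drop words that carry no concept
            continue
        keywords.append(STEMS.get(word, word))   # step d: reduce inflected forms
    return keywords

article1 = "Tom lives in Guangzhou, I live in Guangzhou too."
article2 = "He once lived in Shanghai."
print(extract_keywords(article1))
print(extract_keywords(article2))
```

With these assumed tables the function reproduces the two keyword lists given above.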
2) With the keywords in hand, we can build the inverted index. The relation above maps an article number to all keywords in that article. The inverted index reverses it, mapping each keyword to the numbers of all articles that contain it. After inversion, articles 1 and 2 become:
Keyword      Article number
guangzhou    1
he           2
i            1
live         1, 2
shanghai     2
tom          1
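The inversion itself is short. The sketch below assumes the keyword lists have already been extracted; the function name `build_inverted_index` is chosen here for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs maps article number -> keyword list; returns keyword -> sorted article numbers."""
    index = defaultdict(set)
    for doc_id, keywords in docs.items():
        for kw in keywords:
            index[kw].add(doc_id)
    # Sort keywords and article numbers so that the inverted lists are ordered.
    return {kw: sorted(ids) for kw, ids in sorted(index.items())}

docs = {
    1: ["tom", "live", "guangzhou", "i", "live", "guangzhou"],
    2: ["he", "live", "shanghai"],
}
inverted = build_inverted_index(docs)
for kw, ids in inverted.items():
    print(kw, ids)
```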
Knowing only which articles a keyword appears in is usually not enough; we also need the number of occurrences and the positions within each article. Two kinds of position are commonly used: a) character position, which records the character offset of the word in the article (its advantage is fast location when highlighting keywords in results); b) keyword position, which records the ordinal of the word among the article's keywords (its advantage is a smaller index and fast phrase queries).
After adding the occurrence frequency and occurrence positions, the index structure becomes:
Keyword      Article number [frequency]    Positions
guangzhou    1[2]                          3, 6
he           2[1]                          1
i            1[1]                          4
live         1[2], 2[1]                    2, 5, 2
shanghai     2[1]                          3
tom          1[1]                          1
Take the live row as an example. live occurs twice in article 1 and once in article 2. What do the positions "2, 5, 2" mean? They must be read together with the article numbers and frequencies: live occurs twice in article 1, so "2, 5" are its two positions in article 1; it occurs once in article 2, so the remaining "2" means that live is the 2nd keyword of article 2.
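A positional index of this kind can be sketched as below, using keyword positions (variant b). The nested-dictionary layout and the `format_row` helper are assumptions made for illustration; the source describes only the logical content of each row.

```python
def build_positional_index(docs):
    """Return keyword -> {article number: [1-based keyword positions]}."""
    index = {}
    for doc_id in sorted(docs):
        for pos, kw in enumerate(docs[doc_id], start=1):
            index.setdefault(kw, {}).setdefault(doc_id, []).append(pos)
    return index

def format_row(kw, postings):
    """Render one row as: keyword, article[frequency] list, flat position list."""
    freq = ",".join(f"{doc}[{len(pos)}]" for doc, pos in postings.items())
    positions = ",".join(str(p) for pos in postings.values() for p in pos)
    return f"{kw} {freq} {positions}"

docs = {
    1: ["tom", "live", "guangzhou", "i", "live", "guangzhou"],
    2: ["he", "live", "shanghai"],
}
positional = build_positional_index(docs)
print(format_row("live", positional["live"]))
```

Keeping the positions grouped per article avoids the need to decode the flat list "2, 5, 2" with the frequencies, at the cost of a slightly larger structure.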
The suffix array is a highly space-efficient text index structure proposed by Manber and Myers in 1993. It records the lexicographic order of every suffix of the text: the starting positions of all suffixes are stored in a list sorted by the lexicographic order of the suffixes themselves.
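As a small illustration of the idea, the sketch below builds a suffix array by directly sorting the suffixes and then looks up a pattern by binary search. This naive construction is chosen for clarity and is not the construction algorithm of Manber and Myers; the input string "banana" is just an illustrative example.

```python
def suffix_array(text):
    """Naive construction: sort suffix start positions by the suffixes themselves."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """Binary-search the suffix array for all suffixes that start with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                       # first suffix whose prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:                       # first suffix whose prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])

text = "banana"
sa = suffix_array(text)
print(sa)
print(find_occurrences(text, sa, "ana"))
```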
A signature file hashes each keyword of a document to a bit string of F bits; the keywords of the original document are visited in order and the resulting bit strings are written to the file one after another.
The matching idea is as follows. To decide whether string A matches string B, first hash both to numbers hash(A) and hash(B). If hash(A) != hash(B), then A != B; however, hash(A) = hash(B) does not by itself prove that A = B.
A concrete matching example follows:
Keyword x[0..5]: A A C T C T            hash(x[0..5]) = 17579
Text    y[0..9]: G C A A C T C T C A    hash(y[0..5]) = 17819
Text    y[0..9]: G C A A C T C T C A    hash(y[1..6]) = 17533
Text    y[0..9]: G C A A C T C T C A    hash(y[2..7]) = 17579
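The filter-then-verify idea can be sketched as follows. The polynomial rolling hash used here (with the hypothetical parameters `base` and `mod`) is an assumption; the source does not say which hash function produced the numbers in the example, so this sketch is not expected to reproduce them.

```python
def find_matches(pattern, text, base=256, mod=1_000_003):
    """Return all start offsets where pattern occurs in text.

    Windows whose hash differs from the pattern's hash are rejected at once;
    windows with an equal hash are verified character by character, because
    equal hashes do not prove equal strings.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)          # weight of the leading character
    hp = ht = 0
    for i in range(m):
        hp = (hp * base + ord(pattern[i])) % mod
        ht = (ht * base + ord(text[i])) % mod
    matches = []
    for s in range(n - m + 1):
        if hp == ht and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:                     # slide the window one character to the right
            ht = ((ht - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return matches

print(find_matches("AACTCT", "GCAACTCTCA"))
```

For the example strings, the only window that passes verification starts at offset 2, which corresponds to y[2..7].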
A signature file has the following advantages:
1) the file organization is simple and essentially follows the order of the original documents;
2) it is easy to maintain: generation, insertion, and deletion are all convenient;
3) it needs little space, especially when superimposed coding is used.
Of these structures, the inverted index is the most widely used and performs well for word-based queries.
In practical applications, document collections usually change constantly: new content is added, and outdated content is deleted or updated. If the index is not updated promptly as the collection changes, the quality of search results steadily declines: newly added documents cannot be found, and documents that no longer exist or whose content has changed are still returned. The index must therefore be updated continuously so that it reflects changes in the collection in time.
The simplest way to update an index is an offline rebuild: the outdated index is discarded and a complete new index is built from the latest data. Web search engines, which handle large volumes of updated data and demand high retrieval efficiency, mostly took this approach in their early days.
The other common approach is online updating. A typical method is the update strategy used by Clarke et al. in the MultiText full-text retrieval system. MultiText stores its index on disk as a circular file whose end joins its beginning. (Ordinary file systems do not support circular files directly, but one can be simulated on top of an ordinary file through an abstraction layer.) At any moment the file consists of three contiguous parts: the index still to be updated, the index already updated, and free space.
During retrieval, the system first determines which part of the index the query term falls in. Because the index is laid out on disk in lexicographic order, it suffices to remember the boundary between the two index parts; no disk access is needed. Because both parts (to be updated and already updated) have a complete inverted index structure, the usual method can be used to find the index entry, and ideally a single disk access retrieves the required posting list (the array of positions).
During an update, the postings generated from newly added documents are held temporarily in a memory buffer. A background process continually reads the to-be-updated part of the index, merges it with the postings in memory, and appends the result to the end of the updated part. In this process the to-be-updated part keeps shrinking and the updated part keeps growing, until the whole to-be-updated part has been converted into updated index.
Although the online update strategy of MultiText achieves continuous index updating with good retrieval efficiency, it has several shortcomings:
It only suits adding new documents and is not suited to applications that frequently delete and modify documents;
It cannot guarantee real-time performance: before a newly added document is guaranteed to be found by a user search, at least one complete update cycle must pass;
It cannot guarantee consistency: during merging, the dictionary is always split into an updated part and a not-yet-updated part, so a search for newly added documents sometimes finds them and sometimes does not.
The analysis above shows that the difficulty of index updating is that updating a few documents often requires rewriting most of the index database, even though most documents in it are unrelated to the current update. With MultiText, for example, even updating a single document requires rewriting the entire index database.
Summary of the invention
The object of the invention is to propose a new online index update method for a full-text retrieval system that guarantees the real-time performance and consistency of index updates without affecting the system's search function.
The invention is implemented as an online index update method for a full-text retrieval system, comprising the following steps:
1) dividing the index database into two parts, a master index store and a secondary index store; the secondary index store has the same structure as the master index store, is kept in full both in memory and on disk, and temporarily holds recently added documents;
2) reading the content to be indexed;
3) determining whether the operation type of the content to be indexed is an add or a delete operation, and processing it accordingly:
A: for an add operation, the content to be indexed is added to the secondary index store;
B: for a delete operation, document deletion information is saved in the secondary index store; the deletion information is kept in a Boolean vector in which each document corresponds to one bit.
The master index store and the secondary index store are divided as follows: the master index store consists of the great majority of documents, which seldom change, and the secondary index consists of the small number of documents that change frequently.
Further, it is determined whether the secondary index needs to be merged into the master index; if so, the secondary index and the document deletion information to be merged are merged into the master index, and the merged secondary index and document deletion information are then cleared.
Further, it is determined whether any content remains to be indexed; if so, the method returns to step 2); otherwise it is determined whether a request to stop updating the index has been received; if so, the operation ends; otherwise the check is repeated after waiting for a period of time.
Whether the secondary index needs to be merged into the master index is decided according to criterion A, B, or C below:
A: a file size or a document count is set in advance for the secondary index, and a merge is performed when the set file size or document count is exceeded;
B: a merge is performed when the system load falls below a preset parameter;
C: a combination of A and B.
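The three criteria can be sketched as a small policy object. All names and threshold values below (`MergePolicy`, `max_docs`, `max_bytes`, `idle_load`) are hypothetical, and the source does not say how A and B are combined under criterion C; this sketch assumes that both conditions must hold.

```python
from dataclasses import dataclass

@dataclass
class MergePolicy:
    """Decides when the secondary index should be merged into the master index."""
    mode: str = "A"                  # "A", "B", or "C"
    max_docs: int = 10_000           # hypothetical document-count limit (criterion A)
    max_bytes: int = 64 * 2**20      # hypothetical file-size limit (criterion A)
    idle_load: float = 0.2           # hypothetical load threshold (criterion B)

    def should_merge(self, doc_count, size_bytes, system_load):
        size_exceeded = doc_count > self.max_docs or size_bytes > self.max_bytes
        system_idle = system_load < self.idle_load
        if self.mode == "A":
            return size_exceeded
        if self.mode == "B":
            return system_idle
        if self.mode == "C":
            # Assumption: "combination" is read as requiring both A and B.
            return size_exceeded and system_idle
        raise ValueError(f"unknown mode: {self.mode}")

policy = MergePolicy(mode="C")
print(policy.should_merge(doc_count=12_000, size_bytes=0, system_load=0.05))
```

Reading C as "either A or B" would be an equally plausible interpretation and only changes the `and` to `or`.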
The master index and the secondary index may use index structures such as an inverted index, a suffix array, or a signature file.
The exact division between the master index store and the secondary index store depends on the application environment, including the total amount of data, the amount of data added per day or per hour, and the hardware configuration.
The effect of the invention is as follows: by using a secondary index to update the index of a full-text retrieval system online, the invention guarantees the real-time performance and consistency of index updates without affecting the system's search function. Experiments show that, in an ordinary PC environment (P4 2.0 GHz CPU, 1.0 GB of memory), the full-text retrieval implemented according to the invention achieves real-time online index updating while preserving consistency. In the experiments, when the secondary index held fewer than 10,000 documents, add operations were fast (all under 0.3 seconds); the speed of delete operations was unaffected by the secondary index; and a modify operation, being a delete followed by an add, took roughly the sum of the two.
Description of drawings
Fig. 1 is a flow chart of the method of the invention.
Embodiment
A specific embodiment of the invention is described in detail below with reference to the drawing.
In practice, update operations on an index database are found to exhibit locality. Based on this property, the method divides the index database into two parts: a master index made up of the great majority of documents, which seldom change, and a small secondary index made up of documents that have changed recently and change often.
What counts as "the great majority" and "seldom change" depends on the application environment, including the total amount of data, the amount of data added per day or per hour, and the hardware configuration. For example, in one application each day's updates are placed in the secondary index store, and at midnight, when the system is idle, the secondary index is merged into the master index.
Because the secondary index is small, update operations on it complete quickly, which guarantees real-time performance. Each update also needs little time, temporary space, and computing resources, which avoids the consistency problems caused by updating in stages.
If the secondary index were kept on disk like the master index, it would introduce a performance problem. Retrieval performance depends on the number of disk accesses; with the secondary index on disk, a retrieval that could originally be completed with one disk access would need at least two, nearly doubling the cost. Since the secondary index is much smaller than the master index, it can be held entirely in memory. For the sake of consistency, however, the secondary index cannot be kept in memory only; otherwise, if the system fails, the entire content of the secondary index is lost and the index database is left incomplete. A backup copy on disk is therefore also required.
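One way to keep the secondary index in memory while retaining a full copy on disk is sketched below. The source only requires that a complete copy exists on disk; the append-only JSON log, the class name `DurableSecondaryIndex`, and the replay-on-start recovery are assumptions made for this illustration.

```python
import json
import os
import tempfile

class DurableSecondaryIndex:
    """In-memory postings for fast search, plus an append-only log on disk for recovery."""

    def __init__(self, path):
        self.path = path
        self.postings = {}                      # keyword -> list of document numbers
        if os.path.exists(path):                # after a crash or restart, replay the log
            with open(path, encoding="utf-8") as f:
                for line in f:
                    doc_id, keywords = json.loads(line)
                    self._apply(doc_id, keywords)

    def _apply(self, doc_id, keywords):
        for kw in set(keywords):
            self.postings.setdefault(kw, []).append(doc_id)

    def add(self, doc_id, keywords):
        # Write to disk first, so a document that is searchable is also recoverable.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps([doc_id, keywords]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(doc_id, keywords)

    def search(self, keyword):
        return self.postings.get(keyword, [])   # served from memory, no disk access

log_path = os.path.join(tempfile.mkdtemp(), "secondary.log")
index = DurableSecondaryIndex(log_path)
index.add(1, ["tom", "live", "guangzhou"])
recovered = DurableSecondaryIndex(log_path)     # simulates a restart
print(recovered.search("live"))
```

Searches touch only the in-memory dictionary, so the extra disk access discussed above is avoided, while the log preserves the content if the process stops unexpectedly.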
The secondary index in the invention is an index that has the same structure as the master index, is kept in full both in memory and on disk, and temporarily holds recently added documents.
An example implementation of the invention is given below.
The invention was tested in an ordinary PC environment (P4 2.0 GHz CPU, 1.0 GB of memory), implementing online index updating for full-text retrieval according to the method of the invention.
As shown in Fig. 1, the method comprises the following steps:
1) read the content to be indexed;
2) if the operation type of the content is a document modification, decompose the modification into a delete operation and an add operation;
3) if the operation type is adding a document, go to step 4; if it is deleting a document, go to step 8;
4) add the index of the new document's content to the secondary index;
5) determine whether the secondary index needs to be merged into the master index; if so, go to step 6; otherwise jump to step 9;
6) merge the secondary index and the document deletion information into the master index;
7) clear the secondary index and the Boolean vector that holds the document deletion information, then jump to step 9;
8) for a delete operation, set the corresponding bit of the Boolean vector that holds the document deletion information to 1;
9) determine whether any content remains to be indexed; if so, jump to step 1; otherwise go to step 10;
10) determine whether a request to stop indexing has been received; if so, exit; otherwise wait for a period of time and jump to step 9.
Delete operations in the steps above are handled with a Boolean vector, each bit of which corresponds to one document. Deleting a document simply sets its bit to "1". Both the retrieval algorithm and the index merge algorithm skip documents whose bit is "1", so from the application's point of view the document is already deleted. When the indexes are merged, documents marked "1" are skipped by the merge algorithm and thereby truly disappear from the master index.
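The flow of steps 1 to 10, together with the deletion vector, can be sketched in memory as follows. This is a simplified illustration under stated assumptions, not the patented implementation: both indexes are plain dictionaries, the disk copy of the secondary index is omitted, document numbers are assigned sequentially, and the merge criterion is a hypothetical document-count threshold (`merge_threshold`).

```python
class OnlineIndex:
    """Master index plus small secondary index, with a Boolean deletion vector."""

    def __init__(self, merge_threshold=10_000):
        self.main = {}               # keyword -> list of document numbers (master index)
        self.secondary = {}          # same structure; holds recently added documents
        self.secondary_docs = 0
        self.deleted = []            # deleted[doc_id] is True once the document is deleted
        self.merge_threshold = merge_threshold   # hypothetical merge criterion

    def add(self, keywords):
        doc_id = len(self.deleted)   # assumption: sequential document numbers
        self.deleted.append(False)
        for kw in set(keywords):     # step 4: index the new document in the secondary index
            self.secondary.setdefault(kw, []).append(doc_id)
        self.secondary_docs += 1
        if self.secondary_docs >= self.merge_threshold:   # step 5
            self.merge()
        return doc_id

    def delete(self, doc_id):
        self.deleted[doc_id] = True  # step 8: only one bit changes

    def modify(self, doc_id, keywords):
        self.delete(doc_id)          # step 2: a modification is a delete plus an add
        return self.add(keywords)

    def search(self, keyword):
        hits = self.main.get(keyword, []) + self.secondary.get(keyword, [])
        return [d for d in hits if not self.deleted[d]]   # skip documents marked deleted

    def merge(self):
        merged = {}
        for source in (self.main, self.secondary):        # step 6
            for kw, ids in source.items():
                live = [d for d in ids if not self.deleted[d]]
                if live:
                    merged.setdefault(kw, []).extend(live)
        self.main = merged
        self.secondary = {}                               # step 7: clear secondary index
        self.secondary_docs = 0
        self.deleted = [False] * len(self.deleted)        # step 7: clear deletion vector

idx = OnlineIndex(merge_threshold=3)
first = idx.add(["tom", "live", "guangzhou"])
second = idx.add(["he", "live", "shanghai"])
idx.delete(first)
print(idx.search("live"))            # the deleted document is skipped immediately
third = idx.modify(second, ["he", "live", "beijing"])   # third add triggers a merge
print(idx.search("live"), idx.search("shanghai"))
```

Because a deletion touches only the vector, its cost does not depend on the size of either index, and deleted documents are physically dropped only at merge time.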
The invention can use an inverted index structure as well as suffix array or signature file retrieval; the method of operation is the same in each case. This experiment uses an inverted index as the structure of both the master index and the secondary index, and decides whether to merge the indexes from the number of documents in the secondary index and the system load.
The experimental data are Chinese news web pages crawled from the Internet. The news content of each page is extracted as the text, and each file is one news article; there are 1,000,000 articles in total, amounting to 2.68 GB.
The experiment mainly investigates the following two questions:
After the secondary index is introduced, how long does it take to update one document, and does this meet the real-time requirement?
How does the use of the secondary index affect retrieval efficiency?
The experiment measured the average time required to incrementally add, delete, and modify one document for different secondary index sizes (measured in the number of documents held). The results are shown in Table 1. When the secondary index holds fewer than 10,000 documents, add and delete operations are fast (all under 0.3 seconds). In other words, as long as the number of documents in the secondary index is kept below 10,000, the method has good real-time performance. The results also show that the speed of delete operations is unaffected by the secondary index, and that a modify operation, being a delete followed by an add, takes roughly the sum of the two.
Table 1: Incremental indexing time versus the number of documents in the secondary index, with 1,000,000 documents in the master index
Secondary index documents    Add time (s)    Delete time (s)    Modify time (s)
1                            0.010           0.205              0.220
10                           0.042           0.213              0.292
100                          0.051           0.204              0.271
1,000                        0.070           0.244              0.306
10,000                       0.223           0.200              0.439
100,000                      3.31            0.376              4.05
The results in Table 2 show that the index update time is independent of the size of the master index, because the update is carried out entirely on the secondary index.
Table 2: Incremental indexing time versus the number of documents in the master index, with 10,000 documents in the secondary index
Master index documents    Add time (s)    Delete time (s)    Modify time (s)
1,000                     0.223           0.200              0.439
10,000                    0.223           0.200              0.439
100,000                   0.223           0.200              0.439
1,000,000                 0.223           0.200              0.439
To investigate the effect of the secondary index on retrieval speed, the experiment ran searches with 100 query terms and computed the average retrieval time. The results are shown in Table 3. Taking the time without a secondary index as the baseline, the additional time can be regarded as the overhead of the secondary index. When the secondary index holds 10,000 documents or fewer, the overhead is always below 5%, which users are unlikely to notice.
Table 3: Retrieval speed versus the number of documents in the secondary index, with 1,000,000 documents in the master index
Secondary index documents    Retrieval time (s)    Secondary index overhead
0                            0.422                 0%
1                            0.430                 1.8%
10                           0.429                 1.7%
100                          0.431                 2.1%
1,000                        0.433                 2.6%
10,000                       0.439                 4.1%
100,000                      0.981                 132%
Taken together, the experimental results show that the proposed method achieves online index updating for a full-text retrieval system and guarantees the real-time performance and consistency of index updates while retaining good retrieval performance.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If these changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.

Claims (7)

1. An online index update method for a full-text retrieval system, comprising the following steps:
1) dividing the index database into two parts, a master index store and a secondary index store; the secondary index store has the same structure as the master index store, is kept in full both in memory and on disk, and temporarily holds recently added documents;
2) reading the content to be indexed;
3) determining whether the operation type of the content to be indexed is an add or a delete operation, and processing it accordingly:
A: for an add operation, the content to be indexed is added to the secondary index store;
B: for a delete operation, document deletion information is saved in the secondary index store; the deletion information is kept in a Boolean vector in which each document corresponds to one bit.
2. The online index update method for a full-text retrieval system according to claim 1, characterized in that the master index store and the secondary index store are divided as follows: the master index store consists of the great majority of documents, which seldom change, and the secondary index consists of the small number of documents that change frequently.
3. The online index update method for a full-text retrieval system according to claim 2, characterized in that step 3) is followed by the further operation:
4) determining whether the secondary index needs to be merged into the master index; if so, merging the secondary index and the document deletion information to be merged into the master index, and clearing the merged secondary index and document deletion information.
4. The online index update method for a full-text retrieval system according to claim 3, characterized in that step 4) is followed by the further operation:
5) determining whether any content remains to be indexed; if so, returning to step 2); otherwise determining whether a request to stop updating the index has been received; if so, ending the operation; otherwise repeating the check after waiting for a period of time.
5. The online index update method for a full-text retrieval system according to claim 4, characterized in that whether the secondary index needs to be merged into the master index is decided according to criterion A, B, or C below:
A: a file size or a document count is set in advance for the secondary index, and a merge is performed when the set file size or document count is exceeded;
B: a merge is performed when the system load falls below a preset parameter;
C: a combination of A and B.
6. The online index update method for a full-text retrieval system according to any one of claims 1 to 5, characterized in that the master index and the secondary index may use index structures such as an inverted index, a suffix array, or a signature file.
7. The online index update method for a full-text retrieval system according to claim 2, characterized in that the exact division between the master index store and the secondary index store depends on the application environment, including the total amount of data, the amount of data added per day or per hour, and the hardware configuration.
CNB2006101128008A 2006-09-01 2006-09-01 Indexes on-line updating method of full text retrieval system Expired - Fee Related CN100495400C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006101128008A CN100495400C (en) 2006-09-01 2006-09-01 Indexes on-line updating method of full text retrieval system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2006101128008A CN100495400C (en) 2006-09-01 2006-09-01 Indexes on-line updating method of full text retrieval system

Publications (2)

Publication Number Publication Date
CN101136016A CN101136016A (en) 2008-03-05
CN100495400C true CN100495400C (en) 2009-06-03

Family

ID=39160117

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006101128008A Expired - Fee Related CN100495400C (en) 2006-09-01 2006-09-01 Indexes on-line updating method of full text retrieval system

Country Status (1)

Country Link
CN (1) CN100495400C (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101408882B (en) * 2008-08-05 2012-10-31 北大方正集团有限公司 Method and system for searching authorization document
CN102096676B (en) * 2009-12-11 2014-04-09 中国移动通信集团公司 Data updating and query control method and system
US8244700B2 (en) * 2010-02-12 2012-08-14 Microsoft Corporation Rapid update of index metadata
CN102270201B (en) * 2010-06-01 2013-07-17 富士通株式会社 Multi-dimensional indexing method and device for network files
CN102004800A (en) * 2010-12-28 2011-04-06 北京数码大方科技有限公司 Data query method and device of PDM (Product Data Management) system
CN102081649B (en) * 2010-12-31 2012-08-15 深圳联友科技有限公司 Method and system for searching computer files
CN102890682B (en) 2011-07-21 2017-08-01 腾讯科技(深圳)有限公司 Build the method, search method, apparatus and system of index
CN103186622B (en) * 2011-12-30 2016-03-30 北大方正集团有限公司 The update method of index information and device in a kind of text retrieval system
CN103207872A (en) * 2012-01-17 2013-07-17 深圳市快播科技有限公司 Real-time indexing method and server
US9245003B2 (en) * 2012-09-28 2016-01-26 Emc Corporation Method and system for memory efficient, update optimized, transactional full-text index view maintenance
CN104424267A (en) * 2013-08-29 2015-03-18 北大方正集团有限公司 Index data inserting method and index data inserting system
CN104077379A (en) * 2014-06-25 2014-10-01 北京海泰方圆科技有限公司 Method for index updating
CN104361009B (en) * 2014-10-11 2017-10-31 北京中搜网络技术股份有限公司 A kind of real time indexing method based on inverted index
CN104598550B (en) * 2014-12-31 2018-09-25 北京奇艺世纪科技有限公司 A kind of update method and device of Internet video index
CN104504144A (en) * 2015-01-05 2015-04-08 浪潮(北京)电子信息产业有限公司 Method and device for acquiring index-related information
CN104899249B (en) * 2015-05-04 2018-07-13 中国科学院信息工程研究所 Reliable index upgrade system and method under a kind of mass data
CN105512339A (en) * 2015-12-31 2016-04-20 深圳市朗科科技股份有限公司 File searcher and searching method
CN106484815B (en) * 2016-09-26 2019-04-12 北京赛思信安技术股份有限公司 A kind of automatic identification optimization method based on mass data class SQL retrieval scene
CN109144994B (en) 2017-06-19 2022-04-29 华为技术有限公司 Index updating method, system and related device
CN109284350B (en) * 2018-11-16 2020-11-13 天津字节跳动科技有限公司 Method and device for updating search content, storage medium and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
一种实时更新索引结构的设计与实现. 王智强,刘建毅.计算机系统应用,第10期. 2005

Also Published As

Publication number Publication date
CN101136016A (en) 2008-03-05

Similar Documents

Publication Publication Date Title
CN100495400C (en) Indexes on-line updating method of full text retrieval system
Huang et al. X-Engine: An optimized storage engine for large-scale E-commerce transaction processing
US8051045B2 (en) Archive indexing engine
US10019284B2 (en) Method for performing transactions on data and a transactional database
CN105630864B (en) Forced ordering of a dictionary storing row identifier values
CN100498782C (en) Method for quick updating data domain in full text retrieval system
US8560500B2 (en) Method and system for removing rows from directory tables
US8909615B2 (en) System and method of managing capacity of search index partitions
US9507816B2 (en) Partitioned database model to increase the scalability of an information system
US20040205044A1 (en) Method for storing inverted index, method for on-line updating the same and inverted index mechanism
CN102339315B (en) Index updating method and system of advertisement data
US20060106849A1 (en) Idle CPU indexing systems and methods
CN101158958B (en) Fusion enquire method based on MySQL storage engines
EP3814930B1 (en) System and method for bulk removal of records in a database
CN102955792A (en) Method for implementing transaction processing for real-time full-text search engine
CN109726177A (en) A kind of mass file subregion indexing means based on HBase
CN110109910A (en) Data processing method and system, electronic equipment and computer readable storage medium
CN109669925B (en) Management method and device of unstructured data
CN102789464A (en) Natural language processing method, device and system based on semanteme recognition
CN107526746B (en) Method and apparatus for managing document index
CN109815240A (en) For managing method, apparatus, equipment and the storage medium of index
CN111522791A (en) Distributed file repeating data deleting system and method
US20060122963A1 (en) System and method for performing a data uniqueness check in a sorted data set
JP2007501476A (en) Database system that does not drop objects and dependent objects
US20090259617A1 (en) Method And System For Data Management

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220921

Address after: 3007, Hengqin international financial center building, No. 58, Huajin street, Hengqin new area, Zhuhai, Guangdong 519031

Patentee after: New founder holdings development Co.,Ltd.

Patentee after: Peking University

Patentee after: PEKING University FOUNDER R & D CENTER

Address before: 100871, Beijing, Haidian District Cheng Fu Road 298, founder building, 5 floor

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: Peking University

Patentee before: PEKING University FOUNDER R & D CENTER

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090603