CN101295323B - Processing method and system for index updating - Google Patents

Publication number
CN101295323B
Authority
CN
China
Prior art keywords
index
data
new data
subsystem
disk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2008101291333A
Other languages
Chinese (zh)
Other versions
CN101295323A (en)
Inventor
袁哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shiji Guangsu Information Technology Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN2008101291333A priority Critical patent/CN101295323B/en
Publication of CN101295323A publication Critical patent/CN101295323A/en
Application granted granted Critical
Publication of CN101295323B publication Critical patent/CN101295323B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a processing system for index updating, comprising: a data receiving subsystem for receiving new data; a data distribution subsystem for distributing the new data; and a multi-level index memory subsystem in which a first-level index receives the new data from the data distribution subsystem and is rebuilt according to the received new data, and in which, when the new data is transferred level by level from the first-level index to the indexes of the other levels, those indexes are rebuilt in turn according to the transferred new data. The processing system further comprises a disk index subsystem for rebuilding the disk index according to the new data from the data distribution subsystem or the multi-level index memory subsystem. The invention further provides a processing method for index updating. Adopting a multi-level memory index structure improves the index rebuild speed and the timeliness of information search, and improves the user experience.

Description

Processing method and system for index updating
Technical field
The present invention relates to information search technology on the Internet, and in particular to a processing method and system for index updating.
Background art
With the development of Internet technology, information search has become one of the most popular Internet technologies, meeting Internet users' demand for information from different fields. Information search services on the Internet are provided by search engines. A search engine's database holds a large amount of information, and the function of the search engine is to find the information a user needs in that database.
Information newly added to a search engine usually undergoes word segmentation, encoding, forward indexing, inverted indexing and similar operations to generate an index and the corresponding data, and the original index is then rebuilt from the generated index and data. For information with a small data volume, these operations can be completed in memory, and the generated index and data can also be stored in memory; for information with a large data volume, the generated index and data are stored on disk. The prior art therefore usually stores information with an architecture that combines memory and disk: new data received by the search engine is first loaded into memory, and the new data loaded in memory is periodically sent to disk for loading. The indexes in memory and on disk must also be updated and rebuilt as new data is loaded. The specific processing flow, shown in Fig. 1, mainly comprises the following steps:
Step 101: judge whether a disk index update is triggered. If so, go to step 102; otherwise, go to step 103.
The disk index update is triggered by a configured disk update cycle. The end of each disk update cycle is the time point of the disk index update, at which the search engine triggers the disk index update operation.
Step 102: the search engine sends all new data stored in memory that was received during the most recent disk update cycle to the disk; the disk rebuilds the disk index according to the received new data and loads the new data, and the current flow then ends. The new data received by the search engine comprises at least one of a deleted-document list, an added-document list and an updated-document list.
Step 103: the search engine provides the received new data to memory.
Step 104: the memory rebuilds the memory index according to the received new data and loads the new data.
In practical applications, users place high demands on the timeliness of search results for knowledge search, news search, forum search and the like, and expect the latest information to be searchable as soon as possible. This requires the search engine to complete the index rebuild for new information, and to make that information retrievable, as quickly as possible. Although rebuilding the memory index is faster than rebuilding the disk index, the memory index must be rebuilt every time new data is received, so the rebuild time grows linearly with the amount of data in memory. In other words, within one disk update cycle, data updates in memory become slower and slower.
It follows that, because the prior-art search engine first sends data to memory, memory index updates inevitably slow down as the data volume in memory keeps growing. The prior art therefore cannot meet users' high timeliness requirements for search results, which inconveniences users.
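The linear growth described above can be illustrated with a toy cost model. This is only a sketch under a stated assumption: one rebuild is taken to cost one unit per document currently held in the single memory index. The function name `rebuild_costs` and the batch sizes are hypothetical and do not come from the patent.

```python
def rebuild_costs(batch_sizes):
    """Return the cost of each rebuild, assuming cost is proportional to index size."""
    costs = []
    held = 0
    for size in batch_sizes:
        held += size        # the new batch is loaded into the single memory index
        costs.append(held)  # a full rebuild touches everything held so far
    return costs


# Five equal batches arriving within one disk update cycle.
costs = rebuild_costs([100] * 5)
print(costs)  # [100, 200, 300, 400, 500]
```

Under this assumption each successive rebuild is more expensive than the previous one until the disk update empties the memory index, which is the slowdown the background section describes.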
Summary of the invention
In view of this, the main purpose of the present invention is to provide a processing method and system for index updating, so as to solve the prior-art problem that slow memory index updates lead to poor timeliness of information search.
To achieve the above purpose, the technical solution of the present invention is realized as follows:
The invention provides a processing system for index updating, the system comprising:
a data receiving subsystem, configured to receive new data, the new data comprising at least one of a deleted-document list, an added-document list and an updated-document list;
a data distribution subsystem, configured to distribute the new data;
a multi-level index memory subsystem comprising a multi-level index, wherein the data capacity of the index levels increases successively from upper level to lower level, and data is transferred level by level from an upper-level index to a lower-level index;
the multi-level index memory subsystem comprises a data transfer module, an index rebuilding module and an index switching module, wherein
the data transfer module is configured to receive, through the first-level index, the new data from the data distribution subsystem, and to transfer the new data level by level from the first-level index to the indexes of the other levels;
the index rebuilding module, connected to the data transfer module, is configured to rebuild the first-level index according to the new data received by the first-level index and, when the new data is transferred level by level from the first-level index to the indexes of the other levels, to rebuild those indexes in turn according to the transferred new data; the index at each level consists of a primary index and a standby index, and rebuilding an index means rebuilding one of the primary index and the standby index; after the rebuild is finished, both the primary index and the standby index are updated to the rebuilt index;
the index switching module, connected to the index rebuilding module, is configured to switch service to the other index while one of the primary index and the standby index of a level is being rebuilt.
The system further comprises a disk index subsystem, configured to rebuild the disk index according to new data from the data distribution subsystem or the multi-level index memory subsystem.
The data receiving subsystem further comprises:
a data detection module, configured to detect new data in each detection cycle;
a data merging module, configured to merge multiple batches of new data when the data detection module detects multiple batches within a detection cycle, and to provide the merged new data to the data distribution subsystem.
The multi-level index memory subsystem further comprises:
a management module, connected to the data transfer module and the index rebuilding module, configured to manage the documents in the multi-level index memory subsystem through a circular file, and to manage the indexes in the multi-level index memory subsystem through a global document table.
The present invention also provides a processing method for index updating, the method comprising:
receiving, by the first-level index of the multi-level index in the multi-level index memory subsystem, new data from the data distribution subsystem, the new data comprising at least one of a deleted-document list, an added-document list and an updated-document list, wherein the data capacity of the index levels in the multi-level index increases successively from upper level to lower level, and data is transferred level by level from an upper-level index to a lower-level index;
rebuilding the first-level index according to the received new data;
when the new data is transferred level by level from the first-level index to the indexes of the other levels, rebuilding those indexes in turn according to the new data;
wherein the index at each level consists of a primary index and a standby index; rebuilding an index means rebuilding one of the primary index and the standby index while service is switched to the other index; after the rebuild is finished, both the primary index and the standby index are updated to the rebuilt index.
Transferring the new data level by level from the first-level index to the indexes of the other levels specifically comprises: when the data volume of an upper-level index reaches a set threshold, transferring the new data in the upper-level index to the lower-level index, triggering the lower-level index to rebuild, and deleting the transferred new data from the upper-level index.
The method further comprises: when the data volume of the last-level index in the multi-level index reaches a set threshold, transferring the new data in the last-level index to the disk index subsystem, triggering the disk index subsystem to rebuild the disk index, and deleting the transferred new data from the last-level index.
The method further comprises: generating, by the data distribution subsystem, a sequence file composed of document identifiers according to the deleted-document list, the added-document list and the updated-document list; and, at the end of each disk update cycle, sending the new data received during that disk update cycle and the generated sequence file to the disk index subsystem, triggering the disk index subsystem to rebuild the disk index.
The method further comprises: managing the documents in the multi-level index memory subsystem through a circular file.
The method further comprises: managing the indexes at each level in the multi-level index memory subsystem through a global document table.
The processing method and system for index updating provided by the present invention adopt a multi-level memory index structure: the first-level index of the multi-level index receives new data and is rebuilt according to the received new data; when the new data is transferred level by level from the first-level index to the indexes of the other levels, those indexes are rebuilt in turn according to the new data. Because the data capacity of the first-level index is relatively small, it rebuilds quickly. When new data arrives at the first-level index, the rebuild of the first-level index completes in a very short time, so the data becomes searchable from the first-level index after only a short delay. This improves the index rebuild speed, satisfies users' requirements for search timeliness, and improves the user experience.
Description of drawings
Fig. 1 is a flowchart of a prior-art processing method for index updating;
Fig. 2 is a schematic structural diagram of a processing system for index updating according to the present invention;
Fig. 3 is a schematic structural diagram of the multi-level index in the present invention;
Fig. 4 is a flowchart of data distribution in the present invention;
Fig. 5 is a flowchart of index switching in the present invention;
Fig. 6 is a schematic structural diagram of the global document table in the present invention;
Fig. 7 is a schematic diagram of circular file management in the present invention.
Detailed description of the embodiments
The technical solution of the present invention is further elaborated below in conjunction with the drawings and specific embodiments.
The processing system for index updating provided by the present invention, as shown in Fig. 2, comprises a data receiving subsystem 10, a data distribution subsystem 20, a multi-level index memory subsystem 30 and a disk index subsystem 40.
The data receiving subsystem 10 is configured to receive new data. New data means data newly received by the data receiving subsystem 10, and comprises at least one of a deleted-document list, an added-document list and an updated-document list. The deleted-document list contains the document identifiers (IDs) of the documents to be deleted; the added-document list contains the IDs of the documents to be added and the corresponding new documents; the updated-document list contains the IDs of the documents to be updated and the corresponding updated documents.
The data distribution subsystem 20 is connected to the data receiving subsystem 10 and distributes the new data from the data receiving subsystem 10. It sends the new data from the data receiving subsystem 10 to the multi-level index memory subsystem 30; in addition, according to the configured disk update cycle, at the end of each disk update cycle it sends all new data received during that cycle to the disk index subsystem 40.
The multi-level index memory subsystem 30 is connected to the data distribution subsystem 20 and comprises a multi-level index. As shown in Fig. 3, the data capacity of the index levels increases successively from upper level to lower level, data in the multi-level index is transferred level by level from an upper-level index to a lower-level index, and the index at each level consists of a primary index and a standby index. Fig. 3 shows an n-level index, in which L1_Index denotes the first-level index and Ln_Index denotes the last-level index. As Fig. 3 shows, an information search on the multi-level index searches the indexes at all levels to obtain the final search result. In the present invention a data-transfer threshold is set for each index level; when the data volume in a level reaches the corresponding threshold, that level passes its new data to the next lower level.
Data transfer between index levels is described below using a 3-level index as an example. The data capacity of L1_Index is 2,000,000, that of L2_Index is 8,000,000, and that of L3_Index is 20,000,000. The threshold of L1_Index is set to 1,000,000, that of L2_Index to 4,000,000, and that of L3_Index to 10,000,000. L1_Index receives new data from the data distribution subsystem 20 and rebuilds its index according to the received new data; every time L1_Index receives new data, it must rebuild its own index. When the data volume in L1_Index reaches 1,000,000, L1_Index passes all of its current new data to L2_Index and, once the transfer is finished, deletes the transferred new data from L1_Index. Similarly, L2_Index receives new data from L1_Index and rebuilds its own index according to the received new data; every time L2_Index receives new data, it must rebuild its own index. When the data volume in L2_Index reaches 4,000,000, L2_Index passes all of its current new data to L3_Index, triggering L3_Index to rebuild its index, and once the transfer is finished deletes the transferred new data from L2_Index.
As the above example shows, after an upper-level index transfers data to a lower-level index, the data is deleted from the upper-level index, which guarantees that data in the multi-level index is not duplicated. In addition, the threshold for each level can be set according to actual needs, but it is usually smaller than that level's data capacity, so that when a level reaches its threshold and is transferring data, it still has room to receive new data from the level above.
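The threshold-driven transfer between levels can be sketched as follows. This is a minimal illustration, not the patented implementation: a plain dictionary stands in for each index, a counter stands in for the rebuild, and the names `LevelIndex` and `push` are hypothetical. The thresholds are the ones from the example, scaled down by 500,000 so the cascade is visible with a handful of documents.

```python
class LevelIndex:
    """One in-memory index level; a dict of documents stands in for the real index."""

    def __init__(self, name, threshold):
        self.name = name
        self.threshold = threshold  # transfer threshold, not the data capacity
        self.docs = {}              # doc_id -> document content
        self.rebuilds = 0           # how many times this level was rebuilt

    def receive(self, new_docs):
        self.docs.update(new_docs)
        self.rebuilds += 1          # placeholder for the actual index rebuild


def push(levels, new_docs):
    """Deliver new data to the first level, then cascade to lower levels."""
    levels[0].receive(new_docs)
    for upper, lower in zip(levels, levels[1:]):
        if len(upper.docs) >= upper.threshold:
            lower.receive(upper.docs)  # triggers a rebuild of the lower level
            upper.docs = {}            # transferred data is deleted from the upper level


# Thresholds 1,000,000 / 4,000,000 / 10,000,000 scaled down to 2 / 8 / 20.
levels = [LevelIndex("L1_Index", 2), LevelIndex("L2_Index", 8), LevelIndex("L3_Index", 20)]
for doc_id in range(1, 10):
    push(levels, {doc_id: f"document {doc_id}"})

sizes = [len(level.docs) for level in levels]
rebuilds = [level.rebuilds for level in levels]
print(sizes)     # [1, 0, 8]
print(rebuilds)  # [9, 4, 1]
```

In this run the first level is rebuilt on every arrival but never holds more than two documents, while the larger levels are rebuilt far less often, which is the trade-off the multi-level structure relies on.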
The disk index subsystem 40 is connected to the data distribution subsystem 20. At the end of each disk update cycle, the data distribution subsystem 20 sends all the new data it has stored that was received during that cycle to the disk index subsystem 40, which rebuilds the disk index according to the received new data. Thus, after every disk update cycle the disk index subsystem 40 rebuilds the disk index from the new data sent by the data distribution subsystem 20, so that the information in the disk index subsystem 40 is updated in a timely manner. Correspondingly, because every piece of new data stored in the multi-level index memory subsystem 30 has a corresponding reception time, at the end of each disk update cycle the multi-level index memory subsystem 30 deletes the new data received during that cycle, together with the corresponding indexes, from the multi-level index, to prevent data in the multi-level index memory subsystem 30 from duplicating data in the disk index subsystem 40.
The data receiving subsystem 10 further comprises a data detection module 11 and a data merging module 12, which are connected to each other. The data detection module 11 is configured to detect new data in each detection cycle. The length of the detection cycle can be set according to actual needs; according to the configured detection cycle, the data detection module 11 checks in each cycle whether new data has arrived. The data merging module 12 is configured to merge the detected batches when the data detection module 11 detects multiple batches of new data within a detection cycle, and to provide the merged new data to the data distribution subsystem 20. By setting a detection cycle, merging the batches that arrive within each cycle, and providing new data to the data distribution subsystem 20 at a fixed period, the present invention reduces the time jitter of index rebuilding in the multi-level index memory subsystem 30. For example, with a detection cycle of 2 seconds, if two batches of new data A and B are detected within the 2 seconds, they are merged and provided to the data distribution subsystem 20, which triggers the multi-level index memory subsystem 30 to perform a single index rebuild. In the prior art, by contrast, new data A triggers one index rebuild and new data B triggers another. The index rebuild time therefore fluctuates less in the present invention.
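The merging of batches within one detection cycle might look like the sketch below. The batch layout (a dictionary with `delete`, `add` and `update` keys) and the function name `merge_batches` are assumptions made for illustration; the patent does not specify a data format, nor how conflicting operations on the same document are resolved, so here later batches simply overwrite earlier ones.

```python
def merge_batches(batches):
    """Merge the batches detected in one detection cycle, in arrival order."""
    merged = {"delete": [], "add": {}, "update": {}}
    for batch in batches:
        merged["delete"].extend(batch.get("delete", []))
        merged["add"].update(batch.get("add", {}))        # later batch wins on conflict
        merged["update"].update(batch.get("update", {}))  # later batch wins on conflict
    return merged


# Two batches, A and B, detected within the same 2-second detection cycle.
batch_a = {"add": {1: "first document"}}
batch_b = {"add": {2: "second document"}, "delete": [7]}

merged = merge_batches([batch_a, batch_b])
print(merged)
# {'delete': [7], 'add': {1: 'first document', 2: 'second document'}, 'update': {}}
```

Handing `merged` to the distribution step once per cycle yields one rebuild for A and B together rather than one rebuild for each.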
The data distribution subsystem 20 further comprises a disk update cycle setting module 21, a sequence file generation module 22 and a data distribution module 23. The disk update cycle setting module 21 is configured to set the disk update cycle. The sequence file generation module 22 is configured to generate a sequence file composed of document IDs according to the deleted-document list, added-document list and updated-document list contained in the received new data. The sequence file also records the state of each document: for a document to be deleted, the state associated with its document ID is "deleted"; for added and updated documents, the state associated with the document ID is "valid". The data distribution module 23, connected to the disk update cycle setting module 21 and the sequence file generation module 22, is configured to send the received new data to the multi-level index memory subsystem 30 and, according to the configured disk update cycle, to send the new data received during each disk update cycle together with the generated sequence file to the disk index subsystem 40 at the end of that cycle.
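A sequence file of the kind described above could be built as in the following sketch. The in-memory representation (a list of `(doc_id, state)` pairs), the state strings and the function name `build_sequence_file` are hypothetical; the patent only states that the file holds document IDs with their states.

```python
def build_sequence_file(new_data):
    """Return (doc_id, state) entries derived from the three document lists."""
    entries = []
    for doc_id in new_data.get("delete", []):
        entries.append((doc_id, "deleted"))  # documents to be deleted
    for doc_id in new_data.get("add", {}):
        entries.append((doc_id, "valid"))    # newly added documents
    for doc_id in new_data.get("update", {}):
        entries.append((doc_id, "valid"))    # updated documents
    return entries


new_data = {"delete": [3], "add": {4: "new text"}, "update": {1: "revised text"}}
sequence = build_sequence_file(new_data)
print(sequence)  # [(3, 'deleted'), (4, 'valid'), (1, 'valid')]
```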
The data distribution flow implemented by the data distribution subsystem 20 in the present invention, as shown in Fig. 4, mainly comprises the following steps:
Step 401: generate a sequence file according to the received new data.
The data distribution subsystem 20 generates the sequence file according to the deleted-document list, added-document list and updated-document list contained in the new data; the sequence file contains each document ID in the new data and the corresponding document state.
Step 402: judge, according to the configured disk update cycle, whether the disk index needs to be rebuilt. If so, execute step 403; otherwise, execute step 404.
The data distribution subsystem 20 treats the end of each configured disk update cycle as the time point for rebuilding the disk index. When that time point is reached, it judges that the disk index needs to be rebuilt; at all other times it judges that no rebuild is needed.
Step 403: send the new data received during the current disk update cycle and the generated sequence file to the disk index subsystem 40, triggering the disk index subsystem 40 to rebuild the disk index, and then end the current flow.
From the sequence file, the disk index subsystem 40 learns which documents need to be added, deleted or updated; after performing the corresponding add, delete and update operations, it rebuilds the disk index.
Step 404: send the new data to the multi-level index memory subsystem 30.
When new data arrives at the data distribution subsystem 20, the data distribution subsystem 20 forwards it to the multi-level index memory subsystem 30, triggering the multi-level index memory subsystem 30 to rebuild the memory index.
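The decision in steps 402 to 404 can be sketched as below. This is a simplified model under several assumptions: time is passed in explicitly as a number, the two downstream subsystems are represented by plain lists, and sequence file generation (step 401) is left out. The class name `Distributor` and its attributes are hypothetical.

```python
class Distributor:
    """Routes new data to memory, or to disk at the end of a disk update cycle."""

    def __init__(self, disk_cycle):
        self.disk_cycle = disk_cycle
        self.cycle_end = disk_cycle  # time point of the next disk index rebuild
        self.pending = []            # new data received in the current cycle
        self.sent_to_memory = []     # stands in for the memory subsystem
        self.sent_to_disk = []       # stands in for the disk index subsystem

    def on_new_data(self, now, new_data):
        self.pending.append(new_data)
        if now >= self.cycle_end:                         # step 402
            self.sent_to_disk.append(list(self.pending))  # step 403
            self.pending.clear()
            self.cycle_end += self.disk_cycle
        else:
            self.sent_to_memory.append(new_data)          # step 404


distributor = Distributor(disk_cycle=10)
distributor.on_new_data(1, "A")
distributor.on_new_data(5, "B")
distributor.on_new_data(10, "C")  # arrives at the end of the disk update cycle

print(distributor.sent_to_memory)  # ['A', 'B']
print(distributor.sent_to_disk)    # [['A', 'B', 'C']]
```

A real system would also fire the disk rebuild from a timer when no data arrives at the cycle boundary; the sketch only checks the boundary when new data comes in.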
The multi-level index memory subsystem 30 in the present invention further comprises a data transfer module 31 and an index rebuilding module 32. The data transfer module 31 is configured to transfer the new data from the data distribution subsystem 20 level by level from the first-level index to the indexes of the other levels; the level-by-level transfer has been described in detail above and is not repeated here. The index rebuilding module 32, connected to the data transfer module 31, is configured to rebuild the index at each level according to the new data that level receives.
In addition, to ensure that normal information search service can still be provided to users while the indexes in the multi-level index memory subsystem 30 are being rebuilt, the index at each level in the present invention consists of a primary index and a standby index, so that while one of them is being rebuilt the other can be selected to serve users. Accordingly, the multi-level index memory subsystem 30 further comprises an index switching module 33, connected to the index rebuilding module 32 and configured to switch service to the other index while one of the primary index and the standby index of a level is being rebuilt, so that service continues normally.
The index switching flow is described in detail below, taking as an example the case in which the primary index is rebuilt while the standby index provides service. As shown in Fig. 5, the flow mainly comprises the following steps:
Step 501: when new data is received, the primary index is rebuilt and an index switch is triggered; service is switched to the standby index, which then provides service.
For example, L1_Index consists of a primary index a and a standby index b. When new data arrives at L1_Index, the primary index a is rebuilt and all services on L1_Index are switched to the standby index b; while the primary index a is being rebuilt, the standby index b provides service.
Step 502: after the primary index has been rebuilt, determine whether the standby index is still providing service. If so, execute step 503; otherwise, execute step 504.
Step 503: wait while the standby index provides service, and return to step 502, until it is determined that the standby index is no longer providing service.
Step 504: replace the standby index with the rebuilt primary index, and provide service through the rebuilt primary index or the standby index.
Continuing the example from step 501, rebuilding the primary index a yields a new primary index a1. The standby index b must now also be updated so that the standby index and the primary index are synchronized; the new primary index a1 therefore replaces the standby index b, which becomes the new standby index b1.
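Steps 501 to 504 can be sketched in a single-threaded form as follows. This is an illustration only: a tiny term-to-document-ID map stands in for the real index, the class name `IndexLevel` and its methods are hypothetical, and the waiting in steps 502 and 503 is reduced to a comment because there are no concurrent queries in this sketch.

```python
class IndexLevel:
    """One index level with a primary and a standby copy (toy inverted index)."""

    def __init__(self):
        self.primary = {}         # term -> set of document IDs
        self.standby = {}
        self.serving = "primary"  # which copy currently answers queries

    def search(self, term):
        index = self.primary if self.serving == "primary" else self.standby
        return sorted(index.get(term, set()))

    def rebuild(self, new_docs):
        self.serving = "standby"  # step 501: queries go to the standby index
        rebuilt = {term: set(ids) for term, ids in self.primary.items()}
        for doc_id, text in new_docs.items():
            for term in text.split():
                rebuilt.setdefault(term, set()).add(doc_id)
        self.primary = rebuilt
        # Steps 502-503: a concurrent system would wait here until queries
        # still running against the standby index have finished.
        self.standby = {term: set(ids) for term, ids in rebuilt.items()}  # step 504
        self.serving = "primary"


level = IndexLevel()
level.rebuild({1: "index update"})
level.rebuild({2: "index"})
print(level.search("index"))   # [1, 2]
print(level.search("update"))  # [1]
```

Copying the rebuilt primary into the standby mirrors the a1/b1 example: after the swap both copies hold the same content, so either can serve the next time the other is rebuilt.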
It should be noted that, in the multi-level index architecture of the present invention, new data is transferred to the lower-level index, and the lower-level index is rebuilt, only when the data volume of the upper-level index reaches its threshold. Consequently, when a document is deleted or updated in an upper-level index, the change may not be reflected in the lower-level indexes in time. As a result, a user's search could still return an old version of a document or a document that has already been deleted. The present invention therefore uses a global document table to manage the indexes in a unified way. As shown in Fig. 6, the global document table lists all document IDs of the multi-level index memory subsystem 30 in order. Each document ID corresponds to a document bitmap containing n bits, and each bit indicates whether the document is present in the index at the corresponding level. Taking a 3-level index as an example, the document bitmap of document ID 1 has 3 bits: the first bit corresponds to L1_Index, the second to L2_Index, and the third to L3_Index. When the first bit is set to 1, document ID 1 is present in L1_Index; when the first bit is set to 0, document ID 1 is not present in L1_Index. The other two bits work in the same way as the first and are not described again.
It follows that when a document is transferred between index levels, the bit for the level that held the document before the transfer must be set to 0, and the bit for the level that holds it after the transfer must be set to 1. For example, in the 3-level index above, when document ID 1 is transferred from L1_Index to L2_Index, the first bit of its document bitmap is set to 0 and the second bit is set to 1. In addition, if document ID 1 exists in all 3 index levels and is then updated in L1_Index, the second and third bits of its document bitmap must be set to 0 to indicate that document ID 1 in L2_Index and L3_Index is invalid, which prevents a user's search from returning the old, pre-update version of document ID 1. When document ID 1 is deleted, document ID 1 and its document bitmap must be cleared from the global document table.
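The bitmap bookkeeping described above can be sketched as follows. This is a minimal illustration assuming each bitmap is stored as a Python integer in which bit i-1 corresponds to level i; the class name `GlobalDocTable` and its method names are hypothetical.

```python
class GlobalDocTable:
    """Maps each document ID to a bitmap of the index levels that hold it."""

    def __init__(self):
        self.bitmaps = {}  # doc_id -> int; bit (level - 1) set means present

    def add(self, doc_id, level):
        self.bitmaps[doc_id] = self.bitmaps.get(doc_id, 0) | (1 << (level - 1))

    def move(self, doc_id, src, dst):
        bits = self.bitmaps.get(doc_id, 0)
        bits &= ~(1 << (src - 1))  # clear the bit of the level it left
        bits |= 1 << (dst - 1)     # set the bit of the level it entered
        self.bitmaps[doc_id] = bits

    def update_in(self, doc_id, level):
        # Only the level holding the fresh copy stays valid; all others are cleared.
        self.bitmaps[doc_id] = 1 << (level - 1)

    def delete(self, doc_id):
        self.bitmaps.pop(doc_id, None)

    def present(self, doc_id, level):
        return bool(self.bitmaps.get(doc_id, 0) & (1 << (level - 1)))


table = GlobalDocTable()
table.add(1, 1)
table.move(1, 1, 2)            # document ID 1 goes from L1_Index to L2_Index
moved = [table.present(1, lvl) for lvl in (1, 2, 3)]

table.add(1, 1)
table.add(1, 3)                # now present in all three levels
table.update_in(1, 1)          # updated in L1_Index: older copies become invalid
updated = [table.present(1, lvl) for lvl in (1, 2, 3)]

table.delete(1)
print(moved)    # [False, True, False]
print(updated)  # [True, False, False]
```

At query time, a result from level i would be kept only if `present(doc_id, i)` is true, which is how stale or deleted documents are filtered out without touching the lower-level indexes.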
Because increase, deletion, renewal operation to document in the multiple index memory subsystem 30 are very frequent, therefore, the present invention adopts the framework of circular file that each document in the multiple index memory subsystem 30 is managed, can be by the head pointer of circular file and moving of tail pointer, the convenient various operations that realize document.Further describe below in conjunction with Fig. 7, indicate the effective range of document among Fig. 7 by head pointer (Head) and tail pointer (Tail), original document comprises document 1, document 2, document 3, document 4, document 5 and document 6; When needing to increase document 7 newly, only need Tail is moved one backward, and the corresponding content of document 7 is inserted in the circular file in order gets final product; When needing deletion document 1, document 2 and document 3, need not physically to remove document 1, document 2 and document 3, only need Head is moved three backward, point to document 4 and get final product; Continue newly-increased document 8 and document 9 if desired, still insert document in order, and corresponding mobile Tail, when arriving the circular file afterbody, the head that then returns circular file continues insertion and gets final product.In addition, deleting the document 5 in the circular file if desired, then do not need mobile Head and Tail, by the global document table shown in Fig. 6, with each bit position 0 of the document bitmap of document 5 correspondences, is invalid getting final product with expression document 5 states.
With the circular file structure, documents can be managed simply by moving Head and Tail as documents are deleted, added, and updated. Compared with the prior-art approach of creating a new file and copying the valid documents into it, circular file management does not consume excessive document management time.
On the hardware side, a management module 34 is provided in the multi-level index memory subsystem 30 of the present invention. It is connected to the data transfer module 31 and the index rebuilding module 32, and is used to manage the documents in the multi-level index memory subsystem 30 by means of the circular file and to manage the indexes in the multi-level index memory subsystem 30 by means of the global document table.
In addition, in the present invention the disk index subsystem 40 can also serve as the next index level below the multi-level index memory subsystem 30. That is, when the data volume of the last-level index of the multi-level index memory subsystem 30 reaches a set threshold, the new data in the last-level index is transferred to the disk index subsystem 40, which triggers the disk index subsystem 40 to rebuild the disk index. After the transfer is complete, the transferred new data is deleted from the last-level index so that data is not duplicated between the multi-level index memory subsystem 30 and the disk index subsystem 40. In this embodiment, a disk index update cycle no longer needs to be set, and the data distribution subsystem 20 no longer needs to send new data to the disk index subsystem 40 periodically to trigger a disk index rebuild. The disk index subsystem 40 only needs to rebuild the disk index when new data arrives from the multi-level index memory subsystem 30, which simplifies the data distribution operation of the data distribution subsystem 20.
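The threshold-triggered transfer described in this embodiment can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `MultiLevelIndex` is hypothetical, each index level is modeled as a plain dictionary rather than a real inverted index, "data volume" is approximated by the number of documents, and the disk index is just another dictionary.

```python
class MultiLevelIndex:
    """Hypothetical sketch: in-memory levels that spill into a disk index."""

    def __init__(self, thresholds):
        # thresholds[i] is the document count at which level i+1 is emptied
        # into the next level (or into the disk index for the last level).
        self.thresholds = thresholds
        self.levels = [dict() for _ in thresholds]  # doc ID -> content
        self.disk = {}                              # stands in for the disk index

    def receive(self, new_docs):
        # New data always enters the first-level index, which is rebuilt first.
        self.levels[0].update(new_docs)
        self._cascade()

    def _cascade(self):
        for i, level in enumerate(self.levels):
            if len(level) >= self.thresholds[i]:
                is_last = i + 1 == len(self.levels)
                target = self.disk if is_last else self.levels[i + 1]
                target.update(level)  # transfer triggers a rebuild of the target
                level.clear()         # delete transferred data to avoid duplicates


index = MultiLevelIndex(thresholds=[2, 4])
index.receive({1: "doc one"})
index.receive({2: "doc two"})                      # L1 reaches 2 and spills into L2
index.receive({3: "doc three", 4: "doc four"})     # L2 reaches 4 and spills to disk
print(sorted(index.disk))  # [1, 2, 3, 4]
```

Because the disk rebuild is driven by the last memory level filling up, no separate disk update cycle or periodic push from the distribution side appears in the sketch.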
Through the multi-level index memory subsystem 30, the present invention rebuilds each index level separately. Because the data capacity of L1_Index is usually small, L1_Index can be rebuilt quickly: when new data arrives at the multi-level index memory subsystem 30, the rebuild of L1_Index completes in a very short time, and the data can be found in L1_Index after that same short time. For knowledge search, a question posted by a user can be retrieved by the search engine almost immediately, which speeds up the answering of the question and raises the probability that it is resolved. For news search, breaking news becomes searchable quickly, which satisfies users' requirements for search timeliness and improves the user experience.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention.

Claims (10)

1. A processing system for index updating, characterized in that the system comprises:
A data receiving subsystem, configured to receive new data, the new data comprising at least one of a deleted-document list, a newly-added-document list, and an updated-document list;
A data distribution subsystem, configured to distribute the new data;
A multi-level index memory subsystem, comprising a multi-level index, wherein the data capacities of the index levels in the multi-level index increase successively from upper level to lower level, and data is transferred level by level from an upper-level index to a lower-level index;
The multi-level index memory subsystem comprises a data transfer module, an index rebuilding module, and an index switching module, wherein:
The data transfer module is configured to receive, through the first-level index, the new data from the data distribution subsystem, and to transfer the new data level by level from the first-level index to the other index levels;
The index rebuilding module is connected to the data transfer module and is configured to rebuild the first-level index according to the new data received by the first-level index and, when the new data is transferred level by level from the first-level index to the other index levels, to rebuild the other index levels in turn according to the transferred new data; each index level consists of a primary index and a standby index, and rebuilding an index means rebuilding one of the primary index and the standby index; after the rebuild is complete, both the primary index and the standby index are updated to the rebuilt index;
The index switching module is connected to the index rebuilding module and is configured to switch service to the other index while one of the primary index and the standby index of an index level is being rebuilt.
2. The processing system for index updating according to claim 1, characterized in that the system further comprises: a disk index subsystem, configured to rebuild a disk index according to the new data from the data distribution subsystem or from the multi-level index memory subsystem.
3. The processing system for index updating according to claim 1 or 2, characterized in that the data receiving subsystem further comprises:
A data detection module, configured to detect the new data in each detection cycle;
A data merging module, configured to merge multiple batches of new data when the data detection module detects multiple batches of new data within one detection cycle, and to provide the merged new data to the data distribution subsystem.
4. The processing system for index updating according to claim 1 or 2, characterized in that the multi-level index memory subsystem further comprises:
A management module, connected to the data transfer module and the index rebuilding module, configured to manage the documents in the multi-level index memory subsystem by means of a circular file, and to manage the indexes in the multi-level index memory subsystem by means of a global document table.
5. A processing method for index updating, characterized in that the method comprises:
Receiving, through the first-level index of the multi-level index of a multi-level index memory subsystem, new data from a data distribution subsystem, the new data comprising at least one of a deleted-document list, a newly-added-document list, and an updated-document list, wherein the data capacities of the index levels in the multi-level index increase successively from upper level to lower level, and data is transferred level by level from an upper-level index to a lower-level index;
Rebuilding the first-level index according to the received new data;
When the new data is transferred level by level from the first-level index to the other index levels, rebuilding the other index levels in turn according to the new data;
Wherein each index level consists of a primary index and a standby index; rebuilding an index means rebuilding one of the primary index and the standby index while service is switched to the other index; and after the rebuild is complete, both the primary index and the standby index are updated to the rebuilt index.
6. The processing method for index updating according to claim 5, characterized in that transferring the new data level by level from the first-level index to the other index levels specifically comprises: when the data volume of an upper-level index reaches a set threshold, transferring the new data in the upper-level index to the lower-level index, triggering the lower-level index to be rebuilt, and deleting the transferred new data from the upper-level index.
7. The processing method for index updating according to claim 5, characterized in that the method further comprises: when the data volume of the last-level index in the multi-level index reaches a set threshold, transferring the new data in the last-level index to a disk index subsystem, triggering the disk index subsystem to rebuild the disk index, and deleting the transferred new data from the last-level index.
8. The processing method for index updating according to claim 5, characterized in that the method further comprises: the data distribution subsystem generating, according to the deleted-document list, the newly-added-document list, and the updated-document list, a sequential file composed of document identifiers, and, at the end of each disk update cycle, sending the new data received within that disk update cycle and the generated sequential file to a disk index subsystem, triggering the disk index subsystem to rebuild the disk index.
9. The processing method for index updating according to claim 5, characterized in that the method further comprises: managing the documents in the multi-level index memory subsystem by means of a circular file.
10. The processing method for index updating according to claim 5, characterized in that the method further comprises: managing the index levels in the multi-level index memory subsystem by means of a global document table.
CN2008101291333A 2008-06-30 2008-06-30 Processing method and system for index updating Active CN101295323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101291333A CN101295323B (en) 2008-06-30 2008-06-30 Processing method and system for index updating

Publications (2)

Publication Number Publication Date
CN101295323A CN101295323A (en) 2008-10-29
CN101295323B true CN101295323B (en) 2011-11-02

Family

ID=40065607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101291333A Active CN101295323B (en) 2008-06-30 2008-06-30 Processing method and system for index updating

Country Status (1)

Country Link
CN (1) CN101295323B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1979469A (en) * 2005-11-29 2007-06-13 国际商业机器公司 Index and its extending and searching method
CN101094179A (en) * 2007-07-16 2007-12-26 中兴通讯股份有限公司 Method and device for looking up route indexed in multiple stages

Also Published As

Publication number Publication date
CN101295323A (en) 2008-10-29

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: SHENZHEN SHIJI LIGHT SPEED INFORMATION TECHNOLOGY

Free format text: FORMER OWNER: TENGXUN SCI-TECH (SHENZHEN) CO., LTD.

Effective date: 20131015

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518044 SHENZHEN, GUANGDONG PROVINCE TO: 518057 SHENZHEN, GUANGDONG PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20131015

Address after: A Tencent Building in Shenzhen Nanshan District City, Guangdong streets in Guangdong province science and technology 518057 16

Patentee after: Shenzhen Shiji Guangsu Information Technology Co., Ltd.

Address before: Shenzhen Futian District City, Guangdong province 518044 Zhenxing Road, SEG Science Park 2 East Room 403

Patentee before: Tencent Technology (Shenzhen) Co., Ltd.