Summary of the Invention
In view of this, the main object of the present invention is to provide a processing method and system for index updating, so as to solve the prior-art problem that slow updating of the in-memory index degrades the timeliness of information search.
To achieve the above object, the technical solution of the present invention is realized as follows:
The invention provides a processing system for index updating, the system comprising:
a data receiving subsystem, configured to receive new data, the new data comprising at least one of a deleted-document list, a newly added document list, and an updated-document list;
a data distribution subsystem, configured to distribute the new data;
a multi-level index memory subsystem comprising a multi-level index, in which the data capacity of each level increases successively from the upper levels to the lower levels, and data is passed level by level from an upper-level index to a lower-level index.
The multi-level index memory subsystem comprises a data transfer module, an index rebuilding module, and an index switching module, wherein:
the data transfer module is configured to receive, through the first-level index, the new data from the data distribution subsystem, and to pass the new data level by level from the first-level index to the indexes of the other levels;
the index rebuilding module, connected to the data transfer module, is configured to rebuild the first-level index according to the new data received by the first-level index and, when the new data is passed level by level from the first-level index to the indexes of the other levels, to rebuild those indexes in turn according to the transferred new data; the index of each level consists of a primary index and a backup index, and rebuilding an index means rebuilding one of the primary index and the backup index; after the rebuild is complete, both the primary index and the backup index are updated to the rebuilt index;
the index switching module, connected to the index rebuilding module, is configured to switch service to the other index while one of the primary index and the backup index of a level is being rebuilt.
The system further comprises a disk index subsystem, configured to rebuild the disk index according to new data from the data distribution subsystem or from the multi-level index memory subsystem.
The data receiving subsystem further comprises:
a data detection module, configured to detect new data in each detection cycle;
a data merging module, configured to merge multiple batches of new data when the data detection module detects multiple batches within one detection cycle, and to provide the merged new data to the data distribution subsystem.
The multi-level index memory subsystem further comprises:
a management module, connected to the data transfer module and the index rebuilding module, configured to manage the documents in the multi-level index memory subsystem by means of a circular file, and to manage the indexes in the multi-level index memory subsystem by means of a global document table.
The present invention also provides a processing method for index updating, the method comprising:
receiving, through the first-level index of the multi-level index in the multi-level index memory subsystem, new data from the data distribution subsystem, the new data comprising at least one of a deleted-document list, a newly added document list, and an updated-document list, wherein the data capacity of each level in the multi-level index increases successively from the upper levels to the lower levels, and data is passed level by level from an upper-level index to a lower-level index;
rebuilding the first-level index according to the received new data;
when the new data is passed level by level from the first-level index to the indexes of the other levels, rebuilding those indexes in turn according to the new data;
wherein the index of each level consists of a primary index and a backup index, rebuilding an index means rebuilding one of the primary index and the backup index while service is switched to the other index, and after the rebuild is complete both the primary index and the backup index are updated to the rebuilt index.
Passing the new data level by level from the first-level index to the indexes of the other levels specifically comprises: when the data volume of an upper-level index reaches a set threshold, passing the new data in the upper-level index to the lower-level index, triggering the lower-level index to rebuild, and deleting the transferred new data from the upper-level index.
The method further comprises: when the data volume of the last-level index in the multi-level index reaches a set threshold, passing the new data in the last-level index to the disk index subsystem, triggering the disk index subsystem to rebuild the disk index, and deleting the transferred new data from the last-level index.
The method further comprises: the data distribution subsystem generating, according to the deleted-document list, the newly added document list, and the updated-document list, a sequence file composed of document identifiers, and, at the end of each disk update cycle, sending the new data received within that disk update cycle together with the generated sequence file to the disk index subsystem, thereby triggering the disk index subsystem to rebuild the disk index.
The method further comprises: managing the documents in the multi-level index memory subsystem by means of a circular file.
The method further comprises: managing the indexes of each level in the multi-level index memory subsystem by means of a global document table.
The processing method and system for index updating provided by the present invention adopt a multi-level in-memory index structure. New data is received by the first-level index of the multi-level index, and the first-level index is rebuilt according to the received new data; when the new data is passed level by level from the first-level index to the indexes of the other levels, those indexes are rebuilt in turn according to the new data. Because the data capacity of the first-level index is comparatively small, it can be rebuilt quickly. When new data arrives at the first-level index, the rebuild of the first-level index completes in a very short time, so the data can be found in the first-level index after only a very short delay. This increases index rebuilding speed, satisfies users' requirements for search timeliness, and improves the user experience.
Detailed Description of the Embodiments
The technical solution of the present invention is further elaborated below with reference to the drawings and specific embodiments.
As shown in Fig. 2, the processing system for index updating provided by the present invention comprises: a data receiving subsystem 10, a data distribution subsystem 20, a multi-level index memory subsystem 30, and a disk index subsystem 40.
The data receiving subsystem 10 is configured to receive new data. New data here means data newly received by the data receiving subsystem 10, and comprises at least one of a deleted-document list, a newly added document list, and an updated-document list. The deleted-document list contains the document identifiers (IDs) of the documents to be deleted; the newly added document list contains the document IDs to be added and the corresponding new documents; the updated-document list contains the document IDs to be updated and the corresponding updated documents.
The data distribution subsystem 20 is connected to the data receiving subsystem 10 and is configured to distribute the new data from the data receiving subsystem 10. The data distribution subsystem 20 sends the new data from the data receiving subsystem 10 to the multi-level index memory subsystem 30; in addition, according to a set disk update cycle, at the end of each disk update cycle the data distribution subsystem 20 sends all new data received within that disk update cycle to the disk index subsystem 40.
The multi-level index memory subsystem 30 is connected to the data distribution subsystem 20 and comprises a multi-level index. As shown in Fig. 3, the data capacity of each level increases successively from the upper levels to the lower levels, data in the multi-level index is passed level by level from an upper-level index to a lower-level index, and the index of every level consists of a primary index and a backup index. Fig. 3 shows an n-level index, in which L1_Index denotes the first-level index and Ln_Index denotes the last-level index. As can be seen from Fig. 3, when an information search is performed on the multi-level index, the indexes of all levels are searched in order to obtain the final search result. In the present invention a data transfer threshold is set for the index of each level; when the data volume in the index of a given level reaches the corresponding threshold, that index passes its new data to the index of the next lower level.
Data transfer between levels is described below taking a 3-level index as an example. The data capacity of L1_Index is 2,000,000, that of L2_Index is 8,000,000, and that of L3_Index is 20,000,000. The threshold of L1_Index is set to 1,000,000, that of L2_Index to 4,000,000, and that of L3_Index to 10,000,000. L1_Index receives new data from the data distribution subsystem 20 and rebuilds its index according to the received new data; every time L1_Index receives new data, it must rebuild its own index. When the data volume in L1_Index reaches 1,000,000, L1_Index passes all of its current new data to L2_Index and, once the transfer is complete, deletes the transferred new data from L1_Index. Similarly, L2_Index receives new data from L1_Index and rebuilds its own index according to the received new data; every time L2_Index receives new data, it must rebuild its own index. When the data volume in L2_Index reaches 4,000,000, L2_Index passes all of its current new data to L3_Index, triggering L3_Index to rebuild its index, and, once the transfer is complete, deletes the transferred new data from L2_Index.
As the above example shows, after an upper-level index has passed data to a lower-level index, the data is deleted from the upper-level index, which guarantees that data in the multi-level index is not duplicated. In addition, the threshold of each level can be set according to actual needs, but it is normally set below the data capacity of that level, so that when the data volume of a level reaches its threshold and the data is being transferred, the level still has room to receive new data from the level above it.
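The threshold-driven hand-off described above can be sketched as follows. This is a minimal illustration only, not the patented implementation: the names `Level`, `MultiLevelIndex`, `receive`, and the `rebuilds` counter are hypothetical, a rebuild is reduced to incrementing a counter, and the thresholds are scaled down from the 1,000,000 / 4,000,000 / 10,000,000 figures in the example.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    """One level of the in-memory index (hypothetical structure)."""
    name: str
    threshold: int                             # hand-off threshold, below capacity
    docs: dict = field(default_factory=dict)   # doc_id -> document content
    rebuilds: int = 0                          # how many times this level was rebuilt

    def rebuild(self):
        # Stand-in for a real index rebuild.
        self.rebuilds += 1


class MultiLevelIndex:
    def __init__(self, levels, disk):
        self.levels = levels
        self.disk = disk   # assumed sink for data leaving the last level

    def receive(self, new_docs):
        """New data always enters at the first level."""
        self._accept(0, new_docs)

    def _accept(self, i, new_docs):
        if i == len(self.levels):
            self.disk.update(new_docs)
            return
        level = self.levels[i]
        level.docs.update(new_docs)
        level.rebuild()                        # every arrival triggers a rebuild
        if len(level.docs) >= level.threshold:
            handed_off = dict(level.docs)
            self._accept(i + 1, handed_off)    # pass everything to the next level
            level.docs.clear()                 # then delete it here, so no duplicates


# Usage: thresholds scaled down to 2, 4, and 8 documents.
disk = {}
index = MultiLevelIndex(
    [Level("L1_Index", 2), Level("L2_Index", 4), Level("L3_Index", 8)], disk
)
for doc_id in range(1, 6):
    index.receive({doc_id: f"document {doc_id}"})
```

Because L1_Index holds the least data, it is rebuilt most often but at the lowest cost, while the larger levels are rebuilt only when a hand-off reaches them.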
The disk index subsystem 40 is connected to the data distribution subsystem 20. At the end of each disk update cycle, the data distribution subsystem 20 sends all of the new data it has stored that was received within that disk update cycle to the disk index subsystem 40, and the disk index subsystem 40 rebuilds the disk index according to the received new data. Thus, each time a disk update cycle elapses, the disk index subsystem 40 rebuilds the disk index according to the new data sent by the data distribution subsystem 20, so that the information in the disk index subsystem 40 is updated in a timely manner. Correspondingly, because every item of new data stored in the multi-level index memory subsystem 30 has a corresponding reception time, at the end of each disk update cycle the multi-level index memory subsystem 30 deletes the new data received within that disk update cycle, together with the corresponding index entries, from the multi-level index, so as to prevent the data in the multi-level index memory subsystem 30 from duplicating the data in the disk index subsystem 40.
The data receiving subsystem 10 further comprises a data detection module 11 and a data merging module 12, which are connected to each other. The data detection module 11 is configured to detect new data in each detection cycle. The length of the detection cycle can be set according to actual needs; according to the set detection cycle, the data detection module 11 checks in each cycle whether new data has arrived. The data merging module 12 is configured to merge the detected batches when the data detection module 11 detects multiple batches of new data within one detection cycle, and to provide the merged new data to the data distribution subsystem 20. By setting a detection cycle and merging the batches of new data within each cycle, the present invention provides new data to the data distribution subsystem 20 at a fixed period, which reduces the time jitter of the multi-level index memory subsystem 30 in index rebuilding. For example, with a detection cycle of 2 seconds, if two batches of new data A and B are detected within the 2 seconds, they are merged and then provided to the data distribution subsystem 20, which triggers the multi-level index memory subsystem 30 to perform a single index rebuild. In the prior art, by contrast, new data A triggers one index rebuild and new data B triggers another. The fluctuation of index rebuild time in the present invention is therefore smaller.
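A minimal sketch of per-cycle batch merging follows. It assumes each batch is a dictionary with optional `delete`, `add`, and `update` entries; the names `merge_batches`, `DataReceiver`, `on_batch`, and `end_of_cycle` are hypothetical, and how conflicting operations on the same document across batches should be resolved is not specified in the text and is not handled here.

```python
def merge_batches(batches):
    """Combine several batches of new data into one batch."""
    merged = {"delete": [], "add": {}, "update": {}}
    for batch in batches:
        merged["delete"].extend(batch.get("delete", []))
        merged["add"].update(batch.get("add", {}))
        merged["update"].update(batch.get("update", {}))
    return merged


class DataReceiver:
    """Buffers batches during a detection cycle and forwards one merged batch."""

    def __init__(self, distribute):
        self.distribute = distribute   # callback standing in for subsystem 20
        self.pending = []

    def on_batch(self, batch):
        self.pending.append(batch)

    def end_of_cycle(self):
        # Called once per detection cycle, for example every 2 seconds.
        if not self.pending:
            return                     # nothing arrived in this cycle
        merged = merge_batches(self.pending)
        self.pending = []
        self.distribute(merged)        # one hand-off, hence one rebuild downstream


# Usage: batches A and B arrive within the same cycle and are forwarded together.
forwarded = []
receiver = DataReceiver(forwarded.append)
receiver.on_batch({"add": {1: "A"}})
receiver.on_batch({"add": {2: "B"}, "delete": [9]})
receiver.end_of_cycle()
```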
The data distribution subsystem 20 further comprises: a disk update cycle setting module 21, a sequence file generation module 22, and a data distribution module 23. The disk update cycle setting module 21 is configured to set the disk update cycle. The sequence file generation module 22 is configured to generate a sequence file composed of document IDs according to the deleted-document list, the newly added document list, and the updated-document list contained in the received new data. The sequence file also records the state of each document: for a document to be deleted, the state corresponding to its document ID is "deleted"; for a newly added or updated document, the state corresponding to its document ID is "valid". The data distribution module 23, connected to the disk update cycle setting module 21 and the sequence file generation module 22, is configured to send the received new data to the multi-level index memory subsystem 30 and, according to the set disk update cycle, at the end of each disk update cycle to send the new data received within that cycle together with the generated sequence file to the disk index subsystem 40.
As shown in Fig. 4, the data distribution flow implemented by the data distribution subsystem 20 in the present invention mainly comprises the following steps:
Step 401: generate a sequence file according to the received new data.
The data distribution subsystem 20 generates the sequence file according to the deleted-document list, the newly added document list, and the updated-document list contained in the new data; the sequence file contains each document ID in the new data and the corresponding document state.
Step 402: determine, according to the set disk update cycle, whether the disk index needs to be rebuilt; if so, go to step 403; otherwise, go to step 404.
According to the set disk update cycle, the data distribution subsystem 20 takes the end of each disk update cycle as the time point for rebuilding the disk index. When that time point is reached, it determines that the disk index needs to be rebuilt; at all other times it determines that the disk index does not need to be rebuilt.
Step 403: send the new data received within the current disk update cycle and the generated sequence file to the disk index subsystem 40, trigger the disk index subsystem 40 to rebuild the disk index, and then end the current flow.
From the sequence file, the disk index subsystem 40 learns which documents need to be added, deleted, or updated; after performing the corresponding add, delete, and update operations, it rebuilds the disk index.
Step 404: send the new data to the multi-level index memory subsystem 30.
When new data arrives at the data distribution subsystem 20, the data distribution subsystem 20 forwards the new data to the multi-level index memory subsystem 30, triggering the multi-level index memory subsystem 30 to rebuild the in-memory index.
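Steps 401 to 404 can be sketched as below. This is an illustrative reading, not the patented implementation: the flow is modeled as two events (new data arriving, and a disk update cycle ending), the sequence file is modeled as an in-memory list of `(doc_id, state)` pairs, and the names `build_sequence_file`, `Distributor`, `memory_sink`, and `disk_sink` are hypothetical.

```python
def build_sequence_file(new_data):
    """Step 401: list each document ID in the new data with its state."""
    entries = [(doc_id, "deleted") for doc_id in new_data.get("delete", [])]
    entries += [(doc_id, "valid") for doc_id in new_data.get("add", {})]
    entries += [(doc_id, "valid") for doc_id in new_data.get("update", {})]
    return entries


class Distributor:
    def __init__(self, memory_sink, disk_sink):
        self.memory_sink = memory_sink   # stands in for memory subsystem 30
        self.disk_sink = disk_sink       # stands in for disk index subsystem 40
        self.cycle_data = []             # new data received in the current cycle
        self.cycle_sequence = []         # sequence entries for the current cycle

    def on_new_data(self, new_data):
        # Step 401: extend the sequence file for this disk update cycle.
        self.cycle_sequence.extend(build_sequence_file(new_data))
        self.cycle_data.append(new_data)
        # Step 404: forward to the in-memory index, which then rebuilds.
        self.memory_sink(new_data)

    def on_cycle_end(self):
        # Steps 402 and 403: the cycle has ended, so send everything received
        # in it, plus the sequence file, to the disk index and start afresh.
        self.disk_sink(self.cycle_data, self.cycle_sequence)
        self.cycle_data, self.cycle_sequence = [], []


# Usage: one batch arrives, then the disk update cycle ends.
to_memory, to_disk = [], []
distributor = Distributor(
    to_memory.append, lambda data, seq: to_disk.append((data, seq))
)
distributor.on_new_data({"delete": [3], "add": {1: "x"}, "update": {2: "y"}})
distributor.on_cycle_end()
```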
The multi-level index memory subsystem 30 in the present invention further comprises a data transfer module 31 and an index rebuilding module 32. The data transfer module 31 is configured to pass the new data from the data distribution subsystem 20 level by level from the first-level index to the indexes of the other levels; the level-by-level transfer has been described in detail above and is not repeated here. The index rebuilding module 32, connected to the data transfer module 31, is configured to rebuild the index of each level according to the new data received by that level.
In addition, to ensure that normal information search service can still be provided to users while the indexes in the multi-level index memory subsystem 30 are being rebuilt, the index of each level in the present invention consists of a primary index and a backup index, so that while one of them is being rebuilt the other can be selected to serve users. Accordingly, the multi-level index memory subsystem 30 further comprises an index switching module 33, connected to the index rebuilding module 32, configured to switch service to the other index while one of the primary index and the backup index of a level is being rebuilt, thereby ensuring that service continues normally.
The index switching flow is described in detail below, taking as an example the case in which the primary index is rebuilt while the backup index provides service. As shown in Fig. 5, the flow mainly comprises the following steps:
Step 501: when new data is received, the primary index is rebuilt and an index switch is triggered, so that service is switched to the backup index, which provides service.
For example, L1_Index consists of primary index a and backup index b. When new data arrives at L1_Index, primary index a is rebuilt and all service on L1_Index is switched to backup index b; while primary index a is being rebuilt, backup index b provides service.
Step 502: after the primary index has been rebuilt, determine whether the backup index is still providing service; if so, go to step 503; otherwise, go to step 504.
Step 503: wait for the backup index to finish providing service, and return to step 502, until it is determined that the backup index is no longer providing service.
Step 504: replace the backup index with the rebuilt primary index, after which service is provided by the rebuilt primary index or backup index.
Continuing the example in step 501, rebuilding primary index a yields a new primary index a1. At this point backup index b must also be updated so that the backup index is synchronized with the primary index; the new primary index a1 therefore replaces backup index b, which becomes the new backup index b1.
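Steps 501 to 504 can be sketched for a single level with a master (primary) index and a backup index as follows. This is a sketch under stated assumptions rather than the patented implementation: the class `LevelIndex` and its members are hypothetical, indexes are plain dictionaries, only one rebuild runs at a time, and new queries are assumed to be routed to the rebuilt primary index while in-flight queries on the backup index drain (the text does not say how the backup index stops serving).

```python
import threading


class LevelIndex:
    def __init__(self, docs=None):
        self.primary = dict(docs or {})
        self.backup = dict(self.primary)
        self._serving = "primary"
        self._backup_readers = 0              # queries currently using the backup
        self._cond = threading.Condition()

    def search(self, doc_id):
        with self._cond:
            which = self._serving
            if which == "backup":
                self._backup_readers += 1
        try:
            index = self.backup if which == "backup" else self.primary
            return index.get(doc_id)
        finally:
            if which == "backup":
                with self._cond:
                    self._backup_readers -= 1
                    self._cond.notify_all()

    def rebuild(self, new_docs):
        # Step 501: switch service to the backup, then rebuild the primary.
        with self._cond:
            self._serving = "backup"
        rebuilt = dict(self.primary)
        rebuilt.update(new_docs)              # stand-in for a real rebuild
        with self._cond:
            self.primary = rebuilt
            self._serving = "primary"         # assumption: new queries go here
            # Steps 502 and 503: wait until the backup serves no more queries.
            while self._backup_readers:
                self._cond.wait()
            # Step 504: replace the backup with the rebuilt primary.
            self.backup = dict(rebuilt)


# Usage: after the rebuild, both copies contain the new document.
level = LevelIndex({1: "a"})
level.rebuild({2: "b"})
found = level.search(2)
```

Copying the rebuilt index into the backup keeps the two in sync, at the cost of holding two full copies of each level in memory.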
It should be noted that, in the multi-level index architecture of the present invention, new data is passed to a lower-level index, and the lower-level index rebuilt, only when the data volume of the upper-level index reaches its threshold. Consequently, when a document is deleted or updated in an upper-level index, the change may not be reflected in the lower-level index in time, and a user's search may still return outdated or deleted documents. The present invention therefore uses a global document table to manage the indexes in a unified manner. As shown in Fig. 6, the global document table lists all document IDs of the multi-level index memory subsystem 30 in order. Each document ID corresponds to a document bitmap containing n bits, and each bit indicates whether the document is present in the index of the corresponding level. Taking a 3-level index as an example, the document bitmap of document ID 1 has 3 bits: the first bit corresponds to L1_Index, the second bit to L2_Index, and the third bit to L3_Index. When the first bit is set to 1, document ID 1 is present in L1_Index; when the first bit is set to 0, document ID 1 is not present in L1_Index. The other two bits work in the same way and are not described further.
It follows that when a document is passed between indexes of different levels, the bit corresponding to the index that held the document before the transfer must be set to 0, and the bit corresponding to the index that holds the document after the transfer must be set to 1. For example, in the 3-level index above, when document ID 1 is passed from L1_Index to L2_Index, the first bit of the document bitmap of document ID 1 must be set to 0 and the second bit set to 1. In addition, if document ID 1 is present in all 3 levels and is then updated in L1_Index, the second and third bits of its document bitmap must be set to 0 to indicate that document ID 1 in L2_Index and L3_Index is invalid, which prevents the pre-update version of document ID 1 from being returned in a user's search. When a delete operation is performed on document ID 1, document ID 1 and its document bitmap must be cleared from the global document table.
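The bitmap bookkeeping described above can be sketched as follows. The class `GlobalDocTable` and its method names are hypothetical, and each bitmap is stored as a Python integer in which bit k-1 stands for level k; the text does not prescribe a storage layout.

```python
class GlobalDocTable:
    def __init__(self, n_levels):
        self.n_levels = n_levels
        self.bitmaps = {}   # doc_id -> int bitmap; bit k-1 represents level k

    @staticmethod
    def _bit(level):
        return 1 << (level - 1)

    def add(self, doc_id, level=1):
        """Mark the document as present in the given level."""
        self.bitmaps[doc_id] = self.bitmaps.get(doc_id, 0) | self._bit(level)

    def move(self, doc_id, src, dst):
        """Clear the source level's bit and set the destination level's bit."""
        bitmap = self.bitmaps[doc_id]
        self.bitmaps[doc_id] = (bitmap & ~self._bit(src)) | self._bit(dst)

    def update_in(self, doc_id, level):
        """After an update in `level`, copies in all other levels are stale."""
        self.bitmaps[doc_id] = self._bit(level)

    def delete(self, doc_id):
        """Remove the document ID and its bitmap from the table."""
        self.bitmaps.pop(doc_id, None)

    def is_valid(self, doc_id, level):
        """A search hit in `level` counts only if the matching bit is set."""
        return bool(self.bitmaps.get(doc_id, 0) & self._bit(level))


# Usage: document ID 1 moves from L1_Index to L2_Index in a 3-level index.
table = GlobalDocTable(3)
table.add(1)
table.move(1, src=1, dst=2)
```

A search would consult `is_valid` for each hit, so stale copies in lower levels are filtered out without waiting for those levels to be rebuilt.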
Because documents in the multi-level index memory subsystem 30 are added, deleted, and updated very frequently, the present invention manages the documents in the multi-level index memory subsystem 30 with a circular file, in which the various document operations are implemented conveniently by moving the head pointer and the tail pointer of the circular file. This is further described with reference to Fig. 7, in which the head pointer (Head) and the tail pointer (Tail) delimit the valid range of documents, and the original documents are document 1, document 2, document 3, document 4, document 5, and document 6. To add document 7, it suffices to advance Tail by one position and insert the content of document 7 into the circular file in order. To delete document 1, document 2, and document 3, the documents need not be physically removed; it suffices to advance Head by three positions so that it points to document 4. If document 8 and document 9 are then to be added, the documents are again inserted in order and Tail is moved accordingly; when the end of the circular file is reached, insertion continues from the beginning of the circular file. In addition, if document 5 in the circular file is to be deleted, Head and Tail need not be moved; using the global document table shown in Fig. 6, every bit of the document bitmap of document 5 is set to 0 to indicate that document 5 is invalid.
With the circular file architecture, documents can be managed conveniently simply by moving Head and Tail as documents are deleted, added, and updated. Compared with the prior-art approach of creating a new file and copying the valid documents into it, management by circular file does not consume excessive document management time.
In terms of hardware, a management module 34 is added to the multi-level index memory subsystem 30 of the present invention. The management module 34 is connected to the data transfer module 31 and the index rebuilding module 32, and is configured to manage the documents in the multi-level index memory subsystem 30 by means of the circular file and to manage the indexes in the multi-level index memory subsystem 30 by means of the global document table.
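The Head and Tail movements can be illustrated with a small in-memory ring buffer. This is a sketch only: the class `CircularFile` and its methods are hypothetical, the file is modeled as a fixed-size list, and `tail` here marks the next free slot rather than the last document. Deleting a document in the middle of the valid range (such as document 5) is left to the global document table and is not modeled.

```python
class CircularFile:
    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0     # position of the oldest valid document
        self.tail = 0     # position of the next free slot
        self.count = 0    # number of valid documents

    def append(self, doc):
        """Insert a document at Tail, wrapping around at the end of the file."""
        if self.count == len(self.slots):
            raise OverflowError("circular file is full")
        self.slots[self.tail] = doc
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1

    def drop_oldest(self, n):
        """Delete the n oldest documents by advancing Head; nothing is copied."""
        n = min(n, self.count)
        self.head = (self.head + n) % len(self.slots)
        self.count -= n

    def valid_docs(self):
        """Documents between Head and Tail, oldest first."""
        size = len(self.slots)
        return [self.slots[(self.head + i) % size] for i in range(self.count)]


# Usage mirroring the walkthrough: documents 1 to 6, add 7, delete 1 to 3,
# then add 8 and 9 (document 9 wraps around to the start of the file).
cf = CircularFile(8)
for doc in range(1, 7):
    cf.append(doc)
cf.append(7)
cf.drop_oldest(3)
cf.append(8)
cf.append(9)
```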
In addition, in the present invention the disk index subsystem 40 can also serve as the next level below the multi-level index memory subsystem 30. That is, when the data volume of the last-level index of the multi-level index memory subsystem 30 reaches the set threshold, the new data in the last-level index is passed to the disk index subsystem 40, triggering the disk index subsystem 40 to rebuild the disk index; once the transfer is complete, the transferred new data is deleted from the last-level index to avoid duplication between the data in the multi-level index memory subsystem 30 and the data in the disk index subsystem 40. In this embodiment a disk index update cycle no longer needs to be set, and the data distribution subsystem 20 no longer needs to send new data to the disk index subsystem 40 periodically to trigger a disk index rebuild. The disk index subsystem 40 only needs to rebuild the disk index when new data arrives from the multi-level index memory subsystem 30, which simplifies the data distribution operation of the data distribution subsystem 20.
By means of the multi-level index memory subsystem 30, the present invention rebuilds the index of each level separately. Because the data capacity of L1_Index is normally small, L1_Index can be rebuilt quickly; when new data arrives at the multi-level index memory subsystem 30, the rebuild of L1_Index completes in a very short time, so the data can be found in L1_Index after only a very short delay. For knowledge search, a question posed by a user can be retrieved by the search engine almost immediately, which speeds up the resolution of questions and raises the probability that they are resolved. For news search, breaking news can be found quickly, which satisfies users' requirements for search timeliness and improves the user experience.
The above are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention.