CN102103602B

CN102103602B - System and method for increasing retrieval speed

Info

Publication number: CN102103602B
Application number: CN 200910242857
Authority: CN
Inventors: 唐年鹏; 黄耀豪
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Shenzhen Shiji Guangsu Information Technology Co Ltd
Priority date: 2009-12-17
Filing date: 2009-12-17
Publication date: 2013-02-27
Anticipated expiration: 2029-12-17
Also published as: CN102103602A

Abstract

The invention discloses a system for increasing retrieval speed. In the system, an index process implementation unit implements the indexing process and a retrieval process implementation unit implements the retrieval process. The two units run in different processes, so the indexing procedure and the retrieval procedure are independent of each other. The invention also discloses a method for increasing retrieval speed, which comprises separating indexing from retrieval and running the two in different processes. With this system and method, users can retrieve key information through the index more quickly.

Description

A system and method for increasing retrieval speed

Technical field

The present invention relates to retrieval technology, and in particular to a system and method for increasing retrieval speed.

Background

The arrival of the Internet era set off an information revolution. Whoever commands a large amount of information holds an advantage in daily and commercial life, but the volume of information is so large that how to pick out the useful key information has become a constant topic of discussion. At present, users generally rely on retrieval technology to obtain key information, and the quality of the retrieval technology directly determines the quantity, quality, and freshness of the key information retrieved. Existing index-based retrieval technology generally has the following three characteristics:

First, indexing and retrieval are placed in a single process. Because the two coexist in one process and are not split into separate processes, they affect each other's performance. That is, indexing consumes a large share of CPU and becomes a bottleneck for retrieval performance.

Second, one retrieval instance corresponds to exactly one index database. Because indexing and retrieval are not separated and there is no synchronized switching mechanism, the current retrieval instance cannot serve requests while the index is being synchronized; that is, the retrieval service is interrupted during indexing. In addition, even when only a single record is updated, the whole index must be rebuilt and updated, which limits the speed of data updates.

Third, a single thread retrieves one index database with a large data volume, or several index databases. An oversized inverted list increases the amount of computation, so the response time of a single retrieval grows accordingly and overall retrieval performance is poor.

Summary of the invention

In view of this, the main purpose of the present invention is to provide a system and method for increasing retrieval speed, so that users can retrieve key information through the index more quickly.

To achieve the above purpose, the technical solution of the present invention is implemented as follows:

A system for increasing retrieval speed comprises an index process implementation unit and a retrieval process implementation unit, wherein:

The index process implementation unit is configured to implement the indexing process;

The retrieval process implementation unit is configured to implement the retrieval process;

The index process implementation unit and the retrieval process implementation unit run in different processes, and their respective indexing and retrieval procedures are independent of each other.

Wherein, the index process implementation unit and the retrieval process implementation units are in a one-to-many correspondence;

The index process implementation unit is further configured to synchronize its stored data over the network to the corresponding plurality of retrieval process implementation units, so that the data is updated synchronously; the retrieval process implementation units corresponding to the same index process implementation unit store identical data and serve as backups for one another.

Wherein, the index process implementation unit further comprises a data receiving thread, a rebuild control thread, an index rebuild thread, an update thread, and an index data sending thread, wherein:

The data receiving thread is configured to receive the index data source;

The index rebuild thread is configured to rebuild, under the scheduling of the rebuild control thread, the index database that has been built from the index data source;

The rebuild control thread is configured to schedule and control the index rebuilds performed by the index rebuild thread;

The update thread is configured to start the index data update operation after receiving an index data source update notification sent by the data receiving thread;

The index data sending thread is configured to, after the index database has been built, send all index data in the index database to the retrieval process implementation unit by network synchronization, whereupon processing passes from the current index process to the retrieval process.

Wherein, the rebuild control thread is further configured to generate tasks that need rebuilding according to index-related indicators, where the index-related indicators comprise at least one of: the build status of the index database, the quantity of index data, or a time interval.

Wherein, the rebuild control thread is further configured to push each task that needs rebuilding into a pending-rebuild queue, and to delete the task when it detects completion information in the rebuild-completed queue;

The index rebuild thread is further configured to, upon detecting a task that needs rebuilding in the pending-rebuild queue, take the task out of the pending-rebuild queue and rebuild it, and, after the rebuild is finished, push completion information into the rebuild-completed queue, thereby completing one rebuild.

Wherein, the retrieval process implementation unit further comprises a receiving thread, a retrieval splitting thread, retrieval processing threads, and a return thread, wherein:

The receiving thread is configured to receive retrieval requests and push multiple retrieval requests into a retrieval request queue;

The retrieval splitting thread is configured to split the retrieval request queue, dispatching the retrieval requests in the queue to each block of the index database;

There are a plurality of retrieval processing threads, which are configured to process the retrieval requests block by block against each block of the index database;

The return thread is configured to, after the retrieval requests have been processed block by block, merge the per-block results again, load them into a retrieval return queue, and return them.

Wherein, the index database comprises a historical index database and a new data index database; the historical index database is used for serving retrieval, and the new data index database is used when index data is updated.

Wherein, the index data in both the historical index database and the new data index database is stored in blocks;

Wherein, the historical index database is partitioned by modulo and stored in units of large blocks; the new data index database is partitioned by time and stored in units of small blocks.

A method for increasing retrieval speed comprises: separating indexing from retrieval, with indexing and retrieval running in different processes.

Wherein, indexing and retrieval are in a one-to-many relationship;

The retrieval instances corresponding to the same index store identical data and serve as backups for one another; the data stored in the current index is synchronized over the network to the corresponding retrieval instances, so that the data is updated synchronously.

Wherein, the indexing procedure specifically comprises:

Building an index database from the index data source received by the data receiving thread;

The index rebuild thread rebuilding the established index database under the scheduling of the rebuild control thread;

The update thread starting the index data update operation after receiving the index data source update notification sent by the data receiving thread;

The index data sending thread, after the index database has been built or rebuilt, providing all index data in the index database to the retrieval service by network synchronization, whereupon processing passes from the current index process to the retrieval process.

Wherein, the index rebuild thread rebuilding the established index database specifically comprises:

Generating tasks that need rebuilding according to index-related indicators, where the index-related indicators comprise at least one of: the build status of the index database, the quantity of index data, or a time interval;

The rebuild control thread pushing each task that needs rebuilding into the pending-rebuild queue; the index rebuild thread, upon detecting a task that needs rebuilding in the pending-rebuild queue, taking the task out of the queue and rebuilding it;

The index rebuild thread, after finishing the rebuild, pushing completion information into the rebuild-completed queue, thereby completing one rebuild; the rebuild control thread deleting the task when it detects the completion information in the rebuild-completed queue.

Wherein, the retrieval procedure specifically comprises:

The receiving thread receiving retrieval requests and pushing multiple retrieval requests into the retrieval request queue;

The retrieval splitting thread splitting the retrieval request queue, dispatching the retrieval requests in the queue to each block of the index database;

A plurality of retrieval processing threads processing the retrieval requests block by block against each block of the index database;

The return thread, after the retrieval requests have been processed block by block, merging the per-block results again, loading them into the retrieval return queue, and returning them.

Wherein, the index database comprises a historical index database and a new data index database;

Both the historical index database and the new data index database are partitioned into blocks: the historical index database is partitioned by modulo into multiple large blocks, and the new data index database is partitioned by time into multiple small blocks. The present invention separates indexing from retrieval, and indexing and retrieval run in different processes.

With the present invention, because indexing is separated from retrieval and the two run in different processes, the indexing procedure and the retrieval procedure are independent of each other. Indexing therefore does not affect retrieval performance, which increases retrieval speed.

Brief description of the drawings

Fig. 1 is a schematic diagram of the system architecture with indexing separated from retrieval, according to example one of the present invention;

Fig. 2 is a schematic diagram of indexing according to example two of the present invention;

Fig. 3 is a schematic structural diagram of the index database according to example three of the present invention;

Fig. 4 is a schematic structural diagram of data updating in the index database of Fig. 3;

Fig. 5 is a schematic diagram of the scheduling control of index rebuilding according to example four of the present invention;

Fig. 6 is a schematic diagram of retrieval according to example five of the present invention.

Detailed description

The basic idea of the present invention is to separate indexing from retrieval, with indexing and retrieval running in different processes.

The implementation of the technical solution is described in further detail below with reference to the accompanying drawings.

The present invention is a scheme for increasing retrieval speed. It works mainly by dividing the index database into categories and separating indexing from retrieval, which enables fast retrieval.

Furthermore, besides low retrieval speed and poor overall retrieval performance, the prior art suffers from a mismatch between retrieval performance and data update performance: when retrieval is fast, new data is updated slowly; or, when new data is updated quickly, retrieval is slow or its speed fluctuates. The present invention also addresses this mismatch. On top of increasing retrieval speed, it also increases data update speed, mainly by partitioning the index database into blocks, with the historical index database blocks and the new data index database blocks independent of each other, so that new data can be updated quickly. In short, the present invention can significantly increase retrieval speed, significantly increase the speed at which new data is updated, and still maintain a high retrieval speed while new data is being updated rapidly.

The present invention is described below by way of example.

It should be noted that, in the overall system framework with indexing separated from retrieval, the logical structure is divided into several main parts: the part that implements the retrieval process, namely the retrieval process implementation unit, and the part that implements the indexing process, namely the index process implementation unit. That is, indexing and retrieval are logically separated to enable fast retrieval. The logical structure may also include a part for retrieval proxying, namely the retrieval agent implementation unit, and a part for retrieval caching, namely the retrieval cache implementation unit. Moreover, unlike the one-to-one relationship between retrieval and index in the prior art, the present invention adopts a many-to-one relationship between retrieval and index: multiple retrieval instances correspond to one index and back each other up. This avoids the defects of the prior art, in which retrieval and index are one-to-one and their operations are not separated, such as slow updates and interruption of the retrieval service during indexing.

Many physical structures can be built on the above logical structure; only the example shown in Fig. 1 is described below.

Example one: separation of indexing and retrieval.

Fig. 1 is a schematic diagram of the system architecture with indexing separated from retrieval. In Fig. 1, the retrieval process implementation units are retrieval 00, retrieval 01, retrieval 10, and retrieval 11; the index process implementation units are index 0 and index 1; the retrieval agent implementation unit is the retrieval agent; and the retrieval cache implementation unit is the retrieval cache.

Index 0 and index 1 each index different data. Retrieval 00 and retrieval 01 hold two identical copies of the index data provided by index 0. Retrieval 10 and retrieval 11 hold two identical copies of the index data provided by index 1. A corresponding index and its retrieval instances, for example retrieval 00, retrieval 01, and index 0, are synchronized over the network to keep the index data up to date and consistent. Note that Fig. 1 only illustrates one concrete physical structure based on the logical division described above. In practice, the arrangement is not limited to two retrieval instances per index: any number of retrieval instances may correspond to one index and back each other up, and they may be deployed on one machine or spread across several machines. Separating indexing from retrieval lets the two exist in different processes, each with its own processing, so they do not affect each other's performance, and retrieval performance improves.
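The one-to-many relationship between an index and its retrieval instances can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names `IndexNode` and `RetrievalReplica` are hypothetical, the two roles are modeled as plain objects rather than separate operating-system processes, and the network synchronization described above is replaced by a direct method call.

```python
class RetrievalReplica:
    """Stands in for one retrieval process; holds its own copy of the index data."""

    def __init__(self, name):
        self.name = name
        self.index = {}

    def load(self, snapshot):
        # Replace the served index with a full copy of the snapshot.
        self.index = {term: list(docids) for term, docids in snapshot.items()}

    def search(self, term):
        return self.index.get(term, [])


class IndexNode:
    """Stands in for one index process that feeds several retrieval replicas."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.index = {}

    def add(self, docid, terms):
        for term in terms:
            self.index.setdefault(term, []).append(docid)

    def sync(self):
        # Assumption: a direct call replaces the network transfer in this sketch.
        for replica in self.replicas:
            replica.load(self.index)


retrieval_00 = RetrievalReplica("retrieval 00")
retrieval_01 = RetrievalReplica("retrieval 01")
index_0 = IndexNode([retrieval_00, retrieval_01])
index_0.add(1, ["fast", "search"])
index_0.add(2, ["search"])
index_0.sync()
```

Because each replica keeps its own copy, work done on the index side does not change what a replica serves until the next `sync()`, which mirrors the independence the example describes.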

Example two: Fig. 2 is a schematic diagram of indexing in this example, which mainly comprises a data receiving thread, an index data sending thread, a rebuild control thread, index rebuild threads, and an update thread. The update thread can update various kinds of information, mainly numerical information and document deletion information. The rebuild control thread and the index rebuild threads together implement the scheduling control of index rebuilding, which is described in detail in example four below.

As can be seen from Fig. 2, the data receiving thread on the one hand receives the index data source, providing the data from which the index database is subsequently built; on the other hand, it continuously receives subsequent new index data sources and notifies the update thread to start updating the index data. The index rebuild thread performs index rebuilds under the scheduling of the rebuild control thread, rebuilding an old index database into a new one. The rebuild control thread schedules and controls index rebuilds. The update thread starts a data update after receiving the notification from the data receiving thread. The index data sending thread, after the index database has been built or rebuilt, sends all index data in the index database to the retrieval part shown in Fig. 1 by network synchronization, whereupon processing passes from the current index process to the retrieval process. Note that every mention of "index database" here refers to an index database comprising both the historical index database and the new data index database.

In Fig. 2, the index rebuild threads comprise index rebuild thread 0 and index rebuild thread 1. In Fig. 2, index rebuild thread 1 rebuilds the index between the old historical index database and the new historical index database, and index rebuild thread 0 rebuilds the index between the old new data index database and the new new data index database; practical applications are not limited to this assignment. Index rebuild thread 0 and index rebuild thread 1 are interchangeable; for example, index rebuild thread 0 may perform the rebuild between the old and new historical index databases. In Fig. 2, the old historical index database is shown below index rebuild thread 1 and the new historical index database above it; likewise, the old new data index database is shown below index rebuild thread 0 and the new new data index database above it. Here, "old" refers to an index database that has not yet been processed; for example, the old new data index database is the unprocessed index database as initially built. "New" refers to an index database whose processing has finished; for example, the new historical index database is the index database as it stands after the update has been processed.

Both the historical index database and the new data index database are partitioned into blocks. The difference is that the historical index database is divided into multiple blocks in units of "large blocks", partitioned by modulo, whereas the new data index database is divided into multiple blocks in units of "small blocks", partitioned by time. The purpose of partitioning is to make full use of multiple CPUs to process multiple tasks separately. "Large block" partitioning relates to retrieval: multiple CPUs each process a different "large block", which increases retrieval speed. "Small block" partitioning relates to data updating: multiple CPUs each process a different "small block", which increases data update speed. Partitioning not only allows tasks to be processed separately, but also divides what in the prior art is one whole storage area into individual block storage areas; because the region handled during an index rebuild or data update becomes smaller, retrieval and data updating become faster. The number of "small blocks" used for data updating is limited. When the "small blocks" are nearly full, to keep data updates fast, the data in the already processed "small blocks" of the new data index database is merged into the historical index database divided into "large blocks", and those "small blocks" are emptied. The structure of the partitioned historical index database and new data index database is described in detail in example three below, with reference to Fig. 3 and Fig. 4.

Example three: Fig. 3 is a schematic structural diagram of the index database in this example. The index database is divided into two main parts, the historical index database and the new data index database. The historical index database is partitioned into several "large blocks" by Docid modulo, and each block holds roughly the same amount of data. The new data index database is partitioned into several "small blocks" by time, and every block has the same maximum capacity. Data is assigned to blocks in order of arrival, and some blocks are allowed to be empty. The historical index database and the new data index database are configured according to the specific characteristics of the data.
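The two partitioning rules in this example can be sketched as follows. This is a minimal illustration, not the patented implementation: `historical_block` and `NewDataBlocks` are hypothetical names, and the small blocks are modeled with a simple document-count capacity that fills in arrival order, which is one assumed reading of "partitioned by time" with an equal maximum capacity.

```python
def historical_block(docid, num_blocks):
    """Choose the large block for a document by Docid modulo."""
    return docid % num_blocks


class NewDataBlocks:
    """Small blocks of equal maximum capacity, filled in order of arrival."""

    def __init__(self, num_blocks, capacity):
        self.capacity = capacity
        self.blocks = [[] for _ in range(num_blocks)]

    def append(self, docid):
        # Fill the earliest block that still has room; later blocks may stay empty.
        for block in self.blocks:
            if len(block) < self.capacity:
                block.append(docid)
                return
        raise RuntimeError("all small blocks are full; merge into the historical database first")


# Ten historical documents spread over four large blocks.
historical = [[] for _ in range(4)]
for docid in range(10):
    historical[historical_block(docid, 4)].append(docid)

# Three newly arrived documents placed into small blocks of capacity 2.
new_data = NewDataBlocks(num_blocks=3, capacity=2)
for docid in (10, 11, 12):
    new_data.append(docid)
```

Modulo partitioning keeps the large blocks roughly equal in size, while arrival-order filling confines each update to one small block.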

The configuration is expressed as the number of databases, and it depends mainly on the data volume, the number of CPUs on the machine, the disk and memory capacity, the characteristics of the data, and the required data update speed. The principle is to make full use of machine resources. Relative to the historical index database, the new data index database can also be called the small database; the more data the small database holds, the faster data updates can be.

Regarding the splitting of the historical index database: it is cut into several "large blocks" only after the final index has been built, and the historical index database and the new data index database are completely independent of each other. In both the historical index database and the new data index database, the index data comprises the inverted index, inverted lists, deletion information, numerical information, database information, and other data. The final inverted list is divided into two blocks: the first holds the better data and the second holds the poorer data. Better and poorer data can be distinguished by grading according to text relevance or the numerical attributes of the document. The division into better and poorer data in the index database structure is not limited to two blocks; there may be more. The division may also be based on a composite score, which can be computed as a composite text weight from the document's numerical values, text content, and so on.
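As a rough illustration of dividing an inverted list into a better block and a poorer block, the sketch below splits postings by a score threshold. The function name `split_posting_list`, the `(docid, score)` representation, and the threshold value are all assumptions made for this example; the text does not specify how the grading score is computed.

```python
def split_posting_list(postings, threshold):
    """Split (docid, score) postings into a better block and a poorer block."""
    better = [docid for docid, score in postings if score >= threshold]
    poorer = [docid for docid, score in postings if score < threshold]
    return better, poorer


# Hypothetical scores; in practice they might come from text relevance
# or a composite of numerical attributes.
postings = [(7, 0.9), (3, 0.2), (12, 0.6), (5, 0.4)]
better, poorer = split_posting_list(postings, threshold=0.5)
```

Splitting into more than two blocks would follow the same pattern with several thresholds.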

Fig. 4 is a schematic structural diagram of data updating in the index database of Fig. 3. It shows that when the "small blocks" of the new data index database are nearly full, the data in the processed "small blocks" is merged into the historical index database and those "small blocks" are emptied, so that building of the new data index database can start again.

In Fig. 3, left-slanted hatching denotes each "large block" in the historical index database; diagonal cross-hatching denotes "small blocks" in the new data index database whose processing has finished; and right-slanted hatching denotes the "small block" in the new data index database that is being processed. In Fig. 4, left-slanted hatching denotes each initial "large block" in the historical index database; horizontal-and-vertical cross-hatching denotes the new "large blocks" formed in the historical index database after the finished "small blocks" of the new data index database have been merged in; diagonal cross-hatching denotes the "small blocks" that have been emptied after being merged into the historical index database; and right-slanted hatching denotes the "small block" in the new data index database that is being processed.

With reference to Fig. 3 and Fig. 4, index building proceeds as follows:

Suppose Fig. 3 shows that the new data index database has been built up to "small block" 2, so that more than half of the total of 3 blocks are in use. Then Fig. 4 shows that the historical index database has also indexed the data of blocks 0 and 1 of the new data index database and has been cut into the four blocks indicated by the arrows at the bottom of Fig. 4. The resources previously held by blocks 0 and 1 of the new data index database can now be used to index new data, which ensures that new data can be indexed immediately. After index building for the historical index database and the new data index database has finished, all index data is sent over the network to the retrieval part shown in Fig. 1. Once sending is complete, processing switches from the index process to the retrieval process, and the retrieval part is notified to switch to the latest index data to serve retrieval.
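The merge step walked through above can be sketched like this. It is a simplified illustration under assumptions: `merge_full_blocks` is a hypothetical name, blocks are plain lists of Docids, the trigger is "more than half of the small blocks are full", and merged documents are routed into the existing large blocks by Docid modulo rather than re-cutting the whole historical database.

```python
def merge_full_blocks(historical, small_blocks, capacity):
    """Fold full small blocks into the historical blocks once more than half are full.

    Returns True if a merge happened, False otherwise.
    """
    full = [i for i, block in enumerate(small_blocks) if len(block) >= capacity]
    if len(full) * 2 <= len(small_blocks):
        return False  # not enough full blocks yet
    for i in full:
        for docid in small_blocks[i]:
            historical[docid % len(historical)].append(docid)
        small_blocks[i] = []  # emptied block can accept new data again
    return True


historical = [[0, 4], [1, 5], [2], [3]]
small_blocks = [[8, 9], [10, 11], [12]]   # blocks 0 and 1 are full, block 2 is in progress
merged = merge_full_blocks(historical, small_blocks, capacity=2)
```

After the call, small blocks 0 and 1 are empty and available for new data, while block 2 keeps the documents still being processed.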

Example four: Fig. 5 is a schematic diagram of the scheduling control of index rebuilding in this example. Fig. 5 mainly comprises the rebuild control thread, the pending-rebuild queue, the rebuild-completed queue, and the index rebuild threads. The rebuild control thread generates tasks that need rebuilding according to the index build status, the amount of data, the time interval, and so on.

In this example the two index rebuild threads have equal priority; that is, either thread may be responsible for rebuilding the historical index database or the new data index database. During rebuilding, each level of index database can have only one task running at a time. As Fig. 5 shows, the scheduling control process of index rebuilding is as follows: the rebuild control thread pushes each task that needs rebuilding into the pending-rebuild queue; an index rebuild thread polls the pending-rebuild queue and, when it detects a rebuild task, takes it out of the queue and rebuilds it; after the rebuild is finished, the thread pushes completion information into the rebuild-completed queue, completing one rebuild. The rebuild control thread polls the rebuild-completed queue and, if it detects completion information, deletes the completed task. It then waits for the next rebuild task to begin.
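The two-queue handshake between the rebuild control thread and the index rebuild threads can be sketched with Python's standard `queue` and `threading` modules. The function names, the `None` sentinel used to stop workers, and the assumption that task names are unique are all choices made for this illustration; the sketch also does not enforce the rule that only one task per index database level runs at a time.

```python
import queue
import threading


def rebuild_worker(pending, finished, rebuild):
    """Index rebuild thread: take a task, rebuild it, report completion."""
    while True:
        task = pending.get()          # blocks until a task is available
        if task is None:              # sentinel used by this sketch to stop the worker
            return
        rebuild(task)
        finished.put(task)            # completion information


def run_rebuilds(tasks, rebuild, num_workers=2):
    """Rebuild control thread: enqueue tasks, then retire each one as it completes."""
    pending = queue.Queue()           # pending-rebuild queue
    finished = queue.Queue()          # rebuild-completed queue
    workers = [
        threading.Thread(target=rebuild_worker, args=(pending, finished, rebuild))
        for _ in range(num_workers)
    ]
    for worker in workers:
        worker.start()

    outstanding = set(tasks)          # assumes task names are unique
    for task in tasks:
        pending.put(task)

    completed = []
    while outstanding:
        done = finished.get()
        outstanding.discard(done)     # delete the finished task
        completed.append(done)

    for _ in workers:
        pending.put(None)
    for worker in workers:
        worker.join()
    return completed


rebuilt = []
done = run_rebuilds(["historical", "new_data"], rebuilt.append)
```

Either worker may pick up either task, which matches the equal priority of the two rebuild threads; the completion order is therefore not fixed.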

Example five: Fig. 6 is a schematic diagram of retrieval in this example. Fig. 6 mainly comprises a receiving thread, a return thread, a retrieval request queue, a retrieval return queue, a retrieval splitting thread, and a plurality of retrieval processing threads made up of block processing threads. The receiving thread receives retrieval requests and pushes them into the retrieval request queue in order of arrival. The retrieval splitting thread takes the retrieval requests out of the queue in order and maps each of them to every index database; that is, the retrieval request queue is split across the corresponding index databases. The block processing threads process the retrieval requests against the blocks of each index database, and while processing retrieval requests they can perform an index switch, switching between the active and standby indexes. The return thread collects the data produced by processing the retrieval requests against the blocks of each index database, loads it, and returns it to the front-end retrieval agent. When switching indexes, because the present invention partitions the index database into blocks, each index database is independent and the one-to-one active/standby indexes of the prior art are not needed; for example, 4 historical index databases may correspond to one backup.

As can be seen from Fig. 6, the retrieval process is as follows: a retrieval request is split into retrieval requests for each block of the historical index database and the new data index database. Although the two databases are partitioned differently, into "large blocks" and "small blocks" respectively, the final index files are all independent. The retrieval processing threads are also equal and can each process the data of any block. Each retrieval processing thread competes to take requests from the queue and process them, and the retrieval processing thread that finishes last merges the results and places them in the retrieval return queue to be sent. The intermediate result cache mainly caches the Docid lists of the per-block retrieval results.
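The split, compete, and merge flow above can be sketched as follows. This is an illustration under assumptions: `search_blocks` is a hypothetical name, each block is modeled as a small dictionary from term to Docid list, and for simplicity the calling thread merges the per-block results after all workers have finished, rather than the last worker doing the merge as the text describes.

```python
import queue
import threading


def search_blocks(blocks, term, num_workers=2):
    """Search every block for a term using competing worker threads, then merge."""
    work = queue.Queue()      # one sub-request per block
    results = queue.Queue()   # per-block Docid lists (intermediate results)
    for block_id in range(len(blocks)):
        work.put(block_id)

    def worker():
        # Workers are equal: each takes whichever block is next in the queue.
        while True:
            try:
                block_id = work.get_nowait()
            except queue.Empty:
                return
            results.put(blocks[block_id].get(term, []))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Simplification: the caller merges once all workers are done.
    merged = []
    while not results.empty():
        merged.extend(results.get())
    return sorted(merged)


blocks = [{"a": [0, 4]}, {"a": [1], "b": [5]}, {}, {"a": [3]}]
hits = search_blocks(blocks, "a")
```

Because each block has its own short inverted list, each worker scans far less data than a single thread searching one large index.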

In summary, the present invention reduces computational cost. Processing the partitioned index database with multiple tasks and multiple threads at the same time not only makes full use of the CPU; more importantly, once indexing and retrieval are separated, every inverted list becomes correspondingly shorter, so the amount of computation per thread drops markedly. Overall retrieval performance and data update speed therefore improve quickly.

Note that the retrieval process above uses multithreading, with one retrieval splitting thread corresponding to multiple retrieval processing threads. The threading model may instead be a single thread that searches each database in sequence: because the databases of the present invention are partitioned and each block holds little data, this differs from the prior art, in which a single thread searches one huge complete database, so retrieval speed still improves. Another option is to use multiple retrieval splitting threads, that is, multiple retrieval splitting threads corresponding to one retrieval processing thread; the retrieval splitting threads assign tasks to the retrieval processing thread, receive its results, and send them on after aggregation.

The terms used above with a specific meaning are explained as follows:

Index refers to: encoding the data, performing forward-index and inverted-index processing on it, and finally generating the inverted index and the various inverted files.

Retrieval refers to: providing a retrieval service externally according to the user's query string.

Retrieval agent refers to: merging the results of multiple retrievals.

Retrieval cache refers to: caching the results of retrieval.

Docid refers to: the unique number of a document.

The above are only preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention.
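As an illustration of the terms index and Docid, the sketch below builds a forward index and an inverted index over a few documents. It is only a toy example under stated assumptions: Docids are taken to be list positions, terms are whitespace-separated lowercase words, and the name `build_index` is hypothetical.

```python
from collections import defaultdict


def build_index(documents):
    """Return (forward, inverted) indexes for a list of document strings."""
    # Forward index: Docid -> list of terms in the document.
    forward = {}
    for docid, text in enumerate(documents):
        forward[docid] = text.lower().split()

    # Inverted index: term -> list of Docids containing that term.
    inverted = defaultdict(list)
    for docid, terms in forward.items():
        for term in dict.fromkeys(terms):   # drop repeated terms, keep order
            inverted[term].append(docid)
    return forward, dict(inverted)
```

With the documents `["red apple", "green apple pie"]`, the term `apple` maps to both Docids while `pie` maps only to the second document.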

Claims

1. A system for improving retrieval speed, characterized in that the system comprises: an index process realizing unit and a retrieval process realizing unit; wherein,

the index process realizing unit is used for realizing the index process;

the retrieval process realizing unit is used for realizing the retrieval process;

the index process realizing unit and the retrieval process realizing unit run in different processes, and their respective index realizing process and retrieval realizing process are independent of each other;

wherein the index process realizing unit comprises: a data receiving thread, a rebuild control thread, an index rebuild thread, an update thread, and an index data sending thread; wherein,

the data receiving thread is used for receiving the index data source;

2. The system according to claim 1, characterized in that the index process realizing unit and the retrieval process realizing unit are in a one-to-many correspondence;

3. The system according to claim 1, characterized in that the rebuild control thread is further used for generating tasks that need rebuilding according to index-related indicators; wherein the index-related indicators comprise at least one of: the state of index database construction, the quantity of index data, or a time interval.

4. The system according to claim 3, characterized in that the rebuild control thread is further used for pushing the tasks that need rebuilding into a waiting-to-rebuild queue, and for deleting the task upon detecting completion information in the rebuild-completed queue;

5. The system according to claim 1, characterized in that the retrieval process realizing unit further comprises: a receiving thread, a retrieval splitting thread, a retrieval processing thread, and a return thread; wherein,

6. The system according to any one of claims 1 to 5, characterized in that the index database comprises: a historical index database and a new data index database; wherein the historical index database is used when providing the retrieval service, and the new data index database is used when index data is updated.

7. The system according to claim 6, characterized in that the index data in the historical index database and in the new data index database are both stored in blocks;

8. A method for improving retrieval speed, characterized in that the method comprises: separating index from retrieval, the index and the retrieval running in different processes;

9. The method according to claim 8, characterized in that the index and the retrieval are in a one-to-many relationship;

10. The method according to claim 8, characterized in that the index rebuild thread performing the index rebuild on an already established index database specifically comprises:

11. The method according to claim 8, characterized in that the realizing process of the retrieval specifically comprises:

12. The method according to any one of claims 8 to 11, characterized in that the index database comprises: a historical index database and a new data index database;

the historical index database and the new data index database are both partitioned into blocks; wherein the historical index database is partitioned by modulo and divided into multiple blocks in units of large blocks; and the new data index database is partitioned by time and divided into multiple blocks in units of small blocks.