CN102103602B - System and method for increasing retrieval speed - Google Patents

Publication number
CN102103602B
Authority
CN
China
Prior art keywords
index
thread
retrieval
rebuild
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 200910242857
Other languages
Chinese (zh)
Other versions
CN102103602A (en
Inventor
唐年鹏
黄耀豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shiji Guangsu Information Technology Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN 200910242857
Publication of CN102103602A
Application granted
Publication of CN102103602B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a system for increasing a retrieval speed. In the system, an index process realizing unit is used for realizing an index process; a retrieval process realizing unit is used for realizing a retrieval process; the index process realizing unit and the retrieval process realizing unit run in different processes; and respective index realizing process and retrieval realizing process are independent of each other. The invention also discloses a method for increasing the retrieval speed. The method comprises the following steps of: separating index from retrieval; and running the index and the retrieval in different processes. By adopting the system and the method, the retrieval speed of key information based on the index by a user can be increased.

Description

A system and method for improving retrieval speed
Technical field
The present invention relates to retrieval technology, and in particular to a system and method for improving retrieval speed.
Background
The arrival of the Internet era set off an information revolution. Whoever commands a large amount of information holds an advantage in daily and commercial life, but the volume of information is so large that how to pick out the useful key information has become a constant topic of discussion. At present, user information processing generally relies on retrieval technology to obtain key information, and the quality of the retrieval technology directly determines how well requirements on the quantity, quality, and freshness of the retrieved key information are met. Existing index-based retrieval technology generally has the following three characteristics:
First, indexing and retrieval are placed in the same process. Because the two coexist in one process and are not split into separate processing, their performance affects each other. That is, indexing occupies a large share of the CPU and therefore creates a bottleneck in retrieval performance.
Second, one retrieval corresponds to exactly one index library. Because indexing and retrieval are not separated and there is no synchronized switching mechanism, the current retrieval cannot serve requests while the index is being synchronized; that is, the retrieval service is interrupted during indexing. Moreover, even when only a single record is updated, the whole index must be rebuilt and updated, which limits the speed of data updates.
Third, a single thread searches one index library with a large data volume, or several index libraries. Because an oversized inverted list increases the amount of computation, the response time of a single retrieval increases accordingly, and overall retrieval performance is poor.
Summary of the invention
In view of this, the main object of the present invention is to provide a system and method for improving retrieval speed, which can increase the speed at which a user retrieves key information based on an index.
To achieve the above object, the technical solution of the present invention is realized as follows:
A system for improving retrieval speed, comprising an index process implementation unit and a retrieval process implementation unit, wherein:
the index process implementation unit is configured to implement the index process;
the retrieval process implementation unit is configured to implement the retrieval process; and
the index process implementation unit and the retrieval process implementation unit run in different processes, so that the index implementation process and the retrieval implementation process are independent of each other.
Wherein the index process implementation unit and the retrieval process implementation units are in a one-to-many correspondence;
the index process implementation unit is further configured to synchronize its stored data over the network to the corresponding plurality of retrieval process implementation units, so that the data are updated synchronously; and the retrieval process implementation units that correspond to the same index process implementation unit store identical data and back each other up.
Wherein the index process implementation unit further comprises a data receiving thread, a rebuild control thread, an index rebuild thread, an update thread, and an index data sending thread, wherein:
the data receiving thread is configured to receive the index data source;
the index rebuild thread is configured to rebuild, under the scheduling of the rebuild control thread, the index library that has been built from the index data source;
the rebuild control thread is configured to schedule and control the index rebuild performed by the index rebuild thread;
the update thread is configured to start the update operation on the index data after receiving the index data source update notification sent by the data receiving thread; and
the index data sending thread is configured to, after the index library has been built, send all the index data in the index library to the retrieval process implementation unit by network synchronization, so that processing passes from the current index process to the retrieval process.
Wherein the rebuild control thread is further configured to generate tasks that need rebuilding according to index-related indicators, the index-related indicators comprising at least one of: the state of index library building, the quantity of index data, or a time interval.
Wherein the rebuild control thread is further configured to push the task that needs rebuilding into a wait-for-rebuild queue, and to delete the task when it detects completion information in the rebuild-completed queue;
the index rebuild thread is further configured to, on detecting a task that needs rebuilding in the wait-for-rebuild queue, take the task out of the wait-for-rebuild queue and rebuild it, and, after the rebuild is finished, push completion information into the rebuild-completed queue to complete one rebuild.
Wherein the retrieval process implementation unit further comprises a receiving thread, a retrieval split thread, retrieval processing threads, and a return thread, wherein:
the receiving thread is configured to receive retrieval requests and push a plurality of retrieval requests, in queue fashion, into a retrieval request queue;
the retrieval split thread is configured to split the retrieval request queue, splitting the retrieval requests in the queue across the blocks of the index library;
there are a plurality of retrieval processing threads, configured to process the retrieval requests block by block using the blocks of the index library; and
the return thread is configured to, after the retrieval requests have been processed block by block, merge the per-block results, load them into a retrieval return queue, and return them.
Wherein the index library comprises a historical index library and a new-data index library; the historical index library is used for the retrieval service, and the new-data index library is used when index data are updated.
Wherein the index data in both the historical index library and the new-data index library are stored in blocks;
the historical index library is divided into blocks by modulo and stored in units of large blocks; the new-data index library is divided into blocks by time and stored in units of small blocks.
A method for improving retrieval speed, comprising: separating indexing from retrieval, the indexing and the retrieval running in different processes.
Wherein the indexing and the retrieval are in a one-to-many relationship;
the plurality of retrievals corresponding to the same current index store identical data and back each other up; and the data stored in the current index are synchronized over the network to the corresponding plurality of retrievals, so that the data are updated synchronously.
Wherein the implementation of the indexing specifically comprises:
building an index library from the index data source received by the data receiving thread;
rebuilding, by the index rebuild thread under the scheduling of the rebuild control thread, the index library that has been built;
starting, by the update thread, the update operation on the index data after it receives the index data source update notification sent by the data receiving thread; and
after the index library has been built or rebuilt, providing, by the index data sending thread, all the index data in the index library to the retrieval service by network synchronization, so that processing passes from the current index process to the retrieval process.
Wherein the rebuilding, by the index rebuild thread, of the index library that has been built specifically comprises:
generating a task that needs rebuilding according to index-related indicators, the index-related indicators comprising at least one of: the state of index library building, the quantity of index data, or a time interval;
pushing, by the rebuild control thread, the task into a wait-for-rebuild queue; when the index rebuild thread detects a task that needs rebuilding in the wait-for-rebuild queue, taking the task out of the queue and rebuilding it; and
after the index rebuild thread finishes the rebuild, pushing completion information into a rebuild-completed queue to complete one rebuild; and when the rebuild control thread detects the completion information in the rebuild-completed queue, deleting the task.
Wherein the implementation of the retrieval specifically comprises:
receiving, by the receiving thread, retrieval requests, and pushing a plurality of retrieval requests, in queue fashion, into a retrieval request queue;
splitting, by the retrieval split thread, the retrieval request queue, so that the retrieval requests in the queue are split across the blocks of the index library;
processing, by a plurality of retrieval processing threads, the retrieval requests block by block using the blocks of the index library; and
after the retrieval requests have been processed block by block, merging, by the return thread, the per-block results, loading them into a retrieval return queue, and returning them.
Wherein the index library comprises a historical index library and a new-data index library;
both the historical index library and the new-data index library are processed in blocks: the historical index library is partitioned by modulo into a plurality of large blocks, and the new-data index library is partitioned by time into a plurality of small blocks. The present invention separates indexing from retrieval, and the indexing and the retrieval run in different processes.
With the present invention, indexing and retrieval are separated and their execution takes place in different processes, so the index implementation process and the retrieval implementation process are independent of each other. Indexing therefore does not affect retrieval performance, and retrieval speed is improved.
Description of drawings
Fig. 1 is a schematic diagram of the system architecture with index and retrieval separated, according to Example one of the present invention;
Fig. 2 is a schematic diagram of the index according to Example two of the present invention;
Fig. 3 is a structural diagram of the index library according to Example three of the present invention;
Fig. 4 is a structural diagram of data updating in the index library of Fig. 3;
Fig. 5 is a schematic diagram of the scheduling control of index rebuilding according to Example four of the present invention;
Fig. 6 is a schematic diagram of retrieval according to Example five of the present invention.
Embodiment
The basic idea of the present invention is to separate indexing from retrieval, with the indexing and the retrieval running in different processes.
The implementation of the technical solution is described in further detail below with reference to the accompanying drawings.
The present invention is a scheme for improving retrieval speed: by classifying the index library and separating indexing from retrieval, it achieves fast retrieval.
Furthermore, besides low retrieval speed and poor overall retrieval performance, the prior art has the drawback that retrieval performance and data update performance are out of step: when retrieval is fast, new data are updated slowly; or, when new data are updated quickly, retrieval is slow or its speed fluctuates. The present invention also addresses this drawback. On top of improving retrieval speed, it improves data update speed at the same time, mainly by partitioning the index library into blocks, with a historical index library and a new-data index library that are partitioned independently of each other, so that new data can be updated quickly. In short, the present invention can significantly improve retrieval speed, significantly improve the speed at which new data are updated, and still maintain a high retrieval speed while new data are being updated quickly.
The present invention is described below by way of examples.
It should be pointed out that the overall system architecture with index and retrieval separated is logically divided into several major parts: a part that implements the retrieval process (the retrieval process implementation unit) and a part that implements the index process (the index process implementation unit). That is, indexing and retrieval are separated logically in order to achieve fast retrieval. The logical structure may further include a part for the retrieval agent (the retrieval agent implementation unit) and a part for the retrieval cache (the retrieval cache implementation unit). Moreover, unlike the one-to-one relationship between retrieval and index in the prior art, the present invention adopts a many-to-one relationship between retrieval and index: a plurality of retrievals correspond to one index and back each other up. This avoids the defects that arise in the prior art from the one-to-one relationship and the lack of separation, such as slow updates and interruption of the retrieval service during indexing.
Many physical structures can be based on the above logical structure; the example shown in Fig. 1 is described below as one illustration.
Example one: separation of index and retrieval.
Fig. 1 is a schematic diagram of the system architecture with index and retrieval separated. In Fig. 1, the retrieval process implementation units are retrieval 00, retrieval 01, retrieval 10, and retrieval 11; the index process implementation units are index 0 and index 1; the retrieval agent implementation unit is the retrieval agent; and the retrieval cache implementation unit is the retrieval cache.
Index 0 and index 1 each index different data. Retrieval 00 and retrieval 01 hold two identical copies of the index data provided by index 0. Retrieval 10 and retrieval 11 hold two identical copies of the index data provided by index 1. A corresponding index and its retrievals, for example retrieval 00, retrieval 01, and index 0, are synchronized over the network to keep the index data up to date and consistent. It should be pointed out that Fig. 1 is only one concrete physical structure based on the logical division described above. In practice the scheme is not limited to two retrievals per index: any number of retrievals may correspond to one index and back each other up, and the retrievals may be deployed on one machine or on several machines. Separating index from retrieval lets them live in different processes with their own processing split off, so they do not affect each other's performance, which improves retrieval performance.
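The one-to-many relationship between index 0 and its retrievals can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names `IndexNode` and `RetrievalReplica` are hypothetical, both sides live in one Python process, and a deep copy stands in for network synchronization between separate processes.

```python
import copy


class RetrievalReplica:
    """Hypothetical stand-in for one retrieval process holding a copy of the index."""

    def __init__(self, name):
        self.name = name
        self.index = {}  # term -> list of document ids

    def load(self, index_data):
        # Switch to the newly received index data in a single assignment.
        self.index = index_data

    def search(self, term):
        return self.index.get(term, [])


class IndexNode:
    """Hypothetical stand-in for one index process that serves several replicas."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.index = {}

    def add_document(self, doc_id, text):
        for term in set(text.split()):
            self.index.setdefault(term, []).append(doc_id)

    def sync(self):
        # Assumption: a deep copy stands in for sending the index over the network.
        for replica in self.replicas:
            replica.load(copy.deepcopy(self.index))


replicas_0 = [RetrievalReplica("retrieval 00"), RetrievalReplica("retrieval 01")]
index_0 = IndexNode(replicas_0)
index_0.add_document(1, "fast retrieval")
index_0.add_document(2, "fast index")
index_0.sync()
```

Until `sync()` is called again, further indexing on `index_0` does not touch what the replicas serve, which is the property the separation is meant to provide.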
Example two: Fig. 2 is a schematic diagram of the index in this example. It mainly comprises a data receiving thread, an index data sending thread, a rebuild control thread, index rebuild threads, and an update thread. The update thread can update various kinds of information, mainly numerical information and information such as document deletions. The rebuild control thread and the index rebuild threads together implement the scheduling control of index rebuilding; see the description of Example four below for details.
As can be seen from Fig. 2, the data receiving thread on the one hand receives the index data source, which provides the data from which the index library is built; on the other hand, it continuously receives subsequent new index data sources and notifies the update thread to start updating the index data. The index rebuild thread performs index rebuilding under the scheduling of the rebuild control thread, rebuilding an old index library into a new one. The rebuild control thread schedules and controls index rebuilding. The update thread starts a data update after receiving the notification from the data receiving thread. The index data sending thread, after the index library has been built or rebuilt, sends all the index data in the index library to the retrieval part shown in Fig. 1 by network synchronization, so that processing passes from the current index process to the retrieval process. Note that every mention of "index library" here refers to the index library comprising the historical index library and the new-data index library.
In Fig. 2, the index rebuild threads comprise index rebuild thread 0 and index rebuild thread 1. In Fig. 2, index rebuild thread 1 performs the rebuild from the old historical index library to the new historical index library, and index rebuild thread 0 performs the rebuild from the old new-data index library to the new new-data index library; practical applications are not limited to this assignment. Index rebuild thread 0 and index rebuild thread 1 are interchangeable; for example, index rebuild thread 0 may perform the rebuild from the old historical index library to the new historical index library. In Fig. 2 the old historical index library is shown below index rebuild thread 1 and the new historical index library above it; likewise, the old new-data index library is shown below index rebuild thread 0 and the new new-data index library above it. Here "old" refers to an index library that has not yet been processed; for example, the old new-data index library is the unprocessed index library as initially built. "New" refers to an index library whose processing is complete; for example, the new historical index library is the index library as it stands after the update has finished.
Both the historical index library and the new-data index library are processed in blocks. The difference is that the historical index library is divided into a plurality of "large blocks", partitioned by modulo, whereas the new-data index library is divided into a plurality of "small blocks", partitioned by time. The purpose of partitioning is to make full use of multiple CPUs to process multiple tasks separately. Large-block partitioning relates to retrieval: multiple CPUs process different large blocks, which improves retrieval speed. Small-block partitioning relates to data updating: multiple CPUs process different small blocks, which improves data update speed. Block processing not only allows tasks to be handled separately; it also divides what in the prior art is one whole storage area into separate block storage areas, so that index rebuilding and data updating each operate on a smaller region, which increases the speed of retrieval and of data updating. The number of small blocks used for data updating is limited. When the small blocks are nearly full, in order to keep data updates fast, the data of the small blocks in the new-data index library that have finished processing are inserted into the historical index library, which is divided into large blocks, and those small blocks are emptied. The structure of the block-partitioned historical index library and new-data index library is described in detail in Example three below, with reference to Fig. 3 and Fig. 4.
Example three: Fig. 3 is a structural diagram of the index library in this example. The index library is divided into two major parts, the historical index library and the new-data index library. The historical index library is partitioned into several "large blocks" by Docid modulo, so each block holds roughly the same amount of data. The new-data index library is partitioned into several "small blocks" by time, each with the same maximum capacity. Data are assigned to small blocks in order of arrival, and some blocks are allowed to be empty. The historical index library and the new-data index library are configured according to the specific characteristics of the data.
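The two partitioning rules can be sketched as below. This is an illustration under stated assumptions: the function `history_block`, the class `NewDataLibrary`, the block counts, and the capacity are all hypothetical, and taking the Docid modulo the number of large blocks is one plausible reading of "by Docid modulo".

```python
def history_block(doc_id, num_large_blocks):
    """Pick the large block for a document by Docid modulo (assumed divisor: block count)."""
    return doc_id % num_large_blocks


class NewDataLibrary:
    """Hypothetical new-data index library: small blocks filled in order of arrival."""

    def __init__(self, num_small_blocks, capacity):
        self.blocks = [[] for _ in range(num_small_blocks)]
        self.capacity = capacity  # same maximum capacity for every small block
        self.current = 0          # index of the small block currently being filled

    def add(self, doc_id):
        if len(self.blocks[self.current]) >= self.capacity:
            if self.current + 1 >= len(self.blocks):
                raise RuntimeError("all small blocks are full; merge into history first")
            self.current += 1
        self.blocks[self.current].append(doc_id)
        return self.current


# Historical library: ten documents spread over four large blocks.
history_blocks = [[] for _ in range(4)]
for doc_id in range(10):
    history_blocks[history_block(doc_id, 4)].append(doc_id)

# New-data library: three small blocks of capacity two; the last one stays empty.
new_data = NewDataLibrary(num_small_blocks=3, capacity=2)
for doc_id in (100, 101, 102):
    new_data.add(doc_id)
```

Modulo placement keeps the large blocks roughly balanced regardless of arrival order, while arrival-order placement means a small block never needs reshuffling once it is full.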
The configuration, in particular the number of libraries, depends mainly on the data volume, the number of CPUs on the machine, the disk and memory size, the characteristics of the data, and the required data update speed. The principle is to make full use of machine resources. Relative to the historical index library, the new-data index library may also be called the small library; the more of the data held in small libraries, the faster data updates can be.
As for cutting the historical index library, it is cut into several "large blocks" only after the final index has been completed, and the historical index library and the new-data index library are fully independent of each other. In both, the index data comprise the inverted index, inverted lists, deletion information, numerical information, library information, and other data. Each final inverted list is divided in two: the first part holds the good data and the second part the inferior data. Good and inferior data can be distinguished by ranking on text relevance or on a numerical attribute of the document. The division into good and inferior data mentioned for the index library structure is not limited to two parts; there may be more. The division may also be based on a composite score, which can be computed as a composite text weight from the document's numerical values, text content, and so on.
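Splitting one inverted list into a good part and an inferior part might look like the following sketch. The function name `split_posting_list`, the per-document `scores` mapping, and the threshold are hypothetical; the source does not specify how the score is computed or where the cut falls.

```python
def split_posting_list(postings, scores, threshold):
    """Split an inverted list into (good, inferior) parts by a per-document score.

    postings:  document ids for one term
    scores:    hypothetical mapping doc id -> quality or relevance score
    threshold: hypothetical cut-off; documents at or above it count as good
    """
    good = [doc_id for doc_id in postings if scores[doc_id] >= threshold]
    inferior = [doc_id for doc_id in postings if scores[doc_id] < threshold]
    return good, inferior


postings = [3, 7, 9, 12]
scores = {3: 0.9, 7: 0.2, 9: 0.5, 12: 0.1}
good, inferior = split_posting_list(postings, scores, threshold=0.5)
```

A retrieval thread could then scan the good part first and consult the inferior part only if more results are needed, although the source does not spell out that usage.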
Fig. 4 is a structural diagram of data updating in the index library of Fig. 3. It shows that when the "small blocks" of the new-data index library are nearly full, the data of the small blocks that have finished processing are inserted into the historical index library and those small blocks are emptied, so that building of the new-data index library can start again.
In Fig. 3, left-slanted hatching denotes each "large block" in the historical index library; oblique cross-hatching denotes "small blocks" in the new-data index library that have finished processing; right-slanted hatching denotes "small blocks" in the new-data index library that are being processed. In Fig. 4, left-slanted hatching denotes each initial "large block" in the historical index library; horizontal-and-vertical cross-hatching denotes the new "large blocks" formed in the historical index library after the finished small blocks of the new-data index library have been inserted into it; oblique cross-hatching denotes the small blocks that were finished, have been inserted into the historical index library, and are now empty; right-slanted hatching denotes small blocks in the new-data index library that are being processed.
With reference to Fig. 3 and Fig. 4 above, index building is described as follows:
Suppose Fig. 3 shows the new-data index library built up to "small block" 2, which is more than half of the total of 3 blocks. Fig. 4 then shows that the historical index library has also indexed the data of blocks 0 and 1 of the new-data index library and has been cut into the four blocks indicated by the arrows at the bottom of Fig. 4. The resources previously held by blocks 0 and 1 of the new-data index library can then be used to index new data, which ensures that new data can be indexed immediately. After index building for the new-data index library is complete, the historical index library sends all the index data over the network to the retrieval part shown in Fig. 1. Once sending is complete, control passes from the index process to the retrieval process, and the retrieval part is notified to switch to the latest index data for its retrieval service.
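The step from Fig. 3 to Fig. 4 can be sketched as a merge of the finished small blocks into the large blocks. The function `merge_finished_blocks` and the sample document ids are hypothetical, and the modulo placement into large blocks is an assumption carried over from the partitioning rule for the historical index library.

```python
def merge_finished_blocks(history, small_blocks, finished):
    """Insert the data of finished small blocks into the history blocks, then empty them.

    history:      list of large blocks (lists of doc ids), placed by Docid modulo
    small_blocks: list of small blocks (lists of doc ids)
    finished:     indexes of the small blocks that have finished processing
    """
    for block_no in finished:
        for doc_id in small_blocks[block_no]:
            history[doc_id % len(history)].append(doc_id)
        small_blocks[block_no] = []  # the freed block can index new data again


# Four large blocks and three small blocks; small blocks 0 and 1 are finished,
# small block 2 is still being processed.
history = [[0, 4], [1, 5], [2], [3]]
small_blocks = [[10, 11], [12, 13], [14]]
merge_finished_blocks(history, small_blocks, finished=[0, 1])
```

After the call, small blocks 0 and 1 are empty and ready for new data, while block 2 is untouched.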
Example four: Fig. 5 is a schematic diagram of the scheduling control of index rebuilding in this example. It mainly comprises the rebuild control thread, the wait-for-rebuild queue, the rebuild-completed queue, and the index rebuild threads. The rebuild control thread generates tasks that need rebuilding according to the state of index building, the amount of data, the time interval, and so on.
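How the rebuild control thread might decide that a task is needed can be sketched as a simple predicate over the three indicators. Everything here is an assumption for illustration: the `LibraryState` fields, the function `needs_rebuild`, and the threshold values are hypothetical, since the source names the indicators but gives no concrete rule.

```python
from dataclasses import dataclass


@dataclass
class LibraryState:
    built: bool          # whether the index library has been built at all
    pending_docs: int    # amount of index data waiting to be folded in
    last_rebuild: float  # timestamp of the previous rebuild, in seconds


def needs_rebuild(state, now, max_pending=1000, max_age_seconds=3600.0):
    """Return True if any one indicator calls for a rebuild task (hypothetical thresholds)."""
    if not state.built:
        return True
    if state.pending_docs >= max_pending:
        return True
    return now - state.last_rebuild >= max_age_seconds
```

Any one indicator is enough to trigger a task here, which matches the "at least one of" wording in the claims.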
In this example the two index rebuild threads have equal priority; that is, either thread may be responsible for rebuilding the historical index library or the new-data index library. During rebuilding, only one task may be running at a time for the index library of each level. As can be seen from Fig. 5, the scheduling control of index rebuilding proceeds as follows: the rebuild control thread pushes the task that needs rebuilding into the wait-for-rebuild queue; an index rebuild thread polls the wait-for-rebuild queue and, on detecting a rebuild task, takes it out of the queue and rebuilds it; when the rebuild is finished, the thread pushes completion information into the rebuild-completed queue, completing one rebuild. The rebuild control thread polls the rebuild-completed queue and, on detecting completion information, deletes the completed task. It then waits for the next rebuild task to begin.
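The two-queue scheduling of Fig. 5 can be sketched with Python's standard `queue` and `threading` modules. This is a simplified illustration: the names `wait_queue`, `done_queue`, `pending`, and `rebuilt` are hypothetical, the actual rebuild work is replaced by a placeholder, and the rule of one running task per library level is met only because the control function submits one task per level.

```python
import queue
import threading

wait_queue = queue.Queue()   # wait-for-rebuild queue
done_queue = queue.Queue()   # rebuild-completed queue
pending = {}                 # task id -> library level, owned by the control thread
rebuilt = []                 # levels rebuilt so far (placeholder for real work)
rebuilt_lock = threading.Lock()


def rebuild_worker():
    """Index rebuild thread: take tasks from the wait queue, report completion."""
    while True:
        task = wait_queue.get()
        if task is None:          # sentinel: no more tasks
            break
        task_id, level = task
        with rebuilt_lock:
            rebuilt.append(level)  # placeholder for rebuilding the library
        done_queue.put(task_id)    # completion information


def control(levels):
    """Rebuild control thread: submit one task per level, delete each on completion."""
    for task_id, level in enumerate(levels):
        pending[task_id] = level
        wait_queue.put((task_id, level))
    while pending:
        finished_id = done_queue.get()
        del pending[finished_id]


workers = [threading.Thread(target=rebuild_worker) for _ in range(2)]
for worker in workers:
    worker.start()
control(["history", "new_data"])
for _ in workers:
    wait_queue.put(None)
for worker in workers:
    worker.join()
```

Both workers are equal, so either may pick up the historical or the new-data task, as in this example of the patent.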
Example five: Fig. 6 is a schematic diagram of retrieval in this example. It mainly comprises a receiving thread, a return thread, a retrieval request queue, a retrieval return queue, a retrieval split thread, and a plurality of retrieval processing threads made up of block processing threads. The receiving thread receives retrieval requests and pushes them into the retrieval request queue in order of arrival. The retrieval split thread takes retrieval requests from the queue in order and maps each to the index libraries; that is, the retrieval request queue is split across the corresponding index libraries. The block processing threads process the retrieval requests against the blocks of each index library; while processing retrieval requests they can also perform index switching, switching between the active and standby indexes. The return thread, after the retrieval requests have been processed against the blocks of each index library, collects and loads the processed data and returns them to the front-end retrieval agent. As for index switching, because the present invention processes the index library in blocks, each index library is independent, and the one-to-one active/standby indexes of the prior art are not needed; for example, four historical index libraries may correspond to one backup.
As can be seen from Fig. 6, the retrieval process is as follows: a retrieval request is divided into requests against each block of the historical index library and of the new-data index library. Although the two libraries are partitioned differently, into "large blocks" and "small blocks" respectively, the final index files are all independent. The retrieval processing threads are also equal: each can process the data of any block. The threads compete to take requests from the queue and process them, and the retrieval processing thread that finishes last merges the results, puts them into the retrieval return queue, and sends them. The intermediate result cache mainly caches the Docid lists of the per-block retrieval results.
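The split, per-block processing, and merge steps can be sketched with a thread pool. This is an approximation under stated assumptions: `concurrent.futures.ThreadPoolExecutor` replaces the explicit request and return queues, the merge is done by the calling thread rather than by the last worker to finish, and the sample blocks and the functions `search_block` and `search` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical blocks: two large history blocks and one small new-data block,
# each an independent term -> Docid list mapping.
blocks = [
    {"fast": [4, 8], "index": [4]},
    {"fast": [1], "retrieval": [1, 5]},
    {"fast": [102]},
]


def search_block(block, term):
    """Process one retrieval request against a single block."""
    return block.get(term, [])


def search(term, blocks, num_threads=4):
    """Split a request across all blocks, process them in parallel, merge the Docid lists."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partial_results = pool.map(lambda block: search_block(block, term), blocks)
        merged = sorted(doc_id for part in partial_results for doc_id in part)
    return merged


result = search("fast", blocks)
```

Because every block is an independent index, any worker can serve any block, which is what lets the threads compete for work without coordination beyond the queue.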
In summary, the present invention reduces computational overhead. For a partitioned index database, concurrent multitask, multithreaded processing not only makes full use of the CPU; in addition, because index and retrieval are separated, the inverted lists shrink accordingly, and the amount of computation per thread drops markedly. Overall retrieval performance and data update speed therefore improve quickly.
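Why partitioning shortens the list each thread has to scan can be shown with a small sketch. Claim 7 states that the historical index database is divided by modulo and the new data index database by time; the sketch below assumes that the modulo is taken over the Docid and that time blocks are fixed-width windows. Both choices, together with the constants and function names, are hypothetical.

```python
from collections import defaultdict

HIST_BLOCKS = 4            # assumed number of large historical blocks
NEW_WINDOW_SECONDS = 3600  # assumed width of one small new-data block


def historical_block(docid: int) -> int:
    # Modulo partitioning: spreads documents evenly over a fixed block count.
    return docid % HIST_BLOCKS


def new_data_block(timestamp: int) -> int:
    # Time partitioning: every window of NEW_WINDOW_SECONDS is its own block.
    return timestamp // NEW_WINDOW_SECONDS


def partition(docs, key):
    """Group (docid, timestamp) pairs into blocks chosen by key(docid, timestamp)."""
    blocks = defaultdict(list)
    for docid, timestamp in docs:
        blocks[key(docid, timestamp)].append(docid)
    return dict(blocks)
```

With either scheme each block holds only a fraction of the documents, so the inverted list a single thread scans for a given term is correspondingly shorter.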
It should be noted that the retrieval process described above uses multithreading, with one retrieval split thread corresponding to a plurality of retrieval processing threads. The thread model of this retrieval process may instead be a single thread that searches each database in sequence: because the databases of the present invention are partitioned, the data volume in each block is small, which differs from the prior art, where a single thread searches a huge complete database, so retrieval speed can still be improved. A further option is to provide a plurality of retrieval split threads, that is, a plurality of retrieval split threads correspond to the retrieval processing thread; the retrieval split threads assign tasks to the retrieval processing thread for processing, receive its results, and send them after aggregation.
Some of the terms used above are explained as follows:
Index refers to: encoding the data, building the forward index and the inverted index, and finally generating the inverted index and the various inverted files.
Retrieval refers to: providing a retrieval service to external users according to the user's query string.
Retrieval agent refers to: the component that merges the results of a plurality of retrievals.
Retrieval cache refers to: the cache that holds the results of retrieval.
Docid refers to: the unique number of a document.
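The terms above can be made concrete with a small sketch. It assumes whitespace tokenization and omits the encoding step, and the function names are hypothetical: a forward index maps each Docid to its terms, the inverted index maps each term to a sorted Docid list, and an agent-style merge combines the sorted Docid lists returned by several retrievals.

```python
import heapq
from collections import defaultdict


def build_forward_index(docs):
    """Forward index: Docid -> list of terms (whitespace tokenization assumed)."""
    return {docid: text.lower().split() for docid, text in docs.items()}


def build_inverted_index(forward):
    """Inverted index: term -> sorted list of Docids that contain the term."""
    inverted = defaultdict(list)
    for docid in sorted(forward):
        for term in set(forward[docid]):
            inverted[term].append(docid)
    return dict(inverted)


def agent_merge(*partial_results):
    """Merge sorted Docid lists from several retrievals, dropping duplicates."""
    merged = []
    for docid in heapq.merge(*partial_results):
        if not merged or merged[-1] != docid:
            merged.append(docid)
    return merged
```

For example, `agent_merge([1, 4], [2, 4, 7])` combines two partial results into a single ordered Docid list without repeating Docid 4.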
The above are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention.

Claims (12)

1. A system for increasing retrieval speed, characterized in that the system comprises: an index process realizing unit and a retrieval process realizing unit; wherein,
the index process realizing unit is used for realizing the index process;
the retrieval process realizing unit is used for realizing the retrieval process;
the index process realizing unit and the retrieval process realizing unit run in different processes, and their respective index realizing process and retrieval realizing process are independent of each other;
wherein the index process realizing unit comprises: a data receiving thread, a rebuild control thread, an index rebuild thread, an update thread and an index data sending thread; wherein,
the data receiving thread is used for receiving an index data source;
the index rebuild thread is used for performing, under the scheduling of the rebuild control thread, index rebuilding on the index database that has been built from the index data source;
the rebuild control thread is used for scheduling and controlling the index rebuilding performed by the index rebuild thread;
the update thread is used for starting the update operation of the index data after receiving an index data source update notification sent by the data receiving thread;
the index data sending thread is used for, after the index database has been built, synchronizing all the index data in the index database over the network to the retrieval process realizing unit, so that processing passes from the current index process into the retrieval process.
2. The system according to claim 1, characterized in that the index process realizing unit and the retrieval process realizing unit are in a one-to-many correspondence;
the index process realizing unit is further used for synchronizing the stored data over the network to the corresponding plurality of retrieval process realizing units, so as to update the data synchronously; wherein the plurality of retrieval process realizing units corresponding to the same index process realizing unit store identical data and serve as backups for one another.
3. The system according to claim 1, characterized in that the rebuild control thread is further used for generating tasks that need rebuilding according to index-related indicators; wherein the index-related indicators comprise at least one of: the state of index database building, the quantity of index data, or a time interval.
4. The system according to claim 3, characterized in that the rebuild control thread is further used for pushing the task that needs rebuilding into a waiting-for-rebuild queue, and for deleting the task when completion information is detected in a rebuild-complete queue;
the index rebuild thread is further used for, upon detecting that the waiting-for-rebuild queue contains the task that needs rebuilding, taking the task out of the waiting-for-rebuild queue and rebuilding it; and, after the rebuilding is finished, pushing completion information into the rebuild-complete queue, thereby completing one rebuild.
5. The system according to claim 1, characterized in that the retrieval process realizing unit further comprises: a receiving thread, a retrieval split thread, retrieval processing threads and a return thread; wherein,
the receiving thread is used for receiving retrieval requests and pushing a plurality of retrieval requests into a retrieval request queue;
the retrieval split thread is used for splitting the retrieval request queue, splitting the plurality of retrieval requests in the retrieval request queue across the blocks of the index database;
the retrieval processing threads are plural in number and are used for processing the plurality of retrieval requests block by block, using the blocks of the index database;
the return thread is used for, after the plurality of retrieval requests have been processed block by block, merging the data produced by the block processing, loading it into a retrieval return queue, and returning it.
6. The system according to any one of claims 1 to 5, characterized in that the index database comprises: a historical index database and a new data index database; wherein the historical index database is used for the retrieval service, and the new data index database is used when the index data is updated.
7. The system according to claim 6, characterized in that the index data in the historical index database and in the new data index database is stored in blocks;
wherein the historical index database is divided into blocks by modulo and stored in units of large blocks, and the new data index database is divided into blocks by time and stored in units of small blocks.
8. A method for increasing retrieval speed, characterized in that the method comprises: separating index from retrieval, the index and the retrieval running in different processes;
wherein the realizing process of the index specifically comprises:
building an index database according to the index data source received by a data receiving thread;
an index rebuild thread performing, under the scheduling of a rebuild control thread, index rebuilding on the index database that has been built;
an update thread starting the update operation of the index data after receiving an index data source update notification sent by the data receiving thread;
an index data sending thread, after the index database has been built or rebuilt, providing all the index data in the index database to the retrieval service through network synchronization, so that processing passes from the current index process into the retrieval process.
9. The method according to claim 8, characterized in that the index and the retrieval are in a one-to-many relationship;
the plurality of retrievals corresponding to the same current index store identical data and serve as backups for one another; the data stored in the current index is synchronized over the network to the corresponding plurality of retrievals, so as to update the data synchronously.
10. The method according to claim 8, characterized in that the index rebuild thread performing the index rebuilding on the index database that has been built specifically comprises:
generating tasks that need rebuilding according to index-related indicators; wherein the index-related indicators comprise at least one of: the state of index database building, the quantity of index data, or a time interval;
the rebuild control thread pushing the task that needs rebuilding into a waiting-for-rebuild queue; the index rebuild thread, upon detecting that the waiting-for-rebuild queue contains the task that needs rebuilding, taking the task out of the waiting-for-rebuild queue and rebuilding it;
the index rebuild thread, after the rebuilding is finished, pushing completion information into a rebuild-complete queue, thereby completing one rebuild; the rebuild control thread deleting the task when it detects the completion information in the rebuild-complete queue.
11. The method according to claim 8, characterized in that the realizing process of the retrieval specifically comprises:
a receiving thread receiving retrieval requests and pushing a plurality of retrieval requests into a retrieval request queue;
a retrieval split thread splitting the retrieval request queue, splitting the plurality of retrieval requests in the retrieval request queue across the blocks of the index database;
a plurality of retrieval processing threads processing the plurality of retrieval requests block by block, using the blocks of the index database;
a return thread, after the plurality of retrieval requests have been processed block by block, merging the data produced by the block processing, loading it into a retrieval return queue, and returning it.
12. The method according to any one of claims 8 to 11, characterized in that the index database comprises: a historical index database and a new data index database;
the historical index database and the new data index database are both processed in blocks; wherein the historical index database is partitioned by modulo and divided into a plurality of blocks in units of large blocks, and the new data index database is partitioned by time and divided into a plurality of blocks in units of small blocks.
CN 200910242857 2009-12-17 2009-12-17 System and method for increasing retrieval speed Active CN102103602B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200910242857 CN102103602B (en) 2009-12-17 2009-12-17 System and method for increasing retrieval speed

Publications (2)

Publication Number Publication Date
CN102103602A CN102103602A (en) 2011-06-22
CN102103602B true CN102103602B (en) 2013-02-27

Family

ID=44156379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200910242857 Active CN102103602B (en) 2009-12-17 2009-12-17 System and method for increasing retrieval speed

Country Status (1)

Country Link
CN (1) CN102103602B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473229A (en) * 2012-06-06 2013-12-25 深圳市世纪光速信息技术有限公司 Memory retrieval system and method, and real-time retrieval system and method
CN104598550B (en) * 2014-12-31 2018-09-25 北京奇艺世纪科技有限公司 Method and device for updating Internet video index
CN106156018B (en) * 2015-03-23 2020-05-05 深圳市腾讯计算机系统有限公司 Data indexing method and device
CN104778267A (en) * 2015-04-22 2015-07-15 无锡天脉聚源传媒科技有限公司 Searching and index updating method and device
CN110019179A (en) * 2017-07-31 2019-07-16 北京嘀嘀无限科技发展有限公司 Update method and device, the electronic equipment, storage medium of index database
US10402112B1 (en) * 2018-02-14 2019-09-03 Alibaba Group Holding Limited Method and system for chunk-wide data organization and placement with real-time calculation
CN111368020A (en) * 2020-02-10 2020-07-03 浙江大华技术股份有限公司 Feature vector comparison method and device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101196935A (en) * 2008-01-03 2008-06-11 中兴通讯股份有限公司 System and method for creating index database
CN101295323A (en) * 2008-06-30 2008-10-29 腾讯科技(深圳)有限公司 Processing method and system for index updating

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tan Wentang et al. Distributed full-text retrieval system based on Lucene.Net. Computer Applications and Software (《计算机应用与软件》). 2009, Vol. 26 (No. 9), 142-145. *

Also Published As

Publication number Publication date
CN102103602A (en) 2011-06-22

Similar Documents

Publication Publication Date Title
CN102103602B (en) System and method for increasing retrieval speed
CN110209734B (en) Data copying method and device, computer equipment and storage medium
CN103810048B (en) Automatic adjusting method and device for thread number aiming to realizing optimization of resource utilization
CN105550318B (en) Query method based on Spark big data processing platform
JP5577350B2 (en) Method and system for efficient data synchronization
US10685041B2 (en) Database system, computer program product, and data processing method
CN106776855B (en) Processing method for reading Kafka data based on Spark Streaming
CN105808447B (en) Memory reclamation method and device for terminal device
CN102214205A (en) Logical replication in clustered database system with adaptive cloning
CN100538646C (en) Method and apparatus for executing SQL script files in a distributed system
CN102521269A (en) Index-based computer continuous data protection method
CN109388481B (en) Transaction information transmission method, system, device, computing equipment and medium
CN108694188B (en) Index data updating method and related device
CN105554121A (en) Method and system for realizing load equalization of distributed cache system
CN103885811B (en) Method, system and device for online whole-system migration of a virtual machine system
CN113886430A (en) Query restartability
CN104601562A (en) Interactive method and system of game server and database
CN103634411A (en) Real-time market data broadcasting system and real-time market data broadcasting method with state consistency
CN104572505A (en) System and method for ensuring eventual consistency of mass data caches
CN103200272A (en) Streaming media storage system and storage method
CN107368324A (en) Component upgrade method, device and system
CN111163118B (en) Message transmission method and device in Kafka cluster
CN105487946B (en) Automatic switching method and device for faulty computer
CN113342839A (en) Data processing method and device, terminal equipment and storage medium
CN117056303A (en) Data storage method and device suitable for military operation big data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: SHENZHEN SHIJI LIGHT SPEED INFORMATION TECHNOLOGY

Free format text: FORMER OWNER: TENGXUN SCI-TECH (SHENZHEN) CO., LTD.

Effective date: 20131025

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518044 SHENZHEN, GUANGDONG PROVINCE TO: 518057 SHENZHEN, GUANGDONG PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20131025

Address after: A Tencent Building in Shenzhen Nanshan District City, Guangdong streets in Guangdong province science and technology 518057 16

Patentee after: Shenzhen Shiji Guangsu Information Technology Co., Ltd.

Address before: Shenzhen Futian District City, Guangdong province 518044 Zhenxing Road, SEG Science Park 2 East Room 403

Patentee before: Tencent Technology (Shenzhen) Co., Ltd.