CN112748866A - Method and device for processing incremental index data - Google Patents


Info

Publication number
CN112748866A
CN112748866A (application CN201911053521.2A)
Authority
CN
China
Prior art keywords
index data
data
temporary
cache
incremental
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911053521.2A
Other languages
Chinese (zh)
Inventor
薛耀宏
王春明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201911053521.2A
Publication of CN112748866A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/0604: Improving or facilitating administration, e.g. storage management
    • G06F3/0611: Improving I/O performance in relation to response time
    • G06F3/0643: Management of files
    • G06F3/0679: Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • G06F11/1456: Hardware arrangements for backup
    • G06F11/1469: Backup restoration techniques
    • G06F11/1471: Saving, restoring, recovering or retrying involving logging of persistent data for recovery

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval; Database Structures and File System Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for processing incremental index data, relating to the field of computer technology. One embodiment of the method comprises: receiving incremental index data and storing it in a contiguous cache; if the cache is full, writing the incremental index data from the cache into a temporary sub-dataset on disk; and, at a preset time interval or after a preset number of pieces of incremental index data, encapsulating the temporary sub-dataset into a read-only sub-dataset. The method and device solve the technical problem of long restart times caused by reloading and re-parsing incremental index data when the search engine service restarts.

Description

Method and device for processing incremental index data
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for processing incremental index data.
Background
Index data in a search engine is divided into full index data and incremental index data. For example, in e-commerce search, the full index is an index built over all commodity data in the commodity library at a certain moment, and the build period of the full index is generally days or weeks; the incremental index is an index built, between two full-index builds, over commodity data newly added to or modified in real time in the commodity library, and the interval between two pieces of incremental index data is generally seconds or milliseconds.
To guarantee the real-time performance of the search service, the search engine service continuously receives real-time incremental index data. When the search engine service is restarted after an update or a fault recovery, the incremental index data must be reloaded in order, so as to guarantee the integrity and real-time performance of the index data.
The incremental index stores one or more pieces of incremental index data in incremental index files on disk, each file being compressed and encoded to reduce the network bandwidth and storage resources consumed during transmission. When the search engine service starts, each incremental index file is loaded in turn, and each piece of incremental index data in it is parsed according to the business requirements for use by the search engine service.
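As a rough illustration, the prior-art startup path just described can be sketched as follows. This is a hypothetical sketch: the file naming and the gzip/JSON framing are illustrative assumptions, not details from the patent.

```python
import glob
import gzip
import json

def naive_restart_load(index_dir: str):
    """Prior-art restart path: re-read every small incremental index file
    from disk and re-parse every piece of incremental index data."""
    records = []
    for path in sorted(glob.glob(f"{index_dir}/*.inc.gz")):  # many small files
        with gzip.open(path, "rt") as f:                     # decompress each one
            for line in f:
                records.append(json.loads(line))             # re-parse each piece
    return records
```

With a long full-index build period, the number of such small files grows large, and on a mechanical disk this loop becomes dominated by seek time rather than throughput, which is exactly the problem the invention targets.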
In the course of implementing the invention, the inventors found at least the following problems in the prior art:
When the build period of the full index is long, a large number of incremental index files accumulate on disk. When the search engine service restarts, this large number of incremental index files must be read from disk again, and every piece of incremental data must be re-parsed according to the business requirements. On a machine with a mechanical disk, sustained small-file reads and writes severely degrade disk throughput, increasing the time needed to load the incremental index data, and re-parsing every piece of incremental index data further lengthens the recovery and startup time of the search engine service.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for processing incremental index data, so as to solve the technical problem of long restart times caused by reloading and re-parsing incremental index data when a search engine service is restarted.
To achieve the above object, according to one aspect of the embodiments of the present invention, there is provided a method for processing incremental index data, comprising:
receiving incremental index data and storing it in a contiguous cache;
if the cache is full, writing the incremental index data from the cache into a temporary sub-dataset on disk; and
at a preset time interval, or after a preset number of pieces of incremental index data, encapsulating the temporary sub-dataset into a read-only sub-dataset.
Optionally, storing the incremental index data in a contiguous cache comprises:
appending the incremental index data, piece by piece, to a contiguous data cache, and recording the document ID of each piece together with its position offset in the data cache into an index cache.
Optionally, writing the incremental index data from the cache into a temporary sub-dataset on disk comprises:
appending the incremental index data from the data cache to an incremental data file on disk;
updating the index cache with each piece's position offset on disk, and writing the updated entries in the index cache into a database engine file on disk;
wherein the file objects in the temporary sub-dataset comprise the incremental data file and the database engine file.
Optionally, for each piece of incremental index data, its position offset on disk is determined as follows:
the position offset of the piece on disk is the sum of the start position of the incremental data file on disk and the piece's position offset in the data cache.
Optionally, encapsulating the temporary sub-dataset into a read-only sub-dataset comprises:
judging whether the cache is empty;
if so, encapsulating the temporary sub-dataset into a read-only sub-dataset;
if not, first writing the incremental index data from the cache into the temporary sub-dataset on disk, and then encapsulating the temporary sub-dataset into a read-only sub-dataset.
Optionally, encapsulating the temporary sub-dataset into a read-only sub-dataset comprises:
closing all file objects in the temporary sub-dataset;
allocating a snapshot ID to the temporary sub-dataset, renaming the temporary sub-dataset with the snapshot ID, and thereby encapsulating it into a read-only sub-dataset;
creating a new temporary sub-dataset and opening all file objects in it;
wherein the snapshot ID increases monotonically with the number of encapsulations.
Optionally, after encapsulating the temporary sub-dataset into a read-only sub-dataset, the method further comprises:
loading each sub-dataset into memory in descending order of snapshot ID;
parsing the data of each sub-dataset in memory, in descending order of snapshot ID, and storing the parsed data in memory.
Optionally, each sub-dataset is loaded into memory as follows:
reading the database engine file of the sub-dataset into memory;
traversing the database engine file and mapping its key-value entries into fixed-size buckets according to each piece's position offset on disk;
traversing the buckets in ascending order and loading the incremental index data corresponding to each bucket into memory.
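The bucketed loading just described can be sketched as follows; grouping disk offsets into fixed-size buckets and visiting the buckets in ascending order turns many scattered reads into one sequential scan of the incremental data file. The 4 MB bucket size and the return shape are illustrative assumptions, not values from the patent.

```python
from collections import defaultdict

def bucketed_load(entries, bucket_size=4 * 1024 * 1024):
    """Map (doc_id, disk_offset) entries from the database engine file into
    fixed-size buckets and return the buckets in ascending order, so that the
    incremental data file can be read sequentially, block by block."""
    buckets = defaultdict(list)
    for doc_id, offset in entries:
        buckets[offset // bucket_size].append((doc_id, offset))
    # visiting buckets from smallest to largest yields a sequential disk scan
    return [(b, sorted(buckets[b], key=lambda e: e[1])) for b in sorted(buckets)]
```

Each returned bucket can then be read from the data file with a single block read covering its offset range.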
Optionally, parsing the incremental index data in each sub-dataset and storing the parsed data in memory comprises:
parsing the data of each sub-dataset in memory against a bitmap of document IDs, and storing the parse results in memory;
updating the bitmap of document IDs according to the parse results;
wherein the bitmap of document IDs indicates, for each document ID, whether its incremental index data has already been parsed.
Optionally, each piece of incremental index data in each sub-dataset is parsed as follows:
judging whether the bitmap of document IDs marks the piece to be parsed as not yet parsed;
if so, fetching the piece from memory and parsing it;
if not, skipping the piece.
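The bitmap-guided replay can be sketched as follows. Because sub-datasets are visited in descending snapshot-ID order (newest first), the first version of a document encountered is the latest one, and older versions are skipped. The data shapes are illustrative assumptions; a Python set stands in for the document-ID bitmap.

```python
def parse_snapshots(snapshots, parse):
    """Replay sub-datasets from the newest snapshot ID to the oldest, parsing
    each document ID at most once; `parsed` plays the role of the bitmap."""
    parsed = set()                    # stand-in for the document-ID bitmap
    results = {}
    for snap in sorted(snapshots, reverse=True):       # newest snapshot first
        for doc_id, record in snapshots[snap]:
            if doc_id in parsed:
                continue              # an older version of this document: skip
            results[doc_id] = parse(record)
            parsed.add(doc_id)        # mark this document ID as parsed
    return results
```

This is what makes the restart path cheap: each document is parsed once, regardless of how many incremental updates it accumulated across snapshots.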
In addition, according to another aspect of the embodiments of the present invention, there is provided an apparatus for processing incremental index data, comprising:
a first storage module configured to receive incremental index data and store it in a contiguous cache;
a second storage module configured to, if the cache is full, write the incremental index data from the cache into a temporary sub-dataset on disk;
and an encapsulation module configured to encapsulate the temporary sub-dataset into a read-only sub-dataset at a preset time interval or after a preset number of pieces of incremental index data.
Optionally, the first storage module is further configured to:
append the incremental index data, piece by piece, to a contiguous data cache, and record the document ID of each piece together with its position offset in the data cache into an index cache.
Optionally, the second storage module is further configured to:
append the incremental index data from the data cache to an incremental data file on disk;
update the index cache with each piece's position offset on disk, and write the updated entries in the index cache into a database engine file on disk;
wherein the file objects in the temporary sub-dataset comprise the incremental data file and the database engine file.
Optionally, the second storage module is further configured to determine, for each piece of incremental index data, its position offset on disk as follows:
the position offset of the piece on disk is the sum of the start position of the incremental data file on disk and the piece's position offset in the data cache.
Optionally, the encapsulation module is further configured to:
judge whether the cache is empty;
if so, encapsulate the temporary sub-dataset into a read-only sub-dataset;
if not, first write the incremental index data from the cache into the temporary sub-dataset on disk, and then encapsulate the temporary sub-dataset into a read-only sub-dataset.
Optionally, the encapsulation module is further configured to:
close all file objects in the temporary sub-dataset;
allocate a snapshot ID to the temporary sub-dataset, rename the temporary sub-dataset with the snapshot ID, and thereby encapsulate it into a read-only sub-dataset;
create a new temporary sub-dataset and open all file objects in it;
wherein the snapshot ID increases monotonically with the number of encapsulations.
Optionally, the apparatus further comprises a parsing module configured to:
after the temporary sub-dataset is encapsulated into a read-only sub-dataset, load each sub-dataset into memory in descending order of snapshot ID;
parse the data of each sub-dataset in memory, in descending order of snapshot ID, and store the parsed data in memory.
Optionally, the parsing module is further configured to load each sub-dataset into memory as follows:
read the database engine file of the sub-dataset into memory;
traverse the database engine file and map its key-value entries into fixed-size buckets according to each piece's position offset on disk;
traverse the buckets in ascending order and load the incremental index data corresponding to each bucket into memory.
Optionally, the parsing module is further configured to:
parse the data of each sub-dataset in memory against a bitmap of document IDs, and store the parse results in memory;
update the bitmap of document IDs according to the parse results;
wherein the bitmap of document IDs indicates, for each document ID, whether its incremental index data has already been parsed.
Optionally, the parsing module is further configured to parse each piece of incremental index data in each sub-dataset as follows:
judge whether the bitmap of document IDs marks the piece to be parsed as not yet parsed;
if so, fetch the piece from memory and parse it;
if not, skip the piece.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including:
one or more processors;
a storage device storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments described above.
According to another aspect of the embodiments of the present invention, there is also provided a computer readable medium, on which a computer program is stored, which when executed by a processor implements the method of any of the above embodiments.
One embodiment of the above invention has the following advantage or benefit: because the incremental index data is stored in a contiguous cache, written from the cache into a temporary sub-dataset on disk when the cache is full, and the temporary sub-dataset is then encapsulated into a read-only sub-dataset, the prior-art problem of long restart times caused by reloading and re-parsing incremental index data when the search engine service restarts is solved. The embodiment avoids triggering a disk write for every received piece of incremental index data, reducing the time spent persisting it. The incremental index data added between two adjacent snapshot operations is stored in a sub-dataset for restart recovery, while the integrity of each sub-dataset is guaranteed. On restart, reading the disk sequentially, block by block, improves disk-read efficiency when recovering the sub-datasets and shortens the recovery and startup time of the search engine service.
Further effects of the above optional features are described below in connection with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of a main flow of a method of processing incremental index data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of storing delta index data to disk in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of the main flow of a method for processing incremental index data according to one reference embodiment of the present invention;
FIG. 4 is a schematic diagram of the main flow of a method for processing incremental index data according to another reference embodiment of the present invention;
FIG. 5 is a schematic diagram of the main flow of a method for processing incremental index data according to still another reference embodiment of the present invention;
FIG. 6 is a schematic diagram of mapping data in a levelDB file to disk space according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the main blocks of a device for processing incremental index data according to an embodiment of the present invention;
FIG. 8 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
fig. 9 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of the main flow of a method for processing incremental index data according to an embodiment of the present invention. As shown in fig. 1, the method may include:
Step 101: receive incremental index data and store it in a contiguous cache.
After receiving a piece of incremental index data, the search engine service parses it into the locally used data format (generally a custom binary storage format, which saves storage space and enables fast lookup) and stores it in a cache. Each newly received piece is appended directly after the previous one, ensuring the incremental index data is stored contiguously in the cache.
Optionally, storing the incremental index data in a contiguous cache comprises: appending the incremental index data, piece by piece, to a contiguous data cache, and recording the document ID of each piece together with its position offset in the data cache into an index cache. In the embodiment of the invention, for each piece of incremental index data to be persisted: the piece is appended to the contiguous data cache, which returns its offset1 within the data cache; the document ID and offset1 of the piece are then recorded in the index cache. The document ID is carried in the incremental index data itself and may be, for example, the ID of a commodity or the ID of the piece of incremental index data.
The incremental index data on one hand, and the document ID and position offset on the other, are stored separately in the data cache and the index cache in order to achieve key-value separation. This makes it convenient in step 102 to write the contents of the data cache and the index cache into different file objects on disk, improving the read and write efficiency of the incremental index data.
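The key-value-separated caches described above can be sketched as follows. Class and parameter names are illustrative assumptions, not identifiers from the patent.

```python
class DeltaCache:
    """Contiguous data cache plus index cache (key-value separation sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity      # cache size in bytes
        self.data = bytearray()       # contiguous data cache (the values)
        self.index = {}               # index cache: doc_id -> offset1

    def append(self, doc_id: str, record: bytes) -> bool:
        """Append one piece of incremental index data; False means the cache
        is full and must be flushed to disk first (step 102)."""
        if len(self.data) + len(record) > self.capacity:
            return False
        offset1 = len(self.data)      # position offset inside the data cache
        self.data += record
        self.index[doc_id] = offset1
        return True
```

Keeping the values in one contiguous buffer and the <document ID, offset1> pairs in a separate map is what later allows values and keys to be written to different file objects on disk.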
Step 102: if the cache is full, write the incremental index data from the cache into a temporary sub-dataset on disk.
If the cache (the data cache and the index cache) is full, receiving incremental index data to be persisted is suspended and a disk write is triggered, that is, all incremental index data in the cache is written into the temporary sub-dataset on disk.
Optionally, writing the incremental index data from the cache into a temporary sub-dataset on disk comprises: appending the incremental index data from the data cache to an incremental data file on disk; updating the index cache with each piece's position offset on disk, and writing the updated entries in the index cache into a database engine file on disk; wherein the file objects in the temporary sub-dataset comprise the incremental data file and the database engine file. Based on the sequential read/write characteristics of a mechanical hard disk, the embodiment of the invention adopts a key-value separation strategy: during a disk write, the incremental index data is stored in contiguous disk space, improving its read and write efficiency, while the document ID and the position offset of each piece on disk are stored in a database engine file (such as a levelDB file), realizing persistent storage of real-time incremental index data.
Optionally, for each piece of incremental index data, its position offset on disk is determined as follows: the position offset of the piece on disk is the sum of the start position of the incremental data file on disk and the piece's position offset in the data cache.
Optionally, step 102 may include the following steps:
Check the cache space. If it is full, append the incremental index data from the data cache to the incremental data file on disk, which returns the start position base of this write within the file; update the offset of each piece in the index cache to its position offset on disk, offset2 = offset1 + base; and store the updated entries in the index cache (document ID and offset2) into the levelDB file on disk. If the cache is not full, return to step 101. After the data has been stored in the temporary sub-dataset (the incremental data file and the levelDB file), clear the cache.
Persisting data via steps 101 and 102 avoids triggering a disk write every time a piece of incremental index data is received, so the embodiment of the invention reduces the time spent persisting incremental index data.
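The flush path of step 102 can be sketched as follows. A JSON-lines file stands in for the levelDB engine file, and the cache object is assumed to have the `data`/`index` shape from the description above; both are illustrative assumptions.

```python
import json
import os

def flush(cache, data_path: str, engine_path: str) -> None:
    """Write a full cache into the temporary sub-dataset on disk."""
    # base: start position of this batch inside the incremental data file
    base = os.path.getsize(data_path) if os.path.exists(data_path) else 0
    with open(data_path, "ab") as f:          # contiguous, append-only values
        f.write(bytes(cache.data))
    with open(engine_path, "a") as f:         # stand-in for the levelDB file
        for doc_id, offset1 in cache.index.items():
            # offset2 = base + offset1: the record's position on disk
            f.write(json.dumps({"doc_id": doc_id, "offset": base + offset1}) + "\n")
    cache.data.clear()                        # cache is cleared after persisting
    cache.index.clear()
```

Because the whole data cache is appended in one write, each flush is a single large sequential write rather than many small ones.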
Step 103: at a preset time interval, or after a preset number of pieces of incremental index data, encapsulate the temporary sub-dataset into a read-only sub-dataset.
The time interval and the number of pieces may be configured in advance: for example, the temporary sub-dataset may be encapsulated into a read-only sub-dataset every 10 seconds, 30 seconds, 1 minute, 5 minutes or 10 minutes, or after every 100, 1000 or 2000 pieces of incremental index data have been received. Note that at any moment there is only one temporary sub-dataset on disk. The search engine encapsulates the current temporary sub-dataset into a read-only sub-dataset through a snapshot function, guaranteeing the integrity of persistent storage, so that all incremental index data can be fully recovered after the search engine service restarts.
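The two trigger conditions of step 103 (time interval or record count, whichever fires first) can be sketched as a small helper; the default thresholds here are illustrative assumptions, not values fixed by the patent.

```python
import time

class SnapshotTrigger:
    """Fire a snapshot every `interval` seconds or every `max_records`
    pieces of incremental index data, whichever comes first."""

    def __init__(self, interval: float = 30.0, max_records: int = 1000):
        self.interval = interval
        self.max_records = max_records
        self.last = time.monotonic()
        self.count = 0

    def record(self) -> bool:
        """Call once per received piece; True means 'snapshot now'."""
        self.count += 1
        now = time.monotonic()
        if self.count >= self.max_records or now - self.last >= self.interval:
            self.count = 0            # reset both thresholds after firing
            self.last = now
            return True
        return False
```

Either condition alone bounds how much unsnapshotted data can exist, which in turn bounds how much work a restart must redo.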
Optionally, encapsulating the temporary sub-dataset into a read-only sub-dataset comprises: judging whether the cache is empty; if so, encapsulating the temporary sub-dataset into a read-only sub-dataset; if not, first writing the incremental index data from the cache into the temporary sub-dataset on disk and then encapsulating it into a read-only sub-dataset. The snapshot operation thus guarantees that all incremental index data received before it has been successfully stored on disk. Note that the snapshot operation may block the storage operations of incremental index data (i.e., steps 101 and 102). The snapshot operation persists the data in the temporary sub-dataset to disk sequentially and in bulk.
Optionally, encapsulating the temporary sub-dataset into a read-only sub-dataset comprises: closing all file objects in the temporary sub-dataset; allocating a snapshot ID to it, renaming it with the snapshot ID, and thereby encapsulating it into a read-only sub-dataset; and creating a new temporary sub-dataset and opening all file objects in it; wherein the snapshot ID increases monotonically with the number of encapsulations.
Optionally, each snapshot operation returns a 6-digit snapshot ID; the initial snapshot ID is "000000", and IDs increase monotonically. The temporary sub-dataset is renamed with the snapshot ID. As shown in FIG. 2, the first snapshot operation, performed at time t0, generates the read-only sub-dataset "000000" on disk; the second snapshot operation, performed at time t1, generates the read-only sub-dataset "000001"; and so on.
Because the incremental data file and the database engine file are placed in the same directory, the incremental index data added between two adjacent snapshot operations, together with the corresponding document IDs and position offsets on disk, is stored in one read-only sub-dataset. In the embodiment of the invention, allocating a snapshot ID is precisely the process of switching the temporary sub-dataset into a read-only sub-dataset.
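The close-rename-recreate sequence of the snapshot operation can be sketched as follows. The directory layout is an illustrative assumption; only the 6-digit, monotonically increasing snapshot ID comes from the text above.

```python
import os

def snapshot(tmp_dir: str, snapshot_count: int) -> str:
    """Seal the current temporary sub-dataset as a read-only sub-dataset."""
    # (in the real system, all open file objects would be closed here first)
    snap_id = f"{snapshot_count:06d}"                 # "000000", "000001", ...
    ro_dir = os.path.join(os.path.dirname(tmp_dir), snap_id)
    os.rename(tmp_dir, ro_dir)        # temporary set becomes a read-only set
    os.makedirs(tmp_dir)              # start a fresh temporary sub-dataset
    return snap_id
```

A single directory rename is atomic on a POSIX filesystem, which is why the rename itself can serve as the integrity boundary between the temporary set and the sealed read-only set.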
As can be seen from the above embodiments, the prior-art problem of long restart times caused by reloading and re-parsing incremental index data when the search engine service restarts is solved by storing the incremental index data in a contiguous cache, writing it into a temporary sub-dataset on disk when the cache is full, and encapsulating the temporary sub-dataset into a read-only sub-dataset. The embodiment of the invention avoids triggering a disk write for every received piece of incremental index data, reducing the time spent persisting it. The incremental index data added between two adjacent snapshot operations is stored in a sub-dataset for restart recovery, while the integrity of each sub-dataset is guaranteed. On restart, reading the disk sequentially, block by block, improves disk-read efficiency when recovering the sub-datasets and shortens the recovery and startup time of the search engine service.
Fig. 3 is a schematic diagram of a main flow of a processing method of incremental index data according to one referential embodiment of the present invention. Steps 101 to 103 constitute the storage process, and may specifically include the following steps:
in step 301, incremental index data is received.
Step 302, store the increment index data into the continuous data buffer, and return the offset1 of the increment index data in the data buffer.
Step 303, recording the document ID of the incremental index data and the position offset of the incremental index data in the data cache, that is, < document ID, offset1>, into an index cache.
Step 304, judging whether the cache space is full; if yes, go to step 305; if not, go to step 301.
Step 305, adding the incremental index data in the data cache into an incremental data file of the disk, and returning the initial position base of the incremental data file in the disk.
In step 306, the offset of each piece of increment index data in the index cache is updated according to the position offset2 of the increment index data in the disk, where offset2 = offset1 + base.
Step 307, store the updated data < document ID, offset2> in the index cache to the database engine file of the disk.
Steps 301 to 307 are adopted for persistent storage, which avoids triggering a disk write operation every time a piece of increment index data is received; therefore, the embodiment of the invention can reduce the time consumed in persisting increment index data.
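Steps 301 to 307 can be sketched in Python as follows. This is a minimal, hypothetical illustration: a byte-oriented cache and a plain dict stand in for the real on-disk database engine file (e.g., LevelDB), and all names are assumptions.

```python
import os
import tempfile

class DeltaStore:
    """Sketch of steps 301-307 of the storage flow (names hypothetical)."""

    def __init__(self, data_file, capacity):
        self.data_file = data_file        # incremental data file on disk
        self.capacity = capacity          # cache size that triggers a flush
        self.data_cache = bytearray()     # continuous data cache
        self.index_cache = {}             # document ID -> offset1 (in cache)
        self.engine = {}                  # document ID -> offset2 (on disk)

    def put(self, doc_id, record: bytes):
        offset1 = len(self.data_cache)             # step 302: cache offset
        self.data_cache += record
        self.index_cache[doc_id] = offset1         # step 303: <doc ID, offset1>
        if len(self.data_cache) >= self.capacity:  # step 304: cache full?
            self.flush()                           # steps 305-307

    def flush(self):
        with open(self.data_file, "ab") as f:
            base = f.tell()                        # step 305: starting position
            f.write(self.data_cache)
        for doc_id, off in self.index_cache.items():
            self.engine[doc_id] = off + base       # step 306: offset2 = offset1 + base
        # step 307: persist <doc ID, offset2>; the dict plays that role here
        self.data_cache = bytearray()
        self.index_cache = {}

path = os.path.join(tempfile.mkdtemp(), "delta.dat")
store = DeltaStore(path, capacity=8)
store.put(1, b"aaaa")
store.put(2, b"bbbb")          # cache reaches capacity -> flushed to disk
print(store.engine)            # → {1: 0, 2: 4}
```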
In addition, in one embodiment of the present invention, the detailed implementation of the method for processing the increment index data is described in detail in the above-mentioned method for processing the increment index data, and therefore, the repeated content is not described again.
Fig. 4 is a schematic diagram of a main flow of a processing method of incremental index data according to another referential embodiment of the present invention. Taking time tn in FIG. 2 as an example, the step 103 may include the following steps:
step 401, judging whether the cache is empty; if yes, go to step 405; if not, go to step 402.
Step 402, adding the incremental index data in the data cache to the incremental data file of the temporary subdata set, and returning the initial position base of the incremental data file in the disk.
In step 403, the offset of each piece of increment index data in the index cache is updated according to the position offset2 of the increment index data in the disk, where offset2 = offset1 + base.
At step 404, the updated data < document ID, offset2> in the index cache is stored in the database engine file of the temporary sub data set.
Step 405, closing all file objects in the temporary subdata set.
Step 406, allocating a snapshot ID to the temporary subdata set, renaming the temporary subdata set with the snapshot ID, and packaging the temporary subdata set into a read-only subdata set.
Step 407, creating a new temporary subdata set, opening all file objects (i.e., an incremental data file and a database engine file) in the created temporary subdata set, and storing in it the increment index data newly added before the next snapshot operation (before time tn+1).
The snapshot operation ensures the integrity of the persisted increment index data and avoids the influence of frequent disk writes on step 101 and step 102. If no new incremental index data is persisted between two snapshot operations, the two snapshot operations are treated as the same operation and the second snapshot operation performs no work, thereby preventing frequent snapshot operations from blocking other operations.
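Steps 405 to 407 amount to a rename-and-recreate sequence, sketched below in Python (directory names and layout are assumptions for illustration; steps 402 to 404 would run first when the cache is non-empty, and the closing/opening of individual file objects is elided):

```python
import os
import tempfile

def snapshot(base_dir, temp_name, snap_counter):
    """Sketch of steps 405-407: package the temporary subdata set as a
    read-only subdata set, then create a fresh temporary subdata set."""
    temp_dir = os.path.join(base_dir, temp_name)
    # steps 405/406: close file objects (elided), allocate a snapshot ID,
    # and rename the temporary subdata set with it
    snap_id = f"{snap_counter:06d}"
    ro_dir = os.path.join(base_dir, snap_id)
    os.rename(temp_dir, ro_dir)
    # step 407: create a new temporary subdata set for the next interval
    new_temp = os.path.join(base_dir, temp_name)
    os.makedirs(new_temp)
    return ro_dir, new_temp

base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "tmp"))
ro, tmp = snapshot(base, "tmp", 0)
print(sorted(os.listdir(base)))  # → ['000000', 'tmp']
```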
In addition, in another embodiment of the present invention, the detailed implementation of the method for processing the increment index data is described in detail in the above-mentioned method for processing the increment index data, and therefore the repeated content is not described herein.
Fig. 5 is a schematic diagram of a main flow of a processing method of incremental index data according to still another referential embodiment of the present invention.
After the temporary subdata set is packaged as a read-only subdata set, the method further comprises: sequentially loading each subdata set into a memory in descending order of the snapshot IDs; and parsing the data in each subdata set in the memory in descending order of the snapshot IDs, and storing the parsed data in the memory. When the search engine service is restarted, the subdata set corresponding to a specified time can be obtained through its snapshot ID, and the subdata sets with IDs smaller than or equal to that snapshot ID are sequentially restored. As shown in FIG. 2, all the incremental data before time t1 may be recovered through the snapshot ID "000001", that is, all the incremental index data in the subdata set "000001" and the subdata set "000000".
Optionally, as shown in fig. 5, for each sub data set, the sub data set is loaded into the memory by the following method:
step 501, reading the database engine file in the subdata set into a memory.
Optionally, the key-value pairs "<document ID, offset2>" stored in the LevelDB file are read into the memory.
Step 502, traversing the database engine file, and mapping the key values into fixed-size buckets according to the position offset of the incremental index data in the disk.
Alternatively, the LevelDB file is traversed and key-value pairs are mapped into fixed-size buckets (e.g., 1M, 64M, or 128M) according to offset2, where the incremental data corresponding to the document IDs in each bucket are stored in a contiguous (e.g., 1M, 64M, or 128M) disk space. As shown in FIG. 6, ID-1, ID-3, and ID-4 are mapped to the first bucket, and the corresponding incremental index data all lie in the 0-1M, 0-64M, or 0-128M space of the disk file.
Step 503, sequentially traversing each bucket according to the sequence from small to large, and loading the incremental index data corresponding to each bucket into the memory.
In this step, each bucket is traversed from small to large (i.e., the first bucket to the last bucket are traversed in sequence), and for all the incremental index data in each bucket, all the incremental index data corresponding to the bucket can be loaded into the memory through one continuous disk reading operation (e.g., 1M, 64M, or 128M).
Therefore, through steps 501 to 503, the subdata sets are traversed in turn, so as to copy the incremental index data in the subdata sets into the memory.
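The bucket mapping and traversal of steps 502 and 503 can be sketched as follows (Python; the 64M bucket size is one of the example sizes above, and all names are hypothetical stand-ins for the real on-disk structures):

```python
BUCKET_SIZE = 64 * 1024 * 1024   # fixed bucket size, e.g. 64M (assumption)

def build_buckets(index_entries):
    """Step 502: map <document ID, offset2> pairs into fixed-size buckets,
    so each bucket covers one contiguous stretch of the disk file."""
    buckets = {}
    for doc_id, offset2 in index_entries:
        buckets.setdefault(offset2 // BUCKET_SIZE, []).append((doc_id, offset2))
    return buckets

def load_order(buckets):
    """Step 503: traverse buckets from smallest to largest; each bucket can
    then be pulled in with a single contiguous disk read of BUCKET_SIZE bytes."""
    return [buckets[b] for b in sorted(buckets)]

entries = [(1, 100), (3, 200), (4, 300), (2, 70_000_000)]
print(load_order(build_buckets(entries)))
# IDs 1, 3, 4 land in the first bucket (as in FIG. 6); ID 2 in the second
```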
Optionally, analyzing the incremental index data in each of the sub-data sets, and storing the analyzed incremental index data in a memory, including: analyzing the data in each subdata set in the memory by combining the bitmap of the document ID, and storing the analysis result in the memory; and updating the bitmap of the document ID according to the analysis result. And the bitmap of the document ID indicates whether the incremental index data corresponding to each document ID is analyzed.
Note that, in the bitmap of document IDs, each document ID is represented by one bit. Before a piece of incremental index data is parsed, the bit corresponding to its document ID is marked as 0, and after it is parsed, the bit is marked as 1. When the incremental index data in the next subdata set are parsed, if the bit corresponding to a document ID is 1, indicating that the corresponding incremental index data has already been parsed and stored in the memory, that piece of incremental index data is skipped.
Because the data in each subdata set in the memory are parsed in descending order of the snapshot IDs, the bitmap of document IDs indicates whether each piece of incremental index data has been parsed, which makes it convenient to judge whether the incremental index data of the same document ID in a later subdata set still needs to be parsed. A document ID may be stored multiple times (e.g., when a document is updated multiple times), and only the version stored on the disk most recently (i.e., the newest incremental index data) is parsed into the memory.
Optionally, for each piece of increment index data in each sub data set, the following method is adopted for parsing: judging whether the state of the incremental index data to be analyzed in the bitmap of the document ID is not analyzed or not; if so, acquiring the incremental index data to be analyzed from the memory, and analyzing the incremental index data; and if not, skipping the increment index data to be analyzed. Traversing each piece of incremental data in the current bucket, checking a bitmap of a document ID of the currently analyzed incremental index data, if the bitmap of the document ID is marked as 0 (which indicates that the document ID is not analyzed), acquiring the incremental index data to be analyzed from a memory, analyzing the incremental index data, and marking the bitmap of the document ID as 1 (which indicates that the document ID is analyzed); otherwise, the piece of delta index data is skipped.
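The bitmap-guided parsing described above can be sketched as follows (a hypothetical Python illustration; the real engine parses on-disk records rather than in-memory tuples):

```python
class DocBitmap:
    """One bit per document ID: 0 = not yet parsed, 1 = parsed."""
    def __init__(self, n_docs):
        self.bits = bytearray((n_docs + 7) // 8)
    def test(self, doc_id):
        return (self.bits[doc_id >> 3] >> (doc_id & 7)) & 1
    def set(self, doc_id):
        self.bits[doc_id >> 3] |= 1 << (doc_id & 7)

def parse_subdata_sets(subdata_sets, n_docs):
    """Parse subdata sets in descending snapshot-ID order (newest first);
    only the most recent record of each document ID reaches the memory."""
    seen, memory = DocBitmap(n_docs), {}
    for subset in subdata_sets:           # newest snapshot first
        for doc_id, record in subset:
            if seen.test(doc_id):         # already parsed from a newer snapshot
                continue
            memory[doc_id] = record       # parse and keep the newest version
            seen.set(doc_id)
    return memory

newest = [(1, "v2"), (3, "v1")]
oldest = [(1, "v1"), (2, "v1")]
print(parse_subdata_sets([newest, oldest], n_docs=8))
# → {1: 'v2', 3: 'v1', 2: 'v1'}
```

Note that the stale record (1, "v1") from the older snapshot is skipped, matching the newest-wins behavior described above.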
In the embodiment of the invention, when the service is restarted, only the newly added incremental index data are parsed into the memory for the service to use, and parsing every piece of incremental index data is avoided, so that the efficiency of service update restart and fault recovery can be significantly improved. In addition, random disk reads are improved into continuous block-wise disk read operations, which suits the read characteristics of mechanical disks, improves the disk-reading efficiency during incremental index data recovery, and reduces the time consumed in recovering and starting the search engine service. Compared with the small-file storage scheme in the prior art, the time consumed in starting the search engine service is cut in half, and the efficiency of service update restart and fault recovery is significantly improved.
It should be noted that the search engine service does not restore the temporary subdata sets when it is restarted, but rather the read-only subdata sets. The snapshot function encapsulates the temporary subdata set into a read-only subdata set for restart recovery while ensuring the integrity of the subdata set. When the service is restarted, continuous block-wise disk read operations are executed while traversing each subdata set, and the incremental index data stored in the disk are copied into the memory for the service to use, which improves the disk-reading efficiency when restoring the subdata sets and reduces the time consumed in recovering and starting the search engine service.
In addition, in another embodiment of the present invention, the detailed implementation of the method for processing the increment index data is described in detail in the above-mentioned method for processing the increment index data, and therefore the repeated content is not described herein.
Fig. 7 is a schematic diagram of main blocks of an apparatus for processing incremental index data according to an embodiment of the present invention, and as shown in fig. 7, an apparatus 700 for processing incremental index data includes a first storage module 701, a second storage module 702, and an encapsulation module 703. The first storage module 701 is configured to receive incremental index data and store the incremental index data in a continuous cache; the second storage module 702 is configured to store the increment index data in the cache into a temporary subdata set of a disk if the space of the cache is full; the encapsulation module 703 is configured to encapsulate the temporary sub-data set into a read-only sub-data set at preset time intervals or at preset increment index data numbers.
Optionally, the first storage module 701 is further configured to:
storing the increment index data into a continuous data cache one by one, and recording the document ID of the increment index data and the position offset of the increment index data in the data cache into an index cache.
Optionally, the second storage module 702 is further configured to:
storing the increment index data in the data cache into an increment data file of a disk;
updating the index cache according to the position offset of the incremental index data in the disk, and storing the updated data in the index cache into a database engine file of the disk;
and the file objects in the temporary subdata set comprise incremental data files and database engine files.
Optionally, the second storage module 702 is further configured to: for each piece of incremental index data, determining the position offset of the piece of incremental index data in the disk by adopting the following method:
and adding the initial position of the incremental data file in the disk to the position offset of the piece of incremental index data in the data cache, to obtain the position offset of the piece of incremental index data in the disk.
Optionally, the encapsulation module 703 is further configured to:
judging whether the cache is empty or not;
if yes, packaging the temporary subdata set into a read-only subdata set;
if not, storing the increment index data in the cache into a temporary subdata set of the disk, and packaging the temporary subdata set into a read-only subdata set.
Optionally, the encapsulation module 703 is further configured to:
closing all file objects in the temporary subdata set;
distributing a snapshot ID for the temporary subdata set, renaming the temporary subdata set by using the snapshot ID, and packaging the temporary subdata set into a read-only subdata set;
newly building a temporary subdata set, and opening all file objects in the newly built temporary subdata set;
wherein the snapshot ID is sequentially incremented as the number of packages increases.
Optionally, the system further comprises a parsing module, configured to:
after the temporary subdata sets are packaged into read-only subdata sets, sequentially loading the subdata sets into a memory according to the descending order of the snapshot IDs;
and analyzing the data in each subdata set in the memory according to the sequence of the snapshot IDs from large to small, and storing the analyzed data in the memory.
Optionally, the parsing module is further configured to: for each subdata set, loading the subdata set into a memory by adopting the following method:
reading a database engine file in the subdata set into an internal memory;
traversing the database engine file, and mapping the key values into fixed-size buckets according to the position offset of the incremental index data in the disk;
and sequentially traversing each barrel according to the sequence from small to large, and loading the incremental index data corresponding to each barrel into the memory.
Optionally, the parsing module is further configured to:
analyzing the data in each subdata set in the memory by combining the bitmap of the document ID, and storing the analysis result in the memory;
updating the bitmap of the document ID according to the analysis result;
and the bitmap of the document ID indicates whether the incremental index data corresponding to each document ID is analyzed.
Optionally, the parsing module is further configured to: analyzing each piece of increment index data in each subdata set by adopting the following method:
judging whether the state of the incremental index data to be analyzed in the bitmap of the document ID is not analyzed or not;
if so, acquiring the incremental index data to be analyzed from the memory, and analyzing the incremental index data;
and if not, skipping the increment index data to be analyzed.
According to the various embodiments described above, it can be seen that the technical means of storing increment index data in a continuous cache, storing the cached increment index data into a temporary subdata set of the disk when the cache space is full, and encapsulating the temporary subdata set into a read-only subdata set solves the prior-art technical problem of long restart times caused by reloading and parsing increment index data when a search engine service is restarted. The embodiment of the invention avoids triggering a disk write operation each time a piece of increment index data is received, thereby reducing the time consumed in persisting increment index data. The increment index data newly added between two adjacent snapshot operations are stored in a subdata set for restart recovery, while the integrity of the subdata set is ensured. When the service is restarted, continuously reading the disk by blocks improves the disk-reading efficiency when recovering the subdata sets and reduces the time consumed in recovering and starting the search engine service.
It should be noted that, in the implementation of the incremental index data processing apparatus according to the present invention, the above incremental index data processing method has been described in detail, and therefore, the repeated description is omitted here.
Fig. 8 shows an exemplary system architecture 800 of a processing method of incremental index data or a processing apparatus of incremental index data to which an embodiment of the present invention can be applied.
As shown in fig. 8, the system architecture 800 may include terminal devices 801, 802, 803, a network 804, and a server 805. The network 804 serves to provide a medium for communication links between the terminal devices 801, 802, 803 and the server 805. Network 804 may include various types of connections, such as wire, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 801, 802, 803 to interact with the server 805 over the network 804 to receive or send messages or the like. The terminal devices 801, 802, 803 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 801, 802, 803 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 805 may be a server that provides various services, such as a back-office management server (for example only) that supports shopping-like websites browsed by users using the terminal devices 801, 802, 803. The background management server may analyze and otherwise process the received data such as the item information query request, and feed back a processing result (for example, target push information, item information — just an example) to the terminal device.
It should be noted that the method for processing the incremental index data provided by the embodiment of the present invention is generally executed by the server 805, and accordingly, the processing device for the incremental index data is generally disposed in the server 805.
It should be understood that the number of terminal devices, networks, and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 9, shown is a block diagram of a computer system 900 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU) 901 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the system 900 are also stored. The CPU 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted into the storage section 908 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are executed when the computer program is executed by the Central Processing Unit (CPU) 901.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor comprising a first storage module, a second storage module, and an encapsulation module, where the names of these modules do not in some cases constitute a limitation on the modules themselves.
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to comprise: receiving incremental index data and storing the incremental index data into a continuous cache; if the space of the cache is full, storing the increment index data in the cache into a temporary subdata set of a disk; and packaging the temporary subdata sets into read-only subdata sets at preset time intervals or preset increment index data numbers.
According to the technical scheme of the embodiment of the invention, because the increment index data are stored in a continuous cache, the cached increment index data are stored into a temporary subdata set of the disk when the cache space is full, and the temporary subdata set is encapsulated into a read-only subdata set, the prior-art technical problem of long restart times caused by reloading and parsing increment index data when the search engine service is restarted is solved. The embodiment of the invention avoids triggering a disk write operation each time a piece of increment index data is received, thereby reducing the time consumed in persisting increment index data. The increment index data newly added between two adjacent snapshot operations are stored in a subdata set for restart recovery, while the integrity of the subdata set is ensured. When the service is restarted, continuously reading the disk by blocks improves the disk-reading efficiency when recovering the subdata sets and reduces the time consumed in recovering and starting the search engine service.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. A method for processing incremental index data is characterized by comprising the following steps:
receiving increment index data and storing the increment index data into a continuous cache;
if the space of the cache is full, storing the increment index data in the cache into a temporary subdata set of a disk;
and packaging the temporary subdata sets into read-only subdata sets at preset time intervals or preset increment index data numbers.
2. The method of claim 1, wherein storing the delta index data in a continuous cache comprises:
storing the increment index data into a continuous data cache one by one, and recording the document ID of the increment index data and the position offset of the increment index data in the data cache into an index cache.
3. The method of claim 2, wherein storing the cached delta index data in a temporary subdata set of a disk comprises:
storing the increment index data in the data cache into an increment data file of a disk;
updating the index cache according to the position offset of the incremental index data in the disk, and storing the updated data in the index cache into a database engine file of the disk;
and the file objects in the temporary subdata set comprise incremental data files and database engine files.
4. The method of claim 3, wherein for each piece of delta index data, the position offset of the piece of delta index data in the disk is determined by:
and adding the initial position of the incremental data file in the disk to the position offset of the piece of incremental index data in the data cache, to obtain the position offset of the piece of incremental index data in the disk.
5. The method of claim 3, wherein encapsulating the temporary sub data set as a read-only sub data set comprises:
judging whether the cache is empty or not;
if yes, packaging the temporary subdata set into a read-only subdata set;
if not, storing the increment index data in the cache into a temporary subdata set of the disk, and packaging the temporary subdata set into a read-only subdata set.
6. The method of claim 5, wherein encapsulating the temporary sub data set as a read-only sub data set comprises:
closing all file objects in the temporary subdata set;
distributing a snapshot ID for the temporary subdata set, renaming the temporary subdata set by using the snapshot ID, and packaging the temporary subdata set into a read-only subdata set;
newly building a temporary subdata set, and opening all file objects in the newly built temporary subdata set;
wherein the snapshot ID is sequentially incremented as the number of packages increases.
7. The method of claim 3, further comprising, after encapsulating the temporary sub data set as a read-only sub data set:
sequentially loading each subdata set into a memory according to the sequence of the snapshot IDs from large to small;
and analyzing the data in each subdata set in the memory according to the sequence of the snapshot IDs from large to small, and storing the analyzed data in the memory.
8. The method of claim 7, wherein for each sub data set, loading the sub data set into memory is performed by:
reading a database engine file in the subdata set into an internal memory;
traversing the database engine file, and mapping the key values into fixed-size buckets according to the position offset of the incremental index data in the disk;
and sequentially traversing each barrel according to the sequence from small to large, and loading the incremental index data corresponding to each barrel into the memory.
9. The method of claim 7, wherein parsing the incremental index data in each of the sub-datasets and storing the parsed incremental index data in a memory comprises:
analyzing the data in each subdata set in the memory by combining the bitmap of the document ID, and storing the analysis result in the memory;
updating the bitmap of the document ID according to the analysis result;
and the bitmap of the document ID indicates whether the incremental index data corresponding to each document ID is analyzed.
10. The method of claim 9, wherein each piece of incremental index data in each sub-dataset is parsed by:
determining whether the bitmap of document IDs marks the incremental index data to be parsed as not yet parsed;
if so, fetching the incremental index data from memory and parsing it;
and if not, skipping the incremental index data.
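Claims 9 and 10 together can be sketched with one bit per document ID: because sub-datasets are parsed newest-first (claim 7), a set bit means a newer version has already been parsed and the stale copy may be skipped. The `payload.upper()` stand-in for real parsing is purely illustrative:

```python
class BitmapParser:
    """Illustrative sketch of the document-ID bitmap of claims 9-10."""

    def __init__(self):
        self.parsed = bytearray()   # 1 bit per document ID
        self.results = {}

    def _get(self, doc_id):
        byte, bit = divmod(doc_id, 8)
        return byte < len(self.parsed) and bool((self.parsed[byte] >> bit) & 1)

    def _set(self, doc_id):
        byte, bit = divmod(doc_id, 8)
        if byte >= len(self.parsed):
            self.parsed.extend(b"\x00" * (byte - len(self.parsed) + 1))
        self.parsed[byte] |= 1 << bit

    def parse(self, doc_id, payload):
        if self._get(doc_id):       # newer snapshot already parsed this doc
            return False            # skip the stale copy
        self.results[doc_id] = payload.upper()  # stand-in for real parsing
        self._set(doc_id)           # update the bitmap with the result
        return True
```

The bitmap costs one bit per document, so deciding whether to parse is O(1) with negligible memory overhead.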
11. An apparatus for processing incremental index data, comprising:
a first storage module configured to receive incremental index data and store it in a contiguous cache;
a second storage module configured to store the incremental index data in the cache into a temporary sub-dataset on disk when the cache is full;
and a packaging module configured to package the temporary sub-dataset into a read-only sub-dataset at preset time intervals or after a preset number of incremental index data entries.
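The three modules of claim 11 form a simple pipeline: cache, flush, package. A minimal sketch under stated assumptions (in-memory lists stand in for the contiguous cache and the on-disk sub-dataset, and the count-based packaging trigger is used rather than the timer variant):

```python
class IncrementalIndexWriter:
    """Illustrative sketch of the claim-11 pipeline."""

    def __init__(self, cache_capacity, package_every):
        self.cache = []                 # first module: contiguous cache
        self.cache_capacity = cache_capacity
        self.temp_subdataset = []       # second module's flush target ("disk")
        self.read_only = []             # packaged read-only sub-datasets
        self.package_every = package_every
        self.received = 0

    def receive(self, record):
        self.cache.append(record)       # store into the contiguous cache
        self.received += 1
        if len(self.cache) >= self.cache_capacity:
            # cache full: move its contents to the temporary sub-dataset
            self.temp_subdataset.extend(self.cache)
            self.cache.clear()
        if self.received % self.package_every == 0 and self.temp_subdataset:
            # package the temporary sub-dataset as a read-only sub-dataset
            self.read_only.append(tuple(self.temp_subdataset))
            self.temp_subdataset = []
```

Decoupling the cache-flush threshold from the packaging threshold is what lets writes stay fast while read-only sub-datasets are produced at a steady cadence.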
12. An electronic device, comprising:
one or more processors;
a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.
13. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-10.
CN201911053521.2A 2019-10-31 2019-10-31 Method and device for processing incremental index data Pending CN112748866A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911053521.2A CN112748866A (en) 2019-10-31 2019-10-31 Method and device for processing incremental index data

Publications (1)

Publication Number Publication Date
CN112748866A true CN112748866A (en) 2021-05-04

Family

ID=75645058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911053521.2A Pending CN112748866A (en) 2019-10-31 2019-10-31 Method and device for processing incremental index data

Country Status (1)

Country Link
CN (1) CN112748866A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030182325A1 (en) * 2002-03-19 2003-09-25 Manley Stephen L. System and method for asynchronous mirroring of snapshots at a destination using a purgatory directory and inode mapping
CN102663086A (en) * 2012-04-09 2012-09-12 华中科技大学 Method for retrieving data block indexes
CN102955792A (en) * 2011-08-23 2013-03-06 崔春明 Method for implementing transaction processing for real-time full-text search engine
CN103390038A (en) * 2013-07-16 2013-11-13 西安交通大学 HBase-based incremental index creation and retrieval method
CN103399915A (en) * 2013-07-31 2013-11-20 北京华易互动科技有限公司 Optimal reading method for index file of search engine
CN105912666A (en) * 2016-04-12 2016-08-31 中国科学院软件研究所 Method for high-performance storage and inquiry of hybrid structure data aiming at cloud platform
WO2017076223A1 (en) * 2015-11-04 2017-05-11 腾讯科技(深圳)有限公司 Indexing implementing method and system in file storage
US20170149886A1 (en) * 2015-11-24 2017-05-25 Netapp Inc. Directory level incremental replication
CN108170460A (en) * 2017-12-15 2018-06-15 杭州中天微系统有限公司 A kind of method and device of embedded system increment upgrading
WO2018121430A1 (en) * 2016-12-26 2018-07-05 贵州白山云科技有限公司 File storage and indexing method, apparatus, media, device and method for reading files
CN108334514A (en) * 2017-01-20 2018-07-27 北京京东尚科信息技术有限公司 The indexing means and device of data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU ZHENJUN; XU LU; FENG SHUO; YIN YANG: "Design and Implementation of an Iterative Snapshot System", Computer Engineering and Applications, no. 14 *
WANG FANG; HUANG SILIANG: "Hierarchical Network Backup Data Storage Organization and Fast Indexing Technology", Journal of Chinese Computer Systems, no. 07 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113641308A (en) * 2021-08-12 2021-11-12 南京冰鉴信息科技有限公司 Compressed file index increment updating method and device and electronic equipment
CN113641308B (en) * 2021-08-12 2024-04-23 南京冰鉴信息科技有限公司 Compressed file index increment updating method and device and electronic equipment
CN114398378A (en) * 2022-03-25 2022-04-26 北京奥星贝斯科技有限公司 Method and device for determining index cost
CN115344571A (en) * 2022-05-20 2022-11-15 药渡经纬信息科技(北京)有限公司 Universal data acquisition and analysis method, system and storage medium
CN115344571B (en) * 2022-05-20 2023-05-23 药渡经纬信息科技(北京)有限公司 Universal data acquisition and analysis method, system and storage medium

Similar Documents

Publication Publication Date Title
KR102240557B1 (en) Method, device and system for storing data
US9811577B2 (en) Asynchronous data replication using an external buffer table
CN112748866A (en) Method and device for processing incremental index data
CN113553300B (en) File processing method and device, readable medium and electronic equipment
CN111949710A (en) Data storage method, device, server and storage medium
US11281623B2 (en) Method, device and computer program product for data migration
US20220100372A1 (en) Method, electronic device, and computer program product for storing and accessing data
CN113885780A (en) Data synchronization method, device, electronic equipment, system and storage medium
CN111143231B (en) Method, apparatus and computer program product for data processing
CN111241189A (en) Method and device for synchronizing data
CN111143113A (en) Method, electronic device and computer program product for copying metadata
US9471246B2 (en) Data sharing using difference-on-write
US20210303406A1 (en) Highly efficient native application data protection for office 365
CN112783447A (en) Method, apparatus, device, medium, and article of manufacture for processing snapshots
CN111625500B (en) File snapshot method and device, electronic equipment and storage medium
US20230138113A1 (en) System for retrieval of large datasets in cloud environments
CN113760600B (en) Database backup method, database restoration method and related devices
CN113778910A (en) Data cache processing method and device
CN113760861A (en) Data migration method and device
CN113051244A (en) Data access method and device, and data acquisition method and device
CN113779048A (en) Data processing method and device
CN111930696A (en) File transmission processing method and system based on small program
CN112699116A (en) Data processing method and system
US11243932B2 (en) Method, device, and computer program product for managing index in storage system
US11656950B2 (en) Method, electronic device and computer program product for storage management

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination