CN111352936A

CN111352936A - Method and storage medium for ES index reconstruction

Info

Publication number: CN111352936A
Application number: CN202010081576.0A
Authority: CN
Inventors: 刘德建; 林伟; 郭玉湖; 陈宏�
Original assignee: Fujian Tianquan Educational Technology Ltd
Current assignee: Fujian Tianquan Educational Technology Ltd
Priority date: 2020-02-06
Filing date: 2020-02-06
Publication date: 2020-06-30

Abstract

The invention provides a method and a storage medium for rebuilding an Elasticsearch (ES) index. The method comprises: creating a consumer group thread that waits to be triggered before consuming; synchronously writing the index operations on the old index into the consumer group; configuring the settings and field mappings of the new index; rebuilding the index; triggering the consumer group to consume, so that the index operations held in the consumer group are consumed into the newly created index; and, when the consumer lag of the consumer group falls below a threshold, switching the alias from the old index to the new index. The invention allows ES indexes to be switched without stopping service, so that users barely notice the switch; it is also efficient, stable, and low-cost.

Description

Method and storage medium for ES index reconstruction

Technical Field

The invention relates to the field of database search, and in particular to a method and a storage medium for rebuilding an Elasticsearch (ES) index.

Background

With the rapid development of the mobile internet, business systems increasingly need complex searches over big data, a scenario to which a traditional relational database such as MySQL is poorly suited.

Elasticsearch is a distributed full-text search engine built on Lucene that provides a near-real-time solution for searches with complex conditions. Its working principle is as follows: a user submits data to Elasticsearch; an analyzer tokenizes the corresponding text, and the tokens are stored together with their weights in an inverted index; when a user searches, matching data is scored and ranked by weight, and the results are returned to the user. Data in Elasticsearch is stored in indexes, and each index generally needs its settings and the mapping types of its fields to be defined in advance. However, once an Elasticsearch index has been created, new fields can be added to its mapping, but the type of an existing field cannot be changed. In real online business, a field type is often found to be set incorrectly, or the online Elasticsearch index turns out to contain dirty data, and in these cases the index has to be rebuilt. To affect the service as little as possible, the existing solution rebuilds the index, switches the alias, and then appends data afterwards, as follows: (1) creating the mapping types and related settings of the new index; (2) stopping inserts, updates, and deletes on the old index; (3) copying the data of the old index into the new index with a reindex operation; (4) removing the alias from the old index and associating it with the new index; (5) appending the data that was held back in the meantime. However, with this method the old index is no longer updated from step (2) onward.

Therefore, until step (5) is complete, users can only search old data and cannot find the latest data. With a small old index this may take a few minutes, but when the old index holds hundreds of gigabytes or even terabytes of data, users may wait several hours before the latest data becomes searchable, which is unacceptable for consumer-facing (C-side) internet search services with strict real-time requirements.

Therefore, an effective solution is needed for the problem that, while an index is being rebuilt, users cannot search the latest data for a long time, which degrades their experience.

Disclosure of Invention

The technical problem to be solved by the invention is to provide an ES index rebuilding method and a storage medium whose operation is imperceptible to users, thereby significantly improving user experience.

To solve the above technical problem, the invention adopts the following technical solution:

creating a consumer group thread that waits to be triggered before consuming;

synchronously writing the index operations on the old index into the consumer group;

configuring the settings and field mappings of the new index;

rebuilding the index;

triggering the consumer group to consume, so that the index operations held in the consumer group are consumed into the newly created index;

when the consumer lag of the consumer group falls below a threshold, switching the alias from the old index to the new index.

The invention further provides another technical solution:

a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above ES index rebuilding method.

The beneficial effects of the invention are as follows: to address the problem that users cannot use the index normally for a long time while it is being rebuilt, the invention, on the premise that message-queue consumption is idempotent, uses a message queue with dual consumer groups to rebuild the index without stopping normal use of the old index, giving users a nearly imperceptible experience.
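The idempotence premise can be illustrated with a minimal in-memory sketch. It assumes that every queued message carries either a complete document to upsert or a delete by document ID; the `apply_op` and `replay` helpers and the sample order IDs are hypothetical and not taken from the patent. Replaying the same log twice, or replaying it onto a copy that already holds part of the data (as a reindex snapshot would), yields the same final state, which is why operations already copied by the reindex can safely be applied again.

```python
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, str, Optional[dict]]  # (action, doc_id, document)


def apply_op(index: Dict[str, dict], op: Op) -> None:
    """Apply one operation: 'index' overwrites the whole document, 'delete' removes it."""
    action, doc_id, doc = op
    if action == "index":
        index[doc_id] = dict(doc)   # full overwrite, so applying it again changes nothing
    elif action == "delete":
        index.pop(doc_id, None)     # deleting a missing document is a no-op
    else:
        raise ValueError(f"unknown action: {action}")


def replay(index: Dict[str, dict], log: List[Op]) -> Dict[str, dict]:
    """Apply every operation in order (mutates and returns `index`)."""
    for op in log:
        apply_op(index, op)
    return index


log = [
    ("index", "o1", {"status": "paid"}),
    ("index", "o2", {"status": "new"}),
    ("delete", "o2", None),
    ("index", "o1", {"status": "shipped"}),
]
once = replay({}, log)
twice = replay(replay({}, log), log)
# A copy that already holds part of the data, as a reindex snapshot might.
from_snapshot = replay({"o1": {"status": "paid"}, "o2": {"status": "new"}}, log)
print(once == twice == from_snapshot, once)  # True {'o1': {'status': 'shipped'}}
```

Operations such as scripted increments or partial updates that depend on the current value are not idempotent in this sense, so under this sketch's assumption they would have to be queued as full documents.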

Drawings

FIG. 1 is a flowchart of a method for rebuilding an ES index according to an embodiment of the present invention;

FIG. 2 is a flowchart of an ES index rebuilding method according to an embodiment of the present invention.

Detailed Description

To describe the technical content, objectives, and effects of the present invention in detail, the following description refers to the accompanying drawings in combination with the embodiments.

The key concept of the invention is as follows: the old index stays in use, its index operations are captured through dual consumer groups, and after the new index has been rebuilt, the captured data is appended to it through the consumer group, giving users a nearly imperceptible experience.

The technical terms related to the invention are explained as follows:

Referring to FIG. 1, the present invention provides a method for rebuilding an ES index, including:

creating a consumer group thread that waits to be triggered before consuming;

synchronously writing the index operations on the old index into the consumer group;

configuring the settings and field mappings of the new index;

rebuilding the index;

triggering the consumer group to consume, so that the index operations held in the consumer group are consumed into the newly created index;

when the consumer lag of the consumer group falls below a threshold, switching the alias from the old index to the new index.

As can be seen from the above description, the beneficial effects of the present invention are: a new consumer group is created to capture the index operation events on the old index; once a new index mapping that meets the requirements has been established, the index is rebuilt through reindex without stopping inserts, updates, or deletes on the old index (so users can still query the latest data); after the rebuild is complete, the index operations held in the new consumer group are consumed into the new index to append the data, and the alias is switched only after the appended data has caught up. None of these operations is perceptible to users.

Further, when the consumer lag of the consumer group falls below the threshold, the method further comprises:

stopping the consumer group from consuming;

deleting the consumer group and the old index.

As can be seen from the above description, once the new index has been successfully rebuilt and put into use, the consumer group and the old index are deleted, avoiding unnecessary resource consumption.

Further, synchronously writing the index operations on the old index into the consumer group specifically comprises:

receiving an index operation instruction for the old index;

executing the index operation on the old index according to the instruction, and at the same time writing the index operation into the consumer group.

As can be seen from the above description, the old index keeps working normally while the new index is being rebuilt and until the new index is put into use, so users continue to have a good experience.
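A minimal sketch of this dual write follows. `DualWriter` is a hypothetical wrapper: a dict stands in for the old index and a list stands in for the message-queue topic, so no real Elasticsearch client or Kafka producer is involved. Each operation is applied to the old index as before and then recorded for later replay.

```python
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, str, Optional[dict]]  # (action, doc_id, document)


class DualWriter:
    """Hypothetical wrapper: apply each operation to the old index and record it."""

    def __init__(self, old_index: Dict[str, dict], topic_log: List[Op]):
        self.old_index = old_index  # stands in for the old Elasticsearch index
        self.topic_log = topic_log  # stands in for the message-queue topic

    def handle(self, op: Op) -> None:
        action, doc_id, doc = op
        # 1) The old index keeps serving writes exactly as before.
        if action == "index":
            self.old_index[doc_id] = dict(doc)
        elif action == "delete":
            self.old_index.pop(doc_id, None)
        else:
            raise ValueError(f"unknown action: {action}")
        # 2) The same operation is recorded for later replay into the new index.
        self.topic_log.append(op)


old, topic = {}, []
writer = DualWriter(old, topic)
writer.handle(("index", "o1", {"status": "paid"}))
writer.handle(("index", "o2", {"status": "new"}))
writer.handle(("delete", "o2", None))
print(old)         # {'o1': {'status': 'paid'}}
print(len(topic))  # 3
```

In a real deployment the two writes are not atomic; how to handle a failure of one of them is not specified by the source and is left open in this sketch.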

Further, after configuring the settings and field mappings of the new index, the method further comprises:

configuring parameters of the new index, the parameters being used to correct dirty data in the index.

As can be seen from the above description, dirty data in the old index can be corrected during the rebuild through the parameter configuration of the new index.
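As an illustration of such a dirty-data correction, the sketch below builds a request body for Elasticsearch's `POST _reindex` API with a Painless script that runs on each document as it is copied. The field name `phone` and the trimming rule are hypothetical examples of a correction; only the body's overall shape (`source`, `dest`, `script`) follows the reindex API.

```python
import json


def build_reindex_body(old_index: str, new_index: str, script_source: str) -> dict:
    """Body for POST _reindex: copy old_index into new_index, running a script per document."""
    return {
        "source": {"index": old_index},
        "dest": {"index": new_index},
        "script": {"lang": "painless", "source": script_source},
    }


# Hypothetical dirty-data fix: store the "phone" field as a trimmed string.
FIX_PHONE = (
    "if (ctx._source.phone != null) { "
    "ctx._source.phone = ctx._source.phone.toString().trim(); }"
)

body = build_reindex_body("old_index", "new_index", FIX_PHONE)
print(json.dumps(body, indent=2))  # send with: POST _reindex
```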

Further, rebuilding the index specifically comprises:

copying the data of the old index into the new index.

As can be seen from the above description, copying the old index data across as-is preserves the functionality of the old index.

Further, the consumer group belongs to a message queue, which may be a Kafka topic, a RabbitMQ topic, or an ActiveMQ topic.

As can be seen from the above description, supporting several types of message queue makes the choice more flexible.

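The invention further provides another technical solution:

a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of an ES index rebuilding method comprising:

creating a consumer group thread that waits to be triggered before consuming;

synchronously writing the index operations on the old index into the consumer group;

configuring the settings and field mappings of the new index;

rebuilding the index;

triggering the consumer group to consume, so that the index operations held in the consumer group are consumed into the newly created index;

when the consumer lag of the consumer group falls below a threshold, switching the alias from the old index to the new index.

Further, when the consumer lag of the consumer group falls below the threshold, the steps further comprise:

stopping the consumer group from consuming;

deleting the consumer group and the old index.

Further, synchronously writing the index operations on the old index into the consumer group specifically comprises:

receiving an index operation instruction for the old index;

executing the index operation on the old index according to the instruction, and at the same time writing the index operation into the consumer group.

Further, rebuilding the index specifically comprises:

copying the data of the old index into the new index.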

Those skilled in the art will understand that all or part of the processes in the above technical solutions can be implemented by a computer program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above methods. When executed by a processor, the program also achieves the beneficial effects of the corresponding methods.

The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Example one

Referring to FIG. 2, this embodiment provides a method for rebuilding an ES index in which the rebuild is imperceptible to users.

In this method, the index operations on the old index are recorded in a message queue; after the new index has been rebuilt, the queued messages are consumed into the new index to append the data, and the old index never has to be taken out of service.

The message queue in this embodiment uses dual consumer groups and can be a Kafka topic, a RabbitMQ topic, or an ActiveMQ topic. This example is described using a Kafka topic.

The method comprises the following steps:

S1: creating a consumer group thread that waits to be triggered before consuming;

Specifically, a thread for a consumer group new_group on the Kafka topic is created; this consumer group consumes messages only after it has been triggered.
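The triggered consumer group of S1 (and its triggering in S5) can be modelled in memory as below. `TriggeredGroup` is a hypothetical class, not a Kafka API: its offset is pinned to the end of the topic when it is created (the "latest" position), messages keep accumulating while it waits, and it applies them only after `trigger()` has been called.

```python
import threading
from typing import Callable, List


class TriggeredGroup:
    """Hypothetical in-memory stand-in for the consumer group created in S1."""

    def __init__(self, topic_log: List[object], handler: Callable[[object], None]):
        self.topic_log = topic_log
        self.handler = handler
        # Pin the start position to the current end of the topic ("latest"),
        # so only messages written after creation will be applied.
        self.offset = len(topic_log)
        # An Event lets another thread flip the trigger safely.
        self._triggered = threading.Event()

    def trigger(self) -> None:
        """S5: allow consumption to start."""
        self._triggered.set()

    def poll_once(self) -> int:
        """Apply all pending messages if triggered; return how many were applied."""
        if not self._triggered.is_set():
            return 0  # still waiting: messages keep accumulating in the topic
        count = 0
        while self.offset < len(self.topic_log):
            self.handler(self.topic_log[self.offset])
            self.offset += 1
            count += 1
        return count

    def lag(self) -> int:
        """Messages written to the topic but not yet applied."""
        return len(self.topic_log) - self.offset


topic = ["written-before-S1"]
applied: List[object] = []
group = TriggeredGroup(topic, applied.append)
topic.extend(["op1", "op2"])            # written while the group waits
print(group.poll_once(), group.lag())   # 0 2
group.trigger()
print(group.poll_once(), applied)       # 2 ['op1', 'op2']
```

With a real Kafka consumer, `auto.offset.reset=latest` is only resolved when the consumer first needs a position, so a group that neither fetches nor commits an offset until it is triggered could skip the messages written during the rebuild; committing the current end offsets when the group is created would pin the start position as this sketch does.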

S2: synchronously writing the index operations on the old index into the consumer group;

After the consumer group thread has been created, every index operation that users perform on the old index is also written into the consumer group, but it is not consumed yet.

It should be noted that the old index never stops working. That is, in this embodiment, throughout the whole rebuild and until the old index is deleted, user searches are still executed against the old index, and all index operations, such as inserting, updating, and deleting data, are still executed on the old index.

Correspondingly, in this step the users' index operations on the old index are also written into the consumer group.

S3: creating a new index new_index;

Specifically, in the Elasticsearch cluster where the old index old_index to be modified resides, the field mappings (mappings) required for the new index new_index and its other related index settings are configured.

In a specific example, the method further comprises configuring parameters of the new index according to specific business requirements, for example writing the corrections for dirty data into a script.
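For S3, the sketch below shows a request body for creating the new index with `PUT /new_index` (typeless mappings, as in Elasticsearch 7.x and later). The shard and replica counts and the field list are hypothetical; the point is that the corrected field types are fixed here, before any data is copied.

```python
import json

# Hypothetical settings and mappings for "PUT /new_index".
new_index_body = {
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1,
    },
    "mappings": {
        "properties": {
            # Corrected types are declared here; existing field types cannot be changed later.
            "order_id": {"type": "keyword"},
            "user_id": {"type": "keyword"},
            "title": {"type": "text"},
            "amount": {"type": "double"},
            "created_at": {"type": "date"},
        }
    },
}

print(json.dumps(new_index_body, indent=2))  # send with: PUT /new_index
```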

S4: rebuilding the index.

Specifically, a reindex operation is performed, during which Elasticsearch automatically copies the data of the old index into the new index.

S5: after step S4 is completed, triggering the consumer group created in S1 to consume messages, so that the data is appended to the new index.

That is, from this point on, the index operations recorded in the consumer group are written into the new index.

S6: monitoring the consumer lag of the consumer group new_group, and proceeding to the next step when the lag falls below a preset threshold, that is, when the lag has stabilized at a small value, indicating that newly produced data is being consumed in near real time;
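The lag check in S6 can be sketched as follows, assuming the usual Kafka definition of lag: for each partition, the log-end offset minus the group's committed offset, summed over partitions. The "stabilized at a small value" condition is modelled, as an assumption, by requiring the last few samples to all be below the threshold; the threshold, window, and offset values are hypothetical.

```python
from typing import Dict, List


def total_lag(log_end: Dict[int, int], committed: Dict[int, int]) -> int:
    """Sum over partitions of (log-end offset - committed offset).

    Partitions without a committed offset are treated as committed at 0.
    """
    return sum(max(end - committed.get(p, 0), 0) for p, end in log_end.items())


def lag_is_stable(samples: List[int], threshold: int, window: int) -> bool:
    """True when the last `window` lag samples are all below `threshold`."""
    recent = samples[-window:]
    return len(recent) == window and all(s < threshold for s in recent)


print(total_lag({0: 1200, 1: 980}, {0: 1190, 1: 975}))              # 15
print(lag_is_stable([5000, 800, 40, 12, 9], threshold=50, window=3))  # True
```

On a real cluster, `kafka-consumer-groups.sh --bootstrap-server <broker> --describe --group new_group` reports the per-partition current offset, log-end offset, and lag that such a check would sample.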

S7: switching the alias association;

Specifically, the alias is switched with the atomic alias operation built into Elasticsearch: in a single request, the association between the alias and the old index is removed and the alias is associated with the new index.
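The atomic switch in S7 corresponds to Elasticsearch's `POST /_aliases` API, which applies all listed actions as one atomic operation. The sketch below only builds the request body; the alias name `alias_order` is borrowed from Example two, and `old_index`/`new_index` are the names used in this example.

```python
import json


def alias_switch_body(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases: both actions are applied as one atomic operation."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


print(json.dumps(alias_switch_body("alias_order", "old_index", "new_index")))
```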

S8: stopping the consumer group from consuming;

S9: deleting the consumer group and the old index.
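To tie S1 to S9 together, here is an end-to-end in-memory simulation. It is a sketch under stated assumptions, not the patented implementation: dicts stand in for the Elasticsearch indexes, a list for the Kafka topic, a dict for the alias table, and the threshold of 10 is hypothetical. The reindex is modelled as copying a snapshot taken before some user writes, and replaying the topic brings the new index up to date. As an added assumption, the group drains any messages still pending after the alias switch before it is stopped, so that writes made around the switch are not lost.

```python
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, str, Optional[dict]]  # (action, doc_id, document)


def apply_op(index: Dict[str, dict], op: Op) -> None:
    action, doc_id, doc = op
    if action == "index":
        index[doc_id] = dict(doc)      # full-document upsert (idempotent)
    else:                              # "delete"
        index.pop(doc_id, None)


indexes: Dict[str, Dict[str, dict]] = {"old_index": {"o1": {"status": "new"}}}
aliases = {"alias_order": "old_index"}
topic: List[Op] = []

# S1: the group starts at the current end of the topic and stays idle.
group = {"offset": len(topic), "active": False}


def user_write(op: Op) -> None:
    # S2: writes go to whatever index the alias points at, and to the topic.
    apply_op(indexes[aliases["alias_order"]], op)
    topic.append(op)


def consume() -> None:
    # Apply every pending message to the new index while the group is active.
    while group["active"] and group["offset"] < len(topic):
        apply_op(indexes["new_index"], topic[group["offset"]])
        group["offset"] += 1


user_write(("index", "o2", {"status": "new"}))
indexes["new_index"] = {}                                  # S3: create new index
snapshot = {k: dict(v) for k, v in indexes["old_index"].items()}
user_write(("index", "o1", {"status": "paid"}))            # writes continue during S4
user_write(("delete", "o2", None))
indexes["new_index"].update(snapshot)                      # S4: reindex finishes

group["active"] = True                                     # S5: trigger consumption
consume()
lag = len(topic) - group["offset"]                         # S6: monitor lag
if lag < 10:                                               # hypothetical threshold
    aliases["alias_order"] = "new_index"                   # S7: switch alias
    user_write(("index", "o3", {"status": "new"}))         # now lands on new_index
    consume()        # assumption: drain pending messages before stopping
    group["active"] = False                                # S8: stop consuming
    del indexes["old_index"]                               # S9: delete old index

print(aliases)   # {'alias_order': 'new_index'}
print(indexes)   # {'new_index': {'o1': {'status': 'paid'}, 'o3': {'status': 'new'}}}
```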

Example two

This embodiment provides a specific application scenario corresponding to the first embodiment:

the Kafka topic is topic_order, and the corresponding consumer group is consumer_order_group;

the old order index is index_order, and its alias is alias_order;

the new order index is index_order_new.

The method comprises the following steps:

1. The existing order program queries and writes order data in the old index index_order through the alias alias_order; at the same time, each newly added order is also written into the newly created consumer group consumer_order_group, which waits to be triggered before consuming.

That is, after modification, the order program still queries and writes order data in index_order through alias_order, but it must also write each order record into the Kafka topic topic_order.

2. Perform the index rebuild in ES.

Set the parameters of the rebuild, for example writing the dirty data to be corrected into a script, and then perform the rebuild. Elasticsearch copies the data of the old index index_order into the new index index_order_new.

Meanwhile, the order program keeps performing the operations of step 1: it continues to write to and query the old index index_order, and its operation is not affected.

3. When step 2 is complete, that is, index_order_new has been created and the old index data has been copied into it, the consumer thread of consumer_order_group is started, and from this moment the index messages in topic_order are consumed incrementally into the new index index_order_new.

4. Observe the consumer lag of the consumer group consumer_order_group; when the lag is small, newly produced data is being consumed in near real time, and the next operation can be performed.

5. Switch the alias association.

The alias is switched with the atomic alias operation built into Elasticsearch: the association between alias_order and the old index index_order is removed while alias_order is associated with the new index index_order_new. The switch completes quickly.

6. Stop consuming from consumer_order_group;

7. Delete the consumer group consumer_order_group and the index index_order.

At this point, the ES index has been switched without stopping the service.

Example three

This embodiment provides another specific application scenario of Example one:

Business scenario (orders):

On a large e-commerce platform, users' order data is often stored in an order index in Elasticsearch; it can reach billions of records and occupy 2 to 3 TB of disk space. Thanks to Elasticsearch's distributed search, a user can find his or her own orders among billions of orders within milliseconds.

Business requirement:

Some online users report that their orders cannot be found by search.

Technical solution:

Because a field mapping of the online order index was set incorrectly, users cannot find the correct orders. Since the online order index already holds billions of records, the conventional rebuild approach could leave users unable to find their latest orders for hours. Therefore, the new Elasticsearch index rebuilding method is adopted.

Basic information:

the Kafka topic is topic_order;

the order index is index_order, and its alias is alias_order;

the corresponding consumer group is consumer_order_group.

The specific steps are as follows:

1. Create a new consumer group consumer_order_group_new on the Kafka topic topic_order, with its consumption offset set to latest (that is, it only listens to messages produced after this moment and does not consume anything yet); what the group listens to is the index operations on the old index, that is, the index operations written synchronously alongside the old index.

2. Modify the field attributes of the order index as required, and create the new order index index_order_new;

3. Set the parameters of the rebuild, for example writing the dirty data to be corrected into a script, and perform the reindex operation; Elasticsearch copies the data of the old index index_order into the new index index_order_new.

4. After step 3 is complete, start consuming with the consumer group consumer_order_group_new created in step 1, appending the data to the new index.

5. Observe the consumer lag of consumer_order_group_new; when the lag is small, newly produced data is being consumed in near real time, and the next operation can be performed.

6. Switch the alias association with the atomic alias operation built into Elasticsearch: the association between alias_order and the old index index_order is removed while alias_order is associated with the new index index_order_new.

7. Stop consuming from consumer_order_group;

8. Delete the consumer group consumer_order_group and the index index_order.

To summarize: after these operations, the problem of some users being unable to find their orders is solved, without affecting users' ability to query their latest orders.

Example four

Corresponding to Examples one to three, this embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the ES index rebuilding method according to any one of Examples one to three. The detailed steps are not repeated here; refer to the descriptions of Examples one to three.

In summary, the method and storage medium for rebuilding an ES index provided by the present invention allow ES indexes to be switched without stopping service, giving users a nearly imperceptible experience; the invention is also efficient, stable, and low-cost.

The above description is only an embodiment of the present invention and is not intended to limit its scope; all equivalent changes made using the contents of this specification and the drawings, whether applied directly or indirectly in related technical fields, are likewise included in the scope of protection of the present invention.

Claims

1. A method for rebuilding an ES index, comprising:

creating a consumer group thread that waits to be triggered before consuming;

synchronously writing the index operations on the old index into the consumer group;

configuring the settings and field mappings of the new index;

rebuilding the index;

triggering the consumer group to consume, so that the index operations held in the consumer group are consumed into the newly created index;

when the consumer lag of the consumer group falls below a threshold, switching the alias from the old index to the new index.
2. The ES index rebuilding method of claim 1, wherein, when the consumer lag of the consumer group is below the threshold, the method further comprises:

stopping the consumer group from consuming;

deleting the consumer group and the old index.
3. The ES index rebuilding method of claim 1, wherein synchronously writing the index operations on the old index into the consumer group specifically comprises:

receiving an index operation instruction for the old index;

executing the index operation on the old index according to the instruction, and at the same time writing the index operation into the consumer group.
4. The ES index rebuilding method of claim 1, wherein, after configuring the settings and field mappings of the new index, the method further comprises:

configuring parameters of the new index, the parameters being used to correct dirty data in the index.
5. The ES index rebuilding method of claim 1, wherein rebuilding the index specifically comprises:

copying the data of the old index into the new index.
6. The ES index rebuilding method of claim 1, wherein the consumer group belongs to a message queue that is a Kafka topic, a RabbitMQ topic, or an ActiveMQ topic.
7. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the ES index rebuilding method according to any one of claims 1 to 6.