CN116401259A

CN116401259A - Automatic pre-creation index method and system for elastic search database

Info

Publication number: CN116401259A
Application number: CN202310671425.4A
Authority: CN
Inventors: He Fei (何飞); Zheng Chengbin (郑成彬); Weng Guohai (翁国海)
Original assignee: Beijing Jiangrongxin Technology Co ltd
Current assignee: Beijing Jiangrongxin Technology Co ltd
Priority date: 2023-06-08
Filing date: 2023-06-08
Publication date: 2023-07-07
Anticipated expiration: 2043-06-08
Also published as: CN116401259B

Abstract

The application provides an automated method and system for pre-creating indexes in an Elasticsearch database. The method runs an automated task that generates an index and an index configuration from the business data structure information it reads, and then creates the corresponding index in the Elasticsearch database according to that index and index configuration. This technical solution completely avoids the situation in which, when the Elasticsearch cluster server's own resources are at peak usage and under contention, index creation times out and the data therefore cannot be stored.

Description

Method and system for automatically pre-creating indexes in an Elasticsearch database

Technical Field

The present disclosure relates to the field of index creation in databases, and more particularly to an automated method and system for pre-creating indexes in an Elasticsearch database.

Background

Elasticsearch is an excellent Lucene-based distributed data storage and search engine. It makes it easy to search, analyze and explore large volumes of data. In recent years, data volumes have grown with the spread of the internet, and as enterprises adopt big data technologies to meet their business development needs, Elasticsearch clusters are often used to provide search services.

By default, Elasticsearch creates indexes and the mappings of index types dynamically; that is, Elasticsearch does not pre-create indexes on its own. When Elasticsearch stores data, it generates the corresponding index information and the mapping configuration of the index type in real time from the business data structure information. Once the index exists, the business data structure information is submitted to the Elasticsearch database, the relevant text is segmented by the tokenizer, and the term weights and segmentation results are stored together with the data. However, the index creation timeout in Elasticsearch is 30 seconds. When the cluster server's own resources are at peak usage, resource contention occurs: while the server executes a resource-intensive thread such as garbage collection (GC), the Elasticsearch thread that saves the data may be suspended and left unprocessed. After 30 seconds, index creation times out and the data cannot be saved.

Disclosure of Invention

To solve the problem that, when the resources of an Elasticsearch cluster server (ES cluster server) are at peak usage and under contention, index creation by the server times out and the data therefore cannot be stored, the application provides a method for creating indexes in an Elasticsearch database. The method runs an automated task that generates an index and an index configuration (mapping) from the business data structure information it reads, and creates the corresponding index in the Elasticsearch database according to the index and index configuration.

The application adopts the following technical solution: an automated method for pre-creating indexes in an Elasticsearch database, the method comprising the following steps:

step 1, executing a scheduled task and generating an index creation request;

step 2, acquiring business data structure information from business data structure samples;

step 3, determining the Elasticsearch node information corresponding to the business data structure information, wherein the node information comprises an index, a document type, a document, a field, a term and a token;

step 4, generating an index configuration according to the Elasticsearch node information;

step 5, generating, by the Elasticsearch node, an index corresponding to the index creation request according to the index configuration.

Further, the method comprises: step 6, storing the generated index corresponding to the index creation request in the Elasticsearch database.

Further, in step 1, the Quartz framework is used to execute the scheduled task, and the index creation request is generated automatically by the scheduled task.

Further, step 1 specifically comprises the following sub-steps:

step 101, starting the scheduled task and performing task monitoring;

step 102, reading the Elasticsearch database, and loading the started scheduled task's job and trigger into a scheduler;

step 103, running the scheduler, which runs the scheduled job according to the task schedule so as to automatically execute the generated index creation request.
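Step 103 depends on a trigger that knows when the job should fire next. As an illustration only, the sketch below computes the next firing time of a daily trigger at a given hour using Python's standard `datetime`. It stands in for what a Quartz cron trigger does internally and is not Quartz's API; the function name `next_fire_time` and the daily schedule are assumptions.

```python
from datetime import datetime, timedelta


def next_fire_time(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Return the next moment at hour:minute that is strictly after `now`.

    Hypothetical stand-in for a daily cron-style trigger.
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is exactly now): fire tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

Adding a whole day with `timedelta` lets month and year boundaries roll over correctly without special cases.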

Further, in step 2, the scheduled task obtains a business data structure sample from the Elasticsearch database.

Further, the business data structure sample comprises a mapping directory and multiple pieces of business data structure information.

Further, the business data structure information is stored as a mapping.txt file and managed by version number.

Further, the index configuration comprises a settings section for passing in setting information and a mappings section for passing in type mappings.

Further, in step 5, the index is created with the following syntax: PUT http://host:port/index_name/ + index_configuration;

where index_name is the name of the index to create, and index_configuration is the index configuration information carried as the request payload to the Elasticsearch server, in JSON format.

An automated system for pre-creating indexes in an Elasticsearch database, the system comprising a processor and a memory storing instructions executable by the processor, wherein the processor performs the method steps described above when executing the instructions.

The embodiments of the application achieve the following technical effect: the solution completely avoids the situation in which, when the Elasticsearch cluster server's own resources are at peak usage and under contention, index creation times out and the data cannot be stored.

Drawings

To illustrate the technical solutions of the embodiments of the application more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show only some embodiments of the application; a person skilled in the art can derive other drawings from them without inventive effort.

FIG. 1 is a flow diagram of an automated pre-creation indexing method;

FIG. 2 is a schematic flow chart of executing the scheduled task in the method;

FIG. 3 is a schematic diagram of the composition of the business data structure information in the method;

FIG. 4 is a diagram of the directory structure of a business data structure information sample in the method;

FIG. 5 is a schematic diagram of the scheduled task processing tenant data in the method;

FIG. 6 is a schematic diagram of the ES node information in the method;

FIG. 7 is a diagram of the field mapping used to generate the index mapping configuration in the method;

FIG. 8 is a flow diagram of creating and saving an index in the method.

Detailed Description

To make the objectives, technical solutions and advantages of the embodiments of the application clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without inventive effort fall within the scope of the application.

FIG. 1 is a flow diagram of an automated pre-creation indexing method. The method comprises the following steps:

In step 1, the Quartz framework is used to execute a scheduled task, and the index creation request is generated automatically by the scheduled task.

The process of executing the scheduled task is shown in FIG. 2 and specifically comprises the following sub-steps:

step 101, starting the scheduled task and performing task monitoring;

step 102, reading the database, and loading the started scheduled task's job and trigger into the scheduler;

The index creation logic is added to the execute() method of the class implementing the Job interface. Once this Job implementation class is compiled and a schedule is set, the Quartz framework tracks the remaining time by itself: when the scheduler determines that the scheduled time has arrived, Quartz calls the execute() method of the Job implementation class, which enters the flow for generating the index creation request. The schedule is set by configuring the job's cron expression as "0 0 1 …".
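Quartz is a Java library whose `Job` interface declares a single `execute(JobExecutionContext)` method that the scheduler calls when a trigger fires. The Python below only mirrors that shape to show the control flow described above; `Job`, `CreateIndexRequestJob` and `Scheduler` are hypothetical names, and the job merely records a request instead of talking to Elasticsearch.

```python
from abc import ABC, abstractmethod
from datetime import datetime


class Job(ABC):
    """Analogue of Quartz's Job interface: the scheduler calls execute()."""

    @abstractmethod
    def execute(self, fire_time: datetime) -> None:
        ...


class CreateIndexRequestJob(Job):
    """Hypothetical job that generates one index creation request per run."""

    def __init__(self) -> None:
        self.requests: list = []

    def execute(self, fire_time: datetime) -> None:
        # The index creation logic would go here; the sketch only records it.
        self.requests.append({"action": "create_index", "fired_at": fire_time})


class Scheduler:
    """Minimal scheduler holding (job, fire_time) pairs."""

    def __init__(self) -> None:
        self._entries: list = []

    def schedule_job(self, job: Job, fire_time: datetime) -> None:
        """Load a job together with its trigger time (step 102)."""
        self._entries.append((job, fire_time))

    def run_due(self, now: datetime) -> int:
        """Call execute() on every job whose time has arrived (step 103)."""
        due = [(j, t) for j, t in self._entries if t <= now]
        self._entries = [(j, t) for j, t in self._entries if t > now]
        for job, fire_time in due:
            job.execute(fire_time)
        return len(due)
```

The application code never calls `execute()` directly; it only registers the job and its time, which is the division of labor the paragraph attributes to Quartz.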

Each tenant has its own business data structure information, and the scheduled task reads the business data structure information of each tenant in turn, as shown in FIG. 5. The business data structure information comprises an index, sample data and an index configuration (mapping), and the sample data is associated with both the index and the index configuration; by analyzing the sample data, the Elasticsearch server can obtain the index and index configuration of the business data structure information. The composition of the business data structure information is shown in FIG. 3.

The business data structure information of each tenant is stored in a designated directory on the application server as a mapping.txt file.

The composition of the business data structure information sample is shown in FIG. 4. The sample comprises a mapping directory and multiple pieces of business data structure information, each of which can be looked up through the mapping directory.

The mapping.txt files of the business data structure information are managed by version number, and each version is stored in a folder named after its version number. When the business data structure information changes, a new version number is generated and the changed information is stored in a folder named after that new version number. The mapping.txt files are placed under a designated directory for the scheduled task to read, and at read time the required version is loaded using the currently active version number.
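The version-directory scheme above can be sketched with Python's `pathlib`. The layout `<base_dir>/<tenant>/<version>/mapping.txt` is an assumption made for illustration (the source only says that each version sits in a folder named after its version number under a designated directory), and `load_active_mapping` and `load_all_tenants` are hypothetical helpers.

```python
from pathlib import Path


def load_active_mapping(base_dir, tenant: str, active_version: str) -> str:
    """Read <base_dir>/<tenant>/<active_version>/mapping.txt (assumed layout)."""
    path = Path(base_dir) / tenant / active_version / "mapping.txt"
    return path.read_text(encoding="utf-8")


def load_all_tenants(base_dir, active_versions: dict) -> dict:
    """Read each tenant's active mapping.txt in turn, in sorted tenant order.

    `active_versions` maps tenant name -> currently active version number.
    """
    return {
        tenant: load_active_mapping(base_dir, tenant, version)
        for tenant, version in sorted(active_versions.items())
    }
```

Because old versions stay in their own folders, switching back to an earlier structure only means changing the active version number.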

Step 3, determining the Elasticsearch node information corresponding to the business data structure information;

As shown in FIG. 6, the Elasticsearch node information comprises an index (the index name in the corresponding business data structure information), a document type, a document, a field, a term and a token. Each is described below:

● Index: stores document types; within one index, a given field can be defined with only one data type;

● Document Type (Type): describes the definition of each field in a document; different document types can store different fields to serve different query requests;

● Document: the carrier of data, consisting of one or more fields in which the data is stored;

● Field: a key/value pair in the document;

● Word (Term): a word in the text;

● Mark (Token): an occurrence of a word in a field, consisting of the word's text, its offsets (start and end) and its type.
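To make the term/token distinction concrete, the sketch below splits a field value into tokens that carry the text, start and end offsets, and a type, as in the definition above. It is a toy regex tokenizer, not an Elasticsearch analyzer; the `Token` class, the `tokenize` function and the type label `"word"` are assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    """A token: the term text plus where it occurs and what kind it is."""
    text: str   # the term (lower-cased word)
    start: int  # offset of the first character in the field value
    end: int    # offset one past the last character
    type: str   # hypothetical type label


def tokenize(field_value: str) -> list:
    """Split a field value into lower-cased word tokens with offsets."""
    return [
        Token(m.group().lower(), m.start(), m.end(), "word")
        for m in re.finditer(r"\w+", field_value)
    ]
```

For example, `tokenize("Quick brown fox")` yields the terms `quick`, `brown` and `fox`, and the `brown` token spans offsets 6 to 11.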

Step 4, generating the index configuration according to the Elasticsearch node information;

As shown in FIG. 7, multiple fields (Field 1, Field 2, Field 3, …, Field N) are taken from the business data structure sample and a corresponding ES index is generated; the structure of an ES index consists of the index (Index), fields (Field) and document types (Type). The index configuration is generated according to the mapping between the fields in the business data structure sample and the structure of the ES index.
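The field-to-mapping step can be sketched as a function that walks a sample record and assigns each field an Elasticsearch field type (`text`, `long`, `double`, `boolean`), nesting objects under `properties`. The type-inference rules here are illustrative assumptions, not Elasticsearch's own dynamic-mapping rules, and `infer_es_type` and `build_properties` are hypothetical names.

```python
def infer_es_type(value) -> str:
    """Pick an Elasticsearch field type for one sample value (assumed rules)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "text"


def build_properties(sample: dict) -> dict:
    """Map each field of a sample record to an Elasticsearch field definition."""
    props = {}
    for field, value in sample.items():
        if isinstance(value, dict):
            # Nested objects get their own "properties" block.
            props[field] = {"properties": build_properties(value)}
        else:
            props[field] = {"type": infer_es_type(value)}
    return props
```

The resulting dictionary is what would sit under a document type inside the `mappings` section of the index configuration.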

The index configuration comprises a settings section (settings), into which setting information is passed, and a mappings section (mappings), into which type mappings are passed. Within the request body, the settings and type mappings are passed in the following structure:

{
    "settings": { ... },
    "mappings": {
        "type_one": { ... },
        "type_two": { ... },
        ...
    }
}
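The request body above can be assembled with Python's `json` module. Note that a template with several document types per index matches Elasticsearch 5.x and earlier: Elasticsearch 6.x allows only one mapping type per index, and 7.x and later expect type-less mappings. The shard and replica values below are example settings only, and `build_index_configuration` is a hypothetical helper.

```python
import json


def build_index_configuration(settings: dict, type_mappings: dict) -> str:
    """Return the JSON request body with a "settings" and a "mappings" section.

    `type_mappings` maps each document type name to its field definitions,
    following the multi-type template above.
    """
    body = {"settings": settings, "mappings": type_mappings}
    return json.dumps(body, ensure_ascii=False, indent=2)


config = build_index_configuration(
    settings={"number_of_shards": 1, "number_of_replicas": 1},  # example values
    type_mappings={
        "type_one": {"properties": {"name": {"type": "text"}}},
    },
)
```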

The syntax for creating the index is: PUT http://host:port/index_name/ + index_configuration

where index_name is the name of the index to create, and index_configuration is the request payload (the configuration information of the index) passed to the Elasticsearch server, in JSON format.
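The PUT request can be built with Python's standard `urllib.request`, which lets the HTTP method be set explicitly. The sketch only constructs the request; sending it would be `urllib.request.urlopen(req)` against a reachable cluster. The optional `timeout` query parameter belongs to Elasticsearch's create-index API (its default of 30s matches the timeout described in the background); the helper name `create_index_request` is hypothetical.

```python
import json
import urllib.request
from typing import Optional


def create_index_request(host: str, port: int, index_name: str,
                         index_configuration: dict,
                         timeout: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) PUT http://host:port/index_name with a JSON body."""
    url = f"http://{host}:{port}/{index_name}"
    if timeout is not None:
        url += f"?timeout={timeout}"  # e.g. "60s"; the server default is 30s
    body = json.dumps(index_configuration).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

Because a scheduled job issues this request ahead of time, a slow response only delays the job, not an incoming data write.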

In summary, with the above flow, Elasticsearch no longer needs to create the index and index configuration in real time when saving data. Because the index is created in Elasticsearch in advance, Elasticsearch only has to save the data; the flow of creating and saving the index is shown in FIG. 8. This completely avoids the situation in which, when the Elasticsearch cluster server's own resources are at peak usage and under contention, index creation times out and the data cannot be saved.

Finally, it should be noted that the above preferred embodiments only illustrate, and do not limit, the technical solution of the application. Although the application has been described in detail with reference to these embodiments, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the application as defined by the appended claims.
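The separation between the scheduled creation path and the data-saving path can be illustrated with a toy in-memory store. This is a hypothetical model, not Elasticsearch behavior: here `save` refuses to create an index on the fly, so a write never waits on index creation; the class and method names are assumptions.

```python
class InMemoryIndexStore:
    """Toy stand-in for a cluster: tracks created indexes and stored documents."""

    def __init__(self) -> None:
        self.indexes: dict = {}    # index name -> index configuration
        self.documents: dict = {}  # index name -> list of stored documents

    def create_index(self, name: str, configuration: dict) -> None:
        """Scheduled-job path: create the index ahead of time (no-op if present)."""
        if name not in self.indexes:
            self.indexes[name] = configuration
            self.documents[name] = []

    def save(self, name: str, document: dict) -> None:
        """Write path: store data only; never create an index here."""
        if name not in self.indexes:
            raise LookupError(f"index {name!r} was not pre-created")
        self.documents[name].append(document)
```

Making `create_index` idempotent lets the scheduled job run repeatedly without disturbing an index that already exists.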

Claims

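1. An automated method for pre-creating indexes in an Elasticsearch database, the method comprising the steps of:

step 1, executing a scheduled task and generating an index creation request;

step 2, acquiring business data structure information from business data structure samples;

step 3, determining the Elasticsearch node information corresponding to the business data structure information, wherein the node information comprises an index, a document type, a document, a field, a term and a token;

step 4, generating an index configuration according to the Elasticsearch node information;

step 5, generating, by the Elasticsearch node, an index corresponding to the index creation request according to the index configuration.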

2. The method according to claim 1, characterized in that the method further comprises: step 6, storing the generated index corresponding to the index creation request in the Elasticsearch database.

3. The method according to claim 1, characterized in that in step 1 the Quartz framework is used to execute the scheduled task, and the index creation request is generated automatically by the scheduled task.

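4. The method according to claim 3, characterized in that step 1 comprises the following sub-steps:

step 101, starting the scheduled task and performing task monitoring;

step 102, reading the Elasticsearch database, and loading the started scheduled task's job and trigger into a scheduler;

step 103, running the scheduler, which runs the scheduled job according to the task schedule so as to automatically execute the generated index creation request.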

5. The method according to claim 1, wherein in step 2 the scheduled task obtains a business data structure sample from the Elasticsearch database.

6. The method according to claim 1, wherein the business data structure sample comprises a mapping directory and multiple pieces of business data structure information.

7. The method according to claim 1, wherein the business data structure information is stored as a mapping.txt file and managed by version number.

8. The method according to claim 1, wherein the index configuration comprises a settings section for passing in setting information and a mappings section for passing in type mappings.

9. The method according to claim 1, characterized in that in step 5 the index is created with the following syntax: PUT http://host:port/index_name/ + index_configuration.

10. An automated system for pre-creating indexes in an Elasticsearch database, characterized in that the system comprises a processor and a memory storing instructions executable by the processor, wherein the processor performs the method steps of any one of claims 1 to 9 when executing the instructions.