CN116401259A - Automatic pre-creation index method and system for Elasticsearch database - Google Patents
Automatic pre-creation index method and system for Elasticsearch database
- Publication number
- CN116401259A
- Authority
- CN
- China
- Prior art keywords
- index
- Elasticsearch
- data structure
- configuration
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2291—User-Defined Types; Storage management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44521—Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides an automated index pre-creation method and system for an Elasticsearch database. The method generates an automated task that derives an index and an index configuration from the business data structure information it reads, and then creates the corresponding index in the Elasticsearch database according to that index and index configuration. With this technical scheme, the problem that index creation times out and data cannot be saved when the resources of the Elasticsearch cluster server are at peak occupancy and under contention can be completely avoided.
Description
Technical Field
The present disclosure relates to the field of index creation in databases, and more particularly to an automated index pre-creation method and system for an Elasticsearch database.
Background
Elasticsearch is a distributed data storage and search engine built on Lucene. It makes large volumes of data searchable, analyzable, and explorable. In recent years, as data volumes have grown with the internet era and enterprises have adopted big data technologies to meet their business development needs, Elasticsearch clusters are often used to provide search services.
By default, Elasticsearch dynamically creates indexes and the mappings of index types; that is, Elasticsearch does not pre-create indexes on its own. When Elasticsearch saves data, it dynamically generates the corresponding index information and the mapping configuration of the index type in real time from the business data structure information. After the index is created, the business data structure information is submitted to the Elasticsearch database, the text is split into terms by a tokenizer, and the weights and tokenization results are stored together with the data. However, the timeout for Elasticsearch index creation is 30 seconds. When the resources of the cluster server are at peak occupancy, resource contention arises. If the server is running an expensive thread such as garbage collection (GC), the Elasticsearch thread that saves data may be suspended and left unprocessed; after 30 seconds, index creation times out and the data cannot be saved.
Disclosure of Invention
To solve the problem that, when the resources of an Elasticsearch cluster server (ES cluster server) are at peak occupancy and under contention, index creation by the server times out and data therefore cannot be saved, the application provides a method for creating indexes in an Elasticsearch database. The method generates an automated task that derives an index and an index configuration (mapping) from the business data structure information it reads, and creates the corresponding index in the Elasticsearch database according to that index and index configuration.
The application adopts the following technical scheme: an automated index pre-creation method for an Elasticsearch database, the method comprising the following steps:
step 1, executing a scheduled task and generating an index creation request;
step 2, acquiring business data structure information from a business data structure sample;
step 3, determining the Elasticsearch node information corresponding to the business data structure information, wherein the Elasticsearch node information comprises an index, a document type, a document, a field, a term, and a token;
step 4, generating an index configuration according to the Elasticsearch node information;
step 5, generating, by the Elasticsearch node, the index corresponding to the index creation request according to the index configuration.
Further, the method comprises: step 6, storing the generated index corresponding to the index creation request in the Elasticsearch database.
Further, in step 1, the Quartz framework is used to execute the scheduled task, and the index creation request is generated automatically by the scheduled task.
Further, step 1 specifically includes the following sub-steps:
step 101, starting the scheduled task and performing task monitoring;
step 102, reading the Elasticsearch database, and loading the job and trigger of the started scheduled task into a scheduler;
step 103, running the scheduler and running the scheduled task job according to the task schedule, so that the generated index creation request is executed automatically.
Further, step 2 also comprises: the scheduled task obtaining the business data structure sample from the Elasticsearch database.
Further, the business data structure sample includes a mapping directory and a plurality of pieces of business data structure information.
Further, the business data structure information is stored as a mapping.txt file and managed by version number.
Further, the index configuration includes a settings section for passing in setting information and a mappings section for passing in type mappings.
Further, in step 5, the index is created with the following syntax: PUT http://host:port/index_name/ + index_configuration;
where index_name is the name of the index to be created, and index_configuration is the request payload passed to the Elasticsearch server, containing the configuration information of the index in JSON format.
An automated index pre-creation system for an Elasticsearch database, the system comprising a processor and a memory storing instructions executable by the processor, the processor performing the steps of the above method when executing the instructions.
Through the embodiments of the application, the following technical effect can be obtained: the problem that index creation times out and data cannot be saved when the resources of the Elasticsearch cluster server are at peak occupancy and under contention can be completely avoided.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow diagram of the automated index pre-creation method;
FIG. 2 is a flow diagram of executing the scheduled task in the method;
FIG. 3 is a schematic diagram of the composition of the business data structure information in the method;
FIG. 4 is a diagram of the directory structure of a business data structure information sample in the method;
FIG. 5 is a schematic diagram of the scheduled task processing tenant data in the method;
FIG. 6 is a schematic diagram of the ES node information in the method;
FIG. 7 is a mapping diagram for generating the index mapping configuration in the method;
FIG. 8 is a flow diagram of creating and saving an index in the method.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments herein without inventive effort fall within the scope of protection of the present application.
FIG. 1 is a flow diagram of the automated index pre-creation method. The method comprises the following steps:
Step 1, executing a scheduled task and generating an index creation request. The Quartz framework is used to execute the scheduled task, and the index creation request is generated automatically by the scheduled task;
the process of executing the scheduled task is shown in FIG. 2 and specifically includes the following sub-steps:
step 101, starting the scheduled task and performing task monitoring;
step 102, reading the database, and loading the job and trigger of the started scheduled task into a scheduler;
step 103, running the scheduler and running the scheduled task job according to the task schedule, so that the generated index creation request is executed automatically.
In the implementation class of the Job interface, the index creation logic is added to the execute() method. Once the Job implementation class has been written and the schedule has been set, the Quartz framework keeps track of the remaining time on its own; when the scheduler determines that the scheduled time has arrived, the Quartz framework calls the execute() method of the Job implementation class and enters the flow that generates the index creation request; by configuring the schedule job expression to be "0 0 1.
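The interplay of job, trigger, and scheduler described above can be sketched in a few lines. This is a minimal illustration in Python, not Quartz itself: `IndexCreationJob`, `DailyTrigger`, and `Scheduler` are hypothetical stand-ins, the trigger fires at an assumed fixed hour each day instead of evaluating a cron expression, and an "index creation request" is represented as a plain dictionary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


class IndexCreationJob:
    """Hypothetical stand-in for the Job implementation class."""

    def __init__(self, tenants):
        self.tenants = tenants
        self.requests = []  # index creation requests generated so far

    def execute(self, fire_time):
        # The index creation logic lives in execute(), as in the text.
        for tenant in self.tenants:
            self.requests.append({"tenant": tenant, "fired_at": fire_time})


@dataclass
class DailyTrigger:
    """Fires once per day at a fixed hour (an assumed schedule)."""
    hour: int

    def next_fire_time(self, after):
        candidate = after.replace(hour=self.hour, minute=0, second=0, microsecond=0)
        if candidate <= after:
            candidate += timedelta(days=1)
        return candidate


class Scheduler:
    """Holds job/trigger pairs and runs each job once its fire time arrives."""

    def __init__(self):
        self.entries = []  # each entry: [job, trigger, next due time]

    def schedule(self, job, trigger, now):
        self.entries.append([job, trigger, trigger.next_fire_time(now)])

    def tick(self, now):
        for entry in self.entries:
            job, trigger, due = entry
            if now >= due:
                job.execute(due)
                entry[2] = trigger.next_fire_time(now)
```

Calling `tick` with the current time plays the role of the scheduler noticing that the scheduled time has arrived; a real deployment would leave that bookkeeping to the scheduling framework.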
Step 2, acquiring business data structure information from a business data structure sample;
each tenant generates its own business data structure information, and the scheduled task reads the business data structure information of each tenant in turn, as shown in FIG. 5. The business data structure information comprises an index, sample data, and an index configuration (mapping), with the sample data associated with both the index and the index configuration; the Elasticsearch server can obtain the index and index configuration of the business data structure information by parsing the sample data; the composition of the business data structure information is shown in FIG. 3.
The business data structure information of each tenant is stored in a designated directory on the application server as a mapping.txt file.
The composition of a business data structure information sample is shown in FIG. 4. The sample comprises a mapping directory and a plurality of pieces of business data structure information, each of which can be looked up through the mapping directory.
The mapping.txt file of the business data structure information is managed by version number, and each version is stored in a folder named after its version number; when the business data structure information changes, a new version number is generated and the changed information is stored in a folder named after the new version number; the mapping.txt file is placed under the designated directory for the scheduled task to read, and at read time the required version is loaded using the currently active version number.
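The versioned storage scheme can be illustrated with a short Python sketch. The directory layout `<base_dir>/<tenant>/<version>/mapping.txt`, the JSON encoding of the file contents, and the function names are all assumptions made for the example; the source only states that each version lives in a folder named after its version number and that the active version number selects what is loaded.

```python
import json
from pathlib import Path


def save_new_version(base_dir, tenant, version, structure_info):
    """Store changed structure info in a folder named after the new version."""
    folder = Path(base_dir) / tenant / version
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "mapping.txt"
    # JSON is an assumed on-disk format for mapping.txt.
    target.write_text(json.dumps(structure_info), encoding="utf-8")
    return target


def load_active_mapping(base_dir, tenant, active_version):
    """Load the mapping.txt that belongs to the currently active version."""
    path = Path(base_dir) / tenant / active_version / "mapping.txt"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Because older version folders are never overwritten, switching the active version number back is enough to return to an earlier structure definition.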
Step 3, determining the Elasticsearch node information corresponding to the business data structure information;
as shown in FIG. 6, the Elasticsearch node information includes an index (corresponding to the index name in the business data structure information), a document type, a document, a field, a term, and a token. Each is described below:
● Index: a store in which document types are defined; within one index, a given field can be defined with only one data type;
● Document type (Type): describes the definition of each field in a document; different document types can hold different fields and serve different query requests;
● Document: the carrier of stored data, comprising one or more fields in which the data is held;
● Field: a key/value pair of a document;
● Term: a word in the text;
● Token: an occurrence of a word in a field, consisting of the word's text, its offsets (start and end), and its type.
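The difference between a term and a token is easiest to see on a concrete field value. The sketch below is a deliberately simplified illustration in Python: splitting on `\w+` and lowercasing are assumptions for the example and do not reproduce the analyzer an actual cluster would apply, and the `Token` class is hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str   # the word itself
    start: int  # start offset within the field value
    end: int    # end offset (exclusive)
    type: str   # coarse token type


def tokenize(field_value):
    """Split a field value into tokens carrying text, offsets, and type."""
    tokens = []
    for match in re.finditer(r"\w+", field_value):
        word = match.group()
        kind = "<NUM>" if word.isdigit() else "<ALPHANUM>"
        tokens.append(Token(word.lower(), match.start(), match.end(), kind))
    return tokens


def terms(tokens):
    """Terms are the distinct words; tokens are their individual occurrences."""
    return {token.text for token in tokens}
```

For a value such as `"Order 42 shipped"`, each of the three words yields one token with its own offsets, while a repeated word would add tokens without adding terms.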
Step 4, generating the index configuration according to the Elasticsearch node information;
as shown in FIG. 7, a plurality of fields, such as Field 1, Field 2, Field 3, ..., Field N, are acquired from the business data structure sample and a corresponding ES index is generated; the index structure of the ES index consists of an index (Index), fields (Field), and a document type (Type). The index configuration is generated from the mapping between the fields in the business data structure sample and the index structure of the ES index.
The index configuration includes a settings section (settings), in which setting information is passed in, and a mappings section (mappings), in which type mappings are passed in. Within the request body, settings and type mappings are passed in using the following structure:
{
"settings": { ...... },
"mappings": {
"type_one": { ...... },
"type_two": { ...... },
...
}
}
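As a rough sketch of how such a body could be assembled from sample data, the Python below infers a field type from each sample value and fills in the two sections. The inference rule, the function names, and the default shard and replica counts are assumptions made for illustration; the typed `mappings` layout simply mirrors the structure shown above.

```python
import json


def infer_field_type(value):
    """Map a sample value to a field type (an assumed, simplified rule)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "text"


def build_index_configuration(type_name, sample, shards=1, replicas=1):
    """Build a settings/mappings body from one sample record."""
    properties = {
        name: {"type": infer_field_type(value)} for name, value in sample.items()
    }
    return {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {type_name: {"properties": properties}},
    }


if __name__ == "__main__":
    sample = {"order_id": 7, "paid": True, "amount": 9.5, "note": "fragile"}
    print(json.dumps(build_index_configuration("order", sample), indent=2))
```

Keeping type inference in one small function makes it straightforward to replace the rule with explicit per-field definitions read from the stored structure information.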
Step 5, generating, by the Elasticsearch node, the index corresponding to the index creation request according to the index configuration.
The syntax for creating the index is: PUT http://host:port/index_name/ + index_configuration
where index_name is the name of the index to be created, and index_configuration is the request payload (the configuration information of the index) passed to the Elasticsearch server, in JSON format.
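The request described here can be assembled with Python's standard library, as in the sketch below. The function name and the example host, port, and index name are hypothetical; the function only builds the request object and does not contact a server.

```python
import json
import urllib.request


def build_create_index_request(host, port, index_name, index_configuration):
    """Build a PUT request whose JSON body carries the index configuration."""
    url = f"http://{host}:{port}/{index_name}"
    body = json.dumps(index_configuration).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    request = build_create_index_request(
        "localhost", 9200, "orders_v1", {"settings": {}, "mappings": {}}
    )
    print(request.get_method(), request.full_url)
```

Sending the request against a live cluster would be a separate call such as `urllib.request.urlopen(request, timeout=...)`, where the client-side timeout is chosen by the caller.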
In summary, with the above flow, Elasticsearch no longer needs to create the index and index configuration in real time when saving data. Because the index has been created in Elasticsearch in advance, Elasticsearch only needs to save the data; the flow of creating and saving the index is shown in FIG. 8. Thus, the problem that index creation times out and data cannot be saved when the resources of the Elasticsearch cluster server are at peak occupancy and under contention can be completely avoided.
Finally, it is noted that the above-mentioned preferred embodiments are only intended to illustrate rather than limit the technical solution of the present application, and that, although the present application has been described in detail by means of the above-mentioned preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the present application as defined by the appended claims.
Claims (10)
1. An automated index pre-creation method for an Elasticsearch database, the method comprising the following steps:
step 1, executing a scheduled task and generating an index creation request;
step 2, acquiring business data structure information from a business data structure sample;
step 3, determining the Elasticsearch node information corresponding to the business data structure information, wherein the Elasticsearch node information comprises an index, a document type, a document, a field, a term, and a token;
step 4, generating an index configuration according to the Elasticsearch node information;
step 5, generating, by the Elasticsearch node, the index corresponding to the index creation request according to the index configuration.
2. The method according to claim 1, characterized in that the method further comprises: step 6, storing the generated index corresponding to the index creation request in the Elasticsearch database.
3. The method according to claim 1, characterized in that in step 1 the Quartz framework is used to execute the scheduled task, and the index creation request is generated automatically by the scheduled task.
4. The method according to claim 3, characterized in that step 1 comprises the following sub-steps:
step 101, starting the scheduled task and performing task monitoring;
step 102, reading the Elasticsearch database, and loading the job and trigger of the started scheduled task into a scheduler;
step 103, running the scheduler and running the scheduled task job according to the task schedule, so that the generated index creation request is executed automatically.
5. The method according to claim 1, further comprising, in step 2: the scheduled task obtaining the business data structure sample from the Elasticsearch database.
6. The method of claim 1, wherein the business data structure sample comprises a mapping directory and a plurality of pieces of business data structure information.
7. The method of claim 1, wherein the business data structure information is stored as a mapping.txt file and is managed by version number.
8. The method of claim 1, wherein the index configuration comprises a settings section for passing in setting information and a mappings section for passing in type mappings.
9. The method according to claim 1, characterized in that in step 5 the index is created with the following syntax: PUT http://host:port/index_name/ + index_configuration;
where index_name is the name of the index to be created, and index_configuration is the request payload passed to the Elasticsearch server, containing the configuration information of the index in JSON format.
10. An automated index pre-creation system for an Elasticsearch database, characterized in that the system comprises a processor and a memory storing instructions executable by the processor, the processor performing the method steps of any of claims 1 to 9 when executing the instructions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310671425.4A CN116401259B (en) | 2023-06-08 | 2023-06-08 | Automatic pre-creation index method and system for elastic search database |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310671425.4A CN116401259B (en) | 2023-06-08 | 2023-06-08 | Automatic pre-creation index method and system for elastic search database |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116401259A true CN116401259A (en) | 2023-07-07 |
CN116401259B CN116401259B (en) | 2023-08-22 |
Family
ID=87014661
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310671425.4A Active CN116401259B (en) | 2023-06-08 | 2023-06-08 | Automatic pre-creation index method and system for elastic search database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116401259B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180268000A1 (en) * | 2017-03-20 | 2018-09-20 | Datameer, Inc. | Apparatus and Method for Distributed Query Processing Utilizing Dynamically Generated In-Memory Term Maps |
CN110297829A (en) * | 2019-06-26 | 2019-10-01 | 重庆紫光华山智安科技有限公司 | A kind of text searching method and system towards specific industry structuring business datum |
CN110347722A (en) * | 2019-07-11 | 2019-10-18 | 软通智慧科技有限公司 | Data capture method, device, equipment and storage medium based on HBase |
CN110609865A (en) * | 2018-05-29 | 2019-12-24 | 优信拍(北京)信息科技有限公司 | Information synchronization method, device and system |
CN111460023A (en) * | 2020-04-29 | 2020-07-28 | 上海东普信息科技有限公司 | Service data processing method, device, equipment and storage medium based on elastic search |
CN112612865A (en) * | 2020-12-17 | 2021-04-06 | 杭州迪普科技股份有限公司 | Document storage method and device based on elastic search |
CN113051460A (en) * | 2021-03-29 | 2021-06-29 | 北京智慧星光信息技术有限公司 | Elasticissearch-based data retrieval method and system, electronic device and storage medium |
CN114356878A (en) * | 2022-01-10 | 2022-04-15 | 中国银行股份有限公司 | Distributed storage method and device for unstructured data |
CN115145916A (en) * | 2022-06-27 | 2022-10-04 | 南斗六星系统集成有限公司 | Automatic capacity expansion method for Elasticissearch index in streaming data scene |
-
2023
- 2023-06-08 CN CN202310671425.4A patent/CN116401259B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN116401259B (en) | 2023-08-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110581893A (en) | data transmission method and device, routing equipment, server and storage medium | |
CN112035230A (en) | Method and device for generating task scheduling file and storage medium | |
CN110046100B (en) | Packet testing method, electronic device and medium | |
US20210034574A1 (en) | Systems and methods for verifying performance of a modification request in a database system | |
CN116401259B (en) | Automatic pre-creation index method and system for elastic search database | |
CN108959294B (en) | Method and device for accessing search engine | |
CN113568603B (en) | Component object creating and interface method calling method, terminal and storage device | |
CN114625515A (en) | Task management method, device, equipment and storage medium | |
CN111858489B (en) | Multi-source heterogeneous spatial data archiving method based on self-adaptive metadata template | |
CN113792026A (en) | Deployment method and device of database script and computer readable storage medium | |
CN112988722A (en) | Hive partition table data cleaning method and device and storage medium | |
CN106682221B (en) | Question-answer interaction response method and device and question-answer system | |
CN113722141B (en) | Method and device for determining delay reason of data task, electronic equipment and medium | |
CN115617487A (en) | Container rescheduling method, device, equipment and storage medium | |
CN110825736A (en) | Method, device and system for asynchronously calling data | |
CN116860776A (en) | Label updating method, device, equipment and medium | |
CN117950985A (en) | Storage performance test method of search engine and related equipment | |
CN114817393A (en) | Data extraction and cleaning method and device and storage medium | |
CN117539928A (en) | Method and device for sub-graph matching through calculation engine | |
CN117971378A (en) | Workflow execution method, workflow execution device, electronic equipment and storage medium | |
CN115480897A (en) | Task processing method, device, equipment, storage medium and program product | |
CN117873497A (en) | Automatic CDH deployment method, device and system for big data management platform | |
CN112905321A (en) | Event response type task triggering method and device, electronic equipment and storage medium | |
CN114328965A (en) | Knowledge graph updating method and device and computer equipment | |
CN113360452A (en) | Distributed file generation method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |