CN116401259A - Automatic pre-creation index method and system for elastic search database - Google Patents


Info

Publication number
CN116401259A
CN116401259A CN202310671425.4A CN202310671425A CN116401259A CN 116401259 A CN116401259 A CN 116401259A CN 202310671425 A CN202310671425 A CN 202310671425A CN 116401259 A CN116401259 A CN 116401259A
Authority
CN
China
Prior art keywords
index
elastic search
data structure
configuration
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310671425.4A
Other languages
Chinese (zh)
Other versions
CN116401259B (en)
Inventor
何飞
郑成彬
翁国海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiangrongxin Technology Co ltd
Original Assignee
Beijing Jiangrongxin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiangrongxin Technology Co ltd
Priority to CN202310671425.4A (patent CN116401259B)
Publication of CN116401259A
Application granted
Publication of CN116401259B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2291 User-Defined Types; Storage management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44521 Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an automated index pre-creation method and system for an Elasticsearch database. Through an automated task, the method generates an index and an index configuration from the service data structure information it reads, and creates the corresponding index in the Elasticsearch database according to that index and index configuration. With this technical scheme, the failure in which index creation by the Elasticsearch cluster server times out under resource contention, when the server's own resources are at peak occupancy, so that the data cannot be saved, can be completely avoided.

Description

Automatic pre-creation index method and system for elastic search database
Technical Field
The present disclosure relates to the field of index creation in databases, and more particularly to an automated index pre-creation method and system for an Elasticsearch database.
Background
Elasticsearch is a Lucene-based distributed data storage and search engine that makes large volumes of data searchable, analyzable, and explorable. In recent years, with the growth of the internet, data volumes have increased sharply, and enterprises that rely on big data technologies to support their business development often use an Elasticsearch cluster to provide search services.
By default, Elasticsearch creates indexes and the mappings of index types dynamically; that is, Elasticsearch does not pre-create indexes on its own. When Elasticsearch saves data, it dynamically generates the corresponding index information and the mapping configuration of the index type in real time from the service data structure information. After the index is created, the service data structure information is submitted to the Elasticsearch database, the corresponding sentences are tokenized by the analyzer, and the weights and tokenization results are stored together with the data. However, the timeout for index creation in Elasticsearch is 30 seconds, which becomes a problem under resource contention, when the cluster server's own resources are at peak occupancy. When the server runs a resource-intensive thread such as garbage collection (GC), the Elasticsearch thread that saves the data may be suspended and left unprocessed; after 30 seconds the index creation times out, and the data cannot be saved.
Disclosure of Invention
To solve the problem that index creation by an Elasticsearch cluster server (ES cluster server) times out under resource contention, when the server's own resources are at peak occupancy, so that the data cannot be saved, the application provides a method for creating an index in an Elasticsearch database in advance. Through an automated task, the method generates an index and an index configuration (mapping) from the service data structure information it reads, and creates the corresponding index in the Elasticsearch database according to that index and index configuration.
The application adopts the following technical scheme: an automated index pre-creation method for an Elasticsearch database, the method comprising the steps of:
step 1, executing a timed task and generating an index creation request;
step 2, acquiring service data structure information from service data structure samples;
step 3, determining information of an Elasticsearch node corresponding to the service data structure information, wherein the information of the Elasticsearch node comprises an index, a document type, a document, a field, a term (word), and a token (mark);
step 4, generating an index configuration according to the information of the Elasticsearch node;
step 5, generating, by the Elasticsearch node, an index corresponding to the index creation request according to the index configuration.
Further, the method further comprises: step 6, storing the generated index corresponding to the index creation request into the Elasticsearch database.
Further, in step 1, the Quartz framework is used to execute the timed task, and the index creation request is generated automatically by the timed task.
Further, step 1 specifically includes the following sub-steps:
step 101, starting the timed task and executing task monitoring;
step 102, reading the Elasticsearch database, and loading the job and trigger of the started timed task into a scheduler;
step 103, running the scheduler, which runs the timed task job according to the task schedule to automatically generate the index creation request.
Further, in step 2, the method further comprises: the timed task obtains a service data structure sample from the Elasticsearch database.
Further, the service data structure sample includes a mapping directory and a plurality of pieces of service data structure information.
Further, the service data structure information is stored as a mapping.txt file and managed by version number.
Further, the index configuration includes a settings section for passing in setting information and a mappings section for passing in type mappings.
Further, in step 5, the index is created by the following syntax: PUT http://host:port/index_name/ + index_configuration;
where index_name is the name of the index to be created, and index_configuration is the request payload delivered to the Elasticsearch server, that is, the configuration information of the index, in JSON format.
An automated index pre-creation system for an Elasticsearch database, the system comprising a processor and a memory storing instructions executable by the processor, the processor performing the method steps described above when executing the instructions.
The embodiments of the application achieve the following technical effect: the failure in which index creation by the Elasticsearch cluster server times out under resource contention, when the server's own resources are at peak occupancy, so that the data cannot be saved, can be completely avoided.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a flow diagram of an automated pre-creation indexing method;
FIG. 2 is a schematic flow chart of the method for performing timing tasks;
FIG. 3 is a schematic diagram of the composition of service data structure information in the method;
FIG. 4 is a diagram of a directory structure of a sample of service data structure information in the method;
FIG. 5 is a schematic diagram of task execution tenant data in the method;
FIG. 6 is a schematic diagram of ES node information in the method;
FIG. 7 is a mapping diagram of the method for generating an index mapping configuration;
FIG. 8 is a flow diagram of creating and saving an index in the method.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
FIG. 1 is a flow diagram of the automated index pre-creation method. The method comprises the following steps:
Step 1, executing a timed task and generating an index creation request. The Quartz framework is used to execute the timed task, and the index creation request is generated automatically by the timed task;
the process of executing the timed task is shown in FIG. 2 and specifically includes the following sub-steps:
step 101, starting the timed task and executing task monitoring;
step 102, reading a database, and loading the job and trigger of the started timed task into a scheduler;
step 103, running the scheduler, which runs the timed task job according to the task schedule to automatically generate the index creation request.
The index creation logic is added to the execute() method of the class that implements the Job interface. Once this Job implementation class has been written and a schedule has been set, the Quartz framework tracks the remaining time by itself; when the scheduler determines that the scheduled time has arrived, the Quartz framework calls the execute() method of the Job implementation class and enters the flow that generates the index creation request; by configuring the schedule job expression to be "0 0 1.
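The job, trigger, and scheduler roles described in steps 101 to 103 can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it uses Python's standard `sched` module as a stand-in for the Quartz framework (it is not Quartz), and the names `IndexCreationRequest`, `IndexCreationJob`, `run_once`, and the tenant index names are hypothetical.

```python
import sched
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class IndexCreationRequest:
    """Hypothetical request object produced by the timed job."""
    index_name: str


@dataclass
class IndexCreationJob:
    """Plays the role of the Job implementation class (assumed shape)."""
    index_names: List[str]
    generated: List[IndexCreationRequest] = field(default_factory=list)

    def execute(self) -> None:
        # The index-creation logic lives in execute(), as in the description.
        for name in self.index_names:
            self.generated.append(IndexCreationRequest(index_name=name))


def run_once(job: IndexCreationJob, delay_seconds: float = 0.0) -> None:
    """Load the job into a scheduler with a one-shot trigger and run it."""
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    # The "trigger": fire job.execute after delay_seconds.
    scheduler.enter(delay_seconds, 1, job.execute)
    # Blocks until every scheduled event has fired.
    scheduler.run()


job = IndexCreationJob(index_names=["tenant_a_orders", "tenant_b_orders"])
run_once(job)
print([request.index_name for request in job.generated])
```

A real deployment would use a recurring trigger rather than a one-shot delay; the one-shot form is chosen here only so the sketch finishes immediately.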
Step 2, acquiring service data structure information from service data structure samples;
Each tenant has its own service data structure information, and the timed task reads the service data structure information of each tenant in turn, as shown in FIG. 5. The service data structure information comprises an index, sample data, and an index configuration (mapping), where the sample data is associated with both the index and the index configuration; the Elasticsearch server can obtain the index and index configuration of the service data structure information by parsing the sample data. The composition of the service data structure information is shown in FIG. 3.
The service data structure information of each tenant is stored in a designated directory on the application server as a mapping.txt file.
The composition of the service data structure information sample is shown in FIG. 4. The sample comprises a mapping directory and a plurality of pieces of service data structure information, and each piece of service data structure information can be looked up through the mapping directory.
The mapping.txt file of the service data structure information is managed by version number, and the service data structure information of each version is stored in a folder named after the corresponding version number. When the service data structure information changes, a new version number is generated, and the changed information is stored in a folder named after the new version number. The mapping.txt file is placed under the designated directory for the timed task to read, and at read time the required version of the service data structure information is loaded using the currently active version number.
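The version-number lookup described above can be illustrated with a short sketch. The directory layout `<base>/<tenant>/<version>/mapping.txt`, the helper names `save_mapping` and `load_mapping`, and the version labels `v1` and `v2` are assumptions made for this example; the source only states that each version lives in a folder named after its version number.

```python
import tempfile
from pathlib import Path


def save_mapping(base_dir: Path, tenant: str, version: str, content: str) -> Path:
    """Store one version of a tenant's mapping.txt in its own version folder."""
    folder = base_dir / tenant / version
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "mapping.txt"
    path.write_text(content, encoding="utf-8")
    return path


def load_mapping(base_dir: Path, tenant: str, active_version: str) -> str:
    """Read the mapping.txt that belongs to the currently active version."""
    mapping_file = base_dir / tenant / active_version / "mapping.txt"
    return mapping_file.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Version v1 is the original structure; v2 is written after a change.
    save_mapping(root, "tenant_a", "v1", '{"mappings": {}}')
    save_mapping(root, "tenant_a", "v2", '{"mappings": {"properties": {}}}')
    # The timed task loads only the version marked as active.
    loaded = load_mapping(root, "tenant_a", active_version="v2")
```

Keeping every version in its own folder means a change never overwrites the previous mapping, so switching the active version number is enough to roll back.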
Step 3, determining the Elasticsearch node information corresponding to the service data structure information;
As shown in FIG. 6, the Elasticsearch node information includes an index (the index name in the corresponding service data structure information), a document type, a document, a field, a term (word), and a token (mark). Each item is described below:
● Index (Index): a container for document type definitions; within one index, a given field can have only one data type;
● Document Type (Type): describes the definition of each field in a document; different document types can store different fields to serve different query requests;
● Document (Document): the carrier that stores data, comprising one or more fields in which the data is held;
● Field (Field): a key/value pair of the document;
● Term (Term): a word in the text;
● Token (Token): an occurrence of a term in a field, consisting of the term's text, its start and end offsets, and its type.
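To make the relationship between a document, a field, and a token concrete, here is a minimal sketch. It is a simplified stand-in and not Elasticsearch's own analyzer: the `Token` class, the `tokenize` helper, the regular-expression word split, and the `"word"` type label are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    """One occurrence of a term in a field: text, offsets, and type."""
    text: str
    start_offset: int
    end_offset: int
    type: str


def tokenize(field_value: str) -> List[Token]:
    """Split a field value into lowercase word tokens with their offsets."""
    return [
        Token(match.group().lower(), match.start(), match.end(), "word")
        for match in re.finditer(r"\w+", field_value)
    ]


# A document is a set of field/value pairs; "title" is one field.
document = {"title": "Quick brown fox"}
tokens = tokenize(document["title"])
for token in tokens:
    print(token)
```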
Step 4, generating an index configuration according to the Elasticsearch node information;
As shown in FIG. 7, a plurality of fields, such as Field 1, Field 2, Field 3, through Field N, are acquired from the service data structure sample, and a corresponding ES index is generated. The index structure of the ES index consists of an index (Index), fields (Field), and a document type (Type). The index configuration is generated according to the mapping relation between the fields in the service data structure sample and the index structure of the ES index.
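One way to picture the field-to-mapping step is to derive a field type from each value in a sample record. The source does not say how types are chosen, so the inference rules below are an assumption; the helper names `infer_field_type` and `build_properties` and the sample record are hypothetical, while `boolean`, `long`, `double`, and `text` are existing Elasticsearch field types.

```python
from typing import Any, Dict


def infer_field_type(value: Any) -> str:
    """Pick an Elasticsearch field type for a sample value (assumed rules)."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "text"


def build_properties(sample: Dict[str, Any]) -> Dict[str, Dict[str, str]]:
    """Map every field of the sample record to a mapping entry."""
    return {name: {"type": infer_field_type(value)} for name, value in sample.items()}


sample = {"order_id": 1001, "amount": 19.9, "paid": True, "note": "rush delivery"}
properties = build_properties(sample)
```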
The index configuration includes a settings section (settings), which carries the setting information, and a mappings section (mappings), which carries the type mappings. Within the request body, the settings and type mappings are passed in using the following structure:
{
  "settings": { ...... },
  "mappings": {
    "type_one": { ...... },
    "type_two": { ...... },
    ...
  }
}
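The structure above can be assembled programmatically before it is sent. The following is a minimal sketch with assumed values: the function name `build_index_configuration`, the shard and replica counts, and the `order_id` field are hypothetical. Note also that recent Elasticsearch versions no longer support multiple mapping types per index, so the `type_one` level may need to be dropped depending on the version in use.

```python
import json
from typing import Any, Dict


def build_index_configuration(
    settings: Dict[str, Any], type_mappings: Dict[str, Any]
) -> str:
    """Combine the settings and mappings sections into one JSON body."""
    body = {"settings": settings, "mappings": type_mappings}
    return json.dumps(body, indent=2, sort_keys=True)


config_json = build_index_configuration(
    settings={"number_of_shards": 1, "number_of_replicas": 1},
    type_mappings={"type_one": {"properties": {"order_id": {"type": "long"}}}},
)
print(config_json)
```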
Step 5, the Elasticsearch node generates the index corresponding to the index creation request according to the index configuration.
The syntax for creating the index is: PUT http://host:port/index_name/ + index_configuration
where index_name is the name of the index to be created, and index_configuration is the request payload (the configuration information of the index) passed to the Elasticsearch server, in JSON format.
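The PUT syntax above can be sketched with Python's standard `urllib`. This example only builds the request and does not send it; the host `localhost`, port `9200`, the index name, and the configuration contents are assumptions for illustration, and `build_create_index_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Any, Dict


def build_create_index_request(
    host: str, port: int, index_name: str, index_configuration: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates an index."""
    url = f"http://{host}:{port}/{index_name}"
    payload = json.dumps(index_configuration).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_create_index_request(
    "localhost",
    9200,
    "tenant_a_orders",
    {"settings": {"number_of_shards": 1}, "mappings": {}},
)
# Sending is left out on purpose; against a live cluster it would be:
# urllib.request.urlopen(request, timeout=30)
```

Because the scheduled job issues this request ahead of time, the later data-saving path never has to wait on index creation.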
In summary, with the above flow, Elasticsearch no longer needs to create the index and the index configuration in real time when it saves data. Because the index has been created in Elasticsearch in advance, Elasticsearch only needs to save the data; the flow of creating and saving the index is shown in FIG. 8. The failure in which index creation times out under resource contention, when the Elasticsearch cluster server's own resources are at peak occupancy, so that the data cannot be saved, can therefore be completely avoided.
Finally, it is noted that the above-mentioned preferred embodiments are only intended to illustrate rather than limit the technical solution of the present application, and that, although the present application has been described in detail by means of the above-mentioned preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the present application as defined by the appended claims.

Claims (10)

1. An automated index pre-creation method for an Elasticsearch database, the method comprising the steps of:
step 1, executing a timed task and generating an index creation request;
step 2, acquiring service data structure information from service data structure samples;
step 3, determining information of an Elasticsearch node corresponding to the service data structure information, wherein the information of the Elasticsearch node comprises an index, a document type, a document, a field, a term (word), and a token (mark);
step 4, generating an index configuration according to the information of the Elasticsearch node;
step 5, generating, by the Elasticsearch node, an index corresponding to the index creation request according to the index configuration.
2. The method according to claim 1, characterized in that the method further comprises: step 6, storing the generated index corresponding to the index creation request into the Elasticsearch database.
3. The method according to claim 1, characterized in that in step 1 the Quartz framework is used to execute the timed task, and the index creation request is generated automatically by the timed task.
4. The method according to claim 3, characterized in that step 1 comprises the following sub-steps:
step 101, starting the timed task and executing task monitoring;
step 102, reading the Elasticsearch database, and loading the job and trigger of the started timed task into a scheduler;
step 103, running the scheduler, which runs the timed task job according to the task schedule to automatically generate the index creation request.
5. The method according to claim 1, further comprising, in step 2: the timed task obtains a service data structure sample from the Elasticsearch database.
6. The method of claim 1, wherein the service data structure sample comprises a mapping directory and a plurality of pieces of service data structure information.
7. The method of claim 1, wherein the service data structure information is saved as a mapping.txt file and is managed by version number.
8. The method of claim 1, wherein the index configuration comprises a settings section for passing in setting information and a mappings section for passing in type mappings.
9. The method according to claim 1, characterized in that in step 5 the index is created by the following syntax: PUT http://host:port/index_name/ + index_configuration;
where index_name is the name of the index to be created, and index_configuration is the request payload delivered to the Elasticsearch server, that is, the configuration information of the index, in JSON format.
10. An automated index pre-creation system for an Elasticsearch database, characterized in that the system comprises a processor and a memory storing instructions executable by said processor, said processor performing the method steps of any of claims 1 to 9 when executing said instructions.
CN202310671425.4A 2023-06-08 2023-06-08 Automatic pre-creation index method and system for elastic search database Active CN116401259B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310671425.4A CN116401259B (en) 2023-06-08 2023-06-08 Automatic pre-creation index method and system for elastic search database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310671425.4A CN116401259B (en) 2023-06-08 2023-06-08 Automatic pre-creation index method and system for elastic search database

Publications (2)

Publication Number Publication Date
CN116401259A true CN116401259A (en) 2023-07-07
CN116401259B CN116401259B (en) 2023-08-22

Family

ID=87014661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310671425.4A Active CN116401259B (en) 2023-06-08 2023-06-08 Automatic pre-creation index method and system for elastic search database

Country Status (1)

Country Link
CN (1) CN116401259B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268000A1 (en) * 2017-03-20 2018-09-20 Datameer, Inc. Apparatus and Method for Distributed Query Processing Utilizing Dynamically Generated In-Memory Term Maps
CN110297829A (en) * 2019-06-26 2019-10-01 重庆紫光华山智安科技有限公司 A kind of text searching method and system towards specific industry structuring business datum
CN110347722A (en) * 2019-07-11 2019-10-18 软通智慧科技有限公司 Data capture method, device, equipment and storage medium based on HBase
CN110609865A (en) * 2018-05-29 2019-12-24 优信拍(北京)信息科技有限公司 Information synchronization method, device and system
CN111460023A (en) * 2020-04-29 2020-07-28 上海东普信息科技有限公司 Service data processing method, device, equipment and storage medium based on elastic search
CN112612865A (en) * 2020-12-17 2021-04-06 杭州迪普科技股份有限公司 Document storage method and device based on elastic search
CN113051460A (en) * 2021-03-29 2021-06-29 北京智慧星光信息技术有限公司 Elasticissearch-based data retrieval method and system, electronic device and storage medium
CN114356878A (en) * 2022-01-10 2022-04-15 中国银行股份有限公司 Distributed storage method and device for unstructured data
CN115145916A (en) * 2022-06-27 2022-10-04 南斗六星系统集成有限公司 Automatic capacity expansion method for Elasticissearch index in streaming data scene

Also Published As

Publication number Publication date
CN116401259B (en) 2023-08-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant