CN117688125A - Index management method, server and server cluster - Google Patents


Info

Publication number
CN117688125A
Authority
CN
China
Prior art keywords
index
fragments
target
data
target index
Legal status
Pending
Application number
CN202311541076.0A
Other languages
Chinese (zh)
Inventor
王召
Current Assignee
XFusion Digital Technologies Co Ltd
Original Assignee
XFusion Digital Technologies Co Ltd
Application filed by XFusion Digital Technologies Co Ltd
Priority to CN202311541076.0A
Publication of CN117688125A
Current legal status: Pending

Abstract

The embodiments of this application provide an index management method, a server, and a server cluster, applied in the technical field of servers. The method includes: acquiring statistical data of historical indexes; calculating a number of shards according to the statistical data of the historical indexes and a shard size; creating a target index according to the number of shards; and storing data to be written according to the target index. This improves the overall read/write capability of an Elasticsearch cluster and its resource utilization.

Description

Index management method, server and server cluster
Technical Field
The present disclosure relates to the field of servers, and in particular, to an index management method, a server, and a server cluster.
Background
Elasticsearch is a distributed, RESTful search and data analytics engine used in scenarios such as information retrieval, log-based operations and maintenance with full-stack observability, and data retrieval acceleration and analysis. A cluster of servers on which Elasticsearch is deployed is called an Elasticsearch cluster. Documents (data) are stored in indexes of the Elasticsearch cluster, and each index consists of one or more shards. Each shard can be regarded as an index instance that indexes documents (data) in the Elasticsearch cluster and processes related queries. When users encounter performance problems, the cause can often be traced back to the coordinating node of the Elasticsearch cluster not creating indexes with an appropriate number of shards.
In the related art, the coordinating node of an Elasticsearch cluster typically creates indexes with a fixed number of shards. With this approach, the number of shards in an index may not match the size of the index, which wastes resources and degrades the read/write performance of the Elasticsearch cluster.
Therefore, an index management method is needed that creates indexes with an appropriate number of shards, so as to improve the overall read/write capability and the resource utilization of the Elasticsearch cluster.
Disclosure of Invention
The embodiments of this application provide an index management method, a server, and a server cluster that create indexes with an appropriate number of shards, thereby improving the overall read/write capability and the resource utilization of the Elasticsearch cluster.
In a first aspect, an embodiment of this application provides an index management method, including:
acquiring statistical data of historical indexes;
calculating a number of shards according to the statistical data of the historical indexes and a shard size;
creating a target index according to the number of shards; and
storing data to be written according to the target index.
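The four steps above can be sketched end to end with an in-memory stand-in for the cluster. This is a minimal illustration under stated assumptions, not the patented implementation: the class `InMemoryCluster`, the function `compute_shard_count`, the plain mean of historical sizes, the GB figures, and the CRC32-modulo shard selection are all hypothetical choices made for the example.

```python
import math
import zlib


def compute_shard_count(history_sizes_gb, shard_size_gb):
    """Step 2 (hypothetical formula): shards needed to hold an average-sized index."""
    average = sum(history_sizes_gb) / len(history_sizes_gb)
    return max(1, math.ceil(average / shard_size_gb))


class InMemoryCluster:
    """Hypothetical stand-in for a cluster: index name -> list of shards (lists of docs)."""

    def __init__(self):
        self.indices = {}

    def create_index(self, name, shard_count):
        # Step 3: create the target index with the calculated number of shards.
        self.indices[name] = [[] for _ in range(shard_count)]

    def write(self, name, doc_id, doc):
        # Step 4: store the document; CRC32 of the ID modulo the shard count picks a shard.
        shards = self.indices[name]
        shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)].append(doc)


# Step 1: statistics of historical indexes (made-up sizes in GB, for illustration only).
history = [42.0, 47.5, 51.0]
cluster = InMemoryCluster()
n = compute_shard_count(history, shard_size_gb=30.0)  # ceil(46.83 / 30) = 2
cluster.create_index("log-2023.11.20", n)
cluster.write("log-2023.11.20", "doc-1", {"message": "disk full"})
```

The point of the sketch is only the ordering of the steps: the shard count is derived from history before the index exists, because it must be supplied when the index is created.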
Beneficial effects of this aspect: the number of shards is calculated from the statistical data of the historical indexes and the shard size, and the target index is created with that number of shards to store the data to be written. This enables on-demand sharding, so that an index created with the calculated number of shards matches the required index size, improving both the overall read/write capability and the resource utilization of the Elasticsearch cluster.
In one implementation, creating the target index according to the number of shards includes:
determining whether the number of shards is greater than a preset threshold; and
if not, creating the target index with the calculated number of shards.
Beneficial effect of this implementation: when the number of shards is less than or equal to the preset threshold, the target index is created directly with that number, so the number of shards stays at or below the threshold. On the one hand, this avoids slow queries against an index that has too many shards; on the other hand, it avoids the low resource utilization of the Elasticsearch cluster caused by an excessive number of shards.
In one implementation, the method further includes:
if so, using the preset threshold as the number of shards and creating the target index with that number of shards.
Beneficial effect of this implementation: when the calculated number of shards exceeds the preset threshold, the threshold itself is used as the number of shards when creating the target index. This likewise keeps the number of shards at or below the threshold, avoiding both slow queries and low resource utilization of the Elasticsearch cluster caused by too many shards.
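Taken together, the two threshold implementations amount to capping the calculated shard count. A minimal sketch; the function name and the threshold value of 10 are hypothetical.

```python
def effective_shard_count(calculated, threshold):
    """Apply the preset threshold to a calculated shard count."""
    if calculated > threshold:
        # "If so": the calculated count exceeds the threshold, so use the threshold.
        return threshold
    # "If not": the calculated count is within the threshold, so use it as is.
    return calculated


print(effective_shard_count(4, threshold=10))   # prints 4
print(effective_shard_count(25, threshold=10))  # prints 10
```

The branch form mirrors the text's "if not / if so" wording; it is equivalent to `min(calculated, threshold)`.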
In one implementation, the statistical data of the historical indexes includes the index sizes of the historical indexes of the previous N time periods;
calculating the number of shards according to the statistical data of the historical indexes and the shard size includes:
calculating an average index size from the index sizes of the historical indexes of the previous N time periods; and
calculating the number of shards from the average index size, the shard size, and a pre-acquired floating factor;
where N is a positive integer greater than or equal to 1.
Beneficial effect of this implementation: the number of shards is calculated from the average index size of the historical indexes of the previous N time periods, the shard size, and the pre-acquired floating factor, so that an index with an appropriate number of shards can be created dynamically on demand.
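The text does not give the exact formula that combines the average index size, the shard size, and the floating factor. One plausible reading, shown here purely as an assumption, is shards = ceil(average index size × floating factor / shard size). The function name and the sizes are hypothetical.

```python
import math


def shard_count(history_sizes_gb, shard_size_gb, floating_factor=1.0):
    """Hypothetical formula: ceil(average index size * floating factor / shard size)."""
    if not history_sizes_gb:
        raise ValueError("need the sizes of at least one previous period (N >= 1)")
    if shard_size_gb <= 0:
        raise ValueError("shard size must be positive")
    average = sum(history_sizes_gb) / len(history_sizes_gb)  # plain mean over N periods
    return max(1, math.ceil(average * floating_factor / shard_size_gb))


sizes = [100.0, 110.0, 90.0]  # previous N = 3 periods, GB (made-up figures)
print(shard_count(sizes, shard_size_gb=50.0))                       # 100 / 50 -> 2
print(shard_count(sizes, shard_size_gb=50.0, floating_factor=1.5))  # 150 / 50 -> 3
```

Rounding up means a partially filled last shard still counts as a shard, so no shard is planned to exceed the target size.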
In one implementation, the statistical data of the historical indexes further includes the number of shards of the historical index of a specific time period corresponding to the time period to which the target index belongs; the time period of the target index and the specific time period fall on the same time point in different cycles (for example, the same promotional day in different years); the method further includes:
calculating the floating factor from a pre-acquired traffic ratio and the number of shards of the historical index of the specific time period, where the traffic ratio is the ratio of the expected traffic in the time period of the target index to the historical traffic in the specific time period.
Beneficial effect of this implementation: the floating factor is calculated from the pre-acquired traffic ratio and the number of shards of the historical index of the specific time period, and the number of shards is then calculated using this factor. The number of shards can thus be raised appropriately for high-traffic events such as large promotional sales, preventing every shard of the index from becoming a hot spot because the number of shards was set too low.
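How the floating factor is derived from the traffic ratio and the historical shard count is not spelled out. The sketch below is one hypothetical interpretation: scale the shard count of the same period in the previous cycle by the traffic ratio, then express the result relative to the size-based baseline shard count, never letting the factor fall below 1.0. The formula, the floor of 1.0, and all names and numbers are assumptions.

```python
import math


def floating_factor(traffic_ratio, same_period_shards, baseline_shards):
    """Hypothetical: traffic-scaled historical shard count relative to a size-based baseline."""
    if baseline_shards <= 0:
        raise ValueError("baseline shard count must be positive")
    expected_shards = traffic_ratio * same_period_shards
    # Assumption: the factor only ever scales shard counts up, never down.
    return max(1.0, expected_shards / baseline_shards)


# Made-up example: last year's promotion-day index had 6 shards, this year's expected
# traffic is 1.5x that day's, and the average-size calculation alone suggests 3 shards.
factor = floating_factor(traffic_ratio=1.5, same_period_shards=6, baseline_shards=3)
average_gb, shard_size_gb = 150.0, 50.0
shards = math.ceil(average_gb * factor / shard_size_gb)
print(factor, shards)  # 3.0 9
```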
In one implementation, calculating the average index size from the index sizes of the historical indexes of the previous N time periods includes:
calculating the average index size from a weight coefficient for each of the previous N time periods and the index size of the historical index of each of those time periods.
Beneficial effect of this implementation: assigning a weight coefficient to each of the previous N time periods brings the calculated average index size closer to the expected size of the target index, so that the number of shards better matches the storage requirement of the target index.
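A weighted average lets more recent periods count for more. In this sketch the weights (1, 2, 3, increasing toward the most recent period) are illustrative assumptions; the text does not say how the weight coefficients are chosen.

```python
def weighted_average_size(sizes_gb, weights):
    """Weighted mean of the previous N index sizes (one weight per period)."""
    if not sizes_gb or len(sizes_gb) != len(weights):
        raise ValueError("need N >= 1 sizes and exactly one weight per period")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(size * w for size, w in zip(sizes_gb, weights)) / total_weight


sizes = [40.0, 50.0, 60.0]  # oldest -> newest, GB (made-up figures)
print(weighted_average_size(sizes, [1, 1, 1]))  # equal weights give the plain mean, 50.0
print(weighted_average_size(sizes, [1, 2, 3]))  # recent growth weighted more, about 53.33
```

With a growing data volume, the weighted result sits above the plain mean, which is the effect the implementation relies on.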
In one implementation, the statistical data of the historical indexes includes statistical data of historical indexes of multiple data types;
calculating the number of shards according to the statistical data of the historical indexes and the shard size includes:
calculating the number of shards of the target index of each of the multiple data types according to the statistical data of the historical indexes of that data type and the shard size;
creating the target index according to the number of shards includes:
creating the target index of each of the multiple data types with the number of shards calculated for that data type.
Beneficial effect of this implementation: the number of shards is calculated separately for the target index of each data type, and each target index is created with its own number of shards. When the Elasticsearch cluster stores data to be written, the data can be written into the matching target index, so that the data is stored in an orderly way.
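Per-type calculation simply repeats the size-based calculation once per data type, using only that type's history. A sketch with the same hypothetical ceil(average / shard size) rule as before; the type names and sizes are made up.

```python
import math


def shard_counts_by_type(history_by_type, shard_size_gb):
    """Calculate a separate shard count for each data type from that type's own history."""
    counts = {}
    for data_type, sizes_gb in history_by_type.items():
        average = sum(sizes_gb) / len(sizes_gb)
        counts[data_type] = max(1, math.ceil(average / shard_size_gb))
    return counts


history = {
    "log": [120.0, 150.0, 135.0],  # large log data (made-up sizes, GB)
    "user": [4.0, 5.0, 6.0],       # small user data
}
print(shard_counts_by_type(history, shard_size_gb=30.0))  # {'log': 5, 'user': 1}
```

A single shared count would either over-shard the small user index or under-shard the large log index; computing per type avoids both.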
In one implementation, the method further includes:
setting an index name for the target index, where the index name includes an index alias and time period information.
Beneficial effect of this implementation: an index name can be set for the target index when the target index is created.
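The text says only that the index name combines an index alias with time period information. The `<alias>-<yyyy.MM.dd>` layout below is a common convention used here as an assumption; the function names are hypothetical.

```python
from datetime import date


def make_index_name(alias, period_start, fmt="%Y.%m.%d"):
    """Hypothetical scheme: '<alias>-<period>' such as 'log-2023.11.20'."""
    return f"{alias}-{period_start.strftime(fmt)}"


def alias_of(index_name):
    """Recover the alias by dropping the trailing '-<period>' suffix."""
    return index_name.rsplit("-", 1)[0]


name = make_index_name("app-log", date(2023, 11, 20))
print(name)            # app-log-2023.11.20
print(alias_of(name))  # app-log
```

Splitting from the right keeps aliases that themselves contain hyphens intact.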
In one implementation, storing the data to be written according to the target index includes:
acquiring a write request, where the write request includes the index name of the target index and the data to be written; and
storing the data to be written in the target index identified by the index name.
Beneficial effect of this implementation: after the data to be written is acquired, it is stored in the target index based on the index name of the target index, achieving orderly storage of the data to be written.
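A write request that carries the index name can be dispatched with a simple lookup. In Elasticsearch itself this corresponds to indexing a document into a named index through the REST API (for example `POST /<index>/_doc`); the in-memory classes below are hypothetical stand-ins for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class WriteRequest:
    index_name: str  # name of the target index, e.g. "log-2023.11.20"
    document: dict   # the data to be written


@dataclass
class IndexStore:
    """Hypothetical in-memory store: index name -> list of stored documents."""
    indices: dict = field(default_factory=dict)

    def handle_write(self, request):
        # Look up the target index by the name carried in the request.
        if request.index_name not in self.indices:
            raise KeyError(f"target index {request.index_name!r} does not exist")
        self.indices[request.index_name].append(request.document)


store = IndexStore({"log-2023.11.20": [], "log-2023.11.21": []})
store.handle_write(WriteRequest("log-2023.11.21", {"message": "service restarted"}))
```

Raising on an unknown name is a design choice of the sketch; it makes a missing target index visible instead of silently creating one with an unplanned shard count.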
In one implementation, the method further includes:
acquiring a query request, where the query request includes the index alias of the target index; and
querying the target index according to its index alias.
Beneficial effect of this implementation: the target index can be located quickly based on its index alias, which improves query speed.
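In Elasticsearch an alias can point to several indexes, so one query against the alias reaches every time-period index behind it. The sketch below imitates that with the hypothetical `<alias>-<period>` naming scheme, resolving the alias from index names rather than through Elasticsearch's alias API; helper names and data are made up.

```python
def indices_for_alias(alias, index_names):
    """Resolve an alias to every '<alias>-<period>' index name (hypothetical scheme)."""
    return sorted(name for name in index_names if name.rsplit("-", 1)[0] == alias)


def query_by_alias(alias, indices, predicate):
    """Run the predicate over every document in every index behind the alias."""
    hits = []
    for name in indices_for_alias(alias, indices):
        hits.extend(doc for doc in indices[name] if predicate(doc))
    return hits


indices = {
    "log-2023.11.19": [{"level": "error", "msg": "disk full"}],
    "log-2023.11.20": [{"level": "info", "msg": "ok"}, {"level": "error", "msg": "timeout"}],
    "user-2023.11.20": [{"level": "error", "msg": "not a log"}],
}
errors = query_by_alias("log", indices, lambda doc: doc["level"] == "error")
print([doc["msg"] for doc in errors])  # ['disk full', 'timeout']
```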
In one implementation, the method further includes:
acquiring a delete request, where the delete request includes the index name of the target index; and
deleting the target index according to its index name.
Beneficial effect of this implementation: the target index, and with it all the data it contains, can be deleted quickly based on its index name, which speeds up deletion of the data in the index.
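Deleting by index name removes all of an index's data in one operation, whereas deleting individual documents requires finding and removing each match. In Elasticsearch the two correspond to the delete index API and the delete-by-query API; the helpers below are hypothetical and work on a plain dict to show the contrast.

```python
def delete_index(indices, index_name):
    """Drop a whole index in one step; returns how many documents were removed with it."""
    removed = indices.pop(index_name, None)
    return 0 if removed is None else len(removed)


def delete_by_query(indices, predicate):
    """For contrast: deleting matching documents means visiting every document."""
    removed = 0
    for name, docs in indices.items():
        kept = [doc for doc in docs if not predicate(doc)]
        removed += len(docs) - len(kept)
        indices[name] = kept
    return removed


indices = {"log-2023.11.19": [{"id": 1}, {"id": 2}], "log-2023.11.20": [{"id": 3}]}
print(delete_index(indices, "log-2023.11.19"))  # 2
print(sorted(indices))                           # ['log-2023.11.20']
```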
In a second aspect, an embodiment of this application provides an index management apparatus, including:
an acquisition module, configured to acquire statistical data of historical indexes;
a processing module, configured to calculate a number of shards according to the statistical data of the historical indexes and a shard size;
the processing module is further configured to create a target index according to the number of shards; and
the processing module is further configured to store data to be written according to the target index.
The index management apparatus provided in this embodiment can execute the technical solution of the foregoing method embodiment, with similar beneficial effects, which are not repeated here.
In one implementation, the processing module is specifically configured to:
determine whether the number of shards is greater than a preset threshold; and
if not, create the target index with the calculated number of shards.
The index management apparatus provided in this implementation can execute the technical solution of the foregoing method embodiment, with similar beneficial effects, which are not repeated here.
In one implementation, the processing module is further configured to:
if so, use the preset threshold as the number of shards and create the target index with that number of shards.
The index management apparatus provided in this implementation can execute the technical solution of the foregoing method embodiment, with similar beneficial effects, which are not repeated here.
In one implementation, the statistical data of the historical indexes includes the index sizes of the historical indexes of the previous N time periods;
the processing module is specifically configured to:
calculate an average index size from the index sizes of the historical indexes of the previous N time periods; and
calculate the number of shards from the average index size, the shard size, and a pre-acquired floating factor;
where N is a positive integer greater than or equal to 1.
The index management apparatus provided in this implementation can execute the technical solution of the foregoing method embodiment, with similar beneficial effects, which are not repeated here.
In one implementation, the statistical data of the historical indexes further includes the number of shards of the historical index of a specific time period corresponding to the time period to which the target index belongs; the time period of the target index and the specific time period fall on the same time point in different cycles; the processing module is further configured to:
calculate the floating factor from a pre-acquired traffic ratio and the number of shards of the historical index of the specific time period, where the traffic ratio is the ratio of the expected traffic in the time period of the target index to the historical traffic in the specific time period.
The index management apparatus provided in this implementation can execute the technical solution of the foregoing method embodiment, with similar beneficial effects, which are not repeated here.
In one implementation, the processing module is specifically configured to:
calculate the average index size from a weight coefficient for each of the previous N time periods and the index size of the historical index of each of those time periods.
The index management apparatus provided in this implementation can execute the technical solution of the foregoing method embodiment, with similar beneficial effects, which are not repeated here.
In one implementation, the statistical data of the historical indexes includes statistical data of historical indexes of multiple data types;
the processing module is specifically configured to:
calculate the number of shards of the target index of each of the multiple data types according to the statistical data of the historical indexes of that data type and the shard size; and
create the target index of each of the multiple data types with the number of shards calculated for that data type.
The index management apparatus provided in this implementation can execute the technical solution of the foregoing method embodiment, with similar beneficial effects, which are not repeated here.
In one implementation, the processing module is further configured to:
set an index name for the target index, where the index name includes an index alias and time period information.
The index management apparatus provided in this implementation can execute the technical solution of the foregoing method embodiment, with similar beneficial effects, which are not repeated here.
In one implementation,
the acquisition module is further configured to acquire a write request, where the write request includes the index name of the target index and the data to be written; and
the processing module is further configured to store the data to be written in the target index identified by the index name.
The index management apparatus provided in this implementation can execute the technical solution of the foregoing method embodiment, with similar beneficial effects, which are not repeated here.
In one implementation,
the acquisition module is further configured to acquire a query request, where the query request includes the index alias of the target index; and
the processing module is further configured to query the target index according to its index alias.
The index management apparatus provided in this implementation can execute the technical solution of the foregoing method embodiment, with similar beneficial effects, which are not repeated here.
In one implementation,
the acquisition module is further configured to acquire a delete request, where the delete request includes the index name of the target index; and
the processing module is further configured to delete the target index according to its index name.
The index management apparatus provided in this implementation can execute the technical solution of the foregoing method embodiment, with similar beneficial effects, which are not repeated here.
In a third aspect, an embodiment of this application provides a server, including:
a processor and a memory communicatively coupled to the processor;
the memory is configured to store computer-executable instructions; and
the processor is configured to execute the computer-executable instructions stored in the memory to implement the index management method of the first aspect.
The server provided in this embodiment can execute the technical solution of the foregoing method embodiment, with similar beneficial effects, which are not repeated here.
In a fourth aspect, an embodiment of this application provides a server cluster, including a coordinating node and a plurality of data nodes, where the coordinating node is connected to the plurality of data nodes and is configured to perform the index management method of the first aspect.
In a fifth aspect, an embodiment of this application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the index management method of the first aspect.
The computer-readable storage medium provided in this embodiment can execute the technical solution of the foregoing method embodiment, with similar beneficial effects, which are not repeated here.
In a sixth aspect, an embodiment of this application provides a computer program product including a computer program which, when executed by a processor, implements the index management method of the first aspect.
The computer program product provided in this embodiment can execute the technical solution of the foregoing method embodiment, with similar beneficial effects, which are not repeated here.
Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is an architecture diagram of an Elasticsearch cluster according to an embodiment of this application;
Fig. 2 is a schematic diagram of a data writing process based on an Elasticsearch cluster according to an embodiment of this application;
Fig. 3 is a schematic diagram of a data retrieval process based on an Elasticsearch cluster according to an embodiment of this application;
Fig. 4 is a system architecture diagram of the index management method according to an embodiment of this application;
Fig. 5 is a flowchart of a first embodiment of the index management method according to an embodiment of this application;
Fig. 6 is a schematic diagram of a scenario for creating an index according to an embodiment of this application;
Fig. 7 is a flowchart of index management according to an embodiment of this application;
Fig. 8 is a flowchart of a second embodiment of the index management method according to an embodiment of this application;
Fig. 9 is a flowchart of a third embodiment of the index management method according to an embodiment of this application;
Fig. 10 is a schematic structural diagram of an index management apparatus according to an embodiment of this application;
Fig. 11 is a schematic structural diagram of a server according to an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application fall within the protection scope of this application.
The terms "first," "second," "third," "fourth," and so on (if any) in the specification, the claims, and the drawings of this application are used to distinguish between similar objects and do not necessarily describe a specific order or sequence. Data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "comprise," "include," and "have," and any variants of them, are intended to cover a non-exclusive inclusion: a process, method, system, product, or device that includes a list of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
First, Elasticsearch is introduced.
With the development of technology, full-text search has become one of the most common requirements of today's IT systems. Elasticsearch is built on the open-source full-text search engine library Lucene. It provides distributed, scalable, real-time search and data analysis, exposes a REST API, and lets users search, analyze, and explore their data. Elasticsearch can be combined with Filebeat and Kibana into an EFK software stack: Filebeat collects and processes data (data enrichment, data conversion, and the like); Kibana handles data display, analysis, management, monitoring, and applications; and Elasticsearch retrieves and analyzes the data.
Elasticsearch can be regarded as a document-oriented database. An index is a container of documents, that is, a collection of documents of one type. For example, an index may be a collection of documents of the user data type or of the log data type. To support distributed storage and processing, an index can be divided into multiple shards, each of which is an independent index that stores documents.
A cluster of servers on which Elasticsearch is deployed is called an Elasticsearch cluster. An Elasticsearch cluster (server cluster) includes multiple nodes, which by function may include a coordinating node (or management node) and multiple data nodes (or storage nodes). The coordinating node is connected to the data nodes, and each node may be a server, a physical machine, a virtual machine, a container, or the like.
Fig. 1 is a schematic diagram of an Elasticsearch cluster according to an embodiment of this application. As shown in Fig. 1, the cluster includes three data nodes: Esnode1, Esnode2, and Esnode3. Fig. 1 shows two indexes, index 1 and index 2. Index 1 includes three shards, P0, P1, and P2; index 2 includes three shards, P3, P4, and P5. Shard P0 is located on Esnode1, shard P1 on Esnode2, and shard P2 on Esnode1; shard P3 is located on Esnode1, shard P4 on Esnode2, and shard P5 on Esnode1.
Fig. 2 is a schematic diagram of a data writing process based on an Elasticsearch cluster according to an embodiment of this application. The data writing process, that is, the process of writing data (documents) into an index, is described below with reference to Fig. 2. Each piece of data is one document.
The client sends a write request containing the data (document) to be written to the coordinating node.
The coordinating node determines the target index from the document's metadata (the index name), computes the route (the target shard within the target index) from the document's metadata, and sends the document to the data node that holds the target shard.
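The routing step follows the well-known Elasticsearch rule shard = hash(_routing) % number_of_primary_shards, where `_routing` defaults to the document ID. The sketch simplifies it: Elasticsearch actually uses a Murmur3 hash (and, in recent versions, an additional routing-shards factor), while `zlib.crc32` is used here only as a deterministic stand-in. Because the result depends on the primary shard count, that count cannot simply be changed on an existing index, which is one reason choosing it well at creation time matters.

```python
import zlib


def route(doc_id, number_of_primary_shards, routing=None):
    """Simplified routing: shard = hash(_routing) % number_of_primary_shards.

    Elasticsearch hashes with Murmur3; zlib.crc32 is only a deterministic stand-in.
    """
    key = routing if routing is not None else doc_id  # _routing defaults to the doc ID
    return zlib.crc32(key.encode("utf-8")) % number_of_primary_shards


# Documents sharing a custom routing value always land on the same shard.
a = route("order-1", 3, routing="user-42")
b = route("order-2", 3, routing="user-42")
print(a == b)  # True
```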
After receiving the document, the data node first parses the document's data type to determine whether tokenization (splitting the text according to certain rules) is needed. If the data type is text and tokenization is needed, the data node tokenizes the text into multiple terms, creates or updates an inverted index (terms are the keys and document identifiers are the values, so that documents can later be found quickly by term), and buffers the document and the inverted index in the index buffer. Periodically, the data node performs a refresh (writing the buffered documents and inverted index into a new in-memory segment) and a commit (merging several segments and persisting them to disk). If the data type is keyword, no tokenization is performed; the document is directly buffered, refreshed, and committed.
In addition, in one implementation where documents are stored with replica redundancy, the data node may also copy the document to its corresponding replica node so that the replica node stores it as well. Note that a replica node is also a data node of the Elasticsearch cluster.
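The inverted index described above maps each term to the documents that contain it. A toy sketch, assuming a trivial analyzer (lowercase, split on non-word characters) in place of Elasticsearch's configurable analyzers, and treating a keyword-type value as a single untokenized term; the function names are hypothetical.

```python
import re
from collections import defaultdict


def analyze(value, field_type):
    """Toy analyzer: text is lowercased and tokenized; keyword is kept as one term."""
    if field_type == "keyword":
        return [value]
    return [token for token in re.split(r"\W+", value.lower()) if token]


def build_inverted_index(docs, field_type="text"):
    """Map each term to the sorted IDs of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyze(value, field_type):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {1: "Disk full on node-3", 2: "Node-3 restarted"}
index = build_inverted_index(docs)
print(index["node"], index["disk"])  # [1, 2] [1]
```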
Fig. 3 is a schematic diagram of a data retrieval process based on an Elasticsearch cluster according to an embodiment of this application. The data retrieval process is described below with reference to Fig. 3.
As shown in Fig. 3, the client sends a retrieval request containing a target term to the coordinating node. The coordinating node determines the target shards corresponding to the target term and the target data nodes holding those shards, and performs a scatter phase that distributes the retrieval request to those data nodes. Each target data node executes the search and obtains its results (the documents matching the term). The coordinating node then performs a gather phase to collect the results from the target data nodes, merges them, and returns the merged results to the client.
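The scatter, gather, and merge phases can be sketched with a thread pool standing in for the data nodes. This is a simplification under stated assumptions: the target shards are passed in as already determined, each shard is a plain dict of document ID to text, matching is a whole-word check, and merging is a simple sort by ID, whereas Elasticsearch by default merges hits by relevance score.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard_docs, term):
    """Data-node side: IDs of documents in one shard whose text contains the term."""
    return [doc_id for doc_id, text in shard_docs.items() if term in text.lower().split()]


def scatter_gather(target_shards, term):
    """Coordinating-node side: scatter to each target shard, gather, then merge."""
    with ThreadPoolExecutor(max_workers=max(1, len(target_shards))) as pool:
        # Scatter: search every target shard concurrently.
        partial_results = list(pool.map(lambda shard: search_shard(shard, term), target_shards))
    # Gather and merge: combine the per-shard hits into one sorted result list.
    return sorted(doc_id for part in partial_results for doc_id in part)


shards = [{1: "disk full", 2: "all good"}, {3: "Disk error on node"}]
print(scatter_gather(shards, "disk"))  # [1, 3]
```

Each extra shard adds one more task to scatter and one more partial result to gather, which is the overhead the following paragraphs weigh against per-shard load.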
The reason why the index needs to be created based on the appropriate number of fragments is explained below.
The mapping of the index and the information about the state are stored in the cluster state. This information is stored in memory for quick access. However, if the number of slices is too large, this may result in too large a cluster state, especially if the mapping is large. Based on all updates needing to be done through a single thread, an excessive number of slices can result in slow updates. If the number of slices is too small, each node is caused to be a hot spot, so that the writing speed of the whole elastic search cluster is influenced, and the recovery of the elastic search cluster from faults is adversely affected. In summary, an index needs to be created based on the appropriate number of slices.
In addition, in an Elasticsearch cluster, a search request is executed on each shard by a single thread. In the related art, multiple shards can be searched at the same time to improve retrieval efficiency. However, these per-shard tasks must be queued and scheduled. With too many shards, the number of concurrent search tasks grows and search speed may drop. For this reason as well, an index needs to be created with an appropriate number of shards.
In the related art, an index is generally created in the following two ways:
Mode one: the number of shards is configured in an index template. When the name of a newly created index matches the pattern defined by the template, the new index receives the shard count configured in the template, and the coordinating node of the Elasticsearch cluster (server cluster) creates the index with that shard count. However, every such index then has the same number of shards regardless of its size. The shard count may therefore not match the index size, which wastes resources and degrades the read/write performance of the Elasticsearch cluster.
Mode two: the coordinating node of the Elasticsearch cluster creates an index with a manually specified number of shards. However, this shard count is chosen from personal experience, so it may likewise not match the index size, which wastes resources and degrades the read/write performance of the Elasticsearch cluster.
The embodiments of the present application provide an index management method that uses the statistics of historical indexes and the shard size to calculate an appropriate number of shards for each target index. The target index is then created with that number of shards, which improves both the overall read/write capability of the Elasticsearch cluster and its resource utilization.
The index management method according to the embodiment of the present application is described in detail below.
Fig. 4 is a system architecture diagram of an index management method according to an embodiment of the present application. As shown in Fig. 4, the system includes a client and an Elasticsearch cluster. The Elasticsearch cluster includes a coordinating node and a plurality of data nodes. For example, Fig. 4 shows three data nodes: Esnode1, Esnode2, and Esnode3.
The following describes the technical scheme of the present application in detail through specific embodiments. It should be noted that the following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 5 is a flowchart of the first embodiment of the index management method provided in the embodiments of the present application. Referring to Fig. 5, the method includes the following steps:
S501: Acquire statistics of historical indexes.
In this embodiment, the coordinating node in the Elasticsearch cluster may acquire statistics of historical indexes.
The statistics of historical indexes may include the index sizes of the historical indexes for the preceding N time periods. They may also include the number of shards of the historical index for a specific time period that corresponds to the time period of the target index (the index to be created). The target index's time period and the specific time period fall on the same point in different cycles. For example, if the target index belongs to 2023-11-11, the specific time period is 2022-11-11; if the target index belongs to 2023-06-18, the specific time period is 2022-06-18.
After obtaining the statistics of historical indexes, the coordinating node may store them in a database.
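One possible source for these statistics is Elasticsearch's `_cat/indices` API, which with `format=json&bytes=b` returns one row per index, including the primary shard count (`pri`) and the primary store size in bytes (`pri.store.size`), with values encoded as strings. The payload below is illustrative only, built from the 58 GB and 70 GB example used later in this description. Using the primary store size (replicas excluded) as the "index size" is an assumption of this sketch.

```python
import json

GIB = 1024 ** 3

# Illustrative payload shaped like GET _cat/indices/demo1.*?format=json&bytes=b
SAMPLE_CAT_INDICES = json.dumps([
    {"index": "demo1.2023-06-05", "pri": "2", "pri.store.size": str(58 * GIB)},
    {"index": "demo1.2023-06-06", "pri": "3", "pri.store.size": str(70 * GIB)},
])

def history_stats(payload: str) -> dict:
    """Map each historical index name to its size (GiB) and primary shard count."""
    stats = {}
    for row in json.loads(payload):
        stats[row["index"]] = {
            # Assumption: primary store size (without replicas) is the "index size".
            "size_gb": int(row["pri.store.size"]) / GIB,
            "shards": int(row["pri"]),
        }
    return stats
```

The resulting dictionary is what the coordinating node would persist for the later shard-count calculation.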
S502: Calculate the number of shards from the statistics of historical indexes and the shard size.
In this embodiment, the coordinating node may calculate the number of shards from the statistics of historical indexes and the shard size.
Specifically, the coordinating node may calculate an average index size from the index sizes of the historical indexes for the preceding N time periods. Here N is a positive integer (N ≥ 1), and the N time periods are those immediately preceding the time period of the target index. For example, if the target index belongs to 2023-11-11 and N = 3, the preceding time periods are 2023-11-10, 2023-11-09, and 2023-11-08.
In one implementation, the coordinating node computes the average index size as the arithmetic mean of the index sizes of the historical indexes for the preceding N time periods.
In another implementation, the coordinating node computes the average index size as a weighted average. It uses a weight coefficient for each of the preceding N time periods and the index size of the historical index for that period. The initial weight coefficient of each time period may be preset, and the coordinating node may then keep adjusting the coefficients with a logistic regression algorithm.
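The two averaging options can be sketched as follows. Normalizing the weights by their sum is an assumption of this sketch; the logistic-regression adjustment of the weights is not shown.

```python
def arithmetic_average(sizes_gb):
    """Arithmetic mean of the index sizes for the preceding N periods."""
    if not sizes_gb:
        raise ValueError("need at least one historical period (N >= 1)")
    return sum(sizes_gb) / len(sizes_gb)

def weighted_average(sizes_gb, weights):
    """Weighted mean; weights are normalised by their sum (an assumption)."""
    if not sizes_gb or len(sizes_gb) != len(weights):
        raise ValueError("one weight per historical period is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(sizes_gb, weights)) / total
```

With equal weights the two functions agree. Giving the most recent period a larger weight (for example `[1, 3]` for sizes `[58, 70]`) pulls the average toward recent growth.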
After calculating the average index size, the coordinating node may calculate the number of shards with the following formula:
$$\text{shards} = \frac{\text{avg\_index\_size}}{\text{size\_per\_shard}} \times \text{factor}$$
where shards is the number of shards, avg_index_size is the average index size, size_per_shard is the shard size, and factor is a floating factor acquired in advance.
The procedure for the coordinator node to acquire the floating factor is explained below.
In one implementation, the coordinating node may calculate the floating factor from two inputs: the number of shards of the historical index for the specific time period corresponding to the target index's time period, and a traffic ratio obtained in advance. The traffic ratio is the expected traffic in the target index's time period divided by the historical traffic in the specific time period. For example, if the expected traffic is 6000 and the historical traffic is 3000, the traffic ratio is 2.
In one implementation, the coordinating node may determine whether the time period of the target index satisfies a temporary expansion condition. For example, the coordinating node may determine that the condition is satisfied when it recognizes the time period as a major sales-promotion period, such as November 11.
When the time period of the target index does not satisfy the temporary expansion condition, the coordinating node sets the floating factor to 1.
When the time period of the target index satisfies the temporary expansion condition, the coordinating node calculates the floating factor from the number of shards of the historical index for the corresponding specific time period and the traffic ratio obtained in advance.
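The decision logic above can be sketched as follows. The description does not state how the traffic ratio and last year's shard count are combined, so the rule in `floating_factor` is hypothetical. It scales the factor so that the final shard count equals the traffic ratio times last year's shard count. The promotion calendar `PROMOTION_DAYS` is likewise an illustrative assumption.

```python
from datetime import date

# Hypothetical promotion calendar as (month, day); the text names 11-11 as an example.
PROMOTION_DAYS = {(11, 11)}

def specific_period(day: date) -> date:
    """Same calendar day one year earlier, e.g. 2023-11-11 -> 2022-11-11."""
    try:
        return day.replace(year=day.year - 1)
    except ValueError:  # 29 February has no counterpart in a non-leap year
        return day.replace(year=day.year - 1, day=28)

def traffic_ratio(expected_traffic: float, historical_traffic: float) -> float:
    return expected_traffic / historical_traffic

def floating_factor(day: date, base_shards: float, hist_shards: int,
                    expected_traffic: float, historical_traffic: float) -> float:
    """Return 1 outside the temporary expansion condition, else a traffic-based factor.

    base_shards is avg_index_size / size_per_shard before rounding.
    HYPOTHETICAL rule: choose factor so that base_shards * factor == ratio * hist_shards.
    """
    if (day.month, day.day) not in PROMOTION_DAYS:
        return 1.0
    ratio = traffic_ratio(expected_traffic, historical_traffic)
    return ratio * hist_shards / base_shards
```

Keeping the combination rule in one function makes it easy to replace with whatever rule a real deployment adopts.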
The process by which the coordinator node calculates the number of fragments will be described below by way of a specific example.
Taking one day as the time period, suppose the historical indexes of the two preceding days are index 1 (index name demo1.2023-06-05) and index 2 (index name demo1.2023-06-06). The coordinating node obtains their index sizes, 58 GB and 70 GB respectively, and calculates an average index size of 64 GB. With a shard size of 25 GB and a floating factor of 1, this gives 64 / 25 × 1 = 2.56. Rounding up to a positive integer, the coordinating node determines that the number of shards is 3.
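The worked example can be reproduced with a short sketch, assuming the formula multiplies the size ratio by the floating factor and "taking a positive integer" means rounding up with a minimum of one shard.

```python
import math

def shard_count(avg_index_size_gb: float, size_per_shard_gb: float, factor: float = 1.0) -> int:
    """shards = avg_index_size / size_per_shard * factor, rounded up to a positive integer."""
    raw = avg_index_size_gb / size_per_shard_gb * factor
    return max(1, math.ceil(raw))

# Worked example from the text: indexes of 58 GB and 70 GB, 25 GB shards, factor 1.
avg = (58 + 70) / 2             # 64.0
shards = shard_count(avg, 25)   # 64 / 25 = 2.56, rounded up to 3
```

Rounding up rather than to the nearest integer keeps each shard at or below the target shard size.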
S503: Create a target index with the calculated number of shards.
In this embodiment, the coordinating node may create the target index with the calculated number of shards. In one implementation, the coordinating node creates the target index for the next time period during the current time period.
In addition, when creating the target index, the coordinating node may also give it an index name. The index name may include an index alias and time period information. For example, the coordinating node may set the index name demo1.2023-06-07, where demo1 is the index alias and 2023-06-07 is the time period information.
When the coordinating node acquires a query request that includes the index alias of the target index, it may query the target index using that alias.
Setting an index alias in this way can speed up searches for data in the index.
When the coordinating node acquires a deletion request that includes the index name of the target index, it may delete the target index using that index name.
For example, when the deletion request includes the index alias demo1, the coordinating node may delete index 1 (index name demo1.2023-06-05), index 2 (index name demo1.2023-06-06), and index 3 (index name demo1.2023-06-07).
In this way, whole indexes can be deleted quickly and efficiently. In the related art, deleting data inside an index requires merging multiple segments, which consumes disk I/O. Deleting the index directly is a lightweight operation and is therefore faster.
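The naming convention, and the way an alias selects every dated index built from it, can be sketched as follows. Both helper names are hypothetical. The prefix includes the trailing dot so that `demo1` does not also match an unrelated alias such as `demo10`.

```python
from datetime import date

def index_name(alias: str, period: date) -> str:
    """Index name = index alias + '.' + time period information (ISO date)."""
    return f"{alias}.{period.isoformat()}"

def indices_for_alias(alias: str, existing: list) -> list:
    """All index names that start with '<alias>.', i.e. the set removed when deleting by alias."""
    prefix = alias + "."
    return sorted(name for name in existing if name.startswith(prefix))
```

For the example above, `indices_for_alias("demo1", ...)` selects the three demo1 indexes for deletion and leaves indexes under other aliases alone.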
The process of creating an index by the coordinator node is described below by way of specific example.
Fig. 6 is a schematic diagram of a scenario for creating an index according to an embodiment of the present application. As shown in Fig. 6, two indexes exist before creation: index 1 (index name demo1.2023-06-05) and index 2 (index name demo1.2023-06-06). Index 1 includes shards P0 and P1, and index 2 includes shards P2, P3, and P4. Shards P0 and P2 are located on Esnode1, shards P1 and P3 on Esnode2, and shard P4 on Esnode3.
The coordinating node may create index 3 (index name demo1.2023-06-07) with 3 shards: P5, P6, and P7, located on Esnode1, Esnode2, and Esnode3 respectively.
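The placement in Fig. 6 matches a simple round-robin assignment, sketched below. This is a simplification for illustration only. Elasticsearch's own shard allocator balances shards using several factors, such as shard counts and disk usage, and does not guarantee strict round-robin order.

```python
def place_shards(shard_ids, nodes, start=0):
    """Assign shards to data nodes in round-robin order, starting at nodes[start]."""
    if not nodes:
        raise ValueError("at least one data node is required")
    return {sid: nodes[(start + i) % len(nodes)] for i, sid in enumerate(shard_ids)}

NODES = ["Esnode1", "Esnode2", "Esnode3"]
index3 = place_shards(["P5", "P6", "P7"], NODES)
```

Spreading an index's shards across different nodes is what lets the writes to that index run in parallel.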
S504: Store the data to be written according to the target index.
In this embodiment, after creating the target index, the coordinating node may send the index name of the target index to the client.
When data needs to be written to the target index, the client may send a write request to the coordinating node of the Elasticsearch cluster. The write request includes the index name of the target index and the data to be written.
The coordinating node uses the index name to locate the target index and stores the data to be written in it.
Specifically, after determining the target index, the coordinating node computes a route (the target shard within the target index) from the metadata of the data to be written. It then sends the data to the data node holding the target shard, and that data node stores the data.
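In simplified form, Elasticsearch's default routing picks a shard as `hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. The sketch below follows that shape but uses CRC32 from the standard library as a stand-in hash. Elasticsearch itself uses Murmur3 and, in recent versions, an additional routing factor, so actual shard numbers will differ.

```python
import zlib

def route_to_shard(routing_value: str, num_primary_shards: int) -> int:
    """Target shard = hash(routing value) % number of primary shards.

    CRC32 is a stand-in hash; Elasticsearch hashes the routing value with Murmur3.
    """
    if num_primary_shards < 1:
        raise ValueError("index must have at least one primary shard")
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards
```

Because the hash is deterministic, every write and every lookup for the same routing value lands on the same shard. This is also why the primary shard count cannot be changed in place once an index has been created.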
The index management process is described below from a software perspective. Both the index service and Elasticsearch are deployed on the coordinating node, and the client program is deployed on the client.
Illustratively, fig. 7 is a flowchart of index management provided in an embodiment of the present application.
As shown in Fig. 7, the index service acquires statistics of historical indexes by calling the index information interface, an interface that Elasticsearch exposes to external callers.
The index service calculates the number of shards from the statistics of historical indexes and the shard size.
The index service determines whether the number of shards is greater than a preset threshold. If it is, the index service uses the preset threshold as the number of shards and sends Elasticsearch an index creation request containing that number. If it is not, the index service sends Elasticsearch an index creation request containing the calculated number of shards.
Elasticsearch creates the target index with the number of shards specified in the index creation request.
The client program determines the index name of the target index in which the data to be written will be stored.
The client program sends a write request to Elasticsearch. The write request includes the index name of the target index and the data to be written.
Elasticsearch uses the index name to locate the target index and stores the data to be written in it.
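The index creation request in this flow could be expressed with Elasticsearch's create-index API (`PUT /<index>` with `settings.index.number_of_shards`). The sketch below only builds the request and applies the preset-threshold cap; it does not send anything. Also registering the index alias as an Elasticsearch alias through the API's `aliases` section is an assumption added for illustration, and `create_index_request` is a hypothetical helper.

```python
import json

def create_index_request(alias: str, period: str, computed_shards: int, threshold: int):
    """Build (method, path, body) for a create-index call, capping shards at the threshold."""
    shards = min(computed_shards, threshold)
    body = {
        "settings": {"index": {"number_of_shards": shards}},
        # Assumption: also register the alias so queries can target every dated index.
        "aliases": {alias: {}},
    }
    return "PUT", f"/{alias}.{period}", json.dumps(body)
```

Returning the request instead of sending it keeps the sketch independent of any particular HTTP client library.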
The beneficial effects of this embodiment are as follows. The coordinating node in the Elasticsearch cluster acquires statistics of historical indexes, calculates the number of shards from those statistics and the shard size, creates the target index with that number of shards, and stores the data to be written according to the target index. This achieves on-demand sharding: the target index is created with a shard count that matches its expected size. As a result, both the overall read/write capability and the resource utilization of the Elasticsearch cluster improve.
Fig. 8 is a flowchart of a second embodiment of an index management method according to the embodiment of the present application. Referring to fig. 8, the method specifically includes the steps of:
S801: Acquire statistics of historical indexes.
In this embodiment, the coordinating node of the Elasticsearch cluster acquires statistics of historical indexes. The implementation is the same as in S501 and is not repeated here.
S802: Calculate the number of shards from the statistics of historical indexes and the shard size.
In this embodiment, the coordinating node may calculate the number of shards from the statistics of historical indexes and the shard size. The implementation is the same as in S502 and is not repeated here.
S803: Determine whether the number of shards is greater than a preset threshold.
In this embodiment, the index service of the coordination node may determine whether the number of slices is greater than a preset threshold after calculating the number of slices. For example, the preset threshold is 3.
If yes, S804 is performed.
If not, then S806 is performed.
In one implementation, the coordinating node may determine the preset threshold from the number of data nodes in the Elasticsearch cluster. For example, with 3 data nodes, the preset threshold is 3.
In another implementation, the coordinating node may determine the preset threshold from both the number of data nodes in the Elasticsearch cluster and the load capacity of each data node. For example, with 3 data nodes whose load capacities are 1 (Esnode1), 1 (Esnode2), and 4 (Esnode3), the coordinating node may determine that the preset threshold is 6.
S804: Use the preset threshold as the number of shards, and create the target index with that number of shards.
In this embodiment, when the number of shards calculated from the statistics of historical indexes is greater than the preset threshold, the coordinating node uses the preset threshold as the number of shards.
The coordinating node then creates the target index with this number of shards, which equals the preset threshold.
S805: Store the data to be written according to the target index.
In this embodiment, the coordinating node may store the data to be written according to the target index. The specific implementation process is the same as S504, and will not be described here again.
S806: Create the target index with the calculated number of shards.
In this embodiment, when the number of shards calculated from the statistics of historical indexes is less than or equal to the preset threshold, the coordinating node creates the target index with that number of shards.
S807: Store the data to be written according to the target index.
In this embodiment, the coordinating node may store the data to be written according to the target index. The specific implementation process is the same as S504, and will not be described here again.
The beneficial effects of this embodiment are as follows. When the coordinating node of the Elasticsearch cluster determines that the number of shards calculated from the statistics of historical indexes is greater than the preset threshold, it creates the target index with the preset threshold as the number of shards. When the calculated number is less than or equal to the preset threshold, it creates the target index directly with the calculated number. The number of shards therefore never exceeds the preset threshold. This prevents slow queries against an index created with too many shards, and it also prevents the low resource utilization of the Elasticsearch cluster that too many shards would cause.
Fig. 9 is a schematic flow chart of a third embodiment of an index management method provided in the embodiment of the present application. Referring to fig. 9, the method specifically includes the steps of:
S901: Acquire statistics of historical indexes of multiple data types.
In this embodiment, the coordinating node of the Elasticsearch cluster acquires statistics of historical indexes of multiple data types.
The data types may include, for example, a user data type and a log data type.
S902: Calculate the number of shards of the target index of each data type from the statistics of the historical indexes of the multiple data types and the shard size.
In this embodiment, the coordinating node may calculate the number of slices of the target index of the plurality of data types according to the statistics data of the history indexes of the plurality of data types and the sizes of the slices.
Specifically, the coordinating node may calculate the number of slices of the target index for each data type based on the statistics of the historical index for each data type, and the slice size.
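The per-type calculation is the single-index formula applied independently to each data type, as in this sketch. The log-type sizes in the usage example are illustrative numbers, not values from the description, and the optional per-type `factors` mapping is a hypothetical parameter.

```python
import math

def shards_per_type(history_sizes_gb, size_per_shard_gb, factors=None):
    """Apply the single-index shard calculation independently to each data type."""
    factors = factors or {}
    result = {}
    for data_type, sizes in history_sizes_gb.items():
        avg = sum(sizes) / len(sizes)
        factor = factors.get(data_type, 1.0)
        result[data_type] = max(1, math.ceil(avg / size_per_shard_gb * factor))
    return result
```

For example, `shards_per_type({"user": [58, 70], "log": [10, 14]}, 25)` gives the user index 3 shards and the small log index a single shard, rather than one template value for both.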
S903: Create the target indexes of the multiple data types, each with its own calculated number of shards.
In this embodiment, the coordinating node may create the target indexes of the multiple data types according to the number of fragments of the target indexes of the multiple data types.
Specifically, the coordinating node may create the target index for each data type based on the number of fragments of the target index for each data type.
In addition, the coordinating node may also set an index name for the target index of each data type when creating the target index of each data type.
S904: Store the data to be written according to the target index.
In this embodiment, after creating the target index of each data type, the coordinating node may send the index name of the target index of each data type to the client.
When data needs to be written to the target index that corresponds to its data type, the client may send a write request to the coordinating node of the Elasticsearch cluster. The write request includes the index name of the target index and the data to be written.
The coordinating node uses the index name to locate the target index and stores the data to be written in it.
Specifically, after determining the target index, the coordination node may calculate a route (a target slice in the target index) according to metadata of the data to be written, and send the data to be written to a data node corresponding to the target slice according to the route, so that the data node stores the data to be written.
The beneficial effects of this embodiment are as follows. The coordinating node of the Elasticsearch cluster calculates the number of shards of the target index for each data type, creates each target index with its own shard count, and stores the data to be written according to the target index. Because data is written to the target index that matches its data type, it is stored in an orderly manner.
The following are device embodiments of the present application, which may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
Fig. 10 is a schematic structural diagram of an index management device according to an embodiment of the present application. As shown in fig. 10, the index management device 100 includes an acquisition module 110 and a processing module 120. The acquiring module 110 is configured to acquire statistical data of the history index; a processing module 120, configured to calculate the number of slices according to the statistics data of the history index and the size of the slices; the processing module 120 is further configured to create a target index according to the number of fragments; the processing module 120 is further configured to store the data to be written according to the target index.
The index management device provided in the embodiment of the present application may execute the technical solution in the embodiment of the method, and has similar beneficial effects, and will not be described in detail herein.
In one implementation, the processing module 120 is specifically configured to: determine whether the number of shards is greater than a preset threshold; and if not, create the target index with the number of shards.
The index management device provided in this implementation manner may execute the technical solution in the foregoing method embodiment, and has similar beneficial effects, and will not be described in detail herein.
In one implementation, the processing module 120 is further configured to: if the number of shards is greater than the preset threshold, use the preset threshold as the number of shards and create the target index with that number of shards.
The index management device provided in this implementation manner may execute the technical solution in the foregoing method embodiment, and has similar beneficial effects, and will not be described in detail herein.
In one implementation, the statistics of historical indexes include the index sizes of the historical indexes for the preceding N time periods, where N is a positive integer greater than or equal to 1. The processing module 120 is specifically configured to: calculate an average index size from the index sizes of the historical indexes for the preceding N time periods; and calculate the number of shards from the average index size, the shard size, and a floating factor acquired in advance.
The index management device provided in this implementation manner may execute the technical solution in the foregoing method embodiment, and has similar beneficial effects, and will not be described in detail herein.
In one implementation, the statistics of historical indexes further include the number of shards of the historical index for a specific time period corresponding to the target index's time period. The two time periods fall on the same point in different cycles. The processing module 120 is further configured to calculate the floating factor from a traffic ratio obtained in advance and the number of shards of the historical index for the specific time period. The traffic ratio is the expected traffic in the target index's time period divided by the historical traffic in the specific time period.
The index management device provided in this implementation manner may execute the technical solution in the foregoing method embodiment, and has similar beneficial effects, and will not be described in detail herein.
In one implementation, the processing module 120 is specifically configured to calculate the average index size from the weight coefficient of each of the preceding N time periods and the index size of the historical index for each of those periods.
The index management device provided in this implementation manner may execute the technical solution in the foregoing method embodiment, and has similar beneficial effects, and will not be described in detail herein.
In one implementation, the statistics of historical indexes include statistics of historical indexes of multiple data types. The processing module 120 is specifically configured to: calculate the number of shards of the index of each data type from the statistics of the historical indexes of the multiple data types and the shard size; and create the target indexes of the multiple data types, each with its own number of shards.
The index management device provided in this implementation manner may execute the technical solution in the foregoing method embodiment, and has similar beneficial effects, and will not be described in detail herein.
In one implementation, the processing module 120 is further configured to set an index name for the target index, where the index name includes an index alias and time period information.
The index management device provided in this implementation manner may execute the technical solution in the foregoing method embodiment, and has similar beneficial effects, and will not be described in detail herein.
In one implementation, the obtaining module 110 is further configured to obtain a write request that includes the index name of the target index and the data to be written. The processing module 120 is further configured to use the index name to locate the target index and store the data to be written in it.
The index management device provided in this implementation manner may execute the technical solution in the foregoing method embodiment, and has similar beneficial effects, and will not be described in detail herein.
In one implementation, the obtaining module 110 is further configured to obtain a query request; wherein the query request includes an index alias of the target index; the processing module 120 is further configured to query the target index according to the index alias of the target index.
The index management device provided in this implementation manner may execute the technical solution in the foregoing method embodiment, and has similar beneficial effects, and will not be described in detail herein.
In one implementation, the obtaining module 110 is further configured to obtain a deletion request; wherein the delete request includes an index name of the target index; the processing module 120 is further configured to delete the target index according to the index name of the target index.
The index management device provided in this implementation manner may execute the technical solution in the foregoing method embodiment, and has similar beneficial effects, and will not be described in detail herein.
Fig. 11 is a schematic structural diagram of a server according to an embodiment of the present application. As shown in Fig. 11, the server 200 includes a processor 210 and a memory 220. The processor 210 is communicatively coupled to the memory 220. The memory 220 stores computer-executable instructions, and the processor 210 executes them to carry out the technical solutions of the foregoing method embodiments. The server may be, for example, a rack server, a blade server, a general-purpose server, a GPU server, an AI server, or a DPU server.
Optionally, the memory 220 may be either separate from or integrated with the processor 210. When the memory 220 is a device separate from the processor 210, the server 200 may further include a bus connecting these devices.
The processor is configured to execute the technical scheme in the foregoing method embodiment, and its implementation principle and technical effects are similar, and are not described herein again.
The embodiment of the application also provides a server cluster, which comprises a coordination node and a plurality of data nodes. The coordination node is connected with a plurality of data nodes, and the coordination node is used for executing the technical scheme provided by the embodiment of the method.
The embodiment of the application also provides a computer readable storage medium, wherein computer executable instructions are stored in the computer readable storage medium, and the computer executable instructions are used for realizing the technical scheme provided by the embodiment of the method when being executed by a processor.
The embodiment of the application also provides a computer program product, which comprises a computer program, and the computer program is used for realizing the technical scheme provided by the embodiment of the method when being executed by a processor.
Those of ordinary skill in the art will appreciate that all or part of the steps of the foregoing method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are merely intended to illustrate, not to limit, the technical solutions of the present application. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. An index management method, comprising:
acquiring statistical data of a historical index;
calculating a number of fragments according to the statistical data of the historical index and a fragment size;
creating a target index according to the number of fragments; and
storing data to be written according to the target index.
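The four steps of claim 1 can be sketched end to end as follows. This is a minimal illustration, not the claimed implementation: it assumes the "statistical data" is simply a list of past index sizes, that the fragment count is the ceiling of their plain mean divided by the fragment size, and that the target index is created on an Elasticsearch cluster through the standard create-index call (`PUT /<index>` with `settings.number_of_shards`). The helper names, the sizes, and the index name `logs-2024.03.12` are hypothetical.

```python
import json
import math


def fragments_from_history(history_sizes_gb, fragment_size_gb):
    """Estimate the fragment (shard) count from historical index sizes.

    Assumption: the statistic used is the plain mean of past index sizes,
    and at least one fragment is always created.
    """
    average = sum(history_sizes_gb) / len(history_sizes_gb)
    return max(1, math.ceil(average / fragment_size_gb))


def create_index_request(index_name, number_of_shards):
    """Build an Elasticsearch create-index call as (method, path, JSON body).

    Only the request is assembled here; sending it is left to whatever
    HTTP client the cluster uses.
    """
    body = {"settings": {"number_of_shards": number_of_shards}}
    return "PUT", f"/{index_name}", json.dumps(body)


# Hypothetical daily sizes (GB) of the historical index, 30 GB per fragment.
shards = fragments_from_history([95, 110, 102], fragment_size_gb=30)
method, path, body = create_index_request("logs-2024.03.12", shards)
print(method, path, body)
```

Here the mean of 102.3 GB divided by 30 GB gives 3.4, so four fragments are requested. Fixing the count when the index is created matters because an Elasticsearch index's primary shard count cannot be changed in place afterwards.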
2. The index management method according to claim 1, wherein the creating a target index according to the number of fragments comprises:
determining whether the number of fragments is greater than a preset threshold;
if not, creating the target index according to the number of fragments;
if so, using the preset threshold as the number of fragments and creating the target index according to that number of fragments.
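The threshold check in claim 2 amounts to an upper bound on the computed fragment count. A minimal sketch, assuming both values are plain integers (the function name is hypothetical):

```python
def cap_fragments(computed, preset_threshold):
    """Apply the preset upper limit on the fragment count (claim 2)."""
    if computed > preset_threshold:
        # Too many fragments: fall back to the preset threshold.
        return preset_threshold
    # Within the limit: keep the computed number.
    return computed


print(cap_fragments(12, preset_threshold=10))  # 10
print(cap_fragments(4, preset_threshold=10))   # 4
```

The branch is equivalent to `min(computed, preset_threshold)`; it is written out to mirror the "if not / if so" structure of the claim. A count equal to the threshold is kept, since it is not greater than the threshold.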
3. The index management method according to claim 1, wherein the statistical data of the historical index comprises index sizes of historical indexes of the preceding N time periods;
the calculating a number of fragments according to the statistical data of the historical index and a fragment size comprises:
calculating an average index size according to the index sizes of the historical indexes of the preceding N time periods; and
calculating the number of fragments according to the average index size, the fragment size, and a floating factor;
wherein N is a positive integer greater than or equal to 1.
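Claim 3 names the three inputs but not how they are combined. The sketch below assumes one natural reading, fragments = ceil(average size × floating factor / fragment size), with the average taken over the preceding N periods. The formula, the function name, and the figures are assumptions for illustration only.

```python
import math


def fragments_with_floating_factor(sizes_last_n, fragment_size, floating_factor):
    """Claim 3 sketch: average of the preceding N periods, scaled by a floating factor.

    Assumed formula (not stated in the claim):
        fragments = ceil(average_size * floating_factor / fragment_size)
    """
    if not sizes_last_n:
        raise ValueError("N must be >= 1")
    average_size = sum(sizes_last_n) / len(sizes_last_n)
    return max(1, math.ceil(average_size * floating_factor / fragment_size))


# Hypothetical: N = 3 preceding days, 40 GB fragments, 20 % headroom.
print(fragments_with_floating_factor([100, 120, 140], 40, 1.2))
```

With an average of 120 GB and a factor of 1.2, the expected 144 GB needs 3.6 fragments, rounded up to 4. A factor above 1 leaves headroom for growth; a factor below 1 would shrink the index relative to history.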
4. The index management method according to claim 3, wherein the statistical data of the historical index further comprises the number of fragments of a historical index of a specific time period corresponding to the time period to which the target index belongs, the time period to which the target index belongs and the specific time period being time periods at the same time node in different time phases; and the method further comprises:
calculating the floating factor according to a pre-obtained service ratio and the number of fragments of the historical index of the specific time period, wherein the service ratio is the ratio of the expected service volume in the time period to which the target index belongs to the historical service volume in the specific time period.
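Claim 4 states which quantities feed the floating factor but gives no formula. The sketch below uses one hypothetical combination rule: scale the same-period historical fragment count by the service ratio, then express the result relative to a baseline fragment count (for example, the count obtained from the average size with a factor of 1). The baseline input, the rule itself, and the function name are assumptions, not taken from the claims.

```python
def floating_factor(expected_volume, historical_volume,
                    same_period_fragments, baseline_fragments):
    """Hypothetical combination rule for claim 4 (the claim gives no formula).

    service_ratio: expected service volume of the target period divided by
    the service volume at the same time node in an earlier phase
    (for instance, the same calendar day one year earlier).
    """
    service_ratio = expected_volume / historical_volume
    # Scale the same-period fragment count by the expected growth, then
    # express it relative to the baseline count (assumed input).
    return service_ratio * same_period_fragments / baseline_fragments


print(floating_factor(3000, 2000, same_period_fragments=8, baseline_fragments=6))  # 2.0
```

In this example the target period is expected to carry 1.5 times the service volume of the matching historical period, which used 8 fragments. That suggests 12 fragments, twice the baseline of 6, so the factor is 2.0.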
5. The index management method according to claim 3, wherein the calculating an average index size according to the index sizes of the historical indexes of the preceding N time periods comprises:
calculating the average index size according to a weight coefficient of each of the preceding N time periods and the index size of the historical index of each of the preceding N time periods.
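Claim 5 replaces the plain average with a weighted one. A minimal sketch, assuming the weight coefficients are arbitrary non-negative numbers that are normalised by their sum (the claim does not say whether they must already sum to 1). The function name and figures are hypothetical.

```python
def weighted_average_size(sizes, weights):
    """Claim 5 sketch: weighted mean of the preceding N index sizes.

    Assumption: weights need not sum to 1; they are normalised here.
    """
    if len(sizes) != len(weights) or not sizes:
        raise ValueError("need one weight per period, N >= 1")
    total_weight = sum(weights)
    return sum(s * w for s, w in zip(sizes, weights)) / total_weight


# Oldest to newest; the most recent period weighs the most.
print(weighted_average_size([100, 120, 140], [1, 2, 3]))
```

Giving recent periods larger weights lets the estimate follow a growth trend faster than the plain mean of the same data, which would be 120.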
6. The index management method according to any one of claims 1 to 5, wherein the statistical data of the historical index comprises statistical data of historical indexes of a plurality of data types;
the calculating a number of fragments according to the statistical data of the historical index and a fragment size comprises:
calculating numbers of fragments of target indexes of the plurality of data types according to the statistical data of the historical indexes of the plurality of data types and the fragment size; and
the creating a target index according to the number of fragments comprises:
creating the target indexes of the plurality of data types according to the numbers of fragments of the target indexes of the plurality of data types.
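Claim 6 repeats the sizing step independently for each data type. The sketch below assumes each type's history is a list of index sizes and that each type is sized by the ceiling of its mean over a shared fragment size. The data-type names and figures are hypothetical.

```python
import math


def fragments_per_type(history_by_type, fragment_size):
    """Claim 6 sketch: one fragment count per data type.

    history_by_type maps a (hypothetical) data-type name to the index sizes
    of its historical indexes; each type is sized independently from its mean.
    """
    return {
        data_type: max(1, math.ceil(sum(sizes) / len(sizes) / fragment_size))
        for data_type, sizes in history_by_type.items()
    }


history = {"app-log": [90, 110], "audit": [8, 12], "metrics": [300, 340]}
print(fragments_per_type(history, fragment_size=50))
```

A small, low-volume type such as `audit` gets a single fragment, while `metrics` gets seven, instead of every type receiving one cluster-wide default.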
7. The index management method according to claim 1, further comprising:
setting an index name for the target index, wherein the index name comprises an index alias and time period information.
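The claim requires only that the index name combine an index alias with time period information. The sketch below assumes an `<alias>-<yyyy.mm.dd>` layout for daily indexes and `<alias>-<yyyy.mm>` for monthly ones. Both the layout and the function name are hypothetical conventions.

```python
from datetime import date


def index_name(alias, period_start, granularity="day"):
    """Claim 7 sketch: index name = alias + time-period information.

    The '<alias>-<yyyy.mm.dd>' layout is an assumed convention.
    """
    if granularity == "day":
        suffix = period_start.strftime("%Y.%m.%d")
    elif granularity == "month":
        suffix = period_start.strftime("%Y.%m")
    else:
        raise ValueError(f"unsupported granularity: {granularity}")
    return f"{alias}-{suffix}"


print(index_name("app-log", date(2024, 3, 12)))           # app-log-2024.03.12
print(index_name("app-log", date(2024, 3, 12), "month"))  # app-log-2024.03
```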
8. The index management method according to claim 7, wherein the storing data to be written according to the target index comprises:
acquiring a write request, wherein the write request comprises the index name of the target index and the data to be written, and storing the data to be written in the target index according to the index name of the target index;
and/or acquiring a query request, wherein the query request comprises the index alias of the target index, and performing query processing on the target index according to the index alias of the target index;
and/or acquiring a deletion request, wherein the deletion request comprises the index name of the target index, and performing deletion processing on the target index according to the index name of the target index.
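The routing rule in claim 8 can be illustrated with a toy in-memory stand-in for the cluster. Writes and deletions address one concrete index by its full name, while queries address the alias and fan out to every index that carries it. The class, its methods, and the index names are hypothetical; no real search-engine API is modelled.

```python
from collections import defaultdict


class IndexRouter:
    """Toy in-memory stand-in for the request handling of claim 8."""

    def __init__(self):
        self.indexes = {}                # index name -> list of documents
        self.aliases = defaultdict(set)  # alias -> set of index names

    def create(self, name, alias):
        self.indexes[name] = []
        self.aliases[alias].add(name)

    def write(self, name, document):
        # Write requests carry the full index name (alias + time period).
        self.indexes[name].append(document)

    def query(self, alias, predicate):
        # Query requests carry only the alias and search every member index.
        return [doc
                for name in sorted(self.aliases[alias])
                for doc in self.indexes[name]
                if predicate(doc)]

    def delete(self, name):
        # Deletion requests drop one whole time-period index by name.
        del self.indexes[name]
        for members in self.aliases.values():
            members.discard(name)


router = IndexRouter()
router.create("app-log-2024.03.11", "app-log")
router.create("app-log-2024.03.12", "app-log")
router.write("app-log-2024.03.11", {"level": "ERROR"})
router.write("app-log-2024.03.12", {"level": "INFO"})
router.write("app-log-2024.03.12", {"level": "ERROR"})
print(len(router.query("app-log", lambda d: d["level"] == "ERROR")))  # 2
router.delete("app-log-2024.03.11")
print(len(router.query("app-log", lambda d: d["level"] == "ERROR")))  # 1
```

Addressing writes and deletions by full name keeps them unambiguous even when one alias spans several time-period indexes. Queries can use the stable alias without knowing which periods currently exist.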
9. A server, comprising:
a processor, and a memory communicatively coupled to the processor;
wherein the memory is configured to store computer-executable instructions; and
the processor is configured to execute the computer-executable instructions stored in the memory to implement the index management method according to any one of claims 1 to 8.
10. A server cluster, comprising a coordinating node and a plurality of data nodes, wherein the coordinating node is connected to the plurality of data nodes and is configured to perform the index management method according to any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311541076.0A CN117688125A (en) 2023-11-17 2023-11-17 Index management method, server and server cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311541076.0A CN117688125A (en) 2023-11-17 2023-11-17 Index management method, server and server cluster

Publications (1)

Publication Number Publication Date
CN117688125A true CN117688125A (en) 2024-03-12

Family

ID=90125446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311541076.0A Pending CN117688125A (en) 2023-11-17 2023-11-17 Index management method, server and server cluster

Country Status (1)

Country Link
CN (1) CN117688125A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination