CN118132565B

CN118132565B - Control method and device for data index storage, storage medium and electronic equipment

Info

Publication number: CN118132565B
Application number: CN202410537580.1A
Authority: CN
Inventors: 宋文豪
Original assignee: Suzhou Metabrain Intelligent Technology Co Ltd
Current assignee: Suzhou Metabrain Intelligent Technology Co Ltd
Filing date: 2024-04-30
Publication date: 2024-06-28
Anticipated expiration: 2044-04-30

Abstract

The embodiments of the application provide a control method and device for data index storage, a storage medium, and electronic equipment, relating to the field of big data. The method comprises: in the current time period, acquiring a set of load parameters obtained by sampling an index node cluster at a first acquisition frequency, where the index shards on the index nodes of the cluster store data indexes built for service data stored in a data table of a distributed database, and the index shard group corresponding to the current time period is a first index shard group allocated in advance; and, when it is determined from the acquired set of load parameters that more index shards should be allocated for the current time period, allocating a second index shard group for the current time period, where the first index shard group and the second index shard group are both used to store the data indexes corresponding to the current time period.

Description

Control method and device for data index storage, storage medium and electronic equipment

Technical Field

The embodiment of the application relates to the technical field of big data, in particular to a control method and device for data index storage, a storage medium and electronic equipment.

Background

When a distributed database (for example, HBase) must serve multi-condition combined queries, such queries can be implemented with a secondary index backed by an index node cluster. Here, a secondary index means that table data of the distributed database that meets the configured conditions is converted into data indexes according to a pre-configured field mapping rule and written into index shards on the index nodes of the index node cluster.

In the related art, one or more index shard groups are created manually and in advance, by time granularity (i.e., time period) or by service granularity, to provide index service for the data tables of the distributed database. Because the groups are pre-created, the service party must study the existing data volume, service load, and similar factors in depth before it can size them sensibly. This approach therefore demands highly skilled personnel, is inflexible, and responds poorly to rapid traffic growth.

Consequently, the related-art control method for data index storage suffers from low system performance because pre-created data index storage is inflexible.

Disclosure of Invention

The embodiments of the application provide a control method and device for data index storage, a storage medium, and electronic equipment, which at least solve the related-art problem of low system performance caused by the inflexibility of pre-creating data index storage.

According to an embodiment of the present application, a control method for data index storage is provided, including: in a current time period, acquiring a set of load parameters obtained by sampling an index node cluster at a first acquisition frequency, where the index shards on the index nodes of the cluster store data indexes built for service data stored in a data table of a distributed database; the index shards are divided, according to a set number of shards, into index shard groups corresponding to set time periods; the index shard group corresponding to the current time period is a first index shard group, which is allocated before the current time period and contains a first number of index shards; and the set of load parameters includes a first load parameter indicating the load condition of the first index shard group; and, when it is determined from the acquired set of load parameters that more index shards should be allocated for the current time period, allocating a second index shard group for the current time period, where the second index shard group contains the first number of index shards, and the first and second index shard groups are both used to store the data indexes corresponding to the current time period.

According to another embodiment of the present application, a control apparatus for data index storage is provided, including an acquisition unit and an allocation unit. The acquisition unit is configured to acquire a set of load parameters obtained by sampling an index node cluster at a first acquisition frequency, where the index shards on the index nodes of the cluster store data indexes built for service data stored in a data table of a distributed database; the index shards are divided, according to a set number of shards, into index shard groups corresponding to set time periods; the index shard group corresponding to the current time period is a first index shard group, which is pre-allocated before the current time period and contains a first number of index shards; and the set of load parameters includes a first load parameter indicating the load condition of the first index shard group. The allocation unit is configured to allocate a second index shard group for the current time period when it is determined from the acquired set of load parameters that more index shards should be allocated for the current time period, where the second index shard group contains the first number of index shards, and the first and second index shard groups are both used to store the data indexes corresponding to the current time period.

According to a further aspect of the embodiments of the present application, there is provided a computer readable storage medium comprising a stored program, wherein the program when run performs the steps of any of the method embodiments described above.

According to a further aspect of embodiments of the present application there is provided an electronic device comprising a memory having a computer program stored therein and a processor arranged to perform the steps of any of the method embodiments described above by means of the computer program.

According to a further aspect of embodiments of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of any of the method embodiments described above.

In the embodiments of the application, index shards for storing data indexes are created dynamically according to the load of the index node cluster. At each acquisition time in the current time period, the index node cluster is sampled to obtain a set of load parameters, and these parameters determine whether new index shards should be allocated to divert traffic from the existing index shards and reduce their load. Because the system creates index shard groups dynamically from the load parameters, it can flexibly handle the traffic changes that occur in production and create indexes sensibly. This greatly lowers the technical threshold for users, improves system performance, and solves the related-art problem of low system performance caused by the inflexibility of pre-creating data index storage.

Drawings

Fig. 1 is a block diagram of a hardware configuration of a server apparatus of a control method of data index storage according to an embodiment of the present application.

Fig. 2 is a flow chart of a control method of data index storage according to an embodiment of the present application.

Fig. 3 is a schematic diagram of a variation curve of an index specification according to an embodiment of the present application.

Fig. 4 is a schematic diagram of a control system for data index storage according to an embodiment of the present application.

Fig. 5 is a schematic diagram of the workflow of an index load-aware device according to an embodiment of the present application.

Fig. 6 is a schematic diagram of an alternative index acquisition frequency variation according to an embodiment of the present application.

Fig. 7 is a block diagram of an alternative control device for storing data indexes according to an embodiment of the present application.

Fig. 8 is a block diagram of an alternative computer system of an electronic device according to an embodiment of the present application.

Detailed Description

Embodiments of the present application will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments.

It should be noted that the terms "first," "second," and the like in the description and the claims of the embodiments of the present application and the above-described drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.

The method embodiments provided in the embodiments of the present application may be executed in a server apparatus or a similar computing device. Fig. 1 is a block diagram of a hardware configuration of a server apparatus of a control method of data index storage according to an embodiment of the present application. As shown in fig. 1, the server device may include one or more processors 102 (only one is shown in fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a field-programmable gate array (FPGA)) and a memory 104 for storing data, and may further include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those of ordinary skill in the art that the architecture shown in fig. 1 is merely illustrative and is not intended to limit the architecture of the server apparatus described above. For example, the server device may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.

The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the control method of data index storage in an embodiment of the present application. The processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, to implement the above-mentioned method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located with respect to the processor 102, which may be connected to the server device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of a server device. In one example, the transmission device 106 includes a NIC (Network Interface Controller, network adapter) that can communicate with other network devices via a base station to communicate with the internet. In one example, the transmission device 106 may be an RF (Radio Frequency) module for communicating with the internet wirelessly.

In this embodiment, a control method for data index storage is provided, and fig. 2 is a schematic flow chart of a control method for data index storage according to an embodiment of the present application, as shown in fig. 2, the flow chart includes the following steps:

Step S202: in the current time period, acquire a set of load parameters obtained by sampling the index node cluster at a first acquisition frequency. The index shards on the index nodes of the cluster store data indexes built for service data stored in a data table of a distributed database; the index shards are divided, according to the set number of shards, into index shard groups corresponding to set time periods; the index shard group corresponding to the current time period is the first index shard group, which is pre-allocated before the current time period and contains a first number of index shards; and the set of load parameters includes a first load parameter indicating the load condition of the first index shard group.

The control method for data index storage in this embodiment may be applied where the data indexes created for service data stored in a data table of a distributed database are stored in index shards on the index nodes of an index node cluster. Indexing is a technique for quickly accessing and retrieving elements of a data structure: each element is assigned a unique key or location number, so the desired data can be located without traversing the entire structure, which improves search and retrieval performance.

For example, taking HBase as the distributed database, each data cell in HBase is identified by a row key, a column family, a column qualifier, and a timestamp. The row key uniquely identifies each row of an HBase table, and rows are stored sorted by row key. By building a mapping from a queried column (a non-row-key column) to row keys, a query first uses the data index to obtain the set of row keys that satisfy the conditions and then fetches the complete records from the HBase data table by those row keys, which improves query efficiency.
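The lookup path just described can be illustrated with a minimal in-memory Python sketch. It is not the HBase client API: the table is modelled as a plain dictionary from row key to "family:qualifier" values, and the column names (cf:plate, cf:checkpoint) are hypothetical. The index maps each (column, value) pair to the row keys holding it; a multi-condition query intersects those row-key sets and then fetches full rows by row key.

```python
from collections import defaultdict


def build_secondary_index(table, indexed_columns):
    """Map each (non-row-key column, value) pair to the set of row keys that hold it."""
    index = defaultdict(set)
    for row_key, columns in table.items():
        for column in indexed_columns:
            if column in columns:
                index[(column, columns[column])].add(row_key)
    return index


def multi_condition_query(table, index, conditions):
    """Equality query on several columns: intersect the row-key sets from the
    index, then fetch the complete rows from the table by row key."""
    row_keys = None
    for column, value in conditions.items():
        matches = index.get((column, value), set())
        row_keys = set(matches) if row_keys is None else row_keys & matches
    return [table[key] for key in sorted(row_keys or ())]


# Hypothetical table contents: row key -> {"family:qualifier": value}.
table = {
    "r1": {"cf:plate": "A123", "cf:checkpoint": "g1"},
    "r2": {"cf:plate": "B456", "cf:checkpoint": "g1"},
    "r3": {"cf:plate": "A123", "cf:checkpoint": "g2"},
}
index = build_secondary_index(table, ["cf:plate", "cf:checkpoint"])
print(multi_condition_query(table, index, {"cf:plate": "A123", "cf:checkpoint": "g1"}))
```

In the embodiment the index entries would live in index shards on the index node cluster rather than in a local dictionary; the sketch only shows the two-step "index first, then row key" access pattern.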

However, in the related art, one or more index shard groups are usually pre-created manually by time granularity or service granularity to provide index service for a database table. Pre-creating them requires the service party to study the existing data volume, service load, and so on in depth, which places very high demands on personnel. Moreover, if data access to the system suddenly increases (for example, the service suddenly grows from 10 connected traffic checkpoints to 20), concurrent service requests under the surging data volume cannot be met, because the index template specification (the number of shards) is fixed and the index shard group for the current time period has already been created from that template. An index shard group can be divided into multiple index shards distributed over different nodes, forming a distributed search that improves performance and throughput.

It should be noted that data indexes are stored and searched in index shards, and an index shard group is a logical space that groups one or more index shards together. The number of index shards can be specified only when the index shard group is created and cannot be changed afterwards.

To at least partially solve the above problem, this embodiment senses multiple load metrics of the index node cluster in real time and automatically creates indexes based on them (here, one index corresponds to one index shard group). This lets the system flexibly handle the traffic changes that occur in production, create indexes sensibly, make full use of cluster resources, and greatly lower the technical threshold for users. At the same time, indexes are adjusted adaptively for services with traffic surges, so traffic is diverted transparently to users and cluster stability improves.

The index shards on the index nodes of the cluster store data indexes built for service data stored in a data table of the distributed database, and they are divided into index shard groups corresponding to set time periods according to the set number of shards. One time period may correspond to one or more index shard groups. Generally, the number of index shards in an index shard group for a time period is determined when the index is created and is fixed thereafter.
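The fixed-shard-count property can be modelled in a few lines of Python. This is a sketch under assumptions: the names IndexShardGroup and pre_allocate, and the "index-{period}-{n}" naming scheme, are hypothetical. A frozen dataclass makes the shard count immutable once a group exists, and each time period is pre-allocated one group from a template shard count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexShardGroup:
    """One index shard group; frozen, so shard_count cannot change after creation."""
    name: str
    shard_count: int


def pre_allocate(periods, shards_per_group):
    """Pre-create one shard group per time period from a fixed template shard count."""
    return {period: [IndexShardGroup(f"index-{period}-0", shards_per_group)]
            for period in periods}


layout = pre_allocate(["2024-04-29", "2024-04-30"], shards_per_group=3)
group = layout["2024-04-30"][0]
try:
    group.shard_count = 6  # rejected: the shard count is fixed at creation time
except AttributeError as exc:  # dataclasses.FrozenInstanceError subclasses AttributeError
    print("cannot resize:", exc)
```

Each period maps to a list so that further groups with the same shard count can later be appended for that period; the pre-allocated group itself is never resized.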

The index shards of the index shard group for a time period may be distributed over different index nodes; optionally, they may be distributed uniformly across all index nodes of the index node cluster.
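One simple way to obtain the uniform distribution mentioned above is round-robin placement. The following Python sketch assumes that strategy; the text does not prescribe a particular placement algorithm, and the function name is hypothetical. Shard i goes to node (start + i) mod k, so each node receives either floor(n/k) or ceil(n/k) of the group's n shards.

```python
def distribute_round_robin(shard_count, nodes, start=0):
    """Place shard i of a group on nodes[(start + i) % len(nodes)], so each
    node receives either floor(n/k) or ceil(n/k) of the group's n shards."""
    if not nodes:
        raise ValueError("the index node cluster has no nodes")
    return {i: nodes[(start + i) % len(nodes)] for i in range(shard_count)}


placement = distribute_round_robin(5, ["node-1", "node-2", "node-3"])
print(placement)
```

Varying start from group to group is one way to avoid always piling the remainder shards onto the first nodes.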

In this embodiment, taking index creation by time granularity as an example, a set of load parameters obtained by sampling the index node cluster at the first acquisition frequency may be acquired in the current time period. The current time period may be of any time granularity, such as a day, a month, or a year; because shard groups are then added automatically according to load, the user does not need to study the existing data volume, service load, and so on in depth, which greatly lowers the technical threshold. The component that samples the index node cluster at the first acquisition frequency may run on index nodes of the index node cluster, or on nodes outside the cluster that exchange data with it over a network or a bus.
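The periodic sampling can be sketched in Python as follows, assuming the first acquisition frequency is expressed as an integer interval in seconds and that a hypothetical collector callable returns the cluster's load parameters at a given time. Time is simulated here; a real collector component would run on a scheduler and wait between samples.

```python
from dataclasses import dataclass


@dataclass
class LoadSample:
    acquired_at: int  # acquisition time in seconds
    params: dict      # e.g. {"index_count": ..., "query_latency_ms": ...}


def acquisition_times(period_start, period_end, interval_s):
    """Acquisition times in [period_start, period_end) for a first acquisition
    frequency expressed as an interval of interval_s seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return list(range(period_start, period_end, interval_s))


def collect_period(collector, period_start, period_end, interval_s):
    """Call the (hypothetical) collector at each acquisition time of the period."""
    return [LoadSample(t, collector(t))
            for t in acquisition_times(period_start, period_end, interval_s)]


def fake_collector(t):
    # Stand-in for querying the index node cluster's metrics.
    return {"index_count": t // 10, "query_latency_ms": 50.0}


samples = collect_period(fake_collector, 0, 3600, 600)
```

Each LoadSample would then be passed to the scale-out decision for the current time period.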

The set of load parameters may include a first load parameter indicating the load condition of the first index shard group, for example, the number of data indexes stored by the index shards belonging to the first index shard group, from which it can be determined whether new index shards are needed.

As in the related art, the index shard group for a time period is pre-allocated before that period begins: in this embodiment, the index shard group for the current time period is the first index shard group, which contains a first number of index shards and is pre-allocated before the current time period.

In step S204, when it is determined from the acquired set of load parameters that more index shards should be allocated for the current time period, a second index shard group is allocated for the current time period, where the second index shard group contains the first number of index shards, and the first and second index shard groups are both used to store the data indexes corresponding to the current time period.

Because the number of index shards in an index shard group for the current time period is fixed, the storage capacity of that group, i.e., the number of data indexes the first index shard group can hold, is also fixed. Therefore, when the set of load parameters sampled from the index node cluster indicates that more index shards should be allocated for the current time period, a new index shard group can be created under the current time period to reduce the load on the existing index.

For example, in the present embodiment, when it is determined from the acquired set of load parameters that more index shards should be allocated for the current time period, the second index shard group is allocated for the current time period.

It should be noted that, because the number of index shards in an index shard group for a time period is fixed and the first index shard group for the current time period contains the first number of index shards, the newly added second index shard group should also contain the first number of index shards.

The set of load parameters sampled from the index node cluster thus determines whether new index shards are needed to divert traffic from the existing index. When they are, an index shard group with the same number of shards is added to reduce the load on the existing index, and the first and second index shard groups are then both used to store the data indexes corresponding to the current time period.
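The following Python sketch shows adding a second group with the same shard count as the first, under stated assumptions. The class names are hypothetical, and the write-routing rule (a stable CRC32 hash of a document id over the shards of all groups for the period) is an illustrative choice; the text only states that both groups store the current period's data indexes.

```python
import zlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IndexShardGroup:
    name: str
    shard_count: int


@dataclass
class PeriodIndex:
    """All index shard groups serving one time period."""
    period: str
    groups: list = field(default_factory=list)

    def add_group(self):
        """Add a group with the same shard count as the pre-allocated first group."""
        if not self.groups:
            raise ValueError("the first index shard group must be pre-allocated")
        first = self.groups[0]
        group = IndexShardGroup(f"index-{self.period}-{len(self.groups)}", first.shard_count)
        self.groups.append(group)
        return group

    def route_write(self, doc_id):
        """Hypothetical routing: a stable hash of doc_id over the shards of all groups,
        so new writes for the period are spread across every group."""
        per_group = self.groups[0].shard_count
        slot = zlib.crc32(doc_id.encode("utf-8")) % (per_group * len(self.groups))
        return self.groups[slot // per_group].name, slot % per_group


today = PeriodIndex("2024-04-30", [IndexShardGroup("index-2024-04-30-0", 3)])
second = today.add_group()  # triggered when the load check says "add shards"
print(second, today.route_write("doc-42"))
```

Hashing over all groups suits append-only, time-series style writes. Because the mapping changes when a group is added, updates or deletes of documents written earlier would need to locate the group that actually holds them.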

Through the above steps, in the current time period a set of load parameters obtained by sampling the index node cluster at a first acquisition frequency is acquired, where the index shards on the index nodes of the cluster store data indexes built for service data stored in a data table of a distributed database; the index shards are divided, according to the set number of shards, into index shard groups corresponding to set time periods; the index shard group for the current time period is the first index shard group, which is pre-allocated before the current time period and contains a first number of index shards; and the set of load parameters includes a first load parameter indicating the load condition of the first index shard group. When it is determined from the acquired set of load parameters that more index shards should be allocated for the current time period, a second index shard group containing the first number of index shards is allocated for the current time period, and the first and second index shard groups are both used to store the data indexes corresponding to the current time period. This solves the related-art problem of low system performance caused by the inflexibility of pre-creating data index storage.

In one exemplary embodiment, the first load parameter includes a current index count, namely the number of data indexes stored by the index shards in the first index shard group;

after obtaining a set of load parameters obtained by data acquisition of the index node cluster according to the first acquisition frequency, the method further comprises:

S11, determining that more index shards are to be allocated for the current time period when the current index count is greater than or equal to a first number threshold.

Because each index shard has an upper storage limit, when the current index count acquired at the current acquisition time is greater than or equal to the first number threshold, it can be determined that more index shards are to be allocated for the current time period, where the current index count is the number of data indexes stored by the index shards in the first index shard group.

Optionally, in this embodiment, more index shards may be allocated for the current time period when the number of data indexes stored in at least one index shard of the first index shard group is greater than or equal to the first number threshold; or when the number of data indexes stored in every index shard of the first index shard group is greater than or equal to the first number threshold; or when the average or median number of data indexes stored per index shard in the group is greater than or equal to the first number threshold. Other ways of deciding, by comparing the number of indexes stored in the index shards with the first number threshold, whether to add index shards for the current time period may also be used; this embodiment does not limit this.
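The comparison policies listed above can be sketched as a single Python decision function. The policy names are hypothetical labels. The "total" policy compares the group-wide count from S11 with a threshold, so its threshold would in practice be a group-level value rather than a per-shard one.

```python
from statistics import mean, median


def needs_more_shards(per_shard_counts, threshold, policy="any"):
    """Decide whether to add index shards for the current period from the number
    of data indexes stored in each shard of the first index shard group."""
    if not per_shard_counts:
        return False
    if policy == "any":     # at least one shard has reached the threshold
        return any(c >= threshold for c in per_shard_counts)
    if policy == "all":     # every shard has reached the threshold
        return all(c >= threshold for c in per_shard_counts)
    if policy == "mean":    # average per-shard count has reached the threshold
        return mean(per_shard_counts) >= threshold
    if policy == "median":  # median per-shard count has reached the threshold
        return median(per_shard_counts) >= threshold
    if policy == "total":   # whole-group count (as in S11) has reached the threshold
        return sum(per_shard_counts) >= threshold
    raise ValueError(f"unknown policy: {policy!r}")


counts = [900, 200, 300]
print({p: needs_more_shards(counts, 800, p) for p in ("any", "all", "mean", "median", "total")})
```

"any" reacts earliest to a single hot shard, while "all", "mean", and "median" tolerate skew within the group before adding capacity.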

In this way, whether new index shards are needed for the current time period is determined from the number of data indexes stored on the shards of the current period's index shard group, so the number of index shards can be adjusted dynamically according to the remaining storage capacity of the existing shards. This makes index creation more flexible and the system more responsive to service changes.

In an exemplary embodiment, the set of load parameters further includes a current interaction delay, namely the interaction delay of the index nodes on which the index shards of the first index shard group are located;

S21, determining that more index shards are to be allocated for the current time period when the current interaction delay is greater than or equal to a preset delay threshold, where the current interaction delay includes at least one of the following: current write latency and current query latency.

In this embodiment, interaction delays may arise when the index shards in the first index shard group are overloaded. For example, there may be query latency when querying data indexes stored on index shards of the index node cluster, write latency when writing the data indexes of the current time period into the shards of the current period's index shard group, update latency when updating stored data indexes, delete latency when deleting stored data indexes, and so on.

Accordingly, when the current interaction delay is greater than or equal to the preset delay threshold, it is determined that more index shards are to be allocated for the current time period, where the current interaction delay includes at least one of: current write latency, current query latency, current update latency, and current delete latency.

In this embodiment, more index shards may be allocated for the current time period when at least one of the current interaction delays is greater than or equal to its preset delay threshold, for example, when the current query latency reaches its preset delay threshold. Different interaction delays may have different preset thresholds; for example, query latency and write latency may use different thresholds.

Optionally, the current interaction delay may be the one measured at the current acquisition time. More index shards may be allocated for the current time period when the interaction delays of the index nodes hosting all index shards of the first index shard group are greater than or equal to the preset delay threshold, or when the interaction delay of the index node hosting any one index shard of the first index shard group is greater than or equal to the preset delay threshold. Other criteria based on the interaction delays of the index nodes hosting the first index shard group may also be used; this embodiment does not limit this.
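The delay-based trigger can be sketched in Python under these assumptions: delays are given per index node and per delay type in milliseconds, each delay type has its own threshold, and a node counts as slow when any measured type reaches its threshold. The require_all_nodes flag selects between the "all nodes" and "any node" variants described above. All names are hypothetical.

```python
def delay_triggers_scale_out(node_delays_ms, thresholds_ms, require_all_nodes=False):
    """node_delays_ms: index node -> {"write": ms, "query": ms, ...}, for the nodes
    hosting shards of the first index shard group at the current acquisition time.
    thresholds_ms: per-delay-type thresholds (query and write may differ)."""

    def is_slow(delays):
        # A node counts as slow if any measured delay type reaches its threshold.
        return any(kind in delays and delays[kind] >= limit
                   for kind, limit in thresholds_ms.items())

    verdicts = [is_slow(d) for d in node_delays_ms.values()]
    if not verdicts:
        return False
    return all(verdicts) if require_all_nodes else any(verdicts)


delays = {"node-1": {"write": 40.0, "query": 250.0},
          "node-2": {"write": 30.0, "query": 80.0}}
thresholds = {"write": 100.0, "query": 200.0}
print(delay_triggers_scale_out(delays, thresholds))
```

Here node-1's query latency exceeds its threshold, so the "any node" variant triggers while the "all nodes" variant does not.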

In this way, whether new index shards are needed for the current time period is determined from the interaction delays of the index nodes hosting the current period's index shard group, so the number of index shards can be adjusted dynamically according to the performance (interaction delay) of the index nodes. This makes index creation more flexible and the system more responsive to service changes.

In an exemplary embodiment, allocating a second index shard group for the current time period when it is determined from the acquired set of load parameters that more index shards are to be allocated includes:

S31, when it is determined from the acquired set of load parameters that more index shards are to be allocated for the current time period, allocating the second index shard group for the current time period from index nodes in the index node cluster whose interaction delay is less than the preset delay threshold.

Some index nodes hosting shards of the current period's index shard group may have interaction delays greater than or equal to the preset delay threshold (i.e., they are hot nodes). To avoid degrading these hot nodes further, in this embodiment, when it is determined from the acquired set of load parameters that more index shards are to be allocated for the current time period, the second index shard group is allocated from index nodes in the index node cluster whose interaction delay is less than the preset delay threshold.
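Node selection for S31 can be sketched in Python as a filter followed by round-robin placement. The round-robin step and the error raised when every node is hot are assumptions; the text does not say how shards are spread over the eligible nodes or what happens when none is eligible.

```python
def place_on_cool_nodes(shard_count, node_delay_ms, delay_threshold_ms):
    """Place the new group's shards round-robin over index nodes whose interaction
    delay is below the threshold, skipping hot nodes."""
    cool = sorted(node for node, delay in node_delay_ms.items() if delay < delay_threshold_ms)
    if not cool:
        # Assumption: the source does not say what happens when every node is hot.
        raise RuntimeError("no index node is below the delay threshold")
    return [cool[i % len(cool)] for i in range(shard_count)]


print(place_on_cool_nodes(3, {"node-1": 250.0, "node-2": 40.0, "node-3": 60.0}, 200.0))
```

node-1 is a hot node (250 ms against a 200 ms threshold), so the three new shards land on node-2 and node-3 only.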

By placing the shards of the newly added index shard group on index nodes whose interaction delay is below the preset delay threshold, this embodiment avoids placing new shards on hot nodes whose performance has already degraded, which would further reduce the overall performance of the system.

S41, when it is determined according to the obtained set of load parameters that the index shards allocated for the current time period are to be increased, allocating a first number of index shards from the index nodes in the index node cluster to obtain the second index shard group, such that the allocated index shards are balanced across the index nodes in the index node cluster.

To avoid degrading system performance, when it is determined from the obtained set of load parameters that the index shards for the current time period are to be increased, the second index shard group may be allocated from the index nodes in the index node cluster in a way that keeps the allocated index shards balanced across those nodes. This balance may be achieved through step S51 or step S61.

Optionally, index shard balancing means distributing index shards evenly across all nodes in the cluster.

In an exemplary embodiment, allocating the first number of index shards from the index nodes in the index node cluster to obtain the second index shard group, when it is determined according to the obtained set of load parameters that the index shards allocated for the current time period are to be increased, includes:

S51, when it is determined according to the obtained set of load parameters that the index shards allocated for the current time period are to be increased, allocating the first number of index shards on the index nodes in the index node cluster according to the number of index shards already allocated on each index node, to obtain the second index shard group, such that the difference between the numbers of index shards allocated on any two index nodes in the index node cluster is less than or equal to a second number threshold.

Allocating the second index shard group according to the number of shards already allocated on each index node, so that the difference in allocated shard counts between any two index nodes does not exceed the second number threshold, balances the allocated index shards across the index nodes in the cluster.
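One way to meet the S51 condition is a greedy fill that always gives the next shard to the node with the fewest allocated shards. The sketch below assumes that strategy (the text does not prescribe one) and uses hypothetical node names; note that it can only narrow an existing imbalance, since it never moves shards that are already allocated.

```python
from typing import Dict, List, Tuple


def allocate_by_count(allocated: Dict[str, int], first_number: int) -> Tuple[List[str], Dict[str, int]]:
    """Greedily assign first_number new shards to the least-loaded nodes.

    Ties are broken by node name so the result is deterministic.
    Returns the placement of each new shard and the updated shard counts.
    """
    counts = dict(allocated)
    placement = []
    for _ in range(first_number):
        node = min(counts, key=lambda n: (counts[n], n))
        counts[node] += 1
        placement.append(node)
    return placement, counts


def is_balanced(counts: Dict[str, int], second_number_threshold: int) -> bool:
    """S51 condition: largest pairwise difference in shard counts <= threshold."""
    return max(counts.values()) - min(counts.values()) <= second_number_threshold


placement, counts = allocate_by_count({"a": 3, "b": 1, "c": 2}, first_number=3)
print(placement, counts)  # ['b', 'b', 'c'] {'a': 3, 'b': 3, 'c': 3}
print(is_balanced(counts, second_number_threshold=1))  # True
```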

S61, when it is determined according to the obtained set of load parameters that the index shards allocated for the current time period are to be increased, allocating the first number of index shards from the index nodes in the index node cluster according to the allocated-number proportion of each index node, to obtain the second index shard group, such that the difference between the allocated-number proportions of any two index nodes in the index node cluster is less than or equal to a preset proportion threshold. The allocated-number proportion of an index node is the ratio of the number of index shards allocated on that node to the maximum number of index shards the node supports.

Allocating the second index shard group according to the allocated-number proportion of each index node, so that the difference in proportions between any two index nodes does not exceed the preset proportion threshold, balances the allocated index shards across the index nodes in the cluster.

In other words, the allocated-number proportion of an index node is the number of index shards allocated on it divided by the recommended upper limit of index shards supported per index node.
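For S61 the same greedy idea can be applied to the allocated-number proportion instead of the raw count. In this hedged sketch each new shard goes to the node with the lowest ratio of allocated shards to its maximum (recommended upper limit), skipping nodes that are already full; the greedy rule and the capacities used are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def allocate_by_ratio(allocated: Dict[str, int], capacity: Dict[str, int],
                      first_number: int) -> Tuple[List[str], Dict[str, int]]:
    """Assign each new shard to the node with the lowest allocated/capacity ratio."""
    counts = dict(allocated)
    placement = []
    for _ in range(first_number):
        open_nodes = [n for n in counts if counts[n] < capacity[n]]
        if not open_nodes:
            raise RuntimeError("every index node has reached its shard limit")
        node = min(open_nodes, key=lambda n: (counts[n] / capacity[n], n))
        counts[node] += 1
        placement.append(node)
    return placement, counts


def ratio_spread(counts: Dict[str, int], capacity: Dict[str, int]) -> float:
    """Largest difference in allocated-number proportion between two nodes."""
    ratios = [counts[n] / capacity[n] for n in counts]
    return max(ratios) - min(ratios)


capacity = {"a": 4, "b": 8}  # hypothetical: node b supports twice as many shards
placement, counts = allocate_by_ratio({"a": 2, "b": 2}, capacity, first_number=2)
print(placement, ratio_spread(counts, capacity))  # ['b', 'b'] 0.0
```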

Balancing index shards in this way spreads the load evenly across the nodes in the cluster and prevents individual nodes from becoming overloaded and degrading performance. A balanced shard distribution also raises the resource utilization of every node in the cluster and avoids wasting resources.

In an exemplary embodiment, the index shards on the index nodes in the index node cluster are divided in real time;

in this case, allocating the second index shard group for the current time period, when it is determined according to the obtained set of load parameters that the index shards allocated for the current time period are to be increased, includes the following step:

S71, when it is determined according to the obtained set of load parameters that the index shards allocated for the current time period are to be increased, dividing a first number of index shards from the index nodes in the index node cluster to obtain the second index shard group.

When an index (that is, an index shard group) is created, the number and type of its shards can be specified. Specifying the shard count in advance determines how many index shards the index is divided into, and the index node cluster then distributes data across that number of index shards.

Optionally, in this embodiment, index shards are of two types: primary shards (primaries) and replica shards (replicas). A primary shard holds the actual stored data, and a replica shard is a copy of a primary shard that provides high availability and additional read capacity.

Because the index shards on the index nodes in the index node cluster can be divided in real time, when it is determined from the obtained set of load parameters that the index shards for the current time period are to be increased, a first number of index shards can be divided from the index nodes in the index node cluster to obtain the second index shard group.

When an index needs to be created for the current time period, dividing the specified number of index shards from the index nodes in the cluster lets the data indexes be distributed across those shards, which achieves load balancing.

It should be noted that the capacity of each index shard divided in real time may be a fixed default value, for example 50 GB, or may differ between shards. For example, the hardware resources of different nodes in the cluster (CPU, memory, disk I/O, and so on) affect shard performance and capacity, so shards on lower-performance nodes may store less data. This embodiment does not limit this.

In an exemplary embodiment, after the second index shard group is allocated for the current time period, the method further includes:

S81, adding a group name to the second index shard group according to the group name of the first index shard group. The group name of the first index shard group indicates the data table in the distributed database whose data indexes the first index shard group stores, and the time period to which the first index shard group corresponds. The data table and the time period indicated by the group name of the second index shard group are identical to those indicated by the group name of the first index shard group.

This embodiment takes indexes (index shard groups) created per time granularity as an example. When cluster load shows that an index shard group must be added within the current time granularity (time period), the group name of the new index shard group indicates the same time period and the same data table in the distributed database as the existing index shard group for that period.

For example, suppose a data table in the distributed database is named cld_tf_pass. When its secondary index is created, the index node cluster creates both the index alias cld_tf_pass_indexer and the index shard group cld_tf_pass_indexer_202301 (here the time granularity is one month and the start time is January 2023). The monthly index shard groups, such as cld_tf_pass_indexer_202301, cld_tf_pass_indexer_202302, cld_tf_pass_indexer_202303, and so on, all share the index alias cld_tf_pass_indexer.

A query request can address the index alias cld_tf_pass_indexer directly. Through the alias mechanism of the index nodes, the query is automatically dispatched to every index shard group behind the alias (cld_tf_pass_indexer_202301, cld_tf_pass_indexer_202302, ...), so that all data matching the query conditions across the data indexes of the specified data table is returned.

A write request is written in batches to the index shard group of the corresponding time granularity, selected according to the timestamp field of the data being written.

If the group name of the first index shard group is cld_tf_pass_indexer_202302, the group name of the second index shard group may be cld_tf_pass_indexer_202302_1.
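The naming convention in this example can be expressed in a few lines. The sketch below assumes month-level granularity and the `_indexer` suffix from the example; the helper names are hypothetical, and the rule that further groups in the same month would take `_2`, `_3`, and so on is an extrapolation beyond the text.

```python
from datetime import datetime
from typing import Iterable


def alias_name(table: str) -> str:
    """Alias shared by every monthly index shard group of one HBase table."""
    return f"{table}_indexer"


def monthly_group_name(table: str, ts: datetime) -> str:
    """Group a document with timestamp ts is written to (YYYYMM suffix)."""
    return f"{alias_name(table)}_{ts:%Y%m}"


def extra_group_name(first_group: str, existing: Iterable[str]) -> str:
    """Name for an additional group in the same period: first free _1, _2, ..."""
    taken = set(existing)
    n = 1
    while f"{first_group}_{n}" in taken:
        n += 1
    return f"{first_group}_{n}"


first = monthly_group_name("cld_tf_pass", datetime(2023, 2, 15))
print(first)                        # cld_tf_pass_indexer_202302
print(extra_group_name(first, []))  # cld_tf_pass_indexer_202302_1
```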

In this embodiment, the group names of index shard groups allocated for the same time period indicate the same time period and the same data table; that is, the data indexes stored in all index shard groups of one time period correspond to service data from the same data table in the distributed database. This improves the scalability of the system.

S91, when there are data indexes to be stored for the current time period, storing them into the index shards of a current index shard set in ascending order of the number of data indexes each shard already stores, where the current index shard set contains the index shards in the first index shard group and those in the second index shard group.

To reduce the index load on the original index shards, after the second index shard group is allocated for the current time period, data indexes newly produced in the current period are written preferentially to the newly added index shards, which balances the load and diverts traffic away from the existing index.

For example, in this embodiment, when there are data indexes to be stored for the current time period, they are stored into the index shards of the current index shard set in ascending order of the number of data indexes each shard already holds.

Here, when the original index shard group of the current time period is the first index shard group and the newly added one is the second index shard group, the current index shard set contains the index shards of both groups.

Writing data indexes preferentially to the newly added index shards diverts traffic from the original index shards, balances load across shards, and can improve system performance.
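Ordering shards by how many data indexes they already hold maps naturally onto a min-heap. The following sketch assumes per-shard document counts are available (the shard names are hypothetical) and assigns a batch one document at a time to whichever shard currently holds the fewest, so the empty shards of the second group fill first.

```python
import heapq
from typing import Dict


def route_batch(stored_counts: Dict[str, int], batch_size: int) -> Dict[str, int]:
    """Return how many of batch_size new data indexes go to each shard.

    Each document goes to the shard with the smallest current count
    (ties broken by shard name), whose count is then incremented.
    """
    heap = [(count, shard) for shard, count in stored_counts.items()]
    heapq.heapify(heap)
    assigned = {shard: 0 for shard in stored_counts}
    for _ in range(batch_size):
        count, shard = heapq.heappop(heap)
        assigned[shard] += 1
        heapq.heappush(heap, (count + 1, shard))
    return assigned


# First group (g1) is already loaded; second group (g2) is empty.
current = {"g1-s0": 1000, "g1-s1": 1002, "g2-s0": 0, "g2-s1": 0}
print(route_batch(current, batch_size=4))
# {'g1-s0': 0, 'g1-s1': 0, 'g2-s0': 2, 'g2-s1': 2}
```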

In an exemplary embodiment, after the set of load parameters obtained by data acquisition on the index node cluster at the first acquisition frequency is obtained, the method further includes:

S101, adjusting a second number for the time period following the current time period according to the change curve of the load parameters in the set of load parameters, where the second number is the number of index shards in the index shard group allocated for that next time period.

To improve cluster resource utilization, the second number for the next time period may be adjusted based on the change curve of the load parameters collected during the current time period.

Here, the set of load parameters may contain one or more load parameters. In this embodiment, the second number for the next time period may be adjusted based on the change curve of any single load parameter in the set; alternatively, a weight may be assigned to each load parameter in the set, and the second number may be adjusted based on the change curve of the weighted sum of the load parameters. This embodiment does not limit this.
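The weighted variant reduces the set of load parameters to one composite series per acquisition. A minimal sketch, in which the parameter names and weights are hypothetical:

```python
from typing import Dict, List


def composite_load(samples: List[Dict[str, float]], weights: Dict[str, float]) -> List[float]:
    """Weighted sum of the load parameters at each acquisition time.

    The resulting series is the change curve whose trend drives the
    second number; parameters without a weight are ignored.
    """
    return [sum(w * sample[name] for name, w in weights.items()) for sample in samples]


samples = [
    {"write_delay_ms": 10.0, "docs_per_shard_m": 2.0},
    {"write_delay_ms": 20.0, "docs_per_shard_m": 4.0},
]
weights = {"write_delay_ms": 0.5, "docs_per_shard_m": 1.0}
print(composite_load(samples, weights))  # [7.0, 14.0]
```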

Adaptively adjusting the index specification (that is, the number of shards) of the next time period according to the load change in the current time period improves cluster resource utilization and makes the system more responsive to service load.

In an exemplary embodiment, adjusting the second number for the next time period according to the change curve of the load parameters in the set of load parameters includes:

S111, when the change curve of the load parameters in the set of load parameters shows a linear upward or linear downward trend, performing a linear fit on that change curve to obtain a fitted value, which represents the change trend of the load parameters;

S112, determining, as the second number, the smaller of the product of the first number and the fitted value and the maximum number of index shards supported by the index nodes in the index node cluster.

For example, in this embodiment, if the curve of a load parameter such as the data volume under a single index shard (that is, the number of data indexes stored in one shard) or the interaction delay rises or falls linearly, the index specification $S_{i+1}$ of the next time granularity (a time granularity is also called a time period) can be computed from the fit using formula (1):

$$S_{i+1} = \min\left(S_{\max},\; S_i \cdot y\right) \qquad (1)$$

where $S_{\max}$ is the upper limit of the index specification supported by the cluster, $S_i$ is the index specification of the current time granularity, and $y$ is the load change trend obtained by a binomial fit of the load parameters; $y$ expresses the multiple of $S_i$ that gives $S_{i+1}$.
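Formula (1) can be sketched as below. The text does not say exactly how $y$ is derived from the fit, so this Python example assumes one plausible reading: fit a least-squares line to the current period's samples, extrapolate it one period ahead, and take $y$ as the ratio of the extrapolated value to the fitted value at the end of the current period. Rounding $S_i \cdot y$ up to a whole shard count and keeping at least one shard are also assumptions.

```python
import math
from typing import Sequence, Tuple


def fit_line(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trend_multiple(values: Sequence[float]) -> float:
    """Assumed y: fitted load one period ahead divided by fitted load now.

    Assumes the next period has as many samples as the current one.
    """
    slope, intercept = fit_line(values)
    n = len(values)
    now = intercept + slope * (n - 1)
    ahead = intercept + slope * (2 * n - 1)
    if now <= 0:
        raise ValueError("fitted current load must be positive")
    return ahead / now


def next_spec(s_i: int, s_max: int, values: Sequence[float]) -> int:
    """Formula (1): S_{i+1} = min(S_max, S_i * y), rounded up, at least 1."""
    y = trend_multiple(values)
    return max(1, min(s_max, math.ceil(s_i * y)))


load = [10.0, 20.0, 30.0, 40.0]  # hypothetical per-shard load samples
print(trend_multiple(load))                 # 2.0
print(next_spec(3, s_max=10, values=load))  # 6
print(next_spec(6, s_max=10, values=load))  # 10
```

The cap at $S_{\max}$ reproduces the plateau described for FIG. 3: once $S_i \cdot y$ exceeds the cluster limit, the specification stops growing.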

Referring to FIG. 3, which shows an index specification calculation curve, the number of index shards in each successive time period increases over time until it reaches the upper limit of the specification supported by the cluster, after which it stays constant.

Dynamically adjusting the number of index shards for the next time period when the load parameter curve in the current period rises or falls linearly improves the availability, flexibility, and stability of the system, and lets traffic diversion and index specification adjustment happen transparently to the service.

S121, when the change curve of the load parameters in the set of load parameters is flat, determining the first number as the second number.

If the load parameter curve for the current time period is flat, the current number of index shards can be considered appropriate, and the index specification of the current time period can be reused for the next time period.

For example, in this embodiment, when the change curve of the load parameters is flat, the first number is determined as the second number; that is, the index specification of the current time period becomes the index specification of the next time period.

In this embodiment, whether the index specification of the next time period needs adjusting is decided from the trend of the load parameter curve in the current time period. If the curve formed by the load parameters at each acquisition time in the current period is flat, meaning the load parameters remain essentially unchanged, the current index specification is carried over to the next period. The index specification is thus determined automatically, which improves system flexibility and responsiveness.
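Deciding between S111 and S121 requires a notion of a flat curve, which the text leaves open. A hedged sketch: fit a least-squares slope, measure the total fitted change across the period relative to the mean load, and call the curve flat when that falls under a tolerance (the 5% default is an arbitrary assumption, as is the function name).

```python
from typing import Sequence


def classify_trend(values: Sequence[float], flat_tolerance: float = 0.05) -> str:
    """Return 'flat', 'rising' or 'falling' for one period's load samples."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sum((x - mean_x) * (v - mean_y) for x, v in enumerate(values)) / sxx
    # Fitted change from the first to the last sample, relative to the mean load.
    if mean_y == 0:
        relative_change = 0.0 if slope == 0 else float("inf")
    else:
        relative_change = abs(slope) * (n - 1) / abs(mean_y)
    if relative_change <= flat_tolerance:
        return "flat"  # S121: reuse the current index specification
    return "rising" if slope > 0 else "falling"  # S111: apply formula (1)


print(classify_trend([100.0, 101.0, 99.0, 100.0]))  # flat
print(classify_trend([10.0, 20.0, 30.0, 40.0]))     # rising
```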

S131, when a second load parameter in the set of load parameters is greater than or equal to a first parameter threshold, changing the acquisition frequency for data acquisition on the index node cluster in the current time period from the first acquisition frequency to a second acquisition frequency. The second load parameter represents the usage of node resources of the index nodes in the index node cluster, and the first parameter threshold is the larger of a preset parameter threshold and the second load parameter obtained in the previous acquisition.

To make index creation more reasonable, after the set of load parameters is obtained at the first acquisition frequency, whether to change the acquisition frequency can be decided from specified load parameters of the index node cluster. For example, when a second load parameter in the set is greater than or equal to the first parameter threshold, the acquisition frequency for the index node cluster in the current time period may be changed from the first acquisition frequency to the second acquisition frequency. The second load parameter is a load parameter representing the usage of node resources of the index nodes in the cluster.

Optionally, in this embodiment, the second load parameters may include server CPU load, server disk load, and so on, where each index node in the index node cluster may correspond to one server.

When at least one second load parameter is greater than or equal to the first parameter threshold, the current index load can be considered sub-healthy and the probability of a problem rises. To observe the index load more promptly and avoid missing anomalies in the first load parameters, the acquisition frequency can be increased. If the second load parameters are normal (that is, below the first parameter threshold), the acquisition frequency remains unchanged.

For example, in this embodiment, the acquisition interval under the current index granularity (also called the current time period) is fixed by default at 1/60 of the index granularity. If the index granularity is one month (that is, data indexes are stored in a different index shard group each month), monitoring data is collected every 12 hours. If a second load parameter is abnormal (greater than or equal to the first parameter threshold), the acquisition frequency is doubled, for example from once every 12 hours to once every 6 hours.
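The interval arithmetic in this example can be checked with `datetime.timedelta`. The sketch assumes a 30-day month, which is what makes 1/60 of the granularity come out to 12 hours; doubling the frequency halves the interval, and a further abnormality (as described for FIG. 6) halves it again. The helper names are hypothetical.

```python
from datetime import timedelta


def base_interval(granularity: timedelta) -> timedelta:
    """Default acquisition interval: 1/60 of the index granularity."""
    return granularity / 60


def escalate(interval: timedelta) -> timedelta:
    """Doubling the acquisition frequency halves the interval between samples."""
    return interval / 2


month = timedelta(days=30)  # assumption: a month is treated as 30 days
interval = base_interval(month)
print(interval)                      # 12:00:00
print(escalate(interval))            # 6:00:00
print(escalate(escalate(interval)))  # 3:00:00
```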

Taking server disk usage and server CPU usage as the second load parameters, whether the second load parameters are abnormal can be computed with formula (2):

$$R = (D_i > D) \lor (C_i > C) \qquad (2)$$

where $D_i$ is the currently collected disk usage and $C_i$ is the currently collected CPU usage; $D = \max(D_{\text{std}}, D_{i-1})$ and $C = \max(C_{\text{std}}, C_{i-1})$, in which $D_{\text{std}}$ and $C_{\text{std}}$ are the standard values for disk usage and CPU usage, and $D_{i-1}$ and $C_{i-1}$ are the disk usage and CPU usage from the previous acquisition. $R = 0$, meaning the parameters are normal, if and only if $D_i$ does not exceed $D$ and $C_i$ does not exceed $C$.
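Formula (2) translates directly into a predicate. In this sketch usage values are fractions between 0 and 1, the function name is hypothetical, and the standard values passed in are made up; because each bound is the larger of the standard value and the previous reading, a reading is flagged only when it exceeds both.

```python
def second_load_abnormal(d_i: float, c_i: float,
                         d_prev: float, c_prev: float,
                         d_std: float, c_std: float) -> int:
    """Formula (2): R = (D_i > D) or (C_i > C); 1 means abnormal, 0 normal."""
    d_bound = max(d_std, d_prev)  # D = max(standard disk usage, D_{i-1})
    c_bound = max(c_std, c_prev)  # C = max(standard CPU usage, C_{i-1})
    return int(d_i > d_bound or c_i > c_bound)


# Disk usage 0.72 exceeds both the 0.70 standard and the previous 0.68.
print(second_load_abnormal(0.72, 0.50, 0.68, 0.55, d_std=0.70, c_std=0.80))  # 1
# Same reading, but the previous disk usage was already 0.75.
print(second_load_abnormal(0.72, 0.50, 0.75, 0.55, d_std=0.70, c_std=0.80))  # 0
```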

When a load parameter representing node resource usage of the index nodes is abnormal, increasing the acquisition frequency on the index node cluster makes the system more responsive, so that index shards can be created in time to divert load from the existing indexes.

In an exemplary embodiment, after the acquisition frequency for the index node cluster in the current time period is changed from the first acquisition frequency to the second acquisition frequency, the method further includes:

S141, when the re-acquired second load parameters are less than a second parameter threshold, setting the acquisition frequency for the index node cluster in the current time period to a preset acquisition frequency, where the second parameter threshold is less than or equal to the first parameter threshold.

Corresponding to the foregoing embodiment, after the acquisition frequency has been increased because a second load parameter (a load parameter representing node resource usage of the index nodes) was found abnormal, the acquisition frequency may be lowered again if the re-acquired second load parameters fall below the second parameter threshold. The second parameter threshold may be equal to or different from the first parameter threshold; this embodiment does not limit this.

For example, in this embodiment, when all re-acquired second load parameters are below the second parameter threshold, the acquisition frequency for the index node cluster in the current time period is reset to the preset acquisition frequency. The reset may be made only after the second load parameters have been observed below the second parameter threshold a set number of times.
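The escalate-then-recover behaviour of S131 and S141 can be modelled as a small state machine. This is a sketch under assumptions: the class name is hypothetical, intervals are in hours, and the "set number of times" is interpreted as consecutive normal acquisitions.

```python
from typing import Sequence


class AcquisitionScheduler:
    """Hypothetical controller for the acquisition interval of one time period."""

    def __init__(self, preset_interval_h: float, second_threshold: float,
                 required_normal: int = 3) -> None:
        self.preset_interval_h = preset_interval_h
        self.second_threshold = second_threshold
        self.required_normal = required_normal
        self.interval_h = preset_interval_h
        self._normal_streak = 0

    def on_abnormal(self) -> None:
        """S131: a second load parameter crossed the first threshold; sample twice as often."""
        self.interval_h /= 2
        self._normal_streak = 0

    def on_sample(self, second_load_params: Sequence[float]) -> None:
        """S141: reset to the preset interval after enough all-normal acquisitions."""
        if all(v < self.second_threshold for v in second_load_params):
            self._normal_streak += 1
            if self._normal_streak >= self.required_normal:
                self.interval_h = self.preset_interval_h
                self._normal_streak = 0
        else:
            self._normal_streak = 0


sched = AcquisitionScheduler(preset_interval_h=12.0, second_threshold=0.7, required_normal=2)
sched.on_abnormal()
print(sched.interval_h)  # 6.0
sched.on_sample([0.5, 0.6])
sched.on_sample([0.4, 0.6])
print(sched.interval_h)  # 12.0
```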

Lowering the acquisition frequency once the node resource usage of the index nodes is back to normal reduces the monitoring load on the system and improves its stability and response speed.

In an exemplary embodiment, the above method further comprises:

S151, in response to receiving a write-ahead log (WAL) of the distributed database, extracting a data operation request from the write-ahead log, where the data operation request requests that a specified data operation be performed on specified service data in the distributed database;

S152, converting the data operation request into an index operation request, where the index operation request requests that a specified index operation matching the specified data operation be performed on the data index corresponding to the specified service data;

S153, sending the index operation request to the index node cluster, so that the specified index operation is performed on the data index corresponding to the specified service data in the index shards corresponding to the current time period.

Here, the specified data operation may include writing, updating, querying, or deleting service data, and the specified index operation may correspondingly include writing, updating, querying, or deleting a data index.

For example, in this embodiment, the distributed database is the distributed column-oriented storage system HBase. An HBase table is split into Regions by RowKey value and stored on RegionServers, and one RegionServer may host many Regions. A data synchronization device is responsible for synchronizing data from the distributed database. At its core it emulates a RegionServer and uses the HBase replication feature to receive all WAL (Write-Ahead Log) entries from HBase in real time. According to a preset mapping, it converts the HBase insert requests parsed from the WAL into batch insert requests for the corresponding data indexes and finally writes the data indexes into the index shards.

Specifically, the data synchronization device may convert the write, update, and delete requests contained in the WAL into batch write, update, and delete requests for the designated data indexes, according to a preset mapping rule between HBase tables and data indexes, and send the batch requests to the index node cluster.
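The conversion in S151 to S153 can be pictured with simplified data types. Everything here is hypothetical: `WalEntry` is not HBase's real WAL API, the column-to-field mapping is invented, updates are folded into an `index` (overwrite) action keyed by row key, and each request is routed to the monthly index shard group by the entry's timestamp, following the cld_tf_pass naming example. Sending the batch to the cluster (S153) is left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class WalEntry:
    """Hypothetical, simplified view of one mutation parsed from the HBase WAL."""
    op: str                  # "put" (write/update) or "delete"
    table: str
    row_key: str
    columns: Dict[str, str]  # "family:qualifier" -> value
    timestamp: datetime


@dataclass
class IndexRequest:
    """Hypothetical index operation destined for the index node cluster."""
    action: str              # "index" (create or overwrite) or "delete"
    index: str
    doc_id: str              # the HBase row key is reused as the document ID
    doc: Optional[Dict[str, str]] = None


# Hypothetical mapping from HBase columns to index fields, per table.
FIELD_MAPPING = {"cld_tf_pass": {"info:plate": "plate_no", "info:site": "site_id"}}


def to_index_requests(entries: List[WalEntry]) -> List[IndexRequest]:
    """Convert WAL mutations into one batch of index requests (S152)."""
    batch = []
    for e in entries:
        mapping = FIELD_MAPPING.get(e.table)
        if mapping is None:
            continue  # no secondary index configured for this table
        index = f"{e.table}_indexer_{e.timestamp:%Y%m}"
        if e.op == "delete":
            batch.append(IndexRequest("delete", index, e.row_key))
        else:
            doc = {name: e.columns[col] for col, name in mapping.items() if col in e.columns}
            batch.append(IndexRequest("index", index, e.row_key, doc))
    return batch


entries = [
    WalEntry("put", "cld_tf_pass", "r1", {"info:plate": "A123", "info:site": "S9"}, datetime(2023, 2, 3)),
    WalEntry("delete", "cld_tf_pass", "r0", {}, datetime(2023, 1, 31)),
]
for req in to_index_requests(entries):
    print(req)
```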

The data synchronization device automatically handles both in-order and out-of-order data based on the timestamp attribute of the data. Out-of-order data here means data that did not arrive at its normal acquisition time because of network problems, acquisition device failures, and the like.

The data synchronization device may be a component running on a node outside the index node cluster; this embodiment does not limit this.

Using the data synchronization device to keep the data indexes on the index shards of the index node cluster synchronized with the service data in the data tables of the distributed database improves data query efficiency and system performance.

In an exemplary embodiment, the index node cluster is a search server cluster and the distributed database is an HBase database.

For example, in this embodiment, the search server cluster may be an Elasticsearch (ES) cluster. Elasticsearch is an open-source search engine built on Apache Lucene™ with powerful data retrieval and analysis capabilities. The distributed database may be HBase, a distributed, column-oriented, open-source database and a core component of the ecosystem around Hadoop, the open-source distributed batch processing framework. HBase offers good write performance, excellent scalability, and stable data storage. It can store massive amounts of data, keeps that data in HDFS (Hadoop Distributed File System), and plays a key role in the storage architectures of many leading Internet companies as a storage medium for massive data.

HBase is a widely used distributed database in the big data field and supports tables with trillions of rows and millions of columns. However, HBase natively supports only row-key lookups and full-table scans, so multi-condition combined queries must be implemented through a secondary index built with ES. In the related art, an HBase secondary index usually maps one HBase data table to one ES index. The HBase table balances its load in the background through Region splitting, so its read and write performance is little affected by data volume. The ES index, by contrast, keeps growing with the data volume until its original shard count is too small to sustain the write and query performance the existing service requires. The effect on write performance is especially pronounced because the secondary indexing scheme specifies the document ID (that is, the row key) when writing HBase table data into the ES index.

A common industry approach is to pre-create multiple ES indexes by time granularity or service granularity that jointly serve one HBase table, which greatly mitigates the rapid drop in ES index performance as data volume grows. However, pre-creating indexes requires the service owner to study the existing data volume, service load, and so on in depth, which demands considerable expertise. Moreover, if data ingestion suddenly surges (for example, the service grows from 10 connected checkpoints to 20), the system cannot handle the concurrent service requests caused by the surge, because the index template specification is fixed and the index for the current granularity has already been created from that template.

To at least partially solve these problems, this application proposes an index management scheme for HBase secondary indexes that senses the load of existing indexes and dynamically adjusts the index specification and index granularity of pre-created indexes, so as to respond more promptly to future service request load. Here, a secondary index is a scheme that writes qualifying HBase table data into an ES index according to preset field mapping rules, compensating for HBase's weak support for multi-condition combined queries. The primary index of HBase is the row key (rowkey), and data can be looked up by rowkey; but a combined query on columns within a column family can only be served by a full-table scan, which is inefficient for large tables. A secondary index addresses this by mapping column values of the column family to rowkeys: a query first looks up the rowkeys from the column values and then retrieves the data from HBase by rowkey.
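The column-value-to-rowkey idea can be illustrated with in-memory dictionaries standing in for the HBase table and the ES index. This is a conceptual sketch only; the row keys, column names, and function names are hypothetical, and a real deployment would query ES for rowkeys and then fetch the rows from HBase.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Row = Dict[str, str]


def build_secondary_index(rows: Dict[str, Row], columns: List[str]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each (column, value) pair to the set of rowkeys holding that value."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for rowkey, row in rows.items():
        for col in columns:
            if col in row:
                index[(col, row[col])].add(rowkey)
    return index


def query(rows: Dict[str, Row], index: Dict[Tuple[str, str], Set[str]],
          conditions: Dict[str, str]) -> List[Row]:
    """Multi-condition query: intersect rowkey sets, then look rows up by rowkey."""
    matches: Optional[Set[str]] = None
    for col, value in conditions.items():
        keys = index.get((col, value), set())
        matches = set(keys) if matches is None else matches & keys
    return [rows[k] for k in sorted(matches or set())]


rows = {
    "r1": {"plate": "A1", "site": "S1"},
    "r2": {"plate": "A1", "site": "S2"},
    "r3": {"plate": "B2", "site": "S1"},
}
idx = build_secondary_index(rows, ["plate", "site"])
print(query(rows, idx, {"plate": "A1", "site": "S1"}))  # [{'plate': 'A1', 'site': 'S1'}]
```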

Optionally, the overall architecture of a control system for data index storage may be as shown in FIG. 4. The system may include HBase RegionServers (in Apache HBase, the server processes responsible for handling client requests), an ES cluster, a data synchronization device, an index load sensing device, and an index mapping management device.

The index load sensing device continuously collects index load within the current time granularity (also called the current time period) according to set rules and frequencies, and makes different decisions based on the metrics at each level. It tracks and feeds back the current load of the system promptly, creates indexes and adjusts index specifications automatically, and so improves system flexibility and responsiveness. Optionally, the workflow of the index load sensing device may be as shown in FIG. 5, specifically:

At a given frequency, the index load sensing device collects the load of the cluster, for example key metrics such as CPU utilization, disk utilization, write latency, query latency, and the data volume per index fragment, compares them with recommended values, and makes a decision based on the result. For example, when the current time period is 202302 (February 2023), the index specification for the next time period, 202303, is adjusted adaptively.

The data volume per index fragment, the write latency, and the query latency are first-level metrics; the CPU and disk load of the server are second-level metrics. Second-level metrics drive decision one, and first-level metrics drive decisions two and three. Decision one: adjust the metric collection frequency. The change in collection frequency may be as shown in FIG. 6: when a second-level metric becomes abnormal for the first time, the collection frequency is doubled relative to the original frequency, and when it becomes abnormal a second time, the frequency is doubled again relative to the frequency in effect at that moment. Decision two: create an additional index fragment group within the current time granularity. Decision three: set the index specification for the next time granularity.
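As a hedged illustration of how metric levels map to decisions, the sketch below assumes the abnormal metrics of one collection round are already known (the threshold comparison itself is out of scope here). The metric names are hypothetical labels for the quantities listed above, and decisions are represented simply by their numbers.

```python
from enum import Enum


class Level(Enum):
    FIRST = 1   # load of the index fragments themselves
    SECOND = 2  # resource load of the server hosting them


# Hypothetical metric names for the quantities listed in the text.
METRIC_LEVEL = {
    "data_volume_per_fragment": Level.FIRST,
    "write_latency": Level.FIRST,
    "query_latency": Level.FIRST,
    "cpu_utilization": Level.SECOND,
    "disk_utilization": Level.SECOND,
}


def decisions_for(abnormal_metrics):
    """Map the metrics found abnormal in one collection round to decision numbers.

    Decision three is not produced here: it is derived from the first-level trend
    over the whole time period, not from a single round.
    """
    levels = {METRIC_LEVEL[name] for name in abnormal_metrics}
    decisions = set()
    if Level.SECOND in levels:
        decisions.add(1)  # decision one: adjust the collection frequency
    if Level.FIRST in levels:
        decisions.add(2)  # decision two: add a fragment group for the current period
    return decisions
```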

The index load sensing device is responsible for sensing the load metrics of existing indexes and tracking how the load changes within the current time granularity. It dynamically adjusts its metric collection frequency according to the load trend and, based on its evaluation of the load metrics, sends decisions to the index mapping management device, so that changes in system load are sensed and acted on in time. Load metrics of different levels lead to different decisions that guide the index mapping management device. Based on the trend of the first-level metrics within the current time granularity, the device calculates the index specification for the next time granularity, making reasonable use of cluster resources and avoiding waste.

The data synchronization device is responsible for synchronizing data from HBase: in real time, it converts the data write requests contained in the HBase write-ahead log (WAL) into ES write requests and inserts the data into the corresponding index.

The index mapping management device receives decisions from the index load sensing device and processes them, for example by creating an index fragment group within the current time granularity or adjusting the index specification for the next time granularity. It also maintains the mapping between the HBase table targeted by write and query requests and the ES indexes; this mapping may be implemented with ES index aliases. ZooKeeper may be used to store the metadata of the whole HBase cluster and the cluster state information, providing cluster coordination, metadata management, failover, lock services, cluster monitoring, and similar functions. The through-table index may refer to an index created to improve query efficiency.

The index mapping management device acts on the decisions of the index load sensing device. If it receives a decision requesting index creation, it creates an index fragment group of finer granularity and completes the index mapping and the forwarding of write and query requests. If it receives the index specification for the next time granularity, it creates the index fragment group for the next time granularity in advance and completes the index mapping. Meanwhile, the index mapping management device manages the mapping rules between each HBase table and its ES indexes; these rules determine into which index an ES write request converted from HBase table data is written, and which indexes a query covers.

For example, suppose the HBase table is named cld_tf_pass. After the secondary index is created, the alias cld_tf_pass_indexer and the index cld_tf_pass_indexer_202301 are created in ES at the same time (here the time granularity is one month and the start time is January 2023). cld_tf_pass_indexer is the index alias of the month-granularity indexes (here, one index corresponds to one index fragment group), that is, the alias of cld_tf_pass_indexer_202301, cld_tf_pass_indexer_202302, cld_tf_pass_indexer_202303, and so on. A query request accesses the alias cld_tf_pass_indexer directly; through the ES alias mechanism, the query is automatically dispatched to all indexes associated with the alias (cld_tf_pass_indexer_202301, cld_tf_pass_indexer_202302, ...), so data satisfying the query condition is returned from all index data of the specified HBase table. A write request is written in batches into the index of the corresponding time granularity according to the timestamp field of the written data.
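A minimal sketch of the naming and write-routing rules in this example, assuming month granularity and a timestamp field holding epoch milliseconds in UTC. The field name `timestamp` and all function names are assumptions; the source does not name them. The `alias_actions` helper only builds a request body in the shape accepted by the Elasticsearch `_aliases` API and does not send anything.

```python
from collections import defaultdict
from datetime import datetime, timezone


def alias_for(table: str) -> str:
    # "cld_tf_pass" -> "cld_tf_pass_indexer"; queries always target this alias.
    return f"{table}_indexer"


def index_for(table: str, ts_millis: int) -> str:
    """Month-granularity index that covers the given epoch-millisecond timestamp."""
    dt = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
    return f"{alias_for(table)}_{dt:%Y%m}"


def alias_actions(table: str, index: str) -> dict:
    # Body for POST /_aliases that attaches a newly created index to the table alias.
    return {"actions": [{"add": {"index": index, "alias": alias_for(table)}}]}


def group_writes(table: str, docs: list, ts_field: str = "timestamp") -> dict:
    """Group documents by target index so that each group can be written as one batch."""
    batches = defaultdict(list)
    for doc in docs:
        batches[index_for(table, doc[ts_field])].append(doc)
    return dict(batches)
```

Because every monthly index is attached to the same alias, adding a month never changes the name that clients query.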

The indexes behind an alias are distinguished by time granularity, and their time ranges do not overlap. This device also works closely with the index load sensing device: among the indexes behind an alias, only the specification of the index for the first time granularity is estimated from the expected service load, and the indexes for all subsequent time granularities use the index specification (number of fragments) provided by the index load sensing device. In general, each HBase table corresponds to one type of service data. If the service behind a table expands in scope and its daily access volume keeps growing, the number of index fragments for subsequent time granularities grows accordingly; if the service is peripheral and its access volume keeps shrinking, the number of fragments for subsequent time granularities decreases. This mechanism avoids the waste of ES resources caused by a fixed index specification.

Meanwhile, if the service data volume surges within a time granularity and the index for the current time granularity can no longer serve the existing service (for example, because of high write or query latency), a new additional index can be created promptly to absorb that time granularity's service data. This keeps the service running smoothly and is transparent to clients.

In this embodiment, different decisions are executed by continuously collecting and analyzing the load within the current time granularity, so the system load is kept within a reasonable range automatically and normal service operation is ensured. Compared with approaches that configure an index template in advance and create indexes on a fixed schedule by time or service granularity, this improves the flexibility, responsiveness, and compatibility of the system and lowers the technical skill required of operations staff.

Although the embodiments of the present application have been described above, the foregoing description and definitions are provided to facilitate understanding of the embodiments of the present application, and are not intended to limit the present application, and any modifications and variations may be made thereto without departing from the spirit and scope of the present application, particularly any software and hardware implementation based on the algorithm, etc. are within the scope of the present application.

It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.

Fig. 7 shows an optional control device for data index storage according to an embodiment of the present application. As shown in Fig. 7, the device includes:

an obtaining unit 702, configured to obtain, in a current time period, a set of load parameters collected from an index node cluster at a first collection frequency, where index fragments on index nodes in the index node cluster store data indexes built for service data stored in a data table of a distributed database; the index fragments on the index nodes are divided, according to a set number of fragments, into index fragment groups corresponding to set time periods; the index fragment group corresponding to the current time period is a first index fragment group, which is allocated before the current time period and includes a first number of index fragments; and the set of load parameters includes a first load parameter indicating the load of the first index fragment group;

and an allocation unit 704, configured to allocate a second index fragment group for the current time period when it is determined, according to the obtained set of load parameters, that the index fragments allocated for the current time period need to be increased, where the second index fragment group includes the first number of index fragments, and both the first index fragment group and the second index fragment group store data indexes corresponding to the current time period.

It should be noted that the obtaining unit 702 in this embodiment may be used to perform step S202 above, and the allocation unit 704 may be used to perform step S204 above. The obtaining unit 702 corresponds to the index load sensing device in the foregoing embodiment, and the allocation unit 704 corresponds to the index mapping management device.

The index load sensing device obtains, in the current time period, the set of load parameters collected from the index node cluster at the first collection frequency; the index mapping management device allocates a second index fragment group for the current time period when the index load sensing device determines, from the obtained set of load parameters, that the index fragments allocated for the current time period need to be increased.

With this embodiment of the application, a set of load parameters collected from the index node cluster at a first collection frequency is obtained in the current time period, where the index fragments on the index nodes store data indexes built for service data stored in a data table of a distributed database; the index fragments are divided, according to a preset number of fragments, into index fragment groups corresponding to set time periods; the index fragment group for the current time period is the first index fragment group, which is allocated before the current time period and includes a first number of index fragments; and the set of load parameters includes a first load parameter indicating the load of the first index fragment group. When it is determined from the obtained set of load parameters that the index fragments allocated for the current time period need to be increased, a second index fragment group including the first number of index fragments is allocated for the current time period, and both groups store the data indexes corresponding to the current time period.

The device implements the control method for data index storage provided in the above embodiment, which is not repeated here. As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

The device further comprises:

a first determining unit, configured to determine, after the set of load parameters collected from the index node cluster at the first collection frequency is obtained, that the index fragments allocated for the current time period need to be increased when the current index count is greater than or equal to a first number threshold, where the current index count is the number of data indexes stored in the index fragments of the first index fragment group.

The device further comprises:

a second determining unit, configured to determine, after the set of load parameters collected from the index node cluster at the first collection frequency is obtained, that the index fragments allocated for the current time period need to be increased when the current interaction latency is greater than or equal to a preset latency threshold, where the current interaction latency includes at least one of: the current write latency and the current query latency.
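The two triggers handled by the first and second determining units can be combined in one check. This is a sketch under assumed inputs: the class and field names are hypothetical, the threshold values in the usage are made up, and a latency reading that was not collected is simply ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FirstGroupLoad:
    index_count: int                           # data indexes stored in the first group
    write_latency_ms: Optional[float] = None   # current write latency, if collected
    query_latency_ms: Optional[float] = None   # current query latency, if collected


def needs_extra_group(load: FirstGroupLoad, count_threshold: int,
                      latency_threshold_ms: float) -> bool:
    """True when more index fragments should be allocated for the current period."""
    if load.index_count >= count_threshold:  # first determining unit
        return True
    latencies = [v for v in (load.write_latency_ms, load.query_latency_ms)
                 if v is not None]
    # Second determining unit: either latency reaching the threshold is enough.
    return any(v >= latency_threshold_ms for v in latencies)
```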

In one exemplary embodiment, the allocation unit includes:

an allocation module, configured to allocate, when it is determined according to the obtained set of load parameters that the index fragments allocated for the current time period need to be increased, the second index fragment group for the current time period from index nodes in the index node cluster whose interaction latency is below the preset latency threshold.
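Selecting candidate nodes for the second group can be sketched as a simple filter, assuming a per-node interaction latency is available from the collected load parameters. Node names are hypothetical; ordering the result fastest first is an illustrative choice, not something the source specifies.

```python
def eligible_nodes(node_latency_ms: dict, latency_threshold_ms: float) -> list:
    """Index nodes whose interaction latency is below the threshold, fastest first."""
    fast = [node for node, lat in node_latency_ms.items() if lat < latency_threshold_ms]
    # Sort by latency, then by name so that the result is deterministic.
    return sorted(fast, key=lambda node: (node_latency_ms[node], node))
```

The returned list would then serve as the candidate set for placing the first number of new fragments.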

In one exemplary embodiment, the allocation unit includes:

an allocation module, configured to allocate, when it is determined according to the obtained set of load parameters that the index fragments allocated for the current time period need to be increased, the first number of index fragments from index nodes in the index node cluster to obtain the second index fragment group, such that the allocated index fragments are balanced across the index nodes in the index node cluster.

In one exemplary embodiment, the allocation module includes:

a first allocation submodule, configured to allocate, when it is determined according to the obtained set of load parameters that the index fragments allocated for the current time period need to be increased, the first number of index fragments from index nodes in the index node cluster according to the number of index fragments already allocated on each index node, to obtain the second index fragment group, such that the difference between the numbers of index fragments allocated on any two index nodes in the index node cluster is less than or equal to a second number threshold.
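One simple way to approach the count-balance condition is greedy placement: each new fragment goes to the node that currently holds the fewest fragments. This greedy strategy is an illustrative choice, not necessarily the method of the application. It never widens the max-min gap beyond max(initial gap, 1), so whether the second number threshold is met still depends on the starting distribution. Node names are hypothetical.

```python
import heapq


def allocate_balanced(allocated: dict, first_number: int) -> dict:
    """Place first_number new fragments, one at a time, on the node that currently
    holds the fewest fragments. Returns the number of new fragments per node."""
    heap = [(count, node) for node, count in allocated.items()]
    heapq.heapify(heap)  # smallest count first; ties broken by node name
    placed = {}
    for _ in range(first_number):
        count, node = heapq.heappop(heap)
        placed[node] = placed.get(node, 0) + 1
        heapq.heappush(heap, (count + 1, node))
    return placed
```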

In one exemplary embodiment, the allocation module includes:

a second allocation submodule, configured to allocate, when it is determined according to the obtained set of load parameters that the index fragments allocated for the current time period need to be increased, the first number of index fragments from index nodes in the index node cluster according to the allocation ratio of each index node, to obtain the second index fragment group, such that the difference between the allocation ratios of any two index nodes in the index node cluster is less than or equal to a preset ratio threshold, where the allocation ratio of an index node is the number of index fragments allocated on that index node divided by the maximum number of index fragments the index node supports.
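The ratio-based variant can be sketched the same way, except that each fragment goes to the node whose allocation ratio would be lowest after placement, and full nodes are skipped. The greedy rule and the node names are assumptions for illustration; exact fractions are used so that ratio ties are broken only by node name.

```python
from fractions import Fraction


def allocate_by_ratio(allocated: dict, capacity: dict, first_number: int) -> dict:
    """Greedily place first_number fragments, each on the node whose allocation
    ratio (allocated / maximum supported) would be lowest after placement."""
    counts = dict(allocated)
    placed = {}
    for _ in range(first_number):
        candidates = [n for n in counts if counts[n] < capacity[n]]
        if not candidates:
            raise RuntimeError("no index node has spare fragment capacity")
        # Fraction keeps the ratio exact, so ties are broken by node name only.
        node = min(candidates, key=lambda n: (Fraction(counts[n] + 1, capacity[n]), n))
        counts[node] += 1
        placed[node] = placed.get(node, 0) + 1
    return placed
```

Unlike the count-based version, a node with a larger maximum fragment count absorbs proportionally more of the new fragments.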

In an exemplary embodiment, the allocation unit includes:

a dividing module, configured to divide, when it is determined according to the obtained set of load parameters that the index fragments allocated for the current time period need to be increased, the first number of index fragments from index nodes in the index node cluster to obtain the second index fragment group.

In an exemplary embodiment, the above apparatus further includes:

an adding unit, configured to add, after the second index fragment group is allocated for the current time period, a group name to the second index fragment group according to the group name of the first index fragment group, where the group name of the first index fragment group indicates the data table in the distributed database corresponding to the data indexes stored in the first index fragment group and the time period corresponding to the first index fragment group, and the group name of the second index fragment group indicates the same data table and the same time period as the group name of the first index fragment group.
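The source only requires that the second group's name indicate the same data table and time period as the first group's name; it does not give a format. The sketch below assumes a hypothetical `<table>_indexer_<yyyymm>[_g<n>]` scheme, extending the cld_tf_pass example, and derives the next free name from the existing groups of the period.

```python
import re

# Hypothetical naming scheme: <table>_indexer_<yyyymm>, plus _g<n> for extra groups.
_NAME = re.compile(r"^(?P<table>.+)_indexer_(?P<period>\d{6})(?:_g(?P<seq>\d+))?$")


def next_group_name(existing: list) -> str:
    """Name for an extra fragment group, keeping the table and period of existing ones."""
    parsed = [_NAME.match(name) for name in existing]
    if not parsed or not all(parsed):
        raise ValueError("unrecognised or missing group name")
    tables = {m["table"] for m in parsed}
    periods = {m["period"] for m in parsed}
    if len(tables) != 1 or len(periods) != 1:
        raise ValueError("groups belong to different tables or periods")
    highest = max(int(m["seq"] or 1) for m in parsed)  # the base group counts as 1
    return f"{tables.pop()}_indexer_{periods.pop()}_g{highest + 1}"
```

The new index would presumably also be attached to the table's alias so that queries reach it.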

In an exemplary embodiment, the above apparatus further includes:

a storage unit, configured to, after the second index fragment group is allocated for the current time period and when there are data indexes to be stored for the current time period, store those data indexes into the index fragments of the current index fragment set in ascending order of the number of data indexes each fragment already stores, where the current index fragment set contains the index fragments of both the first index fragment group and the second index fragment group.
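The write order of the storage unit can be sketched as a sort over the merged fragment set, assuming the per-fragment counts of stored data indexes are fresh. Fragment ids are hypothetical. Because the second group starts empty, its fragments naturally come first.

```python
def write_order(first_group: dict, second_group: dict) -> list:
    """Fragments of the current index fragment set, fewest stored data indexes first.

    Both arguments map a fragment id to the number of data indexes it already stores.
    """
    current_set = {**first_group, **second_group}
    # Ties are broken by fragment id so that the order is deterministic.
    return sorted(current_set, key=lambda frag: (current_set[frag], frag))
```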

In an exemplary embodiment, the above apparatus further includes:

a first adjusting unit, configured to adjust, after the set of load parameters collected from the index node cluster at the first collection frequency is obtained, a second number for the time period following the current time period according to the change curve of the load parameters in the set, where the second number is the number of index fragments in the index fragment group allocated for the next time period.

In one exemplary embodiment, the first adjusting unit includes:

a fitting module, configured to perform a linear fit on the change curve of the load parameters in the set when the curve shows a linearly rising or linearly falling trend, to obtain a fitting value that represents the trend of the load parameters;

and a first determining module, configured to determine, as the second number, the smaller of the product of the first number and the fitting value and the maximum number of index fragments supported by the index nodes in the index node cluster.

In one exemplary embodiment, the first adjusting unit includes:

a second determining module, configured to determine the first number as the second number when the change curve of the load parameters in the set is flat.
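The source does not define the fitting value precisely. The sketch below assumes it is the ratio of the fitted load at the end of the period to the fitted load at its start, uses an ordinary least-squares line, and treats a ratio within a hypothetical tolerance of 1 as a flat curve. It does not test how linear the curve is; a fuller implementation might check goodness of fit (for example R²) first. Rounding up with `math.ceil` is also an assumption.

```python
import math


def _least_squares(values):
    """Slope and intercept of y = a*x + b fitted to (0, v0), (1, v1), ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def next_period_fragments(samples, first_number, max_fragments, flat_tolerance=0.05):
    """Second number: fragment count for the next period's fragment group."""
    if len(samples) < 2:
        return first_number  # not enough data to detect a trend
    slope, intercept = _least_squares(samples)
    start = intercept
    end = intercept + slope * (len(samples) - 1)
    if start <= 0:
        return first_number  # ratio undefined; keep the current specification
    fitting_value = end / start
    if abs(fitting_value - 1) <= flat_tolerance:
        return first_number  # flat curve: second number equals first number
    # Rising or falling curve: min(first number x fitting value, node maximum).
    return max(1, min(math.ceil(first_number * fitting_value), max_fragments))
```

For example, a load that grows linearly from 100 to 250 over the period gives a fitting value of 2.5, so a 4-fragment group becomes 10 fragments next period unless the node maximum is lower.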

In an exemplary embodiment, the above apparatus further includes:

a second adjusting unit, configured to change, after the set of load parameters collected from the index node cluster at the first collection frequency is obtained, the collection frequency for the index node cluster in the current time period from the first collection frequency to a second collection frequency when a second load parameter in the set is greater than or equal to a first parameter threshold, where the second load parameter represents the usage of node resources of the index nodes in the index node cluster, and the first parameter threshold is the larger of a preset parameter threshold and the second load parameter obtained in the previous collection.

In an exemplary embodiment, the above apparatus further includes:

a third adjusting unit, configured to change, after the collection frequency in the current time period has been changed from the first collection frequency to the second collection frequency, the collection frequency for the index node cluster in the current time period to a preset collection frequency when the re-collected second load parameter is less than a second parameter threshold, where the second parameter threshold is less than or equal to the first parameter threshold.
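Taken together with decision one (doubling on each anomaly), the second and third adjusting units behave like a small controller with hysteresis. This is a sketch under stated assumptions: "previous collection" is read as the last collected value, the frequency unit is arbitrary, and the `max_frequency` cap is a hypothetical safeguard that the source does not mention.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionFrequencyController:
    """Doubles the collection frequency on each new resource anomaly and falls back
    to the preset frequency once the load drops below a release threshold."""
    preset_frequency: float      # collections per minute (hypothetical unit)
    preset_threshold: float      # preset parameter threshold, e.g. CPU utilisation 0.8
    release_threshold: float     # "second parameter threshold"; must not exceed the first
    max_frequency: float = 60.0  # hypothetical safety cap, not part of the source
    frequency: float = field(init=False, default=0.0)
    previous: float = field(init=False, default=0.0)

    def __post_init__(self):
        if self.release_threshold > self.preset_threshold:
            raise ValueError("release threshold must not exceed the preset threshold")
        self.frequency = self.preset_frequency

    def observe(self, value: float) -> float:
        """Feed one second-level load sample; return the frequency to use next."""
        # First parameter threshold: the larger of the preset threshold and the
        # value obtained in the previous collection.
        first_threshold = max(self.preset_threshold, self.previous)
        if value >= first_threshold:
            self.frequency = min(self.frequency * 2, self.max_frequency)
        elif value < self.release_threshold and self.frequency != self.preset_frequency:
            self.frequency = self.preset_frequency
        self.previous = value
        return self.frequency
```

Keeping the release threshold at or below the preset threshold stops the frequency from oscillating when the load hovers around a single threshold.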

In an exemplary embodiment, the above apparatus further includes:

an extraction unit, configured to extract, in response to receiving a write-ahead log of the distributed database, a data operation request from the write-ahead log, where the data operation request requests that a specified data operation be performed on specified service data in the distributed database;

a conversion unit, configured to convert the data operation request into an index operation request, where the index operation request requests that a specified index operation matching the specified data operation be performed on the data index corresponding to the specified service data;

and a sending unit, configured to send the index operation request to the index node cluster, so that the specified index operation is performed on the data index corresponding to the specified service data in the index fragments corresponding to the current time period.
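The extraction, conversion, and sending steps can be sketched as turning extracted WAL operations into an Elasticsearch `_bulk` request body (newline-delimited JSON). The shape of a WAL entry shown in the docstring is an assumption, since real HBase WAL entries must first be decoded; the target-index function is passed in by the caller, and nothing is sent over the network.

```python
import json
from typing import Callable, Iterable


def to_bulk_ndjson(entries: Iterable[dict], index_for: Callable[[dict], str]) -> str:
    """Convert extracted WAL data operations into an Elasticsearch _bulk body.

    Each entry is assumed to look like
    {"op": "put" | "delete", "row": <rowkey>, "columns": {...}}.
    """
    lines = []
    for e in entries:
        meta = {"_index": index_for(e), "_id": e["row"]}  # rowkey doubles as doc id
        if e["op"] == "put":
            lines.append(json.dumps({"index": meta}))
            lines.append(json.dumps({"rowkey": e["row"], **e["columns"]}))
        elif e["op"] == "delete":
            lines.append(json.dumps({"delete": meta}))
        else:
            raise ValueError(f"unsupported operation: {e['op']}")
    # The _bulk API expects every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)
```

The `index` action replaces the whole document. If HBase Puts may carry only some columns, an `update` action with `doc_as_upsert` would merge them into the existing document instead.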

In an exemplary embodiment, the index node cluster is a search server cluster (such as the ES cluster described above), and the distributed database is a Hadoop database (HBase).

According to a further aspect of embodiments of the present application, there is also provided a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.

In one exemplary embodiment, the computer readable storage medium may include, but is not limited to: various media capable of storing computer programs, such as a USB flash disk, a read-only memory, a random access memory, a removable hard disk, a magnetic disk or an optical disk.

Embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements the steps of any of the method embodiments described above.

Embodiments of the present application also provide another computer program product comprising a non-volatile computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the method embodiments described above.

Embodiments of the present application also provide a computer program comprising computer instructions stored in a computer-readable storage medium; the processor of the computer device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the computer device to perform the steps of any of the method embodiments described above.

According to one aspect of the present application, there is provided a computer program product comprising a computer program or instructions containing program code for executing the method shown in the flowchart. In such an embodiment, referring to fig. 8, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. When executed by the central processing unit 801, the computer program performs the various functions provided by the embodiments of the present application. The embodiment numbers above are for description only and do not indicate the relative merits of the embodiments.

Referring to fig. 8, fig. 8 is a block diagram of an alternative computer system of an electronic device according to an embodiment of the present application.

Fig. 8 schematically shows a block diagram of a computer system of an electronic device for implementing an embodiment of the application. As shown in fig. 8, the computer system 800 includes a central processing unit 801 (Central Processing Unit, CPU) that can perform various appropriate actions and processes according to a program stored in a Read-Only Memory 802 (ROM) or a program loaded from a storage section 808 into a random access Memory 803 (Random Access Memory, RAM). In the random access memory 803, various programs and data required for system operation are also stored. The central processing unit 801, the read only memory 802, and the random access memory 803 are connected to each other through a bus 804. An Input/Output interface 805 (I/O interface for short) is also connected to the bus 804.

The following components are connected to the input/output interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the input/output interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read from it can be installed into the storage section 808 as needed.

In particular, the processes described in the various method flowcharts may be implemented as computer software programs according to embodiments of the application. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable media 811. The computer programs, when executed by the central processor 801, perform the various functions defined in the system of the present application.

It should be noted that, the computer system 800 of the electronic device shown in fig. 8 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.

According to a further aspect of embodiments of the present application there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.

In an exemplary embodiment, the electronic device may further include a transmission device connected to the processor, and an input/output device connected to the processor.

For specific examples in this embodiment, refer to the examples described in the foregoing embodiments and exemplary implementations; details are not repeated here.

It will be appreciated by those skilled in the art that the modules or steps of the embodiments of the application described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, they may be implemented in program code executable by computing devices, so that they may be stored in a memory device for execution by computing devices, and in some cases, the steps shown or described may be performed in a different order than what is shown or described, or they may be separately fabricated into individual integrated circuit modules, or a plurality of modules or steps in them may be fabricated into a single integrated circuit module. Thus, embodiments of the application are not limited to any specific combination of hardware and software.

The above is only a preferred embodiment of the present application and is not intended to limit the embodiment of the present application, and various modifications and variations can be made to the embodiment of the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the principle of the embodiments of the present application should be included in the protection scope of the embodiments of the present application.

Claims

1. A control method for data index storage, comprising:

obtaining, in a current time period, a set of load parameters collected from an index node cluster at a first collection frequency, wherein index fragments on index nodes in the index node cluster are used to store data indexes built for service data stored in a data table of a distributed database, the index fragments on the index nodes in the index node cluster are divided, according to a set number of fragments, into index fragment groups corresponding to set time periods, the index fragment group corresponding to the current time period is a first index fragment group, the first index fragment group is allocated in advance before the current time period and comprises a first number of index fragments, and the set of load parameters comprises a first load parameter indicating the load of the first index fragment group;

allocating a second index fragment group for the current time period when it is determined, according to the obtained set of load parameters, that the index fragments allocated for the current time period need to be increased, wherein the second index fragment group comprises the first number of index fragments, and both the first index fragment group and the second index fragment group are used to store data indexes corresponding to the current time period.

2. The method according to claim 1, wherein

the first load parameter comprises a current index count, the current index count being the number of data indexes stored in the index fragments of the first index fragment group;

after obtaining the set of load parameters collected from the index node cluster at the first collection frequency, the method further comprises:

determining that the index fragments allocated for the current time period need to be increased when the current index count is greater than or equal to a first number threshold.

3. The method of claim 1, wherein:

The group of load parameters further comprises a current interaction latency, the current interaction latency being the interaction latency of the index nodes on which the index fragments in the first index fragment group are located; and

The method further comprises determining that index fragments are to be added to those allocated for the current time period when the current interaction latency is greater than or equal to a preset latency threshold, wherein the current interaction latency comprises at least one of a current write latency and a current query latency.
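The triggers of claims 2 and 3 can be sketched as one predicate. Combining the index-count test and the latency test with a logical OR is an assumption made here for brevity (the two claims are separate alternatives), and `LoadSample`, its fields and the millisecond units are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadSample:
    """Hypothetical snapshot of the load parameters for the first index fragment group."""
    current_index_count: int                   # data indexes stored in the first group (claim 2)
    write_latency_ms: Optional[float] = None   # current write latency (claim 3)
    query_latency_ms: Optional[float] = None   # current query latency (claim 3)


def should_add_fragments(sample: LoadSample,
                         count_threshold: int,
                         latency_threshold_ms: float) -> bool:
    """Return True if fragments should be added for the current period."""
    # Claim 2: the index count has reached the first number threshold.
    if sample.current_index_count >= count_threshold:
        return True
    # Claim 3: any measured interaction latency has reached the preset threshold.
    measured = [lat for lat in (sample.write_latency_ms, sample.query_latency_ms)
                if lat is not None]
    return any(lat >= latency_threshold_ms for lat in measured)
```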

4. The method of claim 3, wherein:

The allocating of a second index fragment group for the current time period when it is determined, according to the acquired group of load parameters, that the index fragments allocated for the current time period are to be increased comprises:

Allocating, when it is determined according to the acquired group of load parameters that index fragments are to be added for the current time period, the second index fragment group for the current time period from those index nodes in the index node cluster whose interaction latency is less than the preset latency threshold.
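A minimal sketch of claim 4, assuming per-node interaction latencies are already known: nodes at or above the preset latency threshold are excluded, and the new fragments are spread round-robin over the remaining nodes. The round-robin policy and the function name are assumptions, not taken from the claims.

```python
from itertools import cycle, islice


def allocate_from_fast_nodes(node_latency_ms: dict[str, float],
                             latency_threshold_ms: float,
                             first_number: int) -> list[str]:
    """Return the node chosen for each of the first_number new fragments."""
    # Claim 4: only nodes whose interaction latency is below the threshold qualify.
    eligible = sorted(node for node, lat in node_latency_ms.items()
                      if lat < latency_threshold_ms)
    if not eligible:
        raise RuntimeError("no index node is below the latency threshold")
    # Assumed placement policy: round-robin over the eligible nodes.
    return list(islice(cycle(eligible), first_number))
```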

5. The method of claim 1, wherein:

When it is determined, according to the acquired group of load parameters, that index fragments are to be added for the current time period, the first number of index fragments is allocated from the index nodes in the index node cluster to obtain the second index fragment group, so that the allocated index fragments are balanced across the index nodes in the index node cluster.

6. The method of claim 5, wherein:

The allocating, when it is determined according to the acquired group of load parameters that index fragments are to be added for the current time period, of the first number of index fragments from the index nodes in the index node cluster to obtain the second index fragment group comprises:

Allocating, according to the number of index fragments already allocated on each index node in the index node cluster, the first number of index fragments from the index nodes in the index node cluster to obtain the second index fragment group, so that the difference between the numbers of index fragments allocated on any two index nodes in the index node cluster is less than or equal to a second number threshold.
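One way to pursue the balance required by claim 6 is greedy least-loaded placement, sketched below with Python's `heapq`; this policy is an assumption rather than the patented procedure. Because greedy placement cannot always close a large pre-existing gap with only the first number of new fragments, the hypothetical `spread_ok` helper checks the second number threshold separately.

```python
import heapq


def allocate_balanced(allocated: dict[str, int],
                      first_number: int) -> tuple[list[str], dict[str, int]]:
    """Greedy sketch: place each new fragment on the node holding the fewest fragments."""
    heap = [(count, node) for node, count in allocated.items()]
    heapq.heapify(heap)
    placement = []
    for _ in range(first_number):
        count, node = heapq.heappop(heap)          # least-loaded node (ties: by name)
        placement.append(node)
        heapq.heappush(heap, (count + 1, node))
    new_counts = {node: count for count, node in heap}
    return placement, new_counts


def spread_ok(counts: dict[str, int], second_number_threshold: int) -> bool:
    """Check the claim 6 condition on the max-min difference of per-node counts."""
    return max(counts.values()) - min(counts.values()) <= second_number_threshold
```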

7. The method of claim 5, wherein:

When it is determined, according to the acquired group of load parameters, that index fragments are to be added for the current time period, the first number of index fragments is allocated from the index nodes in the index node cluster according to the allocated-quantity ratio of each index node, to obtain the second index fragment group, so that the difference between the allocated-quantity ratios of any two index nodes in the index node cluster is less than or equal to a preset ratio threshold, wherein the allocated-quantity ratio of an index node is the ratio of the number of index fragments allocated on that index node to the maximum number of index fragments that the index node supports.
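Claim 7 replaces raw counts with the allocated-quantity ratio (allocated fragments divided by the node's maximum). The sketch below assumes each node's maximum is known and again uses a greedy rule, picking the node with the lowest ratio for each new fragment; the function names are hypothetical.

```python
def allocate_by_ratio(allocated: dict[str, int],
                      capacity: dict[str, int],
                      first_number: int) -> list[str]:
    """Place each new fragment on the node with the lowest allocated/maximum ratio."""
    counts = dict(allocated)
    placement = []
    for _ in range(first_number):
        # Nodes that still have room below their maximum supported fragment count.
        candidates = [n for n in counts if counts[n] < capacity[n]]
        if not candidates:
            raise RuntimeError("no index node has free fragment capacity")
        node = min(candidates, key=lambda n: (counts[n] / capacity[n], n))
        counts[node] += 1
        placement.append(node)
    return placement


def ratio_spread(allocated: dict[str, int], capacity: dict[str, int]) -> float:
    """Largest difference between the allocated-quantity ratios of any two nodes."""
    ratios = [allocated[n] / capacity[n] for n in allocated]
    return max(ratios) - min(ratios)
```

For instance, with maxima `{"a": 4, "b": 2}` and no fragments yet, three new fragments go to `a`, `b`, `a`, leaving both nodes at a ratio of 0.5.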

8. The method of claim 1, wherein:

The index fragments on the index nodes in the index node cluster are divided in real time; and

When it is determined, according to the acquired group of load parameters, that index fragments are to be added for the current time period, the first number of index fragments is divided out on the index nodes in the index node cluster to obtain the second index fragment group.

9. The method of claim 1, wherein:

After the allocating of the second index fragment group for the current time period, the method further comprises:

Adding a group name to the second index fragment group according to the group name of the first index fragment group, wherein the group name of the first index fragment group indicates the data table in the distributed database that corresponds to the data indexes stored by the first index fragment group, as well as the time period corresponding to the first index fragment group; and the data table and the time period indicated by the group name of the second index fragment group are identical to those indicated by the group name of the first index fragment group.
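Claim 9 only requires that the second group's name indicate the same data table and time period as the first group's. The sketch assumes a hypothetical `<table>__<period>__<seq>` naming format, in which an additional group differs from the first only in its sequence suffix; splitting from the right keeps table names that themselves contain `__` intact.

```python
def parse_group_name(name: str) -> tuple[str, str, int]:
    """Split a hypothetical "<table>__<period>__<seq>" group name."""
    table, period, seq = name.rsplit("__", 2)
    return table, period, int(seq)


def name_next_group(first_group_name: str, groups_in_period: int) -> str:
    """Derive the name of an additional group from the first group's name."""
    table, period, _ = parse_group_name(first_group_name)
    # Same data table and same time period; only the sequence suffix differs.
    return f"{table}__{period}__{groups_in_period}"
```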

10. The method of claim 1, wherein:

When there are data indexes to be stored that correspond to the current time period, the data indexes to be stored are stored into the index fragments contained in a current index fragment set in ascending order of the number of data indexes already stored by each index fragment, wherein the current index fragment set contains the index fragments in the first index fragment group and the index fragments in the second index fragment group.
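The ascending-order write rule of claim 10 can be sketched with a min-heap keyed by each fragment's current index count, so that a freshly allocated (empty) second group absorbs new indexes first. Fragment names and the one-index-at-a-time routing are assumptions for illustration.

```python
import heapq


def route_indexes(fragment_counts: dict[str, int],
                  new_index_ids: list[str]) -> dict[str, list[str]]:
    """Send each new index to the fragment that currently stores the fewest indexes."""
    # fragment_counts covers both the first and the second group of the period.
    heap = [(count, frag) for frag, count in fragment_counts.items()]
    heapq.heapify(heap)
    routed: dict[str, list[str]] = {frag: [] for frag in fragment_counts}
    for index_id in new_index_ids:
        count, frag = heapq.heappop(heap)
        routed[frag].append(index_id)
        heapq.heappush(heap, (count + 1, frag))
    return routed
```

With counts `{"g1-0": 5, "g1-1": 5, "g2-0": 0}`, three new indexes all land on `g2-0`, since its count stays below 5.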

11. The method of claim 1, further comprising:

Adjusting, according to the change curve of the load parameters in the group of load parameters, a second number corresponding to the time period following the current time period, wherein the second number is the number of index fragments in the index fragment group allocated for that next time period.

12. The method of claim 11, wherein:

The adjusting of the second number corresponding to the next time period according to the change curve of the load parameters in the group of load parameters comprises:

Performing, when the change curve of the load parameters in the group of load parameters shows a linear upward trend or a linear downward trend, linear fitting on that change curve to obtain a fitting value, wherein the fitting value represents the change trend of the load parameters in the group of load parameters; and

Determining, as the second number, the smaller of the product of the first number and the fitting value and the maximum number of index fragments supported by the index nodes in the index node cluster.

13. The method of claim 11, wherein:

The first number is determined as the second number when the change curve of the load parameters in the group of load parameters is flat.
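Claims 11 to 13 leave the exact fitting value open. The sketch below fits a least-squares line to the load samples of one period and, as an assumption, takes the fitting value to be the ratio of the fitted value at the last sample to that at the first. The 1% flatness tolerance, rounding up with `math.ceil`, the fallback for a non-positive start value, and the caller having already confirmed that the curve is roughly linear are likewise assumptions.

```python
import math


def fit_line(ys: list[float]) -> tuple[float, float]:
    """Least-squares fit y = slope * x + intercept over x = 0, 1, ..., n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def next_group_size(ys: list[float], first_number: int, max_fragments: int,
                    flat_tolerance: float = 0.01) -> int:
    """Sketch of claims 12 and 13 for choosing the second number."""
    slope, intercept = fit_line(ys)
    start = intercept                        # fitted value at the first sample
    end = slope * (len(ys) - 1) + intercept  # fitted value at the last sample
    if start <= 0:
        return first_number                  # ratio undefined: assumed fallback
    # Claim 13: a flat curve keeps the current group size.
    if abs(end - start) <= flat_tolerance * start:
        return first_number
    # Assumed fitting value: growth ratio of the fitted line across the period.
    fitting_value = end / start
    # Claim 12: min(first number * fitting value, maximum supported fragments).
    return max(1, min(math.ceil(first_number * fitting_value), max_fragments))
```

For samples `[100, 150, 200]` and a first number of 3, the fitting value is 2.0 and the next period gets 6 fragments, subject to the `max_fragments` cap.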

14. The method of claim 1, further comprising:

Adjusting, within the current time period, the acquisition frequency for data acquisition on the index node cluster from the first acquisition frequency to a second acquisition frequency when a second load parameter in the group of load parameters is greater than or equal to a first parameter threshold, wherein the second load parameter represents the usage of node resources of the index nodes in the index node cluster, and the first parameter threshold is the maximum of a preset parameter threshold and the second load parameter obtained in the previous acquisition.

15. The method of claim 14, wherein:

After the acquisition frequency for data acquisition on the index node cluster is adjusted from the first acquisition frequency to the second acquisition frequency within the current time period, the method further comprises:

Adjusting, within the current time period, the acquisition frequency for data acquisition on the index node cluster to a preset acquisition frequency when the re-acquired second load parameter is less than a second parameter threshold, wherein the second parameter threshold is less than or equal to the first parameter threshold.
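Claims 14 and 15 together describe a hysteresis on the acquisition frequency. The sketch expresses frequencies as sampling intervals and reads the first parameter threshold as the larger of the preset threshold and the previously acquired second load parameter; that reading, the shorter interval assumed for the second frequency, and all names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AcquisitionController:
    """Sketch of claims 14 and 15: switch the sampling interval on node-resource usage."""
    preset_threshold: float      # preset parameter threshold, e.g. CPU usage 0.8
    second_threshold: float      # second parameter threshold (<= first threshold)
    first_interval_s: float      # sampling interval of the first acquisition frequency
    second_interval_s: float     # interval of the second frequency (assumed shorter)
    preset_interval_s: float     # interval of the preset acquisition frequency
    interval_s: float = field(init=False, default=0.0)
    boosted: bool = field(init=False, default=False)
    previous: Optional[float] = field(init=False, default=None)

    def __post_init__(self) -> None:
        # The first threshold never falls below the preset one, so this keeps second <= first.
        if self.second_threshold > self.preset_threshold:
            raise ValueError("second threshold must not exceed the preset threshold")
        self.interval_s = self.first_interval_s

    def observe(self, usage: float) -> float:
        """Feed one second-load-parameter sample; return the interval to use next."""
        # Assumed reading of claim 14: first threshold = max(preset, previous sample).
        first_threshold = (self.preset_threshold if self.previous is None
                           else max(self.preset_threshold, self.previous))
        if not self.boosted and usage >= first_threshold:
            self.interval_s, self.boosted = self.second_interval_s, True
        elif self.boosted and usage < self.second_threshold:
            # Claim 15: fall back to the preset acquisition frequency.
            self.interval_s, self.boosted = self.preset_interval_s, False
        self.previous = usage
        return self.interval_s
```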

16. The method of claim 1, further comprising:

In response to receiving a write-ahead log of the distributed database, extracting a data operation request from the write-ahead log, wherein the data operation request requests that a specified data operation be performed on specified service data in the distributed database;

Converting the data operation request into an index operation request, wherein the index operation request requests that a specified index operation matching the specified data operation be performed on the data index corresponding to the specified service data; and

Sending the index operation request to the index node cluster, so that the specified index operation is performed on the data index corresponding to the specified service data in the index fragments corresponding to the current time period in the index node cluster.
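Claim 16 can be pictured as a small translation step between a write-ahead-log entry and a request to the index node cluster. The sketch below assumes a two-operation vocabulary (`put` mapped to `upsert`, `delete` to `delete`) and a pre-configured column-to-field mapping, as described in the Background; it does not parse real HBase WAL entries, and every type and name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataOperation:
    """Hypothetical request extracted from one write-ahead log entry."""
    table: str
    row_key: str
    op: str                       # assumed vocabulary: "put" or "delete"
    columns: dict[str, str]       # column -> value, e.g. {"cf:amount": "10"}


@dataclass
class IndexOperation:
    """Hypothetical request sent to the index node cluster."""
    op: str                       # "upsert" or "delete"
    index_id: str
    fields: dict[str, str]
    period: str                   # time period whose fragment groups receive it


# Assumed matching between data operations and index operations.
OP_MAP = {"put": "upsert", "delete": "delete"}


def to_index_operation(req: DataOperation, field_mapping: dict[str, str],
                       period: str) -> IndexOperation:
    if req.op not in OP_MAP:
        raise ValueError(f"unsupported data operation: {req.op}")
    # Apply the pre-configured column -> index-field mapping; other columns are skipped.
    fields = {field_mapping[col]: val for col, val in req.columns.items()
              if col in field_mapping}
    return IndexOperation(OP_MAP[req.op], f"{req.table}:{req.row_key}", fields, period)
```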

17. The method according to any one of claims 1 to 16, wherein:

The index node cluster is a search server cluster, and the distributed database is an HBase database.

18. A control device for data index storage, comprising:

An acquisition unit, configured to acquire, in a current time period, a group of load parameters obtained by performing data acquisition on an index node cluster at a first acquisition frequency, wherein: index fragments on index nodes in the index node cluster are used to store data indexes built for service data stored in a data table of a distributed database; the index fragments on the index nodes are divided, according to a set number of fragments, into index fragment groups each corresponding to a set time period; the index fragment group corresponding to the current time period is a first index fragment group, which is allocated in advance before the current time period and comprises a first number of index fragments; and the group of load parameters comprises a first load parameter indicating the load condition of the first index fragment group; and

An allocation unit, configured to allocate a second index fragment group for the current time period when it is determined, according to the acquired group of load parameters, that the index fragments allocated for the current time period are to be increased, wherein the second index fragment group comprises the first number of index fragments, and both the first index fragment group and the second index fragment group are used to store data indexes corresponding to the current time period.

19. A computer-readable storage medium, wherein:

The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the method of any one of claims 1 to 17.

20. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that,

The processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 17.

21. A computer program product comprising a computer program, characterized in that,

The computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 17.