CN117632519B - Method and device for equalizing and adjusting fragmented data, medium and electronic equipment

Publication number
CN117632519B
Authority
CN
China
Prior art keywords
node
value
slicing
preset
standard deviation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410095296.3A
Other languages
Chinese (zh)
Other versions
CN117632519A (en
Inventor
王晏一
赵鹏
李尚锦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huoli Tianhui Technology Co ltd
Original Assignee
Shenzhen Huoli Tianhui Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huoli Tianhui Technology Co ltd filed Critical Shenzhen Huoli Tianhui Technology Co ltd
Priority to CN202410095296.3A priority Critical patent/CN117632519B/en
Publication of CN117632519A publication Critical patent/CN117632519A/en
Application granted granted Critical
Publication of CN117632519B publication Critical patent/CN117632519B/en

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of load balancing and provides a method, a device, a medium and electronic equipment for equalizing and adjusting fragmented data. The method comprises the following steps: acquiring node load values of all shard nodes in a search service cluster; obtaining a first standard deviation value based on the node load values of the shard nodes; simulating migration of any first data item in a first shard node to a second shard node; obtaining a second standard deviation value based on the node load values of the shard nodes after the simulated migration; obtaining a variation based on the difference between the second standard deviation value and the first standard deviation value; and, when the variation is less than zero, actually migrating the first data item to the second shard node and triggering the iterative operation of acquiring the node load values of the shard nodes in the search service cluster until a preset iteration termination condition is met. Dynamic migration and load balancing of the fragmented data are thereby realized, and balanced querying of the fragmented data is ensured.

Description

Method and device for equalizing and adjusting fragmented data, medium and electronic equipment
Technical Field
The application relates to the technical field of load balancing, in particular to a method, a device, a medium and electronic equipment for balancing and adjusting fragmented data.
Background
At present, internet enterprises generally adopt distributed concurrent storage to store and manage massive amounts of data. The data is stored in a distributed manner in data shards or database tables on the individual shard nodes of a search service cluster.
However, the amount of data in different data shards or database tables becomes unbalanced owing to factors such as the data insertion pattern, the data access pattern and the data distribution strategy. Unbalanced fragmented data can cause problems such as performance degradation, wasted resources and reduced fault tolerance.
Therefore, the application provides a method for equalizing and adjusting fragmented data to solve the above technical problems.
Disclosure of Invention
The application aims to provide a method, a device, a medium and electronic equipment for equalizing and adjusting fragmented data, which can solve at least one technical problem. The specific scheme is as follows:
According to a specific embodiment of the present application, in a first aspect, the present application provides a method for equalizing and adjusting fragmented data, including:
acquiring node load values of all shard nodes in a search service cluster;
obtaining a first standard deviation value based on the node load values of the shard nodes;
simulating migration of any first data item in a first shard node to a second shard node, wherein the node load value of the first shard node falls within a preset larger value range and the node load value of the second shard node falls within a preset smaller value range;
obtaining a second standard deviation value based on the node load values of the shard nodes after the simulated migration;
obtaining a variation based on the difference between the second standard deviation value and the first standard deviation value;
and, when the variation is less than zero, actually migrating the first data item to the second shard node and triggering the iterative operation of acquiring the node load values of the shard nodes in the search service cluster until a preset iteration termination condition is met.
Optionally, the method further comprises:
when the variation is greater than or equal to zero, obtaining a current calculated temperature value of the first shard node as the product of the previous calculated temperature value of the first shard node, obtained in the previous iteration, and a preset cooling rate;
calculating the quotient of the negative of the variation and the current calculated temperature value to obtain an exponent value;
obtaining a probability value by raising the natural constant e to the power of the exponent value;
determining a random number in the range [0,1];
and, when the random number is less than or equal to the probability value, actually migrating the first data item to the second shard node and triggering the iterative operation of acquiring the node load values of the shard nodes in the search service cluster until the preset iteration termination condition is met.
Optionally, the method further comprises:
when the random number is greater than the probability value, triggering the iterative operation of simulating migration of any first data item in the first shard node to the second shard node until the preset iteration termination condition is met.
Optionally, the preset iteration termination condition includes one of the following conditions:
the current calculated temperature value of the first shard node is less than or equal to a preset termination temperature value;
the current iteration count of the first shard node is greater than or equal to a preset maximum iteration count, wherein the current iteration count equals the previous iteration count plus one;
the number of migration failures over consecutive iterations is greater than or equal to a preset maximum number of migration failures, wherein the number of migration failures is incremented by one each time the random number is greater than the probability value during the iteration.
Optionally, the acquiring node load values of all shard nodes in the search service cluster includes:
monitoring the key resource occupancy rate of each shard node in the search service cluster in real time;
when the key resource occupancy rate of any shard node meets a preset excessive occupancy condition, acquiring an index value of at least one load index for each data item in each shard node;
obtaining a single load value of each data item based on the index value of the at least one load index of that data item and a preset weight value of the corresponding load index;
and obtaining the node load value of each shard node based on the single load values of all data items of that shard node.
Optionally, the method further comprises:
when the preset iteration termination condition is met, acquiring the node load value of each shard node in the search service cluster and obtaining a third standard deviation value based on those node load values;
when the third standard deviation value is greater than a preset equalization standard deviation threshold, determining, from all the shard nodes, a third shard node having the maximum node load value;
acquiring a query frequency index value of each data item in the third shard node;
when the query frequency index value of each of a plurality of second data items in the third shard node is greater than the product of a preset multiple and the query frequency index value of any one of the other second data items in the third shard node, successively simulating splitting of the plurality of second data items into a plurality of data item sets and, after each simulated split, simulating even distribution of the resulting data item sets across the shard nodes of the search service cluster other than the third shard node, until a fourth standard deviation value obtained after the simulated split and the simulated even distribution is less than or equal to the preset equalization standard deviation threshold; and then actually distributing the data item sets of that simulated split evenly across the other shard nodes in the manner of the corresponding simulated even distribution.
Optionally, the obtaining a node load value of each fragment node in the search service cluster includes:
periodically acquiring node load values of all the fragment nodes in the search service cluster.
According to a second aspect of the present application, there is provided an equalization adjustment device for fragmented data, including:
a first acquisition unit, configured to acquire node load values of all shard nodes in a search service cluster;
a first obtaining unit, configured to obtain a first standard deviation value based on the node load values of the shard nodes;
a simulated migration unit, configured to simulate migration of any first data item in a first shard node to a second shard node, wherein the node load value of the first shard node falls within a preset larger value range and the node load value of the second shard node falls within a preset smaller value range;
a second obtaining unit, configured to obtain a second standard deviation value based on the node load values of the shard nodes after the simulated migration;
a third obtaining unit, configured to obtain a variation based on the difference between the second standard deviation value and the first standard deviation value;
and a first actual migration unit, configured to, when the variation is less than zero, actually migrate the first data item to the second shard node and trigger the iterative operation of acquiring the node load values of the shard nodes in the search service cluster until a preset iteration termination condition is met.
Optionally, the apparatus further includes:
a second acquisition unit, configured to, when the variation is greater than or equal to zero, obtain a current calculated temperature value of the first shard node as the product of the previous calculated temperature value of the first shard node, obtained in the previous iteration, and a preset cooling rate;
a fourth obtaining unit, configured to calculate the quotient of the negative of the variation and the current calculated temperature value to obtain an exponent value;
a fifth obtaining unit, configured to obtain a probability value by raising the natural constant e to the power of the exponent value;
a first determining unit, configured to determine a random number in the range [0,1];
and a second actual migration unit, configured to, when the random number is less than or equal to the probability value, actually migrate the first data item to the second shard node and trigger the iterative operation of acquiring the node load values of the shard nodes in the search service cluster until the preset iteration termination condition is met.
Optionally, the apparatus further includes:
a third actual migration unit, configured to, when the random number is greater than the probability value, trigger the iterative operation of simulating migration of any first data item in the first shard node to the second shard node until the preset iteration termination condition is met.
Optionally, the preset iteration termination condition includes one of the following conditions:
the current calculated temperature value of the first shard node is less than or equal to a preset termination temperature value;
the current iteration count of the first shard node is greater than or equal to a preset maximum iteration count, wherein the current iteration count equals the previous iteration count plus one;
the number of migration failures over consecutive iterations is greater than or equal to a preset maximum number of migration failures, wherein the number of migration failures is incremented by one each time the random number is greater than the probability value during the iteration.
Optionally, the acquiring node load values of all shard nodes in the search service cluster includes:
monitoring the key resource occupancy rate of each shard node in the search service cluster in real time;
when the key resource occupancy rate of any shard node meets a preset excessive occupancy condition, acquiring an index value of at least one load index for each data item in each shard node;
obtaining a single load value of each data item based on the index value of the at least one load index of that data item and a preset weight value of the corresponding load index;
and obtaining the node load value of each shard node based on the single load values of all data items of that shard node.
Optionally, the apparatus further includes:
a sixth obtaining unit, configured to, when the preset iteration termination condition is met, acquire the node load value of each shard node in the search service cluster and obtain a third standard deviation value based on those node load values;
a second determining unit, configured to, when the third standard deviation value is greater than a preset equalization standard deviation threshold, determine, from all the shard nodes, a third shard node having the maximum node load value;
a third acquisition unit, configured to acquire a query frequency index value of each data item in the third shard node;
and a simulated even-distribution unit, configured to, when the query frequency index value of each of a plurality of second data items in the third shard node is greater than the product of a preset multiple and the query frequency index value of any one of the other second data items in the third shard node, successively simulate splitting of the plurality of second data items into a plurality of data item sets and, after each simulated split, simulate even distribution of the resulting data item sets across the shard nodes of the search service cluster other than the third shard node, until a fourth standard deviation value obtained after the simulated split and the simulated even distribution is less than or equal to the preset equalization standard deviation threshold, and then actually distribute the data item sets of that simulated split evenly across the other shard nodes in the manner of the corresponding simulated even distribution.
Optionally, the obtaining a node load value of each fragment node in the search service cluster includes:
periodically acquiring node load values of all the fragment nodes in the search service cluster.
According to a third aspect of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for equalizing and adjusting fragmented data according to any one of the above.
According to a fourth aspect of the present application, there is provided an electronic device comprising: one or more processors; and storage means for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method for equalizing and adjusting fragmented data according to any one of the above.
Compared with the prior art, the scheme provided by the embodiment of the application has at least the following beneficial effects:
The application provides a method, a device, a medium and electronic equipment for equalizing and adjusting fragmented data. The method comprises the following steps: acquiring node load values of all shard nodes in a search service cluster; obtaining a first standard deviation value based on the node load values of the shard nodes; simulating migration of any first data item in a first shard node to a second shard node, wherein the node load value of the first shard node falls within a preset larger value range and the node load value of the second shard node falls within a preset smaller value range; obtaining a second standard deviation value based on the node load values of the shard nodes after the simulated migration; obtaining a variation based on the difference between the second standard deviation value and the first standard deviation value; and, when the variation is less than zero, actually migrating the first data item to the second shard node and triggering the iterative operation of acquiring the node load values of the shard nodes in the search service cluster until a preset iteration termination condition is met. The problems of reduced query performance, wasted resources and reduced fault tolerance caused by unbalanced fragmented data are solved, dynamic migration and load balancing of the fragmented data are realized, and balanced querying of the fragmented data is ensured.
Drawings
FIG. 1 shows a flow chart of a method for equalizing and adjusting fragmented data according to an embodiment of the present application;
FIG. 2 shows a unit block diagram of a device for equalizing and adjusting fragmented data according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; "plurality" generally means at least two.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and means that three relationships may exist; for example, "A and/or B" may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
It should be understood that although the terms first, second, third, etc. may be used in embodiments of the present application, the items described should not be limited by these terms. These terms are only used to distinguish one item from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of embodiments of the application.
The word "if", as used herein, may be interpreted as "when" or "upon" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined" or "in response to determination" or "when detected (stated condition or event)" or "in response to detection (stated condition or event)", depending on the context.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such product or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the product or apparatus comprising that element.
In particular, symbols and/or numerals appearing in the description that are not marked in the description of the drawings are not reference numerals.
Alternative embodiments of the present application will be described in detail below with reference to the accompanying drawings.
The embodiment provided by the application is an embodiment of a method for equalizing and adjusting fragmented data.
An embodiment of the present application will be described in detail with reference to FIG. 1.
Step S101, acquiring node load values of all shard nodes in the search service cluster.
The search service cluster is a search service system consisting of a plurality of shard nodes. The shard nodes form a cluster by communicating and cooperating with each other, and jointly handle the indexing, searching and storage of data. All shard nodes are configured with the same cluster name, which is how they join the same search service cluster. The search service cluster can be scaled horizontally to accommodate large amounts of data and requests while providing high availability and performance. For example, a search service cluster that queries for fulfillment data.
A shard node is a separate server instance or process in a search service cluster. Each shard node is independent and can run on any machine in the search service cluster. The shard nodes communicate and cooperate with one another to form the search service cluster. Each shard node has its own name, role and responsibilities, e.g., data node, master node, or coordinator node.
Sharding is the process of splitting an index into multiple smaller parts, each of which is called a shard. The shards can be distributed to different shard nodes, thereby enabling distributed storage and processing of data. Shards are of two types, primary shards and replica shards. A primary shard is used to store data, while a replica shard is used to provide redundancy and high availability of data.
A shard is a logical partition of the data that is distributed to shard nodes for horizontal scaling and load balancing. A shard is a portion of an index that can be allocated and replicated on any shard node in the cluster.
The node load value represents the energy value generated, converted and consumed by the shard node while running.
In some specific embodiments, the acquiring node load values of all shard nodes in the search service cluster includes the following steps:
Step S101a-1, monitoring the key resource occupancy rate of each shard node in the search service cluster in real time.
The key resource occupancy rate includes a CPU occupancy rate and/or a memory occupancy rate.
Step S101a-2, when the key resource occupancy rate of any shard node meets a preset excessive occupancy condition, acquiring an index value of at least one load index for each data item in each shard node.
For example, the preset excessive occupancy condition includes a CPU occupancy rate greater than a preset CPU occupancy threshold (e.g., 80%) and/or a memory occupancy rate greater than a preset memory occupancy threshold (e.g., 90%).
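As an illustration of the trigger in steps S101a-1 and S101a-2, the following minimal Python sketch checks sampled occupancy rates against the example thresholds. The field names `cpu` and `mem`, the function name `needs_rebalancing`, and the choice to combine the two checks with "or" are assumptions made for this sketch and are not taken from the application.

```python
# Example thresholds from the text (80% CPU, 90% memory), expressed as fractions.
CPU_THRESHOLD = 0.80
MEM_THRESHOLD = 0.90


def needs_rebalancing(nodes, cpu_threshold=CPU_THRESHOLD, mem_threshold=MEM_THRESHOLD):
    """Return True if any shard node exceeds either occupancy threshold.

    `nodes` is a list of dicts with hypothetical keys "cpu" and "mem",
    each holding an occupancy rate between 0.0 and 1.0.
    """
    return any(
        node["cpu"] > cpu_threshold or node["mem"] > mem_threshold
        for node in nodes
    )


samples = [
    {"cpu": 0.35, "mem": 0.60},
    {"cpu": 0.85, "mem": 0.40},  # CPU above the 80% example threshold
]
triggered = needs_rebalancing(samples)
```

How the occupancy rates are sampled is left open here; any monitoring source that yields per-node rates could feed this check.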
In this embodiment, each data item has an index value for at least one load index. For example, the at least one load index includes: a query frequency index (QPS), a write frequency index (WPS), a query response time index (QRT), a write response time index (WRT), and/or a memory occupancy index (MEM).
Step S101a-3, obtaining a single load value of each data item based on the index value of the at least one load index of that data item and the preset weight value of the corresponding load index.
For example, the preset weight value of QPS is 0.4; the preset weight value of WPS is 0.3; the preset weight value of QRT is 0.1; the preset weight value of WRT is 0.1; the preset weight value of MEM is 0.01.
The single load value represents the energy value generated, converted and consumed by the data item in the search service.
For example, Load = W1×QPS + W2×WPS + W3×QRT + W4×WRT + W5×MEM;
Wherein Load represents a single Load value of a data item, W1 represents a preset weight value of QPS, W2 represents a preset weight value of WPS, W3 represents a preset weight value of QRT, W4 represents a preset weight value of WRT, and W5 represents a preset weight value of MEM.
Step S101a-4, obtaining the node load value of each shard node based on the single load values of all data items of that shard node.
For example, SLoad = ΣLoad; where SLoad denotes the node load value of the shard node.
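Steps S101a-3 and S101a-4 can be sketched in Python as follows. The sketch assumes that the single load value is a plain weighted sum of the five load indices, which is one reading of "based on the index value and the preset weight value"; the function names and the sample metric values are hypothetical, and no normalization of units is attempted.

```python
# Example preset weights taken from the text.
WEIGHTS = {"QPS": 0.4, "WPS": 0.3, "QRT": 0.1, "WRT": 0.1, "MEM": 0.01}


def item_load(metrics, weights=WEIGHTS):
    """Single load value of one data item (assumed weighted sum of its indices)."""
    return sum(weights[name] * metrics[name] for name in weights)


def node_load(items, weights=WEIGHTS):
    """Node load value SLoad: sum of the single load values of all items on the node."""
    return sum(item_load(metrics, weights) for metrics in items)


# Hypothetical index values for two data items on one shard node.
items_on_node = [
    {"QPS": 10, "WPS": 10, "QRT": 10, "WRT": 10, "MEM": 100},
    {"QPS": 5, "WPS": 0, "QRT": 20, "WRT": 0, "MEM": 0},
]
sload = node_load(items_on_node)
```

In practice the indices have different units and magnitudes, so a real deployment would likely scale them before weighting; that choice is outside what the text specifies.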
In this embodiment, the key resource occupancy rate of each shard node in the search service cluster is monitored in real time, and when the key resource occupancy rate of any shard node meets the preset excessive occupancy condition, equalization adjustment of the fragmented data is triggered.
In other embodiments, the acquiring node load values of all shard nodes in the search service cluster includes:
Step S101b, periodically acquiring the node load values of all shard nodes in the search service cluster.
In this embodiment, the node load values of the shard nodes are checked periodically to determine whether to trigger equalization adjustment of the fragmented data. For example, a check is made every hour.
The embodiments of the application can thus detect the balance of the fragmented data in several ways and trigger dynamic adjustment of the fragmented data in a timely manner.
Step S102, obtaining a first standard deviation value based on the node load values of the shard nodes.
For example, Target = std(SLoad(1), SLoad(2), ..., SLoad(N));
where Target represents the standard deviation value, N represents a positive integer, and SLoad(N) represents the node load value of the N-th shard node.
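A short Python sketch of step S102 is given below. The text does not say whether the population or the sample standard deviation is meant; the sketch assumes the population form via `statistics.pstdev`, and the function name `load_std` is hypothetical.

```python
import statistics


def load_std(node_loads):
    """Target = std(SLoad(1), ..., SLoad(N)), assuming population standard deviation."""
    return statistics.pstdev(node_loads)


# Hypothetical node load values for eight shard nodes.
target = load_std([2, 4, 4, 4, 5, 5, 7, 9])
```

Because the method only compares standard deviation values before and after a simulated migration, the same choice of formula must be used for both.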
Step S103, simulating migration of any first data item in the first shard node to the second shard node.
The node load value of the first shard node falls within a preset larger value range, and the node load value of the second shard node falls within a preset smaller value range.
For example, there are 10 shard nodes in the search service cluster, and their node load values are respectively: 10, 9, 9, 8, 6, 5, 4, 3, 2 and 3. The node load values [10,9,9] of three shard nodes fall within the preset larger value range, and the corresponding shard nodes are first shard nodes; the node load values [3,2,3] of three shard nodes fall within the preset smaller value range, and the corresponding shard nodes are second shard nodes.
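The grouping in this example can be sketched in Python as follows. The text does not define how the preset larger and smaller value ranges are set; the sketch assumes they are simply the `k` highest and `k` lowest node load values, and it uses ten load values consistent with the [10,9,9] and [3,2,3] groups of the example. The function name `pick_candidates` is hypothetical.

```python
def pick_candidates(node_loads, k=3):
    """Return (first_nodes, second_nodes) as lists of shard node indices.

    first_nodes:  the k nodes with the highest load (migration sources).
    second_nodes: the k nodes with the lowest load (migration targets).
    Assumes 2 * k <= len(node_loads) so the two groups do not overlap.
    """
    order = sorted(range(len(node_loads)), key=lambda i: node_loads[i])
    second_nodes = order[:k]
    first_nodes = order[-k:]
    return first_nodes, second_nodes


loads = [10, 9, 9, 8, 6, 5, 4, 3, 2, 3]
first_nodes, second_nodes = pick_candidates(loads)
```

Fixed numeric bounds or percentile cut-offs would serve equally well as the preset ranges; the top-k rule is only the simplest option that reproduces the example.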
When the fragmented data is being equalized, data items can be migrated from the first shard node to the second shard node, which reduces the node load value of the first shard node.
Simulated migration can be understood as follows: for calculation purposes only, the single load value of the first data item is added to the node load value of the second shard node and is no longer counted in the node load value of the first shard node. No data item is actually migrated.
Step S104, obtaining a second standard deviation value based on the node load value of each sliced node after simulation migration.
Step S105, obtaining a variation based on the difference between the second standard deviation and the first standard deviation.
For example, Δe = Target2 - Target1;
where Δe represents the variation, Target2 represents the second standard deviation value, and Target1 represents the first standard deviation value.
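Steps S103 to S105 can be illustrated with the following Python sketch, which performs the simulated migration on a copy of the node load values and returns the variation Δe. It assumes the population standard deviation; the function names are hypothetical.

```python
import statistics


def simulate_migration(node_loads, src, dst, single_load):
    """Return the node load values after moving one item's load from src to dst.

    Only a copy of the load list is changed; no data is actually moved.
    """
    trial = list(node_loads)
    trial[src] -= single_load
    trial[dst] += single_load
    return trial


def variation(node_loads, src, dst, single_load):
    """Delta e = Target2 - Target1 for one simulated migration."""
    target1 = statistics.pstdev(node_loads)
    target2 = statistics.pstdev(simulate_migration(node_loads, src, dst, single_load))
    return target2 - target1


# Moving a load of 4 from node 0 to node 1 turns [10, 2] into [6, 6].
delta_e = variation([10, 2], src=0, dst=1, single_load=4)
```

A negative result means the simulated migration lowers the spread of node load values, which is the condition tested in step S106a.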
Step S106a, when the variation is less than zero, actually migrating the first data item to the second shard node and triggering the iterative operation of acquiring the node load values of the shard nodes in the search service cluster until the preset iteration termination condition is met.
In the embodiment of the application, a variation less than zero indicates that the simulated migration reduces the spread of the node load values and lowers the node load value of the first shard node, so the first data item meets the actual migration condition. The first data item is then actually migrated from the first shard node, whose node load value is higher, to the second shard node, whose node load value is lower, which completes one data migration. The method then returns to step S101 and iterates, searching for the next first data item that meets the actual migration condition, until the preset iteration termination condition is met.
In some embodiments, the method further comprises the steps of:
Step S106b-1, when the variation is greater than or equal to zero, obtaining the current calculated temperature value of the first shard node as the product of the previous calculated temperature value of the first shard node, obtained in the previous iteration, and the preset cooling rate.
A variation greater than or equal to zero indicates that the simulated migration does not improve the balance of the node load values, so the first data item does not directly meet the actual migration condition.
The CPU temperature value of each shard node can be obtained from that shard node, and the calculated temperature value is computed from the CPU temperature value.
In this embodiment, a preset cooling rate (for example, 0.95) is provided, which controls how quickly the calculated temperature value falls after each iteration.
Step S106b-2, calculating the quotient of the negative of the variation and the current calculated temperature value to obtain an exponent value.
For example, m = (-Δe)/T; where m represents the exponent value and T represents the current calculated temperature value.
Step S106b-3, obtaining a probability value by raising the natural constant e to the power of the exponent value.
For example, P = exp(m); where P represents the probability value.
Step S106b-4, determining a random number in the range [0,1].
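Steps S106b-1 to S106b-4 follow the acceptance rule familiar from simulated annealing, and can be sketched in Python as below. The function names and the starting temperature of 100.0 are hypothetical; how the first calculated temperature value is derived is not reproduced here. Note that `random.random()` draws from [0, 1), which is used as a stand-in for the [0,1] range in the text.

```python
import math
import random


def acceptance_probability(delta_e, prev_temperature, cooling_rate=0.95):
    """Return (P, T) for a non-improving move (delta_e >= 0).

    T = previous calculated temperature value * cooling rate   (step S106b-1)
    m = (-delta_e) / T                                          (step S106b-2)
    P = exp(m)                                                  (step S106b-3)
    """
    temperature = prev_temperature * cooling_rate
    m = -delta_e / temperature
    return math.exp(m), temperature


def accept_worse_move(delta_e, prev_temperature, cooling_rate=0.95, rng=random):
    """Draw a random number (step S106b-4) and compare it with P."""
    p, temperature = acceptance_probability(delta_e, prev_temperature, cooling_rate)
    return rng.random() <= p, temperature


p, t = acceptance_probability(delta_e=1.0, prev_temperature=100.0)
```

Since delta_e is non-negative on this branch, P lies in (0, 1]; it shrinks as the calculated temperature value falls, so non-improving migrations become progressively less likely to be accepted.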
And step S106b-5a, when the random number is smaller than or equal to the probability value, actually migrating any one of the first data items to the second sliced node, and triggering and executing the iterative operation of acquiring the node load value of each sliced node in the search service cluster until a preset termination iterative condition is met.
In this embodiment, a random number smaller than or equal to the probability value indicates that the node load value of the first slicing node can be reduced after the simulated migration, so the first data item meets the actual migration condition. The first data item is actually migrated from the first slicing node, which has the higher node load value, to the second slicing node, which has the lower node load value, completing one data migration. The method then returns to step S101 and iterates, searching for the next first data item that meets the actual migration condition, until the preset termination iteration condition is met.
In some embodiments, the method further comprises the steps of:
Step S106b-5b, when the random number is larger than the probability value, triggering the iterative operation of simulating the migration of any first data item in the first slicing node to the second slicing node until the preset termination iteration condition is met.
In this embodiment, a random number greater than the probability value indicates that the simulated migration does not help reduce the node load value of the first slicing node, so the first data item does not meet the actual migration condition. The method returns to step S103 and continues iterating, searching for the next first data item that meets the actual migration condition, until the preset termination iteration condition is met.
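Steps S106b-1 to S106b-5b amount to the acceptance test of a simulated annealing procedure. The following sketch shows one way to express that test; the function name `anneal_step`, its return convention, and the injectable `rng` parameter are illustrative assumptions. Note that Python's `random.random()` draws from [0, 1) rather than the closed range [0,1] stated above, and that the sketch assumes a positive temperature value.

```python
import math
import random


def anneal_step(delta_e, prev_temperature, cooling_rate=0.95, rng=random.random):
    """Return (migrate, temperature) for one simulated migration.

    delta_e is the variation: second standard deviation minus first.
    """
    if delta_e < 0:
        # Improving move: migrate directly. The text only describes cooling
        # for the branch where the variation is >= 0, so none is applied here.
        return True, prev_temperature
    temperature = prev_temperature * cooling_rate  # step S106b-1
    m = -delta_e / temperature                     # step S106b-2, exponent value
    p = math.exp(m)                                # step S106b-3, probability value
    r = rng()                                      # step S106b-4, random number
    return r <= p, temperature                     # steps S106b-5a / S106b-5b


# Example with a fixed "random" number so the outcome is reproducible.
migrate, temperature = anneal_step(2.0, 10.0, rng=lambda: 0.5)
```

Because the variation is non-negative in this branch, the exponent value is at most zero and the probability value lies in (0, 1]; the probability shrinks as the variation grows or as the calculated temperature value falls.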
In some embodiments, the preset termination iteration condition includes one of the following:
Condition one: the current calculated temperature value of the first slicing node is smaller than or equal to a preset termination temperature value;
Condition two: the current iteration number of the first slicing node is greater than or equal to the preset maximum iteration number, wherein the current iteration number is equal to the previous iteration number plus one;
Condition three: the number of migration failures over consecutive iterations is greater than or equal to the preset maximum number of migration failures, wherein the number of migration failures is incremented by one each time the random number is greater than the probability value.
For condition one, the temperature value of the CPU of each of the slice nodes can be obtained from the corresponding slice node, and the calculated temperature value is obtained by calculation based on the temperature value of the CPU.
In this embodiment, a preset cooling rate (for example, 0.95) is provided, which controls how quickly the calculated temperature value decreases after each iteration. When the current calculated temperature value of the first slicing node after an iteration is smaller than or equal to the preset termination temperature value, this indicates that the data items in the slicing nodes of the search service cluster have reached an optimal distribution, and the equalization adjustment is terminated.
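To give a feel for condition one, the sketch below counts how many cooling steps it takes for a calculated temperature value to reach the termination temperature value at a fixed cooling rate. The initial value of 100 and the termination value of 1 are hypothetical figures chosen only for illustration; only the cooling rate of 0.95 comes from the example above.

```python
def cooling_steps(initial_temperature, termination_temperature, cooling_rate=0.95):
    """Count multiplications by cooling_rate until temperature <= termination."""
    if not 0 < cooling_rate < 1 or termination_temperature <= 0:
        raise ValueError("need 0 < cooling_rate < 1 and a positive termination temperature")
    steps = 0
    temperature = initial_temperature
    while temperature > termination_temperature:
        temperature *= cooling_rate
        steps += 1
    return steps


# Hypothetical values: start at 100, stop at 1, cool by 0.95 per step.
steps_needed = cooling_steps(100.0, 1.0)
```

With these assumed values the loop stops after 90 cooling steps, since 0.95 raised to the 90th power is just below 0.01. As described above, cooling is applied only in iterations where the variation is greater than or equal to zero, so the total number of iterations may be larger.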
For the second condition, the initial iteration number of the first slicing node is zero, and each iteration of the first slicing node adds one to the iteration number so as to count the iteration number. When the current iteration number of the first slicing node is greater than or equal to the preset maximum iteration number (such as 8000 times), the relative optimal distribution of the data items in each slicing node in the search service cluster is indicated, and the equalization adjustment is terminated.
For condition three, each iteration in which no data item is actually migrated counts as one migration failure. When the number of migration failures over consecutive iterations is greater than or equal to the preset maximum number of migration failures (for example, 20), this indicates that the data items in the slicing nodes of the search service cluster have reached an optimal distribution and no further data item can be migrated, so the equalization adjustment is terminated.
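The three termination conditions can be checked together after every iteration, for example as in the sketch below. The class `AnnealState`, the helper names, and the default termination temperature of 1.0 are hypothetical; the limits of 8000 iterations and 20 failures are the example values given above. Resetting the failure counter after a successful migration is one reading of "consecutive" and is an assumption.

```python
from dataclasses import dataclass


@dataclass
class AnnealState:
    temperature: float
    iterations: int = 0
    consecutive_failures: int = 0


def record_iteration(state, migrated):
    """Update the counters after one iteration."""
    state.iterations += 1  # current iteration number = previous + 1
    if migrated:
        state.consecutive_failures = 0  # assumption: a success resets the streak
    else:
        state.consecutive_failures += 1


def should_terminate(state, termination_temperature=1.0,
                     max_iterations=8000, max_failures=20):
    """True when any one of conditions one, two, or three holds."""
    return (state.temperature <= termination_temperature
            or state.iterations >= max_iterations
            or state.consecutive_failures >= max_failures)


state = AnnealState(temperature=50.0)
for _ in range(20):
    record_iteration(state, migrated=False)
stop = should_terminate(state)  # condition three is met after 20 failures
```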
Because data queries may hit individual keys at a very high frequency, the adjustment method described above cannot resolve the imbalance in that situation. For this extreme case, this embodiment provides another equalization adjustment method for temporary sliced data.
In some embodiments, the method further comprises the steps of:
Step S111, when the preset termination iteration condition is met, acquiring the node load value of each slicing node in the search service cluster, and obtaining a third standard deviation value based on the node load values of the slicing nodes.
Meeting the preset termination iteration condition can be understood to mean that the above method has already brought the data items in the slicing nodes of the search service cluster to an optimal distribution.
Step S112, when the third standard deviation value is larger than a preset equalization standard deviation threshold, determining, from all the slicing nodes, a third slicing node with the maximum node load value.
A third standard deviation value greater than the preset equalization standard deviation threshold indicates that an extreme case has occurred that the above method cannot resolve.
The third slicing node is the slicing node in which the extreme case occurs.
Step S113, obtaining a query frequency index value of each data item in the third slicing node.
Step S114, when the query frequency index value of each of a plurality of second data items in the third slicing node is greater than the product of the query frequency index value of any one of the other second data items in the third slicing node and a preset multiple, simulating, round by round, the splitting of the plurality of second data items into a plurality of data item sets and the even distribution of the data item sets of each round to the slicing nodes in the search service cluster other than the third slicing node, until the fourth standard deviation value obtained after the current simulated split and simulated even distribution is less than or equal to the preset equalization standard deviation threshold; then actually distributing the data item sets of the current simulated split evenly to the other slicing nodes in the manner of the simulated even distribution.
For example, suppose the third slicing node holds a plurality of high-frequency data items (that is, the plurality of second data items), each with a query frequency index value greater than 1000 times/ms, while the maximum query frequency index value of the remaining low-frequency data items (that is, the other second data items) is 100 times/ms, and the preset multiple is 2. The query frequency index values of the high-frequency data items are then more than 10 times the maximum query frequency index value of the low-frequency data items, far above the preset multiple, so the plurality of high-frequency data items is determined to be the reason for the high node load value of the third slicing node.
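The frequency comparison in this example can be sketched as a search for a split point in the items ranked by query frequency: the items above the split count as high-frequency when the lowest of them still exceeds the preset multiple times the highest remaining item. The function name `find_hot_items`, the key names, and the concrete frequencies are hypothetical, and choosing the first split point that satisfies the condition is an assumption.

```python
def find_hot_items(frequencies, multiple=2.0):
    """Return the high-frequency data items, or an empty list if none stand out.

    frequencies maps each data item to its query frequency index value.
    """
    ranked = sorted(frequencies, key=frequencies.get, reverse=True)
    for split in range(1, len(ranked)):
        lowest_hot = frequencies[ranked[split - 1]]
        highest_other = frequencies[ranked[split]]
        # Every item above the split exceeds multiple * every item below it.
        if lowest_hot > multiple * highest_other:
            return ranked[:split]
    return []


# Frequencies in times/ms, loosely following the example above.
frequencies = {"k1": 1500, "k2": 1200, "k3": 1100, "k4": 100, "k5": 80}
hot = find_hot_items(frequencies, multiple=2.0)
```

Here the three items above 1000 times/ms are returned, because 1100 is greater than 2 times 100, while no earlier split point satisfies the condition.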
In this embodiment, the plurality of high-frequency data items is split in simulation into a plurality of data item sets, each including at least one high-frequency data item. The data item sets obtained by the simulated split are evenly distributed in simulation to the slicing nodes in the search service cluster other than the third slicing node, and a fourth standard deviation value is calculated after the simulated even distribution. If the fourth standard deviation value is smaller than or equal to the preset equalization standard deviation threshold, the data item sets of the simulated split are actually distributed evenly to the other slicing nodes in the manner of the simulated even distribution, achieving load balancing in the extreme case. If the fourth standard deviation value is greater than the preset equalization standard deviation threshold, the plurality of high-frequency data items is split in simulation again, for example with the split number gradually reduced in each round. This is repeated until the fourth standard deviation value obtained by the simulated even distribution after a simulated split is smaller than or equal to the preset equalization standard deviation threshold, at which point the data item sets of that simulated split are actually distributed evenly to the other slicing nodes in the manner of that simulated even distribution. Therefore, after the above equalization adjustment has reached its optimal distribution, equalization of the fragmented data can still be achieved in extreme load cases.
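The round-by-round simulation can be sketched as below. This is only one possible reading of the procedure: the "split number" is taken to be the number of data item sets, it starts at one set per high-frequency item and decreases by one each round, sets are assigned to the other nodes in round-robin order, and node load is treated as additive over items. The function name `plan_hot_split` and all figures are hypothetical.

```python
from statistics import pstdev


def plan_hot_split(node_loads, hot_node, hot_item_loads, threshold):
    """Return {target_node: [items]} for the first split that balances, else None."""
    others = [n for n in node_loads if n != hot_node]
    items = sorted(hot_item_loads, key=hot_item_loads.get, reverse=True)
    if not others:
        return None
    for set_count in range(len(items), 0, -1):  # assumed: fewer sets each round
        sets = [items[i::set_count] for i in range(set_count)]
        trial = dict(node_loads)
        plan = {}
        for i, item_set in enumerate(sets):
            target = others[i % len(others)]  # simulated even distribution
            moved = sum(hot_item_loads[x] for x in item_set)
            trial[hot_node] -= moved
            trial[target] += moved
            plan.setdefault(target, []).extend(item_set)
        # Fourth standard deviation value for this round.
        if pstdev(list(trial.values())) <= threshold:
            return plan
    return None


# Hypothetical cluster: n1 carries three hot items of load 80 each.
node_loads = {"n1": 300.0, "n2": 60.0, "n3": 60.0, "n4": 60.0}
hot_item_loads = {"a": 80.0, "b": 80.0, "c": 80.0}
plan = plan_hot_split(node_loads, "n1", hot_item_loads, threshold=40.0)
```

In this example the first round already brings the standard deviation from about 103.9 down to about 34.6, which is below the assumed threshold of 40, so each hot item is assigned to a different node. A result of `None` means no simulated split met the threshold.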
The embodiment of the application solves the problems of query performance reduction, resource waste and fault tolerance reduction caused by unbalanced fragmented data, realizes dynamic migration and load balancing of fragmented data, and ensures the balance of query fragmented data.
The present application also provides an embodiment of a device for carrying out the method steps described in the above embodiment, and the explanation based on the meaning of the same names is the same as that of the above embodiment, which has the same technical effects as those of the above embodiment, and is not repeated here.
As shown in fig. 2, the present application provides an equalization adjustment device 200 for fragmented data, comprising:
a first acquisition unit 201, configured to acquire a node load value of each fragment node in the search service cluster;
a first obtaining unit 202, configured to obtain a first standard deviation value based on the node load value of each fragment node;
a simulation migration unit 203, configured to simulate the migration of any one first data item in the first slicing node to a second slicing node, where the node load value of the first slicing node meets a preset larger value range, and the node load value of the second slicing node meets a preset smaller value range;
a second obtaining unit 204, configured to obtain a second standard deviation value based on the node load values of the respective fragment nodes after the simulated migration;
a third obtaining unit 205, configured to obtain a variation based on the difference between the second standard deviation value and the first standard deviation value;
a first actual migration unit 206, configured to, when the variation is less than zero, actually migrate the first data item to the second slicing node, and trigger the iterative operation of acquiring the node load value of each slicing node in the search service cluster until a preset termination iteration condition is satisfied.
Optionally, the apparatus further includes:
the second obtaining unit is used for obtaining the current calculated temperature value of the first slicing node based on the product value of the previous calculated temperature value of the first slicing node obtained in the previous iteration and the preset cooling rate when the variation is greater than or equal to zero;
A fourth obtaining unit configured to calculate a quotient of the opposite number of the variation and the current calculated temperature value, and obtain an index value;
a fifth obtaining unit for obtaining a probability value based on the natural constant value e as a base and the exponent value as a power;
A first determination unit configured to determine a random number in a [0,1] range;
and the second actual migration unit is used for actually migrating any one of the first data items to the second sliced node when the random number is smaller than or equal to the probability value, and triggering and executing the iterative operation of acquiring the node load value of each sliced node in the search service cluster until a preset termination iterative condition is met.
Optionally, the apparatus further includes:
And the third actual migration unit is used for triggering and executing the iterative operation of simulating and migrating any one of the first data items in the first slicing node to the second slicing node when the random number is larger than the probability value until a preset termination iterative condition is met.
Optionally, the preset iteration termination condition includes one of the following conditions:
The current calculated temperature value of the first slicing node is smaller than or equal to a preset termination temperature value;
the current iteration number of the first slicing node is greater than or equal to a preset maximum iteration number, wherein the current iteration number is equal to the previous iteration number plus one;
The migration failure times of the continuous iteration are larger than or equal to the preset maximum migration failure times, wherein the migration failure times are obtained by adding one to the migration failure times when the random number is larger than the probability value in the iteration process.
Optionally, the obtaining a node load value of each fragment node in the search service cluster includes:
monitoring the key resource occupancy rate of each fragment node in the search service cluster in real time;
when the key resource occupancy rate of any one of the slicing nodes meets the preset excessive occupancy rate condition, acquiring an index value of at least one load index of each data item in each slicing node;
Obtaining a single load value of the corresponding data item based on the index value of the at least one load index of each data item and a preset weight value of the corresponding load index;
and obtaining the load value of the corresponding slicing node based on the single load value of all the data items of each slicing node.
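The load calculation described in these steps can be sketched as a weighted combination of load indices per data item, followed by an aggregation per slicing node. The text only states that the values are obtained "based on" the index values and weights, so the weighted sum and the plain sum used here are assumptions, as are the index names `qps` and `size_mb`, the weights, and the function names.

```python
def item_load(metrics, weights):
    """Single load value of one data item: weighted sum of its load indices."""
    return sum(weights[name] * value for name, value in metrics.items())


def node_load(items, weights):
    """Node load value: sum of the single load values of all its data items."""
    return sum(item_load(metrics, weights) for metrics in items.values())


# Hypothetical load indices and preset weight values.
weights = {"qps": 0.5, "size_mb": 0.25}
node_items = {
    "item_1": {"qps": 100.0, "size_mb": 40.0},  # 0.5*100 + 0.25*40 = 60
    "item_2": {"qps": 20.0, "size_mb": 80.0},   # 0.5*20 + 0.25*80 = 30
}
load = node_load(node_items, weights)
```

With these assumed figures the node load value is 90, the sum of the two single load values.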
Optionally, the apparatus further includes:
a sixth obtaining unit, configured to obtain, when a preset termination iteration condition is satisfied, a node load value of each shard node in the search service cluster, and obtain a third standard deviation value based on the node load value of each shard node;
The second determining unit is used for determining a third slicing node with the maximum node load value from all the slicing nodes when the third standard deviation value is larger than a preset equalization standard deviation threshold value;
A third obtaining unit, configured to obtain a query frequency index value of each data item in the third slicing node;
and the simulation average division unit is used for dividing the multiple second data items into multiple data item sets in a successive simulation mode when the query frequency index value of each of the multiple second data items in the third slicing node is larger than the product of the query frequency index value of any one of the other second data items in the third slicing node and a preset multiple, and simulating and evenly dividing the multiple data item sets after each simulation split into other slicing nodes except the third slicing node in the search service cluster until a fourth standard deviation value obtained after the simulation split and the simulation average division is smaller than or equal to a preset equilibrium standard deviation threshold value, and then actually equally dividing the multiple data item sets after the simulation split into the other slicing nodes according to the simulation average division mode after the simulation split.
Optionally, the obtaining a node load value of each fragment node in the search service cluster includes:
periodically acquiring node load values of all the fragment nodes in the search service cluster.
The embodiment of the application solves the problems of query performance reduction, resource waste and fault tolerance reduction caused by unbalanced fragmented data, realizes dynamic migration and load balancing of fragmented data, and ensures the balance of query fragmented data.
The present embodiment provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method steps described in the embodiments above.
Embodiments of the present application provide a non-transitory computer storage medium storing computer executable instructions that perform the method steps described in the embodiments above.
Finally, it should be noted that: in the present specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Because the system or device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and for the relevant points reference may be made to the description of the method.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (9)

1. A method for equalizing fragmented data, comprising:
Acquiring node load values of all the fragment nodes in the search service cluster;
Obtaining a first standard deviation value based on node load values of the respective sharded nodes;
Simulating and migrating any first data item in the first slicing node to a second slicing node, wherein the node load value of the first slicing node meets a preset larger value range, and the node load value of the second slicing node meets a preset smaller value range;
obtaining a second standard deviation value based on the node load value of each fragment node after simulation migration;
obtaining a variation based on a difference between the second standard deviation and the first standard deviation;
When the variation is smaller than zero, actually migrating any one of the first data items to the second slicing node, and triggering and executing the iteration operation of acquiring the node load value of each slicing node in the search service cluster until a preset termination iteration condition is met;
wherein the method further comprises:
When the variation is greater than or equal to zero, obtaining a current calculated temperature value of the first slicing node based on a product value of a previous calculated temperature value of the first slicing node obtained in a previous iteration and a preset cooling rate, wherein the calculated temperature value is obtained by calculation based on a temperature value of a CPU, and the temperature value of the CPU of each slicing node can be obtained from the corresponding slicing node;
calculating the quotient of the opposite number of the variation and the current calculated temperature value to obtain an index value;
obtaining a probability value based on a natural constant value e as a base and the exponent value as a power;
determining a random number in the range [0,1];
And when the random number is smaller than or equal to the probability value, actually migrating any one of the first data items to the second slicing node, and triggering and executing the iteration operation of acquiring the node load value of each slicing node in the search service cluster until a preset termination iteration condition is met.
2. The method according to claim 1, wherein the method further comprises:
And when the random number is larger than the probability value, triggering and executing the iterative operation of simulating and migrating any first data item in the first slicing node to the second slicing node until a preset iteration termination condition is met.
3. The method of claim 1, wherein the preset termination iteration condition comprises one of:
The current calculated temperature value of the first slicing node is smaller than or equal to a preset termination temperature value;
the current iteration number of the first slicing node is greater than or equal to a preset maximum iteration number, wherein the current iteration number is equal to the previous iteration number plus one;
The migration failure times of the continuous iteration are larger than or equal to the preset maximum migration failure times, wherein the migration failure times are obtained by adding one to the migration failure times when the random number is larger than the probability value in the iteration process.
4. The method of claim 1, wherein the obtaining node load values for each of the sharded nodes in the search service cluster comprises:
monitoring the key resource occupancy rate of each fragment node in the search service cluster in real time;
When the key resource occupancy rate of any one of the slicing nodes meets the preset excessive occupancy rate condition, acquiring an index value of at least one load index of each data item in each slicing node;
Obtaining a single load value of the corresponding data item based on the index value of the at least one load index of each data item and a preset weight value of the corresponding load index;
and obtaining the load value of the corresponding slicing node based on the single load value of all the data items of each slicing node.
5. The method according to claim 1, wherein the method further comprises:
when a preset iteration termination condition is met, acquiring a node load value of each fragment node in the search service cluster, and acquiring a third standard deviation value based on the node load value of each fragment node;
when the third standard deviation value is larger than a preset equalization standard deviation threshold value, determining a third slicing node with the maximum node load value from all the slicing nodes;
Acquiring a query frequency index value of each data item in the third slicing node;
When the query frequency index value of each of the plurality of second data items in the third slicing node is larger than the product of the query frequency index value of any one of the other second data items in the third slicing node and a preset multiple, sequentially simulating and splitting the plurality of second data items into a plurality of data item sets, simulating and uniformly splitting the plurality of data item sets after each simulation splitting into other slicing nodes except the third slicing node in the search service cluster until a fourth standard deviation value obtained after the simulation splitting and the simulation uniformly splitting is smaller than or equal to a preset balance standard deviation threshold value, and then practically uniformly dividing the plurality of data item sets after the simulation splitting into the other slicing nodes according to a simulation uniformly splitting mode after the simulation splitting.
6. The method of claim 1, wherein the obtaining node load values for each of the sharded nodes in the search service cluster comprises:
periodically acquiring node load values of all the fragment nodes in the search service cluster.
7. An equalization adjustment device for fragmented data, comprising:
The first acquisition unit is used for acquiring node load values of all the fragment nodes in the search service cluster;
a first obtaining unit, configured to obtain a first standard deviation value based on node load values of the respective sharded nodes;
The simulation migration unit is used for simulating and migrating any one of the first data items in the first sliced node to the second sliced node, wherein the node load value of the first sliced node meets a preset larger value range, and the node load value of the second sliced node meets a preset smaller value range;
a second obtaining unit, configured to obtain a second standard deviation value based on the node load values of the respective fragment nodes after the simulated migration;
a third obtaining unit configured to obtain a variation amount based on a difference between the second standard deviation and the first standard deviation;
The first actual migration unit is configured to, when the variation is less than zero, actually migrate the any one of the first data items to the second sliced node, and trigger and execute the iterative operation of acquiring the node load value of each sliced node in the search service cluster until a preset termination iteration condition is satisfied;
Wherein the apparatus further comprises:
a second obtaining unit, configured to obtain, when the variation is greater than or equal to zero, a current calculated temperature value of the first slicing node based on a product value of a last calculated temperature value of the first slicing node obtained in a previous iteration and a preset cooling rate, where the calculated temperature value is obtained by calculation based on a temperature value of a CPU, and the temperature value of the CPU of each slicing node can be obtained from a corresponding slicing node;
A fourth obtaining unit configured to calculate a quotient of the opposite number of the variation and the current calculated temperature value, and obtain an index value;
a fifth obtaining unit for obtaining a probability value based on the natural constant value e as a base and the exponent value as a power;
A first determination unit configured to determine a random number in a [0,1] range;
and the second actual migration unit is used for actually migrating any one of the first data items to the second sliced node when the random number is smaller than or equal to the probability value, and triggering and executing the iterative operation of acquiring the node load value of each sliced node in the search service cluster until a preset termination iterative condition is met.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1 to 6.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more processors implement the method of any one of claims 1 to 6 when the one or more programs are executed by the one or more processors.
CN202410095296.3A 2024-01-24 2024-01-24 Method and device for equalizing and adjusting fragmented data, medium and electronic equipment Active CN117632519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410095296.3A CN117632519B (en) 2024-01-24 2024-01-24 Method and device for equalizing and adjusting fragmented data, medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN117632519A CN117632519A (en) 2024-03-01
CN117632519B true CN117632519B (en) 2024-05-03

Family

ID=90023715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410095296.3A Active CN117632519B (en) 2024-01-24 2024-01-24 Method and device for equalizing and adjusting fragmented data, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117632519B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5539883A (en) * 1991-10-31 1996-07-23 International Business Machines Corporation Load balancing of network by maintaining in each computer information regarding current load on the computer and load on some other computers in the network
US8051174B2 (en) * 2008-03-03 2011-11-01 Microsoft Corporation Framework for joint analysis and design of server provisioning and load dispatching for connection-intensive server
CN104184813A (en) * 2014-08-20 2014-12-03 杭州华为数字技术有限公司 Load balancing method of virtual machines, related equipment and trunking system
CN109976917B (en) * 2019-04-08 2020-09-11 科大讯飞股份有限公司 Load scheduling method, device, load scheduler, storage medium and system
CN113596153A (en) * 2021-07-28 2021-11-02 新华智云科技有限公司 Data equalization method and system
CN117033004A (en) * 2023-10-10 2023-11-10 苏州元脑智能科技有限公司 Load balancing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant