WO2020134840A1 - Data distribution method and related product - Google Patents


Info

Publication number
WO2020134840A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
configuration information
resource configuration
hardware resource
nodes
Prior art date
Application number
PCT/CN2019/121613
Other languages
French (fr)
Chinese (zh)
Inventor
刘国伟
Original Assignee
深圳云天励飞技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司
Publication of WO2020134840A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/177Initialisation or configuration control
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • This application relates to the technical field of data distribution, and in particular to a data distribution method and related products.
  • A computer cluster can be simply understood as a cluster established by a server together with multiple nodes. If, during data processing, all data is stored evenly across the service nodes of the search cluster, the data processing efficiency of the computer cluster is reduced.
  • the embodiments of the present application provide a data distribution method and related products, which can improve the data processing efficiency of a computer cluster.
  • a first aspect of an embodiment of the present application provides a data distribution method, including:
  • the method further includes:
  • when Q new nodes are detected, the second data to be processed is obtained, where Q is a positive integer;
  • the upper limit processing data amount of each of the Q new nodes is estimated to obtain Q upper limit processing data amounts;
  • when the data amount of the second data to be processed is greater than the sum of the Q upper limit processing data amounts, the second data to be processed is divided into a first data set and a second data set according to the Q upper limit processing data amounts, the first data set is allocated to the Q new nodes, and the second data set is allocated to the P nodes;
  • when the data amount of the second data to be processed is less than or equal to the sum of the Q upper limit processing data amounts, the hardware resource configuration information of the Q new nodes is obtained, the second data to be processed is divided according to the hardware resource configuration information of the Q new nodes to obtain Q data blocks, and the Q data blocks are distributed to the corresponding nodes among the Q new nodes for processing.
  • the estimation of the upper limit processing data amount of each of the Q new nodes includes:
  • the difference between the target rated upper limit data processing amount and the at least one data processing upper limit value is used as the upper limit data processing amount of the new node j.
  • a second aspect of an embodiment of the present application provides a data distribution device, including:
  • the first obtaining unit is used to obtain the hardware resource configuration information reported by each of the P nodes to obtain P hardware resource configuration information, and each node corresponds to one piece of hardware resource configuration information;
  • a second obtaining unit configured to obtain the first data to be processed
  • a dividing unit configured to divide the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, and each hardware resource configuration information corresponds to a data block;
  • the distribution unit is configured to distribute the P data blocks to corresponding nodes of the P nodes for processing.
  • an embodiment of the present application provides a server, including a processor, a memory, and one or more programs, wherein the one or more programs are stored in the memory, and are configured to be executed by the processor.
  • the program includes instructions for performing the steps in the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
  • the computer program product may be a software installation package.
  • the hardware resource configuration information reported by each of the P nodes is obtained to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information; the first data to be processed is obtained and divided according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block; and the P data blocks are distributed to the corresponding nodes among the P nodes for processing. In this way, the hardware resource configuration information reported by each of the P nodes can be determined, the data to be processed can be divided into P shares according to that information and distributed to the corresponding nodes, data is allocated according to node performance, the processing capability of each node is fully exploited, and the data processing efficiency is improved.
  • FIG. 1A is a schematic flowchart of an embodiment of a data distribution method according to an embodiment of the present application
  • FIG. 1B is a schematic diagram of a computer cluster provided by an embodiment of the present application.
  • FIG. 1C is a schematic structural diagram of a zookeeper provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of another embodiment of a data distribution method according to an embodiment of the present application.
  • FIG. 3A is a schematic structural diagram of an embodiment of a data distribution device according to an embodiment of the present application.
  • FIG. 3B is a schematic structural diagram of another embodiment of a data distribution device according to an embodiment of the present application.
  • FIG. 3C is a schematic structural diagram of another embodiment of a data distribution device according to an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an embodiment of a data distribution device provided by an embodiment of the present application.
  • FIG. 1A is a schematic flowchart of an embodiment of a data distribution method according to an embodiment of the present application.
  • the data distribution method described in this embodiment includes the following steps:
  • the above hardware resource configuration information may be at least one of the following: the number of cores of the central processing unit (CPU), the memory size, the load value, whether the disk is a solid state drive (SSD), and so on, which is not limited herein.
  • specifically, as shown in FIG. 1B, the server can establish communication connections with the P nodes and receive the hardware resource configuration information reported by the P nodes; each node can report its hardware resource configuration information at a preset time interval, and the preset time interval can be set by the user or default to a system value.
  • the first data to be processed may be at least one of the following: images, files, signaling, video, voice signals, optical signals, text information, etc., which are not limited herein.
  • the first data to be processed may come from the terminal, or may be data forwarded from the previous node or the next node.
  • the first data to be processed may be real-time data or pre-stored data, which is not limited herein.
  • different hardware resource configuration information reflects different processing capabilities of the nodes; based on the idea that a node with strong processing capability can process more data and a node with weak processing capability should process less data, the first data to be processed is divided according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block. In this way, the processing capability of each node is fully utilized, improving data processing efficiency.
  • dividing the first data to be processed according to the P pieces of hardware resource configuration information to obtain P data blocks may include the following steps: determining a performance evaluation value of each of the P nodes according to the P pieces of hardware resource configuration information to obtain P performance evaluation values; determining an allocation proportion value corresponding to each of the P nodes according to the P performance evaluation values to obtain P allocation proportion values, the sum of which is 1; and dividing the first data to be processed according to the P allocation proportion values to obtain the P data blocks.
  • the hardware resource configuration information reflects the performance of a node to a certain extent; therefore, the server can evaluate the performance of each of the P nodes according to the P pieces of hardware resource configuration information to obtain P performance evaluation values, and can then compute each node's allocation proportion value as that node's performance evaluation value divided by the sum of the P performance evaluation values, so that the P allocation proportion values sum to 1 and the first data to be processed can be divided accordingly, as sketched below.
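The following Python fragment is a minimal sketch (not part of the patent text) of this proportional split; the node names, evaluation values, and record-level slicing are assumptions made for illustration.

```python
# Illustrative sketch: split records across nodes in proportion to their
# performance evaluation values (assumed inputs).

def allocation_proportions(evaluations):
    """Each node's proportion = its evaluation value / sum of all evaluation values."""
    total = sum(evaluations.values())
    return {node: score / total for node, score in evaluations.items()}

def split_by_proportion(records, proportions):
    """Divide the records into one block per node according to the proportions."""
    blocks, start = {}, 0
    nodes = list(proportions)
    for i, node in enumerate(nodes):
        # The last node takes the remainder so every record is assigned exactly once.
        end = len(records) if i == len(nodes) - 1 else start + round(len(records) * proportions[node])
        blocks[node] = records[start:end]
        start = end
    return blocks

# Example with three hypothetical nodes.
evaluations = {"node-1": 0.9, "node-2": 0.6, "node-3": 0.3}
blocks = split_by_proportion(list(range(1800)), allocation_proportions(evaluations))
print({n: len(b) for n, b in blocks.items()})  # {'node-1': 900, 'node-2': 600, 'node-3': 300}
```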
  • the hardware resource configuration information includes: the number of cores, memory size, and load value of the central processor;
  • determining the performance evaluation value of each of the P nodes according to the P hardware resource configuration information to obtain P performance evaluation values may include the following steps:
  • the first weight value, the second weight value, and the third weight value may be preset or the system defaults, and the sum of the first weight value, the second weight value, and the third weight value is 1.
  • the server may pre-store the mapping relationship between the preset number of cores and the first evaluation value, the mapping relationship between the preset memory size and the second evaluation value, and the relationship between the preset load value and the third evaluation value Mapping relations.
  • the first evaluation value, the second evaluation value, and the third evaluation value are numbers between 0 and 1, or between 0 and 100. For example, the following provides a mapping relationship between the number of cores and the first evaluation value:

    Number of cores | First evaluation value
    1 | 0.2
    2 | 0.4
    3 | 0.6
    4 | 0.8
    >5 | 1
  • in a specific implementation, taking the hardware resource configuration information i as an example, where the hardware resource configuration information i is any one of the P pieces of hardware resource configuration information, if the hardware resource configuration information includes the number of cores of the central processing unit, the memory size, and the load value, then: the target first evaluation value corresponding to the number of cores in the hardware resource configuration information i can be determined according to the mapping relationship between preset numbers of cores and first evaluation values; the target second evaluation value corresponding to the memory size in the hardware resource configuration information i can be determined according to the mapping relationship between preset memory sizes and second evaluation values; the target third evaluation value corresponding to the load value in the hardware resource configuration information i can be determined according to the mapping relationship between preset load values and third evaluation values; the first weight corresponding to the first evaluation value, the second weight corresponding to the second evaluation value, and the third weight corresponding to the third evaluation value are obtained, where the sum of the first weight, the second weight, and the third weight is 1; and a weighted operation is performed to obtain the evaluation value corresponding to the hardware resource configuration information i, that is, evaluation value of i = target first evaluation value × first weight + target second evaluation value × second weight + target third evaluation value × third weight. In this way, the performance of each node can be evaluated from multiple dimensions of the hardware resource configuration information, which improves the accuracy of node performance evaluation.
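A minimal sketch of this weighted scoring is shown below; the memory and load lookup tables and the specific weights are assumptions for the example (the text only fixes the core-count table and requires the three weights to sum to 1).

```python
# Illustrative sketch of the weighted node evaluation described above.

CORE_EVAL = {1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8}        # from the table above; >= 5 cores -> 1.0
MEMORY_EVAL = {8: 0.4, 16: 0.6, 32: 0.8, 64: 1.0}    # GiB -> second evaluation value (assumed)
LOAD_EVAL = [(0.5, 1.0), (1.0, 0.8), (2.0, 0.5)]      # (load-average threshold, third evaluation value), assumed

W_CORES, W_MEMORY, W_LOAD = 0.4, 0.4, 0.2             # example weights; must sum to 1

def first_eval(cores):
    return 1.0 if cores >= 5 else CORE_EVAL[cores]

def second_eval(memory_gib):
    return MEMORY_EVAL.get(memory_gib, 1.0)

def third_eval(load_avg):
    for threshold, score in LOAD_EVAL:
        if load_avg <= threshold:
            return score
    return 0.2  # heavily loaded node gets a low score

def evaluation_value(cores, memory_gib, load_avg):
    """evaluation value = e1*w1 + e2*w2 + e3*w3, as in the formula above."""
    return (first_eval(cores) * W_CORES
            + second_eval(memory_gib) * W_MEMORY
            + third_eval(load_avg) * W_LOAD)

print(evaluation_value(cores=4, memory_gib=32, load_avg=0.7))  # 0.8*0.4 + 0.8*0.4 + 0.8*0.2 = 0.8
```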
  • the P data blocks can be distributed to the corresponding nodes of the P nodes for processing.
  • in this way, the performance of the nodes can be fully utilized to allocate data differentially, ensuring that nodes with strong performance process somewhat more data while nodes with weak performance process relatively less data; this ensures that the data is processed in time, avoids data backlogs, and improves data processing efficiency.
  • optionally, after step 104, the following steps may also be included:
  • the second to-be-processed data may be at least one of the following: images, files, signaling, video, voice signals, optical signals, text information, etc., which are not limited herein.
  • the second to-be-processed data may come from the terminal, or may be data forwarded from the previous node or the next node.
  • the second data to be processed may be real-time data or pre-stored data, which is not limited herein.
  • the server can form a cluster with P nodes, and of course, other nodes can be added to the cluster.
  • the server can detect whether new nodes appear in the cluster; if Q new nodes are detected, the second data to be processed can be obtained, where Q is a positive integer. Since each node has limited capability, processing more than a certain amount of data can cause stalls or crashes, which in turn reduces data processing efficiency. Therefore, in the embodiment of the present application, the upper limit processing data amount of each of the Q new nodes can be estimated to obtain Q upper limit processing data amounts.
  • the upper limit processing data amount can be understood as the maximum amount of data that can be processed.
  • when the data amount of the second data to be processed is greater than the sum of the Q upper limit processing data amounts, the Q new nodes alone cannot process the second data to be processed; therefore, the second data to be processed can be divided according to the Q upper limit processing data amounts into a first data set and a second data set, where the data amount of the first data set is less than the sum of the Q upper limit processing data amounts; the first data set can then be allocated to the Q new nodes, and the second data set allocated to the P nodes.
  • when the data amount of the second data to be processed is less than or equal to the sum of the Q upper limit processing data amounts, the Q new nodes are fully capable of processing the second data to be processed, and the Q new nodes may be given priority.
  • in this case, the hardware resource configuration information of the Q new nodes can be obtained, and the second data to be processed is divided according to the hardware resource configuration information of the Q new nodes to obtain Q data blocks.
  • the Q data blocks are then distributed to the corresponding nodes among the Q new nodes for processing, as sketched below.
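A minimal sketch of this decision (cases A3/A4 above) might look as follows; the record-count capacity model and the two distribution callbacks are assumptions standing in for the division logic described elsewhere in the text.

```python
# Illustrative sketch: route a second batch of records either entirely to the
# Q new nodes or split it between the new nodes and the original P nodes.

def allocate_second_batch(records, new_node_limits, distribute_to_new, distribute_to_old):
    """new_node_limits: {new_node: estimated upper limit processing data amount (records)}."""
    capacity = sum(new_node_limits.values())
    if len(records) <= capacity:
        # The new nodes can absorb the whole batch on their own (case A4).
        distribute_to_new(records)
    else:
        # Fill the new nodes up to their combined limit and hand the rest to the
        # original P nodes (case A3). The text requires the first data set to stay
        # below the combined limit, so a stricter cut-off could be used here.
        first_set, second_set = records[:capacity], records[capacity:]
        distribute_to_new(first_set)
        distribute_to_old(second_set)
```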
  • estimating the upper limit processing data amount of each of the Q new nodes may include the following steps:
  • A21. Acquire at least one child node of a new node j, where the new node j is any one of the Q new nodes;
  • A22. Determine the data processing upper limit value of the at least one child node to obtain at least one data processing upper limit value;
  • A23. Acquire the target hardware resource configuration information of the new node j;
  • A24. Determine the target rated upper limit data processing amount corresponding to the target hardware resource configuration information according to the mapping relationship between preset hardware resource configuration information and rated upper limit data processing amounts;
  • A25. Use the difference between the target rated upper limit data processing amount and the at least one data processing upper limit value as the upper limit data processing amount of the new node j.
  • each node may be a server.
  • in addition, a node may also have one or more child nodes; if there are child nodes, there will be some interaction with the parent node, so certain resources may also be occupied. Therefore, in a specific implementation, the server may acquire at least one child node of the new node j, where the new node j is any one of the Q new nodes, and determine the data processing upper limit value of the at least one child node to obtain at least one data processing upper limit value.
  • the data processing upper limit value of each node can be preset or the system default.
  • the server can also pre-store the mapping relationship between preset hardware resource configuration information and rated upper limit data processing amounts.
  • in this way, the target rated upper limit data processing amount corresponding to the target hardware resource configuration information can be determined according to that mapping relationship, and the difference between the target rated upper limit data processing amount and the at least one data processing upper limit value is taken as the upper limit data processing amount of the new node j.
  • the rated upper limit data processing amount can be understood as the maximum amount of data a node is set, before it leaves the factory, to be able to process. In this way, the upper limit data processing amount can be estimated as accurately as possible while ensuring that the new node and its child nodes can work normally, which helps ensure system stability and improve system processing efficiency. A small sketch of this estimation follows.
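The sketch below illustrates one reading of this estimation: the rated upper limit is looked up from the node's hardware configuration and the children's upper limit values are subtracted. The lookup table keys and throughput figures are assumptions for the example.

```python
# Illustrative sketch of the upper-limit estimation for a new node j.

RATED_UPPER_LIMIT = {            # (CPU cores, memory GiB, has SSD) -> rated records/s (assumed)
    (8, 32, True): 50_000,
    (8, 32, False): 35_000,
    (16, 64, True): 120_000,
}

def estimate_upper_limit(hardware_config, child_upper_limits):
    """Upper limit of node j = rated upper limit - upper limits already claimed by its children."""
    rated = RATED_UPPER_LIMIT[hardware_config]
    return max(rated - sum(child_upper_limits), 0)

print(estimate_upper_limit((16, 64, True), [20_000, 15_000]))  # 85000
```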
  • optionally, after step 104, the following steps may also be included:
  • when it is detected that the load value of any one of the P nodes exceeds a preset threshold, the node information corresponding to that node is deleted, or an alarm message is sent to the administrator.
  • the above-mentioned preset threshold can be set by the user or the system default.
  • in a specific implementation, the server can monitor the load value of each of the P nodes; if the load value of a node exceeds the preset threshold, the node information corresponding to that node can be deleted, which ensures the normal operation of the other nodes in the system.
  • alternatively, an alarm message can be sent to the administrator.
  • the alarm message may include at least one of the following: the node location, the node's abnormal behaviour, the abnormality level, and so on, which is not limited here, so that the administrator can be promptly reminded to perform maintenance on the node.
  • the traditional computer cluster data processing method stores all data evenly across the service nodes of the search cluster, which has the following disadvantages: 1. because the servers at different nodes differ in hardware configuration such as the number of CPUs, the memory size, and whether an SSD is used, the amount of data each search node can store and its query processing capability are inconsistent, so the available resources cannot be used rationally; 2. if the data is distributed evenly, a query has to wait for all search nodes to return results, so a single search node with a low configuration may slow down the entire service; 3. nodes cannot be expanded dynamically: when the data volume keeps growing and new search nodes are added, existing data has to be redistributed, which increases maintenance costs. In contrast, this application allocates data according to the performance of the nodes, which ensures that each node can effectively exert its processing capability and improves data processing efficiency.
  • ZooKeeper is a distributed, open-source coordination service for distributed applications.
  • the zookeeper can be located on the server.
  • the zookeeper can include at least the following modules: a register group (registerGroup), a load group (loadGroup), a configuration group (configGroup), and a quantity group (amountGroup). All search nodes can be registered with the zookeeper (for example, it can serve as a registration center used by subsequent data distribution and data requests), and the registration information is saved in the registerGroup.
  • the registration information can be at least one of the following: node name, node type, node function, and so on, which is not limited here. The hardware resource configuration information of all search nodes (for example, the number of CPU cores, the memory size, and whether an SSD is used) is registered in the configGroup, and each node reports its current running load (load average) every 10 seconds.
  • the reported load is stored in the loadGroup, and the total number of records currently stored by each search node is reported and stored in the amountGroup.
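A minimal sketch of how these four groups might be laid out as znodes is given below, using the kazoo ZooKeeper client; the ensemble address, znode paths, and payload format are assumptions made for illustration, not details specified by the text.

```python
# Minimal sketch, assuming a ZooKeeper ensemble at 127.0.0.1:2181 and simple payloads.
import json
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

for group in ("registerGroup", "configGroup", "loadGroup", "amountGroup"):
    zk.ensure_path(f"/search/{group}")

def register_node(name, cpu_cores, memory_gb, has_ssd):
    # Registration info goes to registerGroup; hardware info goes to configGroup.
    zk.create(f"/search/registerGroup/{name}", json.dumps({"name": name}).encode(),
              ephemeral=True, makepath=True)
    zk.create(f"/search/configGroup/{name}",
              json.dumps({"cpu": cpu_cores, "mem": memory_gb, "ssd": has_ssd}).encode(),
              makepath=True)

def report_status(name, load_average, record_count):
    # Called by each node, e.g. every 10 seconds.
    zk.ensure_path(f"/search/loadGroup/{name}")
    zk.set(f"/search/loadGroup/{name}", str(load_average).encode())
    zk.ensure_path(f"/search/amountGroup/{name}")
    zk.set(f"/search/amountGroup/{name}", str(record_count).encode())
```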
  • when the hardware configurations parsed from the configGroup are the same, the total record count of each node is parsed from the amountGroup and the current service load of each search node is read from the loadGroup. If the current server load difference between the search nodes is within 10, data is distributed only according to the total record counts, with the amount delivered to each node in inverse order of its current total (for example, if the totals are in the ratio 5:4:3:1, the amounts of data delivered are in the ratio 1:3:4:5); there is also the situation where a newly added node holds a very small proportion of the data (for example, 100:89:1), in which case the current data is first distributed to the search node holding the least data.
  • when the hardware configurations of the search service nodes differ, for example when the numbers of CPU cores are in the ratio 5:4:2, the data is distributed according to the multiplier relationship 1.5:1.4:1.2; when the memory sizes are in the ratio 5:4:2, the data is likewise distributed according to 1.5:1.4:1.2; and SSD versus non-SSD nodes follow a 1.5:1 relationship (because using an SSD can increase the efficiency of loading data from disk by 30%-50%). The multipliers obtained from the different hardware configuration items are averaged to obtain the final proportional relationship, and the data is then distributed as in the first step, as sketched below.
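The fragment below sketches this hardware-ratio weighting. The mapping "ratio value n gives multiplier 1 + n/10" is inferred from the example figures (5:4:2 giving 1.5:1.4:1.2) and is an assumption, as are the 1.5x SSD bonus and the plain averaging of the three multipliers.

```python
# Illustrative sketch: turn per-node hardware figures into distribution proportions.

def distribution_proportions(nodes):
    """nodes: {name: {"cpu": relative cores, "mem": relative memory, "ssd": bool}}."""
    multipliers = {}
    for name, hw in nodes.items():
        cpu_m = 1 + hw["cpu"] / 10          # e.g. ratio value 5 -> multiplier 1.5
        mem_m = 1 + hw["mem"] / 10          # e.g. ratio value 4 -> multiplier 1.4
        ssd_m = 1.5 if hw["ssd"] else 1.0   # SSD loads data from disk 30%-50% faster
        multipliers[name] = (cpu_m + mem_m + ssd_m) / 3   # average over the hardware items
    total = sum(multipliers.values())
    return {name: m / total for name, m in multipliers.items()}

print(distribution_proportions({
    "node-a": {"cpu": 5, "mem": 5, "ssd": True},
    "node-b": {"cpu": 4, "mem": 4, "ssd": False},
    "node-c": {"cpu": 2, "mem": 2, "ssd": False},
}))
```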
  • for each search server, even if the hardware resource gap between the servers is relatively small and the data amount on each search node is not large, the load of a search server may still be relatively high because of certain hardware conditions or abnormal programs (for example, coding problems that cause deadlocks).
  • if such a problem occurs, the abnormal node should be excluded from distribution as far as possible: because of its high load, queries and data insertions may take a long time, and continuing to add or query data would drive its service load higher and higher until the service hangs. When a node's load average exceeds 1.5 times its number of CPU cores for 1 minute, the deletion of its corresponding registration information from the registerGroup, loadGroup, configGroup, and amountGroup is actively initiated, as sketched below.
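The following sketch illustrates this rule, reusing the znode layout assumed in the registration example above; the sampling interval and the load-sampling helper are assumptions.

```python
# Illustrative sketch: deregister a node whose load average has exceeded
# 1.5x its CPU core count for a full minute.
import time

def watch_node(zk, name, cpu_cores, sample_load, check_interval=10, window=60):
    """sample_load() returns the node's current load average (assumed helper)."""
    over_since = None
    while True:
        if sample_load() > 1.5 * cpu_cores:
            over_since = over_since or time.time()
            if time.time() - over_since >= window:
                # Remove the node from all four groups so no more data is routed to it.
                for group in ("registerGroup", "loadGroup", "configGroup", "amountGroup"):
                    path = f"/search/{group}/{name}"
                    if zk.exists(path):
                        zk.delete(path)
                return
        else:
            over_since = None
        time.sleep(check_interval)
```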
  • when a user initiates a query request, all available search nodes are first obtained from the registerGroup, the request parameters are sent to each of those nodes using asynchronous multi-threaded requests, and a 10-second timeout is set for each HTTP query request. A single search node in the cluster may take a long time to answer a query, which would make the whole request slow; setting a per-query timeout means that a query which overruns is dropped and only partial results are returned, improving search performance.
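A minimal sketch of this fan-out is shown below; the node URL scheme and the use of the `requests` library are assumptions made for illustration.

```python
# Illustrative sketch: query every available search node in parallel with a
# 10-second per-request timeout, returning whatever partial results come back.
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

def fan_out_query(search_nodes, params, timeout=10):
    results = []
    with ThreadPoolExecutor(max_workers=max(len(search_nodes), 1)) as pool:
        futures = {pool.submit(requests.get, f"http://{node}/search",
                               params=params, timeout=timeout): node
                   for node in search_nodes}
        for future in as_completed(futures):
            try:
                results.append(future.result().json())
            except Exception:
                # A slow or failed node is simply skipped; partial results are returned.
                pass
    return results
```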
  • it can be seen that, in the embodiment of the present application, the hardware resource configuration information reported by each of the P nodes is obtained to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information; the first data to be processed is obtained and divided according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block; and the P data blocks are distributed to the corresponding nodes among the P nodes for processing. In this way, the hardware resource configuration information reported by each of the P nodes can be determined, the data to be processed is divided into P shares according to that information and distributed to the corresponding nodes, data is allocated according to node performance, the processing capability of each node is fully exploited, and the data processing efficiency is improved.
  • FIG. 2 is a schematic flowchart of another embodiment of a data distribution method according to an embodiment of the present application.
  • the data distribution method described in this embodiment includes the following steps:
  • when the data amount of the second data to be processed is less than or equal to the sum of the Q upper limit processing data amounts, obtain the hardware resource configuration information of the Q new nodes, divide the second data to be processed according to the hardware resource configuration information of the Q new nodes to obtain Q data blocks, and distribute the Q data blocks to the corresponding nodes among the Q new nodes for processing.
  • in the embodiment of the present application, the hardware resource configuration information reported by each of the P nodes is obtained to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information; the first data to be processed is obtained and divided according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block, and the P data blocks are distributed to the corresponding nodes among the P nodes for processing; when Q new nodes are detected, the second data to be processed is obtained, where Q is a positive integer, and the upper limit processing data amount of each of the Q new nodes is estimated to obtain Q upper limit processing data amounts; when the data amount of the second data to be processed is greater than the sum of the Q upper limit processing data amounts, the second data to be processed is divided into a first data set and a second data set according to the Q upper limit processing data amounts, the first data set is allocated to the Q new nodes, and the second data set is allocated to the P nodes; when the data amount of the second data to be processed is less than or equal to the sum of the Q upper limit processing data amounts, the hardware resource configuration information of the Q new nodes is obtained, the second data to be processed is divided according to that information to obtain Q data blocks, and the Q data blocks are distributed to the corresponding nodes among the Q new nodes for processing. In this way, the upper limit processing data amount can be estimated as accurately as possible, which helps ensure system stability and improve system processing efficiency.
  • FIG. 3A is a schematic structural diagram of an embodiment of a data distribution device according to an embodiment of the present application.
  • the data distribution device described in this embodiment includes: a first acquisition unit 301, a second acquisition unit 302, a division unit 303, and a distribution unit 304, as follows:
  • the first obtaining unit 301 is configured to obtain the hardware resource configuration information reported by each of the P nodes to obtain P hardware resource configuration information, and each node corresponds to one piece of hardware resource configuration information;
  • the second obtaining unit 302 is configured to obtain the first data to be processed
  • the dividing unit 303 is configured to divide the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, and each hardware resource configuration information corresponds to a data block;
  • the distribution unit 304 is configured to distribute the P data blocks to corresponding nodes of the P nodes for processing.
  • with the above device, the hardware resource configuration information reported by each of the P nodes is obtained to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information; the first data to be processed is obtained and divided according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block; and the P data blocks are distributed to the corresponding nodes among the P nodes for processing. In this way, the hardware resource configuration information reported by each of the P nodes can be determined, the data to be processed is divided into P shares according to that information and distributed to the corresponding nodes, data is allocated according to node performance, the processing capability of each node is fully exploited, and the data processing efficiency is improved.
  • the first obtaining unit 301 may be used to implement the method described in step 101 above
  • the second obtaining unit 302 may be used to implement the method described in step 102 above
  • the dividing unit 303 may be used to implement the method described in step 103 above
  • the above-mentioned distribution unit 304 may be used to implement the method described in step 104 above, and so on.
  • the dividing unit 303 is specifically configured to:
  • the hardware resource configuration information includes: the number of cores, memory size, and load value of the central processor;
  • the dividing unit 303 is specifically configured to:
  • the hardware resource configuration information i is any one of the P pieces of hardware resource configuration information;
  • FIG. 3B shows yet another modified structure of the data distribution device described in FIG. 3A.
  • it may further include: an estimation unit 305 and a processing unit 306, specifically as follows:
  • the second obtaining unit 302 is further configured to obtain second to-be-processed data when Q new nodes are detected, where Q is a positive integer;
  • the estimation unit 305 is configured to estimate the upper limit processing data amount of each of the Q new nodes to obtain Q upper limit processing data amounts;
  • the processing unit 306 is configured to: when the data amount of the second data to be processed is greater than the sum of the Q upper limit processing data amounts, divide the second data to be processed into a first data set and a second data set according to the Q upper limit processing data amounts, allocate the first data set to the Q new nodes, and allocate the second data set to the P nodes; and, when the data amount of the second data to be processed is less than or equal to the sum of the Q upper limit processing data amounts, obtain the hardware resource configuration information of the Q new nodes, divide the second data to be processed according to the hardware resource configuration information of the Q new nodes to obtain Q data blocks, and distribute the Q data blocks to the corresponding nodes among the Q new nodes for processing.
  • FIG. 3C shows another modified structure of the data distribution device described in FIG. 3A. Compared with FIG. 3A, it may further include: an early warning unit 307, as follows:
  • the early warning unit 307 is configured to delete the node information corresponding to any node or send an alarm message to the administrator when it is detected that the load value of any one of the P nodes exceeds a preset threshold.
  • FIG. 4 is a schematic structural diagram of an embodiment of a server according to an embodiment of the present application.
  • the server described in this embodiment includes: at least one input device 1000; at least one output device 2000; at least one processor 3000, such as a CPU; and a memory 4000, where the input device 1000, the output device 2000, the processor 3000, and the memory 4000 are connected via a bus 5000.
  • the input device 1000 may specifically be a touch panel, physical buttons, or a mouse.
  • the above output device 2000 may specifically be a display screen.
  • the above-mentioned memory 4000 may be a high-speed RAM memory or a non-volatile memory, such as a magnetic disk memory.
  • the above memory 4000 is used to store a set of program codes, and the above input device 1000, output device 2000, and processor 3000 are used to call the program codes stored in the memory 4000, and perform the following operations:
  • the aforementioned processor 3000 is used for:
  • obtain the hardware resource configuration information reported by each of the P nodes to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information; obtain the first data to be processed; divide the first data to be processed according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block; and distribute the P data blocks to the corresponding nodes among the P nodes for processing, so that the hardware resource configuration information reported by each of the P nodes can be determined, the data to be processed can be divided into P shares according to that information and distributed to the corresponding nodes, data is allocated according to node performance, the processing capability of each node is fully exploited, and the data processing efficiency is improved.
  • the content specifically processed by the processor 3000 is reflected in the above data distribution method, and will not be repeated here.
  • An embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a program, and when the program is executed, it includes some or all steps of any one of the data distribution methods described in the foregoing method embodiments.
  • An embodiment of the present application provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause the computer to execute some or all of the steps of any of the data distribution methods described in the above method embodiments.
  • the computer program product may be a software installation package.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data distribution method and a related product. The method comprises: acquiring hardware resource configuration information reported by each of P nodes so as to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information (101); acquiring first data to be processed (102); dividing the first data to be processed according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block (103); and respectively distributing the P data blocks to corresponding nodes among the P nodes for processing (104). The method can improve the efficiency of data processing in a computer cluster.

Description

Data distribution method and related products
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on December 27, 2018, with application number 201811613722.9 and invention title "Data distribution method and related products", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of data distribution, and in particular to a data distribution method and related products.
Background
With the rapid development of electronic technology, computer cluster technology has also developed rapidly. A computer cluster can be simply understood as a cluster established by a server together with multiple nodes. If, during data processing, all data is stored evenly across the service nodes of the search cluster, the data processing efficiency of the computer cluster is reduced.
Summary of the Invention
The embodiments of the present application provide a data distribution method and related products, which can improve the data processing efficiency of a computer cluster.
A first aspect of the embodiments of the present application provides a data distribution method, including:
obtaining hardware resource configuration information reported by each of P nodes to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information;
obtaining first data to be processed;
dividing the first data to be processed according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block;
distributing the P data blocks to the corresponding nodes among the P nodes for processing.
Optionally, the method further includes:
when Q new nodes are detected, obtaining second data to be processed, where Q is a positive integer;
estimating an upper limit processing data amount of each of the Q new nodes to obtain Q upper limit processing data amounts;
when the data amount of the second data to be processed is greater than the sum of the Q upper limit processing data amounts, dividing the second data to be processed into a first data set and a second data set according to the Q upper limit processing data amounts, allocating the first data set to the Q new nodes, and allocating the second data set to the P nodes;
when the data amount of the second data to be processed is less than or equal to the sum of the Q upper limit processing data amounts, obtaining the hardware resource configuration information of the Q new nodes, dividing the second data to be processed according to the hardware resource configuration information of the Q new nodes to obtain Q data blocks, and distributing the Q data blocks to the corresponding nodes among the Q new nodes for processing.
Further optionally, estimating the upper limit processing data amount of each of the Q new nodes includes:
acquiring at least one child node of a new node j, where the new node j is any one of the Q new nodes;
determining a data processing upper limit value of the at least one child node to obtain at least one data processing upper limit value;
acquiring target hardware resource configuration information of the new node j;
determining, according to a mapping relationship between preset hardware resource configuration information and rated upper limit data processing amounts, a target rated upper limit data processing amount corresponding to the target hardware resource configuration information;
using the difference between the target rated upper limit data processing amount and the at least one data processing upper limit value as the upper limit data processing amount of the new node j.
A second aspect of the embodiments of the present application provides a data distribution device, including:
a first obtaining unit, configured to obtain hardware resource configuration information reported by each of P nodes to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information;
a second obtaining unit, configured to obtain first data to be processed;
a dividing unit, configured to divide the first data to be processed according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block;
a distribution unit, configured to distribute the P data blocks to the corresponding nodes among the P nodes for processing.
In a third aspect, an embodiment of the present application provides a server, including a processor, a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing the steps in the first aspect of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
In a fifth aspect, an embodiment of the present application provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
Implementing the embodiments of the present application has the following beneficial effects:
It can be seen that, with the data distribution method and related products described in the embodiments of the present application, the hardware resource configuration information reported by each of the P nodes is obtained to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information; the first data to be processed is obtained and divided according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block; and the P data blocks are distributed to the corresponding nodes among the P nodes for processing. In this way, the hardware resource configuration information reported by each of the P nodes can be determined, the data to be processed can be divided into P shares according to that information and distributed to the corresponding nodes, data is allocated according to node performance, the processing capability of each node is fully exploited, and the data processing efficiency is improved.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative work.
FIG. 1A is a schematic flowchart of an embodiment of a data distribution method provided by an embodiment of the present application;
FIG. 1B is a schematic diagram of a computer cluster provided by an embodiment of the present application;
FIG. 1C is a schematic structural diagram of a zookeeper provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of another embodiment of a data distribution method provided by an embodiment of the present application;
FIG. 3A is a schematic structural diagram of an embodiment of a data distribution device provided by an embodiment of the present application;
FIG. 3B is a schematic structural diagram of another embodiment of a data distribution device provided by an embodiment of the present application;
FIG. 3C is a schematic structural diagram of another embodiment of a data distribution device provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an embodiment of a data distribution device provided by an embodiment of the present application.
Detailed Description
Referring to FIG. 1A, FIG. 1A is a schematic flowchart of an embodiment of a data distribution method provided by an embodiment of the present application. The data distribution method described in this embodiment includes the following steps:
101. Obtain the hardware resource configuration information reported by each of P nodes to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information.
The hardware resource configuration information may be at least one of the following: the number of cores of the central processing unit (CPU), the memory size, the load value, whether the disk is a solid state drive (SSD), and so on, which is not limited herein.
Specifically, as shown in FIG. 1B, the server may establish communication connections with the P nodes, and the server may receive the hardware resource configuration information reported by the P nodes. Each node may report its hardware resource configuration information at a preset time interval, and the preset time interval may be set by the user or default to a system value.
102. Obtain first data to be processed.
The first data to be processed may be at least one of the following: images, files, signaling, video, voice signals, optical signals, text information, and so on, which is not limited herein.
In a specific implementation, the first data to be processed may come from a terminal, or may be data forwarded from a previous node or a next node. The first data to be processed may be real-time data or pre-stored data, which is not limited herein.
103. Divide the first data to be processed according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block.
Different hardware resource configuration information reflects different processing capabilities of the nodes. In the embodiments of the present application, based on the idea that a node with strong processing capability may process more data and a node with weak processing capability may process less data, the first data to be processed is divided according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block. In this way, the processing capability of each node is fully utilized, and the data processing efficiency can be improved.
Optionally, in step 103 above, dividing the first data to be processed according to the P pieces of hardware resource configuration information to obtain P data blocks may include the following steps:
21. Determine a performance evaluation value of each of the P nodes according to the P pieces of hardware resource configuration information to obtain P performance evaluation values;
22. Determine an allocation proportion value corresponding to each of the P nodes according to the P performance evaluation values to obtain P allocation proportion values, where the sum of the P allocation proportion values is 1;
23. Divide the first data to be processed according to the P allocation proportion values to obtain the P data blocks.
The hardware resource configuration information reflects the performance of a node to a certain extent. Therefore, the server may evaluate the performance of each of the P nodes according to the P pieces of hardware resource configuration information to obtain P performance evaluation values, and then determine the allocation proportion value corresponding to each of the P nodes according to the P performance evaluation values to obtain P allocation proportion values. Specifically, the sum of the P performance evaluation values may be calculated first; for any node, the allocation proportion value of the node equals the performance evaluation value of the node divided by the sum of the P performance evaluation values, and so on, so that the allocation proportion value of each of the P nodes can be obtained. Further, the first data to be processed is divided according to the P allocation proportion values to obtain the P data blocks. In this way, the data to be processed is allocated according to performance, which improves the operating efficiency of the entire system.
Further optionally, the hardware resource configuration information includes: the number of cores of the central processing unit, the memory size, and the load value;
In step 21 above, determining the performance evaluation value of each of the P nodes according to the P pieces of hardware resource configuration information to obtain P performance evaluation values may include the following steps:
211. Determine, according to a mapping relationship between preset numbers of cores and first evaluation values, a target first evaluation value corresponding to the number of cores in hardware resource configuration information i, where the hardware resource configuration information i is any one of the P pieces of hardware resource configuration information;
212. Determine, according to a mapping relationship between preset memory sizes and second evaluation values, a target second evaluation value corresponding to the memory size in the hardware resource configuration information i;
213. Determine, according to a mapping relationship between preset load values and third evaluation values, a target third evaluation value corresponding to the load value in the hardware resource configuration information i;
214. Obtain a first weight corresponding to the first evaluation value, a second weight corresponding to the second evaluation value, and a third weight corresponding to the third evaluation value, where the sum of the first weight, the second weight, and the third weight is 1;
215. Perform a weighted operation according to the target first evaluation value, the target second evaluation value, the target third evaluation value, the first weight, the second weight, and the third weight to obtain the evaluation value corresponding to the hardware resource configuration information i.
The first weight, the second weight, and the third weight may all be preset or default system values, and the sum of the first weight, the second weight, and the third weight is 1. The server may pre-store the mapping relationship between preset numbers of cores and first evaluation values, the mapping relationship between preset memory sizes and second evaluation values, and the mapping relationship between preset load values and third evaluation values. The first evaluation value, the second evaluation value, and the third evaluation value are numbers between 0 and 1, or between 0 and 100. As an example, a mapping relationship between the number of cores and the first evaluation value is provided below:

    Number of cores | First evaluation value
    1 | 0.2
    2 | 0.4
    3 | 0.6
    4 | 0.8
    >5 | 1

In a specific implementation, taking the hardware resource configuration information i as an example, where the hardware resource configuration information i is any one of the P pieces of hardware resource configuration information, if the hardware resource configuration information includes the number of cores of the central processing unit, the memory size, and the load value, then: the target first evaluation value corresponding to the number of cores in the hardware resource configuration information i may be determined according to the mapping relationship between preset numbers of cores and first evaluation values; the target second evaluation value corresponding to the memory size in the hardware resource configuration information i may be determined according to the mapping relationship between preset memory sizes and second evaluation values; the target third evaluation value corresponding to the load value in the hardware resource configuration information i may be determined according to the mapping relationship between preset load values and third evaluation values; the first weight corresponding to the first evaluation value, the second weight corresponding to the second evaluation value, and the third weight corresponding to the third evaluation value are obtained, where the sum of the first weight, the second weight, and the third weight is 1; and a weighted operation is performed according to the target first evaluation value, the target second evaluation value, the target third evaluation value, the first weight, the second weight, and the third weight to obtain the evaluation value corresponding to the hardware resource configuration information i, that is, the evaluation value corresponding to the hardware resource configuration information i = target first evaluation value × first weight + target second evaluation value × second weight + target third evaluation value × third weight. In this way, the performance of each node can be evaluated from multiple dimensions of the hardware resource configuration information, which improves the accuracy of node performance evaluation.
104、将所述P个数据块分别分发给所述P个节点中相应的节点进行处理。104. Distribute the P data blocks to corresponding nodes of the P nodes for processing.
其中,服务器在完成数据分配之后,可以将P个数据块分发给P个节点中相应的节点进行处理,如此,可以充分利用节点的性能,对数据进行差异分配,保证了性能强的节点可以适当多处理些数据,性能弱的节点可以处理相对少点数据,能够保证数据及时被处理,避免了数据积压情况,提升了数据处理效率。After the server completes the data distribution, the P data blocks can be distributed to the corresponding nodes of the P nodes for processing. In this way, the performance of the nodes can be fully utilized to distribute the data differentially, ensuring that the nodes with strong performance can be properly Process more data, and nodes with weak performance can process relatively few points of data, which can ensure that the data is processed in time, avoiding data backlog and improving data processing efficiency.
可选地,上述步骤104之后,还可以包括如下步骤:Optionally, after the above step 104, the following steps may also be included:
A1、检测到出现Q个新节点时,获取第二待处理数据,所述Q为正整数;A1. When Q new nodes are detected, obtain second to-be-processed data, where Q is a positive integer;
A2、预估所述Q个新节点中的每一新节点的上限处理数据量,得到Q个上限处理数据量;A2. Estimate the upper limit processing data volume of each of the Q new nodes to obtain Q upper limit processing data volume;
A3、在所述第二待处理数据的数据量大于所述Q个上限处理数据量的总和时,依据所述Q个上限处理数据量将所述第二待处理数据划分为第一数据集和第二数据集,将所述第一数据集由所述Q个新节点进行分配,将所述第二数据集由所述P个节点进行分配;A3. When the data amount of the second to-be-processed data is greater than the sum of the Q upper-limit processed data amounts, divide the second to-be-processed data into a first data set according to the Q upper-limit processed data amounts A second data set, the first data set is allocated by the Q new nodes, and the second data set is allocated by the P nodes;
A4、在所述第二待处理数据的数据量小于或等于所述Q个上限处理数据量的总和时,获取所述Q个新节点的硬件资源配置信息,并依据该Q个新节点的硬件资源配置信息将所述第二待处理数据进行划分,得到Q个数据块,将所述Q个数据块分别分发给所述Q个新节点相应的节点进行处理。A4. When the data amount of the second to-be-processed data is less than or equal to the sum of the Q upper limit processing data amounts, obtain hardware resource configuration information of the Q new nodes, and based on the hardware of the Q new nodes The resource configuration information divides the second data to be processed to obtain Q data blocks, and distributes the Q data blocks to nodes corresponding to the Q new nodes for processing.
其中,上述第二待处理数据可以为以下至少一种:图像、文件、信令、视频、语音信号、光信号、文本信息等等,在此不做限定。The second to-be-processed data may be at least one of the following: images, files, signaling, video, voice signals, optical signals, text information, etc., which are not limited herein.
具体实现中,第二待处理数据可以来自于终端,也可以来自于上一节点或者下一节点转发过来的数据。第二待处理数据可以为实时数据,或者,为预先存储的数据,在此不做限定。In a specific implementation, the second to-be-processed data may come from the terminal, or may be data forwarded from the previous node or the next node. The second data to be processed may be real-time data or pre-stored data, which is not limited herein.
其中,由上述可知,服务器可以与P个节点构成一个集群,当然,集群还可以添加其他节点。具体实现中,服务器可以检测集群中是否出现了新节点,若检测到出现Q个新节点,则可以获取第二待处理数据,Q为正整数。由于每个节点均能力有限,超过一定的数据处理量,则会出现卡顿或者死机现象,反而降低了数据处理效率。因此,本申请实施例中,可以预估Q个新节点中的每一新节点的上限处理数据量,得到Q个上限处理数据量,上限处理数据量可以理解为最多能够处理的数据量,在第二待处理数据的数据量大于Q个上限处理数据量的总和时,则说明光靠Q个新节点无法处理第二待处理数据,因此,可以依据Q个上限处理数据量将第二待处理数据划分为第一数据集和第二数据集,其中,第一数据集的数据量小于Q个上限处理数据量的总和,进而,可以 将第一数据集由Q个新节点进行分配,将第二数据集由所述P个节点进行分配,具体的分配方式可以参照上述步骤101-步骤104的具体描述,在此不再赘述。It can be known from the above that the server can form a cluster with P nodes, and of course, other nodes can be added to the cluster. In a specific implementation, the server can detect whether a new node appears in the cluster. If Q new nodes are detected, the second pending data can be obtained, and Q is a positive integer. Due to the limited capacity of each node, more than a certain amount of data processing, there will be stuck or crash phenomenon, but instead reduces the efficiency of data processing. Therefore, in the embodiment of the present application, the upper limit processing data amount of each of the Q new nodes can be estimated to obtain Q upper limit processing data amounts. The upper limit processing data amount can be understood as the maximum amount of data that can be processed. When the data amount of the second to-be-processed data is greater than the sum of the Q upper-limit processed data amounts, it means that the Q new nodes cannot process the second to-be-processed data alone. Therefore, the second to-be-processed data can be processed according to the Q upper-limit processed data amounts The data is divided into a first data set and a second data set, where the data amount of the first data set is less than the sum of the Q upper limit processing data amounts. Furthermore, the first data set can be allocated by Q new nodes The two data sets are allocated by the P nodes. For a specific allocation method, reference may be made to the specific description of steps 101 to 104 above, and details are not described herein again.
此外,在第二待处理数据的数据量小于或等于Q个上限处理数据量的总和时,则说明Q个新节点完全有能力处理完第二待处理数据,则可以优先考虑让Q个新节点处理第二待处理数据,进一步地,可以获取Q个新节点的硬件资源配置信息,并依据该Q个新节点的硬件资源配置信息将第二待处理数据进行划分,得到Q个数据块,将Q个数据块分别分发给Q个新节点相应的节点进行处理,具体的分配方式可以参照上述步骤101-步骤104的具体描述,在此不再赘述。如此,可以充分且快速加入新节点进行数据处理,还可以保证数据在节点之间合理分配,提升了数据处理效率。In addition, when the data volume of the second data to be processed is less than or equal to the sum of the Q upper limit processing data volume, it means that the Q new nodes are fully capable of processing the second data to be processed, and the Q new nodes may be given priority. Processing the second data to be processed, further, the hardware resource configuration information of the Q new nodes can be obtained, and the second data to be processed is divided according to the hardware resource configuration information of the Q new nodes to obtain Q data blocks. The Q data blocks are distributed to the corresponding nodes of the Q new nodes for processing. For the specific allocation method, reference may be made to the specific descriptions of the above steps 101-104, which will not be repeated here. In this way, new nodes can be fully and quickly added for data processing, and data can be reasonably distributed among the nodes, which improves data processing efficiency.
进一步可选地,上述步骤A2,预估所述Q个新节点中的每一新节点的上限处理数据量,可以包括如下步骤:Further optionally, in the above step A2, estimating the upper limit processing data amount of each of the Q new nodes may include the following steps:
A21、获取新节点j的至少一个子节点,所述新节点j为所述Q个新节点中的任一新节点;A21. Acquire at least one child node of a new node j, where the new node j is any one of the Q new nodes;
A22、确定所述至少一个子节点的数据处理上限值,得到至少一个数据处理上限值;A22. Determine the data processing upper limit value of the at least one child node to obtain at least one data processing upper limit value;
A23、获取所述新节点j的目标硬件资源配置信息;A23. Obtain target hardware resource configuration information of the new node j;
A24、按照预设的硬件资源配置信息与额定上限数据处理量之间的映射关系,确定所述目标硬件资源配置信息对应的目标额定上限数据处理量;A24. Determine the target rated upper limit data processing amount corresponding to the target hardware resource configuration information according to the mapping relationship between the preset hardware resource configuration information and the rated upper limit data processing amount;
A25、将所述目标额定上限数据处理量与所述至少一个数据处理上限值之间的差值作为所述新节点j的上限数据处理量。A25. The difference between the target rated upper limit data processing amount and the at least one data processing upper limit value is used as the upper limit data processing amount of the new node j.
其中,对于每一新节点而言,每一节点有可能是服务器,当然,节点也可以有一个或者多个子节点,子节点的话,则会与父节点之间会有一定的交互,因此,有可能也会占用一定的资源,因此,具体实现中,服务器可以获取新节点j的至少一个子节点,新节点j为Q个新节点中的任一新节点,确定所述至少一个子节点的数据处理上限值,得到至少一个数据处理上限值,每一节点的数 据处理上限值可以预先设置或者系统默认,服务器中还可以预先存储预设的硬件资源配置信息与额定上限数据处理量之间的映射关系,进而,在获取新节点j的目标硬件资源配置信息之后,可以按照预设的硬件资源配置信息与额定上限数据处理量之间的映射关系,确定目标硬件资源配置信息对应的目标额定上限数据处理量,将目标额定上限数据处理量与至少一个数据处理上限值之间的差值作为新节点j的上限数据处理量,上述额定上限数据处理量则可以理解为产品出厂之前,预先设置的最多能够处理的数据量,如此,可以在保证新节点与其子节点均能正常工作的基础上,最大限度地预估数据处理量上限,有利于充分保证系统的稳定性,提升系统处理效率。Among them, for each new node, each node may be a server. Of course, the node may also have one or more child nodes. If the child node, there will be some interaction with the parent node. Therefore, there is Certain resources may also be occupied. Therefore, in a specific implementation, the server may acquire at least one child node of the new node j, where the new node j is any new node among the Q new nodes, and determine the data of the at least one child node Process the upper limit value to obtain at least one data processing upper limit value. The data processing upper limit value of each node can be preset or the system default. The server can also store the preset hardware resource configuration information and the rated upper limit data processing amount in advance. After the target hardware resource configuration information of the new node j is acquired, the target corresponding to the target hardware resource configuration information can be determined according to the mapping relationship between the preset hardware resource configuration information and the rated upper limit data processing amount Rated upper limit data processing volume, the difference between the target rated upper limit data processing volume and at least one data processing upper limit value is taken as the upper limit data processing volume of the new node j. The above rated upper limit data processing volume can be understood as before the product leaves the factory. The maximum amount of data that can be processed in advance is set. In this way, the upper limit of the amount of data processing can be estimated to the maximum on the basis of ensuring that the new node and its child nodes can work normally, which is conducive to ensuring the stability of the system and improving system processing. effectiveness.
可选地,上述步骤104之后,还可以包括如下步骤:Optionally, after the above step 104, the following steps may also be included:
在检测到所述P个节点中任一节点的负载值超过预设阈值时,则删除该任一节点对应的节点信息,或者,向管理员发送报警信息。When it is detected that the load value of any one of the P nodes exceeds a preset threshold, the node information corresponding to the any node is deleted, or an alarm message is sent to the administrator.
其中,上述预设阈值可以由用户自行设置或者系统默认。服务器可以检测P个节点中任一节点的负载值,若负载值超过预设阈值,在可以删除该任一节点对应的节点信息,如此,可以保证系统其它节点的正常运行,另外,还可以向管理员发送报警信息,报警信息可以包括以下至少一种:节点位置、节点工作异常原因、异常等级等等,在此不做限定,如此,可以及时提醒管理员对该任一节点进行维护。The above-mentioned preset threshold can be set by the user or the system default. The server can detect the load value of any one of the P nodes. If the load value exceeds the preset threshold, the node information corresponding to any one of the nodes can be deleted. In this way, the normal operation of the other nodes of the system can be ensured. The administrator sends an alarm message. The alarm message may include at least one of the following: node location, node work abnormality, abnormality level, etc., which is not limited here, so the administrator can be promptly reminded to perform maintenance on any node.
另外,在具体实现中,由于传统计算机集群数据处理方式是将所有数据平均存放在搜索集群的各个服务节点中,,是按照该种做法,有以下几个缺点:1、由于每个节点的服务器CPU数、内存大小和是否SSD等硬件配置存在差异,导致每个搜索节点能够存储数据量和处理查询能力不一致,所以对出现资源得不到合理利用;2、如果按照平均分配数据的方式,如果查询需要等待所有搜索节点返回结果,可能存在配置低的搜索节点查询耗时拖慢整个服务的搜索服务;3、不能够进行节点的动态扩充,当数据量越来越大,新增搜索节点,存在数据冲刷的问题,增大维护成本,而本申请依据节点性能进行数据分配,保证了每个 节点均有效发挥处理能力,提升了数据处理效率。In addition, in the specific implementation, because the traditional computer cluster data processing method is to store all the data equally in each service node of the search cluster, according to this approach, there are the following disadvantages: 1. Due to the server of each node There are differences in the number of CPUs, memory size, and whether the hardware configuration such as SSD, resulting in inconsistency in the amount of data that each search node can store and the ability to process queries, so the resources used cannot be used reasonably; 2. If the data is distributed equally, if The query needs to wait for all search nodes to return results. There may be a search service with a low configuration search node query that slows down the entire service; 3. The dynamic expansion of the node cannot be performed. When the amount of data is increasing, new search nodes are added. There is a problem of data scouring, which increases maintenance costs, and this application allocates data according to the performance of the nodes, which ensures that each node can effectively exert its processing capacity and improve the efficiency of data processing.
举例说明下,以zookeeper(动物管理员)为例,ZooKeeper是一个分布式的,开放源码的分布式应用程序协调服务,如图1C所示,该zookeeper可以位于服务器中,该zookeeper至少可以包括如下几个模块:寄存器组(registerGroup),负载组(loadGroup),配置组(configGroup),数量组(amountGroup),可以将所有搜索节点注册到zookeeper(例:将其可以作为注册中心使用,后续进行数据分发和数据请求使用)将注册信息保存在registerGroup中,注册信息可以为以下至少一种:节点名称、节点类型、节点功能等等,在此不做限定,将所有搜索节点的硬件资源配置信息(如CPU核数,内存大小,是否SSD)注册到configGroup中,上报将节点当前运行的负载数(load average)每个10秒上报zookeeper存储在loadGroup中。上报当前每个搜索节点存储的记录总数存在amountGroup中。To illustrate, taking zookeeper as an example, ZooKeeper is a distributed, open source distributed application coordination service. As shown in FIG. 1C, the zookeeper can be located on the server. The zookeeper can include at least the following Several modules: register group (registerGroup), load group (loadGroup), configuration group (configGroup), quantity group (amountGroup), you can register all search nodes to zookeeper (for example: it can be used as a registration center, subsequent data (Distribution and data request use) Save the registration information in registerGroup. The registration information can be at least one of the following: node name, node type, node function, etc., which is not limited here, and the hardware resource configuration information of all search nodes ( For example, the number of CPU cores, memory size, and SSD) are registered in the configGroup, and the report reports the current number of nodes running load (loadaverage) every 10 seconds. The zookeeper is stored in the loadGroup. Report that the total number of records currently stored by each search node is stored in the amountGroup.
进一步地,当数据增加时,如果解析configGroup硬件配置相同,则解析amountGroup各个节点的记录总数,并加载loadGroup各个搜索节点当前的服务负载情况。如果当前各个节点搜索节点的服务器负载差距在10个数值以内,则只按照各个节点总数做数据分发,将每个节点总数按照比例(例5:4:3:1,则下发的数据量是1:3:4:5倍数下发,还存在一种情况当一个节点新增的时候出现一个节点比例非常低例100:89:1等,先将当前数据分发到最少数据的搜索节点)的关系进行分发。如果当各个搜索服务节点的配置有差异,比如CPU核数对比5:4:2则数据分发按照1.5:1.4:1.2倍的关系分发,内存比例5:4:2数据也是按照1.5:1.4:1.2倍的关系分发,是否SSD则需要按照1:1.5倍的关系(由于使用SSD能够提高30%-50%的从磁盘加载数据的效率),多个比例按照一个均值的方式求出一个硬件配置的比例关系,然后再按照第一步的方式再对数据进行分发。Further, when the data increases, if the configGroup hardware configuration is resolved to the same, the total number of records of each node of the amountGroup is parsed, and the current service load of each search node of the loadGroup is loaded. If the current server load gap of each node search node is within 10 values, only the total number of nodes is distributed, and the total number of each node is proportional (Example 5: 4:3:1, then the amount of data delivered is 1:3:4:5 multiple delivery, there is also a situation when a node is added when a node ratio is very low (for example, 100:89:1, etc., first distribute the current data to the least data search node) Relationship. If the configuration of each search service node is different, for example, the number of CPU cores is compared to 5:4:2, the data distribution is distributed according to the relationship of 1.5:1.4:1.2 times, and the data of the memory ratio 5:4:2 is also according to 1.5:1.4:1.2 The relationship is distributed twice, whether the SSD needs to be in accordance with the relationship of 1:1.5 times (because the use of SSD can increase the efficiency of loading data from disk by 30%-50%), multiple ratios are calculated according to an average method of a hardware configuration Proportional relationship, and then distribute the data according to the first step.
进一步地,对于各个搜索服务器负载情况,如果当前服务器硬件资源差距比较小,各个搜索节点数据总量差距不大时,可能因为一些硬件情况或者异常的程序(例:程序编码问题导致死锁)导致当前搜索服务器负载比较高,出现 这样的问题只能尽量剔除当前异常节点分发,由于负载较高可能会导致查询和增加数据耗时非常久,而且如果继续向其增加数据或者查询数据,会导致其服务负载越来越高直到服务挂掉。当数据负载数持续1分钟超过其CPU核数的1.5倍时,主动发起从registerGroup,loadGroup,configGroup,amountGroup删除对应注册信息。Further, for the load of each search server, if the current server hardware resource gap is relatively small, and the data amount of each search node is not large, it may be caused by some hardware conditions or abnormal programs (for example: program coding problems cause deadlocks) The current search server load is relatively high. If such a problem occurs, the current abnormal node distribution can be eliminated as much as possible. Due to the high load, it may take a long time to query and increase data, and if you continue to add data or query data, it will cause it The service load is getting higher and higher until the service hangs. When the number of data loads exceeds 1.5 times the number of its CPU cores for 1 minute, it actively initiates the deletion of the corresponding registration information from registerGroup, loadGroup, configGroup, and amountGroup.
进一步地,当按照上面上面的情况进行切分数据到不同服务节点对应的数据集合表list中,然后从registerGroup中获取所有当前能够正常使用的数据节点,将对应节点list集合数据批量条件到对应的搜索节点中。使用异步多线程同时进行添加操作,每个线程对应一个超文本传输协议http批量添加一个搜索节点。Further, when the data is divided into the data collection table list corresponding to different service nodes according to the above situation, then all data nodes that can be normally used are obtained from the registerGroup, and the batch data of the corresponding node list collection conditions are matched to the corresponding Search node. Asynchronous multithreading is used to add operations simultaneously, and each thread corresponds to a hypertext transfer protocol http to add a search node in batches.
进一步地,当用户发起查询请求时,先根据registerGroup中获取所有的可以使用的搜索节点,使用异步多线程请求将请求参数发到每个请求节点中,对于每个http请求设置查询10秒钟查询超时时间,可能存在集群中某个搜索节点查询耗时非常久,这样导致整个请求耗时比较久,所以设置单个查询的超时时间是以损失查询只返回部分结果来提高搜索性能。Further, when the user initiates a query request, first obtain all available search nodes from the registerGroup, use asynchronous multi-threaded requests to send the request parameters to each request node, and set the query for each HTTP request for 10 seconds. Overtime, there may be a long time for a search node query in the cluster, which results in a long time for the entire request, so setting a single query timeout time is to lose the query and return only part of the results to improve search performance.
进一步地,在服务中当发现当前搜索负责的节点出现负载均值(load average)持续1分钟超过其CPU核数的1.5倍,除了删除registerGroup,loadGroup,configGroup,amountGroup中对应节点信息,还需要能够主动发送邮件或者短信(例:调用邮件和短信服务接口)通知对应运维人员,及时查看解决问题,是的对应信息重新注册到zookeeper的各个group主题中,保证服务能正常提供服务。Further, when it is found in the service that the node responsible for the current search has a load average (load average) lasting more than 1.5 times its CPU core number for 1 minute, in addition to deleting the corresponding node information in registerGroup, loadGroup, configGroup, amountGroup, it also needs to be able to take the initiative Send emails or text messages (for example, call the mail and SMS service interface) to notify the corresponding operation and maintenance personnel, timely check and solve the problem, and the corresponding information is re-registered in each group theme of zookeeper to ensure that the service can provide services normally.
可以看出,通过本申请实施例所描述的数据分配方法,获取P个节点中每一节点上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息,获取第一待处理数据,依据P个硬件资源配置信息将第一待处理数据进行划分,得到P个数据块,每一硬件资源配置信息对应一数据块,将P个数据块分别分发给P个节点中相应的节点进行处理,如此,能够确定P个节点每个上报节点的硬件资源配置信息,并依据该硬件资源配置信息 将待处理数据划分为P份,并分发给相应的节点,实现了依据节点性能进行数据分配,充分发挥了每一节点的处理能力,提升了数据处理效率。It can be seen that, through the data distribution method described in the embodiment of the present application, the hardware resource configuration information reported by each of the P nodes is obtained to obtain the P hardware resource configuration information, and each node corresponds to a hardware resource configuration information to obtain The first data to be processed divides the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, each hardware resource configuration information corresponds to a data block, and the P data blocks are distributed to P nodes The corresponding nodes in the process are processed, so that the hardware resource configuration information of each reporting node of P nodes can be determined, and the data to be processed is divided into P shares according to the hardware resource configuration information, and distributed to the corresponding nodes, which realizes the basis The performance of the nodes is used for data distribution, which fully exerts the processing power of each node and improves the efficiency of data processing.
与上述一致地,请参阅图2,为本申请实施例提供的一种数据分配方法的实施例流程示意图。本实施例中所描述的数据分配方法,包括以下步骤:Consistent with the above, please refer to FIG. 2, which is a schematic flowchart of an embodiment of a data distribution method according to an embodiment of the present application. The data distribution method described in this embodiment includes the following steps:
201、获取P个节点中每一节点上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息。201. Obtain hardware resource configuration information reported by each of the P nodes to obtain P hardware resource configuration information, and each node corresponds to one piece of hardware resource configuration information.
202、获取第一待处理数据。202. Obtain first data to be processed.
203、依据所述P个硬件资源配置信息将所述第一待处理数据进行划分,得到P个数据块,每一硬件资源配置信息对应一数据块。203. Divide the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, and each hardware resource configuration information corresponds to a data block.
204、将所述P个数据块分别分发给所述P个节点中相应的节点进行处理。204. Distribute the P data blocks to corresponding nodes of the P nodes for processing.
205、检测到出现Q个新节点时,获取第二待处理数据,所述Q个正整数。205. When Q new nodes are detected, obtain second to-be-processed data, the Q positive integers.
206、预估所述Q个新节点中的每一新节点的上限处理数据量,得到Q个上限处理数据量。206. Estimate the upper limit processing data amount of each of the Q new nodes to obtain Q upper limit processing data amounts.
207、在所述第二待处理数据的数据量大于所述Q个上限处理数据量的总和时,依据所述Q个上限处理数据量将所述第二待处理数据划分为第一数据集和第二数据集,将所述第一数据集由所述Q个新节点进行分配,将所述第二数据集由所述P个节点进行分配。207. When the data amount of the second to-be-processed data is greater than the sum of the Q upper-limit processed data amounts, divide the second to-be-processed data into the first data set according to the Q upper-limit processed data amounts In the second data set, the first data set is allocated by the Q new nodes, and the second data set is allocated by the P nodes.
208、在所述第二待处理数据的数据量小于或等于所述Q个上限处理数据量的总和时,获取所述Q个新节点的硬件资源配置信息,并依据该Q个新节点的硬件资源配置信息将所述第二待处理数据进行划分,得到Q个数据块,将所述Q个数据库分别分发给所述Q个新节点相应的节点进行处理。208. When the data amount of the second to-be-processed data is less than or equal to the sum of the Q upper limit processed data amounts, obtain hardware resource configuration information of the Q new nodes, and based on the hardware of the Q new nodes The resource configuration information divides the second data to be processed to obtain Q data blocks, and distributes the Q databases to nodes corresponding to the Q new nodes for processing.
其中,上述步骤201-步骤208所描述的数据分配方法可参考图1A所描述的数据分配方法的对应步骤。For the data distribution method described in the above steps 201-208, reference may be made to the corresponding steps of the data distribution method described in FIG. 1A.
可以看出,通过本申请实施例所描述的数据分配方法,获取P个节点中每一节点上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息,获取第一待处理数据,依据P个硬件资源配置信息将 第一待处理数据进行划分,得到P个数据块,每一硬件资源配置信息对应一数据块,将P个数据块分别分发给P个节点中相应的节点进行处理,检测到出现Q个新节点时,获取第二待处理数据,Q个正整数,预估Q个新节点中的每一新节点的上限处理数据量,得到Q个上限处理数据量,在第二待处理数据的数据量大于Q个上限处理数据量的总和时,依据Q个上限处理数据量将第二待处理数据划分为第一数据集和第二数据集,将第一数据集由Q个新节点进行分配,将第二数据集由P个节点进行分配,在第二待处理数据的数据量小于或等于Q个上限处理数据量的总和时,获取Q个新节点的硬件资源配置信息,并依据该Q个新节点的硬件资源配置信息将第二待处理数据进行划分,得到Q个数据块,将Q个数据库分别分发给Q个新节点相应的节点进行处理如此,能够确定P个节点每个上报节点的硬件资源配置信息,并依据该硬件资源配置信息将待处理数据划分为P份,并分发给相应的节点,实现了依据节点性能进行数据分配,充分发挥了每一节点的处理能力,可以在保证新节点与其子节点均能正常工作的基础上,最大限度地预估数据处理量上限,有利于充分保证系统的稳定性,提升系统处理效率。It can be seen that, through the data distribution method described in the embodiment of the present application, the hardware resource configuration information reported by each of the P nodes is obtained to obtain the P hardware resource configuration information, and each node corresponds to a hardware resource configuration information to obtain The first data to be processed divides the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, each hardware resource configuration information corresponds to a data block, and the P data blocks are distributed to P nodes The corresponding node in the process is processed, and when Q new nodes are detected, the second data to be processed is obtained, Q positive integers, and the upper limit processing data amount of each new node in the Q new nodes is estimated to obtain Q upper limits Processing data volume, when the data volume of the second to-be-processed data is greater than the sum of the Q upper-limit processing data volumes, the second to-be-processed data is divided into the first data set and the second data set according to the Q upper-limit processing data volume, and The first data set is allocated by Q new nodes, and the second data set is allocated by P nodes. When the data amount of the second data to be processed is less than or equal to the sum of the Q upper limit processing data amounts, Q new nodes are acquired The hardware resource configuration information of the node, and the second data to be processed is divided according to the hardware resource configuration information of the Q new nodes to obtain Q data blocks, and Q databases are distributed to the corresponding nodes of the Q new nodes for processing In this way, it is possible to determine the hardware resource configuration information of each reported node of P nodes, and divide the data to be processed into P shares according to the hardware resource configuration information, and distribute to the corresponding nodes, to achieve data distribution according to the performance of the node, fully By exerting the processing power of each node, on the basis of ensuring that the new node and its child nodes can work normally, the upper limit of the amount of data processing can be estimated to the greatest extent, which is conducive to fully ensuring the stability of the system and improving the processing efficiency of the system.
与上述一致地,以下为实施上述数据分配方法的装置,具体如下:Consistent with the above, the following is an apparatus for implementing the above data distribution method, specifically as follows:
请参阅图3,为本申请实施例提供的一种数据分配装置的实施例结构示意图。本实施例中所描述的数据分配装置,包括:第一获取单元301、第二获取单元302、划分单元303和分发单元304,具体如下:Please refer to FIG. 3, which is a schematic structural diagram of an embodiment of a data distribution device according to an embodiment of the present application. The data distribution device described in this embodiment includes: a first acquisition unit 301, a second acquisition unit 302, a division unit 303, and a distribution unit 304, as follows:
第一获取单元301,用于获取P个节点中每一节点上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息;The first obtaining unit 301 is configured to obtain the hardware resource configuration information reported by each of the P nodes to obtain P hardware resource configuration information, and each node corresponds to one piece of hardware resource configuration information;
第二获取单元302,用于获取第一待处理数据;The second obtaining unit 302 is configured to obtain the first data to be processed;
划分单元303,用于依据所述P个硬件资源配置信息将所述第一待处理数据进行划分,得到P个数据块,每一硬件资源配置信息对应一数据块;The dividing unit 303 is configured to divide the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, and each hardware resource configuration information corresponds to a data block;
分发单元304,用于将所述P个数据块分别分发给所述P个节点中相应的节点进行处理。The distribution unit 304 is configured to distribute the P data blocks to corresponding nodes of the P nodes for processing.
可以看出,通过本申请实施例所描述的数据分配装置,获取P个节点中每一节点上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息,获取第一待处理数据,依据P个硬件资源配置信息将第一待处理数据进行划分,得到P个数据块,每一硬件资源配置信息对应一数据块,将P个数据块分别分发给P个节点中相应的节点进行处理,如此,能够确定P个节点每个上报节点的硬件资源配置信息,并依据该硬件资源配置信息将待处理数据划分为P份,并分发给相应的节点,实现了依据节点性能进行数据分配,充分发挥了每一节点的处理能力,提升了数据处理效率。It can be seen that, through the data distribution device described in the embodiment of the present application, the hardware resource configuration information reported by each of the P nodes is obtained to obtain the P hardware resource configuration information, and each node corresponds to a hardware resource configuration information to obtain The first data to be processed divides the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, each hardware resource configuration information corresponds to a data block, and the P data blocks are distributed to P nodes The corresponding nodes in the process are processed, so that the hardware resource configuration information of each reporting node of P nodes can be determined, and the data to be processed is divided into P shares according to the hardware resource configuration information, and distributed to the corresponding nodes, which realizes the basis The performance of nodes is used for data distribution, which fully exerts the processing power of each node and improves the efficiency of data processing.
其中,上述第一获取单元301可用于实现上述步骤101所描述的方法,第二获取单元302可用于实现上述步骤102所描述的方法,上述划分单元303可用于实现上述步骤103所描述的方法,上述分发单元304可用于实现上述步骤104所描述的方法,以下如此类推。Among them, the first obtaining unit 301 may be used to implement the method described in step 101 above, the second obtaining unit 302 may be used to implement the method described in step 102 above, and the dividing unit 303 may be used to implement the method described in step 103 above, The above-mentioned distribution unit 304 may be used to implement the method described in step 104 above, and so on.
在一个可能的示例中,在所述依据所述P个硬件资源配置信息将所述第一待处理数据进行划分,得到P个数据块方面,所述划分单元303具体用于:In a possible example, in terms of dividing the first data to be processed according to the P pieces of hardware resource configuration information to obtain P pieces of data, the dividing unit 303 is specifically configured to:
依据所述P个硬件资源配置信息确定所述P个节点中每一节点的性能评价值,得到P个性能评价值;Determine the performance evaluation value of each of the P nodes according to the P hardware resource configuration information, and obtain P performance evaluation values;
依据所述P个性能评价值确定所述P个节点中每一节点对应的分配比例值,得到P个分配比例值,所述P个分配比例值之和为1;Determining the allocation ratio value corresponding to each of the P nodes according to the P performance evaluation values, to obtain P allocation ratio values, and the sum of the P allocation ratio values is 1;
依据所述P个分配比例值将所述第一待处理数据进行划分,得到所述P个数据块。Divide the first data to be processed according to the P allocation ratio values to obtain the P data blocks.
在一个可能的示例中,所述硬件资源配置信息包括:中央处理器的核数、内存大小和负载值;In a possible example, the hardware resource configuration information includes: the number of cores, memory size, and load value of the central processor;
在所述依据所述P个硬件资源配置信息确定所述P个节点中每一节点的性能评价值,得到P个性能评价值方面,所述划分单元303具体用于:In terms of determining the performance evaluation value of each of the P nodes according to the P hardware resource configuration information, and obtaining P performance evaluation values, the dividing unit 303 is specifically configured to:
按照预设的核数与第一评价值之间的映射关系,确定硬件资源配置信息i中的核数对应的目标第一评价值,所述硬件资源配置信息i为所述P个硬件配置 资源信息中的任一硬件资源配置信息;Determine the target first evaluation value corresponding to the core number in the hardware resource configuration information i according to the mapping relationship between the preset core number and the first evaluation value, the hardware resource configuration information i is the P hardware configuration resources Any hardware resource configuration information in the information;
按照预设的内存大小与第二评价值之间的映射关系,确定所述硬件资源配置信息i中的内存大小对应的目标第二评价值;Determine the target second evaluation value corresponding to the memory size in the hardware resource configuration information i according to the mapping relationship between the preset memory size and the second evaluation value;
按照预设的负载值与第三评价值之间的映射关系,确定所述硬件资源配置信息i中的负载值对应的目标第三评价值;Determine the target third evaluation value corresponding to the load value in the hardware resource configuration information i according to the mapping relationship between the preset load value and the third evaluation value;
获取所述第一评价值对应的第一权值、所述第二评价值对应的第二权值以及所述第三评价值对应的第三权值,所述第一权值、所述第二权值与所述第三权值之和为1;Acquiring a first weight value corresponding to the first evaluation value, a second weight value corresponding to the second evaluation value, and a third weight value corresponding to the third evaluation value, the first weight value, the third weight value The sum of the second weight and the third weight is 1;
依据所述目标第一评价值、所述目标第二评价值、所述目标第三评价值、所述第一权值、所述第二权值和所述第三权值进行加权运算,得到所述硬件资源配置信息i对应的评价值。Performing a weighted operation according to the target first evaluation value, the target second evaluation value, the target third evaluation value, the first weight value, the second weight value, and the third weight value to obtain The evaluation value corresponding to the hardware resource configuration information i.
可选地,如图3B所示,图3B示出了图3A所描述的数据分配装置的又一变型结构,其与图3A相比较,还可以包括:预估单元305和处理单元306,具体如下:Optionally, as shown in FIG. 3B, FIG. 3B shows yet another modified structure of the data distribution device described in FIG. 3A. Compared with FIG. 3A, it may further include: an estimation unit 305 and a processing unit 306, specifically as follows:
第二获取单元302,还用于检测到出现Q个新节点时,获取第二待处理数据,所述Q为正整数;The second obtaining unit 302 is further configured to obtain second to-be-processed data when Q new nodes are detected, where Q is a positive integer;
预估单元305,用于预估所述Q个新节点中的每一新节点的上限处理数据量,得到Q个上限处理数据量;The estimation unit 305 is configured to estimate the upper limit processing data amount of each of the Q new nodes to obtain Q upper limit processing data amounts;
处理单元306,还用于在所述第二待处理数据的数据量大于所述Q个上限处理数据量的总和时,依据所述Q个上限处理数据量将所述第二待处理数据划分为第一数据集和第二数据集,将所述第一数据集由所述Q个新节点进行分配,将所述第二数据集由所述P个节点进行分配;以及在所述第二待处理数据的数据量小于或等于所述Q个上限处理数据量的总和时,获取所述Q个新节点的硬件资源配置信息,并依据该Q个新节点的硬件资源配置信息将所述第二待处理数据进行划分,得到Q个数据块,将所述Q个数据块分别分发给所述Q个新节点相应的节点进行处理。The processing unit 306 is further configured to divide the second to-be-processed data into the second to-be-processed data according to the Q upper-limit processed data amounts when the data amount of the second to-be-processed data is greater than the sum of the Q upper-limit processed data amounts A first data set and a second data set, the first data set is allocated by the Q new nodes, and the second data set is allocated by the P nodes; and in the second When the data amount of the processed data is less than or equal to the sum of the Q upper limit processed data amounts, the hardware resource configuration information of the Q new nodes is obtained, and the second The data to be processed is divided to obtain Q data blocks, and the Q data blocks are distributed to nodes corresponding to the Q new nodes for processing.
可选地,如图3C所示,图3C示出了图3A所描述的数据分配装置的又一变型结构,其与图3A相比较,还可以包括:预警单元307,具体如下:Optionally, as shown in FIG. 3C, FIG. 3C shows another modified structure of the data distribution device described in FIG. 3A. Compared with FIG. 3A, it may further include: an early warning unit 307, as follows:
所述预警单元307,用于在检测到所述P个节点中任一节点的负载值超过预设阈值时,则删除该任一节点对应的节点信息,或者,向管理员发送报警信息。The early warning unit 307 is configured to delete the node information corresponding to any node or send an alarm message to the administrator when it is detected that the load value of any one of the P nodes exceeds a preset threshold.
可以理解的是,本实施例的数据分配装置的各程序模块的功能可根据上述方法实施例中的方法具体实现,其具体实现过程可以参照上述方法实施例的相关描述,此处不再赘述。It can be understood that the functions of each program module of the data distribution apparatus of this embodiment may be specifically implemented according to the method in the above method embodiment, and the specific implementation process may refer to the related description of the above method embodiment, which will not be repeated here.
与上述一致地,请参阅图4,为本申请实施例提供的一种服务器的实施例结构示意图。本实施例中所描述的服务器,包括:至少一个输入设备1000;至少一个输出设备2000;至少一个处理器3000,例如CPU;和存储器4000,上述输入设备1000、输出设备2000、处理器3000和存储器4000通过总线5000连接。Consistent with the above, please refer to FIG. 4, which is a schematic structural diagram of an embodiment of a server according to an embodiment of the present application. The server described in this embodiment includes: at least one input device 1000; at least one output device 2000; at least one processor 3000, such as a CPU; and memory 4000, the above input device 1000, output device 2000, processor 3000, and memory 4000 is connected via bus 5000.
其中,上述输入设备1000具体可为触控面板、物理按键或者鼠标。The input device 1000 may specifically be a touch panel, physical buttons, or a mouse.
上述输出设备2000具体可为显示屏。The above output device 2000 may specifically be a display screen.
上述存储器4000可以是高速RAM存储器,也可为非易失存储器(non-volatile memory),例如磁盘存储器。上述存储器4000用于存储一组程序代码,上述输入设备1000、输出设备2000和处理器3000用于调用存储器4000中存储的程序代码,执行如下操作:The above-mentioned memory 4000 may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), such as a magnetic disk memory. The above memory 4000 is used to store a set of program codes, and the above input device 1000, output device 2000, and processor 3000 are used to call the program codes stored in the memory 4000, and perform the following operations:
上述处理器3000,用于:The aforementioned processor 3000 is used for:
获取P个节点中每一节点上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息;Obtain the hardware resource configuration information reported by each of the P nodes to obtain P hardware resource configuration information, and each node corresponds to one hardware resource configuration information;
获取第一待处理数据;Obtain the first data to be processed;
依据所述P个硬件资源配置信息将所述第一待处理数据进行划分,得到P个数据块,每一硬件资源配置信息对应一数据块;Divide the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, and each hardware resource configuration information corresponds to a data block;
将所述P个数据块分别分发给所述P个节点中相应的节点进行处理。Distribute the P data blocks to corresponding nodes of the P nodes for processing.
可以看出,通过本申请实施例所描述的服务器,获取P个节点中每一节点 上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息,获取第一待处理数据,依据P个硬件资源配置信息将第一待处理数据进行划分,得到P个数据块,每一硬件资源配置信息对应一数据块,将P个数据块分别分发给P个节点中相应的节点进行处理,如此,能够确定P个节点每个上报节点的硬件资源配置信息,并依据该硬件资源配置信息将待处理数据划分为P份,并分发给相应的节点,实现了依据节点性能进行数据分配,充分发挥了每一节点的处理能力,提升了数据处理效率。It can be seen that through the server described in the embodiment of the present application, the hardware resource configuration information reported by each of the P nodes is obtained to obtain the P hardware resource configuration information, and each node corresponds to a piece of hardware resource configuration information to obtain the first For the data to be processed, divide the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, each hardware resource configuration information corresponds to a data block, and distribute the P data blocks to P nodes Process, so that it can determine the hardware resource configuration information of each reported node of P nodes, and divide the data to be processed into P shares according to the hardware resource configuration information, and distribute to the corresponding nodes, according to the performance of the node Carry out data distribution, give full play to the processing power of each node, and improve the efficiency of data processing.
上述处理器3000具体用于处理的内容在上述数据分配方法均有体现,在此不再赘述。The content specifically processed by the processor 3000 is reflected in the above data distribution method, and will not be repeated here.
本申请实施例还提供一种计算机存储介质,其中,该计算机存储介质可存储有程序,该程序执行时包括上述方法实施例中记载的任何一种数据分配方法的部分或全部步骤。An embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a program, and when the program is executed, it includes some or all steps of any one of the data distribution methods described in the foregoing method embodiments.
本申请实施例提供了一种计算机程序产品,其中,上述计算机程序产品包括存储了计算机程序的非瞬时性计算机可读存储介质,上述计算机程序可操作来使计算机执行如本申请实施例上述任一种数据分配方法中所描述的部分或全部步骤。该计算机程序产品可以为一个软件安装包。An embodiment of the present application provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium that stores the computer program, and the computer program is operable to cause the computer to execute any of the above described embodiments of the present application Or all the steps described in this data distribution method. The computer program product may be a software installation package.

Claims (10)

  1. 一种数据分配方法,其特征在于,包括:A data distribution method, characterized in that it includes:
    获取P个节点中每一节点上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息;Obtain the hardware resource configuration information reported by each of the P nodes to obtain P hardware resource configuration information, and each node corresponds to one hardware resource configuration information;
    获取第一待处理数据;Obtain the first data to be processed;
    依据所述P个硬件资源配置信息将所述第一待处理数据进行划分,得到P个数据块,每一硬件资源配置信息对应一数据块;Divide the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, and each hardware resource configuration information corresponds to a data block;
    将所述P个数据块分别分发给所述P个节点中相应的节点进行处理。Distribute the P data blocks to corresponding nodes of the P nodes for processing.
  2. 根据权利要求1所述的方法,其特征在于,所述依据所述P个硬件资源配置信息将所述第一待处理数据进行划分,得到P个数据块,包括:The method according to claim 1, wherein the dividing the first data to be processed according to the P pieces of hardware resource configuration information to obtain P pieces of data includes:
    依据所述P个硬件资源配置信息确定所述P个节点中每一节点的性能评价值,得到P个性能评价值;Determine the performance evaluation value of each of the P nodes according to the P hardware resource configuration information, and obtain P performance evaluation values;
    依据所述P个性能评价值确定所述P个节点中每一节点对应的分配比例值,得到P个分配比例值,所述P个分配比例值之和为1;Determining the allocation ratio value corresponding to each of the P nodes according to the P performance evaluation values, to obtain P allocation ratio values, and the sum of the P allocation ratio values is 1;
    依据所述P个分配比例值将所述第一待处理数据进行划分,得到所述P个数据块。Divide the first data to be processed according to the P allocation ratio values to obtain the P data blocks.
  3. 根据权利要求2所述的方法,其特征在于,所述硬件资源配置信息包括:中央处理器的核数、内存大小和负载值;The method according to claim 2, wherein the hardware resource configuration information includes: the number of cores, memory size, and load value of the central processor;
    所述依据所述P个硬件资源配置信息确定所述P个节点中每一节点的性能评价值,得到P个性能评价值,包括:The determining the performance evaluation value of each of the P nodes according to the P hardware resource configuration information to obtain P performance evaluation values includes:
    按照预设的核数与第一评价值之间的映射关系,确定硬件资源配置信息i中的核数对应的目标第一评价值,所述硬件资源配置信息i为所述P个硬件配置资源信息中的任一硬件资源配置信息;Determine the target first evaluation value corresponding to the core number in the hardware resource configuration information i according to the mapping relationship between the preset core number and the first evaluation value, the hardware resource configuration information i is the P hardware configuration resources Any hardware resource configuration information in the information;
    按照预设的内存大小与第二评价值之间的映射关系,确定所述硬件资源配置信息i中的内存大小对应的目标第二评价值;Determine the target second evaluation value corresponding to the memory size in the hardware resource configuration information i according to the mapping relationship between the preset memory size and the second evaluation value;
    按照预设的负载值与第三评价值之间的映射关系,确定所述硬件资源配置信息i中的负载值对应的目标第三评价值;Determine the target third evaluation value corresponding to the load value in the hardware resource configuration information i according to the mapping relationship between the preset load value and the third evaluation value;
    获取所述第一评价值对应的第一权值、所述第二评价值对应的第二权值以及所述第三评价值对应的第三权值,所述第一权值、所述第二权值与所述第三权值之和为1;Acquiring a first weight value corresponding to the first evaluation value, a second weight value corresponding to the second evaluation value, and a third weight value corresponding to the third evaluation value, the first weight value, the third weight value The sum of the second weight and the third weight is 1;
    依据所述目标第一评价值、所述目标第二评价值、所述目标第三评价值、所述第一权值、所述第二权值和所述第三权值进行加权运算,得到所述硬件资源配置信息i对应的评价值。Performing a weighted operation according to the target first evaluation value, the target second evaluation value, the target third evaluation value, the first weight value, the second weight value, and the third weight value to obtain The evaluation value corresponding to the hardware resource configuration information i.
  4. 根据权利要求1-3任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1-3, wherein the method further comprises:
    检测到出现Q个新节点时,获取第二待处理数据,所述Q为正整数;When Q new nodes are detected, the second data to be processed is obtained, and the Q is a positive integer;
    预估所述Q个新节点中的每一新节点的上限处理数据量,得到Q个上限处理数据量;Estimating the upper limit processing data volume of each of the Q new nodes to obtain Q upper limit processing data volume;
    在所述第二待处理数据的数据量大于所述Q个上限处理数据量的总和时,依据所述Q个上限处理数据量将所述第二待处理数据划分为第一数据集和第二数据集,将所述第一数据集由所述Q个新节点进行分配,将所述第二数据集由所述P个节点进行分配;When the data amount of the second to-be-processed data is greater than the sum of the Q upper limit processing data amounts, the second to-be-processed data is divided into a first data set and a second according to the Q upper limit processing data amounts A data set, the first data set is allocated by the Q new nodes, and the second data set is allocated by the P nodes;
    在所述第二待处理数据的数据量小于或等于所述Q个上限处理数据量的总和时,获取所述Q个新节点的硬件资源配置信息,并依据该Q个新节点的硬件资源配置信息将所述第二待处理数据进行划分,得到Q个数据块,将所述Q个数据块分别分发给所述Q个新节点相应的节点进行处理。When the data amount of the second to-be-processed data is less than or equal to the sum of the Q upper limit processed data amounts, obtain the hardware resource configuration information of the Q new nodes, and according to the hardware resource configuration of the Q new nodes The information divides the second data to be processed to obtain Q data blocks, and distributes the Q data blocks to nodes corresponding to the Q new nodes for processing.
  5. 根据权利要求1-3任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1-3, wherein the method further comprises:
    在检测到所述P个节点中任一节点的负载值超过预设阈值时,则删除该任一节点对应的节点信息,或者,向管理员发送报警信息。When it is detected that the load value of any one of the P nodes exceeds a preset threshold, the node information corresponding to the any node is deleted, or an alarm message is sent to the administrator.
  6. 一种数据分配装置,其特征在于,包括:A data distribution device, characterized in that it includes:
    第一获取单元,用于获取P个节点中每一节点上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息;The first obtaining unit is used to obtain the hardware resource configuration information reported by each of the P nodes to obtain P hardware resource configuration information, and each node corresponds to one piece of hardware resource configuration information;
    第二获取单元,用于获取第一待处理数据;A second obtaining unit, configured to obtain the first data to be processed;
    划分单元,用于依据所述P个硬件资源配置信息将所述第一待处理数据进 行划分,得到P个数据块,每一硬件资源配置信息对应一数据块;A dividing unit, configured to divide the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, and each hardware resource configuration information corresponds to a data block;
    分发单元,用于将所述P个数据块分别分发给所述P个节点中相应的节点进行处理。The distribution unit is configured to distribute the P data blocks to corresponding nodes of the P nodes for processing.
  7. 根据权利要求6所述的装置,其特征在于,在所述依据所述P个硬件资源配置信息将所述第一待处理数据进行划分,得到P个数据块方面,所述划分单元具体用于:The apparatus according to claim 6, wherein in terms of dividing the first data to be processed according to the P pieces of hardware resource configuration information to obtain P pieces of data, the dividing unit is specifically used for :
    依据所述P个硬件资源配置信息确定所述P个节点中每一节点的性能评价值,得到P个性能评价值;Determine the performance evaluation value of each of the P nodes according to the P hardware resource configuration information, and obtain P performance evaluation values;
    依据所述P个性能评价值确定所述P个节点中每一节点对应的分配比例值,得到P个分配比例值,所述P个分配比例值之和为1;Determining the allocation ratio value corresponding to each of the P nodes according to the P performance evaluation values, to obtain P allocation ratio values, and the sum of the P allocation ratio values is 1;
    依据所述P个分配比例值将所述第一待处理数据进行划分,得到所述P个数据块。Divide the first data to be processed according to the P allocation ratio values to obtain the P data blocks.
  8. 根据权利要求7所述的装置,其特征在于,所述硬件资源配置信息包括:中央处理器的核数、内存大小和负载值;The device according to claim 7, wherein the hardware resource configuration information includes: the number of cores, memory size, and load value of the central processing unit;
    在所述依据所述P个硬件资源配置信息确定所述P个节点中每一节点的性能评价值,得到P个性能评价值方面,所述划分单元具体用于:In terms of determining the performance evaluation value of each of the P nodes according to the P hardware resource configuration information and obtaining P performance evaluation values, the dividing unit is specifically used to:
    按照预设的核数与第一评价值之间的映射关系,确定硬件资源配置信息i中的核数对应的目标第一评价值,所述硬件资源配置信息i为所述P个硬件配置资源信息中的任一硬件资源配置信息;Determine the target first evaluation value corresponding to the core number in the hardware resource configuration information i according to the mapping relationship between the preset core number and the first evaluation value, the hardware resource configuration information i is the P hardware configuration resources Any hardware resource configuration information in the information;
    按照预设的内存大小与第二评价值之间的映射关系,确定所述硬件资源配置信息i中的内存大小对应的目标第二评价值;Determine the target second evaluation value corresponding to the memory size in the hardware resource configuration information i according to the mapping relationship between the preset memory size and the second evaluation value;
    按照预设的负载值与第三评价值之间的映射关系,确定所述硬件资源配置信息i中的负载值对应的目标第三评价值;Determine the target third evaluation value corresponding to the load value in the hardware resource configuration information i according to the mapping relationship between the preset load value and the third evaluation value;
    获取所述第一评价值对应的第一权值、所述第二评价值对应的第二权值以及所述第三评价值对应的第三权值,所述第一权值、所述第二权值与所述第三权值之和为1;Acquiring a first weight value corresponding to the first evaluation value, a second weight value corresponding to the second evaluation value, and a third weight value corresponding to the third evaluation value, the first weight value, the third weight value The sum of the second weight and the third weight is 1;
    依据所述目标第一评价值、所述目标第二评价值、所述目标第三评价值、所述第一权值、所述第二权值和所述第三权值进行加权运算,得到所述硬件资源配置信息i对应的评价值。Performing a weighted operation according to the target first evaluation value, the target second evaluation value, the target third evaluation value, the first weight value, the second weight value, and the third weight value to obtain The evaluation value corresponding to the hardware resource configuration information i.
  9. 一种服务器,其特征在于,包括处理器、存储器,所述存储器用于存储一个或多个程序,并且被配置由所述处理器执行,所述程序包括用于执行如权利要求1-5任一项所述的方法中的步骤的指令。A server, characterized in that it includes a processor and a memory, the memory is used to store one or more programs, and is configured to be executed by the processor, the program includes a program for executing any of claims 1-5 An instruction of steps in a method as described.
  10. 一种计算机可读存储介质,存储有计算机程序,所述计算机程序被处理器执行以实现如权利要求1-5任一项所述的方法。A computer-readable storage medium storing a computer program, the computer program is executed by a processor to implement the method according to any one of claims 1-5.
PCT/CN2019/121613 2018-12-27 2019-11-28 Data distribution method and related product WO2020134840A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811613722.9A CN109800204B (en) 2018-12-27 2018-12-27 Data distribution method and related product
CN201811613722.9 2018-12-27

Publications (1)

Publication Number Publication Date
WO2020134840A1 true WO2020134840A1 (en) 2020-07-02

Family

ID=66557924

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121613 WO2020134840A1 (en) 2018-12-27 2019-11-28 Data distribution method and related product

Country Status (2)

Country Link
CN (1) CN109800204B (en)
WO (1) WO2020134840A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800204B (en) * 2018-12-27 2021-03-05 深圳云天励飞技术有限公司 Data distribution method and related product
CN110287000B (en) * 2019-05-29 2021-08-17 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium
CN111625644B (en) * 2020-04-14 2023-09-12 北京捷通华声科技股份有限公司 Text classification method and device
CN112887919B (en) * 2021-01-18 2022-07-05 浙江百应科技有限公司 Short message sending method and system for multi-channel short message cluster scheduling and electronic equipment
CN113242302A (en) * 2021-05-11 2021-08-10 鸬鹚科技(深圳)有限公司 Data access request processing method and device, computer equipment and medium
CN114201319A (en) * 2022-02-17 2022-03-18 广东东华发思特软件有限公司 Data scheduling method, device, terminal and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101013387A (en) * 2007-02-09 2007-08-08 华中科技大学 Load balancing method based on object storage device
CN103902379A (en) * 2012-12-25 2014-07-02 中国移动通信集团公司 Task scheduling method and device and server cluster
US20140258446A1 (en) * 2013-03-07 2014-09-11 Citrix Systems, Inc. Dynamic configuration in cloud computing environments
CN105912399A (en) * 2016-04-05 2016-08-31 杭州嘉楠耘智信息科技有限公司 Task processing method, device and system
CN109800204A (en) * 2018-12-27 2019-05-24 深圳云天励飞技术有限公司 Data distributing method and Related product

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104363300B (en) * 2014-11-26 2018-06-05 浙江宇视科技有限公司 Task distribution formula dispatching device is calculated in a kind of server cluster
CN105740063A (en) * 2014-12-08 2016-07-06 杭州华为数字技术有限公司 Data processing method and apparatus
CN104657214A (en) * 2015-03-13 2015-05-27 华存数据信息技术有限公司 Multi-queue multi-priority big data task management system and method for achieving big data task management by utilizing system
CN105488134A (en) * 2015-11-25 2016-04-13 用友网络科技股份有限公司 Big data processing method and big data processing device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101013387A (en) * 2007-02-09 2007-08-08 华中科技大学 Load balancing method based on object storage device
CN103902379A (en) * 2012-12-25 2014-07-02 中国移动通信集团公司 Task scheduling method and device and server cluster
US20140258446A1 (en) * 2013-03-07 2014-09-11 Citrix Systems, Inc. Dynamic configuration in cloud computing environments
CN105912399A (en) * 2016-04-05 2016-08-31 杭州嘉楠耘智信息科技有限公司 Task processing method, device and system
CN109800204A (en) * 2018-12-27 2019-05-24 深圳云天励飞技术有限公司 Data distributing method and Related product

Also Published As

Publication number Publication date
CN109800204B (en) 2021-03-05
CN109800204A (en) 2019-05-24

Similar Documents

Publication Publication Date Title
WO2020134840A1 (en) Data distribution method and related product
USRE47464E1 (en) Intelligent work load manager
US10764132B2 (en) Scale-out association method and apparatus, and system
CN107241281B (en) Data processing method and device
CN107402956B (en) Data processing method and device for large task and computer readable storage medium
CN107819797B (en) Access request processing method and device
CN104753994A (en) Method and device for data synchronization based on cluster server system
WO2018121404A1 (en) Method and device for timeout monitoring
CN110471749B (en) Task processing method, device, computer readable storage medium and computer equipment
CN106874100B (en) Computing resource allocation method and device
CN105760240A (en) Distributed task processing method and device
US10425273B2 (en) Data processing system and data processing method
CN110647392A (en) Intelligent elastic expansion method based on container cluster
EP3423940A1 (en) A method and device for scheduling resources
WO2016119389A1 (en) Management method, device and system for system docking
US20190012087A1 (en) Method, apparatus, and system for preventing loss of memory data
CN111737027A (en) Lookup processing method, system, terminal and storage medium of distributed storage system
CN107479966B (en) Signaling acquisition method based on multi-core CPU
WO2017114180A1 (en) Component logical threads quantity adjustment method and device
CN213876703U (en) Resource pool management system
CN110209548B (en) Service control method, system, electronic device and computer readable storage medium
CN111158896A (en) Distributed process scheduling method and system
US11736562B1 (en) Method and system for achieving high availability of service under high-load scene in distributed system
CN110019372A (en) Data monitoring method, device, server and storage medium
CN115580522A (en) Method and device for monitoring running state of container cloud platform

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19902575

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19902575

Country of ref document: EP

Kind code of ref document: A1