WO2020134840A1 - Data distribution method and related product - Google Patents


Info

Publication number
WO2020134840A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
configuration information
resource configuration
hardware resource
nodes
Prior art date
Application number
PCT/CN2019/121613
Other languages
French (fr)
Chinese (zh)
Inventor
刘国伟
Original Assignee
深圳云天励飞技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司
Publication of WO2020134840A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/177Initialisation or configuration control
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • This application relates to the technical field of data distribution, and in particular to a data distribution method and related products.
  • A computer cluster can be simply understood as a cluster established by a server together with multiple nodes. If, during data processing, all data is stored evenly across the service nodes of the search cluster, the data processing efficiency of the computer cluster is reduced.
  • the embodiments of the present application provide a data distribution method and related products, which can improve the data processing efficiency of a computer cluster.
  • a first aspect of an embodiment of the present application provides a data distribution method, including:
  • the method further includes:
  • when Q new nodes are detected, the second data to be processed is obtained, where Q is a positive integer;
  • the upper limit processing data amount of each of the Q new nodes is estimated to obtain Q upper limit processing data amounts;
  • when the data amount of the second data to be processed is greater than the sum of the Q upper limit processing data amounts, the second data to be processed is divided into a first data set and a second data set according to the Q upper limit processing data amounts, the first data set is allocated to the Q new nodes, and the second data set is allocated to the P nodes;
  • when the data amount of the second data to be processed is less than or equal to the sum of the Q upper limit processing data amounts, the hardware resource configuration information of the Q new nodes is obtained, the second data to be processed is divided according to the hardware resource configuration information of the Q new nodes to obtain Q data blocks, and the Q data blocks are distributed to the corresponding nodes among the Q new nodes for processing.
  • the estimation of the upper limit processing data amount of each of the Q new nodes includes:
  • the difference between the target rated upper limit data processing amount and the at least one data processing upper limit value is used as the upper limit data processing amount of the new node j.
  • a second aspect of an embodiment of the present application provides a data distribution device, including:
  • the first obtaining unit is used to obtain the hardware resource configuration information reported by each of the P nodes to obtain P hardware resource configuration information, and each node corresponds to one piece of hardware resource configuration information;
  • a second obtaining unit configured to obtain the first data to be processed
  • a dividing unit configured to divide the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, and each hardware resource configuration information corresponds to a data block;
  • the distribution unit is configured to distribute the P data blocks to corresponding nodes of the P nodes for processing.
  • an embodiment of the present application provides a server, including a processor, a memory, and one or more programs, wherein the one or more programs are stored in the memory, and are configured to be executed by the processor.
  • the program includes instructions for performing the steps in the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
  • the computer program product may be a software installation package.
  • the hardware resource configuration information reported by each of the P nodes is obtained to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information; the first data to be processed is obtained and divided according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block; and the P data blocks are distributed to the corresponding nodes among the P nodes for processing. In this way, the hardware resource configuration information reported by each of the P nodes can be determined, the data to be processed can be divided into P shares according to that information and distributed to the corresponding nodes, data is allocated according to node performance, the processing capability of each node is fully exploited, and the data processing efficiency is improved.
  • FIG. 1A is a schematic flowchart of an embodiment of a data distribution method according to an embodiment of the present application
  • FIG. 1B is a schematic diagram of a computer cluster provided by an embodiment of the present application.
  • FIG. 1C is a schematic structural diagram of a zookeeper provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of another embodiment of a data distribution method according to an embodiment of the present application.
  • FIG. 3A is a schematic structural diagram of an embodiment of a data distribution device according to an embodiment of the present application.
  • FIG. 3B is a schematic structural diagram of another embodiment of a data distribution device according to an embodiment of the present application.
  • FIG. 3C is a schematic structural diagram of another embodiment of a data distribution device according to an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an embodiment of a data distribution device provided by an embodiment of the present application.
  • FIG. 1A is a schematic flowchart of an embodiment of a data distribution method according to an embodiment of the present application.
  • the data distribution method described in this embodiment includes the following steps:
  • the above hardware resource configuration information may be at least one of the following: the number of cores of the central processing unit (CPU), the memory size, the load value, whether the disk is a solid state drive (SSD), and so on, which is not limited herein.
  • specifically, as shown in FIG. 1B, the server can establish communication connections with the P nodes and receive the hardware resource configuration information reported by the P nodes; each node can report its hardware resource configuration information at a preset time interval, and the preset time interval can be set by the user or default to a system value.
  • the first data to be processed may be at least one of the following: images, files, signaling, video, voice signals, optical signals, text information, etc., which are not limited herein.
  • the first data to be processed may come from the terminal, or may be data forwarded from the previous node or the next node.
  • the first data to be processed may be real-time data or pre-stored data, which is not limited herein.
  • different hardware resource configuration information reflects different processing capabilities of the nodes; based on the idea that a node with strong processing capability can process more data and a node with weak processing capability should process less data, the first data to be processed is divided according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block. In this way, the processing capability of each node is fully utilized, improving data processing efficiency.
  • dividing the first data to be processed according to the P pieces of hardware resource configuration information to obtain P data blocks may include the following steps: determining a performance evaluation value of each of the P nodes according to the P pieces of hardware resource configuration information to obtain P performance evaluation values; determining an allocation proportion value corresponding to each of the P nodes according to the P performance evaluation values to obtain P allocation proportion values, the sum of which is 1; and dividing the first data to be processed according to the P allocation proportion values to obtain the P data blocks.
  • the hardware resource configuration information reflects the performance of a node to a certain extent; therefore, the server can evaluate the performance of each of the P nodes according to the P pieces of hardware resource configuration information to obtain P performance evaluation values, and can then compute each node's allocation proportion value as that node's performance evaluation value divided by the sum of the P performance evaluation values, so that the P allocation proportion values sum to 1 and the first data to be processed can be divided accordingly, as sketched below.
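The following Python fragment is a minimal sketch (not part of the patent text) of this proportional split; the node names, evaluation values, and record-level slicing are assumptions made for illustration.

```python
# Illustrative sketch: split records across nodes in proportion to their
# performance evaluation values (assumed inputs).

def allocation_proportions(evaluations):
    """Each node's proportion = its evaluation value / sum of all evaluation values."""
    total = sum(evaluations.values())
    return {node: score / total for node, score in evaluations.items()}

def split_by_proportion(records, proportions):
    """Divide the records into one block per node according to the proportions."""
    blocks, start = {}, 0
    nodes = list(proportions)
    for i, node in enumerate(nodes):
        # The last node takes the remainder so every record is assigned exactly once.
        end = len(records) if i == len(nodes) - 1 else start + round(len(records) * proportions[node])
        blocks[node] = records[start:end]
        start = end
    return blocks

# Example with three hypothetical nodes.
evaluations = {"node-1": 0.9, "node-2": 0.6, "node-3": 0.3}
blocks = split_by_proportion(list(range(1800)), allocation_proportions(evaluations))
print({n: len(b) for n, b in blocks.items()})  # {'node-1': 900, 'node-2': 600, 'node-3': 300}
```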
  • the hardware resource configuration information includes: the number of cores, memory size, and load value of the central processor;
  • determining the performance evaluation value of each of the P nodes according to the P hardware resource configuration information to obtain P performance evaluation values may include the following steps:
  • the first weight value, the second weight value, and the third weight value may be preset or the system defaults, and the sum of the first weight value, the second weight value, and the third weight value is 1.
  • the server may pre-store the mapping relationship between the preset number of cores and the first evaluation value, the mapping relationship between the preset memory size and the second evaluation value, and the relationship between the preset load value and the third evaluation value Mapping relations.
  • the first evaluation value, the second evaluation value, and the third evaluation value are numbers between 0 and 1, or between 0 and 100. For example, the following provides a mapping relationship between the number of cores and the first evaluation value:

    Number of cores | First evaluation value
    1 | 0.2
    2 | 0.4
    3 | 0.6
    4 | 0.8
    >5 | 1
  • in a specific implementation, taking the hardware resource configuration information i as an example, where the hardware resource configuration information i is any one of the P pieces of hardware resource configuration information, if the hardware resource configuration information includes the number of cores of the central processing unit, the memory size, and the load value, then: the target first evaluation value corresponding to the number of cores in the hardware resource configuration information i can be determined according to the mapping relationship between preset numbers of cores and first evaluation values; the target second evaluation value corresponding to the memory size in the hardware resource configuration information i can be determined according to the mapping relationship between preset memory sizes and second evaluation values; the target third evaluation value corresponding to the load value in the hardware resource configuration information i can be determined according to the mapping relationship between preset load values and third evaluation values; the first weight corresponding to the first evaluation value, the second weight corresponding to the second evaluation value, and the third weight corresponding to the third evaluation value are obtained, where the sum of the first weight, the second weight, and the third weight is 1; and a weighted operation is performed to obtain the evaluation value corresponding to the hardware resource configuration information i, that is, evaluation value of i = target first evaluation value × first weight + target second evaluation value × second weight + target third evaluation value × third weight. In this way, the performance of each node can be evaluated from multiple dimensions of the hardware resource configuration information, which improves the accuracy of node performance evaluation.
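A minimal sketch of this weighted scoring is shown below; the memory and load lookup tables and the specific weights are assumptions for the example (the text only fixes the core-count table and requires the three weights to sum to 1).

```python
# Illustrative sketch of the weighted node evaluation described above.

CORE_EVAL = {1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8}        # from the table above; >= 5 cores -> 1.0
MEMORY_EVAL = {8: 0.4, 16: 0.6, 32: 0.8, 64: 1.0}    # GiB -> second evaluation value (assumed)
LOAD_EVAL = [(0.5, 1.0), (1.0, 0.8), (2.0, 0.5)]      # (load-average threshold, third evaluation value), assumed

W_CORES, W_MEMORY, W_LOAD = 0.4, 0.4, 0.2             # example weights; must sum to 1

def first_eval(cores):
    return 1.0 if cores >= 5 else CORE_EVAL[cores]

def second_eval(memory_gib):
    return MEMORY_EVAL.get(memory_gib, 1.0)

def third_eval(load_avg):
    for threshold, score in LOAD_EVAL:
        if load_avg <= threshold:
            return score
    return 0.2  # heavily loaded node gets a low score

def evaluation_value(cores, memory_gib, load_avg):
    """evaluation value = e1*w1 + e2*w2 + e3*w3, as in the formula above."""
    return (first_eval(cores) * W_CORES
            + second_eval(memory_gib) * W_MEMORY
            + third_eval(load_avg) * W_LOAD)

print(evaluation_value(cores=4, memory_gib=32, load_avg=0.7))  # 0.8*0.4 + 0.8*0.4 + 0.8*0.2 = 0.8
```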
  • the P data blocks can be distributed to the corresponding nodes of the P nodes for processing.
  • in this way, the performance of the nodes can be fully utilized to allocate data differentially, ensuring that nodes with strong performance process somewhat more data while nodes with weak performance process relatively less data; this ensures that the data is processed in time, avoids data backlogs, and improves data processing efficiency.
  • optionally, after step 104, the following steps may also be included:
  • the second to-be-processed data may be at least one of the following: images, files, signaling, video, voice signals, optical signals, text information, etc., which are not limited herein.
  • the second to-be-processed data may come from the terminal, or may be data forwarded from the previous node or the next node.
  • the second data to be processed may be real-time data or pre-stored data, which is not limited herein.
  • the server can form a cluster with P nodes, and of course, other nodes can be added to the cluster.
  • the server can detect whether new nodes appear in the cluster; if Q new nodes are detected, the second data to be processed can be obtained, where Q is a positive integer. Since each node has limited capability, processing more than a certain amount of data can cause stalls or crashes, which in turn reduces data processing efficiency. Therefore, in the embodiment of the present application, the upper limit processing data amount of each of the Q new nodes can be estimated to obtain Q upper limit processing data amounts.
  • the upper limit processing data amount can be understood as the maximum amount of data that can be processed.
  • when the data amount of the second data to be processed is greater than the sum of the Q upper limit processing data amounts, the Q new nodes alone cannot process the second data to be processed; therefore, the second data to be processed can be divided according to the Q upper limit processing data amounts into a first data set and a second data set, where the data amount of the first data set is less than the sum of the Q upper limit processing data amounts; the first data set can then be allocated to the Q new nodes, and the second data set allocated to the P nodes.
  • when the data amount of the second data to be processed is less than or equal to the sum of the Q upper limit processing data amounts, the Q new nodes are fully capable of processing the second data to be processed, and the Q new nodes may be given priority.
  • in this case, the hardware resource configuration information of the Q new nodes can be obtained, and the second data to be processed is divided according to the hardware resource configuration information of the Q new nodes to obtain Q data blocks.
  • the Q data blocks are then distributed to the corresponding nodes among the Q new nodes for processing, as sketched below.
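A minimal sketch of this decision (cases A3/A4 above) might look as follows; the record-count capacity model and the two distribution callbacks are assumptions standing in for the division logic described elsewhere in the text.

```python
# Illustrative sketch: route a second batch of records either entirely to the
# Q new nodes or split it between the new nodes and the original P nodes.

def allocate_second_batch(records, new_node_limits, distribute_to_new, distribute_to_old):
    """new_node_limits: {new_node: estimated upper limit processing data amount (records)}."""
    capacity = sum(new_node_limits.values())
    if len(records) <= capacity:
        # The new nodes can absorb the whole batch on their own (case A4).
        distribute_to_new(records)
    else:
        # Fill the new nodes up to their combined limit and hand the rest to the
        # original P nodes (case A3). The text requires the first data set to stay
        # below the combined limit, so a stricter cut-off could be used here.
        first_set, second_set = records[:capacity], records[capacity:]
        distribute_to_new(first_set)
        distribute_to_old(second_set)
```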
  • estimating the upper limit processing data amount of each of the Q new nodes may include the following steps:
  • A21. Acquire at least one child node of a new node j, where the new node j is any one of the Q new nodes;
  • A22. Determine the data processing upper limit value of the at least one child node to obtain at least one data processing upper limit value;
  • A23. Acquire the target hardware resource configuration information of the new node j;
  • A24. Determine the target rated upper limit data processing amount corresponding to the target hardware resource configuration information according to the mapping relationship between preset hardware resource configuration information and rated upper limit data processing amounts;
  • A25. Use the difference between the target rated upper limit data processing amount and the at least one data processing upper limit value as the upper limit data processing amount of the new node j.
  • each node may be a server.
  • in addition, a node may also have one or more child nodes; if there are child nodes, there will be some interaction with the parent node, so certain resources may also be occupied. Therefore, in a specific implementation, the server may acquire at least one child node of the new node j, where the new node j is any one of the Q new nodes, and determine the data processing upper limit value of the at least one child node to obtain at least one data processing upper limit value.
  • the data processing upper limit value of each node can be preset or the system default.
  • the server can also pre-store the mapping relationship between preset hardware resource configuration information and rated upper limit data processing amounts.
  • in this way, the target rated upper limit data processing amount corresponding to the target hardware resource configuration information can be determined according to that mapping relationship, and the difference between the target rated upper limit data processing amount and the at least one data processing upper limit value is taken as the upper limit data processing amount of the new node j.
  • the rated upper limit data processing amount can be understood as the maximum amount of data a node is set, before it leaves the factory, to be able to process. In this way, the upper limit data processing amount can be estimated as accurately as possible while ensuring that the new node and its child nodes can work normally, which helps ensure system stability and improve system processing efficiency. A small sketch of this estimation follows.
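The sketch below illustrates one reading of this estimation: the rated upper limit is looked up from the node's hardware configuration and the children's upper limit values are subtracted. The lookup table keys and throughput figures are assumptions for the example.

```python
# Illustrative sketch of the upper-limit estimation for a new node j.

RATED_UPPER_LIMIT = {            # (CPU cores, memory GiB, has SSD) -> rated records/s (assumed)
    (8, 32, True): 50_000,
    (8, 32, False): 35_000,
    (16, 64, True): 120_000,
}

def estimate_upper_limit(hardware_config, child_upper_limits):
    """Upper limit of node j = rated upper limit - upper limits already claimed by its children."""
    rated = RATED_UPPER_LIMIT[hardware_config]
    return max(rated - sum(child_upper_limits), 0)

print(estimate_upper_limit((16, 64, True), [20_000, 15_000]))  # 85000
```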
  • optionally, after step 104, the following steps may also be included:
  • when it is detected that the load value of any one of the P nodes exceeds a preset threshold, the node information corresponding to that node is deleted, or an alarm message is sent to the administrator.
  • the above-mentioned preset threshold can be set by the user or the system default.
  • in a specific implementation, the server can monitor the load value of each of the P nodes; if the load value of a node exceeds the preset threshold, the node information corresponding to that node can be deleted, which ensures the normal operation of the other nodes in the system.
  • alternatively, an alarm message can be sent to the administrator.
  • the alarm message may include at least one of the following: the node location, the node's abnormal behaviour, the abnormality level, and so on, which is not limited here, so that the administrator can be promptly reminded to perform maintenance on the node.
  • the traditional computer cluster data processing method stores all data evenly across the service nodes of the search cluster, which has the following disadvantages: 1. because the servers at different nodes differ in hardware configuration such as the number of CPUs, the memory size, and whether an SSD is used, the amount of data each search node can store and its query processing capability are inconsistent, so the available resources cannot be used rationally; 2. if the data is distributed evenly, a query has to wait for all search nodes to return results, so a single search node with a low configuration may slow down the entire service; 3. nodes cannot be expanded dynamically: when the data volume keeps growing and new search nodes are added, existing data has to be redistributed, which increases maintenance costs. In contrast, this application allocates data according to the performance of the nodes, which ensures that each node can effectively exert its processing capability and improves data processing efficiency.
  • ZooKeeper is a distributed, open-source coordination service for distributed applications.
  • the zookeeper can be located on the server.
  • the zookeeper can include at least the following modules: a register group (registerGroup), a load group (loadGroup), a configuration group (configGroup), and a quantity group (amountGroup). All search nodes can be registered with the zookeeper (for example, it can serve as a registration center used by subsequent data distribution and data requests), and the registration information is saved in the registerGroup.
  • the registration information can be at least one of the following: node name, node type, node function, and so on, which is not limited here. The hardware resource configuration information of all search nodes (for example, the number of CPU cores, the memory size, and whether an SSD is used) is registered in the configGroup, and each node reports its current running load (load average) every 10 seconds.
  • the reported load is stored in the loadGroup, and the total number of records currently stored by each search node is reported and stored in the amountGroup.
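A minimal sketch of how these four groups might be laid out as znodes is given below, using the kazoo ZooKeeper client; the ensemble address, znode paths, and payload format are assumptions made for illustration, not details specified by the text.

```python
# Minimal sketch, assuming a ZooKeeper ensemble at 127.0.0.1:2181 and simple payloads.
import json
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

for group in ("registerGroup", "configGroup", "loadGroup", "amountGroup"):
    zk.ensure_path(f"/search/{group}")

def register_node(name, cpu_cores, memory_gb, has_ssd):
    # Registration info goes to registerGroup; hardware info goes to configGroup.
    zk.create(f"/search/registerGroup/{name}", json.dumps({"name": name}).encode(),
              ephemeral=True, makepath=True)
    zk.create(f"/search/configGroup/{name}",
              json.dumps({"cpu": cpu_cores, "mem": memory_gb, "ssd": has_ssd}).encode(),
              makepath=True)

def report_status(name, load_average, record_count):
    # Called by each node, e.g. every 10 seconds.
    zk.ensure_path(f"/search/loadGroup/{name}")
    zk.set(f"/search/loadGroup/{name}", str(load_average).encode())
    zk.ensure_path(f"/search/amountGroup/{name}")
    zk.set(f"/search/amountGroup/{name}", str(record_count).encode())
```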
  • when the hardware configurations parsed from the configGroup are the same, the total record count of each node is parsed from the amountGroup and the current service load of each search node is read from the loadGroup. If the current server load difference between the search nodes is within 10, data is distributed only according to the total record counts, with the amount delivered to each node in inverse order of its current total (for example, if the totals are in the ratio 5:4:3:1, the amounts of data delivered are in the ratio 1:3:4:5); there is also the situation where a newly added node holds a very small proportion of the data (for example, 100:89:1), in which case the current data is first distributed to the search node holding the least data.
  • when the hardware configurations of the search service nodes differ, for example when the numbers of CPU cores are in the ratio 5:4:2, the data is distributed according to the multiplier relationship 1.5:1.4:1.2; when the memory sizes are in the ratio 5:4:2, the data is likewise distributed according to 1.5:1.4:1.2; and SSD versus non-SSD nodes follow a 1.5:1 relationship (because using an SSD can increase the efficiency of loading data from disk by 30%-50%). The multipliers obtained from the different hardware configuration items are averaged to obtain the final proportional relationship, and the data is then distributed as in the first step, as sketched below.
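The fragment below sketches this hardware-ratio weighting. The mapping "ratio value n gives multiplier 1 + n/10" is inferred from the example figures (5:4:2 giving 1.5:1.4:1.2) and is an assumption, as are the 1.5x SSD bonus and the plain averaging of the three multipliers.

```python
# Illustrative sketch: turn per-node hardware figures into distribution proportions.

def distribution_proportions(nodes):
    """nodes: {name: {"cpu": relative cores, "mem": relative memory, "ssd": bool}}."""
    multipliers = {}
    for name, hw in nodes.items():
        cpu_m = 1 + hw["cpu"] / 10          # e.g. ratio value 5 -> multiplier 1.5
        mem_m = 1 + hw["mem"] / 10          # e.g. ratio value 4 -> multiplier 1.4
        ssd_m = 1.5 if hw["ssd"] else 1.0   # SSD loads data from disk 30%-50% faster
        multipliers[name] = (cpu_m + mem_m + ssd_m) / 3   # average over the hardware items
    total = sum(multipliers.values())
    return {name: m / total for name, m in multipliers.items()}

print(distribution_proportions({
    "node-a": {"cpu": 5, "mem": 5, "ssd": True},
    "node-b": {"cpu": 4, "mem": 4, "ssd": False},
    "node-c": {"cpu": 2, "mem": 2, "ssd": False},
}))
```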
  • for each search server, even if the hardware resource gap between the servers is relatively small and the data amount on each search node is not large, the load of a search server may still be relatively high because of certain hardware conditions or abnormal programs (for example, coding problems that cause deadlocks).
  • if such a problem occurs, the abnormal node should be excluded from distribution as far as possible: because of its high load, queries and data insertions may take a long time, and continuing to add or query data would drive its service load higher and higher until the service hangs. When a node's load average exceeds 1.5 times its number of CPU cores for 1 minute, the deletion of its corresponding registration information from the registerGroup, loadGroup, configGroup, and amountGroup is actively initiated, as sketched below.
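The following sketch illustrates this rule, reusing the znode layout assumed in the registration example above; the sampling interval and the load-sampling helper are assumptions.

```python
# Illustrative sketch: deregister a node whose load average has exceeded
# 1.5x its CPU core count for a full minute.
import time

def watch_node(zk, name, cpu_cores, sample_load, check_interval=10, window=60):
    """sample_load() returns the node's current load average (assumed helper)."""
    over_since = None
    while True:
        if sample_load() > 1.5 * cpu_cores:
            over_since = over_since or time.time()
            if time.time() - over_since >= window:
                # Remove the node from all four groups so no more data is routed to it.
                for group in ("registerGroup", "loadGroup", "configGroup", "amountGroup"):
                    path = f"/search/{group}/{name}"
                    if zk.exists(path):
                        zk.delete(path)
                return
        else:
            over_since = None
        time.sleep(check_interval)
```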
  • when a user initiates a query request, all available search nodes are first obtained from the registerGroup, the request parameters are sent to each of those nodes using asynchronous multi-threaded requests, and a 10-second timeout is set for each HTTP query request. A single search node in the cluster may take a long time to answer a query, which would make the whole request slow; setting a per-query timeout means that a query which overruns is dropped and only partial results are returned, improving search performance.
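A minimal sketch of this fan-out is shown below; the node URL scheme and the use of the `requests` library are assumptions made for illustration.

```python
# Illustrative sketch: query every available search node in parallel with a
# 10-second per-request timeout, returning whatever partial results come back.
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

def fan_out_query(search_nodes, params, timeout=10):
    results = []
    with ThreadPoolExecutor(max_workers=max(len(search_nodes), 1)) as pool:
        futures = {pool.submit(requests.get, f"http://{node}/search",
                               params=params, timeout=timeout): node
                   for node in search_nodes}
        for future in as_completed(futures):
            try:
                results.append(future.result().json())
            except Exception:
                # A slow or failed node is simply skipped; partial results are returned.
                pass
    return results
```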
  • it can be seen that, in the embodiment of the present application, the hardware resource configuration information reported by each of the P nodes is obtained to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information; the first data to be processed is obtained and divided according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block; and the P data blocks are distributed to the corresponding nodes among the P nodes for processing. In this way, the hardware resource configuration information reported by each of the P nodes can be determined, the data to be processed is divided into P shares according to that information and distributed to the corresponding nodes, data is allocated according to node performance, the processing capability of each node is fully exploited, and the data processing efficiency is improved.
  • FIG. 2 is a schematic flowchart of another embodiment of a data distribution method according to an embodiment of the present application.
  • the data distribution method described in this embodiment includes the following steps:
  • when the data amount of the second data to be processed is less than or equal to the sum of the Q upper limit processing data amounts, obtain the hardware resource configuration information of the Q new nodes, divide the second data to be processed according to the hardware resource configuration information of the Q new nodes to obtain Q data blocks, and distribute the Q data blocks to the corresponding nodes among the Q new nodes for processing.
  • in the embodiment of the present application, the hardware resource configuration information reported by each of the P nodes is obtained to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information; the first data to be processed is obtained and divided according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block, and the P data blocks are distributed to the corresponding nodes among the P nodes for processing; when Q new nodes are detected, the second data to be processed is obtained, where Q is a positive integer, and the upper limit processing data amount of each of the Q new nodes is estimated to obtain Q upper limit processing data amounts; when the data amount of the second data to be processed is greater than the sum of the Q upper limit processing data amounts, the second data to be processed is divided into a first data set and a second data set according to the Q upper limit processing data amounts, the first data set is allocated to the Q new nodes, and the second data set is allocated to the P nodes; when the data amount of the second data to be processed is less than or equal to the sum of the Q upper limit processing data amounts, the hardware resource configuration information of the Q new nodes is obtained, the second data to be processed is divided according to that information to obtain Q data blocks, and the Q data blocks are distributed to the corresponding nodes among the Q new nodes for processing. In this way, the upper limit processing data amount can be estimated as accurately as possible, which helps ensure system stability and improve system processing efficiency.
  • FIG. 3A is a schematic structural diagram of an embodiment of a data distribution device according to an embodiment of the present application.
  • the data distribution device described in this embodiment includes: a first acquisition unit 301, a second acquisition unit 302, a division unit 303, and a distribution unit 304, as follows:
  • the first obtaining unit 301 is configured to obtain the hardware resource configuration information reported by each of the P nodes to obtain P hardware resource configuration information, and each node corresponds to one piece of hardware resource configuration information;
  • the second obtaining unit 302 is configured to obtain the first data to be processed
  • the dividing unit 303 is configured to divide the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, and each hardware resource configuration information corresponds to a data block;
  • the distribution unit 304 is configured to distribute the P data blocks to corresponding nodes of the P nodes for processing.
  • with the above device, the hardware resource configuration information reported by each of the P nodes is obtained to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information; the first data to be processed is obtained and divided according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block; and the P data blocks are distributed to the corresponding nodes among the P nodes for processing. In this way, the hardware resource configuration information reported by each of the P nodes can be determined, the data to be processed is divided into P shares according to that information and distributed to the corresponding nodes, data is allocated according to node performance, the processing capability of each node is fully exploited, and the data processing efficiency is improved.
  • the first obtaining unit 301 may be used to implement the method described in step 101 above
  • the second obtaining unit 302 may be used to implement the method described in step 102 above
  • the dividing unit 303 may be used to implement the method described in step 103 above
  • the above-mentioned distribution unit 304 may be used to implement the method described in step 104 above, and so on.
  • the dividing unit 303 is specifically configured to:
  • the hardware resource configuration information includes: the number of cores, memory size, and load value of the central processor;
  • the dividing unit 303 is specifically configured to:
  • the hardware resource configuration information i is any one of the P pieces of hardware resource configuration information;
  • FIG. 3B shows yet another modified structure of the data distribution device described in FIG. 3A.
  • it may further include: an estimation unit 305 and a processing unit 306, specifically as follows:
  • the second obtaining unit 302 is further configured to obtain second to-be-processed data when Q new nodes are detected, where Q is a positive integer;
  • the estimation unit 305 is configured to estimate the upper limit processing data amount of each of the Q new nodes to obtain Q upper limit processing data amounts;
  • the processing unit 306 is configured to: when the data amount of the second data to be processed is greater than the sum of the Q upper limit processing data amounts, divide the second data to be processed into a first data set and a second data set according to the Q upper limit processing data amounts, allocate the first data set to the Q new nodes, and allocate the second data set to the P nodes; and, when the data amount of the second data to be processed is less than or equal to the sum of the Q upper limit processing data amounts, obtain the hardware resource configuration information of the Q new nodes, divide the second data to be processed according to the hardware resource configuration information of the Q new nodes to obtain Q data blocks, and distribute the Q data blocks to the corresponding nodes among the Q new nodes for processing.
  • FIG. 3C shows another modified structure of the data distribution device described in FIG. 3A. Compared with FIG. 3A, it may further include: an early warning unit 307, as follows:
  • the early warning unit 307 is configured to delete the node information corresponding to any node or send an alarm message to the administrator when it is detected that the load value of any one of the P nodes exceeds a preset threshold.
  • FIG. 4 is a schematic structural diagram of an embodiment of a server according to an embodiment of the present application.
  • the server described in this embodiment includes: at least one input device 1000; at least one output device 2000; at least one processor 3000, such as a CPU; and a memory 4000, where the input device 1000, the output device 2000, the processor 3000, and the memory 4000 are connected via a bus 5000.
  • the input device 1000 may specifically be a touch panel, physical buttons, or a mouse.
  • the above output device 2000 may specifically be a display screen.
  • the above-mentioned memory 4000 may be a high-speed RAM memory or a non-volatile memory, such as a magnetic disk memory.
  • the above memory 4000 is used to store a set of program codes, and the above input device 1000, output device 2000, and processor 3000 are used to call the program codes stored in the memory 4000, and perform the following operations:
  • the aforementioned processor 3000 is used for:
  • obtain the hardware resource configuration information reported by each of the P nodes to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information; obtain the first data to be processed; divide the first data to be processed according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block; and distribute the P data blocks to the corresponding nodes among the P nodes for processing, so that the hardware resource configuration information reported by each of the P nodes can be determined, the data to be processed can be divided into P shares according to that information and distributed to the corresponding nodes, data is allocated according to node performance, the processing capability of each node is fully exploited, and the data processing efficiency is improved.
  • the content specifically processed by the processor 3000 is reflected in the above data distribution method, and will not be repeated here.
  • An embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a program, and when the program is executed, it includes some or all steps of any one of the data distribution methods described in the foregoing method embodiments.
  • An embodiment of the present application provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause the computer to execute some or all of the steps of any of the data distribution methods described in the above method embodiments.
  • the computer program product may be a software installation package.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data distribution method and a related product. The method comprises: acquiring hardware resource configuration information reported by each of P nodes so as to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information (101); acquiring first data to be processed (102); dividing the first data to be processed according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block (103); and respectively distributing the P data blocks to corresponding nodes among the P nodes for processing (104). The method can improve the efficiency of data processing in a computer cluster.

Description

Data distribution method and related products
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on December 27, 2018, with application number 201811613722.9 and invention title "Data distribution method and related products", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of data distribution, and in particular to a data distribution method and related products.
Background
With the rapid development of electronic technology, computer cluster technology has also developed rapidly. A computer cluster can be simply understood as a cluster established by a server together with multiple nodes. If, during data processing, all data is stored evenly across the service nodes of the search cluster, the data processing efficiency of the computer cluster is reduced.
Summary of the Invention
The embodiments of the present application provide a data distribution method and related products, which can improve the data processing efficiency of a computer cluster.
A first aspect of the embodiments of the present application provides a data distribution method, including:
obtaining hardware resource configuration information reported by each of P nodes to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information;
obtaining first data to be processed;
dividing the first data to be processed according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block;
distributing the P data blocks to the corresponding nodes among the P nodes for processing.
Optionally, the method further includes:
when Q new nodes are detected, obtaining second data to be processed, where Q is a positive integer;
estimating an upper limit processing data amount of each of the Q new nodes to obtain Q upper limit processing data amounts;
when the data amount of the second data to be processed is greater than the sum of the Q upper limit processing data amounts, dividing the second data to be processed into a first data set and a second data set according to the Q upper limit processing data amounts, allocating the first data set to the Q new nodes, and allocating the second data set to the P nodes;
when the data amount of the second data to be processed is less than or equal to the sum of the Q upper limit processing data amounts, obtaining the hardware resource configuration information of the Q new nodes, dividing the second data to be processed according to the hardware resource configuration information of the Q new nodes to obtain Q data blocks, and distributing the Q data blocks to the corresponding nodes among the Q new nodes for processing.
Further optionally, estimating the upper limit processing data amount of each of the Q new nodes includes:
acquiring at least one child node of a new node j, where the new node j is any one of the Q new nodes;
determining a data processing upper limit value of the at least one child node to obtain at least one data processing upper limit value;
acquiring target hardware resource configuration information of the new node j;
determining, according to a mapping relationship between preset hardware resource configuration information and rated upper limit data processing amounts, a target rated upper limit data processing amount corresponding to the target hardware resource configuration information;
using the difference between the target rated upper limit data processing amount and the at least one data processing upper limit value as the upper limit data processing amount of the new node j.
A second aspect of the embodiments of the present application provides a data distribution device, including:
a first obtaining unit, configured to obtain hardware resource configuration information reported by each of P nodes to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information;
a second obtaining unit, configured to obtain first data to be processed;
a dividing unit, configured to divide the first data to be processed according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block;
a distribution unit, configured to distribute the P data blocks to the corresponding nodes among the P nodes for processing.
In a third aspect, an embodiment of the present application provides a server, including a processor, a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing the steps in the first aspect of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
In a fifth aspect, an embodiment of the present application provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
Implementing the embodiments of the present application has the following beneficial effects:
It can be seen that, with the data distribution method and related products described in the embodiments of the present application, the hardware resource configuration information reported by each of the P nodes is obtained to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information; the first data to be processed is obtained and divided according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block; and the P data blocks are distributed to the corresponding nodes among the P nodes for processing. In this way, the hardware resource configuration information reported by each of the P nodes can be determined, the data to be processed can be divided into P shares according to that information and distributed to the corresponding nodes, data is allocated according to node performance, the processing capability of each node is fully exploited, and the data processing efficiency is improved.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative work.
FIG. 1A is a schematic flowchart of an embodiment of a data distribution method provided by an embodiment of the present application;
FIG. 1B is a schematic diagram of a computer cluster provided by an embodiment of the present application;
FIG. 1C is a schematic structural diagram of a zookeeper provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of another embodiment of a data distribution method provided by an embodiment of the present application;
FIG. 3A is a schematic structural diagram of an embodiment of a data distribution device provided by an embodiment of the present application;
FIG. 3B is a schematic structural diagram of another embodiment of a data distribution device provided by an embodiment of the present application;
FIG. 3C is a schematic structural diagram of another embodiment of a data distribution device provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an embodiment of a data distribution device provided by an embodiment of the present application.
Detailed Description
Referring to FIG. 1A, FIG. 1A is a schematic flowchart of an embodiment of a data distribution method provided by an embodiment of the present application. The data distribution method described in this embodiment includes the following steps:
101. Obtain the hardware resource configuration information reported by each of P nodes to obtain P pieces of hardware resource configuration information, each node corresponding to one piece of hardware resource configuration information.
The hardware resource configuration information may be at least one of the following: the number of cores of the central processing unit (CPU), the memory size, the load value, whether the disk is a solid state drive (SSD), and so on, which is not limited herein.
Specifically, as shown in FIG. 1B, the server may establish communication connections with the P nodes, and the server may receive the hardware resource configuration information reported by the P nodes. Each node may report its hardware resource configuration information at a preset time interval, and the preset time interval may be set by the user or default to a system value.
102. Obtain first data to be processed.
The first data to be processed may be at least one of the following: images, files, signaling, video, voice signals, optical signals, text information, and so on, which is not limited herein.
In a specific implementation, the first data to be processed may come from a terminal, or may be data forwarded from a previous node or a next node. The first data to be processed may be real-time data or pre-stored data, which is not limited herein.
103. Divide the first data to be processed according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block.
Different hardware resource configuration information reflects different processing capabilities of the nodes. In the embodiments of the present application, based on the idea that a node with strong processing capability may process more data and a node with weak processing capability may process less data, the first data to be processed is divided according to the P pieces of hardware resource configuration information to obtain P data blocks, each piece of hardware resource configuration information corresponding to one data block. In this way, the processing capability of each node is fully utilized, and the data processing efficiency can be improved.
Optionally, in step 103 above, dividing the first data to be processed according to the P pieces of hardware resource configuration information to obtain P data blocks may include the following steps:
21. Determine a performance evaluation value of each of the P nodes according to the P pieces of hardware resource configuration information to obtain P performance evaluation values;
22. Determine an allocation proportion value corresponding to each of the P nodes according to the P performance evaluation values to obtain P allocation proportion values, where the sum of the P allocation proportion values is 1;
23. Divide the first data to be processed according to the P allocation proportion values to obtain the P data blocks.
The hardware resource configuration information reflects the performance of a node to a certain extent. Therefore, the server may evaluate the performance of each of the P nodes according to the P pieces of hardware resource configuration information to obtain P performance evaluation values, and then determine the allocation proportion value corresponding to each of the P nodes according to the P performance evaluation values to obtain P allocation proportion values. Specifically, the sum of the P performance evaluation values may be calculated first; for any node, the allocation proportion value of the node equals the performance evaluation value of the node divided by the sum of the P performance evaluation values, and so on, so that the allocation proportion value of each of the P nodes can be obtained. Further, the first data to be processed is divided according to the P allocation proportion values to obtain the P data blocks. In this way, the data to be processed is allocated according to performance, which improves the operating efficiency of the entire system.
Further optionally, the hardware resource configuration information includes: the number of cores of the central processing unit, the memory size, and the load value;
In step 21 above, determining the performance evaluation value of each of the P nodes according to the P pieces of hardware resource configuration information to obtain P performance evaluation values may include the following steps:
211. Determine, according to a mapping relationship between preset numbers of cores and first evaluation values, a target first evaluation value corresponding to the number of cores in hardware resource configuration information i, where the hardware resource configuration information i is any one of the P pieces of hardware resource configuration information;
212. Determine, according to a mapping relationship between preset memory sizes and second evaluation values, a target second evaluation value corresponding to the memory size in the hardware resource configuration information i;
213. Determine, according to a mapping relationship between preset load values and third evaluation values, a target third evaluation value corresponding to the load value in the hardware resource configuration information i;
214. Obtain a first weight corresponding to the first evaluation value, a second weight corresponding to the second evaluation value, and a third weight corresponding to the third evaluation value, where the sum of the first weight, the second weight, and the third weight is 1;
215. Perform a weighted operation according to the target first evaluation value, the target second evaluation value, the target third evaluation value, the first weight, the second weight, and the third weight to obtain the evaluation value corresponding to the hardware resource configuration information i.
The first weight, the second weight, and the third weight may all be preset or default system values, and the sum of the first weight, the second weight, and the third weight is 1. The server may pre-store the mapping relationship between preset numbers of cores and first evaluation values, the mapping relationship between preset memory sizes and second evaluation values, and the mapping relationship between preset load values and third evaluation values. The first evaluation value, the second evaluation value, and the third evaluation value are numbers between 0 and 1, or between 0 and 100. As an example, a mapping relationship between the number of cores and the first evaluation value is provided below:

    Number of cores | First evaluation value
    1 | 0.2
    2 | 0.4
    3 | 0.6
    4 | 0.8
    >5 | 1

In a specific implementation, taking the hardware resource configuration information i as an example, where the hardware resource configuration information i is any one of the P pieces of hardware resource configuration information, if the hardware resource configuration information includes the number of cores of the central processing unit, the memory size, and the load value, then: the target first evaluation value corresponding to the number of cores in the hardware resource configuration information i may be determined according to the mapping relationship between preset numbers of cores and first evaluation values; the target second evaluation value corresponding to the memory size in the hardware resource configuration information i may be determined according to the mapping relationship between preset memory sizes and second evaluation values; the target third evaluation value corresponding to the load value in the hardware resource configuration information i may be determined according to the mapping relationship between preset load values and third evaluation values; the first weight corresponding to the first evaluation value, the second weight corresponding to the second evaluation value, and the third weight corresponding to the third evaluation value are obtained, where the sum of the first weight, the second weight, and the third weight is 1; and a weighted operation is performed according to the target first evaluation value, the target second evaluation value, the target third evaluation value, the first weight, the second weight, and the third weight to obtain the evaluation value corresponding to the hardware resource configuration information i, that is, the evaluation value corresponding to the hardware resource configuration information i = target first evaluation value × first weight + target second evaluation value × second weight + target third evaluation value × third weight. In this way, the performance of each node can be evaluated from multiple dimensions of the hardware resource configuration information, which improves the accuracy of node performance evaluation.
104、将所述P个数据块分别分发给所述P个节点中相应的节点进行处理。104. Distribute the P data blocks to corresponding nodes of the P nodes for processing.
其中,服务器在完成数据分配之后,可以将P个数据块分发给P个节点中相应的节点进行处理,如此,可以充分利用节点的性能,对数据进行差异分配,保证了性能强的节点可以适当多处理些数据,性能弱的节点可以处理相对少点数据,能够保证数据及时被处理,避免了数据积压情况,提升了数据处理效率。After the server completes the data distribution, the P data blocks can be distributed to the corresponding nodes of the P nodes for processing. In this way, the performance of the nodes can be fully utilized to distribute the data differentially, ensuring that the nodes with strong performance can be properly Process more data, and nodes with weak performance can process relatively few points of data, which can ensure that the data is processed in time, avoiding data backlog and improving data processing efficiency.
可选地,上述步骤104之后,还可以包括如下步骤:Optionally, after the above step 104, the following steps may also be included:
A1、检测到出现Q个新节点时,获取第二待处理数据,所述Q为正整数;A1. When Q new nodes are detected, obtain second to-be-processed data, where Q is a positive integer;
A2、预估所述Q个新节点中的每一新节点的上限处理数据量,得到Q个上限处理数据量;A2. Estimate the upper limit processing data volume of each of the Q new nodes to obtain Q upper limit processing data volume;
A3、在所述第二待处理数据的数据量大于所述Q个上限处理数据量的总和时,依据所述Q个上限处理数据量将所述第二待处理数据划分为第一数据集和第二数据集,将所述第一数据集由所述Q个新节点进行分配,将所述第二数据集由所述P个节点进行分配;A3. When the data amount of the second to-be-processed data is greater than the sum of the Q upper-limit processed data amounts, divide the second to-be-processed data into a first data set according to the Q upper-limit processed data amounts A second data set, the first data set is allocated by the Q new nodes, and the second data set is allocated by the P nodes;
A4、在所述第二待处理数据的数据量小于或等于所述Q个上限处理数据量的总和时,获取所述Q个新节点的硬件资源配置信息,并依据该Q个新节点的硬件资源配置信息将所述第二待处理数据进行划分,得到Q个数据块,将所述Q个数据块分别分发给所述Q个新节点相应的节点进行处理。A4. When the data amount of the second to-be-processed data is less than or equal to the sum of the Q upper limit processing data amounts, obtain hardware resource configuration information of the Q new nodes, and based on the hardware of the Q new nodes The resource configuration information divides the second data to be processed to obtain Q data blocks, and distributes the Q data blocks to nodes corresponding to the Q new nodes for processing.
其中,上述第二待处理数据可以为以下至少一种:图像、文件、信令、视频、语音信号、光信号、文本信息等等,在此不做限定。The second to-be-processed data may be at least one of the following: images, files, signaling, video, voice signals, optical signals, text information, etc., which are not limited herein.
具体实现中,第二待处理数据可以来自于终端,也可以来自于上一节点或者下一节点转发过来的数据。第二待处理数据可以为实时数据,或者,为预先存储的数据,在此不做限定。In a specific implementation, the second to-be-processed data may come from the terminal, or may be data forwarded from the previous node or the next node. The second data to be processed may be real-time data or pre-stored data, which is not limited herein.
其中,由上述可知,服务器可以与P个节点构成一个集群,当然,集群还可以添加其他节点。具体实现中,服务器可以检测集群中是否出现了新节点,若检测到出现Q个新节点,则可以获取第二待处理数据,Q为正整数。由于每个节点均能力有限,超过一定的数据处理量,则会出现卡顿或者死机现象,反而降低了数据处理效率。因此,本申请实施例中,可以预估Q个新节点中的每一新节点的上限处理数据量,得到Q个上限处理数据量,上限处理数据量可以理解为最多能够处理的数据量,在第二待处理数据的数据量大于Q个上限处理数据量的总和时,则说明光靠Q个新节点无法处理第二待处理数据,因此,可以依据Q个上限处理数据量将第二待处理数据划分为第一数据集和第二数据集,其中,第一数据集的数据量小于Q个上限处理数据量的总和,进而,可以 将第一数据集由Q个新节点进行分配,将第二数据集由所述P个节点进行分配,具体的分配方式可以参照上述步骤101-步骤104的具体描述,在此不再赘述。It can be known from the above that the server can form a cluster with P nodes, and of course, other nodes can be added to the cluster. In a specific implementation, the server can detect whether a new node appears in the cluster. If Q new nodes are detected, the second pending data can be obtained, and Q is a positive integer. Due to the limited capacity of each node, more than a certain amount of data processing, there will be stuck or crash phenomenon, but instead reduces the efficiency of data processing. Therefore, in the embodiment of the present application, the upper limit processing data amount of each of the Q new nodes can be estimated to obtain Q upper limit processing data amounts. The upper limit processing data amount can be understood as the maximum amount of data that can be processed. When the data amount of the second to-be-processed data is greater than the sum of the Q upper-limit processed data amounts, it means that the Q new nodes cannot process the second to-be-processed data alone. Therefore, the second to-be-processed data can be processed according to the Q upper-limit processed data amounts The data is divided into a first data set and a second data set, where the data amount of the first data set is less than the sum of the Q upper limit processing data amounts. Furthermore, the first data set can be allocated by Q new nodes The two data sets are allocated by the P nodes. For a specific allocation method, reference may be made to the specific description of steps 101 to 104 above, and details are not described herein again.
此外,在第二待处理数据的数据量小于或等于Q个上限处理数据量的总和时,则说明Q个新节点完全有能力处理完第二待处理数据,则可以优先考虑让Q个新节点处理第二待处理数据,进一步地,可以获取Q个新节点的硬件资源配置信息,并依据该Q个新节点的硬件资源配置信息将第二待处理数据进行划分,得到Q个数据块,将Q个数据块分别分发给Q个新节点相应的节点进行处理,具体的分配方式可以参照上述步骤101-步骤104的具体描述,在此不再赘述。如此,可以充分且快速加入新节点进行数据处理,还可以保证数据在节点之间合理分配,提升了数据处理效率。In addition, when the data volume of the second data to be processed is less than or equal to the sum of the Q upper limit processing data volume, it means that the Q new nodes are fully capable of processing the second data to be processed, and the Q new nodes may be given priority. Processing the second data to be processed, further, the hardware resource configuration information of the Q new nodes can be obtained, and the second data to be processed is divided according to the hardware resource configuration information of the Q new nodes to obtain Q data blocks. The Q data blocks are distributed to the corresponding nodes of the Q new nodes for processing. For the specific allocation method, reference may be made to the specific descriptions of the above steps 101-104, which will not be repeated here. In this way, new nodes can be fully and quickly added for data processing, and data can be reasonably distributed among the nodes, which improves data processing efficiency.
进一步可选地,上述步骤A2,预估所述Q个新节点中的每一新节点的上限处理数据量,可以包括如下步骤:Further optionally, in the above step A2, estimating the upper limit processing data amount of each of the Q new nodes may include the following steps:
A21、获取新节点j的至少一个子节点,所述新节点j为所述Q个新节点中的任一新节点;A21. Acquire at least one child node of a new node j, where the new node j is any one of the Q new nodes;
A22、确定所述至少一个子节点的数据处理上限值,得到至少一个数据处理上限值;A22. Determine the data processing upper limit value of the at least one child node to obtain at least one data processing upper limit value;
A23、获取所述新节点j的目标硬件资源配置信息;A23. Obtain target hardware resource configuration information of the new node j;
A24、按照预设的硬件资源配置信息与额定上限数据处理量之间的映射关系,确定所述目标硬件资源配置信息对应的目标额定上限数据处理量;A24. Determine the target rated upper limit data processing amount corresponding to the target hardware resource configuration information according to the mapping relationship between the preset hardware resource configuration information and the rated upper limit data processing amount;
A25、将所述目标额定上限数据处理量与所述至少一个数据处理上限值之间的差值作为所述新节点j的上限数据处理量。A25. The difference between the target rated upper limit data processing amount and the at least one data processing upper limit value is used as the upper limit data processing amount of the new node j.
其中,对于每一新节点而言,每一节点有可能是服务器,当然,节点也可以有一个或者多个子节点,子节点的话,则会与父节点之间会有一定的交互,因此,有可能也会占用一定的资源,因此,具体实现中,服务器可以获取新节点j的至少一个子节点,新节点j为Q个新节点中的任一新节点,确定所述至少一个子节点的数据处理上限值,得到至少一个数据处理上限值,每一节点的数 据处理上限值可以预先设置或者系统默认,服务器中还可以预先存储预设的硬件资源配置信息与额定上限数据处理量之间的映射关系,进而,在获取新节点j的目标硬件资源配置信息之后,可以按照预设的硬件资源配置信息与额定上限数据处理量之间的映射关系,确定目标硬件资源配置信息对应的目标额定上限数据处理量,将目标额定上限数据处理量与至少一个数据处理上限值之间的差值作为新节点j的上限数据处理量,上述额定上限数据处理量则可以理解为产品出厂之前,预先设置的最多能够处理的数据量,如此,可以在保证新节点与其子节点均能正常工作的基础上,最大限度地预估数据处理量上限,有利于充分保证系统的稳定性,提升系统处理效率。Among them, for each new node, each node may be a server. Of course, the node may also have one or more child nodes. If the child node, there will be some interaction with the parent node. Therefore, there is Certain resources may also be occupied. Therefore, in a specific implementation, the server may acquire at least one child node of the new node j, where the new node j is any new node among the Q new nodes, and determine the data of the at least one child node Process the upper limit value to obtain at least one data processing upper limit value. The data processing upper limit value of each node can be preset or the system default. The server can also store the preset hardware resource configuration information and the rated upper limit data processing amount in advance. After the target hardware resource configuration information of the new node j is acquired, the target corresponding to the target hardware resource configuration information can be determined according to the mapping relationship between the preset hardware resource configuration information and the rated upper limit data processing amount Rated upper limit data processing volume, the difference between the target rated upper limit data processing volume and at least one data processing upper limit value is taken as the upper limit data processing volume of the new node j. The above rated upper limit data processing volume can be understood as before the product leaves the factory. The maximum amount of data that can be processed in advance is set. In this way, the upper limit of the amount of data processing can be estimated to the maximum on the basis of ensuring that the new node and its child nodes can work normally, which is conducive to ensuring the stability of the system and improving system processing. effectiveness.
可选地,上述步骤104之后,还可以包括如下步骤:Optionally, after the above step 104, the following steps may also be included:
在检测到所述P个节点中任一节点的负载值超过预设阈值时,则删除该任一节点对应的节点信息,或者,向管理员发送报警信息。When it is detected that the load value of any one of the P nodes exceeds a preset threshold, the node information corresponding to the any node is deleted, or an alarm message is sent to the administrator.
其中,上述预设阈值可以由用户自行设置或者系统默认。服务器可以检测P个节点中任一节点的负载值,若负载值超过预设阈值,在可以删除该任一节点对应的节点信息,如此,可以保证系统其它节点的正常运行,另外,还可以向管理员发送报警信息,报警信息可以包括以下至少一种:节点位置、节点工作异常原因、异常等级等等,在此不做限定,如此,可以及时提醒管理员对该任一节点进行维护。The above-mentioned preset threshold can be set by the user or the system default. The server can detect the load value of any one of the P nodes. If the load value exceeds the preset threshold, the node information corresponding to any one of the nodes can be deleted. In this way, the normal operation of the other nodes of the system can be ensured. The administrator sends an alarm message. The alarm message may include at least one of the following: node location, node work abnormality, abnormality level, etc., which is not limited here, so the administrator can be promptly reminded to perform maintenance on any node.
另外,在具体实现中,由于传统计算机集群数据处理方式是将所有数据平均存放在搜索集群的各个服务节点中,,是按照该种做法,有以下几个缺点:1、由于每个节点的服务器CPU数、内存大小和是否SSD等硬件配置存在差异,导致每个搜索节点能够存储数据量和处理查询能力不一致,所以对出现资源得不到合理利用;2、如果按照平均分配数据的方式,如果查询需要等待所有搜索节点返回结果,可能存在配置低的搜索节点查询耗时拖慢整个服务的搜索服务;3、不能够进行节点的动态扩充,当数据量越来越大,新增搜索节点,存在数据冲刷的问题,增大维护成本,而本申请依据节点性能进行数据分配,保证了每个 节点均有效发挥处理能力,提升了数据处理效率。In addition, in the specific implementation, because the traditional computer cluster data processing method is to store all the data equally in each service node of the search cluster, according to this approach, there are the following disadvantages: 1. Due to the server of each node There are differences in the number of CPUs, memory size, and whether the hardware configuration such as SSD, resulting in inconsistency in the amount of data that each search node can store and the ability to process queries, so the resources used cannot be used reasonably; 2. If the data is distributed equally, if The query needs to wait for all search nodes to return results. There may be a search service with a low configuration search node query that slows down the entire service; 3. The dynamic expansion of the node cannot be performed. When the amount of data is increasing, new search nodes are added. There is a problem of data scouring, which increases maintenance costs, and this application allocates data according to the performance of the nodes, which ensures that each node can effectively exert its processing capacity and improve the efficiency of data processing.
举例说明下,以zookeeper(动物管理员)为例,ZooKeeper是一个分布式的,开放源码的分布式应用程序协调服务,如图1C所示,该zookeeper可以位于服务器中,该zookeeper至少可以包括如下几个模块:寄存器组(registerGroup),负载组(loadGroup),配置组(configGroup),数量组(amountGroup),可以将所有搜索节点注册到zookeeper(例:将其可以作为注册中心使用,后续进行数据分发和数据请求使用)将注册信息保存在registerGroup中,注册信息可以为以下至少一种:节点名称、节点类型、节点功能等等,在此不做限定,将所有搜索节点的硬件资源配置信息(如CPU核数,内存大小,是否SSD)注册到configGroup中,上报将节点当前运行的负载数(load average)每个10秒上报zookeeper存储在loadGroup中。上报当前每个搜索节点存储的记录总数存在amountGroup中。To illustrate, taking zookeeper as an example, ZooKeeper is a distributed, open source distributed application coordination service. As shown in FIG. 1C, the zookeeper can be located on the server. The zookeeper can include at least the following Several modules: register group (registerGroup), load group (loadGroup), configuration group (configGroup), quantity group (amountGroup), you can register all search nodes to zookeeper (for example: it can be used as a registration center, subsequent data (Distribution and data request use) Save the registration information in registerGroup. The registration information can be at least one of the following: node name, node type, node function, etc., which is not limited here, and the hardware resource configuration information of all search nodes ( For example, the number of CPU cores, memory size, and SSD) are registered in the configGroup, and the report reports the current number of nodes running load (loadaverage) every 10 seconds. The zookeeper is stored in the loadGroup. Report that the total number of records currently stored by each search node is stored in the amountGroup.
进一步地,当数据增加时,如果解析configGroup硬件配置相同,则解析amountGroup各个节点的记录总数,并加载loadGroup各个搜索节点当前的服务负载情况。如果当前各个节点搜索节点的服务器负载差距在10个数值以内,则只按照各个节点总数做数据分发,将每个节点总数按照比例(例5:4:3:1,则下发的数据量是1:3:4:5倍数下发,还存在一种情况当一个节点新增的时候出现一个节点比例非常低例100:89:1等,先将当前数据分发到最少数据的搜索节点)的关系进行分发。如果当各个搜索服务节点的配置有差异,比如CPU核数对比5:4:2则数据分发按照1.5:1.4:1.2倍的关系分发,内存比例5:4:2数据也是按照1.5:1.4:1.2倍的关系分发,是否SSD则需要按照1:1.5倍的关系(由于使用SSD能够提高30%-50%的从磁盘加载数据的效率),多个比例按照一个均值的方式求出一个硬件配置的比例关系,然后再按照第一步的方式再对数据进行分发。Further, when the data increases, if the configGroup hardware configuration is resolved to the same, the total number of records of each node of the amountGroup is parsed, and the current service load of each search node of the loadGroup is loaded. If the current server load gap of each node search node is within 10 values, only the total number of nodes is distributed, and the total number of each node is proportional (Example 5: 4:3:1, then the amount of data delivered is 1:3:4:5 multiple delivery, there is also a situation when a node is added when a node ratio is very low (for example, 100:89:1, etc., first distribute the current data to the least data search node) Relationship. If the configuration of each search service node is different, for example, the number of CPU cores is compared to 5:4:2, the data distribution is distributed according to the relationship of 1.5:1.4:1.2 times, and the data of the memory ratio 5:4:2 is also according to 1.5:1.4:1.2 The relationship is distributed twice, whether the SSD needs to be in accordance with the relationship of 1:1.5 times (because the use of SSD can increase the efficiency of loading data from disk by 30%-50%), multiple ratios are calculated according to an average method of a hardware configuration Proportional relationship, and then distribute the data according to the first step.
进一步地,对于各个搜索服务器负载情况,如果当前服务器硬件资源差距比较小,各个搜索节点数据总量差距不大时,可能因为一些硬件情况或者异常的程序(例:程序编码问题导致死锁)导致当前搜索服务器负载比较高,出现 这样的问题只能尽量剔除当前异常节点分发,由于负载较高可能会导致查询和增加数据耗时非常久,而且如果继续向其增加数据或者查询数据,会导致其服务负载越来越高直到服务挂掉。当数据负载数持续1分钟超过其CPU核数的1.5倍时,主动发起从registerGroup,loadGroup,configGroup,amountGroup删除对应注册信息。Further, for the load of each search server, if the current server hardware resource gap is relatively small, and the data amount of each search node is not large, it may be caused by some hardware conditions or abnormal programs (for example: program coding problems cause deadlocks) The current search server load is relatively high. If such a problem occurs, the current abnormal node distribution can be eliminated as much as possible. Due to the high load, it may take a long time to query and increase data, and if you continue to add data or query data, it will cause it The service load is getting higher and higher until the service hangs. When the number of data loads exceeds 1.5 times the number of its CPU cores for 1 minute, it actively initiates the deletion of the corresponding registration information from registerGroup, loadGroup, configGroup, and amountGroup.
进一步地,当按照上面上面的情况进行切分数据到不同服务节点对应的数据集合表list中,然后从registerGroup中获取所有当前能够正常使用的数据节点,将对应节点list集合数据批量条件到对应的搜索节点中。使用异步多线程同时进行添加操作,每个线程对应一个超文本传输协议http批量添加一个搜索节点。Further, when the data is divided into the data collection table list corresponding to different service nodes according to the above situation, then all data nodes that can be normally used are obtained from the registerGroup, and the batch data of the corresponding node list collection conditions are matched to the corresponding Search node. Asynchronous multithreading is used to add operations simultaneously, and each thread corresponds to a hypertext transfer protocol http to add a search node in batches.
进一步地,当用户发起查询请求时,先根据registerGroup中获取所有的可以使用的搜索节点,使用异步多线程请求将请求参数发到每个请求节点中,对于每个http请求设置查询10秒钟查询超时时间,可能存在集群中某个搜索节点查询耗时非常久,这样导致整个请求耗时比较久,所以设置单个查询的超时时间是以损失查询只返回部分结果来提高搜索性能。Further, when the user initiates a query request, first obtain all available search nodes from the registerGroup, use asynchronous multi-threaded requests to send the request parameters to each request node, and set the query for each HTTP request for 10 seconds. Overtime, there may be a long time for a search node query in the cluster, which results in a long time for the entire request, so setting a single query timeout time is to lose the query and return only part of the results to improve search performance.
进一步地,在服务中当发现当前搜索负责的节点出现负载均值(load average)持续1分钟超过其CPU核数的1.5倍,除了删除registerGroup,loadGroup,configGroup,amountGroup中对应节点信息,还需要能够主动发送邮件或者短信(例:调用邮件和短信服务接口)通知对应运维人员,及时查看解决问题,是的对应信息重新注册到zookeeper的各个group主题中,保证服务能正常提供服务。Further, when it is found in the service that the node responsible for the current search has a load average (load average) lasting more than 1.5 times its CPU core number for 1 minute, in addition to deleting the corresponding node information in registerGroup, loadGroup, configGroup, amountGroup, it also needs to be able to take the initiative Send emails or text messages (for example, call the mail and SMS service interface) to notify the corresponding operation and maintenance personnel, timely check and solve the problem, and the corresponding information is re-registered in each group theme of zookeeper to ensure that the service can provide services normally.
可以看出,通过本申请实施例所描述的数据分配方法,获取P个节点中每一节点上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息,获取第一待处理数据,依据P个硬件资源配置信息将第一待处理数据进行划分,得到P个数据块,每一硬件资源配置信息对应一数据块,将P个数据块分别分发给P个节点中相应的节点进行处理,如此,能够确定P个节点每个上报节点的硬件资源配置信息,并依据该硬件资源配置信息 将待处理数据划分为P份,并分发给相应的节点,实现了依据节点性能进行数据分配,充分发挥了每一节点的处理能力,提升了数据处理效率。It can be seen that, through the data distribution method described in the embodiment of the present application, the hardware resource configuration information reported by each of the P nodes is obtained to obtain the P hardware resource configuration information, and each node corresponds to a hardware resource configuration information to obtain The first data to be processed divides the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, each hardware resource configuration information corresponds to a data block, and the P data blocks are distributed to P nodes The corresponding nodes in the process are processed, so that the hardware resource configuration information of each reporting node of P nodes can be determined, and the data to be processed is divided into P shares according to the hardware resource configuration information, and distributed to the corresponding nodes, which realizes the basis The performance of the nodes is used for data distribution, which fully exerts the processing power of each node and improves the efficiency of data processing.
与上述一致地,请参阅图2,为本申请实施例提供的一种数据分配方法的实施例流程示意图。本实施例中所描述的数据分配方法,包括以下步骤:Consistent with the above, please refer to FIG. 2, which is a schematic flowchart of an embodiment of a data distribution method according to an embodiment of the present application. The data distribution method described in this embodiment includes the following steps:
201、获取P个节点中每一节点上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息。201. Obtain hardware resource configuration information reported by each of the P nodes to obtain P hardware resource configuration information, and each node corresponds to one piece of hardware resource configuration information.
202、获取第一待处理数据。202. Obtain first data to be processed.
203、依据所述P个硬件资源配置信息将所述第一待处理数据进行划分,得到P个数据块,每一硬件资源配置信息对应一数据块。203. Divide the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, and each hardware resource configuration information corresponds to a data block.
204、将所述P个数据块分别分发给所述P个节点中相应的节点进行处理。204. Distribute the P data blocks to corresponding nodes of the P nodes for processing.
205、检测到出现Q个新节点时,获取第二待处理数据,所述Q个正整数。205. When Q new nodes are detected, obtain second to-be-processed data, the Q positive integers.
206、预估所述Q个新节点中的每一新节点的上限处理数据量,得到Q个上限处理数据量。206. Estimate the upper limit processing data amount of each of the Q new nodes to obtain Q upper limit processing data amounts.
207、在所述第二待处理数据的数据量大于所述Q个上限处理数据量的总和时,依据所述Q个上限处理数据量将所述第二待处理数据划分为第一数据集和第二数据集,将所述第一数据集由所述Q个新节点进行分配,将所述第二数据集由所述P个节点进行分配。207. When the data amount of the second to-be-processed data is greater than the sum of the Q upper-limit processed data amounts, divide the second to-be-processed data into the first data set according to the Q upper-limit processed data amounts In the second data set, the first data set is allocated by the Q new nodes, and the second data set is allocated by the P nodes.
208、在所述第二待处理数据的数据量小于或等于所述Q个上限处理数据量的总和时,获取所述Q个新节点的硬件资源配置信息,并依据该Q个新节点的硬件资源配置信息将所述第二待处理数据进行划分,得到Q个数据块,将所述Q个数据库分别分发给所述Q个新节点相应的节点进行处理。208. When the data amount of the second to-be-processed data is less than or equal to the sum of the Q upper limit processed data amounts, obtain hardware resource configuration information of the Q new nodes, and based on the hardware of the Q new nodes The resource configuration information divides the second data to be processed to obtain Q data blocks, and distributes the Q databases to nodes corresponding to the Q new nodes for processing.
其中,上述步骤201-步骤208所描述的数据分配方法可参考图1A所描述的数据分配方法的对应步骤。For the data distribution method described in the above steps 201-208, reference may be made to the corresponding steps of the data distribution method described in FIG. 1A.
可以看出,通过本申请实施例所描述的数据分配方法,获取P个节点中每一节点上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息,获取第一待处理数据,依据P个硬件资源配置信息将 第一待处理数据进行划分,得到P个数据块,每一硬件资源配置信息对应一数据块,将P个数据块分别分发给P个节点中相应的节点进行处理,检测到出现Q个新节点时,获取第二待处理数据,Q个正整数,预估Q个新节点中的每一新节点的上限处理数据量,得到Q个上限处理数据量,在第二待处理数据的数据量大于Q个上限处理数据量的总和时,依据Q个上限处理数据量将第二待处理数据划分为第一数据集和第二数据集,将第一数据集由Q个新节点进行分配,将第二数据集由P个节点进行分配,在第二待处理数据的数据量小于或等于Q个上限处理数据量的总和时,获取Q个新节点的硬件资源配置信息,并依据该Q个新节点的硬件资源配置信息将第二待处理数据进行划分,得到Q个数据块,将Q个数据库分别分发给Q个新节点相应的节点进行处理如此,能够确定P个节点每个上报节点的硬件资源配置信息,并依据该硬件资源配置信息将待处理数据划分为P份,并分发给相应的节点,实现了依据节点性能进行数据分配,充分发挥了每一节点的处理能力,可以在保证新节点与其子节点均能正常工作的基础上,最大限度地预估数据处理量上限,有利于充分保证系统的稳定性,提升系统处理效率。It can be seen that, through the data distribution method described in the embodiment of the present application, the hardware resource configuration information reported by each of the P nodes is obtained to obtain the P hardware resource configuration information, and each node corresponds to a hardware resource configuration information to obtain The first data to be processed divides the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, each hardware resource configuration information corresponds to a data block, and the P data blocks are distributed to P nodes The corresponding node in the process is processed, and when Q new nodes are detected, the second data to be processed is obtained, Q positive integers, and the upper limit processing data amount of each new node in the Q new nodes is estimated to obtain Q upper limits Processing data volume, when the data volume of the second to-be-processed data is greater than the sum of the Q upper-limit processing data volumes, the second to-be-processed data is divided into the first data set and the second data set according to the Q upper-limit processing data volume, and The first data set is allocated by Q new nodes, and the second data set is allocated by P nodes. When the data amount of the second data to be processed is less than or equal to the sum of the Q upper limit processing data amounts, Q new nodes are acquired The hardware resource configuration information of the node, and the second data to be processed is divided according to the hardware resource configuration information of the Q new nodes to obtain Q data blocks, and Q databases are distributed to the corresponding nodes of the Q new nodes for processing In this way, it is possible to determine the hardware resource configuration information of each reported node of P nodes, and divide the data to be processed into P shares according to the hardware resource configuration information, and distribute to the corresponding nodes, to achieve data distribution according to the performance of the node, fully By exerting the processing power of each node, on the basis of ensuring that the new node and its child nodes can work normally, the upper limit of the amount of data processing can be estimated to the greatest extent, which is conducive to fully ensuring the stability of the system and improving the processing efficiency of the system.
与上述一致地,以下为实施上述数据分配方法的装置,具体如下:Consistent with the above, the following is an apparatus for implementing the above data distribution method, specifically as follows:
请参阅图3,为本申请实施例提供的一种数据分配装置的实施例结构示意图。本实施例中所描述的数据分配装置,包括:第一获取单元301、第二获取单元302、划分单元303和分发单元304,具体如下:Please refer to FIG. 3, which is a schematic structural diagram of an embodiment of a data distribution device according to an embodiment of the present application. The data distribution device described in this embodiment includes: a first acquisition unit 301, a second acquisition unit 302, a division unit 303, and a distribution unit 304, as follows:
第一获取单元301,用于获取P个节点中每一节点上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息;The first obtaining unit 301 is configured to obtain the hardware resource configuration information reported by each of the P nodes to obtain P hardware resource configuration information, and each node corresponds to one piece of hardware resource configuration information;
第二获取单元302,用于获取第一待处理数据;The second obtaining unit 302 is configured to obtain the first data to be processed;
划分单元303,用于依据所述P个硬件资源配置信息将所述第一待处理数据进行划分,得到P个数据块,每一硬件资源配置信息对应一数据块;The dividing unit 303 is configured to divide the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, and each hardware resource configuration information corresponds to a data block;
分发单元304,用于将所述P个数据块分别分发给所述P个节点中相应的节点进行处理。The distribution unit 304 is configured to distribute the P data blocks to corresponding nodes of the P nodes for processing.
可以看出,通过本申请实施例所描述的数据分配装置,获取P个节点中每一节点上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息,获取第一待处理数据,依据P个硬件资源配置信息将第一待处理数据进行划分,得到P个数据块,每一硬件资源配置信息对应一数据块,将P个数据块分别分发给P个节点中相应的节点进行处理,如此,能够确定P个节点每个上报节点的硬件资源配置信息,并依据该硬件资源配置信息将待处理数据划分为P份,并分发给相应的节点,实现了依据节点性能进行数据分配,充分发挥了每一节点的处理能力,提升了数据处理效率。It can be seen that, through the data distribution device described in the embodiment of the present application, the hardware resource configuration information reported by each of the P nodes is obtained to obtain the P hardware resource configuration information, and each node corresponds to a hardware resource configuration information to obtain The first data to be processed divides the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, each hardware resource configuration information corresponds to a data block, and the P data blocks are distributed to P nodes The corresponding nodes in the process are processed, so that the hardware resource configuration information of each reporting node of P nodes can be determined, and the data to be processed is divided into P shares according to the hardware resource configuration information, and distributed to the corresponding nodes, which realizes the basis The performance of nodes is used for data distribution, which fully exerts the processing power of each node and improves the efficiency of data processing.
其中,上述第一获取单元301可用于实现上述步骤101所描述的方法,第二获取单元302可用于实现上述步骤102所描述的方法,上述划分单元303可用于实现上述步骤103所描述的方法,上述分发单元304可用于实现上述步骤104所描述的方法,以下如此类推。Among them, the first obtaining unit 301 may be used to implement the method described in step 101 above, the second obtaining unit 302 may be used to implement the method described in step 102 above, and the dividing unit 303 may be used to implement the method described in step 103 above, The above-mentioned distribution unit 304 may be used to implement the method described in step 104 above, and so on.
在一个可能的示例中,在所述依据所述P个硬件资源配置信息将所述第一待处理数据进行划分,得到P个数据块方面,所述划分单元303具体用于:In a possible example, in terms of dividing the first data to be processed according to the P pieces of hardware resource configuration information to obtain P pieces of data, the dividing unit 303 is specifically configured to:
依据所述P个硬件资源配置信息确定所述P个节点中每一节点的性能评价值,得到P个性能评价值;Determine the performance evaluation value of each of the P nodes according to the P hardware resource configuration information, and obtain P performance evaluation values;
依据所述P个性能评价值确定所述P个节点中每一节点对应的分配比例值,得到P个分配比例值,所述P个分配比例值之和为1;Determining the allocation ratio value corresponding to each of the P nodes according to the P performance evaluation values, to obtain P allocation ratio values, and the sum of the P allocation ratio values is 1;
依据所述P个分配比例值将所述第一待处理数据进行划分,得到所述P个数据块。Divide the first data to be processed according to the P allocation ratio values to obtain the P data blocks.
在一个可能的示例中,所述硬件资源配置信息包括:中央处理器的核数、内存大小和负载值;In a possible example, the hardware resource configuration information includes: the number of cores, memory size, and load value of the central processor;
在所述依据所述P个硬件资源配置信息确定所述P个节点中每一节点的性能评价值,得到P个性能评价值方面,所述划分单元303具体用于:In terms of determining the performance evaluation value of each of the P nodes according to the P hardware resource configuration information, and obtaining P performance evaluation values, the dividing unit 303 is specifically configured to:
按照预设的核数与第一评价值之间的映射关系,确定硬件资源配置信息i中的核数对应的目标第一评价值,所述硬件资源配置信息i为所述P个硬件配置 资源信息中的任一硬件资源配置信息;Determine the target first evaluation value corresponding to the core number in the hardware resource configuration information i according to the mapping relationship between the preset core number and the first evaluation value, the hardware resource configuration information i is the P hardware configuration resources Any hardware resource configuration information in the information;
按照预设的内存大小与第二评价值之间的映射关系,确定所述硬件资源配置信息i中的内存大小对应的目标第二评价值;Determine the target second evaluation value corresponding to the memory size in the hardware resource configuration information i according to the mapping relationship between the preset memory size and the second evaluation value;
按照预设的负载值与第三评价值之间的映射关系,确定所述硬件资源配置信息i中的负载值对应的目标第三评价值;Determine the target third evaluation value corresponding to the load value in the hardware resource configuration information i according to the mapping relationship between the preset load value and the third evaluation value;
获取所述第一评价值对应的第一权值、所述第二评价值对应的第二权值以及所述第三评价值对应的第三权值,所述第一权值、所述第二权值与所述第三权值之和为1;Acquiring a first weight value corresponding to the first evaluation value, a second weight value corresponding to the second evaluation value, and a third weight value corresponding to the third evaluation value, the first weight value, the third weight value The sum of the second weight and the third weight is 1;
依据所述目标第一评价值、所述目标第二评价值、所述目标第三评价值、所述第一权值、所述第二权值和所述第三权值进行加权运算,得到所述硬件资源配置信息i对应的评价值。Performing a weighted operation according to the target first evaluation value, the target second evaluation value, the target third evaluation value, the first weight value, the second weight value, and the third weight value to obtain The evaluation value corresponding to the hardware resource configuration information i.
可选地,如图3B所示,图3B示出了图3A所描述的数据分配装置的又一变型结构,其与图3A相比较,还可以包括:预估单元305和处理单元306,具体如下:Optionally, as shown in FIG. 3B, FIG. 3B shows yet another modified structure of the data distribution device described in FIG. 3A. Compared with FIG. 3A, it may further include: an estimation unit 305 and a processing unit 306, specifically as follows:
第二获取单元302,还用于检测到出现Q个新节点时,获取第二待处理数据,所述Q为正整数;The second obtaining unit 302 is further configured to obtain second to-be-processed data when Q new nodes are detected, where Q is a positive integer;
预估单元305,用于预估所述Q个新节点中的每一新节点的上限处理数据量,得到Q个上限处理数据量;The estimation unit 305 is configured to estimate the upper limit processing data amount of each of the Q new nodes to obtain Q upper limit processing data amounts;
处理单元306,还用于在所述第二待处理数据的数据量大于所述Q个上限处理数据量的总和时,依据所述Q个上限处理数据量将所述第二待处理数据划分为第一数据集和第二数据集,将所述第一数据集由所述Q个新节点进行分配,将所述第二数据集由所述P个节点进行分配;以及在所述第二待处理数据的数据量小于或等于所述Q个上限处理数据量的总和时,获取所述Q个新节点的硬件资源配置信息,并依据该Q个新节点的硬件资源配置信息将所述第二待处理数据进行划分,得到Q个数据块,将所述Q个数据块分别分发给所述Q个新节点相应的节点进行处理。The processing unit 306 is further configured to divide the second to-be-processed data into the second to-be-processed data according to the Q upper-limit processed data amounts when the data amount of the second to-be-processed data is greater than the sum of the Q upper-limit processed data amounts A first data set and a second data set, the first data set is allocated by the Q new nodes, and the second data set is allocated by the P nodes; and in the second When the data amount of the processed data is less than or equal to the sum of the Q upper limit processed data amounts, the hardware resource configuration information of the Q new nodes is obtained, and the second The data to be processed is divided to obtain Q data blocks, and the Q data blocks are distributed to nodes corresponding to the Q new nodes for processing.
可选地,如图3C所示,图3C示出了图3A所描述的数据分配装置的又一变型结构,其与图3A相比较,还可以包括:预警单元307,具体如下:Optionally, as shown in FIG. 3C, FIG. 3C shows another modified structure of the data distribution device described in FIG. 3A. Compared with FIG. 3A, it may further include: an early warning unit 307, as follows:
所述预警单元307,用于在检测到所述P个节点中任一节点的负载值超过预设阈值时,则删除该任一节点对应的节点信息,或者,向管理员发送报警信息。The early warning unit 307 is configured to delete the node information corresponding to any node or send an alarm message to the administrator when it is detected that the load value of any one of the P nodes exceeds a preset threshold.
可以理解的是,本实施例的数据分配装置的各程序模块的功能可根据上述方法实施例中的方法具体实现,其具体实现过程可以参照上述方法实施例的相关描述,此处不再赘述。It can be understood that the functions of each program module of the data distribution apparatus of this embodiment may be specifically implemented according to the method in the above method embodiment, and the specific implementation process may refer to the related description of the above method embodiment, which will not be repeated here.
与上述一致地,请参阅图4,为本申请实施例提供的一种服务器的实施例结构示意图。本实施例中所描述的服务器,包括:至少一个输入设备1000;至少一个输出设备2000;至少一个处理器3000,例如CPU;和存储器4000,上述输入设备1000、输出设备2000、处理器3000和存储器4000通过总线5000连接。Consistent with the above, please refer to FIG. 4, which is a schematic structural diagram of an embodiment of a server according to an embodiment of the present application. The server described in this embodiment includes: at least one input device 1000; at least one output device 2000; at least one processor 3000, such as a CPU; and memory 4000, the above input device 1000, output device 2000, processor 3000, and memory 4000 is connected via bus 5000.
其中,上述输入设备1000具体可为触控面板、物理按键或者鼠标。The input device 1000 may specifically be a touch panel, physical buttons, or a mouse.
上述输出设备2000具体可为显示屏。The above output device 2000 may specifically be a display screen.
上述存储器4000可以是高速RAM存储器,也可为非易失存储器(non-volatile memory),例如磁盘存储器。上述存储器4000用于存储一组程序代码,上述输入设备1000、输出设备2000和处理器3000用于调用存储器4000中存储的程序代码,执行如下操作:The above-mentioned memory 4000 may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), such as a magnetic disk memory. The above memory 4000 is used to store a set of program codes, and the above input device 1000, output device 2000, and processor 3000 are used to call the program codes stored in the memory 4000, and perform the following operations:
上述处理器3000,用于:The aforementioned processor 3000 is used for:
获取P个节点中每一节点上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息;Obtain the hardware resource configuration information reported by each of the P nodes to obtain P hardware resource configuration information, and each node corresponds to one hardware resource configuration information;
获取第一待处理数据;Obtain the first data to be processed;
依据所述P个硬件资源配置信息将所述第一待处理数据进行划分,得到P个数据块,每一硬件资源配置信息对应一数据块;Divide the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, and each hardware resource configuration information corresponds to a data block;
将所述P个数据块分别分发给所述P个节点中相应的节点进行处理。Distribute the P data blocks to corresponding nodes of the P nodes for processing.
可以看出,通过本申请实施例所描述的服务器,获取P个节点中每一节点 上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息,获取第一待处理数据,依据P个硬件资源配置信息将第一待处理数据进行划分,得到P个数据块,每一硬件资源配置信息对应一数据块,将P个数据块分别分发给P个节点中相应的节点进行处理,如此,能够确定P个节点每个上报节点的硬件资源配置信息,并依据该硬件资源配置信息将待处理数据划分为P份,并分发给相应的节点,实现了依据节点性能进行数据分配,充分发挥了每一节点的处理能力,提升了数据处理效率。It can be seen that through the server described in the embodiment of the present application, the hardware resource configuration information reported by each of the P nodes is obtained to obtain the P hardware resource configuration information, and each node corresponds to a piece of hardware resource configuration information to obtain the first For the data to be processed, divide the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, each hardware resource configuration information corresponds to a data block, and distribute the P data blocks to P nodes Process, so that it can determine the hardware resource configuration information of each reported node of P nodes, and divide the data to be processed into P shares according to the hardware resource configuration information, and distribute to the corresponding nodes, according to the performance of the node Carry out data distribution, give full play to the processing power of each node, and improve the efficiency of data processing.
上述处理器3000具体用于处理的内容在上述数据分配方法均有体现,在此不再赘述。The content specifically processed by the processor 3000 is reflected in the above data distribution method, and will not be repeated here.
本申请实施例还提供一种计算机存储介质,其中,该计算机存储介质可存储有程序,该程序执行时包括上述方法实施例中记载的任何一种数据分配方法的部分或全部步骤。An embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a program, and when the program is executed, it includes some or all steps of any one of the data distribution methods described in the foregoing method embodiments.
本申请实施例提供了一种计算机程序产品,其中,上述计算机程序产品包括存储了计算机程序的非瞬时性计算机可读存储介质,上述计算机程序可操作来使计算机执行如本申请实施例上述任一种数据分配方法中所描述的部分或全部步骤。该计算机程序产品可以为一个软件安装包。An embodiment of the present application provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium that stores the computer program, and the computer program is operable to cause the computer to execute any of the above described embodiments of the present application Or all the steps described in this data distribution method. The computer program product may be a software installation package.

Claims (10)

  1. 一种数据分配方法,其特征在于,包括:A data distribution method, characterized in that it includes:
    获取P个节点中每一节点上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息;Obtain the hardware resource configuration information reported by each of the P nodes to obtain P hardware resource configuration information, and each node corresponds to one hardware resource configuration information;
    获取第一待处理数据;Obtain the first data to be processed;
    依据所述P个硬件资源配置信息将所述第一待处理数据进行划分,得到P个数据块,每一硬件资源配置信息对应一数据块;Divide the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, and each hardware resource configuration information corresponds to a data block;
    将所述P个数据块分别分发给所述P个节点中相应的节点进行处理。Distribute the P data blocks to corresponding nodes of the P nodes for processing.
  2. 根据权利要求1所述的方法,其特征在于,所述依据所述P个硬件资源配置信息将所述第一待处理数据进行划分,得到P个数据块,包括:The method according to claim 1, wherein the dividing the first data to be processed according to the P pieces of hardware resource configuration information to obtain P pieces of data includes:
    依据所述P个硬件资源配置信息确定所述P个节点中每一节点的性能评价值,得到P个性能评价值;Determine the performance evaluation value of each of the P nodes according to the P hardware resource configuration information, and obtain P performance evaluation values;
    依据所述P个性能评价值确定所述P个节点中每一节点对应的分配比例值,得到P个分配比例值,所述P个分配比例值之和为1;Determining the allocation ratio value corresponding to each of the P nodes according to the P performance evaluation values, to obtain P allocation ratio values, and the sum of the P allocation ratio values is 1;
    依据所述P个分配比例值将所述第一待处理数据进行划分,得到所述P个数据块。Divide the first data to be processed according to the P allocation ratio values to obtain the P data blocks.
  3. 根据权利要求2所述的方法,其特征在于,所述硬件资源配置信息包括:中央处理器的核数、内存大小和负载值;The method according to claim 2, wherein the hardware resource configuration information includes: the number of cores, memory size, and load value of the central processor;
    所述依据所述P个硬件资源配置信息确定所述P个节点中每一节点的性能评价值,得到P个性能评价值,包括:The determining the performance evaluation value of each of the P nodes according to the P hardware resource configuration information to obtain P performance evaluation values includes:
    按照预设的核数与第一评价值之间的映射关系,确定硬件资源配置信息i中的核数对应的目标第一评价值,所述硬件资源配置信息i为所述P个硬件配置资源信息中的任一硬件资源配置信息;Determine the target first evaluation value corresponding to the core number in the hardware resource configuration information i according to the mapping relationship between the preset core number and the first evaluation value, the hardware resource configuration information i is the P hardware configuration resources Any hardware resource configuration information in the information;
    按照预设的内存大小与第二评价值之间的映射关系,确定所述硬件资源配置信息i中的内存大小对应的目标第二评价值;Determine the target second evaluation value corresponding to the memory size in the hardware resource configuration information i according to the mapping relationship between the preset memory size and the second evaluation value;
    按照预设的负载值与第三评价值之间的映射关系,确定所述硬件资源配置信息i中的负载值对应的目标第三评价值;Determine the target third evaluation value corresponding to the load value in the hardware resource configuration information i according to the mapping relationship between the preset load value and the third evaluation value;
    获取所述第一评价值对应的第一权值、所述第二评价值对应的第二权值以及所述第三评价值对应的第三权值,所述第一权值、所述第二权值与所述第三权值之和为1;Acquiring a first weight value corresponding to the first evaluation value, a second weight value corresponding to the second evaluation value, and a third weight value corresponding to the third evaluation value, the first weight value, the third weight value The sum of the second weight and the third weight is 1;
    依据所述目标第一评价值、所述目标第二评价值、所述目标第三评价值、所述第一权值、所述第二权值和所述第三权值进行加权运算,得到所述硬件资源配置信息i对应的评价值。Performing a weighted operation according to the target first evaluation value, the target second evaluation value, the target third evaluation value, the first weight value, the second weight value, and the third weight value to obtain The evaluation value corresponding to the hardware resource configuration information i.
  4. 根据权利要求1-3任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1-3, wherein the method further comprises:
    检测到出现Q个新节点时,获取第二待处理数据,所述Q为正整数;When Q new nodes are detected, the second data to be processed is obtained, and the Q is a positive integer;
    预估所述Q个新节点中的每一新节点的上限处理数据量,得到Q个上限处理数据量;Estimating the upper limit processing data volume of each of the Q new nodes to obtain Q upper limit processing data volume;
    在所述第二待处理数据的数据量大于所述Q个上限处理数据量的总和时,依据所述Q个上限处理数据量将所述第二待处理数据划分为第一数据集和第二数据集,将所述第一数据集由所述Q个新节点进行分配,将所述第二数据集由所述P个节点进行分配;When the data amount of the second to-be-processed data is greater than the sum of the Q upper limit processing data amounts, the second to-be-processed data is divided into a first data set and a second according to the Q upper limit processing data amounts A data set, the first data set is allocated by the Q new nodes, and the second data set is allocated by the P nodes;
    在所述第二待处理数据的数据量小于或等于所述Q个上限处理数据量的总和时,获取所述Q个新节点的硬件资源配置信息,并依据该Q个新节点的硬件资源配置信息将所述第二待处理数据进行划分,得到Q个数据块,将所述Q个数据块分别分发给所述Q个新节点相应的节点进行处理。When the data amount of the second to-be-processed data is less than or equal to the sum of the Q upper limit processed data amounts, obtain the hardware resource configuration information of the Q new nodes, and according to the hardware resource configuration of the Q new nodes The information divides the second data to be processed to obtain Q data blocks, and distributes the Q data blocks to nodes corresponding to the Q new nodes for processing.
  5. 根据权利要求1-3任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1-3, wherein the method further comprises:
    在检测到所述P个节点中任一节点的负载值超过预设阈值时,则删除该任一节点对应的节点信息,或者,向管理员发送报警信息。When it is detected that the load value of any one of the P nodes exceeds a preset threshold, the node information corresponding to the any node is deleted, or an alarm message is sent to the administrator.
  6. 一种数据分配装置,其特征在于,包括:A data distribution device, characterized in that it includes:
    第一获取单元,用于获取P个节点中每一节点上报的硬件资源配置信息,得到P个硬件资源配置信息,每一节点对应一个硬件资源配置信息;The first obtaining unit is used to obtain the hardware resource configuration information reported by each of the P nodes to obtain P hardware resource configuration information, and each node corresponds to one piece of hardware resource configuration information;
    第二获取单元,用于获取第一待处理数据;A second obtaining unit, configured to obtain the first data to be processed;
    划分单元,用于依据所述P个硬件资源配置信息将所述第一待处理数据进 行划分,得到P个数据块,每一硬件资源配置信息对应一数据块;A dividing unit, configured to divide the first data to be processed according to the P hardware resource configuration information to obtain P data blocks, and each hardware resource configuration information corresponds to a data block;
    分发单元,用于将所述P个数据块分别分发给所述P个节点中相应的节点进行处理。The distribution unit is configured to distribute the P data blocks to corresponding nodes of the P nodes for processing.
  7. 根据权利要求6所述的装置,其特征在于,在所述依据所述P个硬件资源配置信息将所述第一待处理数据进行划分,得到P个数据块方面,所述划分单元具体用于:The apparatus according to claim 6, wherein in terms of dividing the first data to be processed according to the P pieces of hardware resource configuration information to obtain P pieces of data, the dividing unit is specifically used for :
    依据所述P个硬件资源配置信息确定所述P个节点中每一节点的性能评价值,得到P个性能评价值;Determine the performance evaluation value of each of the P nodes according to the P hardware resource configuration information, and obtain P performance evaluation values;
    依据所述P个性能评价值确定所述P个节点中每一节点对应的分配比例值,得到P个分配比例值,所述P个分配比例值之和为1;Determining the allocation ratio value corresponding to each of the P nodes according to the P performance evaluation values, to obtain P allocation ratio values, and the sum of the P allocation ratio values is 1;
    依据所述P个分配比例值将所述第一待处理数据进行划分,得到所述P个数据块。Divide the first data to be processed according to the P allocation ratio values to obtain the P data blocks.
  8. 根据权利要求7所述的装置,其特征在于,所述硬件资源配置信息包括:中央处理器的核数、内存大小和负载值;The device according to claim 7, wherein the hardware resource configuration information includes: the number of cores, memory size, and load value of the central processing unit;
    在所述依据所述P个硬件资源配置信息确定所述P个节点中每一节点的性能评价值,得到P个性能评价值方面,所述划分单元具体用于:In terms of determining the performance evaluation value of each of the P nodes according to the P hardware resource configuration information and obtaining P performance evaluation values, the dividing unit is specifically used to:
    按照预设的核数与第一评价值之间的映射关系,确定硬件资源配置信息i中的核数对应的目标第一评价值,所述硬件资源配置信息i为所述P个硬件配置资源信息中的任一硬件资源配置信息;Determine the target first evaluation value corresponding to the core number in the hardware resource configuration information i according to the mapping relationship between the preset core number and the first evaluation value, the hardware resource configuration information i is the P hardware configuration resources Any hardware resource configuration information in the information;
    按照预设的内存大小与第二评价值之间的映射关系,确定所述硬件资源配置信息i中的内存大小对应的目标第二评价值;Determine the target second evaluation value corresponding to the memory size in the hardware resource configuration information i according to the mapping relationship between the preset memory size and the second evaluation value;
    按照预设的负载值与第三评价值之间的映射关系,确定所述硬件资源配置信息i中的负载值对应的目标第三评价值;Determine the target third evaluation value corresponding to the load value in the hardware resource configuration information i according to the mapping relationship between the preset load value and the third evaluation value;
    获取所述第一评价值对应的第一权值、所述第二评价值对应的第二权值以及所述第三评价值对应的第三权值,所述第一权值、所述第二权值与所述第三权值之和为1;Acquiring a first weight value corresponding to the first evaluation value, a second weight value corresponding to the second evaluation value, and a third weight value corresponding to the third evaluation value, the first weight value, the third weight value The sum of the second weight and the third weight is 1;
    依据所述目标第一评价值、所述目标第二评价值、所述目标第三评价值、所述第一权值、所述第二权值和所述第三权值进行加权运算,得到所述硬件资源配置信息i对应的评价值。Performing a weighted operation according to the target first evaluation value, the target second evaluation value, the target third evaluation value, the first weight value, the second weight value, and the third weight value to obtain The evaluation value corresponding to the hardware resource configuration information i.
  9. 一种服务器,其特征在于,包括处理器、存储器,所述存储器用于存储一个或多个程序,并且被配置由所述处理器执行,所述程序包括用于执行如权利要求1-5任一项所述的方法中的步骤的指令。A server, characterized in that it includes a processor and a memory, the memory is used to store one or more programs, and is configured to be executed by the processor, the program includes a program for executing any of claims 1-5 An instruction of steps in a method as described.
  10. 一种计算机可读存储介质,存储有计算机程序,所述计算机程序被处理器执行以实现如权利要求1-5任一项所述的方法。A computer-readable storage medium storing a computer program, the computer program is executed by a processor to implement the method according to any one of claims 1-5.
PCT/CN2019/121613 2018-12-27 2019-11-28 Data distribution method and related product WO2020134840A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811613722.9A CN109800204B (en) 2018-12-27 2018-12-27 Data distribution method and related product
CN201811613722.9 2018-12-27

Publications (1)

Publication Number Publication Date
WO2020134840A1 true WO2020134840A1 (en) 2020-07-02

Family

ID=66557924

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121613 WO2020134840A1 (en) 2018-12-27 2019-11-28 Data distribution method and related product

Country Status (2)

Country Link
CN (1) CN109800204B (en)
WO (1) WO2020134840A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800204B (en) * 2018-12-27 2021-03-05 深圳云天励飞技术有限公司 Data distribution method and related product
CN110287000B (en) * 2019-05-29 2021-08-17 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium
CN111625644B (en) * 2020-04-14 2023-09-12 北京捷通华声科技股份有限公司 Text classification method and device
CN112887919B (en) * 2021-01-18 2022-07-05 浙江百应科技有限公司 Short message sending method and system for multi-channel short message cluster scheduling and electronic equipment
CN113242302A (en) * 2021-05-11 2021-08-10 鸬鹚科技(深圳)有限公司 Data access request processing method and device, computer equipment and medium
CN114201319A (en) * 2022-02-17 2022-03-18 广东东华发思特软件有限公司 Data scheduling method, device, terminal and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101013387A (en) * 2007-02-09 2007-08-08 华中科技大学 Load balancing method based on object storage device
CN103902379A (en) * 2012-12-25 2014-07-02 中国移动通信集团公司 Task scheduling method and device and server cluster
US20140258446A1 (en) * 2013-03-07 2014-09-11 Citrix Systems, Inc. Dynamic configuration in cloud computing environments
CN105912399A (en) * 2016-04-05 2016-08-31 杭州嘉楠耘智信息科技有限公司 Task processing method, device and system
CN109800204A (en) * 2018-12-27 2019-05-24 深圳云天励飞技术有限公司 Data distributing method and Related product

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104363300B (en) * 2014-11-26 2018-06-05 浙江宇视科技有限公司 Task distribution formula dispatching device is calculated in a kind of server cluster
CN105740063A (en) * 2014-12-08 2016-07-06 杭州华为数字技术有限公司 Data processing method and apparatus
CN104657214A (en) * 2015-03-13 2015-05-27 华存数据信息技术有限公司 Multi-queue multi-priority big data task management system and method for achieving big data task management by utilizing system
CN105488134A (en) * 2015-11-25 2016-04-13 用友网络科技股份有限公司 Big data processing method and big data processing device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101013387A (en) * 2007-02-09 2007-08-08 华中科技大学 Load balancing method based on object storage device
CN103902379A (en) * 2012-12-25 2014-07-02 中国移动通信集团公司 Task scheduling method and device and server cluster
US20140258446A1 (en) * 2013-03-07 2014-09-11 Citrix Systems, Inc. Dynamic configuration in cloud computing environments
CN105912399A (en) * 2016-04-05 2016-08-31 杭州嘉楠耘智信息科技有限公司 Task processing method, device and system
CN109800204A (en) * 2018-12-27 2019-05-24 深圳云天励飞技术有限公司 Data distributing method and Related product

Also Published As

Publication number Publication date
CN109800204B (en) 2021-03-05
CN109800204A (en) 2019-05-24

Similar Documents

Publication Publication Date Title
WO2020134840A1 (en) Data distribution method and related product
USRE47464E1 (en) Intelligent work load manager
US10764132B2 (en) Scale-out association method and apparatus, and system
CN107241281B (en) Data processing method and device
CN107402956B (en) Data processing method and device for large task and computer readable storage medium
CN107819797B (en) Access request processing method and device
CN104753994A (en) Method and device for data synchronization based on cluster server system
WO2018121404A1 (en) Method and device for timeout monitoring
CN110471749B (en) Task processing method, device, computer readable storage medium and computer equipment
CN106874100B (en) Computing resource allocation method and device
CN105760240A (en) Distributed task processing method and device
US10425273B2 (en) Data processing system and data processing method
CN110647392A (en) Intelligent elastic expansion method based on container cluster
EP3423940A1 (en) A method and device for scheduling resources
WO2016119389A1 (en) Management method, device and system for system docking
US20190012087A1 (en) Method, apparatus, and system for preventing loss of memory data
CN111737027A (en) Lookup processing method, system, terminal and storage medium of distributed storage system
CN107479966B (en) Signaling acquisition method based on multi-core CPU
WO2017114180A1 (en) Component logical threads quantity adjustment method and device
CN213876703U (en) Resource pool management system
CN110209548B (en) Service control method, system, electronic device and computer readable storage medium
CN111158896A (en) Distributed process scheduling method and system
US11736562B1 (en) Method and system for achieving high availability of service under high-load scene in distributed system
CN110019372A (en) Data monitoring method, device, server and storage medium
CN115580522A (en) Method and device for monitoring running state of container cloud platform

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19902575

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19902575

Country of ref document: EP

Kind code of ref document: A1