CN114579311A - Method, apparatus, device and storage medium for executing distributed computing task - Google Patents

Method, apparatus, device and storage medium for executing distributed computing task

Info

Publication number
CN114579311A
CN114579311A
Authority
CN
China
Prior art keywords
node
nodes
result
hardware information
calculation
Prior art date
Legal status
Granted
Application number
CN202210214311.2A
Other languages
Chinese (zh)
Other versions
CN114579311B (en)
Inventor
奎志清
李龙
吴志华
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210214311.2A priority Critical patent/CN114579311B/en
Publication of CN114579311A publication Critical patent/CN114579311A/en
Application granted granted Critical
Publication of CN114579311B publication Critical patent/CN114579311B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5044 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/54 Interprogram communication
    • G06F 9/542 Event management; Broadcasting; Multicasting; Notifications
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure provides a method, an apparatus, a device, a storage medium, and a program product for executing a distributed computing task, which relate to the technical field of artificial intelligence and, in particular, to the technical fields of cloud computing, distributed computing, and the like. The specific implementation scheme is as follows: sending to-be-processed computing data related to the distributed computing task to each first node in a first node set, wherein the hardware information of the first nodes matches local hardware information; aggregating first calculation results of the first nodes for the to-be-processed computing data to obtain a first aggregation result; sharing the first aggregation result with each second node in a second node set to obtain a shared result, wherein the hardware information of the second nodes does not match the local hardware information; and determining the shared result as new to-be-processed computing data, and returning to the operation of sending the to-be-processed computing data to each node in the first node set until the distributed computing task is completed.

Description

Method, apparatus, device and storage medium for executing distributed computing task
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and in particular, to the field of cloud computing, distributed computing, and the like.
Background
Distributed computing is a computing approach, in contrast to centralized computing. As computing technology develops, some computing tasks require substantial computing power to complete, and with centralized computing such tasks take a long time to finish. Distributed computing breaks a computing task into many small parts that are distributed to multiple nodes for processing. This shortens the computation time and improves computing efficiency.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, storage medium and program product for performing distributed computing tasks.
According to an aspect of the present disclosure, there is provided a method of performing a distributed computing task, comprising: sending to-be-processed computing data related to a distributed computing task to each first node in a first node set, wherein hardware information of the first nodes matches local hardware information;
aggregating a first calculation result of each first node for the to-be-processed computing data to obtain a first aggregation result; sharing the first aggregation result with each second node in a second node set to obtain a shared result, wherein hardware information of the second nodes does not match the local hardware information; and determining the shared result as new to-be-processed computing data, and returning to the operation of sending the to-be-processed computing data to each node in the first node set until the distributed computing task is completed.
According to another aspect of the present disclosure, there is provided a method of performing a distributed computing task, comprising: receiving to-be-processed computing data from a main node; performing calculation operation according to the to-be-processed calculation data to obtain a first calculation result; and sending the first calculation result to the master node.
According to another aspect of the present disclosure, there is provided an apparatus for performing a distributed computing task, including: a first sending module, configured to send to-be-processed computing data related to a distributed computing task to each first node in a first node set, wherein hardware information of the first nodes matches local hardware information; an aggregation module, configured to aggregate a first calculation result of each first node for the to-be-processed computing data to obtain a first aggregation result; a sharing module, configured to share the first aggregation result with each second node in a second node set to obtain a shared result, wherein hardware information of the second nodes does not match the local hardware information; and a determining module, configured to determine the shared result as new to-be-processed computing data and return to the operation of sending the to-be-processed computing data to each node in the first node set until the distributed computing task is completed.
According to another aspect of the present disclosure, there is provided an apparatus for performing a distributed computing task, including: a receiving module, configured to receive to-be-processed computing data from a master node; a calculation module, configured to perform a calculation operation on the to-be-processed computing data to obtain a first calculation result; and a sending module, configured to send the first calculation result to the master node.
Another aspect of the present disclosure provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the embodiments of the present disclosure.
According to another aspect of the disclosed embodiments, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method shown in the disclosed embodiments.
According to another aspect of the embodiments of the present disclosure, there is provided a computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the steps of the method shown in the embodiments of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of an exemplary distributed system to which the method, apparatus, electronic device, and storage medium for performing distributed computing tasks according to embodiments of the present disclosure may be applied;
FIG. 2 schematically illustrates a flow diagram of a method of performing a distributed computing task, in accordance with an embodiment of the present disclosure;
FIG. 3 schematically shows a flow diagram of a method of determining a first set of nodes and a second set of nodes, in accordance with an embodiment of the present disclosure;
FIG. 4 schematically shows a flow chart of a method of determining a master node according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a method of performing a distributed computing task according to another embodiment of the disclosure;
FIG. 6 schematically shows a block diagram of an apparatus for performing distributed computing tasks according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of an apparatus for performing distributed computing tasks according to another embodiment of the present disclosure; and
FIG. 8 schematically shows a block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An application scenario of the method and apparatus for performing distributed computing tasks provided by the present disclosure will be described below with reference to fig. 1.
FIG. 1 is a schematic diagram of an exemplary distributed system to which the method, apparatus, electronic device, and storage medium for performing distributed computing tasks according to embodiments of the present disclosure may be applied. It should be noted that FIG. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied, provided to help those skilled in the art understand the technical content of the present disclosure, and it does not mean that the embodiments of the present disclosure cannot be applied to other devices, systems, environments, or scenarios.
As shown in fig. 1, the distributed system 100 includes a plurality of nodes, such as nodes 111, 112, 113, 114, 121, 122, 123, 124, 131, 132, 133, and 134.
The plurality of nodes may be based on different hardware architectures. In this embodiment, nodes based on the same hardware architecture are referred to as homogeneous nodes, and nodes based on different hardware architectures are referred to as heterogeneous nodes.
For example, nodes 111, 112, 113, and 114 may be based on a GPU (Graphics Processing Unit), nodes 121, 122, 123, and 124 may be based on an NPU (Neural-network Processing Unit), and nodes 131, 132, 133, and 134 may be based on a CPU (Central Processing Unit). Accordingly, nodes 111, 112, 113, and 114 are homogeneous nodes, nodes 121, 122, 123, and 124 are homogeneous nodes, and nodes 131, 132, 133, and 134 are homogeneous nodes. Nodes 111, 112, 113, and 114 and nodes 121, 122, 123, and 124 are heterogeneous with respect to each other, nodes 111, 112, 113, and 114 and nodes 131, 132, 133, and 134 are heterogeneous with respect to each other, and nodes 121, 122, 123, and 124 and nodes 131, 132, 133, and 134 are heterogeneous with respect to each other.
According to embodiments of the present disclosure, for example, homogeneous nodes may be grouped into a homogeneous node set. The master nodes in each node set are then determined. And forming a heterogeneous node set by the main nodes in each node set. For example, nodes 111, 112, 113, and 114 may be grouped into a set of homogeneous nodes 10, and node 111 may be determined to be the master node of the set of homogeneous nodes 10. Nodes 121, 122, 123, and 124 may be grouped into a set of homogeneous nodes 20 and node 121 may be determined to be the master node of the set of homogeneous nodes 20. Nodes 131, 132, 133, and 134 may be grouped into a set of homogeneous nodes 30 and node 131 determined to be the master node of the set of homogeneous nodes 30.
According to an embodiment of the present disclosure, when a distributed computing task is executed, the master node in each homogeneous node set can send the data to be computed to the nodes in that homogeneous node set. The master node can then aggregate the calculation results of all nodes in the homogeneous node set to obtain an aggregation result.
According to an embodiment of the present disclosure, heterogeneous nodes differ in absolute computing power and therefore complete computing tasks at different speeds, so after the computation within each homogeneous node set is completed, data synchronization between the homogeneous node sets needs to be performed. Based on this, each master node may share its aggregation result with the other master nodes in the heterogeneous node set, and each master node continues subsequent calculation according to the shared aggregation results. According to embodiments of the present disclosure, data synchronization between homogeneous node sets may be performed periodically, and the data synchronization period can be determined according to actual needs. It will be appreciated that the number of computations performed in each homogeneous node set may differ within each data synchronization period: a homogeneous node set with larger absolute computing power performs more computations, and a homogeneous node set with smaller absolute computing power performs fewer computations.
For example, an optimal or better communication method may be adopted inside each homogeneous node set. Since the nodes in the homogeneous node set 10 are all based on GPUs, they may communicate using a collective communication library dedicated to GPUs. The nodes in the homogeneous node set 20 are based on NPUs and may communicate using a collective communication library dedicated to NPUs. In addition, a general communication method can be adopted for communication between heterogeneous nodes; for example, a general collective communication library suitable for GPUs, CPUs, and NPUs is used among the master nodes 111, 121, and 131.
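As a rough illustration of this backend choice, the sketch below selects a hardware-specific collective library inside a homogeneous set and falls back to a general-purpose one across heterogeneous master nodes. The concrete library names (NCCL for GPUs, HCCL for NPUs, Gloo as a generic backend) are only plausible examples assumed for illustration; the disclosure does not name specific libraries.

```python
# Illustrative only: the disclosure does not prescribe concrete libraries.
# NCCL (GPU), HCCL (NPU) and Gloo (generic) are assumed example backends.
HOMOGENEOUS_BACKENDS = {"GPU": "nccl", "NPU": "hccl", "CPU": "gloo"}
GENERIC_BACKEND = "gloo"  # usable among heterogeneous master nodes

def pick_backend(local_hw: str, peer_hw: str) -> str:
    """Use a hardware-specific collective library within a homogeneous set,
    and a general-purpose library between heterogeneous nodes."""
    if local_hw == peer_hw:
        return HOMOGENEOUS_BACKENDS.get(local_hw, GENERIC_BACKEND)
    return GENERIC_BACKEND

print(pick_backend("GPU", "GPU"))  # nccl: inside homogeneous node set 10
print(pick_backend("GPU", "NPU"))  # gloo: between master nodes 111 and 121
```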
According to the method for executing a distributed computing task provided by this embodiment, nodes can be grouped according to hardware information, homogeneous nodes are divided into homogeneous node sets, and the homogeneous node sets communicate with one another through their master nodes, so that distributed computing across heterogeneous nodes can be realized. In addition, a dedicated communication method can be used within each homogeneous node set, so that hardware resources are utilized more efficiently and computing efficiency is improved.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other processing of the data involved, such as the computing data and the hardware information, comply with the relevant laws and regulations and do not violate public order or good morals.
The method of performing a distributed computing task provided by the present disclosure will be described below in conjunction with fig. 2. The method may be performed by a master node, for example.
FIG. 2 schematically shows a flow diagram of a method of performing a distributed computing task according to an embodiment of the present disclosure.
As shown in fig. 2, the method 200 of performing a distributed computing task includes, at operation S210a, a master node sending pending computing data related to the distributed computing task to each first node in a first set of nodes.
According to an embodiment of the present disclosure, the local hardware information may be, for example, the hardware information of the master node. The hardware information may include, for example, a hardware type. The first node may be, for example, a node whose hardware information matches that of the master node, and the first node set may comprise, for example, at least one first node. In this embodiment, if the hardware information of two nodes matches, it may indicate that the two nodes are homogeneous nodes.
According to an embodiment of the present disclosure, the distributed computing task may include, for example, a training task of a deep learning model, and the like, and the to-be-processed computing data may include, for example, model parameters, and the like.
Then, for each first node, the following operations S210b to S230b are performed.
In operation S210b, the to-be-processed computing data is received from the master node.
In operation S220b, a calculation operation is performed according to the calculation data to be processed to obtain a first calculation result.
According to embodiments of the present disclosure, the calculation operation may include, for example, gradient calculation or the like.
In operation S230b, the first calculation result is transmitted to the master node.
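A minimal sketch of this worker-side flow is given below; the three callables stand in for whatever receive, compute, and send primitives a node actually uses and are assumptions of this illustration, not APIs from the disclosure.

```python
def first_node_step(recv_from_master, compute, send_to_master):
    """Worker-side sketch of operations S210b-S230b under the assumptions above."""
    data = recv_from_master()        # S210b: receive to-be-processed computing data
    first_result = compute(data)     # S220b: local calculation, e.g. a gradient computation
    send_to_master(first_result)     # S230b: return the first calculation result
```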
Next, the master node continues to perform operations S220a to S240a.
In operation S220a, the first calculation results of each first node for the calculation data to be processed are aggregated to obtain a first aggregation result.
According to an embodiment of the present disclosure, the master node may, for example, perform a weighted summation of the first calculation results according to the weight of each first node to obtain the first aggregation result. The weight of each first node can be set according to actual needs.
According to another embodiment of the present disclosure, the master node may also average all the first calculation results to obtain the first aggregation result, for example.
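A small sketch of these two aggregation variants, for scalar results, is shown below; with no weights it reduces to a plain average, and weights that sum to one give the weighted summation described above.

```python
def aggregate(results, weights=None):
    """Combine the first calculation results: weighted sum if weights are
    given, plain average otherwise (the two variants described above)."""
    if weights is None:
        weights = [1.0 / len(results)] * len(results)
    return sum(w * r for w, r in zip(weights, results))

# Three first nodes reporting scalar results:
print(aggregate([0.2, 0.4, 0.6]))                    # plain average (approx. 0.4)
print(aggregate([0.2, 0.4, 0.6], [0.5, 0.3, 0.2]))   # weighted sum (approx. 0.34)
```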
In operation S230a, the first aggregation result is shared with each second node in the second node set, resulting in a shared result, and the shared result is determined as new to-be-processed calculation data.
According to an embodiment of the present disclosure, the second node may be, for example, a node that does not match the hardware information of the master node. The second set of nodes may comprise, for example, at least one second node. For example, in this embodiment, if the hardware information of two nodes does not match, it may indicate that the two nodes are heterogeneous nodes.
According to an embodiment of the present disclosure, for example, an average of the first aggregation result and the second aggregation results received from the second nodes may be calculated as the shared result.
According to another embodiment of the present disclosure, for example, the first aggregation result and all the second aggregation results may also be weighted and summed according to the weight of each master node, so as to obtain a shared result. The weight of each master node can be set according to actual needs.
In operation S240a, it is determined whether the distributed computing task has been completed. If the distributed computing task has not been completed, the process returns to operation S210a. If the distributed computing task has been completed, the distributed computing task ends.
According to an embodiment of the present disclosure, whether the distributed computing task has been completed can be determined according to whether the number of executed rounds reaches a preset number. If the number of executed rounds reaches the preset number, it is determined that the distributed computing task has been completed; otherwise, it is determined that the distributed computing task has not been completed.
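Putting operations S210a to S240a together, a master-node control loop might look like the sketch below. The `node` object and its three methods are assumptions of this illustration (stand-ins for the broadcast, gather, and cross-set sharing primitives), and the loop count plays the role of the preset number of execution rounds.

```python
def run_master(node, initial_data, preset_rounds):
    """Sketch of the master-side loop S210a-S240a under the assumptions above."""
    pending = initial_data
    for _ in range(preset_rounds):                      # S240a: stop after the preset rounds
        node.send_to_first_nodes(pending)               # S210a: broadcast pending data
        first_results = node.gather_first_results()     # collect the workers' results
        first_aggregation = aggregate(first_results)    # S220a: see aggregate() above
        pending = node.share_with_second_nodes(first_aggregation)  # S230a: shared result
    return pending
```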
According to the method for executing a distributed computing task provided by this embodiment, distributed computing can be realized using heterogeneous nodes, hardware resources can be utilized more effectively, and computing efficiency is improved.
According to another embodiment of the present disclosure, the master node may also perform a calculation operation on the calculation data to be processed to obtain a second calculation result. Based on this, after receiving the first calculation result from each first node, the second calculation result and the first calculation result of each first node may be weighted and summed to obtain a first aggregation result. The weight of the master node and each first node can be set according to actual needs. According to another embodiment of the present disclosure, the master node may also average the second calculation result and all the first calculation results, for example, to obtain the first aggregation result.
The method for determining the first node set and the second node set provided by the present disclosure will be described below with reference to fig. 3. The method of determining the first node set and the second node set may be performed by the master node, for example.
Fig. 3 schematically shows a flow chart of a method of determining a first set of nodes and a second set of nodes according to an embodiment of the disclosure.
As shown in fig. 3, the method of determining the first node set and the second node set may include acquiring local hardware information and hardware information of a plurality of nodes in operation S310.
According to an embodiment of the present disclosure, the hardware information may include, for example, a hardware type. The hardware types may include, for example, GPU, CPU, NPU, and so on.
In operation S320, a node, of the plurality of nodes, whose hardware information matches the local hardware information is determined as a first node, resulting in a first node set.
According to an embodiment of the present disclosure, for example, a node of the same hardware type as the master node may be determined as the first node. All the first nodes are grouped into a first node set.
In operation S330, a master node is determined among nodes in which hardware information does not match with local hardware information among the plurality of nodes, and the master node is used as a second node to obtain a second node set.
According to an embodiment of the present disclosure, for example, the nodes whose hardware type differs from that of the master node may be determined. The master node among the nodes of each different hardware type is then determined as a second node, and all the second nodes are grouped into the second node set.
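A compact sketch of operations S310 to S330 is given below. The `(node_id, hardware_type)` representation and the rule that the smallest identifier in each group is its master are assumptions borrowed from the example described later (fig. 5), not requirements of the method.

```python
def build_node_sets(local_id, local_hw, nodes):
    """nodes: list of (node_id, hardware_type) pairs including the local node.
    Returns the first node set (matching hardware, excluding the local node)
    and the second node set (the master of every non-matching hardware type)."""
    first_set = [nid for nid, hw in nodes if hw == local_hw and nid != local_id]
    second_set = []
    for hw in sorted({hw for _, hw in nodes if hw != local_hw}):
        peers = sorted(nid for nid, h in nodes if h == hw)
        second_set.append(peers[0])  # assumed: smallest identifier is that set's master
    return first_set, second_set

nodes = [("a0", "GPU"), ("a1", "GPU"), ("b0", "NPU"), ("b1", "NPU"), ("c0", "CPU")]
print(build_node_sets("a0", "GPU", nodes))  # (['a1'], ['c0', 'b0'])
```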
The method for determining a master node provided by the present disclosure will be described below with reference to fig. 4. The method for determining the master node may be performed by the first node, for example.
Fig. 4 schematically shows a flow chart of a method of determining a master node according to an embodiment of the present disclosure.
As shown in fig. 4, the method of determining a master node may include acquiring hardware information of a plurality of nodes in operation S410.
According to an embodiment of the present disclosure, the hardware information may include, for example, a hardware type and a node identification.
In operation S420, a master node is determined among the plurality of nodes according to the hardware information of the plurality of nodes.
According to an embodiment of the present disclosure, the nodes of the plurality of nodes that have the same hardware type as the first node, i.e., the homogeneous nodes, may be determined. The master node is then determined among the homogeneous nodes according to their node identifications. Illustratively, the node identification may include a number, and the master node may be determined according to the magnitude of the number; for example, the homogeneous node with the smallest number acts as the master node.
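The smallest-number rule might be expressed as in the sketch below; identifiers of the form letter-plus-number (such as 'a0') are an assumption borrowed from the later example, not a constraint of the method.

```python
def elect_master(local_hw, nodes):
    """Among the homogeneous nodes (same hardware type as the local first node),
    pick the one whose identifier carries the smallest number as the master.
    Assumes identifiers shaped like 'a0' or 'b3': one letter followed by a number."""
    homogeneous = [nid for nid, hw in nodes if hw == local_hw]
    return min(homogeneous, key=lambda nid: int(nid[1:]))

nodes = [("a2", "GPU"), ("a0", "GPU"), ("a1", "GPU"), ("b0", "NPU")]
print(elect_master("GPU", nodes))  # prints a0
```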
According to embodiments of the present disclosure, after determining the first set of nodes, the master node may establish a first communication domain with each first node in the first set of nodes. Based on this, the master node may for example broadcast the pending calculation data in the first communication domain, so that the pending calculation data is sent to each first node of the first set of nodes. Correspondingly, each first node may also establish a first communication domain with the master node. The first calculation result is then sent to the master node via the first communication domain.
According to another embodiment of the present disclosure, the nodes may also be grouped in advance: nodes with matching hardware information are grouped into a homogeneous node set, the master node of each homogeneous node set is determined, and the master nodes of all homogeneous node sets form a heterogeneous node set. For each master node, the other nodes in the homogeneous node set to which it belongs form its first node set, and the other master nodes in the heterogeneous node set form its second node set.
The method for performing a distributed computing task described above is further described with reference to FIG. 5 in conjunction with specific embodiments. Those skilled in the art will appreciate that the following example embodiments are only for the understanding of the present disclosure, and the present disclosure is not limited thereto.
FIG. 5 schematically shows a schematic diagram of a method of performing a distributed computing task according to another embodiment of the present disclosure.
As shown in fig. 5, the identifications of the nodes may include a0, a1, a2, a3, b0, b1, b2, b3, c0, c1, c2, c3, d0, d1, d2, and d3. Each node can be started separately and obtain its own hardware type, node identification, network address, and other information through a built-in program. The network address may comprise, for example, an IP address.
Each node sends its hardware type, node identification, and network address to a registration center so that the registration center registers the node. After successful registration, each node acquires global information through the registration center, where the global information includes the hardware types, node identifications, and network addresses of all nodes. Illustratively, the registration center may be implemented using a pre-declared master node or a service such as etcd, where etcd is a distributed key-value database.
According to the embodiment of the present disclosure, each node determines, according to its own hardware information and the hardware information of the other nodes, the homogeneous nodes whose hardware information matches its own and the heterogeneous nodes whose hardware information does not, and the homogeneous nodes form homogeneous node sets. For example, nodes a0, a1, a2, and a3 form a homogeneous node set g1; nodes b0, b1, b2, and b3 form a homogeneous node set g2; nodes c0, c1, c2, and c3 form a homogeneous node set g3; and nodes d0, d1, d2, and d3 form a homogeneous node set g4.
According to the embodiment of the present disclosure, each node may sort all nodes according to their node identifications. Illustratively, in this embodiment, the master nodes may be determined according to the numbers in the node identifications; for example, node number 0 in each homogeneous set serves as the master node, that is, nodes a0, b0, c0, and d0 serve as the master nodes of homogeneous sets g1, g2, g3, and g4, respectively. Nodes a0, b0, c0, and d0 form the heterogeneous set g5.
Then, each node starts its own working process and establishes communication domains using the acquired network addresses. Each master node establishes both a homogeneous communication domain and a heterogeneous communication domain, while each non-master node establishes only a homogeneous communication domain. Each master node broadcasts the first data in its homogeneous communication domain so that the first data is sent to every homogeneous node in that domain.
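A sketch of this role-dependent domain setup is shown below. The `registry.global_info()` call and the `create_domain()` helper are hypothetical placeholders for the registration-center query and the collective-communication group constructor; only the grouping logic follows the example above.

```python
def setup_domains(node_id, hw_type, registry, create_domain):
    """Every node joins the homogeneous domain of its hardware type; only the
    master (smallest identifier in that group) also joins the heterogeneous
    domain formed by the masters of all homogeneous sets."""
    peers = registry.global_info()  # assumed: iterable of (node_id, hw_type, address)
    same_hw = sorted(nid for nid, hw, _ in peers if hw == hw_type)
    masters = sorted({min(nid for nid, h, _ in peers if h == hw)
                      for _, hw, _ in peers})
    homogeneous_domain = create_domain(same_hw)
    heterogeneous_domain = create_domain(masters) if node_id == same_hw[0] else None
    return homogeneous_domain, heterogeneous_domain
```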
Each node then performs a preset calculation operation on the first data to obtain a calculation result.
In response to the arrival of each synchronization period, each non-master node sends the calculation result obtained by its own calculation to the master node of its homogeneous node set.
The master node receives the calculation results from all nodes in its homogeneous node set, aggregates its own calculation result with the calculation results of the other nodes in the homogeneous node set to obtain aggregated data, and then broadcasts the aggregated data in the heterogeneous communication domain so that it is transmitted to the master nodes of the other homogeneous sets.
Each master node also receives the aggregated data sent by the other master nodes, determines and updates the to-be-processed computing data according to the aggregated data of the other master nodes and its own aggregated data, and then sends the to-be-processed computing data to each node in its own homogeneous set so that the next round of calculation is carried out on the new to-be-processed computing data.
In the related art, collective communication is designed for homogeneous hardware and does not support communication across heterogeneous hardware.
According to the method for executing a distributed computing task provided by this embodiment, distributed computing can be realized using heterogeneous nodes, hardware resources can be utilized more effectively, and computing efficiency is improved.
FIG. 6 schematically shows a block diagram of an apparatus for performing a distributed computing task according to an embodiment of the present disclosure.
As shown in fig. 6, the apparatus 600 for performing distributed computing tasks includes a first sending module 610, an aggregation module 620, a sharing module 630, and a determination module 640.
The first sending module 610 is configured to send to-be-processed computing data related to the distributed computing task to each first node in the first node set, where hardware information of the first node is matched with local hardware information.
The aggregating module 620 is configured to aggregate the first calculation result of each first node for the to-be-processed calculation data to obtain a first aggregation result.
The sharing module 630 is configured to share the first aggregation result with each second node in the second node set to obtain a shared result, where the hardware information of the second node is not matched with the local hardware information.
The determining module 640 is configured to determine the shared result as new to-be-processed computing data and return to the operation of sending the to-be-processed computing data to each node in the first node set until the distributed computing task is completed.
FIG. 7 schematically illustrates a block diagram of an apparatus for performing a distributed computing task, according to another embodiment of the present disclosure.
As shown in FIG. 7, the apparatus 700 for performing distributed computing tasks includes a receiving module 710, a computing module 720, and a sending module 730.
The receiving module 710 is configured to receive the to-be-processed computing data from the master node.
The calculating module 720 is configured to perform a calculating operation according to the to-be-processed calculating data to obtain a first calculating result.
A sending module 730, configured to send the first calculation result to the master node.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 8 schematically illustrates a block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Computing unit 801 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 performs the various methods and processes described above, such as a method of performing a distributed computing task. For example, in some embodiments, the method of performing distributed computing tasks may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto device 800 via ROM 802 and/or communications unit 809. When loaded into RAM 803 and executed by computing unit 801, a computer program may perform one or more steps of the above described method of performing a distributed computing task. Alternatively, in other embodiments, the computing unit 801 may be configured by any other suitable means (e.g., by means of firmware) to perform a method of performing distributed computing tasks.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service scalability in conventional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (14)

1. A method of performing a distributed computing task, comprising:
sending to-be-processed computing data related to a distributed computing task to each first node in a first node set, wherein hardware information of the first nodes is matched with local hardware information;
aggregating a first calculation result of each first node for the to-be-processed computing data to obtain a first aggregation result;
sharing the first aggregation result with each second node in a second node set to obtain a shared result, wherein the hardware information of the second node is not matched with the local hardware information; and
determining the shared result as new to-be-processed computing data, and returning to the operation of sending the to-be-processed computing data to each node in the first node set until the distributed computing task is completed.
2. The method of claim 1, further comprising:
acquiring the local hardware information and hardware information of a plurality of nodes;
determining a node of the plurality of nodes, the hardware information of which is matched with the local hardware information, as a first node, and obtaining the first node set; and
determining a master node from the nodes of the plurality of nodes whose hardware information does not match the local hardware information, and taking the master node as the second node to obtain the second node set.
3. The method of claim 1 or 2, further comprising:
establishing a first communication domain with each first node in the first set of nodes;
wherein sending the to-be-processed computing data to each first node in the first set of nodes comprises:
broadcasting the to-be-processed computing data in the first communication domain.
4. The method of claim 1, wherein the aggregating the first calculation result of each first node for the to-be-processed computing data to obtain the first aggregation result comprises:
calculating the to-be-processed calculation data to obtain a second calculation result;
receiving a first calculation result from each first node; and
performing a weighted summation of the second calculation result and the first calculation result of each first node to obtain the first aggregation result.
5. The method of claim 1 or 2, further comprising:
establishing a second communication domain with each second node in the set of second nodes;
wherein the sharing the first aggregation result with each second node in the second node set to obtain a shared result includes:
broadcasting the first aggregation result in the second communication domain;
receiving a second aggregation result from each of the second nodes; and
determining the shared result according to the first aggregation result and the second aggregation result.
6. The method of claim 5, wherein the determining the shared result according to the first aggregation result and the second aggregation result comprises:
calculating an average of the first and second aggregation results as the shared result.
7. A method of performing a distributed computing task, comprising:
receiving to-be-processed computing data from a master node;
performing calculation operation according to the to-be-processed calculation data to obtain a first calculation result; and
sending the first calculation result to the master node.
8. The method of claim 7, further comprising:
acquiring hardware information and node identifiers of a plurality of nodes; and
determining the master node among the plurality of nodes according to the hardware information and the node identifiers of the plurality of nodes.
9. The method of claim 8, further comprising:
a first communication domain is established with the master node.
wherein sending the first calculation result to the master node comprises:
sending the first calculation result to the master node through the first communication domain.
10. An apparatus to perform distributed computing tasks, comprising:
a first sending module, configured to send to-be-processed computing data related to a distributed computing task to each first node in a first node set, wherein hardware information of the first nodes matches local hardware information;
an aggregation module, configured to aggregate a first calculation result of each first node for the to-be-processed computing data to obtain a first aggregation result;
a sharing module, configured to share the first aggregation result with each second node in a second node set to obtain a shared result, wherein hardware information of the second nodes does not match the local hardware information; and
a determining module, configured to determine the shared result as new to-be-processed computing data and return to the operation of sending the to-be-processed computing data to each node in the first node set until the distributed computing task is completed.
11. An apparatus for performing distributed computing tasks, comprising:
a receiving module, configured to receive to-be-processed computing data from a master node;
a calculation module, configured to perform a calculation operation on the to-be-processed computing data to obtain a first calculation result; and
a sending module, configured to send the first calculation result to the master node.
12. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
13. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-9.
14. A computer program product comprising computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method of any of claims 1-9.
CN202210214311.2A 2022-03-04 2022-03-04 Method, device, equipment and storage medium for executing distributed computing task Active CN114579311B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210214311.2A CN114579311B (en) 2022-03-04 2022-03-04 Method, device, equipment and storage medium for executing distributed computing task

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210214311.2A CN114579311B (en) 2022-03-04 2022-03-04 Method, device, equipment and storage medium for executing distributed computing task

Publications (2)

Publication Number Publication Date
CN114579311A 2022-06-03
CN114579311B 2023-05-30

Family

ID=81773683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210214311.2A Active CN114579311B (en) 2022-03-04 2022-03-04 Method, device, equipment and storage medium for executing distributed computing task

Country Status (1)

Country Link
CN (1) CN114579311B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024001861A1 (en) * 2022-06-29 2024-01-04 华为技术有限公司 Model training method, apparatus and system, and related device

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060122942A1 (en) * 1998-04-03 2006-06-08 Francois-Xavier Nuttall System and methods providing secure delivery of licenses and content
US20040098447A1 (en) * 2002-11-14 2004-05-20 Verbeke Jerome M. System and method for submitting and performing computational tasks in a distributed heterogeneous networked environment
CN102882973A (en) * 2012-10-11 2013-01-16 北京邮电大学 Distributed load balancing system and distributed load balancing method based on peer to peer (P2P) technology
US9483367B1 (en) * 2014-06-27 2016-11-01 Veritas Technologies Llc Data recovery in distributed storage environments
US20190146837A1 (en) * 2014-09-29 2019-05-16 Samsung Electronics Co., Ltd. Distributed real-time computing framework using in-storage processing
CN104852819A (en) * 2015-05-21 2015-08-19 杭州天宽科技有限公司 Energy consumption management method for SLA (service level agreement) based on heterogeneous MapReduce cluster
CN105183796A (en) * 2015-08-24 2015-12-23 同济大学 Distributed link prediction method based on clustering
CN107203422A (en) * 2016-08-28 2017-09-26 深圳晶泰科技有限公司 A kind of job scheduling method towards high-performance calculation cloud platform
CN111385352A (en) * 2020-02-26 2020-07-07 深信服科技股份有限公司 Instance control method, node, terminal and distributed storage system
CN111538865A (en) * 2020-03-27 2020-08-14 中国人民解放军国防科技大学 Multi-party set synchronization method and device and electronic equipment
CN112084258A (en) * 2020-08-18 2020-12-15 腾讯科技(深圳)有限公司 Data synchronization method and device
CN113505021A (en) * 2021-05-26 2021-10-15 南京大学 Fault-tolerant method and system based on multi-master-node master-slave distributed architecture

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DAN HURU et al.: "BigClue: Towards a generic IoT cross-domain data processing platform", 2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP), pages 427-434 *
WANG JIE: "Research on Parallel Programming Methods Based on Multi-core Cluster Environments: MPI+OpenMP Hybrid Programming", China Master's Theses Full-text Database, Information Science and Technology, pages 137-49 *

Also Published As

Publication number Publication date
CN114579311B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
CN113568860B (en) Deep learning-based multi-machine cluster topology mapping method and device and program product
CN114253979B (en) Message processing method and device and electronic equipment
CN114697391A (en) Data processing method, device, equipment and storage medium
CN114579311B (en) Method, device, equipment and storage medium for executing distributed computing task
CN116567077A (en) Bare metal instruction sending method, device, equipment and storage medium
CN113691403B (en) Topology node configuration method, related device and computer program product
CN114070889B (en) Configuration method, traffic forwarding device, storage medium, and program product
CN115016934A (en) Method, device and system for federated learning, electronic equipment and storage medium
CN114565105A (en) Data processing method and deep learning model training method and device
CN114915516A (en) Communication method and device
CN113778645A (en) Task scheduling method, device and equipment based on edge calculation and storage medium
CN114650222B (en) Parameter configuration method, device, electronic equipment and storage medium
CN115600687B (en) Model training method, device, equipment and storage medium
CN115600671B (en) Data processing method, device, equipment and storage medium of deep learning framework
CN112948246B (en) AB test control method, device and equipment of data platform and storage medium
CN116306407B (en) Verification method, device, equipment and storage medium of Network On Chip (NOC)
CN113032040B (en) Method, apparatus, device, medium, and article for processing tasks
CN114449031B (en) Information acquisition method, device, equipment and storage medium
CN115168486A (en) Clock synchronization method and device, electronic equipment and readable storage medium
CN115776489A (en) Information acquisition method and device, electronic equipment and computer readable storage medium
CN115988124A (en) Method and device for determining equipment ID
CN117651078A (en) Data transmission method and device, electronic equipment and storage medium
CN114553787A (en) Flow distribution method, device, electronic equipment and storage medium
CN118113420A (en) Application deployment method and device, electronic equipment and storage medium
CN117032746A (en) Data updating method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant