CN113504980A - Node switching method in distributed computation graph, electronic device and readable storage medium - Google Patents

Info

Publication number
CN113504980A
Authority
CN
China
Prior art keywords
node
provider
information
implementation
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110838844.3A
Other languages
Chinese (zh)
Inventor
金卓军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zebred Network Technology Co Ltd
Original Assignee
Zebred Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zebred Network Technology Co Ltd filed Critical Zebred Network Technology Co Ltd
Priority to CN202110838844.3A priority Critical patent/CN113504980A/en
Publication of CN113504980A publication Critical patent/CN113504980A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system

Abstract

The application provides a node switching method in a distributed computation graph, an electronic device, and a readable storage medium. The method is applied to an electronic device that includes a computation graph manager, and comprises the following steps: the computation graph manager constructs a computation graph of a computing task, where the computation graph includes a plurality of computing nodes used to complete the computing task and each computing node is managed by a corresponding node manager; the node manager receives information, published by a provider in a network, about a node implementation corresponding to the managed computing node, and writes the information into a node registry corresponding to the computing node, where the information includes information about the provider of the node implementation; when it is detected that an original node implementation needs to be switched, the computation graph manager selects a new node implementation to replace the original node implementation based on the provider information in the node registry corresponding to the original node implementation. With this method, the whole node switching process can be completed dynamically and automatically by the computing framework.

Description

Node switching method in distributed computation graph, electronic device and readable storage medium
Technical Field
The present application relates to the field of driving technologies, and in particular, to a node switching method in a distributed computation graph, an electronic device, and a readable storage medium.
Background
With the continuous development of communication services, more and more devices can join the same network at the same time and communicate with each other. When an application in an electronic device processes a task, the computing task may be represented as a computation graph. The computation graph may include many computing nodes, and its topology may be complex. While a task is being computed, a fault or insufficient computing capability at any single node can degrade the whole task or make it impossible to complete. There is therefore a need to solve this problem.
Disclosure of Invention
In view of the above, the present application provides a node switching method in a distributed computation graph, an electronic device and a readable storage medium, so as to solve the technical problems mentioned in the foregoing background.
Some embodiments of the present application provide a method of node switching in a distributed computation graph. The present application is described below in various aspects, and embodiments and advantageous effects of the following aspects may be mutually referenced.
In a first aspect, the present application provides a node switching method in a distributed computation graph, which is applied in an electronic device, where the electronic device includes a computation graph manager, and the method includes:
the computation graph manager constructs a computation graph of a computing task, where the computation graph includes a plurality of computing nodes used to complete the computing task, and each computing node is managed by a corresponding node manager;
the node manager receives information, published by a provider in a network, about a node implementation corresponding to the managed computing node, and writes the information into a node registry corresponding to the computing node, where the information includes information about the provider of the node implementation;
when it is detected that an original node implementation participating in computation in the computation graph needs to be switched, the computation graph manager selects a new node implementation to replace the original node implementation based on the provider information in the node registry corresponding to the original node implementation.
As an embodiment of the first aspect of the application, the information further includes a priority with which the node implementation is selected to participate in completing the computing task;
when it is detected that the original node implementation participating in computation in the computation graph needs to be switched, the computation graph manager selects a new node implementation to replace the original node implementation based on the provider information in the node registry corresponding to the original node implementation and on the priority.
As an embodiment of the first aspect of the present application, the stronger the computing power of the provider of the node implementation, the higher the priority of the node implementation being selected to participate in completing the computing task.
As an embodiment of the first aspect of the present application, detecting that a node participating in computation in a computation graph needs to be switched includes:
when the computation graph manager determines, according to the priorities in the node registry, that a node implementation in the node registry has a higher priority than the node implementation currently participating in the computing task, it determines that the original node implementation needs to be switched; or,
the computation graph manager determines that the original node implementation needs to be switched when it is notified that the provider of the original node implementation has exited the network, or that the provider of the original node implementation has failed while running the original node implementation.
As an embodiment of the first aspect of the present application, when a new provider joins the network, the node manager receives information that the new provider publishes to the network about node implementations belonging to the computing node managed by the node manager, and writes the information into the node registry corresponding to the computing node.
As an embodiment of the first aspect of the present application, the provider includes different devices, virtual machines, or different processes in the same device.
As an embodiment of the first aspect of the present application, the information about the provider of the node implementation includes: the device ID, virtual machine ID, or process ID where the node implementation is located.
In a second aspect, the present application further provides an electronic device, comprising:
the computational graph manager is used for constructing a computational graph of the computational task, and the computational graph comprises a plurality of computational nodes for completing the computational task; a node manager for managing each of the compute nodes;
the node manager is used for receiving information which is issued by a provider in a network and corresponds to the node realization of the managed computing node, and writing the information into a node registry corresponding to the computing node, wherein the information comprises the information of the provider which realizes the node;
when it is detected that an original node implementation participating in computation in the computation graph needs to be switched, the computation graph manager is used for selecting a new node implementation to replace the original node implementation based on the information of the provider in the node registry corresponding to the original node implementation.
As an embodiment of the second aspect of the application, the information further includes a priority with which the node implementation is selected to participate in completing the computing task;
when it is detected that the original node implementation participating in computation in the computation graph needs to be switched, the computation graph manager is used for selecting a new node implementation to replace the original node implementation based on the provider information in the node registry corresponding to the original node implementation and on the priority.
As an embodiment of the second aspect of the present application, the stronger the computing power of the provider of a node implementation, the higher the priority with which the node implementation is selected to participate in completing the computing task.
As an embodiment of the second aspect of the present application, the computation graph manager is configured to:
determine that the original node implementation needs to be switched when a node implementation in the node registry has a higher priority than the node implementation currently participating in the computing task; or,
determine that the original node implementation needs to be switched when it is notified that the provider of the original node implementation has exited the network, or that the provider of the original node implementation has failed while running the original node implementation.
As an embodiment of the second aspect of the present application, when a new provider joins the network, the node manager is configured to receive information that the new provider publishes to the network about node implementations belonging to the computing node managed by the node manager, and to write the information into the node registry corresponding to the computing node.
As an embodiment of the second aspect of the application, the provider comprises different devices, virtual machines or different processes in the same device.
As an embodiment of the second aspect of the present application, the information about the provider of the node implementation includes: the device ID, virtual machine ID, or process ID where the node implementation is located.
In a third aspect, the present application further provides an electronic device, including:
a memory for storing instructions for execution by one or more processors of the device, and
a processor configured to perform the method of the first aspect.
In a fourth aspect, the present application further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, causes the processor to execute the method of the first aspect.
Drawings
FIG. 1 is a schematic diagram of an exemplary distributed computing graph according to the present application;
FIG. 2 is a flowchart of a node switching method in a distributed computing graph according to an embodiment of the present application;
FIG. 3 is a framework diagram of creating nodes according to one embodiment of the present application;
FIG. 4 is a schematic structural diagram of a new node join according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram illustrating an example of the present application when a primary node is unavailable;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 8 is a block diagram of an apparatus of some embodiments of the present application;
FIG. 9 is a block diagram of a system on a chip (SoC) in accordance with some embodiments of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below clearly and completely with reference to the drawings in the embodiments of the present application.
Referring to FIG. 1, FIG. 1 schematically illustrates the structure of a distributed computation graph. As shown in FIG. 1, the structure includes a device 110 and a device 120. The device 110 includes a process 1, a process 2, and a computation graph manager in an application, and the device 120 includes a process 3. Devices 110 and 120 are in the same network and are capable of communicating with each other. When the two devices need to complete a computing task together, as shown in FIG. 1, the computing nodes that complete the computing task include node 01, node 02, node 03, and node 04. Node 01 is located in process 1 in device 110, nodes 02 and 03 are located in process 2 in device 110, and node 04 is located in process 3 in device 120. The computation graph manager manages the life cycles of the computing nodes and constructs the computation graph based on the node implementations of the computing nodes. When any node implementation in the computation graph cannot run, or the provider (device 110 or device 120) of that node implementation exits the network, the computing task cannot be completed properly. Therefore, to keep the computing task running normally, the affected node implementation can be replaced automatically so that the computing task is still completed effectively.
The following describes a node switching method in a distributed computing graph according to the present application with reference to a specific embodiment.
Referring to fig. 2, fig. 2 is a flowchart illustrating a node switching method in a distributed computation graph, which may be applied to an electronic device including a computation graph manager. As shown in fig. 2, the method comprises the following steps:
S210, the computation graph manager constructs a computation graph of the computing task. The computation graph is constructed by the computation graph manager according to the business logic that implements the computing task. For example, for a vehicle forward-looking perception computation graph, the business logic comprises the following computing nodes in sequence: a camera captures images of the scene in front of the vehicle (lane lines and obstacles); the image data is preprocessed, passed through model inference, and post-processed; and the result is rendered and output. This completes the forward-looking perception computing task. The computation graph manager builds these computing nodes into a computation graph.
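As a rough illustration of S210, the forward-looking perception pipeline can be modeled as a linear chain of node types. This is only a sketch: the `ComputationGraph` class, the `build_linear_graph` helper, and the stage names are hypothetical and do not come from the application, which does not prescribe a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class ComputationGraph:
    """Node types in execution order plus the directed edges between them."""
    node_types: list[str] = field(default_factory=list)
    edges: list[tuple[str, str]] = field(default_factory=list)


def build_linear_graph(stages: list[str]) -> ComputationGraph:
    """Chain each stage to the next one, as in a simple processing pipeline."""
    edges = list(zip(stages, stages[1:]))
    return ComputationGraph(node_types=list(stages), edges=edges)


# Hypothetical labels for the five stages of the forward-looking perception example.
FORWARD_PERCEPTION_STAGES = [
    "camera_input",
    "preprocessing",
    "model_inference",
    "postprocessing",
    "rendering",
]

graph = build_linear_graph(FORWARD_PERCEPTION_STAGES)
for src, dst in graph.edges:
    print(f"{src} -> {dst}")
```

A real graph may branch or merge; the linear chain is used here only because the example in the text is sequential.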
Each computing node in the computation graph corresponds to a node type, such as model inference or clustering. Each node type may have a plurality of node implementations. These node implementations provide the same function and can each complete the computing task of the corresponding computing node, and they may be located in processes of different devices or in different processes of the same device. For example, the computing node that inputs image data may acquire the data through the camera of a mobile phone or through the camera of a vehicle. The mobile phone acquisition mode and the vehicle acquisition mode are two different node implementations. As another example, an object in an image may be identified either by model inference or by clustering. The two methods are different node implementations that provide the same function. Each process that hosts node implementations has a node manager, which manages the node implementations in that process; node instances are then created from the node implementations to form the computation graph.
The management of node implementations by the node manager is described in detail in the steps from S220 onward.
S220, the node manager receives information, published by a provider in the network, about a node implementation corresponding to a managed node type, and maintains that information in the node information registry within its process. The information about a node implementation may include information about its provider, and may also include a priority for the node implementation; the higher the priority, the more strongly the node implementation is preferred as the replacement during node switching.
The provider information of a node implementation includes information such as the machine ID, virtual machine ID, and process ID of the provider. The priority defines which node implementation is used preferentially when there are multiple node implementations of the same type in the network. The priority may be defined when the node implementation is written, based on the device's computing power or on the implementation's time consumption. For example, for the data collection type, both an in-vehicle head unit and a mobile phone can provide a node implementation that acquires image data. If the data collected by the mobile phone's camera is clearer and the phone's communication capability is stronger, the priority of the node implementation on the mobile phone (acquiring image data) can be set higher than that of the one on the vehicle. When a node implementation of the data collection type is selected, the higher-priority implementation on the mobile phone is then preferred. An example node information registry is shown in Table 1.
Table 1 Node information registry
[Table 1 appears as an image (Figure BDA0003178125010000061) in the original publication; its contents are not reproduced here.]
As shown in Table 1, node type 1 has two node implementations. When a node implementation of node type 1 needs to be selected, node implementation 2 is preferred because its priority is higher than that of node implementation 1.
In addition, when a node manager is detected to be offline (due to a network disconnection or process exit), the information about its node implementations is deleted from the registry, indicating that they are no longer available.
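The registry behavior described in S220 can be sketched as a small in-memory structure: entries are grouped by node type, the highest-priority entry is chosen, and all entries of a provider are dropped when it goes offline. The class and field names, the ID strings, and the convention that a larger number means a higher priority are all assumptions made for illustration; the application does not define them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeImplInfo:
    node_type: str
    impl_name: str
    provider_id: str  # e.g. a device ID, virtual machine ID, or process ID
    priority: int     # assumption: a larger value means a higher priority


class NodeRegistry:
    """Hypothetical node information registry, keyed by node type."""

    def __init__(self) -> None:
        self._by_type: dict[str, list[NodeImplInfo]] = {}

    def register(self, info: NodeImplInfo) -> None:
        self._by_type.setdefault(info.node_type, []).append(info)

    def best(self, node_type: str) -> Optional[NodeImplInfo]:
        """Return the highest-priority implementation of a type, if any."""
        candidates = self._by_type.get(node_type, [])
        return max(candidates, key=lambda info: info.priority, default=None)

    def remove_provider(self, provider_id: str) -> None:
        """Drop every entry published by a provider that went offline."""
        for node_type, entries in self._by_type.items():
            self._by_type[node_type] = [
                e for e in entries if e.provider_id != provider_id
            ]


registry = NodeRegistry()
registry.register(NodeImplInfo("type_1", "impl_1", "device-A/process-1", priority=1))
registry.register(NodeImplInfo("type_1", "impl_2", "device-B/process-3", priority=2))

chosen_before = registry.best("type_1")      # impl_2 has the higher priority
registry.remove_provider("device-B/process-3")
chosen_after = registry.best("type_1")       # only impl_1 remains
```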
When a node of a specific type needs to be used, a request to create a node instance is sent to the corresponding provider. The node manager that receives the request creates the corresponding node instance.
Referring to FIG. 3, FIG. 3 illustrates a framework for creating a node. The framework includes a device 310 and a device 320. The device 310 includes a process 1 and a process 2; the process 1 is provided with a node manager 1, and the process 2 is provided with a node manager 2. The device 320 includes a process 3, which is provided with a node manager 3. The device 320 informs the device 310 of the node implementations it has. When the device 310, acting as the initiator of a computing task, needs to use a node of a certain type, it may ask the device 320 to create a node instance of the corresponding node implementation. The computation graph manager of the device 310 may send a create-node instruction to the process in the device 320 that owns an implementation of that type; the node manager in the device 320 that receives the instruction creates the corresponding node instance and reports the information about the node instance back to the device 310. The computation graph manager in the device 310 then constructs the computation graph from node instances.
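The create-node exchange of FIG. 3 might look roughly like the following on the receiving side. This is a sketch under stated assumptions: `RemoteNodeManager`, `handle_create`, the factory mapping, and the instance ID format are invented for illustration, and the network transport between device 310 and device 320 is omitted entirely.

```python
import itertools
from typing import Callable


class RemoteNodeManager:
    """Hypothetical node manager that creates instances on request."""

    def __init__(self, process_id: str,
                 factories: dict[str, Callable[[], object]]) -> None:
        self.process_id = process_id
        self._factories = factories            # node type -> implementation factory
        self._counter = itertools.count(1)
        self.instances: dict[str, object] = {}

    def handle_create(self, node_type: str) -> dict:
        """Create a node instance and return the info reported to the requester."""
        factory = self._factories.get(node_type)
        if factory is None:
            raise KeyError(f"{self.process_id} has no implementation of {node_type!r}")
        instance_id = f"{self.process_id}/{node_type}#{next(self._counter)}"
        self.instances[instance_id] = factory()
        return {
            "instance_id": instance_id,
            "node_type": node_type,
            "provider": self.process_id,
        }


# Node manager 3 in process 3 of device 320 owns a model-inference implementation.
manager_3 = RemoteNodeManager("device320/process3",
                              {"model_inference": lambda: object()})
reply = manager_3.handle_create("model_inference")
```

The returned dictionary stands in for the node instance information that the computation graph manager on the initiating device would use when assembling the graph.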
S230, when a new device joins the network, the node manager determines whether a node implementation published by the new device belongs to a node type it manages.
For example, when a new device joins, it may publish a message describing its node implementations of the node types in the computation graph to all processes in the network that contain a node manager. Based on this message, each node manager in the network determines whether the node implementation is of a type it manages.
If so, the node manager executes S240: it accepts the node implementation, registers the information about the node implementation in the node information registry of the corresponding computing node, and notifies the computation graph manager.
If not, the node manager executes S250: the information about the node implementation is not written into the node information registry.
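The S230 to S250 decision can be sketched as a single handler: register and notify when the announced type is managed, otherwise ignore the message. The message fields, the `NodeManager` class, and the `notify` callback are hypothetical names chosen for this example.

```python
from typing import Callable


class NodeManager:
    """Hypothetical node manager handling announcements from new providers."""

    def __init__(self, managed_types: set[str],
                 notify: Callable[[dict], None]) -> None:
        self.managed_types = managed_types
        self.notify = notify                      # informs the computation graph manager
        self.registry: dict[str, list[dict]] = {}

    def on_announcement(self, message: dict) -> bool:
        # S230: does the announced implementation belong to a managed type?
        if message["node_type"] not in self.managed_types:
            return False                          # S250: do not write to the registry
        # S240: register the implementation and notify the graph manager.
        self.registry.setdefault(message["node_type"], []).append(message)
        self.notify(message)
        return True


notified: list[dict] = []
manager = NodeManager({"preprocessing", "model_inference"}, notified.append)

accepted = manager.on_announcement(
    {"node_type": "model_inference", "provider_id": "phone/process-1", "priority": 2}
)
ignored = manager.on_announcement(
    {"node_type": "audio_decode", "provider_id": "phone/process-1", "priority": 2}
)
```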
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a new node joining. FIG. 4 includes an in-vehicle head unit and a mobile phone. The node types originally provided for the computation graph on the head unit are A, B, C, D, and E. Taking the forward-looking perception computation graph in step S210 as an example, node A is camera input, node B is preprocessing, node C is model inference, node D is post-processing, and node E is rendering. When the mobile phone joins the network where the head unit is located and has node implementations of types B, C, and D, it publishes this message to all processes in the network that contain a node manager. The node manager in the application process on the head unit registers the information in the node information registry and notifies the computation graph manager. If the node implementations on the mobile phone have a higher priority (for example, because they compute faster), the computation graph manager starts a node switching process that switches the original nodes in the computation graph to the higher-priority ones. That is, the nodes of types B, C, and D on the head unit are stopped, and the nodes of types B, C, and D on the mobile phone are started.
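The FIG. 4 scenario can be worked through with a small planning function that compares each running node against the best registered implementation of its type. The function name, the dictionary layout, and the numeric priorities (1 for the head unit, 2 for the phone) are assumptions for illustration only.

```python
def plan_switches(active: dict[str, dict],
                  registry: dict[str, list[dict]]) -> dict[str, dict]:
    """Return, per node type, the higher-priority implementation to switch to."""
    switches = {}
    for node_type, current in active.items():
        candidates = registry.get(node_type, [])
        if not candidates:
            continue
        best = max(candidates, key=lambda entry: entry["priority"])
        if best["priority"] > current["priority"]:
            switches[node_type] = best
    return switches


types = ["A", "B", "C", "D", "E"]

# Initially every node type runs on the head unit.
active = {t: {"provider": "head_unit", "priority": 1} for t in types}
registry = {t: [active[t]] for t in types}

# The phone joins and publishes faster implementations of B, C, and D.
for t in ["B", "C", "D"]:
    registry[t].append({"provider": "phone", "priority": 2})

switches = plan_switches(active, registry)
for node_type, target in sorted(switches.items()):
    print(f"stop {node_type} on head_unit, start {node_type} on {target['provider']}")
```

Types A and E stay on the head unit because no higher-priority implementation was registered for them.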
S260, when an original node becomes unavailable, the available node implementation with the highest priority under the original node's type is selected as the replacement.
In the present application, an original node becoming unavailable can be subdivided into two cases: in the first, the device where the node is located is disconnected; in the second, the node exits with an error while running (for example, the required resources cannot be allocated, or the operation times out).
Referring to FIG. 5, FIG. 5 exemplarily shows a structural diagram when an original node is unavailable. The diagram includes machine A and machine B. There are two node types in the computation graph, A and B. Type B has two node implementations, located in machine A and machine B respectively. The type B node in machine B is used before the switch. In the first case, when machine B is shut down or disconnected, the application process detects this event through the automatic discovery mechanism of the communication layer. The node manager then deletes the node implementation information related to machine B from the registry, and the computation graph manager in the application process attempts to restart a node of this type. Since machine B has exited, the process in machine A is now the only provider of this type of node implementation, so the node of the same type in machine A is eventually started. In the second case, when the original node exits because of an error, the application process receives the error state event sent by the original node. The priority of that node implementation in the registry is then lowered according to a specified policy (which may be manually specified, for example halving it each time), while the computation graph manager attempts to restart a node of this type. Thus, if two providers in the system provide node implementations of this type at the same time, the faulty node implementation is less likely to be reselected because its priority has been reduced. If the faulty node implementation fails again after being run again, its priority is reduced again, until the priority falls to a specified threshold, at which point the node implementation is judged to be completely unavailable.
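The second case (a node that exits with an error) can be illustrated with the halving policy mentioned in the text. The sketch below assumes hypothetical starting priorities of 8.0 for machine B and 3.0 for machine A, a threshold of 1.0, and that an implementation counts as unavailable once its priority is at or below the threshold; none of these values appear in the application.

```python
from typing import Optional


def penalize(entry: dict, threshold: float, factor: float = 0.5) -> dict:
    """Lower the priority after an error; mark unavailable at or below threshold."""
    priority = entry["priority"] * factor
    return {**entry, "priority": priority, "available": priority > threshold}


def choose(entries: list[dict]) -> Optional[dict]:
    """Pick the highest-priority implementation that is still available."""
    usable = [e for e in entries if e.get("available", True)]
    return max(usable, key=lambda e: e["priority"], default=None)


machine_a = {"provider": "machine_A", "priority": 3.0}
machine_b = {"provider": "machine_B", "priority": 8.0}

# Simulate the implementation on machine B failing every time it is selected.
failures = 0
while choose([machine_a, machine_b])["provider"] == "machine_B":
    machine_b = penalize(machine_b, threshold=1.0)
    failures += 1

print(failures, machine_b["priority"], choose([machine_a, machine_b])["provider"])
```

With these numbers machine B is retried once after its first error (priority 4.0 still beats 3.0) and loses the selection after the second error (priority 2.0). One further halving would bring it to the threshold and mark it unavailable.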
According to the method of the embodiments of the present application, the whole node switching process is completed dynamically and automatically by the computing framework and does not need to be handled by application logic, so the computing task can be completed as efficiently as possible.
With reference to fig. 6, the present application further provides an electronic device comprising:
the computational graph manager is used for constructing a computational graph of the computational task, and the computational graph comprises a plurality of computational nodes for completing the computational task;
a node manager for managing each of the compute nodes;
the node manager is used for receiving information which is issued by a provider in a network and corresponds to the node realization of the managed computing node, and writing the information into a node registry corresponding to the computing node, wherein the information comprises the information of the provider which realizes the node;
when it is detected that an original node implementation participating in computation in the computation graph needs to be switched, the computation graph manager is used for selecting a new node implementation to replace the original node implementation based on the information of the provider in the node registry corresponding to the original node implementation.
In one embodiment of the present application, the information further includes a priority with which the node implementation is selected to participate in completing the computing task;
when it is detected that the original node implementation participating in computation in the computation graph needs to be switched, the computation graph manager is used for selecting a new node implementation to replace the original node implementation based on the provider information in the node registry corresponding to the original node implementation and on the priority.
In one embodiment of the present application, the stronger the computing power of the provider of a node implementation, the higher the priority with which the node implementation is selected to participate in completing the computing task.
In one embodiment of the present application, the computation graph manager is configured to: determine that the original node implementation needs to be switched when a node implementation in the node registry has a higher priority than the node implementation currently participating in the computing task; or, determine that the original node implementation needs to be switched when it is notified that the provider of the original node implementation has exited the network, or that the provider of the original node implementation has failed while running the original node implementation.
In an embodiment of the present application, when a new provider joins the network, the node manager is configured to receive information that the new provider publishes to the network about node implementations belonging to the computing node managed by the node manager, and to write the information into the node registry corresponding to the computing node.
In one embodiment of the present application, the providers include different devices, virtual machines, or different processes in the same device.
In one embodiment of the present application, the information about the provider of the node implementation includes: the device ID, virtual machine ID, or process ID where the node implementation is located.
The working process and the function of each module of the electronic device of the present application have been described in detail in the foregoing embodiments, and refer to the description of the method in fig. 2 in the foregoing embodiments, which are not described herein again.
With reference to fig. 7, the present application further provides an electronic device comprising:
a memory 710 for storing instructions for execution by one or more processors of the device, and
a processor 720 for executing the method of FIG. 2 in the above embodiment.
The present application also provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program causes the processor to execute the method shown in fig. 2 in the above embodiment.
Referring now to FIG. 8, shown is a block diagram of an apparatus 1200 in accordance with one embodiment of the present application. The device 1200 may include one or more processors 1201 coupled to a controller hub 1203. For at least one embodiment, the controller hub 1203 communicates with the processor 1201 via a multi-drop Bus such as a Front Side Bus (FSB), a point-to-point interface such as a Quick Path Interconnect (QPI), or similar connection 1206. The processor 1201 executes instructions that control general types of data processing operations. In one embodiment, Controller Hub 1203 includes, but is not limited to, a Graphics Memory Controller Hub (GMCH) (not shown) and an Input/Output Hub (IOH) (which may be on separate chips) (not shown), where the GMCH includes a Memory and a Graphics Controller and is coupled to the IOH.
The device 1200 may also include a coprocessor 1202 and a memory 1204 coupled to the controller hub 1203. Alternatively, one or both of the memory and GMCH may be integrated within the processor (as described herein), with the memory 1204 and coprocessor 1202 being directly coupled to the processor 1201 and to the controller hub 1203, with the controller hub 1203 and IOH being in a single chip. The Memory 1204 may be, for example, a Dynamic Random Access Memory (DRAM), a Phase Change Memory (PCM), or a combination of the two. In one embodiment, coprocessor 1202 is a special-Purpose processor, such as, for example, a high-throughput MIC processor (MIC), a network or communication processor, compression engine, graphics processor, General Purpose Graphics Processor (GPGPU), embedded processor, or the like. The optional nature of coprocessor 1202 is represented in FIG. 8 by dashed lines.
Memory 1204, as a computer-readable storage medium, may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. For example, the memory 1204 may include any suitable non-volatile memory, such as flash memory, and/or any suitable non-volatile storage device, such as one or more hard disk drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives.
In one embodiment, device 1200 may further include a Network Interface Controller (NIC) 1206. Network interface 1206 may include a transceiver to provide a radio interface for device 1200 to communicate with any other suitable device (e.g., front end module, antenna, etc.). In various embodiments, the network interface 1206 may be integrated with other components of the device 1200. The network interface 1206 may implement the functions of the communication unit in the above-described embodiments.
The device 1200 may further include an Input/Output (I/O) device 1205. The I/O device 1205 may include: a user interface designed to enable a user to interact with the device 1200; a peripheral component interface designed to enable peripheral components to interact with the device 1200; and/or sensors configured to determine environmental conditions and/or location information associated with the device 1200.
It is noted that FIG. 8 is merely exemplary. That is, although FIG. 8 shows that the device 1200 includes a plurality of components, such as the processor 1201, the controller hub 1203, and the memory 1204, in an actual application a device using the methods of the present application may include only some of the components of the device 1200, for example, only the processor 1201 and the NIC 1206. Optional components in FIG. 8 are shown in dashed lines.
According to some embodiments of the present application, the memory 1204, serving as a computer-readable storage medium, stores instructions that, when executed on a computer, cause the device 1200 to execute the computing method according to the above embodiments. For details, refer to the method shown in FIG. 2 in the above embodiments, which is not described herein again.
Referring now to FIG. 9, shown is a block diagram of a SoC (System on Chip) 1300 in accordance with an embodiment of the present application. In FIG. 9, like parts have the same reference numerals. In addition, the dashed boxes are optional features of more advanced SoCs. In FIG. 9, the SoC 1300 includes: an interconnect unit 1350 coupled to the application processor 1310; a system agent unit 1380; a bus controller unit 1390; an integrated memory controller unit 1340; a set of one or more coprocessors 1320, which may include integrated graphics logic, an image processor, an audio processor, and a video processor; a Static Random Access Memory (SRAM) unit 1330; and a Direct Memory Access (DMA) unit 1360. In one embodiment, the coprocessor 1320 includes a special-purpose processor, such as, for example, a network or communication processor, a compression engine, a GPGPU, a high-throughput MIC processor, an embedded processor, or the like.
The Static Random Access Memory (SRAM) unit 1330 may include one or more computer-readable media for storing data and/or instructions. A computer-readable storage medium may have instructions stored therein, and in particular, temporary and permanent copies of the instructions. The instructions, when executed by at least one unit in the processor, may cause the SoC 1300 to execute the computing method according to the foregoing embodiments. For details, refer to the method shown in FIG. 2 in the foregoing embodiments, which is not described herein again.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, Compact Disc Read-Only Memories (CD-ROMs), magneto-optical disks, Read-Only Memories (ROMs), Random Access Memories (RAMs), Erasable Programmable Read-Only Memories (EPROMs), Electrically Erasable Programmable Read-Only Memories (EEPROMs), magnetic or optical cards, flash memory, or a tangible machine-readable storage used to transmit information over the Internet in the form of electrical, optical, acoustical, or other propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, may not be included or may be combined with other features.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (15)

1. A node switching method in a distributed computation graph, applied to an electronic device comprising a computation graph manager, the method comprising:
the computation graph manager constructing a computation graph of a computing task, wherein the computation graph comprises a plurality of computing nodes for completing the computing task, and each computing node is managed by a corresponding node manager;
the node manager receiving information that is published by a provider in a network and corresponds to a node implementation of the computing node it manages, and writing the information into a node registry corresponding to the computing node, wherein the information comprises information about the provider of the node implementation; and
when it is detected that an original node implementation participating in the computation graph needs to be switched, the computation graph manager selecting a new node implementation to replace the original node implementation based on the provider information in the node registry corresponding to the original node implementation.
2. The method of claim 1, wherein the information further comprises a priority with which the node implementation is selected to participate in completing the computing task, and
when it is detected that the original node implementation in the computation graph needs to be switched, the computation graph manager selects the new node implementation to replace the original node implementation based on the priority and the provider information in the node registry corresponding to the original node implementation.
3. The method of claim 2, wherein the greater the computing power of the provider of the node implementation, the higher the priority with which the node implementation is selected to participate in completing the computing task.
4. The method of claim 2 or 3, wherein detecting that the original node implementation participating in the computation graph needs to be switched comprises:
the computation graph manager determining, according to the priorities in the node registry, that the original node implementation needs to be switched when the priority of a node implementation in the node registry is higher than the priority of the node implementation currently participating in the computing task; or
the computation graph manager determining that the original node implementation needs to be switched upon receiving an indication that the provider of the original node implementation has exited the network, or that the provider of the original node implementation has failed while running the original node implementation.
5. The method of claim 1, wherein when a new provider joins the network, the node manager receives information that the new provider publishes to the network and that pertains to a node implementation of the computing node it manages, and writes the information into the node registry corresponding to the computing node.
6. The method of claim 1, wherein the providers comprise different devices, virtual machines, or different processes in the same device.
7. The method of claim 1, wherein the information about the provider of the node implementation comprises: a device ID, a virtual machine ID, or a process ID of the provider of the node implementation.
8. An electronic device, comprising:
a computation graph manager configured to construct a computation graph of a computing task, the computation graph comprising a plurality of computing nodes for completing the computing task; and
a node manager configured to manage each of the computing nodes;
wherein the node manager is configured to receive information that is published by a provider in a network and corresponds to a node implementation of the computing node it manages, and to write the information into a node registry corresponding to the computing node, wherein the information comprises information about the provider of the node implementation; and
when it is detected that an original node implementation participating in computation in the computation graph needs to be switched, the computation graph manager is configured to select a new node implementation to replace the original node implementation based on the provider information in the node registry corresponding to the original node implementation.
9. The electronic device of claim 8, wherein the information further comprises a priority with which the node implementation is selected to participate in completing the computing task, and
when it is detected that the original node implementation in the computation graph needs to be switched, the computation graph manager is configured to select the new node implementation to replace the original node implementation based on the priority and the provider information in the node registry corresponding to the original node implementation.
10. The electronic device of claim 9, wherein the greater the computing power of the provider of the node implementation, the higher the priority with which the node implementation is selected to participate in completing the computing task.
11. The electronic device of claim 9 or 10, wherein the computation graph manager is configured to:
determine that the original node implementation needs to be switched when the priority of a node implementation in the node registry is higher than the priority of the node implementation currently participating in the computing task; or
determine that the original node implementation needs to be switched upon receiving an indication that the provider of the original node implementation has exited the network, or that the provider of the original node implementation has failed while running the original node implementation.
12. The electronic device of claim 8, wherein when a new provider joins the network, the node manager is configured to receive information that the new provider publishes to the network and that pertains to a node implementation of the computing node managed by the node manager, and to write the information into the node registry corresponding to the computing node.
13. The electronic device of claim 8, wherein the providers comprise different devices, virtual machines, or different processes in the same device.
14. The electronic device of claim 8, wherein the information about the provider of the node implementation comprises: a device ID, a virtual machine ID, or a process ID of the provider of the node implementation.
15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the method of any one of claims 1-7.
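For illustration only, the registration and switching behavior recited in claims 1 to 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the class names, the method names (on_publish, on_provider_exit, needs_switch, reconcile), and the use of an integer priority are all hypothetical and do not appear in the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class NodeImplementation:
    """One provider's implementation of a computing node (hypothetical type)."""
    provider_id: str  # device ID, virtual machine ID, or process ID (claim 7)
    priority: int     # assumed convention: larger value = preferred (claims 2, 3)


@dataclass
class NodeManager:
    """Manages one computing node and its node registry (hypothetical type)."""
    node_name: str
    registry: Dict[str, NodeImplementation] = field(default_factory=dict)

    def on_publish(self, impl: NodeImplementation) -> None:
        # A provider published an implementation of this node: record it.
        self.registry[impl.provider_id] = impl

    def on_provider_exit(self, provider_id: str) -> None:
        # The provider left the network or failed: drop its registry entry.
        self.registry.pop(provider_id, None)

    def best(self) -> Optional[NodeImplementation]:
        # Highest-priority registered implementation, or None if none exists.
        return max(self.registry.values(), key=lambda i: i.priority, default=None)


class ComputationGraphManager:
    """Tracks which implementation currently serves each computing node."""

    def __init__(self, managers: List[NodeManager]) -> None:
        self.managers = {m.node_name: m for m in managers}
        self.active: Dict[str, Optional[NodeImplementation]] = {
            m.node_name: None for m in managers
        }

    def needs_switch(self, node_name: str) -> bool:
        current = self.active[node_name]
        manager = self.managers[node_name]
        # Second branch of claim 4: the current provider exited or failed
        # (also covers the case where nothing has been selected yet).
        if current is None or current.provider_id not in manager.registry:
            return True
        # First branch of claim 4: a higher-priority implementation is registered.
        best = manager.best()
        return best is not None and best.priority > current.priority

    def reconcile(self, node_name: str) -> Optional[NodeImplementation]:
        # Replace the original implementation with the best registered one.
        if self.needs_switch(node_name):
            self.active[node_name] = self.managers[node_name].best()
        return self.active[node_name]
```

In this sketch, a caller would invoke reconcile after every registry change. Publishing a higher-priority implementation or removing the active provider causes the next reconcile call to select a different implementation; otherwise the current one is kept.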

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110838844.3A CN113504980A (en) 2021-07-23 2021-07-23 Node switching method in distributed computation graph, electronic device and readable storage medium

Publications (1)

Publication Number Publication Date
CN113504980A true CN113504980A (en) 2021-10-15

Family

ID=78014458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110838844.3A Pending CN113504980A (en) 2021-07-23 2021-07-23 Node switching method in distributed computation graph, electronic device and readable storage medium

Country Status (1)

Country Link
CN (1) CN113504980A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105283831A (en) * 2013-05-29 2016-01-27 微软技术许可有限责任公司 Distributed storage defense in a cluster
CN109542627A (en) * 2018-11-30 2019-03-29 北京金山云网络技术有限公司 Node switching method, device, supervisor, node device and distributed system
US20190190802A1 (en) * 2017-12-15 2019-06-20 International Business Machines Corporation System and method for managing a moving peer-to-peer network
CN110990329A (en) * 2019-12-09 2020-04-10 杭州趣链科技有限公司 Method, equipment and medium for high availability of federated computing
CN111757462A (en) * 2019-09-20 2020-10-09 广州极飞科技有限公司 Automatic node discovery method and related device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张智龙: "分布式存储构建规划", 《电脑知识与技术》, vol. 13, no. 21, pages 230 - 232 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114237179A (en) * 2021-12-16 2022-03-25 常熟华庆汽车部件有限公司 Implementation method of flexible coating automatic control system based on industrial Internet of things
CN114237179B (en) * 2021-12-16 2023-09-08 常熟华庆汽车部件有限公司 Implementation method of flexible coating automatic control system based on industrial Internet of things
CN115473802A (en) * 2022-09-13 2022-12-13 重庆紫光华山智安科技有限公司 Node management method, system, device and storage medium
CN115473802B (en) * 2022-09-13 2024-02-23 重庆紫光华山智安科技有限公司 Node management method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination