CA3152842A1 - Real-time communication method and apparatus for distributed system, and distributed system - Google Patents

Real-time communication method and apparatus for distributed system, and distributed system

Info

Publication number
CA3152842A1
CA3152842A1 CA3152842A CA3152842A CA3152842A1 CA 3152842 A1 CA3152842 A1 CA 3152842A1 CA 3152842 A CA3152842 A CA 3152842A CA 3152842 A CA3152842 A CA 3152842A CA 3152842 A1 CA3152842 A1 CA 3152842A1
Authority
CA
Canada
Prior art keywords
task unit
calculation result
communication task
master node
communication
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3152842A
Other languages
French (fr)
Inventor
Bangfa DONG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
10353744 Canada Ltd
Original Assignee
10353744 Canada Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 10353744 Canada Ltd
Publication of CA3152842A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)
  • Computer And Data Communications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Disclosed are a real-time communication method and apparatus for a distributed system and a distributed system, pertaining to the technical field of distribution. The distributed system comprises a master node and an operation node. An algorithm task unit and a communication task unit are deployed on the operation node. The method comprises: after being started, a communication task unit performing initialization operation, and entering a monitoring state after the initialization operation is completed, wherein the initialization operation comprises initializing a shared memory and a network connection; an algorithm task unit being started after the communication task unit enters the monitoring state, and then processing a calculation task initiated by a master node; the algorithm task unit writing a calculation result of the calculation task into the shared memory; when the communication task unit detects that the calculation result has been stored in the shared memory, the communication task unit reading the calculation result; and the communication task unit returning the calculation result to the master node by means of the network connection. Embodiments of the present invention achieve real-time communication in the entire distributed system to ensure real-time output of an algorithm.

Description

REAL-TIME COMMUNICATION METHOD AND APPARATUS FOR DISTRIBUTED SYSTEM, AND DISTRIBUTED SYSTEM
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present invention relates to the field of distributed systems technology, and more particularly to a real-time communicating method within a distributed system, a real-time communicating device within a distributed system, and a distributed system.
Description of Related Art
[0002] With the vigorous development of artificial intelligence technology, more and more artificial intelligence techniques have been put into application. Besides the performance of the algorithms themselves, applications of artificial intelligence algorithms also rely heavily on hardware computational resources. A single machine today has multiple cores and already offers strong performance, but when a project requires computational resources beyond what a single machine can offer, a distributed scheme is usually employed, namely allocating the required computation to different machines. A distributed scheme divides the system into various nodes, each bearing certain calculation tasks, and these nodes communicate with one another through a network. When algorithms such as image processing and video coding/decoding are running, they consume the CPU or GPU resources of the nodes, and such programs that mainly consume CPU or GPU resources are generally referred to as calculation-intensive tasks. Calculation-intensive tasks can be processed with multiple threads, but the more threads there are, the more time is spent switching between tasks and the lower the efficiency of CPU execution, so such programs should not use too many threads. A distributed system has plural nodes, generally classified as master nodes and worker nodes, and network communication must be performed between the master nodes and the worker nodes. When real-time communication is required between these two types of nodes and data concurrency is high, great quantities of IOs are produced, and such tasks that require great quantities of IOs are generally referred to as IO-intensive tasks. IO-intensive tasks generally do not consume much CPU, but frequently interrupt the CPU.
[0003] As the inventor found in the course of realizing the present invention, when an algorithm is deployed on a distributed system architecture and must respond with outputs in real time, for tasks on multi-core nodes that are both calculation-intensive and network IO-intensive, excessive network IOs interrupt the calculation tasks of the algorithm so frequently that the efficiency of algorithm calculation is severely affected, making it impossible for the entire distributed system to respond with outputs in real time.
SUMMARY OF THE INVENTION
[0004] In order to overcome the aforementioned technical problems, embodiments of the present invention provide a real-time communicating method within a distributed system, a corresponding device and a distributed system, whereby, in a distributed environment, for tasks on multi-core nodes that are both calculation-intensive and network IO-intensive, calculation tasks are separated from communication tasks, so that it is guaranteed that the entire distributed system can respond with outputs in real time.
[0005] Specific technical solutions provided by the embodiments of the present invention are as follows:
[0006] According to the first aspect, there is provided a real-time communicating method within a distributed system, the distributed system includes a master node and a worker node, on the worker node are deployed an algorithm task unit and a communication task unit, and the method comprises:
[0007] the communication task unit performing an initialization operation after the communication task unit has been started, and entering a monitor state after having completed the initialization operation, wherein the initialization operation includes initializing a shared memory and a network connection;
[0008] the algorithm task unit being started after the communication task unit has entered the monitor state, and processing a calculation task initiated by the master node after the algorithm task unit has been started;
[0009] the algorithm task unit writing a calculation result of the calculation task in the shared memory;
[0010] the communication task unit reading the calculation result when the communication task unit monitors that the calculation result is stored in the shared memory; and
[0011] the communication task unit returning the calculation result to the master node through the network connection.
[0012] Further, the step of the algorithm task unit being started after the communication task unit has entered the monitor state includes:
[0013] notifying the algorithm task unit through a condition lock when the communication task unit enters the monitor state, so as to make the algorithm task unit started.
[0014] Further, the step of the algorithm task unit writing a calculation result of the calculation task in the shared memory includes:
[0015] the algorithm task unit serializing the calculation result of the calculation task to obtain serialized data, and writing the serialized data in the shared memory; and
[0016] the step of the communication task unit reading the calculation result when the communication task unit monitors that the calculation result is stored in the shared memory includes:
[0017] the communication task unit obtaining the serialized data from the shared memory when the communication task unit monitors that the serialized data of the calculation result is stored in the shared memory, and deserializing the serialized data to obtain the calculation result.
[0018] Further, the initialization operation further includes creating a message queue, the method further comprises, prior to the step of the communication task unit returning the calculation result to the master node through the network connection:
[0019] the communication task unit adding the calculation result to the message queue; and
[0020] the step of the communication task unit returning the calculation result to the master node through the network connection includes:
[0021] the communication task unit extracting the calculation result from the message queue if the communication task unit receives a calculation result request initiated from the master node, and returning the calculation result to the master node through the network connection.
[0022] Further, the step of the communication task unit extracting the calculation result from the message queue if the communication task unit receives a calculation result request initiated from the master node, and returning the calculation result to the master node through the network connection includes:
[0023] the communication task unit enquiring, if it receives a calculation result request initiated from the master node, in the message queue whether there is any calculation result requested by the calculation result request;
[0024] if yes, returning the calculation result to the master node;
[0025] if not, blocking the calculation result request, and waking up the calculation result request when there is a new calculation result in the message queue; and
[0026] judging whether the new calculation result is the calculation result requested by the calculation result request; if yes, returning the new calculation result to the master node; if not, continuing to block the calculation result request.
[0027] According to the second aspect, there is provided a real-time communicating device within a distributed system, the distributed system includes a master node and a worker node, the device is located on the worker node, and the device comprises an algorithm task unit and a communication task unit, wherein:
[0028] the communication task unit is employed for performing an initialization operation after the communication task unit has been started, and entering a monitor state after having completed the initialization operation, wherein the initialization operation includes initializing a shared memory and a network connection;
[0029] the algorithm task unit is employed for being started after the communication task unit has entered the monitor state, and processing a calculation task initiated by the master node after the algorithm task unit has been started;
[0030] the algorithm task unit is further employed for writing a calculation result of the calculation task in the shared memory; and
[0031] the communication task unit is further employed for reading the calculation result when the communication task unit monitors that the calculation result is stored in the shared memory, and returning the calculation result to the master node through the network connection.
[0032] Further, the communication task unit is specifically employed for:
[0033] notifying the algorithm task unit through a condition lock when the communication task unit enters the monitor state, so as to make the algorithm task unit started.
[0034] Further, the algorithm task unit is specifically employed for:
[0035] serializing the calculation result of the calculation task to obtain serialized data, and writing the serialized data in the shared memory; and
[0036] the communication task unit is specifically employed for:
[0037] obtaining the serialized data from the shared memory when the communication task unit monitors that the serialized data of the calculation result is stored in the shared memory, and deserializing the serialized data to obtain the calculation result.
[0038] Further, the initialization operation further includes creating a message queue, and the communication task unit is further employed for:
[0039] adding the calculation result to the message queue; and
[0040] extracting the calculation result from the message queue if a calculation result request initiated from the master node is received, and returning the calculation result to the master node through the network connection.
[0041] Further, the communication task unit is specifically employed for:
[0042] enquiring, if it receives a calculation result request initiated from the master node, in the message queue whether there is any calculation result requested by the calculation result request;
[0043] if yes, returning the calculation result to the master node;
[0044] if not, blocking the calculation result request, and waking up the calculation result request when there is a new calculation result in the message queue; and
[0045] judging whether the new calculation result is the calculation result requested by the calculation result request; if yes, returning the new calculation result to the master node; if not, continuing to block the calculation result request.
[0046] According to the third aspect, there is provided a distributed system, the system comprises a master node and at least one worker node configured to include a real-time communicating device within a distributed system according to any item of the second aspect.
[0047] The technical solutions provided by the embodiments of the present invention bring about the following advantageous effects:
[0048] Communication tasks and calculation tasks are separated from each other by deploying an algorithm task unit and a communication task unit on multi-core nodes in a distributed environment, whereby it is guaranteed not only that calculation threads in calculation-intensive tasks are not frequently interrupted by too many network IOs, but also that network IOs transmit data in real time, so that the distributed system can communicate in real time to ensure real-time output of algorithms.
[0049] The use of a shared memory between the communication task and the calculation task for communication makes it possible to greatly enhance the efficiency of inter-process communication.
BRIEF DESCRIPTION OF THE DRAWINGS
[0050] To explain the technical solutions in the embodiments of the present invention more clearly, the drawings required for the following description of the embodiments are briefly described below. Apparently, the drawings described below are directed merely to some embodiments of the present invention, and persons ordinarily skilled in the art may derive other drawings from these drawings without creative effort.
[0051] Fig. 1 is a flowchart illustrating a real-time communicating method within a distributed system provided by an embodiment of the present invention;
[0052] Fig. 2 is a block diagram illustrating the structure of a real-time communicating device within a distributed system provided by an embodiment of the present invention; and
[0053] Fig. 3 is a block diagram illustrating the structure of a distributed system provided by an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0054] To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the embodiments described below are merely some, rather than all, of the embodiments of the present invention. All other embodiments achievable by persons ordinarily skilled in the art on the basis of the embodiments in the present invention without creative effort shall fall within the protection scope of the present invention.
[0055] Distributed schemes have been widely applied in the state of the art. For instance, in an unmanned supermarket in the retail sector, the server is required to process multiple camera channels, and a single machine cannot satisfy the requirements: if all tasks were concentrated on one machine, the machine would become overburdened. It is therefore necessary to employ a distributed system consisting of plural machines, which are classified into master nodes and worker nodes. The worker nodes bear great quantities of algorithm tasks that are calculation-intensive, but the results of these algorithm tasks must be output in real time to the master nodes, so the algorithm tasks contain communication tasks. The network IOs produced by these communication tasks affect the efficiency of algorithm operations and make it impossible for the entire distributed system to respond with outputs in real time.
[0056] In view of this, embodiments of the present invention provide a real-time communicating method within a distributed system. The distributed system includes a master node and worker nodes, and on each worker node are deployed an algorithm task unit and a communication task unit. The communication task unit is employed for real-time transmission of network IO data (namely for executing communication tasks), and the algorithm task unit is employed for executing a calculation task initiated by the master node and writing the calculation result of the calculation task in a shared memory, whereupon the communication task unit reads the calculation result of the calculation task from the shared memory and returns the result to the master node. In the embodiments of the present invention, communication tasks and calculation tasks are separated from each other by deploying an algorithm task unit and a communication task unit on multi-core nodes in a distributed environment, whereby it is guaranteed not only that calculation threads in calculation-intensive tasks are not frequently interrupted by too many network IOs, but also that network IOs transmit data in real time, so that the distributed system can communicate in real time to ensure real-time output of algorithms. In addition, the use of a shared memory for communication between the communication task and the calculation task makes it possible to greatly enhance the efficiency of inter-process communication.
[0057] In one embodiment, as shown in Fig. 1, there is provided a real-time communicating method within a distributed system, the distributed system includes a master node and a worker node, on the worker node are deployed an algorithm task unit and a communication task unit, and the method can comprise the following steps.
[0058] S11 - the communication task unit performing an initialization operation after the communication task unit has been started, and entering a monitor state after having completed the initialization operation, wherein the initialization operation includes initializing a shared memory and a network connection.
[0059] Specifically, the communication task unit is started on the worker node, the communication task unit starts to perform the initialization operation that includes initializing a network connection and a shared memory, after initialization has been completed, the communication task unit enters a monitor state to monitor the shared memory and the network connection, respectively.
[0060] The shared memory is mainly employed for the communication between the communication task unit and the algorithm task unit on the same and single worker node, and the network connection is mainly employed for the communication between the worker node and the master node.
[0061] It should be noted that, since a shared memory does not disappear when the process that created it exits, it is preferable to clean the region of any previously created shared memory before the algorithm task unit of the worker node is started. In actual application, the cleaning operation can be performed after the communication task unit has been started and before the shared memory is initialized.
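By way of non-limiting illustration, the cleaning operation of paragraph [0061] might be sketched in Python as follows. The sketch assumes a named segment from the standard `multiprocessing.shared_memory` module; the function name `init_shared_memory` and the segment name used below are hypothetical and are not specified by the embodiments.

```python
from multiprocessing import shared_memory


def init_shared_memory(name: str, size: int) -> shared_memory.SharedMemory:
    """Remove any stale segment called `name`, then create a fresh one.

    Hypothetical helper: the embodiments only state that a previously
    created region should be cleaned before the shared memory is initialized.
    """
    try:
        # Attach to a segment left over from an earlier run, if there is one.
        stale = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        pass  # nothing left over, so there is nothing to clean
    else:
        stale.close()
        stale.unlink()  # destroy the leftover segment
    # Create the fresh segment used by the two task units.
    return shared_memory.SharedMemory(name=name, create=True, size=size)
```

A communication task unit could call, for example, `init_shared_memory("worker_results", 4096)` once at start-up, and call `close()` and `unlink()` on the returned object at shutdown.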
[0062] S12 - the algorithm task unit being started after the communication task unit has entered the monitor state, and processing a calculation task initiated by the master node after the algorithm task unit has been started.
[0063] In this embodiment, there is a strict requirement on the order in which the communication task unit and the algorithm task unit are started: the communication task unit must be started before the algorithm task unit is started.
[0064] In this embodiment, the master node initiates a calculation task to each worker node, and the algorithm task unit on the worker node performs a corresponding algorithm processing on the calculation task initiated by the master node after the algorithm task unit has been started.
[0065] In actual application, the master node can determine, on the basis of the operating state information of the various worker nodes, a worker node to process the calculation task, and send the calculation task to this worker node. The operating state information includes one or more of CPU utilization rate, memory utilization rate, disk read/write activity, and network uplink and downlink traffic.
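As one possible illustration of paragraph [0065], the selection of a worker node from operating state information might be sketched as below. The embodiments do not state how the indicators are combined; the weighted sum, the weights, the normalization to the range 0.0 to 1.0, and the names `WorkerState`, `load_score` and `pick_worker` are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    name: str
    cpu: float      # CPU utilization rate, 0.0 to 1.0
    memory: float   # memory utilization rate, 0.0 to 1.0
    disk_io: float  # disk read/write load, assumed normalized to 0.0 to 1.0
    network: float  # network uplink/downlink load, assumed normalized to 0.0 to 1.0


def load_score(state: WorkerState, weights=(0.4, 0.3, 0.15, 0.15)) -> float:
    """Combine the indicators into one load figure (assumed weighting)."""
    w_cpu, w_mem, w_disk, w_net = weights
    return (w_cpu * state.cpu + w_mem * state.memory
            + w_disk * state.disk_io + w_net * state.network)


def pick_worker(states: list[WorkerState]) -> WorkerState:
    """Return the worker node with the lowest combined load."""
    return min(states, key=load_score)
```

With two workers reporting `WorkerState("a", 0.9, 0.5, 0.1, 0.1)` and `WorkerState("b", 0.2, 0.3, 0.1, 0.1)`, `pick_worker` returns worker `b`, to which the master node would then send the calculation task.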
[0066] S13 - the algorithm task unit writing a calculation result of the calculation task in the shared memory.
[0067] Specifically, the algorithm task unit can write the calculation result of the calculation task according to a preset data structure in the shared memory.
[0068] S14 - the communication task unit reading the calculation result when the communication task unit monitors that the calculation result is stored in the shared memory.
[0069] Specifically, the communication task unit can read the data in the shared memory periodically or in real time, and starts to obtain the calculation result when it reads the calculation result of the calculation task stored in the shared memory.
[0070] S15 - the communication task unit returning the calculation result to the master node through the network connection.
[0071] Specifically, the communication task unit can proactively return the calculation result of the calculation task to the master node through the network connection, or can return the calculation result to the master node through the network connection in response to a calculation result request of the master node, after the network connection between the master node and the communication task unit on the worker node has been created.
[0072] As should be noted, the worker node in the distributed system in this embodiment must be multi-core; with respect to a single-core CPU machine, the total computational resources thereof are limited, it is therefore not necessary to separate the algorithm task unit from the communication task unit, so such case is not discussed in this embodiment.
[0073] The embodiments of the present invention provide a real-time communicating method within a distributed system. The distributed system includes a master node and worker nodes, and on each worker node are deployed an algorithm task unit and a communication task unit. The communication task unit is employed for real-time transmission of network IO data, and the algorithm task unit is employed for executing a calculation task initiated by the master node and writing the calculation result of the calculation task in a shared memory, whereupon the communication task unit reads the calculation result of the calculation task from the shared memory and returns the result to the master node. In the embodiments of the present invention, communication tasks and calculation tasks are separated from each other by deploying an algorithm task unit and a communication task unit on multi-core nodes in a distributed environment, whereby it is guaranteed not only that calculation threads in calculation-intensive tasks are not frequently interrupted by too many network IOs, but also that network IOs transmit data in real time, so that the distributed system can communicate in real time to ensure real-time output of algorithms. In addition, the use of a shared memory for communication between the communication task and the calculation task makes it possible to greatly enhance the efficiency of inter-process communication.
[0074] In one embodiment, the aforementioned step S12 of the algorithm task unit being started after the communication task unit has entered the monitor state can specifically include:
[0075] notifying the algorithm task unit through a condition lock when the communication task unit enters the monitor state, so as to make the algorithm task unit started.
[0076] Specifically, after the communication task unit has completed initialization of the shared memory, it monitors the region of the shared memory, and the communication task unit and the algorithm task unit notify each other via a condition lock. The condition lock is also inter-process, so after the communication task unit has initialized the condition lock, the shared memory will be locked; thereafter the communication task unit monitors the condition lock, and releases the condition lock during the monitoring. After the algorithm task unit has been started, it should also lock the condition lock; if no other process releases the condition lock, the algorithm task unit will not be started, and it is therefore required to start the communication task unit first, before the algorithm task unit is started.
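The start-up ordering described in paragraphs [0075] and [0076] might be sketched as follows. This is a simplified illustration under stated assumptions: two threads stand in for the two processes, and `threading.Condition` stands in for the inter-process condition lock, so the sketch shows only the ordering guarantee and not the inter-process mechanism itself. The function and variable names are hypothetical.

```python
import threading


def run_startup() -> list[str]:
    """Show that the algorithm task proceeds only after the communication
    task has entered its monitor state (threads stand in for processes)."""
    ready = threading.Condition()      # stand-in for the condition lock
    state = {"monitoring": False}
    events: list[str] = []

    def communication_task() -> None:
        events.append("comm: initialized")  # shared memory, network connection
        with ready:
            state["monitoring"] = True
            events.append("comm: monitoring")
            ready.notify_all()              # notify the algorithm task

    def algorithm_task() -> None:
        with ready:
            # Blocks until the communication task reports the monitor state.
            ready.wait_for(lambda: state["monitoring"])
        events.append("algo: started")

    algo = threading.Thread(target=algorithm_task)
    comm = threading.Thread(target=communication_task)
    algo.start()  # launched first on purpose; it still has to wait
    comm.start()
    algo.join()
    comm.join()
    return events
```

Even though the algorithm thread is launched first, the returned list always records the entry into the monitor state before the start of the algorithm task.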
[0077] In one embodiment, in order to make more general the messages transmitted between the various tasks, the aforementioned step S13 of the algorithm task unit writing a calculation result of the calculation task in the shared memory can specifically include:
[0078] the algorithm task unit serializing the calculation result of the calculation task to obtain serialized data, and writing the serialized data in the shared memory.
[0079] Specifically, serializing the calculation result by the algorithm task unit is to transform the calculation result to an object with a preset data structure, this object is the serialized data, and the preset data structure is for example JSON data structure. After the serialized data has been obtained, the serialized data can be written in the shared memory in the form of a key-value pair, in which key stands for key name, and value stands for key value.
[0080] Correspondingly, step S14 of the communication task unit reading the calculation result when the communication task unit monitors that the calculation result is stored in the shared memory can specifically include:
[0081] the communication task unit obtaining the serialized data from the shared memory when the communication task unit monitors that the serialized data of the calculation result is stored in the shared memory, and deserializing the serialized data to obtain the calculation result.
[0082] In this embodiment, the transmitted object is made more general by serializing the object for inter-process communication, and message transmission efficiency is enhanced.
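A minimal sketch of the serialization of paragraphs [0078] to [0081] is given below, assuming JSON as the preset data structure (as in the example of paragraph [0079]) and a segment from Python's `multiprocessing.shared_memory`. The 4-byte length header, the `{"key": ..., "value": ...}` layout, and the function names `write_result` and `read_result` are assumptions of this sketch. Synchronization between writer and reader (for example via the condition lock) is deliberately omitted.

```python
import json
import struct
from multiprocessing import shared_memory

# Assumed layout: 4-byte little-endian payload length, then the JSON bytes.
# A length of 0 means that no result is currently stored.
HEADER = struct.Struct("<I")


def write_result(shm: shared_memory.SharedMemory, key: str, result) -> None:
    """Algorithm task side: serialize the result and store it as a key-value pair."""
    payload = json.dumps({"key": key, "value": result}).encode("utf-8")
    if HEADER.size + len(payload) > shm.size:
        raise ValueError("result does not fit in the shared memory segment")
    shm.buf[HEADER.size:HEADER.size + len(payload)] = payload
    HEADER.pack_into(shm.buf, 0, len(payload))  # publish the length last


def read_result(shm: shared_memory.SharedMemory):
    """Communication task side: return (key, value), or None if nothing is stored."""
    (length,) = HEADER.unpack_from(shm.buf, 0)
    if length == 0:
        return None
    raw = bytes(shm.buf[HEADER.size:HEADER.size + length])
    record = json.loads(raw.decode("utf-8"))
    HEADER.pack_into(shm.buf, 0, 0)  # mark the slot as consumed
    return record["key"], record["value"]
```

Because both sides agree only on the byte layout and on JSON, the algorithm task unit and the communication task unit need not share any in-memory object types.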
[0083] In one embodiment, on the basis of the aforementioned method embodiment, the initialization operation further includes creating a message queue, and the method can further comprise, prior to step S15:
[0084] the communication task unit adding the calculation result to the message queue.
[0085] Different types of message queues can be created in the initialization operation of the communication task unit, and the different types of message queues are employed to store calculation results of different types of calculation tasks, for instance, a first message queue is employed to store a calculation result of an image processing task, a second message queue is employed to store a calculation result of a video processing task, and so on.
[0086] Specifically, the communication task unit adds the calculation result of the calculation task to a message queue to which the type of the calculation task corresponds.
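The per-type message queues of paragraphs [0085] and [0086] might be sketched as follows, assuming in-memory FIFO queues keyed by a task-type string. The class name `ResultQueues` and the type labels `"image"` and `"video"` are hypothetical and chosen only to mirror the examples given above.

```python
from collections import deque


class ResultQueues:
    """One FIFO queue per calculation task type (illustrative sketch)."""

    def __init__(self, task_types):
        # Queues are created once, during the initialization operation.
        self._queues = {task_type: deque() for task_type in task_types}

    def add(self, task_type: str, result) -> None:
        """Append a result to the queue matching its task type."""
        self._queues[task_type].append(result)

    def pop(self, task_type: str):
        """Return the oldest result of the given type, or None if the queue is empty."""
        queue = self._queues[task_type]
        return queue.popleft() if queue else None


queues = ResultQueues(["image", "video"])
queues.add("image", {"task": 1, "objects": 3})
queues.add("video", {"task": 2, "frames": 120})
```

Here results of image processing tasks and video processing tasks never share a queue, so a request for one type cannot be delayed by a backlog of the other.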
[0087] Correspondingly, step S15 can specifically include:
[0088] the communication task unit extracting the calculation result from the message queue if the communication task unit receives a calculation result request initiated from the master node, and returning the calculation result to the master node through the network connection.
[0089] Specifically speaking, the specific process for realizing step S15 can include the following steps:
[0090] S151 - the communication task unit enquiring, if it receives a calculation result request initiated from the master node, in the message queue whether there is any calculation result requested by the calculation result request, if yes, executing step S152, if not, executing step S153.
[0091] S152 - returning the calculation result to the master node.
[0092] Specifically, the communication task unit returns the calculation result requested by the calculation result request to the master node through the network connection.
[0093] S153 - obstructing the calculation result request, and waking up the calculation result request when there is a new calculation result in the message queue, and executing step S154 after step S153.
[0094] S154 ¨judging whether the new calculation result is the calculation result requested by the calculation result request, if yes, returning the new calculation result to the master node, if not, returning to execute step S153.
[0095] In this embodiment, the calculation results of all calculation tasks initiated by the master node are temporarily stored on the worker node, and the worker node sends a calculation result to the master node only when the master node requires it. The memory needed to store the calculation results is thus provided by the worker node, which reduces the memory load on the master node.
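The request handling of steps S151 to S154 can be sketched with a condition variable, as below. This is a minimal sketch under stated assumptions rather than the embodiment's implementation: identifying results by a `task_id`, holding them in a dictionary, the optional `timeout`, and the names `ResultStore`, `add_result` and `handle_request` are all hypothetical.

```python
import threading


class ResultStore:
    """Holds calculation results on the worker until the master requests them."""

    def __init__(self):
        self._results = {}                  # assumed: results keyed by task id
        self._cond = threading.Condition()

    def add_result(self, task_id, result):
        """Store a new result and wake up every blocked request."""
        with self._cond:
            self._results[task_id] = result
            self._cond.notify_all()

    def handle_request(self, task_id, timeout=None):
        """Return the requested result, blocking until it is available."""
        with self._cond:
            # S151: check whether the requested result is already stored.
            # S153: if not, block until a new result arrives.
            # S154: after each wake-up, re-check whether it is the requested one.
            found = self._cond.wait_for(lambda: task_id in self._results, timeout)
            if not found:
                raise TimeoutError(f"no result for task {task_id!r}")
            # S152: hand the result back to the caller (and on to the master node).
            return self._results.pop(task_id)
```

`Condition.wait_for` re-evaluates the predicate after every notification, which corresponds to the loop between S153 and S154: a request that is woken by an unrelated result simply goes back to waiting.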
[0096] In one embodiment, as shown in Fig. 2, there is provided a real-time communicating device within a distributed system. The distributed system includes a master node and a worker node, the device is located on the worker node, and the device comprises an algorithm task unit and a communication task unit, wherein:
[0097] the communication task unit 21 is employed for performing an initialization operation after the communication task unit 21 has been started, and entering a monitor state after having completed the initialization operation, wherein the initialization operation includes initializing a shared memory and a network connection;
[0098] the algorithm task unit 22 is employed for being started after the communication task unit 21 has entered the monitor state, and processing a calculation task initiated by the master node after the algorithm task unit 22 has been started;
[0099] the algorithm task unit 22 is further employed for writing a calculation result of the calculation task in the shared memory; and
[0100] the communication task unit 21 is further employed for reading the calculation result when the communication task unit 21 monitors that the calculation result is stored in the shared memory, and returning the calculation result to the master node through the network connection.
[0101] Further, the communication task unit 21 is specifically employed for:
[0102] notifying the algorithm task unit 22 through a condition lock when the communication task unit 21 enters the monitor state, so that the algorithm task unit 22 is started.
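The start-up ordering enforced by the condition lock can be sketched as follows. The sketch is illustrative only and rests on several assumptions: Python threads stand in for the two task units, `threading.Condition` stands in for the condition lock, and the names `StartupGate` and `run_worker` are hypothetical.

```python
import threading


class StartupGate:
    """Condition lock that holds the algorithm task until monitoring has begun."""

    def __init__(self):
        self._cond = threading.Condition()
        self._monitoring = False

    def enter_monitor_state(self):
        """Called by the communication task once its initialization is complete."""
        with self._cond:
            self._monitoring = True
            self._cond.notify_all()

    def wait_until_monitoring(self, timeout=None):
        """Called by the algorithm task before it starts processing."""
        with self._cond:
            return self._cond.wait_for(lambda: self._monitoring, timeout)


def run_worker(events):
    """Run both tasks and record the order in which they become active."""
    gate = StartupGate()

    def communication_task():
        # Placeholder for initializing the shared memory and network connection.
        events.append("comm:initialized")
        gate.enter_monitor_state()

    def algorithm_task():
        gate.wait_until_monitoring()
        events.append("algo:started")

    algo = threading.Thread(target=algorithm_task)
    comm = threading.Thread(target=communication_task)
    algo.start()   # launched first on purpose; the gate still holds it back
    comm.start()
    algo.join()
    comm.join()
    return events
```

Even though the algorithm thread is launched first, it records its start only after the communication task has finished initializing, which is the ordering the condition lock is meant to guarantee.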
[0103] Further, the algorithm task unit 22 is specifically employed for:
[0104] serializing the calculation result of the calculation task to obtain serialized data, and writing the serialized data in the shared memory; and
[0105] the communication task unit 21 is specifically employed for:
[0106] obtaining the serialized data from the shared memory when the communication task unit 21 monitors that the serialized data of the calculation result is stored in the shared memory, and deserializing the serialized data to obtain the calculation result.
[0107] Further, the initialization operation further includes creating a message queue, and the communication task unit 21 is further employed for:
[0108] adding the calculation result to the message queue; and
[0109] extracting the calculation result from the message queue if a calculation result request initiated from the master node is received, and returning the calculation result to the master node through the network connection.
[0110] Further, the communication task unit 21 is specifically employed for:
[0111] enquiring, if a calculation result request initiated by the master node is received, in the message queue whether the calculation result requested by the calculation result request is present;
[0112] if yes, returning the calculation result to the master node;
[0113] if not, blocking the calculation result request, and waking up the calculation result request when there is a new calculation result in the message queue; and
[0114] judging whether the new calculation result is the calculation result requested by the calculation result request; if yes, returning the new calculation result to the master node; if not, continuing to block the calculation result request.
[0115] The real-time communicating device within a distributed system provided by this embodiment of the present invention is based on the same inventive concept as the real-time communicating method within a distributed system provided by an embodiment of the present invention. The device can execute that method, possesses the functional modules corresponding to its steps, and achieves the corresponding advantageous effects. For technical details not fully described in this embodiment, reference can be made to the real-time communicating method within a distributed system provided by an embodiment of the present invention; they are not repeated here.
[0116] In one embodiment, as shown in Fig. 3, there is provided a distributed system comprising a master node 31 and a worker node 32, the worker node 32 being configured to include the real-time communicating device within a distributed system according to the aforementioned embodiment.
[0117] In addition, an embodiment of the present invention further provides a computer equipment that comprises a memory, a processor and a computer program stored on the memory and operable on the processor, and the steps of the real-time communicating method within a distributed system according to the aforementioned embodiment are realized when the processor executes the computer program.
[0118] In addition, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program thereon, and the steps of the real-time communicating method within a distributed system according to the aforementioned embodiment are realized when the computer program is executed by a processor.
[0119] As should be clear to persons skilled in the art, the embodiments of the present invention can be embodied as a method, a system or a computer program product. Accordingly, the embodiments of the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software with hardware. Moreover, the embodiments of the present invention can take the form of one or more computer program product(s) implemented on a computer-usable storage medium (including, but not limited to, a magnetic disk memory, a CD-ROM, an optical memory, etc.) containing computer-usable program code.
[0120] The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product in the embodiments of the present invention. As should be understood, each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions can be supplied to the processor of a general-purpose computer, a dedicated computer, an embedded processor or any other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flow(s) of the flowcharts and/or one or more block(s) of the block diagrams.

[0121] These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or any other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce a product including instruction means that realize the functions specified in one or more flow(s) of the flowcharts and/or one or more block(s) of the block diagrams.
[0122] These computer program instructions can also be loaded onto a computer or any other programmable data processing device, so that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flow(s) of the flowcharts and/or one or more block(s) of the block diagrams.
[0123] Although preferred embodiments of the present invention have been described, persons skilled in the art can make additional modifications and amendments to these embodiments upon learning the basic inventive concept. Accordingly, the attached Claims are meant to cover the preferred embodiments and all modifications and amendments that fall within the scope of the embodiments of the present invention.
[0124] Evidently, persons skilled in the art can make various modifications and variations to the present invention without departing from the spirit and scope of the present invention. Thus, should such modifications and variations fall within the scope of the Claims of the present invention and their technical equivalents, the present invention is also meant to cover such modifications and variations.


Claims (10)

What is claimed is:
1. A real-time communicating method within a distributed system, characterized in that the distributed system includes a master node and a worker node, that on the worker node are deployed an algorithm task unit and a communication task unit, and that the method comprises:
the communication task unit performing an initialization operation after the communication task unit has been started, and entering a monitor state after having completed the initialization operation, wherein the initialization operation includes initializing a shared memory and a network connection;
the algorithm task unit being started after the communication task unit has entered the monitor state, and processing a calculation task initiated by the master node after the algorithm task unit has been started;
the algorithm task unit writing a calculation result of the calculation task in the shared memory;
the communication task unit reading the calculation result when the communication task unit monitors that the calculation result is stored in the shared memory; and the communication task unit returning the calculation result to the master node through the network connection.
2. The method according to Claim 1, characterized in that the step of the algorithm task unit being started after the communication task unit has entered the monitor state includes:
notifying the algorithm task unit through a condition lock when the communication task unit enters the monitor state, so that the algorithm task unit is started.
3. The method according to Claim 1, characterized in that the step of the algorithm task unit writing a calculation result of the calculation task in the shared memory includes:
the algorithm task unit serializing the calculation result of the calculation task to obtain serialized data, and writing the serialized data in the shared memory; and that the step of the communication task unit reading the calculation result when the communication task unit monitors that the calculation result is stored in the shared memory includes:
the communication task unit obtaining the serialized data from the shared memory when the communication task unit monitors that the serialized data of the calculation result is stored in the shared memory, and deserializing the serialized data to obtain the calculation result.
4. The method according to any one of Claims 1 to 3, characterized in that the initialization operation further includes creating a message queue, that the method further comprises, prior to the step of the communication task unit returning the calculation result to the master node through the network connection:
the communication task unit adding the calculation result to the message queue; and that the step of the communication task unit returning the calculation result to the master node through the network connection includes:
the communication task unit extracting the calculation result from the message queue if the communication task unit receives a calculation result request initiated from the master node, and returning the calculation result to the master node through the network connection.
5. The method according to Claim 4, characterized in that the step of the communication task unit extracting the calculation result from the message queue if the communication task unit receives a calculation result request initiated from the master node, and returning the calculation result to the master node through the network connection includes:
the communication task unit enquiring, if it receives a calculation result request initiated from the master node, in the message queue whether there is any calculation result requested by the calculation result request;
if yes, returning the calculation result to the master node;
if not, blocking the calculation result request, and waking up the calculation result request when there is a new calculation result in the message queue; and judging whether the new calculation result is the calculation result requested by the calculation result request, if yes, returning the new calculation result to the master node, if not, continuing to block the calculation result request.
6. A real-time communicating device within a distributed system, characterized in that the distributed system includes a master node and a worker node, that the device is located on the worker node, and that the device comprises an algorithm task unit and a communication task unit, wherein:
the communication task unit is employed for performing an initialization operation after the communication task unit has been started, and entering a monitor state after having completed the initialization operation, wherein the initialization operation includes initializing a shared memory and a network connection;
the algorithm task unit is employed for being started after the communication task unit has entered the monitor state, and processing a calculation task initiated by the master node after the algorithm task unit has been started;
the algorithm task unit is further employed for writing a calculation result of the calculation task in the shared memory; and the communication task unit is further employed for reading the calculation result when the communication task unit monitors that the calculation result is stored in the shared memory, and returning the calculation result to the master node through the network connection.
7. The device according to Claim 6, characterized in that the communication task unit is specifically employed for:
notifying the algorithm task unit through a condition lock when the communication task unit enters the monitor state, so that the algorithm task unit is started.
8. The device according to Claim 6, characterized in that the algorithm task unit is specifically employed for:
serializing the calculation result of the calculation task to obtain serialized data, and writing the serialized data in the shared memory; and that the communication task unit is specifically employed for:
obtaining the serialized data from the shared memory when the communication task unit monitors that the serialized data of the calculation result is stored in the shared memory, and deserializing the serialized data to obtain the calculation result.
9. The device according to any one of Claims 6 to 8, characterized in that the initialization operation further includes creating a message queue, and that the communication task unit is further employed for:
adding the calculation result to the message queue; and extracting the calculation result from the message queue if a calculation result request initiated from the master node is received, and returning the calculation result to the master node through the network connection.
10. A distributed system, characterized in that the system comprises a master node and at least one worker node configured to include a real-time communicating device within a distributed system according to any one of Claims 6 to 9.
CA3152842A 2019-08-27 2020-06-24 Real-time communication method and apparatus for distributed system, and distributed system Pending CA3152842A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201910796708.5A CN110633145B (en) 2019-08-27 2019-08-27 Real-time communication method and device in distributed system and distributed system
CN201910796708.5 2019-08-27
PCT/CN2020/097838 WO2021036451A1 (en) 2019-08-27 2020-06-24 Real-time communication method and apparatus for distributed system, and distributed system

Publications (1)

Publication Number Publication Date
CA3152842A1 (en) 2021-03-04

Family

ID=68969217

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3152842A Pending CA3152842A1 (en) 2019-08-27 2020-06-24 Real-time communication method and apparatus for distributed system, and distributed system

Country Status (3)

Country Link
CN (1) CN110633145B (en)
CA (1) CA3152842A1 (en)
WO (1) WO2021036451A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110633145B (en) * 2019-08-27 2023-03-31 苏宁云计算有限公司 Real-time communication method and device in distributed system and distributed system
CN115599507A (en) * 2021-07-07 2023-01-13 清华大学(Cn) Data processing method, execution workstation, electronic device and storage medium
CN115981610B (en) * 2023-03-17 2023-06-02 科大国创软件股份有限公司 Comprehensive operation platform of photovoltaic energy storage system based on Lua script

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7958513B2 (en) * 2005-11-17 2011-06-07 International Business Machines Corporation Method, system and program product for communicating among processes in a symmetric multi-processing cluster environment
CN101505306B (en) * 2009-03-23 2012-06-13 烽火通信科技股份有限公司 Inter-node reliable communication method in distributed system
US20120042003A1 (en) * 2010-08-12 2012-02-16 Raytheon Company Command and control task manager
CN103106249B (en) * 2013-01-08 2016-04-20 华中科技大学 A kind of parallel data processing system based on Cassandra
CN103647834B (en) * 2013-12-16 2017-03-22 上海证券交易所 System and method used for processing multi-phase distributed task scheduling
CN104378436A (en) * 2014-11-20 2015-02-25 深圳市远行科技有限公司 Information push system and method based on server push
CN107491355A (en) * 2017-08-17 2017-12-19 山东浪潮商用系统有限公司 Funcall method and device between a kind of process based on shared drive
CN109327509B (en) * 2018-09-11 2022-01-18 武汉魅瞳科技有限公司 Low-coupling distributed streaming computing system of master/slave architecture
CN109819037B (en) * 2019-01-29 2022-02-15 武汉鸿瑞达信息技术有限公司 Method and system for self-adaptive calculation and communication
CN110633145B (en) * 2019-08-27 2023-03-31 苏宁云计算有限公司 Real-time communication method and device in distributed system and distributed system

Also Published As

Publication number Publication date
CN110633145A (en) 2019-12-31
CN110633145B (en) 2023-03-31
WO2021036451A1 (en) 2021-03-04

Similar Documents

Publication Publication Date Title
CA3152842A1 (en) Real-time communication method and apparatus for distributed system, and distributed system
US8291070B2 (en) Determining an operating status of a remote host upon communication failure
US20170185452A1 (en) Apparatus and method for data processing
US10417062B2 (en) Method and apparatus of unloading out of memory processing flow to user space
US9483314B2 (en) Systems and methods for fault tolerant batch processing in a virtual environment
US11061693B2 (en) Reprogramming a field programmable device on-demand
US9348658B1 (en) Technologies for efficient synchronization barriers with work stealing support
CN112035238A (en) Task scheduling processing method and device, cluster system and readable storage medium
WO2018233299A1 (en) Method, apparatus and device for scheduling processor, and medium
US9971708B2 (en) System and method for application migration between docking station and dockable device
US10069674B2 (en) Monitoring file system operations between a client computer and a file server
CN112905334A (en) Resource management method, device, electronic equipment and storage medium
US9507637B1 (en) Computer platform where tasks can optionally share per task resources
CN115964153A (en) Asynchronous task processing method, device, equipment and storage medium
CN113806075A (en) Method, device and equipment for container hot updating CPU core of kubernets cluster and readable medium
CN111611086A (en) Information processing method, information processing apparatus, electronic device, and medium
CN108829516B (en) Resource virtualization scheduling method for graphic processor
CN109698850B (en) Processing method and system
CN113590285A (en) Method, system and equipment for dynamically setting thread pool parameters
GB2607475A (en) Generating a scaling plan for external systems during cloud tenant onboarding/offboarding
US10817400B2 (en) Management apparatus and management method
CN112486638A (en) Method, apparatus, device and storage medium for executing processing task
CN113032098B (en) Virtual machine scheduling method, device, equipment and readable storage medium
US20230063893A1 (en) Simultaneous-multi-threading (smt) aware processor allocation for cloud real-time workloads
JP2018538632A (en) Method and device for processing data after node restart

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20220228
