WO2021036451A1 - Real-time communication method and device in a distributed system, and distributed system - Google Patents

Real-time communication method and device in a distributed system, and distributed system

Info

Publication number
WO2021036451A1
Authority
WO
WIPO (PCT)
Prior art keywords
calculation result
task unit
communication
master node
shared memory
Prior art date
Application number
PCT/CN2020/097838
Other languages
English (en)
French (fr)
Inventor
董邦发
Original Assignee
苏宁云计算有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏宁云计算有限公司
Priority to CA3152842A priority Critical patent/CA3152842A1/en
Publication of WO2021036451A1 publication Critical patent/WO2021036451A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system

Definitions

  • the present invention relates to the field of distributed technology, in particular to a real-time communication method, device and distributed system in a distributed system.
  • this kind of program that mainly consumes CPU or GPU resources is a computationally intensive task.
  • Computing-intensive tasks can be processed in a multi-threaded manner, but the more threads, the more time consumed in task switching, and the lower the efficiency of CPU execution, so this type of program cannot use too many threads.
  • a distributed system has multiple nodes, which are generally divided into Master nodes and Worker nodes, and network communication is required between Master nodes and Worker nodes. When real-time communication is required between these two kinds of nodes, if the amount of data concurrency is large, a large amount of IO will be generated.
  • A task that requires a large amount of IO is generally called an IO-intensive task. An IO-intensive task generally does not consume much CPU, but it frequently interrupts the CPU.
  • the inventor found that when an algorithm is deployed on a distributed system architecture and must respond with output in real time, the tasks on a multi-core node are both compute-intensive and network-IO-intensive. Excessive network IO causes the algorithm's calculation tasks to be interrupted frequently, which severely reduces calculation efficiency, so that the distributed system as a whole cannot respond with output in real time.
  • the embodiments of the present application provide a real-time communication method, device and distributed system in a distributed system.
  • in a distributed environment, when the tasks on a multi-core node are both compute-intensive and network-IO-intensive, separating the calculation tasks from the communication tasks ensures that the distributed system as a whole can respond with output in real time.
  • in a first aspect, a real-time communication method in a distributed system is provided; the distributed system includes a master node and a working node, and an algorithm task unit and a communication task unit are deployed on the working node.
  • the method includes:
  • the communication task unit performs an initialization operation after being started, and enters a monitoring state after completing the initialization operation, wherein the initialization operation includes initializing shared memory and network connection;
  • the algorithm task unit is started after the communication task unit enters the monitoring state, and after being started, it processes the calculation task initiated from the master node;
  • the algorithm task unit writes the calculation result of the calculation task into the shared memory
  • the communication task unit reads the calculation result when it monitors that the calculation result is stored in the shared memory
  • the communication task unit returns the calculation result to the master node through the network connection.
  • the algorithm task unit is started after the communication task unit enters the monitoring state, including:
  • when the communication task unit enters the monitoring state, it notifies the algorithm task unit through a conditional lock, so that the algorithm task unit is started.
  • the writing of the calculation result of the calculation task into the shared memory by the algorithm task unit includes:
  • the algorithm task unit serializes the calculation result of the calculation task to obtain serialized data, and writes the serialized data into the shared memory
  • reading the calculation result includes:
  • when the communication task unit detects that the serialized data of the calculation result has been stored in the shared memory, it obtains the serialized data from the shared memory and deserializes it to obtain the calculation result.
  • the initialization operation further includes creating a message queue, and before the communication task unit returns the calculation result to the master node through the network connection, the method further includes:
  • the communication task unit adds the calculation result to the message queue;
  • the communication task unit returning the calculation result to the master node through the network connection includes:
  • if the communication task unit receives a calculation result request initiated by the master node, it takes the calculation result from the message queue and returns it to the master node through the network connection.
  • taking the calculation result from the message queue and returning it to the master node through the network connection, when a calculation result request initiated by the master node is received, includes:
  • if the communication task unit receives a calculation result request initiated by the master node, it queries the message queue for the calculation result requested by the calculation result request;
  • if the calculation result exists, it is returned to the master node;
  • if the calculation result does not exist, the calculation result request is blocked, and when a new calculation result arrives in the message queue, the calculation result request is woken up;
  • it is determined whether the new calculation result is the one requested by the calculation result request; if so, the new calculation result is returned to the master node; otherwise, the calculation result request continues to be blocked.
  • in a second aspect, a real-time communication device in a distributed system is provided; the distributed system includes a master node and a working node, the device is located on the working node, and the device includes an algorithm task unit and a communication task unit, wherein:
  • the communication task unit is configured to perform an initialization operation after being started, and enter a monitoring state after completing the initialization operation, wherein the initialization operation includes initializing a shared memory and a network connection;
  • the algorithm task unit is configured to be activated after the communication task unit enters the monitoring state, and to process the computing task initiated from the master node after being activated;
  • the algorithm task unit is also used to write the calculation result of the calculation task into the shared memory
  • the communication task unit is further configured to read the calculation result when the calculation result is stored in the shared memory and return the calculation result to the master node through the network connection.
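  • the communication task unit is specifically used for:
  • when entering the monitoring state, notifying the algorithm task unit through a conditional lock, so that the algorithm task unit is started.
  • the algorithm task unit is specifically used for:
  • serializing the calculation result of the calculation task to obtain serialized data, and writing the serialized data into the shared memory;
  • the communication task unit is specifically used for:
  • when it detects that the serialized data of the calculation result has been stored in the shared memory, obtaining the serialized data from the shared memory and deserializing it to obtain the calculation result.
  • the initialization operation further includes creating a message queue
  • the communication task unit is further used for:
  • adding the calculation result to the message queue;
  • if a calculation result request initiated by the master node is received, taking the calculation result from the message queue and returning it to the master node through the network connection.
  • the communication task unit is specifically used for:
  • if a calculation result request initiated by the master node is received, querying the message queue for the calculation result requested by the calculation result request;
  • if the calculation result exists, returning it to the master node;
  • if the calculation result does not exist, blocking the calculation result request, and when a new calculation result arrives in the message queue, waking up the calculation result request;
  • determining whether the new calculation result is the one requested by the calculation result request; if so, returning the new calculation result to the master node; otherwise, continuing to block the calculation result request.
  • in a third aspect, a distributed system is provided; the system includes a master node and at least one working node, and the working node is configured to include the real-time communication device in a distributed system according to any implementation of the second aspect.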
  • FIG. 1 is a flowchart of a real-time communication method in a distributed system according to an embodiment of the present invention
  • FIG. 2 is a structural block diagram of a real-time communication device in a distributed system according to an embodiment of the present invention
  • Fig. 3 is a structural block diagram of a distributed system provided by an embodiment of the present invention.
  • the embodiment of the present invention provides a real-time communication method in a distributed system.
  • the distributed system includes a master node and working nodes.
  • on each working node, an algorithm task unit and a communication task unit are deployed.
  • the communication task unit is used for real-time data transfer over network IO, that is, for performing communication tasks.
  • the algorithm task unit executes the calculation task initiated by the master node and writes the calculation result of the calculation task into the shared memory; the communication task unit reads the calculation result from the shared memory and returns it to the master node.
  • by deploying algorithm task units and communication task units on multi-core nodes in a distributed environment, the embodiment of the present invention separates communication tasks from calculation tasks. This prevents the computing threads of compute-intensive tasks from being frequently interrupted by excessive network IO while still letting network IO deliver data in real time, so that the distributed system can communicate in real time and the algorithm can output in real time. In addition, using shared memory for communication between the communication tasks and the calculation tasks can greatly improve the efficiency of inter-process communication.
  • in one embodiment, a real-time communication method in a distributed system is provided; the distributed system includes a master node and a working node.
  • an algorithm task unit and a communication task unit are deployed on the working node.
  • the method may include the following steps:
  • the communication task unit performs an initialization operation after being started, and enters the monitoring state after completing the initialization operation, where the initialization operation includes initializing the shared memory and network connection.
  • the communication task unit is started on the working node, and the communication task unit starts to perform the initialization operation.
  • the initialization operation includes initializing the network connection and the shared memory. After the initialization is completed, it enters the monitoring state to monitor the shared memory and the network connection respectively.
  • the shared memory is mainly used for the communication between the communication task unit and the algorithm task unit on the same working node
  • the network connection is mainly used for the communication between the working node and the master node.
  • the algorithm task unit is started after the communication task unit enters the monitoring state, and after being started, it processes the calculation task initiated from the master node.
  • the master node will initiate a calculation task to each working node, and after the algorithm task unit on the working node is started, the corresponding algorithm processing is performed on the calculation task initiated by the master node.
  • the master node can determine the working node for processing the computing task according to the operating status information of each working node, and send the computing task to the working node.
  • the operating status information includes one or more of CPU usage rate, memory usage rate, disk read and write, and network uplink and downlink.
  • the algorithm task unit writes the calculation result of the calculation task into the shared memory.
  • the algorithm task unit may write the calculation result of the calculation task into the shared memory according to a preset data structure.
  • the communication task unit reads the calculation result when it monitors the calculation result stored in the shared memory.
  • the communication task unit can read the data in the shared memory periodically or in real time, and start to obtain the calculation result when the calculation result of the calculation task stored in the shared memory is read.
  • the communication task unit can actively return the calculation result of the calculation task to the master node through a network connection; alternatively, after the master node establishes a network connection with the communication task unit on the working node, the communication task unit can return the calculation result through that connection in response to a calculation result request from the master node.
  • the working nodes in the distributed system of this embodiment must be multi-core. On a single-core CPU machine the total computing resources are limited and there is no need to separate the algorithm task unit from the communication task unit, so that case is not discussed in this embodiment.
  • the embodiment of the present invention provides a real-time communication method in a distributed system.
  • the distributed system includes a master node and a working node. On each working node, an algorithm task unit and a communication task unit are deployed.
  • the communication task unit is used for network IO.
  • the algorithm task unit is used to execute the calculation task initiated by the master node and write the calculation result of the calculation task into the shared memory.
  • the communication task unit reads the calculation result of the calculation task from the shared memory and returns it to the master node.
  • by deploying algorithm task units and communication task units on multi-core nodes in a distributed environment, the embodiment of the present invention separates communication tasks from calculation tasks, which prevents the computing threads of compute-intensive tasks from being frequently interrupted by excessive network IO.
  • it also ensures that network IO delivers data in real time, so that the distributed system can communicate in real time and the algorithm can output in real time.
  • in addition, using shared memory for communication between the communication tasks and the calculation tasks can greatly improve the efficiency of inter-process communication.
  • the algorithm task unit in the above step S12 is started after the communication task unit enters the listening state, which may specifically include:
  • when the communication task unit enters the monitoring state, it notifies the algorithm task unit through the conditional lock, so that the algorithm task unit is started.
  • after the communication task unit completes the initialization of the shared memory, it monitors the shared memory region, and the communication task unit and the algorithm task unit notify each other through a conditional lock.
  • the conditional lock is also cross-process, so after the communication task unit initializes the conditional lock it locks the shared memory; the communication task unit then waits on the conditional lock, and the lock is released while it waits.
  • after the algorithm task unit is started, it also needs to acquire the conditional lock. If another process has not released the conditional lock, the algorithm task unit cannot start. The communication task unit therefore has to be started first, and the algorithm task unit afterwards.
  • the algorithm task unit in the above step S13 writes the calculation results of the calculation tasks into the shared memory, which may specifically include:
  • the algorithm task unit serializes the calculation result of the calculation task, obtains the serialized data, and writes the serialized data into the shared memory.
  • the algorithm task unit serializes the calculation result to convert the calculation result into an object with a preset data structure, which is the serialized data, and the preset data structure is, for example, a JSON data structure.
  • the serialized data can be written into the shared memory in the form of key-value pairs, where key refers to the key name, and value refers to the key value.
  • correspondingly, reading the calculation result in step S14, when the communication task unit detects that the calculation result has been stored in the shared memory, may specifically include:
  • when the communication task unit detects that the serialized data of the calculation result has been stored in the shared memory, it obtains the serialized data from the shared memory and deserializes it to obtain the calculation result.
  • by using object serialization for inter-process communication, the transmitted objects become more general-purpose and message transmission efficiency is improved.
  • the initialization operation further includes creating a message queue.
  • the method may further include:
  • the communication task unit adds the calculation result to the message queue.
  • different types of message queues can be created in the initialization operation of the communication task unit. Different types of message queues are used to store calculation results of different types of calculation tasks. For example, the first message queue is used to store calculation results of image processing tasks. The second message queue is used to store the calculation results of video processing tasks, and so on.
  • the communication task unit adds the calculation result of the calculation task to the message queue corresponding to the type of the calculation task.
  • step S15 may specifically include:
  • if the communication task unit receives a calculation result request initiated by the master node, it takes the calculation result from the message queue and returns it to the master node through the network connection.
  • step S15 may include the following steps:
  • S151: if the communication task unit receives a calculation result request initiated by the master node, it queries the message queue for the calculation result requested by the calculation result request; if the result exists, step S152 is performed; if it does not exist, step S153 is performed.
  • S152: the communication task unit returns the calculation result requested by the calculation result request to the master node through the network connection.
  • S153: the calculation result request is blocked, and when a new calculation result arrives in the message queue, the calculation result request is woken up. After step S153, step S154 is performed.
  • S154: it is determined whether the new calculation result is the one requested by the calculation result request; if so, the new calculation result is returned to the master node; otherwise, the method returns to step S153.
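The lookup, block, wake-up and re-check loop of steps S151 to S154 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the class `ResultStore`, its method names and the `task_id` key are hypothetical, results are assumed to be identifiable by a task identifier, and a `threading.Condition` inside one process stands in for whatever blocking mechanism the communication task unit actually uses.

```python
import threading


class ResultStore:
    """Holds calculation results until the master node asks for them."""

    def __init__(self):
        self._cond = threading.Condition()
        self._results = {}  # hypothetical layout: task_id -> calculation result

    def add_result(self, task_id, result):
        """Store a new result and wake every blocked request (wake-up in S153)."""
        with self._cond:
            self._results[task_id] = result
            self._cond.notify_all()

    def request_result(self, task_id, timeout=None):
        """Return the requested result, blocking until it is available."""
        with self._cond:
            # S151: check whether the requested result exists.
            # S153: if not, block until some new result arrives.
            # S154: after each wake-up, re-check whether it is the requested one.
            found = self._cond.wait_for(
                lambda: task_id in self._results, timeout=timeout
            )
            if not found:
                raise TimeoutError(f"no result for task {task_id!r}")
            # S152: hand the result back (here: return it to the caller).
            return self._results.pop(task_id)
```

`Condition.wait_for` re-evaluates the predicate after every notification, so a request for one task keeps blocking when a result for a different task arrives, which mirrors the "otherwise, return to step S153" branch.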
  • the calculation results of all calculation tasks initiated by the master node are temporarily stored on the working nodes. Only when the master node needs a particular calculation result does the working node send that result to the master node. Because the working nodes share the memory required to store the calculation results, the memory burden on the master node is reduced.
  • in one embodiment, a real-time communication device in a distributed system is provided; the distributed system includes a master node and a working node.
  • the device is located on the working node.
  • the device includes an algorithm task unit 22 and a communication task unit 21, wherein:
  • the communication task unit 21 is used to perform an initialization operation after being started, and to enter a monitoring state after the initialization operation is completed, where the initialization operation includes initializing shared memory and a network connection;
  • the algorithm task unit 22 is used to be started after the communication task unit 21 enters the monitoring state, and to process the calculation task initiated by the master node after being started;
  • the algorithm task unit 22 is also used to write the calculation result of the calculation task into the shared memory;
  • the communication task unit 21 is also used to read the calculation result when it detects that the calculation result has been stored in the shared memory, and to return the calculation result to the master node through the network connection.
  • the communication task unit 21 is specifically used for:
  • when entering the monitoring state, notifying the algorithm task unit 22 through the conditional lock, so that the algorithm task unit 22 is started.
  • the algorithm task unit 22 is specifically used for:
  • serializing the calculation result of the calculation task to obtain serialized data, and writing the serialized data into the shared memory;
  • the communication task unit 21 is specifically used for:
  • when it detects that the serialized data of the calculation result has been stored in the shared memory, obtaining the serialized data from the shared memory and deserializing it to obtain the calculation result.
  • the initialization operation also includes creating a message queue, and the communication task unit 21 is also used for:
  • adding the calculation result to the message queue;
  • if a calculation result request initiated by the master node is received, taking the calculation result from the message queue and returning it to the master node through the network connection.
  • the communication task unit 21 is specifically used for:
  • if a calculation result request initiated by the master node is received, querying the message queue for the requested calculation result; if it exists, returning it to the master node; if it does not exist, blocking the calculation result request and waking it up when a new calculation result arrives in the message queue; then determining whether the new calculation result is the requested one and, if so, returning it to the master node, otherwise continuing to block the calculation result request.
  • the real-time communication device in the distributed system provided by the embodiment of the present invention belongs to the same inventive concept as the real-time communication method in the distributed system provided by the embodiment of the present invention. It can perform that method, and has the corresponding functional modules and beneficial effects.
  • for technical details not described in this embodiment, see the real-time communication method in a distributed system provided by the embodiment of the present invention; they are not repeated here.
  • in one embodiment, a distributed system is also provided.
  • the system includes a master node 31 and a working node 32.
  • the working node 32 is configured to include the real-time communication device in a distributed system described above.
  • an embodiment of the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor.
  • when the processor executes the computer program, the steps of the real-time communication method in a distributed system of the foregoing embodiment are implemented.
  • an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the steps of the real-time communication method in a distributed system of the foregoing embodiment are implemented.
  • the embodiments of the present invention may be provided as methods, systems, or computer program products. The embodiments may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device that implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device; the instruction device implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps is executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)
  • Debugging And Monitoring (AREA)
  • Computer And Data Communications (AREA)

Abstract

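The present invention discloses a real-time communication method and device in a distributed system, and a distributed system, belonging to the field of distributed technology. The distributed system includes a master node and a working node, and an algorithm task unit and a communication task unit are deployed on the working node. The method includes: the communication task unit performs an initialization operation after being started and enters a monitoring state after completing the initialization operation, where the initialization operation includes initializing shared memory and a network connection; the algorithm task unit is started after the communication task unit enters the monitoring state and, after being started, processes the calculation task initiated by the master node; the algorithm task unit writes the calculation result of the calculation task into the shared memory; the communication task unit reads the calculation result when it detects that the calculation result has been stored in the shared memory; the communication task unit returns the calculation result to the master node through the network connection. The embodiments of the present invention enable real-time communication throughout the distributed system so that the algorithm can output in real time.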

Description

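Real-time communication method and device in a distributed system, and distributed system
Technical Field
The present invention relates to the field of distributed technology, and in particular to a real-time communication method and device in a distributed system, and a distributed system.
Background
With the rapid development of artificial intelligence technology, more and more AI techniques are being put into practical use. When an AI algorithm is deployed, it depends heavily on hardware computing resources as well as on the performance of the algorithm itself. A single device today has multiple cores and is already powerful, but when a project needs more computing power than a single device can provide, a distributed solution is usually adopted, that is, the required computing resources are spread across different devices. A distributed solution divides the system into nodes; each node takes on part of the computing work, and the nodes communicate over a network. Algorithms such as image processing or video encoding and decoding usually consume the CPU or GPU resources of a node at run time, and a program that mainly consumes CPU or GPU resources is generally called a compute-intensive task. Compute-intensive tasks can be processed with multiple threads, but the more threads there are, the more time is spent on task switching and the lower the CPU's execution efficiency, so such programs cannot use too many threads. A distributed system has multiple nodes, generally divided into Master nodes and Worker nodes, which must communicate over the network. When real-time communication is required between these two kinds of nodes and the volume of concurrent data is large, a large amount of IO is generated. A task that requires a large amount of IO is generally called an IO-intensive task; it generally does not consume much CPU, but it frequently interrupts the CPU.
In the course of implementing the present invention, the inventor found that when an algorithm is deployed on a distributed system architecture and must respond with output in real time, the tasks on a multi-core node are both compute-intensive and network-IO-intensive. Excessive network IO causes the algorithm's calculation tasks to be interrupted frequently, which severely reduces calculation efficiency, so that the distributed system as a whole cannot respond with output in real time.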
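Summary of the Invention
To solve the above technical problem, the embodiments of the present application provide a real-time communication method and device in a distributed system, and a distributed system. In a distributed environment, when the tasks on a multi-core node are both compute-intensive and network-IO-intensive, separating the calculation tasks from the communication tasks ensures that the distributed system as a whole can respond with output in real time.
The specific technical solutions provided by the embodiments of the present invention are as follows:
In a first aspect, a real-time communication method in a distributed system is provided. The distributed system includes a master node and a working node, an algorithm task unit and a communication task unit are deployed on the working node, and the method includes:
the communication task unit performs an initialization operation after being started, and enters a monitoring state after completing the initialization operation, where the initialization operation includes initializing shared memory and a network connection;
the algorithm task unit is started after the communication task unit enters the monitoring state and, after being started, processes the calculation task initiated by the master node;
the algorithm task unit writes the calculation result of the calculation task into the shared memory;
the communication task unit reads the calculation result when it detects that the calculation result has been stored in the shared memory;
the communication task unit returns the calculation result to the master node through the network connection.
Further, the algorithm task unit being started after the communication task unit enters the monitoring state includes:
when the communication task unit enters the monitoring state, it notifies the algorithm task unit through a conditional lock, so that the algorithm task unit is started.
Further, the algorithm task unit writing the calculation result of the calculation task into the shared memory includes:
the algorithm task unit serializes the calculation result of the calculation task to obtain serialized data, and writes the serialized data into the shared memory;
the communication task unit reading the calculation result when it detects that the calculation result has been stored in the shared memory includes:
when the communication task unit detects that the serialized data of the calculation result has been stored in the shared memory, it obtains the serialized data from the shared memory and deserializes it to obtain the calculation result.
Further, the initialization operation also includes creating a message queue, and before the step in which the communication task unit returns the calculation result to the master node through the network connection, the method further includes:
the communication task unit adds the calculation result to the message queue;
the communication task unit returning the calculation result to the master node through the network connection includes:
if the communication task unit receives a calculation result request initiated by the master node, it takes the calculation result from the message queue and returns it to the master node through the network connection.
Further, taking the calculation result from the message queue and returning it to the master node through the network connection, when the communication task unit receives a calculation result request initiated by the master node, includes:
if the communication task unit receives a calculation result request initiated by the master node, it queries the message queue for the calculation result requested by the calculation result request;
if the calculation result exists, it is returned to the master node;
if the calculation result does not exist, the calculation result request is blocked, and when a new calculation result arrives in the message queue, the calculation result request is woken up;
it is determined whether the new calculation result is the one requested by the calculation result request; if so, the new calculation result is returned to the master node; otherwise, the calculation result request continues to be blocked.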
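In a second aspect, a real-time communication device in a distributed system is provided. The distributed system includes a master node and a working node, the device is located on the working node, and the device includes an algorithm task unit and a communication task unit, wherein:
the communication task unit is used to perform an initialization operation after being started, and to enter a monitoring state after completing the initialization operation, where the initialization operation includes initializing shared memory and a network connection;
the algorithm task unit is used to be started after the communication task unit enters the monitoring state, and to process the calculation task initiated by the master node after being started;
the algorithm task unit is also used to write the calculation result of the calculation task into the shared memory;
the communication task unit is also used to read the calculation result when it detects that the calculation result has been stored in the shared memory, and to return the calculation result to the master node through the network connection.
Further, the communication task unit is specifically used for:
when entering the monitoring state, notifying the algorithm task unit through a conditional lock, so that the algorithm task unit is started.
Further, the algorithm task unit is specifically used for:
serializing the calculation result of the calculation task to obtain serialized data, and writing the serialized data into the shared memory;
the communication task unit is specifically used for:
when it detects that the serialized data of the calculation result has been stored in the shared memory, obtaining the serialized data from the shared memory and deserializing it to obtain the calculation result.
Further, the initialization operation also includes creating a message queue, and the communication task unit is also used for:
adding the calculation result to the message queue;
if a calculation result request initiated by the master node is received, taking the calculation result from the message queue and returning it to the master node through the network connection.
Further, the communication task unit is specifically used for:
if a calculation result request initiated by the master node is received, querying the message queue for the calculation result requested by the calculation result request;
if the calculation result exists, returning it to the master node;
if the calculation result does not exist, blocking the calculation result request, and when a new calculation result arrives in the message queue, waking up the calculation result request;
determining whether the new calculation result is the one requested by the calculation result request; if so, returning the new calculation result to the master node; otherwise, continuing to block the calculation result request.
In a third aspect, a distributed system is provided. The system includes a master node and at least one working node, and the working node is configured to include the real-time communication device in a distributed system according to any implementation of the second aspect.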
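The technical solutions provided by the embodiments of the present invention bring the following beneficial effects:
1. Deploying an algorithm task unit and a communication task unit on multi-core nodes in a distributed environment separates communication tasks from calculation tasks. This prevents the computing threads of compute-intensive tasks from being frequently interrupted by excessive network IO, while still letting network IO deliver data in real time, so that the distributed system can communicate in real time and the algorithm can output in real time.
2. Using shared memory for communication between the communication tasks and the calculation tasks can greatly improve the efficiency of inter-process communication.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a real-time communication method in a distributed system according to an embodiment of the present invention;
FIG. 2 is a structural block diagram of a real-time communication device in a distributed system according to an embodiment of the present invention;
FIG. 3 is a structural block diagram of a distributed system according to an embodiment of the present invention.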
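Detailed Description
To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
In the prior art, distributed solutions are widely used. For example, an unmanned supermarket in the retail field requires servers to process multiple camera streams, and a single device cannot meet the demand; if all tasks were concentrated on one device, the device would be overwhelmed. A distributed system formed of multiple devices must therefore be used, with the devices divided into a master node and working nodes. The working nodes carry a large number of algorithm tasks, which are compute-intensive, but the output of these tasks must be delivered to the master node in real time. The algorithm tasks therefore contain communication tasks, and the network IO these generate reduces the efficiency of the algorithms, so that the distributed system as a whole cannot respond with output in real time.
For this reason, the embodiment of the present invention provides a real-time communication method in a distributed system. The distributed system includes a master node and working nodes, and an algorithm task unit and a communication task unit are deployed on each working node. The communication task unit is used for real-time data transfer over network IO (that is, for performing communication tasks). The algorithm task unit executes the calculation task initiated by the master node and writes the calculation result into shared memory; the communication task unit reads the calculation result from the shared memory and returns it to the master node. By deploying algorithm task units and communication task units on multi-core nodes in a distributed environment, the embodiment separates communication tasks from calculation tasks, which prevents the computing threads of compute-intensive tasks from being frequently interrupted by excessive network IO while still letting network IO deliver data in real time, so that the distributed system can communicate in real time and the algorithm can output in real time. In addition, using shared memory for communication between the communication tasks and the calculation tasks can greatly improve the efficiency of inter-process communication.
In one embodiment, as shown in FIG. 1, a real-time communication method in a distributed system is provided. The distributed system includes a master node and a working node, an algorithm task unit and a communication task unit are deployed on the working node, and the method may include the following steps:
S11: the communication task unit performs an initialization operation after being started, and enters a monitoring state after completing the initialization operation, where the initialization operation includes initializing shared memory and a network connection.
Specifically, the communication task unit is started on the working node and begins the initialization operation, which includes initializing the network connection and the shared memory. After initialization is complete, it enters the monitoring state and monitors the shared memory and the network connection respectively.
The shared memory is mainly used for communication between the communication task unit and the algorithm task unit on the same working node, and the network connection is mainly used for communication between the working node and the master node.
It should be noted that shared memory does not disappear when a process exits. Before the algorithm task unit of the working node is started, it is therefore best to clean up any previously created shared memory region; in practice, the cleanup can be performed after the communication task unit starts and before it initializes the shared memory.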
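The cleanup-before-initialization step can be sketched in Python with the standard `multiprocessing.shared_memory` module (Python 3.8+). This is a minimal sketch, not the patent's implementation: the function name `init_shared_memory` and the segment name in the usage note are hypothetical, and it assumes a platform where a named segment outlives the process that created it, as on POSIX systems.

```python
from multiprocessing import shared_memory


def init_shared_memory(name: str, size: int) -> shared_memory.SharedMemory:
    """Remove any stale segment called `name`, then create a fresh one."""
    try:
        # Attach to a segment left behind by an earlier run, if there is one.
        stale = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        pass  # nothing to clean up
    else:
        stale.close()
        stale.unlink()  # destroy the leftover segment
    # Create the segment that both task units will use.
    return shared_memory.SharedMemory(name=name, create=True, size=size)
```

A communication task unit would call something like `init_shared_memory("worker_results", 4096)` once at startup, and call `close()` and `unlink()` on the returned object when the working node shuts down. The actual segment may be larger than requested because some platforms round the size up to a page boundary.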
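S12: the algorithm task unit is started after the communication task unit has entered the listening state and, once started, processes the computing tasks initiated by the master node.
In this embodiment, the start order of the communication task unit and the algorithm task unit is strictly constrained: the communication task unit must be started first, and only then the algorithm task unit.
In this embodiment, the master node initiates computing tasks to the worker nodes. After the algorithm task unit on a worker node is started, it performs the corresponding algorithm processing on the computing tasks initiated by the master node.
In practice, the master node may determine, from the running status information of each worker node, which worker node is to process a computing task, and send the computing task to that worker node. The running status information includes one or more of CPU usage, memory usage, disk reads and writes, and network uplink and downlink traffic.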
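One way the master node might use the running status information is sketched below. The patent does not say how the metrics are combined, so the selection rule (lowest sum of CPU and memory usage) and the names `WorkerStatus` and `pick_worker` are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class WorkerStatus:
    """Running status reported by one worker node (hypothetical structure)."""
    node_id: str
    cpu_usage: float     # fraction of CPU in use, 0.0 to 1.0
    memory_usage: float  # fraction of memory in use, 0.0 to 1.0


def pick_worker(statuses: Iterable[WorkerStatus]) -> str:
    """Return the id of the worker with the lowest combined load.

    Assumed policy: the patent only states that the choice is based on
    running status information, not how that information is weighted.
    """
    best = min(statuses, key=lambda s: s.cpu_usage + s.memory_usage)
    return best.node_id
```

Disk and network metrics could be added to the score in the same way if they are reported.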
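S13: the algorithm task unit writes the calculation result of the computing task into the shared memory.
Specifically, the algorithm task unit may write the calculation result of the computing task into the shared memory according to a preset data structure.
S14: when the communication task unit detects that a calculation result has been stored in the shared memory, it reads the calculation result.
Specifically, the communication task unit may read the data in the shared memory periodically or in real time, and when it finds that the calculation result of a computing task has been stored in the shared memory, it retrieves the calculation result.
S15: the communication task unit returns the calculation result to the master node through the network connection.
Specifically, the communication task unit may actively return the calculation result of the computing task to the master node over the network connection. Alternatively, after the master node has established a network connection with the communication task unit on the worker node, the communication task unit may return the calculation result over the network connection in response to a calculation result request from the master node.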
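It should be noted that the worker nodes in the distributed system of this embodiment must be multi-core. On a single-core CPU machine the total computing resources are limited and there is no need to separate the algorithm task unit from the communication task unit, so that case is not discussed in this embodiment.
This embodiment of the present invention provides a real-time communication method for a distributed system. The distributed system includes a master node and worker nodes, and an algorithm task unit and a communication task unit are deployed on every worker node. The communication task unit handles real-time data transfer over network I/O. The algorithm task unit executes the computing tasks initiated by the master node and writes their calculation results into shared memory; the communication task unit reads the calculation results from the shared memory and returns them to the master node. By deploying an algorithm task unit and a communication task unit on a multi-core node in a distributed environment, the embodiment separates communication tasks from computing tasks. This prevents the compute threads of a compute-intensive task from being frequently interrupted by excessive network I/O, while still ensuring that network I/O delivers data in real time, so that the distributed system can communicate in real time and the algorithm output remains real-time. In addition, because the communication task and the computing task communicate through shared memory, the efficiency of inter-process communication is greatly improved.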
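In one embodiment, starting the algorithm task unit after the communication task unit has entered the listening state, as in step S12 above, may specifically include:
when entering the listening state, the communication task unit notifies the algorithm task unit through a condition lock, so that the algorithm task unit is started.
Specifically, after the communication task unit finishes initializing the shared memory, it listens on the shared memory region, and the communication task unit and the algorithm task unit notify each other through the condition lock. The condition lock also works across processes. After initializing the condition lock, the communication task unit locks the shared memory and then listens on the condition lock, releasing the lock while it listens. After starting, the algorithm task unit also needs to acquire the condition lock; if another process has not released the condition lock, the algorithm task unit cannot start. This is why the communication task unit must be started first and the algorithm task unit afterwards.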
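The start-order guarantee can be illustrated with a condition variable. The sketch below is a simplification under stated assumptions: it uses two threads and `threading.Condition` so that it runs in a single process, whereas the patent describes a condition lock shared between processes (for which `multiprocessing.Condition` would be the standard-library counterpart). The names `state`, `events`, and both unit functions are hypothetical.

```python
import threading

ready = threading.Condition()
state = {"listening": False}  # set once the communication unit is listening
events = []                   # records the observed order of events


def communication_unit():
    # Initialization of shared memory and network would happen here.
    with ready:
        state["listening"] = True
        events.append("comm: listening")
        ready.notify_all()  # tell the algorithm unit it may start


def algorithm_unit():
    with ready:
        # Block until the communication unit reports that it is listening.
        ready.wait_for(lambda: state["listening"])
        events.append("algo: started")


# Start the algorithm unit first on purpose: it still waits for the notice.
algo = threading.Thread(target=algorithm_unit)
comm = threading.Thread(target=communication_unit)
algo.start()
comm.start()
algo.join()
comm.join()
```

Whichever thread is scheduled first, the algorithm unit records its start only after the communication unit has entered the listening state.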
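In one embodiment, to make the messages passed between tasks more general, the writing of the calculation result of the computing task into the shared memory by the algorithm task unit in step S13 above may specifically include:
the algorithm task unit serializes the calculation result of the computing task to obtain serialized data, and writes the serialized data into the shared memory.
Specifically, serializing the calculation result means converting it into an object with a preset data structure, and that object is the serialized data; the preset data structure is, for example, a JSON data structure. Once the serialized data is obtained, it can be written into the shared memory in the form of key-value pairs, where key is the key name and value is the key value.
Correspondingly, the reading of the calculation result by the communication task unit in step S14, upon detecting that the calculation result has been stored in the shared memory, may specifically include:
when the communication task unit detects that the serialized data of a calculation result has been stored in the shared memory, it obtains the serialized data from the shared memory and deserializes it to obtain the calculation result.
In this embodiment of the present invention, using object serialization for inter-process communication makes the transferred objects more general and improves message transmission efficiency.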
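A minimal sketch of the serialize, write, read, and deserialize round trip follows. The patent mentions JSON and key-value pairs but does not define a memory layout, so the 4-byte length prefix, the helper names `write_result` and `read_result`, and the sample task key are all assumptions. Synchronization between the writer and the reader (for example through the condition lock) is omitted for brevity.

```python
import json
import struct
from multiprocessing import shared_memory

HEADER = struct.Struct("<I")  # assumed layout: 4-byte little-endian length


def write_result(shm, key, value):
    """Serialize a key-value pair as JSON and store it in the segment."""
    payload = json.dumps({"key": key, "value": value}).encode("utf-8")
    if HEADER.size + len(payload) > shm.size:
        raise ValueError("result does not fit in the shared memory segment")
    shm.buf[HEADER.size:HEADER.size + len(payload)] = payload
    HEADER.pack_into(shm.buf, 0, len(payload))  # publish the length last


def read_result(shm):
    """Return (key, value), or None if no result has been stored yet."""
    (length,) = HEADER.unpack_from(shm.buf, 0)
    if length == 0:
        return None
    raw = bytes(shm.buf[HEADER.size:HEADER.size + length])
    record = json.loads(raw.decode("utf-8"))
    return record["key"], record["value"]


# Usage: in the patent the writer and the reader are separate task units.
shm = shared_memory.SharedMemory(create=True, size=1024)
HEADER.pack_into(shm.buf, 0, 0)  # mark the segment as empty
write_result(shm, "task-42", {"label": "person", "score": 0.93})
result = read_result(shm)
shm.close()
shm.unlink()
```

Writing the length after the payload means a reader that polls the header never sees a length for data that has not been copied yet, although a real implementation would still need a lock for repeated writes.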
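In one embodiment, building on the method embodiments above, the initialization operation further includes creating a message queue, and before step S15 the method may further include:
the communication task unit adds the calculation result to the message queue.
During its initialization operation the communication task unit may create message queues of different types, each used to store the calculation results of a different type of computing task. For example, a first message queue stores the calculation results of image processing tasks, a second message queue stores the calculation results of video processing tasks, and so on.
Specifically, the communication task unit adds the calculation result of a computing task to the message queue that corresponds to the type of the computing task.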
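Routing results into per-type message queues could look like the following sketch. The type labels `"image"` and `"video"` follow the example in the text; the names `queues` and `enqueue_result`, and the choice of the standard `queue.Queue`, are assumptions.

```python
import queue
from collections import defaultdict

# One FIFO queue per computing-task type, created on first use.
queues = defaultdict(queue.Queue)


def enqueue_result(task_type, result):
    """Append a calculation result to the queue for its task type."""
    queues[task_type].put(result)
```

For example, `enqueue_result("image", {"task_id": "t1", "boxes": []})` places the result in the image-processing queue and leaves the video-processing queue untouched.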
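Correspondingly, step S15 may specifically include:
if the communication task unit receives a calculation result request initiated by the master node, it takes the calculation result out of the message queue and returns it to the master node through the network connection.
More specifically, step S15 may be implemented through the following steps:
S151: if the communication task unit receives a calculation result request initiated by the master node, it queries the message queue for whether the calculation result requested by the calculation result request exists; if it exists, step S152 is performed; if it does not exist, step S153 is performed.
S152: return the calculation result to the master node.
Specifically, the communication task unit returns the calculation result requested by the calculation result request to the master node through the network connection.
S153: block the calculation result request, and wake the calculation result request when a new calculation result arrives in the message queue; step S154 is performed after step S153.
S154: determine whether the new calculation result is the calculation result requested by the calculation result request; if so, return the new calculation result to the master node; otherwise, return to step S153.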
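Steps S151 to S154 amount to a blocking lookup that is re-checked on every wake-up. The sketch below shows one possible shape of that logic using `threading.Condition`. It is not the patent's implementation: results are indexed by a task id in a dictionary rather than held in a literal queue, and the class name `ResultStore`, the method names, and the `timeout` parameter are assumptions.

```python
import threading


class ResultStore:
    """Holds calculation results until the master node asks for them."""

    def __init__(self):
        self._results = {}
        self._cond = threading.Condition()

    def add(self, task_id, result):
        """Store a new result and wake every blocked request (S153)."""
        with self._cond:
            self._results[task_id] = result
            self._cond.notify_all()

    def fetch(self, task_id, timeout=None):
        """Return the requested result, blocking until it is available.

        wait_for checks the predicate immediately (S151), sleeps if the
        result is missing (S153), and re-checks after each wake-up (S154).
        """
        with self._cond:
            found = self._cond.wait_for(
                lambda: task_id in self._results, timeout
            )
            if not found:
                raise TimeoutError(f"no result for {task_id!r}")
            return self._results.pop(task_id)  # S152: hand the result back
```

A request for `"b"` stays blocked when an unrelated result `"c"` arrives, because the predicate is false after that wake-up, and it returns as soon as `add("b", ...)` is called.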
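In this embodiment, the calculation results of all computing tasks initiated by the master node are held temporarily on the worker node, and the worker node sends a calculation result to the master node only when the master node needs it. Because the worker nodes share the memory needed to store the calculation results, the memory burden on the master node is reduced.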
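In one embodiment, as shown in FIG. 2, a real-time communication apparatus for a distributed system is provided. The distributed system includes a master node and a worker node, the apparatus is located on the worker node, and the apparatus includes an algorithm task unit and a communication task unit, where:
the communication task unit 21 is configured to perform an initialization operation after being started and to enter a listening state once the initialization operation is complete, where the initialization operation includes initializing the shared memory and the network connection;
the algorithm task unit 22 is configured to be started after the communication task unit 21 has entered the listening state and, once started, to process the computing tasks initiated by the master node;
the algorithm task unit 22 is further configured to write the calculation result of the computing task into the shared memory;
the communication task unit 21 is further configured to read the calculation result upon detecting that the calculation result has been stored in the shared memory, and to return the calculation result to the master node through the network connection.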
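Further, the communication task unit 21 is specifically configured to:
when entering the listening state, notify the algorithm task unit 22 through a condition lock, so that the algorithm task unit is started.
Further, the algorithm task unit 22 is specifically configured to:
serialize the calculation result of the computing task to obtain serialized data, and write the serialized data into the shared memory;
the communication task unit 21 is specifically configured to:
upon detecting that the serialized data of the calculation result has been stored in the shared memory, obtain the serialized data from the shared memory and deserialize it to obtain the calculation result.
Further, the initialization operation also includes creating a message queue, and the communication task unit 21 is further configured to:
add the calculation result to the message queue;
if a calculation result request initiated by the master node is received, take the calculation result out of the message queue and return it to the master node through the network connection.
Further, the communication task unit 21 is specifically configured to:
if a calculation result request initiated by the master node is received, query the message queue for whether the calculation result requested by the calculation result request exists;
if the calculation result exists, return the calculation result to the master node;
if the calculation result does not exist, block the calculation result request, and wake the calculation result request when a new calculation result arrives in the message queue;
determine whether the new calculation result is the calculation result requested by the calculation result request; if so, return the new calculation result to the master node; otherwise, continue to block the calculation result request.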
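The real-time communication apparatus for a distributed system provided by the embodiments of the present invention belongs to the same inventive concept as the real-time communication method for a distributed system provided by the embodiments of the present invention. It can execute that method and has the functional modules and beneficial effects corresponding to executing it. For technical details not described exhaustively in this embodiment, refer to the real-time communication method for a distributed system provided by the embodiments of the present invention; they are not repeated here.
In one embodiment, as shown in FIG. 3, a distributed system is further provided. The system includes a master node 31 and a worker node 32, and the worker node 32 is configured to include the real-time communication apparatus for a distributed system of the above embodiments.
In addition, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the steps of the real-time communication method for a distributed system of the above embodiments are implemented.
In addition, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the real-time communication method for a distributed system of the above embodiments are implemented.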
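A person skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, a person skilled in the art can make further changes and modifications to these embodiments once the basic inventive concept is known. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.
Obviously, a person skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.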

Claims (10)

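  1. A real-time communication method for a distributed system, characterized in that the distributed system comprises a master node and a worker node, an algorithm task unit and a communication task unit are deployed on the worker node, and the method comprises:
    the communication task unit performing an initialization operation after being started and entering a listening state after completing the initialization operation, wherein the initialization operation comprises initializing a shared memory and a network connection;
    the algorithm task unit being started after the communication task unit has entered the listening state and, after being started, processing a computing task initiated by the master node;
    the algorithm task unit writing a calculation result of the computing task into the shared memory;
    the communication task unit reading the calculation result upon detecting that the calculation result has been stored in the shared memory;
    the communication task unit returning the calculation result to the master node through the network connection.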
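  2. The method according to claim 1, characterized in that the algorithm task unit being started after the communication task unit has entered the listening state comprises:
    the communication task unit, when entering the listening state, notifying the algorithm task unit through a condition lock, so that the algorithm task unit is started.
  3. The method according to claim 1, characterized in that the algorithm task unit writing the calculation result of the computing task into the shared memory comprises:
    the algorithm task unit serializing the calculation result of the computing task to obtain serialized data, and writing the serialized data into the shared memory;
    the communication task unit reading the calculation result upon detecting that the calculation result has been stored in the shared memory comprises:
    the communication task unit, upon detecting that the serialized data of the calculation result has been stored in the shared memory, obtaining the serialized data from the shared memory and deserializing the serialized data to obtain the calculation result.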
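  4. The method according to any one of claims 1 to 3, characterized in that the initialization operation further comprises creating a message queue, and before the step in which the communication task unit returns the calculation result to the master node through the network connection, the method further comprises:
    the communication task unit adding the calculation result to the message queue;
    the communication task unit returning the calculation result to the master node through the network connection comprises:
    the communication task unit, if it receives a calculation result request initiated by the master node, taking the calculation result out of the message queue and returning it to the master node through the network connection.
  5. The method according to claim 4, characterized in that the communication task unit, if it receives a calculation result request initiated by the master node, taking the calculation result out of the message queue and returning it to the master node through the network connection comprises:
    the communication task unit, if it receives a calculation result request initiated by the master node, querying the message queue for whether the calculation result requested by the calculation result request exists;
    if the calculation result exists, returning the calculation result to the master node;
    if the calculation result does not exist, blocking the calculation result request, and waking the calculation result request when a new calculation result arrives in the message queue;
    determining whether the new calculation result is the calculation result requested by the calculation result request; if so, returning the new calculation result to the master node; otherwise, continuing to block the calculation result request.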
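  6. A real-time communication apparatus for a distributed system, characterized in that the distributed system comprises a master node and a worker node, the apparatus is located on the worker node, and the apparatus comprises an algorithm task unit and a communication task unit, wherein:
    the communication task unit is configured to perform an initialization operation after being started and to enter a listening state after completing the initialization operation, wherein the initialization operation comprises initializing a shared memory and a network connection;
    the algorithm task unit is configured to be started after the communication task unit has entered the listening state and, after being started, to process a computing task initiated by the master node;
    the algorithm task unit is further configured to write a calculation result of the computing task into the shared memory;
    the communication task unit is further configured to read the calculation result upon detecting that the calculation result has been stored in the shared memory, and to return the calculation result to the master node through the network connection.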
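  7. The apparatus according to claim 6, characterized in that the communication task unit is specifically configured to:
    when entering the listening state, notify the algorithm task unit through a condition lock, so that the algorithm task unit is started.
  8. The apparatus according to claim 6, characterized in that the algorithm task unit is specifically configured to:
    serialize the calculation result of the computing task to obtain serialized data, and write the serialized data into the shared memory;
    the communication task unit is specifically configured to:
    upon detecting that the serialized data of the calculation result has been stored in the shared memory, obtain the serialized data from the shared memory and deserialize the serialized data to obtain the calculation result.
  9. The apparatus according to any one of claims 6 to 8, characterized in that the initialization operation further comprises creating a message queue, and the communication task unit is further configured to:
    add the calculation result to the message queue;
    if a calculation result request initiated by the master node is received, take the calculation result out of the message queue and return it to the master node through the network connection.
  10. A distributed system, characterized in that the system comprises a master node and at least one worker node, and the worker node is configured to include the real-time communication apparatus for a distributed system according to any one of claims 6 to 9.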
PCT/CN2020/097838 2019-08-27 2020-06-24 Real-time communication method and apparatus for distributed system, and distributed system WO2021036451A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3152842A CA3152842A1 (en) 2019-08-27 2020-06-24 Real-time communication method and apparatus for distributed system, and distributed system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910796708.5 2019-08-27
CN201910796708.5A CN110633145B (zh) 2019-08-27 2019-08-27 Real-time communication method and apparatus for distributed system, and distributed system

Publications (1)

Publication Number Publication Date
WO2021036451A1 true WO2021036451A1 (zh) 2021-03-04

Family

ID=68969217

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097838 WO2021036451A1 (zh) 2019-08-27 2020-06-24 Real-time communication method and apparatus for distributed system, and distributed system

Country Status (3)

Country Link
CN (1) CN110633145B (zh)
CA (1) CA3152842A1 (zh)
WO (1) WO2021036451A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
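CN110633145B (zh) * 2019-08-27 2023-03-31 苏宁云计算有限公司 Real-time communication method and apparatus for distributed system, and distributed system
CN115599507A (zh) * 2021-07-07 2023-01-13 清华大学(Cn) Data processing method, execution workstation, electronic device, and storage medium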

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
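CN101300551A (zh) * 2005-11-17 2008-11-05 国际商业机器公司 Inter-process communication in a symmetric multiprocessing cluster environment
CN101505306A (zh) * 2009-03-23 2009-08-12 烽火通信科技股份有限公司 Reliable inter-node communication method in a distributed system
US20120042003A1 (en) * 2010-08-12 2012-02-16 Raytheon Company Command and control task manager
CN109819037A (zh) * 2019-01-29 2019-05-28 武汉鸿瑞达信息技术有限公司 Method and system for adaptive computing and communication
CN110633145A (zh) * 2019-08-27 2019-12-31 苏宁云计算有限公司 Real-time communication method and apparatus for distributed system, and distributed system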

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
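CN103106249B (zh) * 2013-01-08 2016-04-20 华中科技大学 Cassandra-based parallel data processing system
CN103647834B (zh) * 2013-12-16 2017-03-22 上海证券交易所 System and method for handling multi-stage distributed task scheduling
CN104378436A (zh) * 2014-11-20 2015-02-25 深圳市远行科技有限公司 Server-push-based information push system and push method
CN107491355A (zh) * 2017-08-17 2017-12-19 山东浪潮商用系统有限公司 Shared-memory-based inter-process function call method and apparatus
CN109327509B (zh) * 2018-09-11 2022-01-18 武汉魅瞳科技有限公司 Loosely coupled distributed stream computing system with a master/slave architecture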

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
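CN115981610A (zh) * 2023-03-17 2023-04-18 科大国创软件股份有限公司 Comprehensive computing platform for a photovoltaic energy storage system implemented with Lua scripts
CN115981610B (zh) * 2023-03-17 2023-06-02 科大国创软件股份有限公司 Comprehensive computing platform for a photovoltaic energy storage system implemented with Lua scripts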

Also Published As

Publication number Publication date
CN110633145B (zh) 2023-03-31
CN110633145A (zh) 2019-12-31
CA3152842A1 (en) 2021-03-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20857947

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3152842

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20857947

Country of ref document: EP

Kind code of ref document: A1