CN118265973A - Message processing method and device - Google Patents

Message processing method and device

Publication number
CN118265973A
Authority
CN
China
Prior art keywords: processing unit, accelerator, event message, processing, event
Prior art date
Legal status: Pending
Application number
CN202180104290.2A
Other languages
Chinese (zh)
Inventor
欧阳伟龙
胡粤麟
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN118265973A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 - Digital computers in general; Data processing equipment in general
    • G06F 15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F 15/163 - Interprocessor communication

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application discloses a message processing method and apparatus for improving the resource utilization rate of a data processing system. In one embodiment, a first processing unit processes a first event message to obtain a second event message; the first event message is either received by the first processing unit or generated by the first processing unit based on a processing request of an application program. The first processing unit sends the second event message to a second processing unit according to context information, where the context information includes routing information from the first processing unit to the second processing unit and is generated based on the processing request of the application program. The first processing unit may be an engine or an accelerator; the second processing unit may likewise be an engine or an accelerator; the first processing unit is different from the second processing unit. Because the transfer of event messages between different processing units is realized based on the context information, the processing performance of the system can be improved.

Description

Message processing method and device
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a message processing method and device.
Background
The clock frequency of high-performance processors (Central Processing Unit, CPU) has largely stopped increasing, so performance now improves only slowly. In terms of power consumption, power density has risen from tens of milliwatts per square centimeter to around one watt per square centimeter and has reached a limit, which constrains further performance improvement.
To improve CPU performance, the industry seeks to combine the general-purpose computing power of the CPU with the acceleration computing power of specialized computing chips to perform heterogeneous computing. Typically, heterogeneous computing tasks are scheduled by the CPU, and the heterogeneous computing resources must wait for the CPU to move data to them, so the data processing system has a performance bottleneck in scheduling and utilizing heterogeneous resources.
Therefore, providing a message processing method that addresses the low resource utilization rate of a data processing system when scheduling heterogeneous resources is of practical significance.
Disclosure of Invention
The embodiment of the application provides a message processing method and a message processing device, which are used for improving the resource utilization rate of a data processing system.
In a first aspect, a message processing method is provided, including:
The first processing unit processes a first event message to obtain a second event message, wherein the first event message is received by the first processing unit or is generated by the first processing unit based on a processing request of an application program;
The first processing unit sends the second event message to a second processing unit according to context information, wherein the context information comprises route information from the first processing unit to the second processing unit, and the context information is generated based on a processing request of the application program;
the first processing unit is a first engine and the second processing unit is a second accelerator; or the first processing unit is a first accelerator and the second processing unit is a second engine; or the first processing unit is a first engine and the second processing unit is a second engine; or the first processing unit is a first accelerator and the second processing unit is a second accelerator.
The application provides a method including the following steps: the first processing unit processes the first event message to obtain a second event message, where the first event message is received by the first processing unit or generated by the first processing unit based on a processing request of the application; and the first processing unit sends the second event message to the second processing unit according to the context information, where the context information includes the routing information from the first processing unit to the second processing unit and is generated based on the processing request of the application program. The first processing unit may be an engine or an accelerator; the second processing unit may also be an engine or an accelerator; the first processing unit is different from the second processing unit. In this method, because the transfer of event messages between different processing units is realized based on the context information, compared with scheduling the transfer of event messages through a scheduling entity (for example, scheduling messages with a scheduler), this implementation avoids the performance bottleneck caused by transfer scheduling and can therefore improve the processing performance of the system.
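As an aid to understanding, the following is a minimal C++ sketch of the context-based forwarding described above. All names (EventMessage, Context, process_and_forward, and so on) are illustrative assumptions for this description and do not correspond to an actual implementation of the method.

    // Illustrative sketch only: a context carries routing information that maps a
    // processing unit to the event queue of the next processing unit.
    #include <cstdint>
    #include <map>
    #include <string>

    struct EventMessage {
        uint32_t context_id;       // identifier of the context (CID) the message belongs to
        uint32_t target_queue_id;  // event queue identifier of the next processing unit
        std::string payload;       // data to be processed
    };

    struct Context {
        uint32_t id;
        // routing information generated from the application's processing request:
        // current processing unit id -> event queue id of the next processing unit
        std::map<uint32_t, uint32_t> next_queue;
    };

    // The first processing unit processes the first event message and addresses the
    // resulting second event message to the second processing unit via the context.
    EventMessage process_and_forward(const Context& ctx,
                                     uint32_t this_unit_id,
                                     const EventMessage& first) {
        EventMessage second = first;                              // derive the second message
        second.payload = "processed:" + first.payload;            // placeholder processing step
        second.target_queue_id = ctx.next_queue.at(this_unit_id); // routing lookup in the context
        return second;                                            // to be pushed into the target event queue
    }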
In one possible design, the first processing unit sends the second event message to a second processing unit according to the context information, including:
The first processing unit sends the second event message to an event queue corresponding to the second processing unit according to the routing information;
The second processing unit obtains the second event message from the event queue.
With this design, information is transferred between different processing units based on event queues; for example, a thread can send the data to be processed by an accelerator, in the form of an event message, to the event queue corresponding to that accelerator, so that the corresponding accelerator processes the event message. This reduces the degree of coupling between threads and accelerators, improves the flexibility of resource allocation, and improves the resource utilization rate of the data processing process.
In one possible design, the second event message includes a target event queue identification, which is a queue identification of an event queue corresponding to the second processing unit.
With this design, the target event queue identifier can be added to the message according to the context information, so that routing and transfer of the message are realized based on event queues. Compared with a traditional bus, this enables data communication among dynamically scheduled computing resources with higher forwarding efficiency, further improving the resource utilization rate of the data processing process.
In one possible design, the routing information further includes a target routing domain, where the target routing domain is used to indicate a target server, and the target server is different from an origin server, and the origin server is a server where the first processing unit is located.
With the above design, the routing information further includes a target routing domain for indicating the target server, so that the target server may be different from the source server. The method can form a communication link in a cross-route domain mode, can construct a cross-route domain communication link network, and has better scheduling elasticity and expandability.
In one possible design, the second processing unit is a second accelerator; the first processing unit sends the second event message to a second processing unit according to the context information, and the method comprises the following steps:
The first processing unit sends the second event message to an event queue corresponding to an accelerator pool according to the routing information, wherein the accelerator pool comprises a plurality of accelerators, and the types of the accelerators are the same; determining the second accelerator from the plurality of accelerators according to the states of the plurality of accelerators;
and sending the second event message to the second accelerator.
With this design, event messages are delivered to the accelerator through the event queue of the accelerator pool and the event distributor of the accelerator pool, providing a resource scheduling mechanism for shared accelerators and thereby improving the processing performance of the system.
In one possible design, before the first processing unit receives the first event message, the method further includes:
Receiving a processing request from an application program;
Determining computing resources according to the processing request of the application program, wherein the computing resources comprise the first processing unit and the second processing unit;
and generating the context information according to the processing request of the application program.
With this design, computing resources are dynamically allocated based on event triggering and the context information is generated (that is, a session is created), providing an event-triggered mechanism for real-time dynamic resource scheduling. Data communication among the dynamically scheduled computing resources can thus be realized, and the resource utilization rate is higher.
In one possible design, the first processing unit or the second processing unit is selected from a plurality of processing units based on state information of the plurality of processing units when a processing request of the application is received, the state information of the processing units including network topology performance.
With this design, when computing resources are allocated, the state information of the hardware (threads, accelerators, and the like) is acquired, and the optimal hardware is allocated according to the current hardware state, so that the allocated computing resources are more reasonable. The hardware state information includes network topology performance, and the optimal hardware may be, for example, the hardware with the best current performance. The method can trigger real-time dynamic scheduling of resources based on the event corresponding to the received processing request and avoid resource waste, thereby further improving system performance.
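For illustration, the following C++ sketch shows one way such state-based selection might look; the UnitState fields and the scoring function are assumptions and are not specified by this application.

    // Illustrative sketch only: choose a processing unit by combining its load with a
    // network topology metric (e.g. link latency to the requesting side).
    #include <cstdint>
    #include <vector>

    struct UnitState {
        uint32_t unit_id;
        float load;             // 0.0 = idle .. 1.0 = fully busy
        float link_latency_us;  // network topology performance metric (assumed)
    };

    // Returns the unit with the lowest weighted score; assumes candidates is non-empty.
    uint32_t select_unit(const std::vector<UnitState>& candidates) {
        uint32_t best = candidates.front().unit_id;
        float best_score = 1e30f;
        for (const auto& c : candidates) {
            float score = c.load * 100.0f + c.link_latency_us;  // assumed weighting
            if (score < best_score) { best_score = score; best = c.unit_id; }
        }
        return best;
    }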
In one possible design, after receiving the processing request from the application program, the method further includes:
Determining at least two tasks included in the processing request;
Creating at least two threads corresponding to the at least two tasks;
And loading the at least two threads to at least two engines for running, wherein different threads run on different engines.
Through the design, based on event triggering, task division is performed, and threads corresponding to different tasks are distributed to different engines to run, so that the system performance can be improved, and the utilization rate of computing resources can be improved.
In one possible design, the determining at least two tasks included in the processing request includes:
Acquiring the semantics of the processing request, wherein the semantics of the processing request comprise at least two task semantics;
and determining a corresponding task according to each task semantic in the at least two task semantics.
With this design, a plurality of tasks belonging to the processing request can be constructed based on the semantics of the processing request, with different tasks having different task semantics. Computing tasks can thus be created dynamically according to real-time events, and a complex computing task can be efficiently split into multiple tasks; the implementation is simple, and resource waste is reduced.
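As a purely illustrative sketch (the helper names and data structures below are assumptions), the split of a processing request into tasks and their placement onto distinct engines might look as follows.

    // Illustrative sketch only: one task per task semantic, one thread per task,
    // each thread loaded onto a different engine.
    #include <string>
    #include <utility>
    #include <vector>

    struct Task { std::string semantic; };
    struct Engine { unsigned id; };

    // Assumes engines.size() >= task_semantics.size().
    std::vector<std::pair<Task, Engine>> split_and_load(
            const std::vector<std::string>& task_semantics,
            const std::vector<Engine>& engines) {
        std::vector<std::pair<Task, Engine>> placement;
        for (size_t i = 0; i < task_semantics.size(); ++i)
            placement.push_back({Task{task_semantics[i]}, engines[i]});  // distinct engine per thread
        return placement;
    }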
In one possible design, the method further comprises:
Releasing a first thread, wherein the first thread is one of the at least two threads;
And if, after the first thread is released, no thread is running on the engine where the first thread was located, the engine is shut down.
Through the design, the method can stop threads or close corresponding hardware according to the needs, can realize near zero standby power consumption, and ensures the low power consumption of the message processing method.
In one possible design, the processing request is used for requesting to acquire target data, where the target data is stored in a memory of the second server; the computing resource for executing the processing request further includes a third processing unit and a fourth processing unit; the at least two engines include the first processing unit, the second processing unit, and the third processing unit; the fourth processing unit is an accelerator; the first event message and the second event message comprise identifications of the target data, the first processing unit and the second processing unit are located in a first server, and the third processing unit and the fourth processing unit are located in a second server; the context also includes routing information of the second processing unit to the third processing unit, the third processing unit to the fourth processing unit;
After the first processing unit sends the second event message to the second processing unit according to the context, the method further comprises:
the second processing unit encapsulates the second event message to generate a third event message;
The second processing unit sends the third event message to the third processing unit located in the second server according to the context;
The third processing unit decapsulates the third event message to obtain a fourth event message, and sends the fourth event message to the fourth processing unit according to the context;
The fourth processing unit obtains the identification of the target data from the received fourth event message, obtains the target data from the memory of the second server according to the identification of the target data, and obtains a fifth event message according to the target data; the fifth event message is used to send the target data to the first server.
Through the design, the method for acquiring the target data stored in the shared memory is provided, the corresponding memory address is acquired through the identification of the target data, and the target data is acquired from the shared memory according to the memory address.
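The following C++ sketch illustrates, under assumed structure and field names, how the second event message might be wrapped with a target routing domain for transfer to the second server and unwrapped there; it is not the actual frame layout defined by this application.

    // Illustrative sketch only: cross-routing-domain encapsulation and decapsulation.
    #include <cstdint>
    #include <string>

    struct CrossDomainHeader {
        uint16_t target_routing_domain;  // identifies the target server's routing domain
        uint16_t source_routing_domain;  // identifies the source server's routing domain
    };

    struct WrappedMessage {
        CrossDomainHeader hdr;
        std::string inner;               // serialized second event message
    };

    // Second processing unit: wrap the second event message into a third event message.
    WrappedMessage encapsulate(const std::string& second_msg,
                               uint16_t src_domain, uint16_t dst_domain) {
        return WrappedMessage{{dst_domain, src_domain}, second_msg};
    }

    // Third processing unit: unwrap to recover the payload carried as the fourth event message.
    std::string decapsulate(const WrappedMessage& third_msg) {
        return third_msg.inner;
    }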
In one possible design, the context information further includes operational configuration information;
The first processing unit processes the first event message to obtain a second event message, including:
and the first processing unit processes the first event message according to the operation configuration information to obtain a second event message.
With this design, the context also includes operation configuration information (such as bit width, number of points, and the like), so that the processing unit can process data according to the operation configuration information, and the corresponding processing mechanism can be triggered automatically after the event message is received, thereby enhancing the energy-efficiency advantage of event-driven processing and improving the resource utilization rate.
In one possible design, the first event message and the second event message include an identification of the context information, where the identification of the context information is used to obtain the context information.
Through the design, the event message comprises the identifier (CID) of the context information, and the identifier of the context information is used for indicating the context information of the application program, so that the processing unit can quickly and efficiently acquire the corresponding operation configuration information or the routing information, and the resource utilization rate of the data processing process is improved.
In one possible design, the second event message includes:
A message attribute information field including event message routing information, where the event message routing information includes a target event queue identifier, where the target event queue identifier is a queue identifier of an event queue corresponding to the second processing unit;
A message length field including total length information of the second event message;
A data field comprising a payload of the second event message.
In one possible design, the data field includes a first event information field including at least one of:
A routing scope, an identification of the context information, a source message queue identification, or a custom attribute, the routing scope including at least one routing domain.
In one possible design, the data field includes a second event information field including custom information for the application layer.
With the above design, a frame structure of the event message is defined, and the frame structure may sequentially include, starting from the outermost layer, a message attribute information field, a message length field, and a data field. This enables the scheme provided by the application to be flexibly applied to different application scenarios, improves adaptability in data processing, and improves data forwarding efficiency.
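A possible in-memory representation of such a frame is sketched below in C++; the field widths and member names are assumptions for illustration and are not prescribed by this application.

    // Illustrative sketch only: event message frame with attribute, length, and data fields.
    #include <cstdint>
    #include <vector>

    struct EventMessageFrame {
        // message attribute information field: event message routing information,
        // including the target event queue identifier
        uint32_t target_event_queue_id;
        // message length field: total length of the event message
        uint32_t total_length;
        // data field, first event information field (optional members)
        uint16_t routing_scope;            // at least one routing domain
        uint32_t context_id;               // identification of the context information
        uint32_t source_queue_id;          // source message queue identification
        // data field, second event information field: application-layer custom information
        std::vector<uint8_t> app_custom;
        // payload of the second event message
        std::vector<uint8_t> payload;
    };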
In one possible design, the method further comprises:
acquiring resource configuration information of the application program, wherein the resource configuration information comprises one or more of the number of engines, the type of accelerators, or the number of accelerators;
Determining an engine used by the application program according to the resource configuration information, wherein the engine used by the application program comprises the first engine and/or the second engine;
and determining an accelerator used by the application program according to the resource configuration information, wherein the accelerator used by the application program comprises the first accelerator and/or the second accelerator.
With this design, the resource configuration information of the application program can be obtained according to the received processing request, and the accelerators and engines used by the application program are determined. The resource configuration information includes, but is not limited to, the number of engines, the types of accelerators, and the number of accelerators, and the engines and accelerators used by the application program can be selected according to the resource configuration information and the resource state of the candidate computing resources, so that resources are dynamically allocated in real time in accordance with the current resource state, guaranteeing both the performance requirement and low power consumption.
In one possible design, the first processing unit is a first engine; the second processing unit is a second accelerator; and the first processing unit sends the second event message to an event queue corresponding to the second processing unit, including:
the first engine executes a first recompilation instruction of the second accelerator to send the second event message to the event queue corresponding to the second accelerator; the first recompilation instruction is obtained by loading the second accelerator, allocating the identifier of the event queue corresponding to the second accelerator, and modifying the machine code of the second accelerator according to the identifier of the event queue corresponding to the second accelerator; and when the first recompilation instruction is executed, the first engine sends the second event message to the event queue corresponding to the second accelerator.
With this design, the instruction set of the accelerator is modified according to the identifier of the accelerator's event queue, and when an instruction in the modified instruction set is executed by a thread running on the engine, an event message is sent to that event queue. For example, in response to the second accelerator being loaded, the identifier of the second event queue can be allocated to the second accelerator; the instruction set of the second accelerator is modified according to the identifier of the second event queue, and when the modified instruction set is executed by the first thread on the first engine, the first thread sends the second event message to the second event queue. Because accelerator-specific instructions are replaced by event queue identifiers, the microengines can be reused without modification as different accelerators are continuously added.
In a second aspect, an embodiment of the present application further provides a message processing apparatus, including:
The first operation module is used for: processing a first event message through a first processing unit to obtain a second event message, wherein the first event message is received by the first processing unit or is generated by the first processing unit based on a processing request of an application program;
Sending, by the first processing unit, the second event message to a second processing unit according to context information, the context information including routing information of the first processing unit to the second processing unit, the context information being generated based on a processing request of the application;
the first processing unit is a first engine, the second processing unit is a second accelerator, or the first processing unit is a first accelerator, the second processing unit is a second engine, or the first processing unit is a first engine, the second processing unit is a second engine, or the first processing unit is a first accelerator, and the second processing unit is a second accelerator.
In a third aspect, an embodiment of the present application provides a message processing apparatus, comprising a processor and a memory,
The memory is used for storing executable programs;
The processor is configured to execute a computer executable program in a memory, such that the method according to any of the first aspects is performed.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium storing a computer executable program which, when invoked by a computer, causes the computer to perform the method according to any one of the first aspects.
In a fifth aspect, an embodiment of the present application further provides a chip, including: logic circuitry and an input-output interface for receiving code instructions or information, the logic circuitry for executing the code instructions or in accordance with the information to perform the method of any of the first aspects.
In a sixth aspect, an embodiment of the present application further provides a data processing system, including a message processing apparatus according to the second aspect.
In a seventh aspect, embodiments of the present application also provide a computer program product comprising computer instructions which, when executed by a computing device, may perform a method as claimed in any of the first aspects.
For the technical effects that can be achieved by any implementation of the second aspect to the seventh aspect, reference may be made to the description of the technical effects that can be achieved by the first aspect and the corresponding implementations of the first aspect; details are not repeated here.
Drawings
FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present application;
FIG. 2 is a flow chart of a micro-engine processing an instruction pipeline according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a method for implementing semantic driven data sharing according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a gating pattern of an accelerator pool according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a multicast mode of an accelerator pool according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a multi-routing domain high-elasticity network according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an asynchronous interface design of a high-resilience network according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a basic structure of a frame transmitted by a high-elasticity network according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a structure of a subframe of a high-elasticity network transmission according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a high dynamic operating system according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a design scheme of intelligent edge computing according to an embodiment of the present application;
FIG. 12 is a flow chart of a message processing method according to an embodiment of the present application;
FIG. 13 is a schematic diagram of computing resource call for edge intelligent computing according to an embodiment of the present application;
Fig. 14 is a schematic diagram of a design of a video call according to an embodiment of the present application;
FIG. 15 is a schematic diagram illustrating a computing resource call for a video call according to an embodiment of the present application;
FIG. 16 is a schematic diagram of a semantic definition sharing data mechanism of a supercomputer provided in embodiments of the present application;
FIG. 17 is a schematic diagram of a design of a supercomputer server according to embodiments of the present application;
FIG. 18 is a schematic diagram of a computing resource call of a supercomputer provided in embodiments of the present application;
FIG. 19 is a schematic structural diagram of a message processing apparatus according to an embodiment of the present application;
FIG. 20 is a schematic structural diagram of a message processing apparatus according to an embodiment of the present application;
FIG. 21 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. "At least one of" the following items or similar expressions refers to any combination of these items, including any combination of a single item or plural items. For example, at least one (item) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be singular or plural.
The terms "first" and "second" are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the embodiments of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
It should be noted that in embodiments of the present application, like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
In the description of the present application, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present application will be understood in specific cases by those of ordinary skill in the art. Some terms and terminology related to the embodiments of the present application are explained below.
(1) Application program: an application refers to a computer program that, when operating in a user mode, can interact with a user to perform a particular task or tasks, and has a visual user interface.
(2) Heterogeneous calculation: heterogeneous computation is a new computation mode which fuses the general computation power of a CPU and the directional acceleration computation power of a professional chip together, and finally, the unification of performance, power consumption and flexibility is achieved.
(3) An accelerator: Heterogeneous computing uses different types of processors to handle different types of computing tasks. Common computing units include CPUs, ASICs (Application-Specific Integrated Circuits), GPUs (Graphics Processing Units, image processing units/accelerators), NPUs (Neural Processing Units, neural network processing units/accelerators), FPGAs (Field Programmable Gate Arrays, programmable logic arrays), and the like. The accelerator refers to a specialized chip such as the ASIC, GPU, NPU, or FPGA described above. In the heterogeneous computing architecture, a CPU is responsible for scheduling and serial tasks with complex logic, and an accelerator is responsible for tasks with high parallelism, so that computing acceleration is realized. For example, in an embodiment of the present application, the fp32 accelerator is an accelerator responsible for fp32 floating point operations.
(4) Events: An event is an operation that can be recognized by a control, such as pressing an OK button, or selecting a radio button or a check box. Each control has events that it can recognize, such as loading of a frame, single click, double click, text change events of an edit box (text box), and so on.
(5) An engine: the engine referred to in the embodiments of the present application refers to a fusion computing microengine (Convergent Process Engine, XPU), which may also be referred to as a microengine. A microengine is a processing unit that is used to process a pipeline of instructions. Wherein the pipeline is dynamically expandable. The microengines may support CPU, GPU, NPU or other computing tasks, processes, or threads required for heterogeneous computing.
(6) Thread: A thread is the smallest unit that an operating system can schedule for execution. It is contained in a process and is the actual unit of operation in the process. A thread refers to a single sequential control flow in a process; multiple threads can run concurrently in a process, each thread executing a different task in parallel. Multiple threads in the same process share all system resources of the process, such as the virtual address space, file descriptors, and signal handling. However, multiple threads in the same process have their own call stacks, their own register environments, and their own thread-local storage.
(7) Event queue: in the embodiment of the application, the event queue is a container for storing the message in the transmission process of the message. The event queue may be viewed as a linked list of event messages.
(8) Network topology performance: network topology performance refers to the link relationships, throughput, available routes, available bandwidth, latency, etc. of the network topology. Network topology refers to the physical layout of the various hardware or devices interconnected by transmission media, particularly where the hardware is distributed and how the cables pass through them.
(9) Application layer: the application layer mainly provides an application interface for the system.
(10) Network layer: the network layer is mainly responsible for defining a logic address and realizing the forwarding process of data from a source to a destination.
Based on the description in the background art, the clock frequency of high-performance processors has largely stopped increasing, so performance now improves only slowly. In terms of power consumption, power density has risen from tens of milliwatts per square centimeter to around one watt per square centimeter and has reached a limit, which constrains further performance improvement.
To improve CPU performance, the industry seeks to combine the general-purpose computing power of the CPU with the acceleration computing power of specialized computing chips to perform heterogeneous computing. Typically, heterogeneous computing tasks are scheduled by the CPU, and the heterogeneous computing resources must wait for the CPU to move data to them, so the data processing system has a performance bottleneck in scheduling and utilizing heterogeneous resources.
Therefore, providing a message processing method that addresses the low resource utilization rate of a data processing system when scheduling heterogeneous resources is of practical significance.
For easy understanding, technical features related to the embodiments of the present application will be described first.
Referring to FIG. 1, a data processing system 100 has five core network elements: fusion compute microengines (Convergent Process Engine, XPU), semantic driven data sharing (Semantic-Driven Data Sharing, SDS), semantic driven accelerator pools (Semantic-Driven Accelerator, SDA), highly elastic routing networks (Ultra Elastic Network over Chip, UEN), and highly dynamic operating systems (High-dynamic Operating System, HOS). The high-elasticity routing network is used for realizing high-speed interconnection of the micro engine, the accelerator and the event queue, and supporting the horizontal expansion of the performance and the capacity of the system; the high dynamic operating system is used for realizing flexible scheduling of resources and allocation of computing tasks. In the following embodiments of the present application, the converged computing microengines may also be referred to simply as microengines, and the microengines and accelerators may be referred to as processing units. Typically, a processing unit may be a microengine or an accelerator, unless otherwise specified.
The following briefly describes the structure of data processing system 100 of FIG. 1 to facilitate a more clear understanding of embodiments of the present application. The technical features of each core network element in fig. 1 are described below.
(1) Fusion computing micro-engine (XPU).
The fused compute microengine is a processing unit that is used to process a pipeline of instructions. Wherein the pipeline is dynamically expandable. The microengines may support computing tasks, processes, or threads required for heterogeneous computations, such as CPUs, GPUs (Graphics Processing Unit, image processing units/accelerators), NPUs (Neural Processing Unit, neural network processing units/accelerators), and the like.
For application, the micro-engine in the embodiment of the application is similar to a hardened container or a thread processor, and can dynamically allocate the corresponding micro-engine according to the load requirements of the computing tasks of different service scenes, thereby ensuring the computing power and the optimized time delay required by the service.
It should be noted that the microengines in embodiments of the present application replace different instructions by event queue IDs (identifiers) when processing the pipeline of instructions.
The micro-engine processes the pipeline of instructions, and the specific process may be: after adding a new accelerator, the system allocates a corresponding event queue ID number, wherein if the program corresponding to the new accelerator is installed in the system for the first time, the program is recompiled once through a just-in-time compiler, and the machine code of the program is replaced by an instruction in a general format for sending a message to the event queue. When an accelerator program is loaded into the micro-engine, the micro-engine responds to the accelerator instruction corresponding to the accelerator program and sends the data to be processed to the corresponding event queue.
Taking the fp32 accelerator as an example, as shown in FIG. 2, when an fp32 accelerator is added, the system assigns an event queue number EQ-ID1 to the fp32 accelerator. Assuming that the program corresponding to the fp32 accelerator is installed in the data processing system for the first time, the program corresponding to the fp32 accelerator is recompiled once by the just-in-time compiler, and the machine code "fp32 rx, ax, bx" of fp32 is replaced with an instruction in a common format for sending a message to an event queue, as shown in Table 1:
TABLE 1
    Original machine code: fp32 rx, ax, bx
    Replaced general-format instruction: Insteq EQ-ID1, v
Here, "Insteq EQ-ID1, v" in Table 1 means that a message containing data "v" is sent to the event queue with event queue number EQ-ID1.
After the fp32 program corresponding to the fp32 accelerator shown in FIG. 2 is loaded into micro-engine XPU-ID1, the micro-engine responds to the accelerator instruction corresponding to the fp32 program to send the data to be processed to event queue EQ-ID1, and then waits for the return result of event queue EQ-ID1 to be written back into the register or memory, thus completing one fp32 floating point operation.
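To make the replacement step more concrete, the following C++ sketch shows one possible way the just-in-time compiler could rewrite an accelerator-specific instruction into the generic event-queue form; the instruction mnemonics, operand layout, and helper types here are illustrative assumptions, not the actual instruction encoding.

    // Illustrative sketch only: rewrite "fp32 rx, ax, bx" into generic
    // "send to event queue" / "wait for result" instructions for queue EQ-ID1.
    #include <cstdint>
    #include <string>
    #include <vector>

    struct Instruction {
        std::string mnemonic;
        std::vector<std::string> operands;
    };

    // Assumes the accelerator instruction has the form "<op> rx, ax, bx".
    std::vector<Instruction> recompile_for_queue(const Instruction& accel_instr,
                                                 uint32_t event_queue_id) {
        const std::string eq = "EQ-ID" + std::to_string(event_queue_id);
        std::vector<Instruction> out;
        // pack the source operands (ax, bx) into an event message value "v"
        out.push_back({"pack_msg", {accel_instr.operands[1], accel_instr.operands[2]}});
        // send the message to the accelerator's event queue (e.g. Insteq EQ-ID1, v)
        out.push_back({"insteq", {eq, "v"}});
        // wait for the queue's return result and write it back to rx
        out.push_back({"waiteq", {eq, accel_instr.operands[0]}});
        return out;
    }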
(2) Semantically driven data sharing (SDS).
The semantic driven data sharing is used for continuously transmitting data and context information through the event queue, so that data sharing across computing resources in the data processing system is realized. The computing resources may be fusion computing microengines, accelerators, etc.
In the embodiment of the application, an asynchronous circuit or an asynchronous NoC (Network on Chip) is used to implement the transceiving of event messages; after a complete event message is received, the corresponding processing mechanism, such as FFT (Fast Fourier Transform) or floating point calculation, is automatically triggered by the event.
It should be noted that, in the embodiment of the present application, the context information may also be referred to as a context; correspondingly, the identification of the context information may also be referred to as the identification of the context, or simply as the context identification.
FIG. 3 illustrates a schematic diagram of implementing semantic driven data sharing provided by one embodiment of the present application. With reference to FIG. 3, in order to enable data sharing across computing resources within a data processing system, the context of the data sharing is defined by the application layer during software development. After the data session is created, the first computing resource builds an event message block according to the semantic configuration instruction, and sends an event message to an event queue of a next second computing resource corresponding to the first computing resource through the event queue of the first computing resource, so that when the event queue of the second computing resource receives the event message, the second computing resource is automatically triggered to process the event message.
In the implementation, if there is a next computing resource corresponding to the second computing resource, after the computation is completed, the second computing resource directly constructs an event message from the processing result and sends the event message to the next computing resource corresponding to the second computing resource through a sending queue.
Taking the speech FFT transformation as an example, as shown in FIG. 3, a data session from the ADC (Analog-to-Digital Converter), through the FFT accelerator, to the framer (Framer) is created by the application scheduler, thereby obtaining a data sharing context; the data session may be decomposed by a compiler or an acceleration library to obtain the context-related semantic configuration instructions for the respective computing resources, such as the semantic configuration instructions for the ADC, the FFT accelerator, and the framer in FIG. 3.
After the data session is created, the ADC builds an event message according to the configuration information and then sends the event message to the designated FFT queue through the event queue of the ADC; when the event queue of the FFT accelerator receives the event message sent by the event queue of the ADC, the FFT accelerator is automatically triggered to perform computation on the data block in the received event message, and after the computation is completed, an event message block is directly constructed from the computation result and the event message constructed from the computation result is sent to the framer through the sending queue; when the event queue of the framer receives the event message constructed from the computation result, the framer is automatically triggered to perform the corresponding protocol parsing on the data block of that event message.
If the FFT accelerator needs to perform double-precision computation, the event message request may also be sent to the FP32 accelerator to perform double-precision computation according to the same mechanism as described above.
As an example, as shown in fig. 3, assuming that the FFT accelerator needs to perform double-precision computation, an event message block may be constructed from a data packet that needs to perform double-precision computation, and then an event message may be sent to an event queue of the FP32 accelerator through its own event queue; when the event queue of the FP32 accelerator receives the event message sent by the event queue of the FFT accelerator, the FP32 accelerator is automatically triggered to calculate the data block in the received event message, after the calculation is completed, an event message block is directly constructed by the double-precision calculation result, and the event message constructed according to the double-precision calculation result is sent to the FFT accelerator through the self sending queue; when the event queue of the FFT accelerator receives the event message sent by the event queue of the FP32 accelerator, the event message block can be constructed according to the processing result after further processing is carried out on the received event message, and the event message constructed by the FFT accelerator according to the calculation result is sent to the framing device through the sending queue; when the event queue of the framing device receives the event message constructed by the FFT accelerator according to the calculation result, the framing device is automatically triggered to carry out corresponding protocol analysis on the data block of the event message constructed by the FFT accelerator according to the calculation result.
Similar to the accelerator cascade connection shown in fig. 3, in some embodiments of the present application, a thread may send an event message to one accelerator a for processing, where the accelerator a generates a new event message according to a processing result, and sends the new event message to another accelerator B for processing, and after the accelerator B finishes processing, the accelerator B transfers the event message to a next unit of the accelerator B.
In some alternative embodiments, a data processing system includes a first processing unit that is a first accelerator and a second processing unit that is a second accelerator; the process of the data processing system for processing the message comprises the following steps: the first accelerator receives the first event message, the first accelerator processes the first event message to obtain the second event message, and the first accelerator sends the second event message to the second accelerator according to the context information, wherein the context information comprises the route information from the first accelerator to the second accelerator, and the context information is generated based on the processing request of the application program.
Illustratively, taking the first processing unit as the first sub-accelerator Task1_A and the second processing unit as the second sub-accelerator Task2_B as an example, in one embodiment, a data session from the first thread, through the first sub-accelerator Task1_A and the second sub-accelerator Task2_B, to the second thread may also be created by the application scheduler, thereby obtaining a data sharing context CID0 (which includes the routing information of the event messages in this context). After the data session is created, the first sub-accelerator Task1_A may obtain an event message Mes.A_1 (referred to herein as a first event message) sent by the first thread, process the event message Mes.A_1 to obtain an event message Mes.A_2 (referred to herein as a second event message to distinguish it from the first event message), and send the event message Mes.A_2 to the second sub-accelerator Task2_B according to the routing information in the context (for example, by setting the destination event queue identifier of the event message Mes.A_2 to the identifier of the event queue corresponding to the second sub-accelerator Task2_B according to the routing information in the context). Thereafter, similarly to the preceding process, the second sub-accelerator Task2_B may receive the event message Mes.A_2, process the event message Mes.A_2 to obtain an event message Mes.A_3, and send the event message Mes.A_3 to the subsequent second thread according to the routing information in the context.
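As a purely illustrative C++ sketch (queue identifiers, the routing table, and the forward helper below are assumptions, not the actual implementation), the cascaded session under context CID0 could be modelled as follows.

    // Illustrative sketch only: event queues keyed by identifier, with CID0 routing
    // mapping each unit's queue to the event queue of the next unit in the session.
    #include <cstdint>
    #include <map>
    #include <queue>
    #include <string>

    struct EventMsg {
        uint32_t context_id;   // CID0
        uint32_t dst_queue;    // target event queue identifier
        std::string data;
    };

    std::map<uint32_t, std::queue<EventMsg>> queues;  // all event queues in the system

    // CID0 routing information: current queue -> next queue in the session
    // (1: first thread, 2: Task1_A, 3: Task2_B, 4: second thread; ids assumed).
    std::map<uint32_t, uint32_t> cid0_route = {{1, 2}, {2, 3}, {3, 4}};

    // Forward a processed message from the unit owning my_queue to the next unit.
    void forward(uint32_t my_queue, EventMsg msg) {
        msg.dst_queue = cid0_route.at(my_queue);  // look up the next hop in the context
        queues[msg.dst_queue].push(msg);          // arrival automatically triggers the next unit
    }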
In one implementation, if an indication is received from the application layer to delete the data session, the data session is deleted.
In one implementation, if the application layer does not delete the data session, the data session persists.
Illustratively, in FIG. 3, if the system configuration requires tearing down the session, the software needs to be instructed to actively delete the data session and recover the corresponding resources.
(3) Semantic driven accelerator pool (SDA).
The semantic driven acceleration pool provides a resource scheduling mechanism for accelerators. The fusion calculation micro-engine or the accelerator communicates outwards through the event queue so as to realize acceleration processing of the specific function request.
For example, the FP32 accelerator corresponds to a specific function of "floating point calculation", and communicates to the outside through an event queue. The system may communicate with the FP32 accelerator via the FP32 accelerator's event queue, requesting acceleration processing of floating point calculations corresponding to the FP32 accelerator in fig. 4.
The working principle of the resource scheduling mechanism of the semantic driven accelerator pool is as follows:
A set of accelerators is determined to form a shared accelerator pool according to the SoC (System on Chip) chip plan, and the shared accelerator pool is provided with a matching event distributor and an accelerator pool event queue. In the following embodiments of the present application, the accelerator pool event queue may be referred to simply as a pool queue.
There are two modes for delivering event messages from the pool queue to the accelerators: one is a gating mode, which implements one-in-one-out at the time of accelerator selection, see FIG. 4; the other is a multicast mode, which implements one-in-multiple-out at the time of accelerator selection, see FIG. 5.
The following describes the calling modes of the two accelerators respectively:
In the gating mode, when the system requests acceleration, the event message can be sent directly to the pool queue without specifying an accelerator. When the pool queue holds an event message, the event distributor is automatically triggered to select, by round-robin (RR) arbitration according to the idle state of the accelerators, one accelerator in the shared accelerator pool to process the event message; the gating circuit is then triggered to open the circuit connection between the pool queue and that accelerator, a read-event signal is sent to the pool queue and the accelerator, and the event message is transferred from the pool queue to the accelerator.
In the multicast mode, when the system requests multiple accelerators of the same type at the same time, the request can likewise be sent directly to the pool queue without specifying accelerators. When the pool queue holds an event message, the event distributor is automatically triggered according to the configuration information of the multicast acceleration request, detects the corresponding accelerators that are in the idle state, gates multiple accelerators simultaneously, opens the circuit connections between the pool queue and the accelerators, and sends a read-event signal to the pool queue and the accelerators, and the event message is transferred from the pool queue to the accelerators simultaneously.
In some alternative embodiments, the second processing unit is a second accelerator; the first processing unit sends a second event message to the second processing unit according to the context information, and the method comprises the following steps: the first processing unit sends the second event message to an event queue corresponding to an accelerator pool according to the routing information, wherein the accelerator pool comprises a plurality of accelerators, and the types of the accelerators are the same; determining a second accelerator from the plurality of accelerators based on the states of the plurality of accelerators; the second event message is sent to the second accelerator.
Specifically, the data processing system comprises a first processing unit and a second processing unit, wherein the second processing unit is a second accelerator. The first processing unit of the data processing system sends the second event message to the second processing unit according to the context information, specifically, the method is realized through the following steps: the first processing unit sends a second event message to an event queue corresponding to an accelerator pool according to the routing information included in the context information, wherein the accelerator pool comprises a plurality of accelerators, the plurality of accelerators comprise second accelerators, and the types of the plurality of accelerators are the same; the event distributor selects a second accelerator from the accelerator pool according to the states of the accelerators in the accelerator pool; the event distributor sends a second event message in the event queue corresponding to the accelerator pool to a second accelerator.
Illustratively, taking the second processing unit as FP32 accelerator 1 in fig. 4 as an example, the first processing unit of the data processing system may send the event message info.i to an event queue corresponding to the FP32 pool according to the routing information included in the context, where the FP32 pool includes at least one accelerator, where the FP32 accelerator 1 is included, and where the at least one accelerator is of the same type; the event distributor corresponding to the FP32 pool selects an FP32 accelerator 1 from the FP32 pool according to the state of the accelerators in the FP32 pool; the event distributor sends the event message info.i in the event queue corresponding to the FP32 pool to the FP32 accelerator 1.
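For illustration only, the event distributor's selection step in the gating mode might look like the following C++ sketch; the Accelerator structure and the round-robin arbitration shown are assumptions about one possible realization.

    // Illustrative sketch only: round-robin arbitration over idle accelerators in a
    // shared pool of accelerators of the same type (e.g. the FP32 pool).
    #include <cstdint>
    #include <optional>
    #include <vector>

    struct Accelerator {
        uint32_t id;
        bool idle;
    };

    // Returns the id of the selected accelerator, or nullopt if all are busy
    // (in which case the event message remains in the pool queue).
    std::optional<uint32_t> select_accelerator(std::vector<Accelerator>& pool,
                                               size_t& rr_cursor) {
        for (size_t i = 0; i < pool.size(); ++i) {
            size_t k = (rr_cursor + i) % pool.size();
            if (pool[k].idle) {
                rr_cursor = (k + 1) % pool.size();
                pool[k].idle = false;   // gate the circuit between pool queue and this accelerator
                return pool[k].id;
            }
        }
        return std::nullopt;
    }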
In some embodiments of the application, context-based multicast event message processing may be performed. Specifically, the context may set a multicast mode, and the thread or the accelerator may, according to the multicast mode set by the context, start the multicast function through its own event queue to copy an event message that needs to be processed downstream and send it to a plurality of next-stage processing units, where the units may be threads or accelerators, or may be an application/CPU.
(4) High resilience network (UEN).
The highly elastic network provides an interconnection mechanism that is elastically schedulable. The high-elasticity network realizes a common physical connection infrastructure for multiple fusion computing microengines and multiple accelerators within a single system on a chip (SoC), which is also called a single routing domain, and it is also the unified bearer layer for management and control channels such as event messages, task management of the microengines, and configuration management of the accelerators. It further implements cascading and routing across fusion computing microengines and accelerators between SoCs, also known as multi-routing domains, as shown in FIG. 6.
An embodiment of the application provides a high-elasticity network in which a router can be directly connected to a computing resource, where the computing resource may be a fusion computing microengine, an accelerator, or the like; each computing resource integrates a transceiver connected back-to-back with a router, using either a synchronous or an asynchronous interface design.
In one embodiment of the present application, when each computing resource integrates a transceiver connected back-to-back with a router, an asynchronous interface design is adopted; referring to FIG. 7, because different microengines and accelerators may operate at different clock frequencies, this connection manner can significantly reduce blocking and timeouts of the high-elasticity network when transmitting and receiving data.
In a highly elastic network, a transceiver transmits and receives data in the form of frames or messages, and the transceiver can send messages to a router or receive messages from the router. The basic structure of a frame transmitted by a high-elasticity network is shown in fig. 8.
After receiving a message, the router searches the corresponding routing table according to the frame's destination port number to find the corresponding egress port, and sends the message to that port; if multiple input ports send to one egress port, the messages need to be sent one by one using fair arbitration.
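The routing-table lookup and fair arbitration might look as follows in outline; the table layout and the round-robin policy are assumptions made for illustration only.

```c
#define NUM_PORTS 8

typedef struct { int dest_port; int egress_port; } RouteEntry;

typedef struct {
    RouteEntry table[32];
    int entries;
    int rr_last;                  /* last input port served, for fair arbitration */
} Router;

/* Look up the egress port for a frame's destination port number. */
static int lookup_egress(const Router *r, int dest_port) {
    for (int i = 0; i < r->entries; i++)
        if (r->table[i].dest_port == dest_port)
            return r->table[i].egress_port;
    return -1;                    /* no route */
}

/* Round-robin arbitration when several input ports contend for one egress port. */
static int arbitrate(Router *r, const int pending[NUM_PORTS]) {
    for (int i = 1; i <= NUM_PORTS; i++) {
        int p = (r->rr_last + i) % NUM_PORTS;
        if (pending[p]) { r->rr_last = p; return p; }   /* serve this port next */
    }
    return -1;                    /* nothing pending */
}
```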
In an embodiment of the present application, the unexpanded frames transmitted by the high-elasticity network are referred to as "basic frames". The structure of the basic frame transmitted by the high-elasticity network supports dynamic expansion according to application scenes so as to adapt to data formats with different semantics.
In one embodiment, the frames transmitted by the high-elasticity network are defined in an extended KLV (Key-Length-Value) format; a minimal parsing sketch follows the field descriptions below.
Wherein,
Key field, located at the forefront of the frame structure, used to describe the attribute name of the field; its length may be fixed or agreed by the application;
a Length field, following the Key field, used to describe the length of the field; its own length may be fixed or agreed by the application;
the Value field, following the Length field, used to carry the data to be transmitted; its length is given by the Length field.
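The following sketch reads one KLV field, assuming one-byte Key and Length fields (the actual widths are fixed or agreed by the application, as stated above); the structure name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint8_t key;          /* attribute name of the field   */
    uint8_t length;       /* length of the value, in bytes */
    const uint8_t *value; /* data carried by the field     */
} KlvField;

/* Parse one KLV field starting at buf; returns bytes consumed, or 0 on error. */
static size_t klv_parse(const uint8_t *buf, size_t buf_len, KlvField *out) {
    if (buf_len < 2) return 0;
    out->key    = buf[0];
    out->length = buf[1];
    if ((size_t)out->length + 2 > buf_len) return 0;  /* truncated frame */
    out->value  = buf + 2;
    return (size_t)out->length + 2;
}
```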
The following embodiments refer to the frames obtained after expansion as "subframes", and fig. 9 provides a schematic format of a subframe of a high-elasticity network according to an embodiment of the present application.
The subframes are defined hierarchically: the bottom layer is the network subframe, above it is the system subframe, and then the application subframe. Each layer can be defined independently, but the transmission order is strictly: the network subframe first, then the system subframe, and finally the application subframe. The network subframes and system subframes are predefined, and application subframes may be agreed upon by the developer or by the accelerator design.
In one embodiment of the application, the predefined system subframes are of the following types:
Key=0, representing a routing range, and the data field of the subframe is a routing field ID where the destination is located;
Key=1, representing a context session, the data field of the subframe being the data session ID to which the frame belongs;
Key=2, representing the source routing address, the data field of the subframe is the queue ID from which the frame is sent, and if the subframe is transmitted across fields, the routing range needs to be carried in the subframe;
Key=3, representing an operating-system custom subframe, whose data field is data transmitted by operating system services, such as configuration data, program images, etc. Within this subframe, the operating system may define its own nested sub-subframes ("grandchild frames"), which may also follow the KLV format, so that the network can participate in frame parsing and forwarding efficiency is improved.
Key=4, representing an application-layer custom subframe, whose data field is data shared between applications. Within this subframe, the applications may agree on their own nested sub-subframes ("grandchild frames"), which may also follow the KLV format, so that the network can participate in frame parsing and forwarding efficiency is improved.
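A sketch of emitting system subframes followed by an application subframe, built on the one-byte KLV sketch above; the key constants follow the list just given, while the byte-level layout is an assumption and the preceding network subframe is omitted for brevity.

```c
#include <stdint.h>
#include <string.h>

/* Predefined system subframe keys, as listed above. */
enum {
    KEY_ROUTING_RANGE   = 0,   /* routing domain ID of the destination      */
    KEY_CONTEXT_SESSION = 1,   /* data session ID (context ID) of the frame */
    KEY_SOURCE_ROUTE    = 2,   /* queue ID the frame is sent from           */
    KEY_OS_CUSTOM       = 3,   /* data of operating system services         */
    KEY_APP_CUSTOM      = 4    /* data shared between applications          */
};

/* Append one KLV subframe to the frame buffer; returns the new write offset. */
static size_t put_klv(uint8_t *frame, size_t off, uint8_t key,
                      const uint8_t *val, uint8_t len) {
    frame[off] = key;
    frame[off + 1] = len;
    memcpy(frame + off + 2, val, len);
    return off + 2 + len;
}

/* Build system subframes (context session, source queue) then application data,
 * keeping the required transmission order. */
static size_t build_frame(uint8_t *frame, uint8_t cid, uint8_t src_queue,
                          const uint8_t *app, uint8_t app_len) {
    size_t off = 0;
    off = put_klv(frame, off, KEY_CONTEXT_SESSION, &cid, 1);
    off = put_klv(frame, off, KEY_SOURCE_ROUTE, &src_queue, 1);
    off = put_klv(frame, off, KEY_APP_CUSTOM, app, app_len);
    return off;                    /* total frame length */
}
```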
(5) High dynamic operating system (HOS).
A highly dynamic operating system provides a resource scheduling and messaging mechanism. The resource scheduling and the message communication mechanism enable an application developer and a hardware developer to be better designed cooperatively, and can be decoupled mutually, so that the system can realize interoperation as long as the semanteme is agreed, and has high dynamic computing capacity of on-demand reconstruction and on-demand scheduling facing to a high dynamic environment.
FIG. 10 is a schematic diagram showing the composition of a high dynamic operating system that provides mainly three main services: a semantic driven computing service, a semantic driven data service, a semantic driven session service.
The main functions of the three main services are presented below:
1) A semantically driven computing service.
The main functions of the semantic driven computing service include: acceleration pool management, route management, just-in-time compilation, and computation management.
The acceleration pool management means that the high dynamic operating system discovers all connected accelerator pools on the hardware, the semantics they support, and the network positions where they are located; registers the semantics, positions and number of the accelerators; takes these as input parameters of just-in-time compilation and dynamic routing; and exposes a semantic accelerator list to the application layer, the semantic driven session service and the semantic driven data service.
The route management means that the high dynamic operating system discovers all connected route networks and route domains on hardware, establishes a global route table of the system, comprises a route domain list, a route port list of each route domain and unit types (including accelerators, micro engines, routers and the like) connected with the ports, and takes the route list, the route port list and the unit types as input parameters of instant compiling and calculation management. Wherein each accelerator or accelerator pool has a port number of its connected router, i.e. an event queue number or a destination port number of an event message.
The just-in-time compilation means that the high dynamic operating system establishes a compilation mapping table from semantic accelerator instructions to event queues according to the semantic accelerator list of acceleration pool management and the global routing table of route management; the format of the compilation mapping table is shown in Table 2 (a lookup sketch follows the table). The compilation mapping table is used as a checklist for judging whether to start just-in-time compilation when the operating system's computation management loads a thread or a program.
TABLE 2

Semantic accelerator instruction | Semantic accelerator/pool name | Event queue number | Data format
Fp32                             | Floating point computing       | EQ-ID1             | (ax,bx,cx)
FFT                              | Fourier transform              | EQ-ID2             | (ax[],bx[],cx[])
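The compilation mapping table of Table 2 can be modelled as a simple lookup consulted when a thread or program is loaded; the structure and function names below are illustrative assumptions.

```c
#include <string.h>

typedef struct {
    const char *instruction;   /* semantic accelerator instruction, e.g. "Fp32" */
    const char *pool_name;     /* semantic accelerator / pool name              */
    int event_queue_id;        /* destination event queue number                */
    const char *data_format;   /* agreed data format                            */
} CompileMapEntry;

static const CompileMapEntry compile_map[] = {
    { "Fp32", "Floating point computing", 1, "(ax,bx,cx)"       },
    { "FFT",  "Fourier transform",        2, "(ax[],bx[],cx[])" },
};

/* Check whether an instruction should be just-in-time compiled into an event
 * message sent to an accelerator's event queue; returns the queue ID or -1. */
static int jit_target_queue(const char *instruction) {
    for (size_t i = 0; i < sizeof compile_map / sizeof compile_map[0]; i++)
        if (strcmp(compile_map[i].instruction, instruction) == 0)
            return compile_map[i].event_queue_id;
    return -1;   /* not a semantic accelerator instruction: execute normally */
}
```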
The computing management refers to that the high-dynamic operating system regards the micro-engine as a thread processor or a container, provides a corresponding resource application API (Application Programming Interface, application program interface) interface for an application, enables the application to dynamically create threads or tasks, exerts high-dynamic computing capacity of massive multi-thread multi-task parallel computing, and simultaneously exposes an interface API of the task created by the micro-engine to an application layer.
2) Semantically driven data services.
The main functions of the semantically driven data service include: semantic data indexing, data management, memory allocation, semantic addressing mapping.
The semantic data index is a service for creating a structured memory sharing data index provided by a high-dynamic operating system, replaces a global address table of a page plus an offset address and metadata management thereof, and externally distributes semantic information, so that the semantic data index is more suitable for massive data sharing in many-core architecture, high-performance computing, super computing and other scenes.
The data management refers to that the high dynamic operating system provides data operation interfaces for adding, deleting and querying the created memory-shared data index; data is added to the index, and the application can subsequently also modify the data.
The memory allocation means that after data is added, the high dynamic operating system locally allocates memory corresponding to the added data and associates it with the corresponding index; to improve memory access efficiency, the application layer should make the granularity of semantic shared data blocks as large as possible, so that the advantage of semantic data sharing can be brought into play.
Semantic addressing mapping refers to the conversion of external generic semantics within a system to the form of page + offset addresses to determine data stored to local memory when the high dynamic operating system accesses shared data.
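Semantic addressing mapping, i.e. translating an externally visible semantic index into the local page + offset address form, can be sketched as below; the mapping-table layout and page size are assumptions for illustration.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

typedef struct {
    uint32_t semantic_id;   /* index of the shared semantic data block */
    uint32_t page;          /* local page holding the data             */
    uint32_t offset;        /* offset of the data within the page      */
} SemanticMapEntry;

/* Convert a semantic index into the local "page + offset" address form. */
static uint64_t semantic_to_address(const SemanticMapEntry table[], int entries,
                                    uint32_t semantic_id) {
    for (int i = 0; i < entries; i++)
        if (table[i].semantic_id == semantic_id)
            return (uint64_t)table[i].page * PAGE_SIZE + table[i].offset;
    return UINT64_MAX;      /* unknown semantic index */
}
```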
3) Semantically driven session services.
The main functions of the semantic driven session service include: semantic session index, semantic acceleration library, semantic context management, session performance management.
The semantic session index refers to that a high dynamic operating system provides an interface for an application layer to create a data session and generates a corresponding index, which is also called Context ID (CID).
The semantic acceleration library refers to the list of semantic acceleration libraries available to the high dynamic operating system, used to create acceleration libraries associated with a context and to provide an automatically and dynamically allocated acceleration pool service, so that the application does not need to specify particular resources and the application program can automatically adapt to the hardware of high dynamic computing.
The semantic context management refers to that a high-dynamic operating system provides related hardware configuration templates and configuration services such as a micro engine, an accelerator, an event queue and the like related to context, so that an application layer can flexibly create a data session of complex logic, and therefore, the aim of unloading a software processing high-frequency repetitive computing task to hardware processing is achieved, and the high-energy-efficiency computing capability is achieved.
Session performance management refers to that the high dynamic operating system provides a performance monitoring service for sessions created by the application layer, and also allows the application layer to specify performance requirement parameters such as bandwidth, rate and latency; in the case of performance degradation, it actively reports anomalies to the application layer for subsequent optimization and adjustment, such as triggering route reconstruction.
Taking data processing system 100 of FIG. 1 as an example, when data processing system 100 is first started, the high dynamic operating system of data processing system 100 discovers the resources of the system hardware, such as accelerators, micro-engines and routing networks, through the semantic driven computing service. The high dynamic operating system can establish and store a corresponding system hardware resource list according to the discovered resources; the list is refreshed whenever a hardware change is detected, and otherwise it can be used directly for quick startup.
After data processing system 100 is started, the application layer first creates the memory data to be shared through the semantic-driven data service of the high-dynamic operating system of data processing system 100 and creates a local memory address list of the corresponding semantic data index and semantic addressing map.
After the shared data of the data processing system 100 is created, the application layer can distribute the micro-engine through the semantic driving computing service of the high dynamic operating system of the data processing system 100 and load the codes corresponding to the computing tasks; meanwhile, the application layer may also create a data session through a semantic driven session service of the high dynamic operating system of data processing system 100, exchanging high frequency computing tasks directly through event queues through multiple semantic accelerators and microengines.
The above-described data processing system architecture and service scenario of the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application. Those skilled in the art will recognize that, with the evolution of the data processing system architecture and the appearance of new service scenarios, the technical solution provided by the embodiments of the present application is equally applicable to similar technical problems.
The embodiment of the application provides a message processing method and a device, wherein the method comprises the following steps: the first processing unit processes the first event message to obtain a second event message; the first event message is received by the first processing unit, or the first event message is generated by the first processing unit based on a processing request of the application; the first processing unit sends the second event message to the second processing unit according to the context information, wherein the context information comprises the route information from the first processing unit to the second processing unit, and the context information is generated based on the processing request of the application program; the first processing unit is a first engine, the second processing unit is a second accelerator, or the first processing unit is a first accelerator, the second processing unit is a second engine, or the first processing unit is a first engine, the second processing unit is a second engine, or the first processing unit is a first accelerator, and the second processing unit is a second accelerator. In the method, because the transmission of the event message between different processing units is realized based on the context information, compared with the transmission scheduling of the event message by adopting a scheduling mode (such as scheduling the message by using a scheduler, etc.), the implementation mode can avoid the performance bottleneck caused by the transmission scheduling, and further can improve the processing performance of the system.
The following provides a detailed description of the solution provided by the present application in connection with specific embodiments.
The message processing method in the embodiment of the present application may be applied to the data processing system 100 shown in fig. 1.
The message processing method provided by the embodiment of the application carries out dynamic resource allocation based on the event before carrying out message processing. The dynamic resource allocation procedure of the data processing system is first described below. In the following embodiments of the present application, an engine is described as a fusion computing microengine. It should be noted that, in the embodiment of the present application, the fusion calculation microengine may also be simply referred to as a microengine.
Specifically, when an application program is started, the high-dynamic operating system receives a processing request of the application program, acquires the semantics of the processing request, and determines at least two tasks included in the processing request according to the semantics of the processing request.
In specific implementation, tasks included in the processing request correspond to task semantics one by one. The semantics of the processing request include at least two task semantics, and a corresponding one of the tasks is determined based on each of the at least two task semantics.
For example, the processing request may include at least two tasks, a first task and a second task, the first task corresponding to a first task semantic and the second task corresponding to a second task semantic; the semantics of the processing request include the first task semantic and the second task semantic, the first task is different from the second task, and the first task semantic is different from the second task semantic.
In response to a received processing request of an application program, at least two tasks belonging to the processing request of the application program are established. The high dynamic operating system further responds to the received processing request by determining, according to the resource configuration information of the application program, computing resources for executing the processing request, the computing resources including at least a first computing resource, a second computing resource and a third computing resource, and by generating a context of the application program, the context including at least routing information from the first computing resource to the second computing resource and from the second computing resource to the third computing resource. For the assigned computing resources, the system may also open communication links for each computing resource based on the context and the event queue of each computing resource. It is to be understood that the number of computing resources for executing the processing request may be 3, or may be 4 or more; the number of computing resources that may be allocated for executing the processing request is not specifically limited in the technical solution of the present application.
In order to more clearly describe the technical solution of the embodiment of the present application, the following description will take the computing resources used by the processing request as computing resources Resource1, computing resources Resource2, computing resources Resource3, and computing resources Resource4 as examples. In order to correspond to at least two tasks of the foregoing processing request, in some embodiments, the computing resources Resource1 and computing resources Resource3 may be two different microengines, and the computing resources Resource2 and computing resources Resource4 may be two different accelerators; in other embodiments, it may be that computing Resource1, computing Resource2, and computing Resource3 are three different microengines, and computing Resource4 is an accelerator; in other embodiments, computing resources Resource1 and computing resources Resource4 may be two different microengines, and computing resources Resource2 and computing resources Resource3 may be two different accelerators.
The high dynamic operating system also creates at least two threads corresponding to the at least two tasks; the at least two threads are loaded onto the at least two engines for execution, wherein different threads run on different engines, and different threads correspond to different tasks.
Take as an example the case where computing resources Resource1 and Resource3 are two different micro-engines and computing resources Resource2 and Resource4 are two different accelerators. For clarity, computing resources Resource1, Resource2, Resource3 and Resource4 may be denoted as micro-engine XPU_A, accelerator SDA_A, micro-engine XPU_B and accelerator SDA_B, respectively, according to the types of Resource1 to Resource4. The computing resources of the first task may include the micro-engine XPU_A and the accelerator SDA_A, and the computing resources of the second task include the micro-engine XPU_B and the accelerator SDA_B; after dynamically allocating the computing resources based on the above process, the high dynamic operating system creates a first thread corresponding to the first task on the micro-engine XPU_A and a second thread corresponding to the second task on the micro-engine XPU_B. The micro-engine XPU_A is different from the micro-engine XPU_B, the accelerator SDA_A is different from the accelerator SDA_B, and the accelerator SDA_A corresponds to the first event queue.
In the embodiment of the application, each thread, each accelerator and each application/CPU may be provided with its own event queue; a thread or an accelerator forwards an event message that needs downstream processing to the event queue of the next-stage processing unit through its own event queue, where that unit may be a thread, an accelerator, or an application/CPU.
It should be noted that, in the above embodiment, two tasks, i.e., the first task and the second task, of the processing request belonging to the application program are established in response to the received processing request, which is merely for illustrating the message processing method in the embodiment of the present application. In other embodiments, in response to a received processing request, a plurality of tasks attributed to the processing request of the application may also be established, such as: task 1, task 2, …, task N, and create threads corresponding to each task.
In addition, in the above embodiment, the computing resources used by the first task and the second task are determined according to the resource configuration information of the application program: the computing resources of the first task include the micro-engine XPU_A and the accelerator SDA_A, and the computing resources of the second task include the micro-engine XPU_B and the accelerator SDA_B, where the number of accelerators in the computing resources used by each task is 1. This is merely for illustrating the process of determining the computing resources used by tasks. In other embodiments, for a plurality of tasks belonging to a processing request of an application, the computing resource corresponding to at least one of the tasks includes an engine and at least one accelerator; the computing resources corresponding to the other tasks each include an engine, and the number of accelerators they include may be 0, 1, 2 or more than 2. That is, a task belonging to a processing request of an application program may use one engine and one accelerator as computing resources; an individual task may also use only one engine, without using any accelerator; and an individual task may also use one engine and multiple accelerators.
In the embodiment of the application, the resource configuration information is a received parameter sent by the application layer.
It should be noted that, a user may develop a data processing task software package through the application layer of the data processing system 100 provided by the embodiment of the present application to obtain an installation file of an application program for performing data processing.
One possible implementation is that the resource configuration information includes a trigger event; in the process of starting an application program, in response to a processing request of the application program, determining a task corresponding to the processing request can be realized by the following steps: in response to a processing request of an application corresponding to a trigger event, a task corresponding to the processing request is determined. The trigger event is a preset event for starting a processing request after the data processing system loads a data processing task software package of an application program.
The video call terminal is a typical scenario of edge intelligent computing. Currently, a video call terminal supports artificial intelligence computing such as face recognition and background replacement, requires increasingly high computing power, and also requires low power consumption, especially in mobile office, emergency command and other scenarios.
Fig. 11 is a schematic diagram of a design scheme of intelligent edge computing according to an embodiment of the present application. Referring to fig. 11, the video telephony terminal 1100 is extended on the basis of existing hardware, considering maximum reuse of existing hardware. In the video call terminal 1100, existing hardware such as a CPU of the x86, ARM or RISC-V architecture can be fully reused; compared with the existing hardware, the following extensions are made:
1) A transmission mechanism supporting an event queue is extended on buses such as PCI-E (PERIPHERAL COMPONENT INTERCONNECT EXPRESS, peripheral component interconnect standard) or AMBA (Advanced Microcontroller Bus Architecture, on-chip bus protocol) and the like to serve as a port of a high-elasticity routing network;
2) The operating system layer can increase three major services of a high dynamic operating system on the basis of Linux and open application APIs to the upper part;
3) The call software is used for supporting the capabilities of a dispatching center and the like, and can realize the deployment of threads such as audio acquisition, audio and video encoding and decoding, network session and the like to high-dynamic computing hardware;
4) And the hardware of high dynamic computing is added, and corresponding microengines, routing networks, accelerators (such as FFT conversion, video rendering, DNN networks and the like) are configured and connected with corresponding peripheral devices (such as video memory, cameras, network cards, microphones and the like).
For the video telephony terminal 1100 shown in fig. 11, the triggering event may be a click of the talk key. Before data processing, dynamic resource allocation is performed based on the trigger event "click the talk key". Let the first computing resource be XPU 3 in FIG. 11, the second computing resource be signal processing accelerator 1 in FIG. 11, the third computing resource be XPU 0 in FIG. 11, and the fourth computing resource be audio accelerator 1 in FIG. 11. When the trigger event "click the talk key" occurs, the application program is started and the data processing system receives a Voice call processing request Voice01 corresponding to the click of the talk key. In response to the Voice call processing request Voice01 of the application program, the semantics of the Voice call processing request Voice01 are obtained; for example, the semantics of Voice01 may be a voice conversation, and it is assumed that the semantics of Voice01 include a first task semantic "audio acquisition" and a second task semantic "audio processing". The high dynamic operating system determines, according to the semantics of the Voice call processing request Voice01, a plurality of tasks corresponding to Voice01, which include at least a first task and a second task; it is assumed that the first task is an audio acquisition task and the second task is an audio processing task, the audio acquisition task corresponding to the first task semantic "audio acquisition" and the audio processing task corresponding to the second task semantic "audio processing". The above-mentioned audio acquisition task and audio processing task belong to the Voice call processing request Voice01.
It will be appreciated that the embodiment of the present application does not limit the number of task semantics included in the semantics of the processing request, and when the number of task semantics included in the semantics of the processing request is N, the data processing system may determine that the processing request includes N tasks.
Further, when the audio acquisition task and the audio processing task are established, the computing resources for executing the Voice call processing request Voice01 are determined according to the received Voice call processing request Voice01 and the resource configuration information of the application program, wherein the computing resources comprise XPU 3, signal processing accelerator 1, XPU 0 and audio accelerator 1 in fig. 11, and the context of the application program is generated, and the context comprises routing information from XPU 3 to signal processing accelerator 1, signal processing accelerator 1 to XPU 0 and XPU 0 to audio accelerator 1. And opening a communication link to the allocated computing resources according to the context and event queues of the computing resources. For example, a first communication link is established between XPU 3 and the signal processing accelerator 1, and a second communication link is established between XPU 0 and the audio accelerator 1. Creating an audio acquisition thread for processing an audio acquisition task on XPU 3, and creating an audio processing thread for processing an audio processing task on XPU 0; the audio acquisition thread corresponds to an audio acquisition task, and the audio processing thread corresponds to an audio processing task.
In embodiments of the present application, an identification of a context may also be provided, the context identification being used to indicate the context of the application. The context identifier CID1 may indicate the context of the application program generated by the video telephony terminal 1100 described above, including the routing information of XPU 3 to the signal processing accelerator 1, signal processing accelerators 1 to XPU 0, XPU 0 to the audio accelerator 1.
In some embodiments of the application, the highly dynamic operating system may determine the computing resources used by the audio acquisition task and the audio processing task based on the resource configuration information of the application. For example, it may be determined that the computing resources of the audio acquisition task include XPU 3 and signal processing accelerator 1 in fig. 11, and the computing resources of the audio processing task include XPU 0 and audio accelerator 1 in fig. 11.
In some embodiments of the application, the first processing unit or the second processing unit is selected from the plurality of processing units based on state information of the plurality of processing units at the time of receiving a processing request of the application, the state information of the processing units including network topology performance.
In particular implementations, computing resources for executing the processing request are determined, and the computing resources are allocated to the processing request based on hardware state information, including network topology performance, at the time the processing request is received. The computing resources are configured for the first task and the second task, and the real-time state of the hardware (threads, accelerators and the like) can be considered, so that the optimal hardware can be allocated for the first task and the second task on the premise of meeting the requirements of the first task and the second task. When the operating system is started, a hardware state table is established according to all hardware states, then the hardware state table is automatically updated every time the state of hardware changes, and then parameters in the hardware state table are referred when computing resources are allocated for the first task and the second task. In the embodiment of the application, the parameters of the considered hardware state comprise the network topology performance besides the utilization rate of the resources. The network topology performance includes, in particular, the link relation, throughput, available routes, available bandwidth, latency, etc. of the network topology.
As an example, the computing resources are allocated to the audio acquisition task and the audio processing task, which may be based on hardware state information when the voice call processing request is received; wherein the hardware state information includes network topology performance.
The above-mentioned optimal hardware to be allocated may be the hardware with the best performance currently available, or hardware whose performance just meets the requirements, so as to avoid resource waste. In addition, the hardware state information may be obtained by creating a hardware state list and refreshing it in real time, or may be obtained when configuring the computing resources.
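A sketch of choosing hardware from a state table that records both utilization and network-topology figures; the fields, thresholds and the "just-enough" selection policy are assumptions made to illustrate one way of avoiding resource waste, not the embodiment's mandatory policy.

```c
typedef struct {
    int unit_id;
    int in_use;             /* utilization flag                        */
    int available_bw;       /* available bandwidth on its route (Mb/s) */
    int latency_us;         /* latency to the requesting unit          */
} HwState;

/* Pick a free unit that satisfies the task's requirements with the least margin,
 * so that better-performing hardware is kept free for more demanding tasks. */
static int pick_unit(const HwState table[], int n, int need_bw, int max_latency) {
    int best = -1, best_bw = 0;
    for (int i = 0; i < n; i++) {
        if (table[i].in_use) continue;
        if (table[i].available_bw < need_bw || table[i].latency_us > max_latency)
            continue;
        if (best < 0 || table[i].available_bw < best_bw) {  /* just-enough choice */
            best = i;
            best_bw = table[i].available_bw;
        }
    }
    return best < 0 ? -1 : table[best].unit_id;
}
```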
In another implementation, the process of determining the computing resources corresponding to the audio acquisition task and the audio processing task may specifically be that, when a trigger event of "clicking the talk button" occurs, a Voice call processing request Voice01 is started. Responding to a Voice call processing request Voice01 of an application program, generating an audio acquisition task and an audio processing task corresponding to the processing request, creating an audio acquisition thread for processing the audio acquisition task on the XPU 3, and determining computing resources corresponding to the audio acquisition task and the audio processing task after creating the audio processing thread for processing the audio processing task on the XPU 0. The computing resources corresponding to the audio acquisition task comprise XPU 3 and a signal processing accelerator 1, and the computing resources corresponding to the audio processing task comprise XPU 0 and the audio accelerator 1.
It should be noted that, in the embodiment of the present application, the computing resource corresponding to a task may include an engine and an accelerator, or may include an engine and a plurality of accelerators; some of the plurality of tasks may also include only one engine.
One possible implementation manner is that when the application program is started, the method further comprises the following steps:
and step A1, responding to the starting of the application program, and acquiring the resource configuration information of the application program.
Wherein the resource configuration information includes the number of engines, and the accelerator type and the number of accelerators.
Illustratively, assuming 10 signal processing accelerators are contained in accelerator Pool1 and 10 audio accelerators are contained in accelerator Pool2, the total number of microengines is 20. In response to the application program being started, in order to construct the video call terminal 1100, the acquired resource configuration information of the application program includes: the engine is a micro engine, the number of the micro engines is 2, the accelerator types are a signal processing accelerator and an audio accelerator, the number of the accelerators corresponding to the accelerator type signal processing accelerator is 1, and the number of the accelerators corresponding to the audio accelerator is 1.
And step A2, selecting an engine used by the application program according to the resource configuration information and the load of the candidate engine.
The selected engines comprise a first engine and/or a second engine.
Illustratively, 2 microengines are selected according to the number of microengines "2" and the load of the candidate engines, wherein the 2 microengines comprise a microengine XPU 3 and a microengine XPU 0, and the microengine XPU 3 is different from the microengine XPU 0.
In specific implementation, selecting the engines used by the application program, namely selecting a specified number of micro-engines from the candidate engines according to the order of the load rate from low to high; a specified number of microengines meeting the load requirement may also be selected from the candidate engines based on the load requirement, where the load requirement may be derived from the resource configuration information.
And step A3, selecting an accelerator used by the application program according to the resource configuration information, wherein the selected accelerator comprises a first accelerator and/or a second accelerator.
Illustratively, from the accelerator types "signal processing accelerator" and "audio accelerator", it may be determined that the accelerator pool corresponding to the "signal processing accelerator" is accelerator Pool1 and the accelerator pool corresponding to the "audio accelerator" is accelerator Pool2. The accelerator used by the application program selected from accelerator Pool1 includes the signal processing accelerator 1, and the accelerator used by the application program selected from accelerator Pool2 includes the audio accelerator 1, where the signal processing accelerator 1 and the audio accelerator 1 are different. A selection sketch for steps A2 and A3 is given below.
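Steps A2 and A3 can be sketched together: pick the requested number of micro-engines in ascending order of load rate, then pick an accelerator of each requested type from its pool. The structures and names are illustrative assumptions.

```c
#include <stdlib.h>
#include <string.h>

typedef struct { int id; int load_pct; } Engine;
typedef struct { const char *type; int ids[16]; int count; } AcceleratorPool;

static int by_load(const void *a, const void *b) {
    return ((const Engine *)a)->load_pct - ((const Engine *)b)->load_pct;
}

/* Step A2: select `want` engines in order of load rate, lowest first. */
static int select_engines(Engine cand[], int n, int want, int out[]) {
    qsort(cand, (size_t)n, sizeof cand[0], by_load);
    for (int i = 0; i < want && i < n; i++)
        out[i] = cand[i].id;
    return want < n ? want : n;
}

/* Step A3: select one accelerator of the requested type from its pool. */
static int select_from_pool(const AcceleratorPool pools[], int npools,
                            const char *type) {
    for (int i = 0; i < npools; i++)
        if (strcmp(pools[i].type, type) == 0 && pools[i].count > 0)
            return pools[i].ids[0];
    return -1;   /* no accelerator of this type available */
}
```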
A possible implementation is to establish a first communication link between XPU 3 and the signal processing accelerator 1, specifically between XPU 3 and event queue 4, where event queue 4 corresponds to the signal processing accelerator 1. Thus, the audio collection thread running on XPU 3 may send event message Mes.1 to event queue 4, and the signal processing accelerator 1 may obtain event message Mes.1 from event queue 4. Likewise, a second communication link established between XPU 0 and the audio accelerator 1 may specifically be established between XPU 0 and event queue 5, where event queue 5 corresponds to the audio accelerator 1. Thus, the audio processing thread running on XPU 0 may send event message Mes.3 to event queue 5, and the audio accelerator 1 may retrieve event message Mes.3 from event queue 5.
In an alternative embodiment, when the audio acquisition thread sends the event message Mes.1 to event queue 4, specifically, the audio acquisition thread executes a recompilation instruction of the signal processing accelerator 1 to send the event message Mes.1 to event queue 4. The recompilation instruction of the signal processing accelerator 1 is obtained by loading the signal processing accelerator 1, allocating the identification of event queue 4 to the signal processing accelerator 1, and modifying the machine code of the signal processing accelerator 1 according to the identification of event queue 4; when the recompilation instruction is executed, the audio collection thread sends the event message to event queue 4.
In order to more efficiently deliver event messages, the present application defines a new event message information format, namely system information transmitted over a highly resilient network via the event queue of fig. 1.
In an alternative embodiment, the event message of the data processing system is in the format of a subframe of the highly elastic network shown in fig. 9, taking event message mes.1 as an example only, event message mes.1 comprising: a network layer message attribute information field for carrying event message routing information, the event message routing information including a target event queue identification, for example, the target event queue identification may be an identification of the event queue 4 of the signal processing accelerator 1; a network layer message length field for carrying the total length information of the event message mes.1; and the network layer data field is used for bearing the load of the event message mes.1.
One possible implementation manner is that the network layer data domain includes an operating system layer event information domain, and the operating system layer event information domain includes at least one of the following: routing scope, identification of context, source message queue identification, or custom attributes, the routing scope including at least one routing domain.
Illustratively, the predefining of the system subframes may take the following types:
Key=0, representing a routing range, and the data field of the subframe is a routing field ID where the destination is located;
Key=1, representing a context session, the data field of the subframe being the data session ID to which the frame belongs;
Key=2, representing the source routing address, the data field of the subframe is the queue ID from which the frame is sent, and if the subframe is transmitted across fields, the routing range needs to be carried in the subframe;
Key=3, representing an operating system custom subframe whose data field is data transmitted by operating system services, such as: configuration data, program images, etc.
One possible implementation manner is that the network layer data domain includes an application layer event information domain, and the application layer event information domain includes custom information of an application layer.
In specific implementation, within the system subframe, the operating system may define its own nested sub-subframes ("grandchild frames"), which may also follow the KLV format, so that the network can participate in frame parsing, thereby improving forwarding efficiency.
Illustratively, the predefining of the system subframe may further include the following types:
Key=4, representing an application-layer custom subframe, whose data field is data shared between applications; within this subframe, the applications may agree on their own nested sub-subframes ("grandchild frames"), which may also follow the KLV format.
The relationship of the application layer event information field, the operating system layer event information field, and the network layer data field can be seen in fig. 9.
The embodiment of the application provides a message processing method, which is used for processing an event message after dynamic resource allocation is carried out based on the event.
In some alternative embodiments, the process of processing a message in conjunction with the data processing system provided in the embodiments of the present application, such as the video call terminal 1100 shown in fig. 11, may include the following steps, as shown in fig. 12:
in step S1201, the first processing unit receives a first event message.
The first processing unit may be a first micro-engine or a first accelerator.
Illustratively, in the video telephony terminal 1100 shown in fig. 11, the first processing unit may refer to the signal processing accelerator 1, and may also refer to the micro engine XPU 0. The first processing unit is taken as an example of the signal processing accelerator 1 for explanation. The video telephony terminal 1100 may transmit event messages between the signal processing accelerator 1 and the XPU 0. In the message process of the video call terminal 1100, an event message is transmitted between the signal processing accelerator 1 and the XPU 0, and first, the signal processing accelerator 1 acquires the event message mes.1.
In further embodiments, the first processing unit is a first microengine and the first event message may be generated by the first processing unit based on a processing request of the application.
In step S1202, the first processing unit processes the first event message to obtain a second event message.
Illustratively, the signal processing accelerator 1 processes the event message mes.1 to obtain an event message mes.2.
One possible implementation is that the context further includes operation configuration information; the first processing unit processes the first event message to obtain a second event message, specifically: the method comprises the steps that a first processing unit obtains first operation configuration information corresponding to a context; the first processing unit processes the first event message according to the first operation configuration information.
In particular implementations, the context also includes operational configuration information for the computing resource; computing resources include microengines and accelerators; when the application program is started, the context and the context identification are distributed according to the resource configuration information. The context identification is used to indicate a context with the application. The context identification is included in all event messages corresponding to the same processing request of the application, e.g. in the first event message and the second event message, the context identification being available for retrieving the context.
Illustratively, taking the Voice call processing request Voice01 of the application corresponding to "click the talk key" as an example, the context includes operation configuration information CZXX1 for the computing resources, where the operation configuration information CZXX1 is "CID1, in: ADC, via: FFT, via: SHT, out: fra, bit width, number of samples, period, data sub-block time slice, double floating point precision, …". When the application program is started, the context corresponding to the Voice call processing request Voice01 and the context identifier CID1 are allocated according to the resource configuration information, and the context identifier CID1 is included in the event message Mes.1, the event message Mes.2 and the event message Mes.3. The context identifier CID1 may be used to acquire the operation configuration information CZXX1 corresponding to the Voice call processing request Voice01.
The process by which the signal processing accelerator 1 processes the event message Mes.1 is specifically: first, according to the context identifier CID1 included in the event message Mes.1, the first operation configuration information CZXX1_1 corresponding to the signal processing accelerator 1 is acquired; assuming that the first operation configuration information CZXX1_1 is "perform FFT transformation on received event messages of this context ID", the signal processing accelerator 1 then processes Mes.1 according to the first operation configuration information CZXX1_1. Similarly, the audio accelerator 1 may process the event message Mes.3 by acquiring, according to the context identifier CID1 included in the event message Mes.3, the second operation configuration information CZXX1_2 corresponding to the audio accelerator 1; assuming that the second operation configuration information CZXX1_2 is "perform MP4 encoding on received event messages of this context ID", the audio accelerator 1 then processes Mes.3 according to the second operation configuration information CZXX1_2.
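The lookup of operation configuration information by context identifier can be sketched as follows. The configuration strings and queue numbers echo the example above, but the table structure, unit identifiers and function names are assumptions for illustration.

```c
#include <stddef.h>

typedef struct {
    int cid;                 /* context identifier, e.g. CID1             */
    int unit_id;             /* processing unit the configuration is for  */
    const char *operation;   /* operation to apply to received messages   */
} OpConfig;

static const OpConfig op_table[] = {
    { 1, 4, "FFT" },   /* hypothetical ID 4: signal processing accelerator 1 */
    { 1, 5, "MP4" },   /* hypothetical ID 5: audio accelerator 1             */
};

/* On receiving an event message, the unit fetches its configuration by CID. */
static const char *lookup_operation(int cid, int unit_id) {
    for (size_t i = 0; i < sizeof op_table / sizeof op_table[0]; i++)
        if (op_table[i].cid == cid && op_table[i].unit_id == unit_id)
            return op_table[i].operation;
    return NULL;             /* no operation configured for this context */
}
```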
In step S1203, the first processing unit sends the second event message to the second processing unit according to the context information, where the context information includes the route information from the first processing unit to the second processing unit.
Wherein the second processing unit may be a second micro-engine or a second accelerator, the context information being generated based on processing requests of the application.
In the implementation, when the first processing unit and the second processing unit transmit the event message, the method may specifically be: the first processing unit is a first micro-engine, the second processing unit is a second accelerator, or the first processing unit is a first accelerator, the second processing unit is a second micro-engine, or the first processing unit is a first micro-engine, the second processing unit is a second micro-engine, or the first processing unit is a first accelerator, and the second processing unit is a second accelerator.
Illustratively, when the first processing unit is the signal processing accelerator 1, the second processing unit is the micro engine XPU 0. The signal processing accelerator 1 sends the event message mes.2 to the microengine XPU 0 according to the context. The context includes the routing information of the signal processing accelerator 1 to the micro engine XPU 0.
One possible implementation manner is that the first processing unit sends the second event message to the second processing unit according to the context information, and the first processing unit can send the second event message to an event queue corresponding to the second processing unit according to the route information; the second processing unit then retrieves a second event message from the event queue.
In the embodiment of the application, each computing resource, including threads and accelerators, has its own event queue; for event messages that require processing by other computing resources, a thread or accelerator sends the message through its own event queue to the event queue of the downstream micro-engine/accelerator. It will be appreciated that the application/CPU may also have its own event queue, enabling the transfer of event messages between the application/CPU, threads, and accelerators. When a thread sends an event message through its corresponding event queue, the event message is forwarded through the event queue of the micro-engine on which the thread runs. In the embodiment of the present application, the event queue of a micro-engine is the event queue of the thread running on that micro-engine.
Referring to fig. 11, event queue 4 corresponds to the signal processing accelerator 1, event queue 3 corresponds to the audio collection thread, event queue 0 corresponds to the audio processing thread, and the audio accelerator 1 corresponds to event queue 5 in fig. 11. The audio acquisition thread on XPU 3 acquires the Data request Data-1 and then sends the event message Mes.1, generated according to the Data request Data-1, to event queue 4 through event queue 3 according to the routing information included in the context of the application program. In response to event queue 4 receiving the event message Mes.1, the signal processing accelerator 1 obtains the event message Mes.1 from event queue 4, processes it to generate the event message Mes.2, and then sends the event message Mes.2 to event queue 0 corresponding to XPU 0 according to the routing information included in the context of the application. The audio processing thread running on XPU 0 generates the event message Mes.3 based on the event message Mes.2 and then sends Mes.3 to event queue 5 through event queue 0 according to the routing information included in the context of the application. After Mes.3 is sent to event queue 5, in response to event queue 5 receiving the event message Mes.3, the audio accelerator 1 retrieves the event message Mes.3 from event queue 5 and processes it.
One possible implementation manner is that the second event message includes a target event queue identifier, where the target event queue identifier is a queue identifier of an event queue corresponding to the second processing unit.
Specifically, the first processing unit sends the second event message to the event queue corresponding to the second processing unit according to the routing information, which may be: the first processing unit determines event message routing information to be added in the second event message according to the routing information included in the context information, wherein the event message routing information comprises a target event queue identifier, and the target event queue identifier is a queue identifier of an event queue corresponding to the second processing unit; the first processing unit adds event message routing information in the second event message; the first processing unit sends a second event message added with the event message routing information, and the second event message added with the event message routing information is sent to an event queue corresponding to the second processing unit.
In the embodiment of the application, the event message routing information can also be called as circulation information, and the routing information included in the context information can also be called as circulation sequence information corresponding to the application program. The context identifier is used for indicating the context of the application program, and can indicate the circulation sequence information corresponding to the application program.
Illustratively, the signal processing accelerator 1 sends the event message Mes.2 to event queue 0 corresponding to the micro-engine XPU 0 according to the routing information included in the context of the application, which may be as follows: the signal processing accelerator 1 obtains the circulation sequence information corresponding to the application program according to the context identifier CID1 included in the event message Mes.2; assuming the circulation sequence information is "CID1, event queue 3, event queue 4, event queue 0, event queue 5", it characterizes a transfer order of the audio acquisition thread, the signal processing accelerator 1, the audio processing thread and the audio accelerator 1 in sequence, and the circulation information to be added to the event message Mes.2 is further determined according to the circulation sequence information. The circulation information includes a target event queue identifier; the target event queue identifier of the circulation information of the event message Mes.2 is the queue identifier of event queue 0 corresponding to the micro-engine XPU 0. Then, the signal processing accelerator 1 adds the determined circulation information to the event message Mes.2. Next, the signal processing accelerator 1 may send the event message Mes.2 with the added circulation information, and this event message Mes.2 is sent to event queue 0 corresponding to the micro-engine XPU 0.
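Determining the circulation information to add to an outgoing event message, given the circulation sequence recorded in the context, might be sketched as follows; the queue numbers follow the example above and the structures are assumptions.

```c
/* Circulation sequence of CID1 from the example: queues 3 -> 4 -> 0 -> 5. */
typedef struct {
    int cid;
    int queue_order[8];     /* event queues in transfer order */
    int hops;               /* number of queues in the order  */
} CirculationSeq;

/* Given the queue the current unit owns, return the target event queue
 * identifier to put into the circulation (flow) information of the message. */
static int next_target_queue(const CirculationSeq *seq, int my_queue) {
    for (int i = 0; i + 1 < seq->hops; i++)
        if (seq->queue_order[i] == my_queue)
            return seq->queue_order[i + 1];
    return -1;              /* last hop: no further processing unit */
}
```

For the signal processing accelerator 1 (event queue 4 in the example), this lookup would yield event queue 0, which is then written into the circulation information of event message Mes.2.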
In some embodiments of the present application, the routing information further includes a target routing domain, where the target routing domain is used to indicate a target server, and the target server is different from the source server, and the source server is a server where the first processing unit is located.
The circulation sequence information corresponding to the application program further includes a first target routing domain, and when determining the circulation information to be added in the event message mes.2, the circulation information further includes the first target routing domain, where the first target routing domain is used to indicate a first target server, and the first target server is not the same server as the source server where the signal processing accelerator 1 in fig. 11 is located.
It may be appreciated that in the embodiment of the present application, the thread or the accelerator may obtain the routing information according to the context and forward the next processing unit to the event message that needs to be processed downstream, where the unit may be the thread or the accelerator, or may be an application/CPU. The process of sending an event message from one processing unit to another processing unit is similar to the process of sending an event message mes.2 from the signal processing accelerator 1 to the micro engine XPU 0, and will not be described again here.
It should be noted that, in the process of processing an event message by the data processing system, after dynamic resource allocation is performed based on an event, when sequential transmission of the event message between different processing units is implemented according to a context, for a first processing unit of the processing units, a microengine is used, and a thread running on the microengine may acquire a data request and generate the first event message based on the data request. The data request is request information for requesting a response to specific data corresponding to a processing request of the application program. It should be noted that the processing request may be a data acquisition request or a data processing request. The data acquisition request is used for requesting to acquire target data corresponding to the data information contained in the request message, and the data processing request is used for requesting to process the data information contained in the request message.
Illustratively, consider the case where the Data request is a Data processing request Data-1, data-1 being used to request a response to a digital signal corresponding to a trigger event of "click talk key". When a trigger event of "click talk button" occurs, an application program is started, the Data processing system receives a Voice call processing request Voice01, an audio collection thread running on a micro engine XPU 3 collects an audio signal from a microphone through an ADC, obtains a Data request Data-1 corresponding to the trigger event of "click talk button", and generates an event message mes.1 according to the Data request Data-1, see FIG. 13.
It will be appreciated that in the video call terminal 1100 shown in fig. 11, if the first processing unit refers to the micro engine XPU0, the second processing unit refers to the audio accelerator 1. The video telephony terminal 1100 may transmit event messages between the micro engine XPU0 and the audio accelerator 1. The process of transmitting event messages between the microengine XPU0 and the audio accelerator 1 is similar to the process of transmitting event messages between the signal processing accelerator 1 and XPU 0. In the message process of the video call terminal 1100, an event message is transmitted between the micro engine XPU0 and the audio accelerator 1, firstly, the micro engine XPU0 acquires an event message mes.2; the micro engine XPU0 processes the event message Mes.2 to obtain an event message Mes.3; the micro engine XPU0 sends an event message mes.3 to the audio accelerator 1 according to the context. The context includes the routing information of the micro engine XPU0 to the audio accelerator 1.
The above mentioned event messages mes.1, mes.2, mes.3 comprise an identification of the context, e.g. context identification CID1. The context identification CID1 is used to indicate the context of an application.
It should be noted that the manner in which event messages are transmitted between the different processing units is similar to the manner in which event messages are transmitted from accelerator to micro-engine and vice versa. Therefore, the transmission process of the event message from accelerator to accelerator and from micro engine to micro engine is not described in detail.
In some alternative embodiments, the message processing method further comprises releasing a first thread, the first thread being one of the at least two threads; if, after the first thread is released, no thread runs on the engine where the first thread was released, the engine is closed.
In a specific implementation, in response to receiving an instruction for releasing the first thread, the first thread running on the engine is released; if, after the first thread is released, no thread runs on the engine where the first thread was released, the engine is closed.
The instruction for releasing the first thread may be generated in response to the occurrence of a release event corresponding to the trigger event, and after receiving the instruction for releasing the first thread, the data processing system releases the first thread running on the first microengine. The release event is an event configured to stop data processing corresponding to the processing request after the processing request is started.
For example, for the video telephony terminal 1100 shown in fig. 11, the release event may be clicking the stop call key or hanging up the video call. When the user clicks the stop call key, the video call terminal 1100 releases the audio collection thread running on the XPU 3 in response to receiving the instruction to release the audio collection thread, which corresponds to the occurrence of the second event "click the stop call key". After the audio collection thread running on the XPU 3 is released, if no running thread exists on the XPU 3, the XPU 3 is closed, realizing near-zero standby power consumption.
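A minimal sketch of this release-and-close behaviour, assuming a simple bookkeeping structure for a microengine, is given below; the MicroEngine structure and thread identifiers are illustrative assumptions, not part of the embodiment.

```cpp
#include <cstdint>
#include <iostream>
#include <set>

// Hypothetical microengine bookkeeping: which thread IDs are currently running on it.
struct MicroEngine {
    uint32_t id;
    std::set<uint32_t> running_threads;
    bool powered_on;
};

// Release a thread in response to a release event (e.g. "click the stop call key").
// If no thread remains on the engine, the engine itself is powered off.
void ReleaseThread(MicroEngine& engine, uint32_t thread_id) {
    engine.running_threads.erase(thread_id);
    if (engine.running_threads.empty()) {
        engine.powered_on = false;  // close the engine for near-zero standby power
    }
}

int main() {
    MicroEngine xpu3{/*id=*/3, /*running_threads=*/{42}, /*powered_on=*/true};
    ReleaseThread(xpu3, 42);  // 42 stands in for the audio collection thread
    std::cout << "XPU 3 powered on: " << std::boolalpha << xpu3.powered_on << "\n";
    return 0;
}
```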
In an embodiment of the present application, a data request is a request to respond to specific data corresponding to a processing request of an application program. The processing request may be a data acquisition request or a data processing request, where the data acquisition request is used to request acquisition of data information, and the data processing request is used to request processing of the data information included in the request message. Accordingly, in some embodiments, the data request may be a request to acquire the specific data corresponding to the processing request of the application program; in other embodiments, the data request may be a request to process the specific data corresponding to the processing request of the application program.
One possible implementation manner is that the data request is used for requesting to acquire target data, the target data is stored in a memory of the second server, and the computing resource for executing the processing request further comprises a third processing unit and a fourth processing unit; the at least two engines comprise a first processing unit, a second processing unit and a third processing unit; the fourth processing unit is an accelerator; the first event message and the second event message comprise identifications of target data, the first processing unit and the second processing unit are located in a first server, and the third processing unit and the fourth processing unit are located in a second server; the context also comprises route information from the second processing unit to the third processing unit and from the third processing unit to the fourth processing unit;
After the first processing unit sends the second event message to the second processing unit according to the context, the method further comprises:
the second processing unit encapsulates the second event message based on the second event message to generate a third event message;
the second processing unit sends a third event message to a third processing unit located at the second server according to the context;
The third processing unit decapsulates the third event message based on the third event message to obtain a fourth event message, and sends the fourth event message to the fourth processing unit according to the context;
The fourth processing unit obtains the identification of the target data from the received fourth event message, obtains the target data from the memory of the second server according to the identification of the target data, and obtains a fifth event message according to the target data; the fifth event message is used to send the target data to the first server.
Illustratively, assume that the data request is a data acquisition request Req1, where Req1 is used to request acquisition of target data stored in the memory of the second server S2, and the computing resources for executing the processing request include the micro engine XPU 3', the micro engine XPU 1', the micro engine XPU 0", and the semantic memory accelerator 1". The event message Mes.1' and the event message Mes.2' include the identification DTM1 of the target data; the micro engines XPU 3' and XPU 1' are located on the first server S1, and the micro engine XPU 0" and the semantic memory accelerator 1" are located on the second server S2. The context at least includes routing information from the micro engine XPU 3' to the micro engine XPU 1', from the micro engine XPU 1' to the micro engine XPU 0", and from the micro engine XPU 0" to the semantic memory accelerator 1". The event message processing method includes the following steps: the micro engine XPU 3' sends the event message Mes.1' to the micro engine XPU 1' according to the context; the micro engine XPU 1' encapsulates the event message Mes.1' to generate the event message Mes.2', where, for example, the event message Mes.2' may be the first Ethernet frame YTZ01; the micro engine XPU 1' sends the event message Mes.2' to the micro engine XPU 0" at the second server S2 according to the context; the micro engine XPU 0" decapsulates the event message Mes.2' to obtain the event message Mes.3', and sends the event message Mes.3' to the semantic memory accelerator 1" according to the context; the semantic memory accelerator 1" obtains the identification DTM1 of the target data from the received event message Mes.3', obtains the target data Tar_Data1 from the memory of the second server S2 according to the identification DTM1, and sends the target data Tar_Data1 to the first server S1.
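The following C++ sketch illustrates, under assumed message and frame layouts, how the encapsulation at the micro engine XPU 1' and the decapsulation at the micro engine XPU 0" might look; the serialization format and field names are assumptions for illustration only.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical on-the-wire view: an event message carried inside an Ethernet frame.
struct EventMessage {
    uint32_t context_id;
    uint32_t target_event_queue;
    std::vector<uint8_t> payload;  // e.g. the identification of the target data
};

struct EthernetFrame {
    uint8_t dst_mac[6];
    uint8_t src_mac[6];
    uint16_t vlan_id;               // VLAN assumed to be reserved for data sharing
    std::vector<uint8_t> body;      // serialized event message
};

// Encapsulation performed by the second processing unit (e.g. micro engine XPU 1').
EthernetFrame Encapsulate(const EventMessage& msg, uint16_t vlan_id) {
    EthernetFrame frame{};
    frame.vlan_id = vlan_id;
    frame.body.push_back(static_cast<uint8_t>(msg.context_id));
    frame.body.push_back(static_cast<uint8_t>(msg.target_event_queue));
    frame.body.insert(frame.body.end(), msg.payload.begin(), msg.payload.end());
    return frame;
}

// Decapsulation performed by the third processing unit on the second server.
EventMessage Decapsulate(const EthernetFrame& frame) {
    EventMessage msg;
    msg.context_id = frame.body.at(0);
    msg.target_event_queue = frame.body.at(1);
    msg.payload.assign(frame.body.begin() + 2, frame.body.end());
    return msg;
}

int main() {
    EventMessage mes2{/*context_id=*/2, /*target_event_queue=*/7, {0xD1}};  // 0xD1 stands in for DTM1
    EthernetFrame ytz01 = Encapsulate(mes2, /*vlan_id=*/100);
    EventMessage mes4 = Decapsulate(ytz01);
    std::cout << "Recovered CID" << mes4.context_id
              << ", target queue EQ" << mes4.target_event_queue << "\n";
    return 0;
}
```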
Based on the embodiment provided by the application, the transmission of the event message among different processing units is realized based on the context, and compared with the transmission scheduling of the event message by adopting a scheduling mode (such as using a scheduler to schedule the message), the method can avoid the performance bottleneck caused by the transmission scheduling, and further can improve the processing performance of the system.
The message processing method of the application is applicable to scenarios such as edge intelligent computing, high-performance supercomputing centers, autonomous driving vehicles, robots, unmanned factories and unmanned mines, which require both substantial computing power and high energy efficiency. The message processing method provided by the embodiment of the application is further described below with edge intelligent computing and high-performance supercomputing as the two main scenarios.
Embodiment one:
At present, video call terminals support artificial intelligence computation such as face recognition and background replacement; their computing power requirements keep increasing while low power consumption is also required, especially in scenarios such as mobile office and emergency command. In this embodiment, a video call terminal is used as a typical scenario of edge intelligent computing; the video call terminal is configured with a data processing system, and the structural relationship of the computing resources of the video call terminal is shown in fig. 14.
The following describes an implementation in which call-related threads are dynamically deployed in an event-triggered manner to realize the data session of a voice call, thereby offloading the software computing load. The event may be a call connection.
The voice session of the video call terminal may involve audio acquisition, transforms such as FFT, audio codec, and data exchange with the call counterpart over a TCP/IP connection. The voice call application of the present application creates three threads on different microengines through the high dynamic operating system, wherein:
The audio acquisition thread is mainly responsible for acquiring audio signals from a microphone through an ADC (analog to digital converter), and acquiring audio digital signals according to a fixed time slice, such as 1ms, and packaging the audio digital signals into event messages;
The audio processing thread is mainly responsible for converting the audio signals subjected to noise removal and other processing into audio transmission messages according to MP3 or H264 coding formats;
The TCP/IP thread is mainly responsible for establishing and maintaining the IP session connection with the opposite terminal of the call, and the voice session has an independent port number.
After the data processing software package of the voice call is developed through the application layer, the resource configuration information of the data processing software package is loaded and registered through the high dynamic operating system. After the configuration operation, the voice call application program is installed on the video call terminal.
Wherein the resource configuration information includes, but is not limited to, some or all of the following: accelerator type, number of accelerators, number of microengines, operational configuration information, flow order information, and trigger events. Wherein the flow order information characterizes an order in which respective computing resources corresponding to processing requests of the application respond to the processing requests. Wherein the operation configuration information and the circulation order information may be obtained through data session information set by the application layer.
Illustratively, the accelerator types in the resource configuration information of the voice call application may be a signal processing accelerator, an audio processing accelerator, and a session connection accelerator; the numbers of accelerators corresponding to the three types of accelerators may be 1, 1, and 1, respectively; and the number of microengines may be "3". The accelerator number "1" for the signal processing accelerator characterizes that the high dynamic operating system configures one signal processing accelerator for the voice call application according to that number. Assume that the signal processing accelerator, the audio processing accelerator and the session connection accelerator are respectively configured as: signal processing accelerator A, audio processing accelerator B, and session connection accelerator C. The triggering event of the voice call application may be call connection. Call connection is a preset event for initiating a session processing request after the data processing system loads the data processing software package of the voice call application.
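For illustration, the resource configuration information described above could be represented as in the following C++ sketch; the ResourceConfig structure and its field names are assumptions, and the concrete values mirror the voice call example rather than any mandated format.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical resource configuration record registered with the high dynamic
// operating system when the voice call application is installed.
struct ResourceConfig {
    std::vector<std::string> accelerator_types;  // e.g. signal / audio / session
    std::vector<uint32_t> accelerator_counts;    // e.g. 1, 1, 1
    uint32_t microengine_count;                  // e.g. 3
    std::vector<std::string> operation_config;   // per-unit operation configuration
    std::vector<uint32_t> flow_order;            // event queue IDs in response order
    std::string trigger_event;                   // e.g. "call connection"
};

ResourceConfig VoiceCallConfig() {
    return ResourceConfig{
        {"signal processing", "audio processing", "session connection"},
        {1, 1, 1},
        3,
        {"FFT on event messages of this context", "MP3/H264 encode", "TCP/IP session"},
        {0, 1, 3, 2, 5, 4},  // EQ0 -> EQ1 -> EQ3 -> EQ2 -> EQ5 -> EQ4
        "call connection"};
}

int main() {
    ResourceConfig cfg = VoiceCallConfig();
    return cfg.microengine_count == 3 ? 0 : 1;  // trivial self-check
}
```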
When the user continues the call, a session processing request Chat01 is sent out, and the voice call application program is started. The following describes in detail the process of configuring computing resources by the highly dynamic operating system when the voice call application is started:
Step K1, in response to an instruction for starting an application program, the high dynamic operating system determines computing resources used by the application program according to the resource configuration information of the application program, and in response to a session processing request, determines a task corresponding to the processing request: audio acquisition tasks, audio processing tasks, session connection tasks.
As shown in fig. 14, the computing resources include a micro engine XPU 3, a signal processing accelerator a, a micro engine XPU0, an audio processing accelerator B, a micro engine XPU 2, and a session connection accelerator C; the signal processing accelerator A corresponds to the event queue EQ 1; the audio processing accelerator B corresponds to the event queue EQ 2; the session connection accelerator C corresponds to the event queue EQ 4; the micro engine XPU 3 corresponds to the event queue EQ 0; the micro engine XPU0 corresponds to the event queue EQ 3; the micro engine XPU 2 corresponds to the event queue EQ 5. When a trigger event of 'call connection' occurs, a session processing request Chat01 is started, a task corresponding to the session processing request Chat01 is determined in response to the session processing request Chat01 corresponding to 'call connection' of an application program, wherein the task at least comprises a first task, a second task and a third task, for example, the first task is an audio acquisition task, the second task is an audio processing task and the third task is a session connection task.
In specific implementation, the resource configuration information includes the number of engines, and the type and number of accelerators; when an application program is started, resource configuration information of the application program is obtained in response to the starting of the application program, an engine used by the application program is selected according to the resource configuration information and the load of the candidate engine, and an accelerator used by the application program is selected according to the resource configuration information, wherein the selected accelerator comprises a first accelerator and a second accelerator.
As an example, the accelerator types in the resource configuration information of the voice call application may include "signal processing accelerator", and the number of accelerators corresponding to the "signal processing accelerator" type is "3". When determining the computing resources used by the voice call application according to its resource configuration information, an accelerator pool corresponding to the accelerator type "signal processing accelerator" may be determined according to that accelerator type, and 3 accelerators may be selected from the accelerator pool according to the accelerator number "3", the 3 accelerators being, respectively: signal processing accelerator A, audio processing accelerator B and session connection accelerator C. Similarly to the process of determining the accelerators, assuming that the number of microengines included in the resource configuration information is "3", the high dynamic operating system selects 3 microengines according to the microengine number "3" and the load of the candidate engines, for example obtaining the microengines XPU 3, XPU 0 and XPU 2. When selecting the engines used by the application program, in some embodiments, a specified number of microengines may be selected from the candidate engines in order of load rate from low to high; in other embodiments, a specified number of microengines meeting a load requirement may be selected from the candidate engines based on the load requirement, where the load requirement may be obtained from the resource configuration information.
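A minimal sketch of the "load rate from low to high" engine selection described above is shown below; the CandidateEngine structure and the load values are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical candidate engine with a load rate between 0.0 and 1.0.
struct CandidateEngine {
    uint32_t id;
    double load_rate;
};

// Pick the requested number of microengines in order of lowest load first,
// matching the "load rate from low to high" selection described above.
std::vector<uint32_t> SelectEngines(std::vector<CandidateEngine> candidates, size_t count) {
    std::sort(candidates.begin(), candidates.end(),
              [](const CandidateEngine& a, const CandidateEngine& b) {
                  return a.load_rate < b.load_rate;
              });
    std::vector<uint32_t> selected;
    for (size_t i = 0; i < count && i < candidates.size(); ++i) {
        selected.push_back(candidates[i].id);
    }
    return selected;
}

int main() {
    std::vector<CandidateEngine> pool{{0, 0.2}, {1, 0.7}, {2, 0.1}, {3, 0.05}};
    for (uint32_t id : SelectEngines(pool, 3)) std::cout << "XPU " << id << "\n";
    return 0;
}
```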
Step K2, after generating an audio acquisition task, an audio processing task and a session connection task corresponding to the processing request in response to the session processing request Chat01 of the application program, creating an audio acquisition thread for processing the audio acquisition task on XPU 3, creating an audio processing thread for processing the audio processing task on XPU 0, creating a TCP/IP thread for processing the session connection task on XPU 2, and determining computing resources corresponding to the audio acquisition task, the audio processing task and the session connection task.
The computing resources corresponding to the audio acquisition task include an XPU 3 and a signal processing accelerator a, the computing resources corresponding to the audio processing task include an XPU 0 and an audio processing accelerator B, and the computing resources corresponding to the session connection task include an XPU2 and a session connection accelerator C, as shown in fig. 14.
It should be noted that, in the embodiment of the present application, when a thread sends an event message through its corresponding event queue, the thread forwards the event message, specifically through the event queue of the micro engine in which the thread is created. In the message processing method, in the process of configuring the computing resources by the high-dynamic operating system, after the computing resources are allocated to a plurality of tasks including a first task and a second task in response to the received processing request, threads corresponding to the tasks are created; alternatively, the threads corresponding to the respective tasks may be created first, and then the computing resources corresponding to the plurality of tasks including the first task and the second task may be determined.
And step K3, distributing a context identifier for indicating the context according to the resource configuration information.
The context comprises operation configuration information corresponding to the application program.
The resource configuration information includes operational configuration information for computing resources; computing resources include microengines and accelerators; and when the application program is started, the context identifier is allocated according to the resource configuration information. The context identification is used to indicate operation configuration information corresponding to the same processing request of the application program. The context identification is included in all event messages corresponding to the same processing request of the application.
For example, the operation configuration information may be a data Session set by the user through the application layer, and the context identifier of the voice call application for indicating the context may be CID2 obtained according to a data Session set by the user through the application layer, for example, "Create Session (CID 2, in: ADC, via: FFT, …, out: framer, bit width, sampling point number, period, data sub-block time slice, double floating point precision, …)".
In some embodiments, the context identifier is further used to indicate the circulation order information corresponding to the application program; the computing resources used by the application program send event messages to the next station according to the flow order information.
Assume that according to a data session set by the user through the application layer, for example "Create Session (CID2, in: ADC, via: FFT, …, out: framer, bit width, sampling point number, period, data sub-block time slice, double floating point precision, …)", the obtained circulation order information is "CID2, event queue EQ0, event queue EQ1, event queue EQ3, event queue EQ2, event queue EQ5, event queue EQ4, …", which characterizes the transfer order as: audio collection thread, signal processing accelerator A, audio processing thread, audio processing accelerator B, TCP/IP thread, session connection accelerator C. The computing resources used by the voice call application send event messages to the next station based on the circulation order information indicated by CID2.
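As an illustrative sketch, the circulation order information indicated by a context identifier can be modelled as an ordered list of event queue IDs; the following C++ example, with assumed table and function names, shows how a processing unit might look up the next station for CID2.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <vector>

// Hypothetical flow-order table: a context identifier maps to the ordered list of
// event queues that event messages visit, e.g. CID2 -> EQ0, EQ1, EQ3, EQ2, EQ5, EQ4.
using FlowOrderTable = std::map<uint32_t, std::vector<uint32_t>>;

// Given the context identifier and the event queue of the current station,
// return the event queue of the next station, or -1 if the chain has ended.
int NextStation(const FlowOrderTable& table, uint32_t cid, uint32_t current_queue) {
    auto it = table.find(cid);
    if (it == table.end()) return -1;
    const std::vector<uint32_t>& order = it->second;
    for (size_t i = 0; i + 1 < order.size(); ++i) {
        if (order[i] == current_queue) return static_cast<int>(order[i + 1]);
    }
    return -1;
}

int main() {
    FlowOrderTable table{{2, {0, 1, 3, 2, 5, 4}}};                         // CID2 flow order
    std::cout << "After EQ1 comes EQ" << NextStation(table, 2, 1) << "\n";  // prints EQ3
    return 0;
}
```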
And step K4, a first route Line1 is established between the XPU 3 and the signal processing accelerator A, a second route Line2 between the XPU 0 and the audio processing accelerator B, a third route Line3 between the signal processing accelerator A and the XPU 0, a fourth route Line4 between the audio processing accelerator B and the XPU 2, and a fifth route Line5 between the XPU 2 and the session connection accelerator C.
In a specific implementation, the first route Line1 between the XPU 3 and the signal processing accelerator A may be set as first routing information Line1_LM1 corresponding to the audio collection thread, where the first routing information Line1_LM1 includes a first target event queue identifier Line1_TQM1, and the first target event queue identifier Line1_TQM1 identifies the event queue EQ1 shown in fig. 14; the event message Mes.1 includes the first routing information Line1_LM1. In other words, a communication link is set between the audio collection thread and the event queue EQ1, where the event queue EQ1 corresponds to the signal processing accelerator A, and the communication link set between the audio collection thread and the event queue EQ1 is the first route Line1.
The second route Line2 between the XPU 0 and the audio processing accelerator B may be set as second routing information Line2_LM2 corresponding to the audio processing thread, where the second routing information Line2_LM2 includes a second target event queue identifier Line2_TQM2, the second target event queue identifier Line2_TQM2 identifies the event queue EQ2, and the second event message Mes.3 includes the second routing information Line2_LM2.
The setup process of Line3 to Line5 is similar to the setup process of Line1 and Line2, and will not be described again here.
In an embodiment of the present application, the event message further includes routing domain information.
For example, the first routing information Line1_LM1 further includes a first target routing domain, where the first target routing domain is used to indicate a first target server, which may be a different server than the source server where the XPU 3 in fig. 14 is located.
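The routes Line1 to Line5 and the target routing domain described above can be pictured as entries in a per-context routing table; the following C++ sketch is an assumption-based illustration, with the target event queue numbers inferred from the queue assignments of fig. 14, not a definitive implementation.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Hypothetical routing entry: the routing information stored in the context for one
// hop, i.e. which target event queue (and optionally target routing domain) a
// processing unit should send its output event message to.
struct RouteInfo {
    uint32_t target_event_queue;  // e.g. EQ1 for Line1
    std::string target_domain;    // e.g. a target server; empty means the local server
};

// Route table keyed by a hop name such as "Line1".
std::map<std::string, RouteInfo> BuildVoiceCallRoutes() {
    return {
        {"Line1", {1, ""}},  // audio collection thread (XPU 3) -> signal accelerator A (EQ1)
        {"Line2", {2, ""}},  // audio processing thread (XPU 0) -> audio accelerator B (EQ2)
        {"Line3", {3, ""}},  // signal accelerator A -> audio processing thread (EQ3)
        {"Line4", {5, ""}},  // audio accelerator B -> TCP/IP thread (EQ5)
        {"Line5", {4, ""}},  // TCP/IP thread (XPU 2) -> session accelerator C (EQ4)
    };
}

int main() {
    auto routes = BuildVoiceCallRoutes();
    std::cout << "Line1 targets EQ" << routes["Line1"].target_event_queue << "\n";
    return 0;
}
```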
After the application program is configured, the data processing system can normally operate. The following exemplifies the data processing after the voice call application is started.
When the voice call application program is started, the following data processing process is executed when the audio data corresponding to the call connection of the user call is received:
In step L1, in response to receiving the data request Data-1' of the audio collection task, the audio collection thread for processing the audio collection task sends the event message Mes.1_1 generated according to the data request Data-1' to the event queue EQ1 corresponding to the audio collection task according to the context, see fig. 15. In response to the event queue EQ1 receiving the event message Mes.1_1, the signal processing accelerator A corresponding to the audio collection task processes Mes.1_1, generates the event message Mes.2_1 according to the processing result, and sends the event message Mes.2_1 to the audio processing thread for processing the audio processing task according to the context.
In a specific implementation, the context identifier CID2 is used to indicate the context corresponding to the application program, where the context includes routing information representing the sequential transfer of event messages among the microengine XPU 3, the signal processing accelerator A, the microengine XPU 0, the audio processing accelerator B, the microengine XPU 2, and the session connection accelerator C. In the embodiment of the application, the routing information included in the context may also be referred to as the circulation order information corresponding to the application program. The context identifier is included in each event message; for example, the context identifier CID2 is included in the event message Mes.1_1, the event message Mes.2_1, the event message Mes.3_1, and so on.
The audio collection thread obtains the first circulation information for the audio collection thread in the circulation sequence information corresponding to the application program according to the context identifier CID2 included in the event message mes.1_1, and sends the event message mes.1_1 generated according to the Data request Data-1' to the event queue EQ1 corresponding to the audio collection task according to the first circulation information for the audio collection thread.
Wherein the flow information may be an identification of the event queue. Specifically, the first circulation information for the audio collection thread may be an identification of an event queue EQ 1; the second streaming information for the signal processing accelerator a may be an identification of an event queue EQ3 corresponding to the audio processing thread.
One possible implementation manner is that the signal processing accelerator a processes the first event message in the event queue EQ1, specifically: the signal processing accelerator a obtains corresponding first operation configuration information for the signal processing accelerator a according to the context identifier included in the first event message, and processes the first event message according to the first operation configuration information for the signal processing accelerator a.
In particular implementations, the context includes operational configuration information for the computing resource; computing resources include microengines and accelerators; when the application program is started, the context and the context identification are distributed according to the operation configuration information. The context identification is used to indicate a context corresponding to the same processing request of the application. The context identification is included in the first event message and the second event message.
For example, assume that the first operation configuration information for the signal processing accelerator A specifies that received event messages carrying this context ID are to be subjected to FFT and similar transforms. In specific execution, the signal processing accelerator A acquires the corresponding first operation configuration information according to the context identification CID2 included in the first event message Mes.1_1, and performs FFT and other transforms on the first event message Mes.1_1 according to that first operation configuration information.
It can be appreciated that, from the perspective of the event queue, when the event queue of the signal processing accelerator A receives an event message, the signal processing accelerator A can be triggered to respond to the event message in real time by means of an asynchronous handshake signal, find the corresponding operation configuration information according to CID2, and perform transformations such as FFT according to the specification.
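A minimal sketch of the event-triggered processing described above, in which the accelerator looks up its operation configuration by the context identifier carried in the event message, is shown below; the table layout and the use of a named transform instead of a real FFT are assumptions for illustration.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical event message and per-context operation configuration lookup.
struct EventMessage {
    uint32_t context_id;
    std::vector<double> samples;
};

// The accelerator keeps a table: context identifier -> operation configuration.
// Here the "operation" is just a named transform; a real signal processing
// accelerator would run an FFT or similar on the payload.
std::map<uint32_t, std::string> kOperationConfig = {{2, "FFT"}};

void OnEventQueueReceive(const EventMessage& msg) {
    auto it = kOperationConfig.find(msg.context_id);
    if (it == kOperationConfig.end()) return;  // unknown context: ignore the message
    // Respond to the event message in real time and apply the configured transform.
    std::cout << "CID" << msg.context_id << ": applying " << it->second
              << " to " << msg.samples.size() << " samples\n";
}

int main() {
    OnEventQueueReceive(EventMessage{/*context_id=*/2, {0.1, 0.2, 0.3}});
    return 0;
}
```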
In step L2, the audio processing thread generates the event message Mes.3_1 based on the event message Mes.2_1 and sends the event message Mes.3_1 to the event queue EQ2 corresponding to the audio processing task according to the context. In response to the event queue EQ2 receiving the event message Mes.3_1, the audio processing accelerator B processes the event message Mes.3_1, generates the event message Mes.5_1 according to the processing result, and sends the event message Mes.5_1 to the TCP/IP thread for processing the session connection task according to the context.
The process that the audio processing thread sends the event message mes.3_1 to the event queue EQ2 corresponding to the audio processing task according to the context, and the process that the audio processing accelerator B sends the event message mes.5_1 to the TCP/IP thread for processing the session connection task according to the context are similar to the process that the audio acquisition thread sends the event message mes.1_1 to the event queue EQ1 corresponding to the audio acquisition task according to the context, which are not repeated herein.
Illustratively, the second operation configuration information for the audio processing accelerator B may specify that received event messages carrying the context ID are to be subjected to FFT or similar transforms. The process of the audio processing accelerator B processing the event message Mes.3_1 is similar to the process of the signal processing accelerator A processing the first event message in the event queue EQ1, and detailed description thereof is omitted.
In step L3, the TCP/IP thread generates an event message mes.6_1 based on the event message mes.5_1, and sends the event message mes.6_1 to the event queue EQ4 corresponding to the session connection task according to the context, and in response to the event queue EQ4 receiving the event message mes.6_1, the session connection accelerator C corresponding to the session connection task processes the event message mes.6_1.
After this, the session connection accelerator C may also transmit the processing result data to the corresponding next station according to the context. For example, it may be to generate a new event message, assuming the new event message is event message mes.7_1, and send the event message mes.7_1 to a later node, such as a network card, application/CPU or other thread or accelerator, etc., depending on the context.
Corresponding to the trigger event "call continuation", the release event of the voice call application of this embodiment may be "call refusal". When the user performs the call-refusing operation, the voice call application responds to the "call refusal" release event and releases the audio collection thread running on the XPU 3. After the audio collection thread running on the XPU 3 is released, if no running thread exists on the XPU 3, the XPU 3 is further closed, realizing near-zero standby power consumption.
In this embodiment, a high dynamic computing mode is adopted, so that a high-clock-frequency CPU and a PCI-E bus are not needed, which can greatly reduce the manufacturing cost of the system; the microengines, accelerators and the like can be dynamically started and closed, which greatly reduces the system power consumption and gives the system longer endurance; and resources such as the microengines and accelerators remain unchanged once allocated, so a deterministic service experience can be ensured.
Embodiment two:
Data-driven computing technologies such as machine learning are widely adopted in high-performance supercomputing applications such as weather forecasting, petroleum exploration and pharmacy. This exposes a key problem: mass data sharing. Static data and dynamic data need to be shared among thousands or even tens of thousands of servers, and the required transmission delay keeps shrinking, possibly to less than a microsecond. This embodiment describes a technical scheme in which high dynamic computing implements massively parallel computing with mass data sharing; it focuses on the mechanism for implementing data sharing, while the other mechanisms can fully reuse the implementation scheme of edge intelligent computing described above.
First, high dynamic computing adopts a semantics-driven data sharing mode: the application layer defines a data semantic context so that massive shared data is structured and loaded into memory; the application layer then defines a computation semantic context so that computing tasks are deployed onto servers closer to the data, and the corresponding routes are adjusted to optimize network transmission delay, which reduces data transmission delay, improves parallel computing performance and reduces power consumption. The mapping mechanism of semantics between the application layer and the hardware layer is shown in fig. 16. The application layer performs hierarchical semantic definition of the multi-scale data, for example by administrative region, from the root down through the levels in fig. 16; it then specifies the event queue ID of the corresponding storage server and allocates the definition of the hierarchical storage locations, such as the corresponding object IDs and grid IDs. Through the event queue ID, a request for the storage information of a data access can be sent to the corresponding server; the shared memory accelerator of that server then parses the storage information, finds the corresponding page table data by the ID, encapsulates the event message corresponding to the storage information and returns it to the data-requesting server.
In order to reuse the data center network as much as possible, this scheme uses a network card or an intelligent network card to connect to the data center network. The high dynamic computing system scheme of the supercomputing server is shown in fig. 17: the network card is connected to a microengine, and an accelerator for semantics-driven memory is added. The microengine deploys an Ethernet processing protocol to identify event messages destined for accelerators, such as event messages of the semantic memory accelerator; once identified, the event message, e.g. a request message, is forwarded to the semantic memory accelerator via the routing network according to the local data context, the corresponding data is found according to the semantics defined above, and the data message is then sent back to the source server as a corresponding event queue message. Each server corresponds to a routing domain, and the application layer semantic creation is assigned to the event queue ID of a particular semantic accelerator.
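As an assumption-based illustration, the identification step performed by the microengine attached to the network card might look like the following sketch, where the set of event queue IDs belonging to semantic memory accelerators stands in for the local data context.

```cpp
#include <cstdint>
#include <iostream>
#include <map>

// Hypothetical identification step performed by the microengine attached to the
// network card: if an incoming Ethernet payload carries an event queue ID that
// belongs to the semantic memory accelerator, forward it over the routing network.
struct IncomingEvent {
    uint32_t event_queue_id;  // destination event queue taken from the frame
    uint32_t object_id;       // semantic object requested by the remote thread
};

// Local data context: which event queue IDs belong to semantic memory accelerators.
std::map<uint32_t, bool> kSemanticQueues = {{9, true}};

void HandleIncoming(const IncomingEvent& ev) {
    if (kSemanticQueues.count(ev.event_queue_id)) {
        // Forward to the semantic memory accelerator through the routing network.
        std::cout << "forward object " << ev.object_id
                  << " request to semantic accelerator via EQ" << ev.event_queue_id << "\n";
    } else {
        std::cout << "not a semantic memory request; handled elsewhere\n";
    }
}

int main() {
    HandleIncoming({9, 1234});
    return 0;
}
```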
Taking the remote access of semantic data by parallel computing threads as an example, the interactive flow of data sharing is described, with specific reference to fig. 18, the main steps are as follows:
1) The parallel computing thread of server 1 finds the corresponding semantic ID according to the object required by the computation, constructs an event queue message according to the peer event queue ID of that semantic ID and the routing domain of the server it belongs to, and forwards the message to the Ethernet protocol processing thread according to the context of the remote data session;
2) The Ethernet protocol processing thread of server 1 receives the event queue message, finds the MAC address of the peer and the VLAN ID (virtual local area network identifier) dedicated to data sharing according to the routing domain in the routing range field, constructs the Ethernet protocol frame header carrying the event message, and forwards it to the network card; the network card forwards it through the data center switch until it reaches server 2;
3) The Ethernet protocol processing thread of server 2 parses the Ethernet protocol frame received by the network card of server 2 to obtain the event message, and forwards the event message over the internal routing network to the semantic memory accelerator according to the event queue ID;
4) The semantic memory accelerator of server 2 parses the event message, extracts the object ID and the like, maps it to local memory, obtains the corresponding data, and then forwards the data to the server requesting it according to the source routing information of the event queue; the subsequent flow is consistent with the above and is not repeated here. A simplified sketch of this last step is given below.
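The sketch below illustrates only step 4, under assumed structures: the semantic memory accelerator maps the object ID to local memory and replies to the source server identified by the source routing information; the field names and the in-memory object table are assumptions for illustration.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <vector>

// Hypothetical final step of the data-sharing flow: the semantic memory accelerator
// of server 2 maps the object ID to local memory and returns the data to the
// requesting server identified by the source routing information of the event queue.
struct DataRequestEvent {
    uint32_t object_id;
    uint32_t source_routing_domain;  // identifies the server that asked for the data
    uint32_t source_event_queue;     // queue to reply to on that server
};

std::map<uint32_t, std::vector<uint8_t>> kLocalObjects = {{7, {1, 2, 3, 4}}};

void ServeSharedData(const DataRequestEvent& req) {
    auto it = kLocalObjects.find(req.object_id);
    if (it == kLocalObjects.end()) return;  // object not stored on this server
    std::cout << "reply " << it->second.size() << " bytes of object " << req.object_id
              << " to routing domain " << req.source_routing_domain
              << ", EQ" << req.source_event_queue << "\n";
}

int main() {
    ServeSharedData({/*object_id=*/7, /*source_routing_domain=*/1, /*source_event_queue=*/5});
    return 0;
}
```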
According to this embodiment, the semantic data sharing mechanism of the high dynamic computing mode reduces the software processing overhead, shortens the transmission delay of data sharing across servers, improves the parallelism of multiple computing tasks within a server, improves the performance of the whole supercomputing center, and reduces power consumption.
The message processing method according to the embodiment of the present application has been described in detail above with reference to fig. 1 to 18. Based on the same technical concept as the above-described message processing method, the embodiment of the present application further provides a message processing apparatus 1900. As shown in fig. 19, the message processing apparatus 1900 includes a first execution module 1901, and the apparatus 1900 may be used to implement the methods described in the method embodiments of message processing described above.
A first execution module 1901, configured to process, by the first processing unit, a first event message to obtain a second event message, where the first event message is received by the first processing unit, or the first event message is generated by the first processing unit based on a processing request of an application program;
Sending, by the first processing unit, a second event message to the second processing unit according to context information, the context information including routing information from the first processing unit to the second processing unit, the context information being generated based on a processing request of the application;
The first processing unit is a first engine, the second processing unit is a second accelerator, or the first processing unit is a first accelerator, the second processing unit is a second engine, or the first processing unit is a first engine, the second processing unit is a second engine, or the first processing unit is a first accelerator, and the second processing unit is a second accelerator.
In one possible design, the message processing apparatus 1900 further includes a resource configuration module 1902, the resource configuration module 1902 being configured to:
Receiving a processing request from an application program;
Determining computing resources according to processing requests of the application program, wherein the computing resources comprise a first processing unit and a second processing unit;
context information is generated according to a processing request of an application program.
In one possible design, the first processing unit or the second processing unit is selected from the plurality of processing units by the resource configuration module 1902 based on state information of the plurality of processing units at the time of receipt of a processing request of the application, the state information of the processing units including network topology performance.
In one possible design, resource configuration module 1902 is further configured to:
Determining at least two tasks included in the processing request;
creating at least two threads corresponding to at least two tasks;
At least two threads are loaded onto at least two engines for execution, wherein different threads are run on different engines.
In one possible design, resource configuration module 1902 is specifically configured to:
acquiring semantics of a processing request, wherein the semantics of the processing request comprise at least two task semantics;
And determining a corresponding task according to each task semantic in the at least two task semantics.
It should be noted that, in the embodiment of the present application, the division of the modules is merely schematic, and there may be another division manner in actual implementation, and in addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or may exist separately and physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to execute all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Based on the same concept as the above-described message processing method, as shown in fig. 20, an embodiment of the present application further provides a schematic structural diagram of a message processing apparatus 2000. The apparatus 2000 may be used to implement the method described in the message processing method embodiments described above as applied to a data processing system, see the description in the method embodiments described above. The apparatus 2000 may be a data processing system or may be located in a data processing system.
The apparatus 2000 includes one or more processors 2001. The processor 2001 may be a general-purpose processor or a special-purpose processor, for example a central processing unit. The central processing unit may be used to control the message processing apparatus (e.g., a terminal or a chip), execute a software program, and process data of the software program. The message processing apparatus may comprise a transceiver unit for enabling input (reception) and output (transmission) of signals. For example, the transceiver unit may be a transceiver, a radio frequency chip, or the like.
The apparatus 2000 includes one or more processors 2001, and the one or more processors 2001 may implement the methods of the data processing system in the embodiments shown above.
Alternatively, the processor 2001 may implement other functions in addition to the methods of the embodiments shown above.
Alternatively, in one design, processor 2001 may execute instructions to cause device 2000 to perform the methods described in the method embodiments above. The instructions may be stored in whole or in part within a processor, such as instructions 2003, or in whole or in part within a memory 2002 coupled to the processor, such as instructions 2004, or may cause the apparatus 2000 to perform the methods described in the method embodiments above by the instructions 2003 and 2004 together.
In yet another possible design, message processing device 2000 may also include circuitry that may implement the functions of the data processing system in the foregoing method embodiments.
In yet another possible design, the device 2000 may include one or more memories 2002 having instructions 2004 stored thereon that can be run on a processor to cause the device 2000 to perform the methods described in the method embodiments above. Optionally, the memory may also have data stored therein. The optional processor may also store instructions and/or data. For example, the one or more memories 2002 may store the correspondence described in the above embodiment, or related parameters or tables or the like involved in the above embodiment. The processor and the memory may be provided separately or may be integrated.
In yet another possible design, device 2000 may also include a transceiver 2005 and an antenna 2006. The processor 2001 may be referred to as a processing unit, controlling the device. The transceiver 2005 may be referred to as a transceiver, a transceiver circuit, a transceiver unit, or the like, for implementing the transceiver function of the device through the antenna 2006.
It should be noted that the processor in the embodiments of the present application may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method embodiments may be implemented by integrated logic circuits of hardware in the processor or by instructions in software form. The processor may be a general purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
It will be appreciated that the memory in embodiments of the application may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The embodiments of the present application also provide a computer readable medium having stored thereon a computer program which, when executed by a computer, implements the message processing method of any of the method embodiments described above applied to a data processing system.
Embodiments of the present application also provide a computer program product which, when executed by a computer, implements a message processing method as described above for any of the method embodiments of the data processing system.
In the above embodiments, the implementation may be wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid-state drive (SSD)), or the like.
The embodiment of the application also provides a processing device, which comprises a processor and an interface; a processor for performing the message processing method of any of the method embodiments described above as applied to a data processing system.
It should be understood that the processing device may be a chip, and the processor may be implemented by hardware or software, and when implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the processor may be a general-purpose processor, implemented by reading software code stored in a memory, which may be integrated in the processor, or may reside outside the processor, and exist separately.
As shown in fig. 21, an embodiment of the present application further provides a chip 2100, which includes an input/output interface 2101 and a logic circuit 2102, where the input/output interface 2101 is used to receive/output code instructions or information, and the logic circuit 2102 is used to execute the code instructions or according to the information, so as to execute the message processing method of any of the method embodiments applied to the data processing system.
The chip 2100 may implement the functions shown by the processing unit and/or the transmitting/receiving unit in the above-described embodiments.
For example, the input-output interface 2101 is used to input resource configuration information of the data processing system, and the input-output interface 2101 is also used to output request information for acquiring target data stored in the shared memory. Optionally, the input-output interface 2101 may also be used to receive code instructions for instructing the retrieval of a data request from an application.
The embodiment of the application also provides a data processing system, which comprises the message processing device in the embodiment, wherein the message processing device is used for executing the message processing method of any method embodiment.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present application.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in hardware, software, firmware, or a combination thereof. When implemented in software, the functions described above may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example and not limitation: the computer-readable medium may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of the medium. As used herein, disks and discs include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
In summary, the above embodiments are only preferred embodiments of the present application, and are not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (21)

  1. A method of message processing comprising:
    The first processing unit processes a first event message to obtain a second event message, wherein the first event message is received by the first processing unit or is generated by the first processing unit based on a processing request of an application program;
    The first processing unit sends the second event message to a second processing unit according to context information, wherein the context information comprises route information from the first processing unit to the second processing unit, and the context information is generated based on a processing request of the application program;
    the first processing unit is a first engine, the second processing unit is a second accelerator, or the first processing unit is a first accelerator, the second processing unit is a second engine, or the first processing unit is a first engine, the second processing unit is a second engine, or the first processing unit is a first accelerator, and the second processing unit is a second accelerator.
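
For illustration, a minimal Go sketch of the routing step recited in claim 1, assuming type and function names (EventMessage, ContextInfo, ProcessingUnit, forward) that are illustrative only and not identifiers from the application: a unit produces a second event message and forwards it to the next unit named by the context's routing information.

```go
package main

import "fmt"

// EventMessage is an illustrative event message carrying a context identifier so
// the receiving unit can look up the same context information.
type EventMessage struct {
	ContextID int
	Payload   []byte
}

// ContextInfo holds routing information generated from the application's processing
// request: which unit an event produced by a given unit is sent to next.
type ContextInfo struct {
	ID       int
	NextUnit map[string]string // producing unit -> next unit
}

// ProcessingUnit models either an engine or an accelerator with its own event queue.
type ProcessingUnit struct {
	Name  string
	Inbox chan EventMessage
}

// forward sends msg to the unit that the context designates as the successor of "from".
func forward(ctx ContextInfo, units map[string]*ProcessingUnit, from string, msg EventMessage) {
	next := ctx.NextUnit[from]
	units[next].Inbox <- msg
}

func main() {
	ctx := ContextInfo{ID: 1, NextUnit: map[string]string{"engine-1": "accelerator-2"}}
	units := map[string]*ProcessingUnit{
		"engine-1":      {Name: "engine-1", Inbox: make(chan EventMessage, 8)},
		"accelerator-2": {Name: "accelerator-2", Inbox: make(chan EventMessage, 8)},
	}
	// engine-1 has processed the first event message into a second one; route it on.
	second := EventMessage{ContextID: ctx.ID, Payload: []byte("processed")}
	forward(ctx, units, "engine-1", second)
	fmt.Println("queued for accelerator-2:", len(units["accelerator-2"].Inbox))
}
```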
  2. The method of claim 1, wherein the first processing unit sending the second event message to the second processing unit according to the context information comprises:
    The first processing unit sends the second event message to an event queue corresponding to the second processing unit according to the routing information;
    The second processing unit obtains the second event message from the event queue.
  3. The method of claim 2, wherein the second event message includes a target event queue identification, the target event queue identification being a queue identification of an event queue corresponding to the second processing unit.
  4. The method of claim 3, wherein the routing information further comprises a target routing domain, the target routing domain to indicate a target server, the target server being different from an origin server, the origin server being a server where the first processing unit is located.
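
A sketch of how the target event queue identification and target routing domain of claims 3-4 might steer a message to a queue on another server; the field, helper, and server names below are assumptions for the sketch, not claim language.

```go
package main

import "fmt"

// EventMessage carries the queue identification of the receiving unit's event queue
// and, when the target differs from the origin server, a target routing domain.
type EventMessage struct {
	TargetQueueID       string
	TargetRoutingDomain string
	Payload             []byte
}

// Server groups the event queues it hosts by queue identification.
type Server struct {
	Name   string
	Queues map[string]chan EventMessage
}

// dispatch delivers msg to a local queue, or to the named queue on the target
// server when the routing domain points elsewhere.
func dispatch(origin *Server, servers map[string]*Server, msg EventMessage) {
	target := origin
	if msg.TargetRoutingDomain != "" && msg.TargetRoutingDomain != origin.Name {
		target = servers[msg.TargetRoutingDomain]
	}
	target.Queues[msg.TargetQueueID] <- msg
}

func main() {
	s1 := &Server{Name: "server-1", Queues: map[string]chan EventMessage{}}
	s2 := &Server{Name: "server-2", Queues: map[string]chan EventMessage{"q-accel": make(chan EventMessage, 4)}}
	servers := map[string]*Server{"server-1": s1, "server-2": s2}

	msg := EventMessage{TargetQueueID: "q-accel", TargetRoutingDomain: "server-2", Payload: []byte("event")}
	dispatch(s1, servers, msg)
	fmt.Println("remote queue depth:", len(s2.Queues["q-accel"]))
}
```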
  5. The method of claim 1, wherein the second processing unit is a second accelerator, and the first processing unit sending the second event message to the second processing unit according to the context information comprises:
    The first processing unit sends the second event message to an event queue corresponding to an accelerator pool according to the routing information, wherein the accelerator pool comprises a plurality of accelerators of the same type; determining the second accelerator from the plurality of accelerators according to states of the plurality of accelerators;
    and sending the second event message to the second accelerator.
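
A sketch of the accelerator-pool selection of claim 5, assuming a simple state model (a busy flag plus a load figure) that is illustrative rather than claimed: an idle member of the same-type pool is preferred, otherwise the least-loaded one.

```go
package main

import "fmt"

// Accelerator is one member of a pool of accelerators of the same type.
type Accelerator struct {
	Name string
	Busy bool
	Load int
}

// pickFromPool chooses the second accelerator according to the accelerators' states.
func pickFromPool(pool []*Accelerator) *Accelerator {
	var best *Accelerator
	for _, a := range pool {
		if best == nil {
			best = a
			continue
		}
		if (!a.Busy && best.Busy) || (a.Busy == best.Busy && a.Load < best.Load) {
			best = a
		}
	}
	return best
}

func main() {
	pool := []*Accelerator{
		{Name: "crypto-0", Busy: true, Load: 7},
		{Name: "crypto-1", Busy: false, Load: 2},
		{Name: "crypto-2", Busy: false, Load: 5},
	}
	// The second event message would be sent to the selected accelerator.
	fmt.Println("second event message goes to:", pickFromPool(pool).Name)
}
```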
  6. The method of any of claims 1-5, wherein, prior to the first processing unit receiving the first event message, the method further comprises:
    Receiving a processing request from an application program;
    Determining computing resources according to the processing request of the application program, wherein the computing resources comprise the first processing unit and the second processing unit;
    and generating the context information according to the processing request of the application program.
  7. The method of claim 6, wherein the first processing unit or the second processing unit is selected from a plurality of processing units based on state information of the plurality of processing units at the time of receiving a processing request of the application, the state information of the processing units including network topology performance.
  8. The method of claim 6 or 7, wherein, after the receiving of the processing request from the application program, the method further comprises:
    Determining at least two tasks included in the processing request;
    Creating at least two threads corresponding to the at least two tasks;
    And loading the at least two threads to at least two engines for running, wherein different threads run on different engines.
  9. The method of claim 8, wherein the determining at least two tasks that the processing request includes comprises:
    Acquiring the semantics of the processing request, wherein the semantics of the processing request comprise at least two task semantics;
    and determining a corresponding task according to each task semantic in the at least two task semantics.
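
A sketch of claims 8-9, assuming illustrative types for tasks, threads, and engines: the request's semantics are split into tasks, one thread is created per task, and different threads are loaded onto different engines. The parsing and the one-task-per-engine mapping are simplifications for the sketch.

```go
package main

import "fmt"

type Task struct{ Semantic string }
type Thread struct{ Task Task }
type Engine struct {
	ID      int
	Threads []Thread
}

// splitRequest derives one task per task semantic contained in the request.
func splitRequest(semantics []string) []Task {
	tasks := make([]Task, 0, len(semantics))
	for _, s := range semantics {
		tasks = append(tasks, Task{Semantic: s})
	}
	return tasks
}

// loadThreads creates one thread per task and places each thread on its own engine
// (assumes at least as many engines as tasks).
func loadThreads(tasks []Task, engines []*Engine) {
	for i, t := range tasks {
		engines[i].Threads = append(engines[i].Threads, Thread{Task: t})
	}
}

func main() {
	tasks := splitRequest([]string{"read-data", "compress-data"})
	engines := []*Engine{{ID: 0}, {ID: 1}}
	loadThreads(tasks, engines)
	for _, e := range engines {
		fmt.Printf("engine %d runs %q\n", e.ID, e.Threads[0].Task.Semantic)
	}
}
```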
  10. The method of claim 8 or 9, wherein the method further comprises:
    Releasing a first thread, wherein the first thread is one of the at least two threads;
    And if, after the first thread is released, no thread is running on the engine where the first thread was located, closing the engine where the first thread was located.
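
A sketch of claim 10 under the same illustrative types: releasing a thread and closing the engine once no thread remains running on it.

```go
package main

import "fmt"

// Engine tracks its running threads; Running=false models the engine being closed.
type Engine struct {
	ID      int
	Threads map[string]bool
	Running bool
}

// releaseThread removes the thread and, if no thread remains on the engine,
// closes that engine.
func releaseThread(e *Engine, thread string) {
	delete(e.Threads, thread)
	if len(e.Threads) == 0 {
		e.Running = false
	}
}

func main() {
	e := &Engine{ID: 3, Threads: map[string]bool{"t-compress": true}, Running: true}
	releaseThread(e, "t-compress")
	fmt.Println("engine 3 still running:", e.Running) // false: no threads left
}
```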
  11. The method according to any of claims 8-10, wherein the processing request is for requesting acquisition of target data, the target data being stored in a memory of a second server; the computing resources for executing the processing request further include a third processing unit and a fourth processing unit; the at least two engines include the first processing unit, the second processing unit, and the third processing unit; the fourth processing unit is an accelerator; the first event message and the second event message include an identification of the target data; the first processing unit and the second processing unit are located in a first server, and the third processing unit and the fourth processing unit are located in the second server; the context information further includes routing information from the second processing unit to the third processing unit and from the third processing unit to the fourth processing unit;
    After the first processing unit sends the second event message to the second processing unit according to the context information, the method further comprises:
    the second processing unit encapsulates the second event message to generate a third event message;
    The second processing unit sends the third event message to the third processing unit located at the second server according to the context information;
    The third processing unit decapsulates the third event message to obtain a fourth event message, and sends the fourth event message to the fourth processing unit according to the context information;
    The fourth processing unit obtains the identification of the target data from the received fourth event message, obtains the target data from the memory of the second server according to the identification of the target data, and obtains a fifth event message according to the target data; the fifth event message is used to send the target data to the first server.
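
A sketch of the cross-server flow in claim 11, with hypothetical encapsulate/decapsulate helpers standing in for whatever transport the implementation actually uses: the second event message is wrapped for transfer, unwrapped on the second server, and the fourth processing unit reads the target data out of that server's memory to form a fifth event message.

```go
package main

import "fmt"

// EventMessage carries the identification of the target data plus a payload.
type EventMessage struct {
	TargetDataID string
	Payload      []byte
}

// encapsulate wraps the second event message for transfer to the second server
// (here simply by prefixing a transport header).
func encapsulate(m EventMessage) EventMessage {
	return EventMessage{TargetDataID: m.TargetDataID, Payload: append([]byte("hdr:"), m.Payload...)}
}

// decapsulate strips the transport header again on the receiving server.
func decapsulate(m EventMessage) EventMessage {
	return EventMessage{TargetDataID: m.TargetDataID, Payload: m.Payload[len("hdr:"):]}
}

func main() {
	// Memory of the second server, keyed by the identification of the target data.
	remoteMemory := map[string][]byte{"blob-42": []byte("target data bytes")}

	second := EventMessage{TargetDataID: "blob-42", Payload: []byte("request")}
	third := encapsulate(second)              // second processing unit -> network
	fourth := decapsulate(third)              // third processing unit on the second server
	data := remoteMemory[fourth.TargetDataID] // fourth processing unit (accelerator) reads memory
	fifth := EventMessage{TargetDataID: fourth.TargetDataID, Payload: data}
	fmt.Printf("fifth event message carries %d bytes back to the first server\n", len(fifth.Payload))
}
```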
  12. The method of any of claims 1-11, wherein the context information further comprises operational configuration information;
    The first processing unit processes the first event message to obtain a second event message, including:
    and the first processing unit processes the first event message according to the operation configuration information to obtain a second event message.
  13. The method according to any of claims 1-12, wherein an identification of the context information is included in the first event message and the second event message, the identification of the context information being used to obtain the context information.
  14. The method of any of claims 1-13, wherein the second event message comprises:
    A message attribute information field including event message routing information, where the event message routing information includes a target event queue identifier, where the target event queue identifier is a queue identifier of an event queue corresponding to the second processing unit;
    A message length field including total length information of the second event message;
    A data field comprising a payload of the second event message.
  15. The method of claim 14, wherein the data field comprises a first event information field, the first event information field comprising at least one of:
    A routing scope, an identification of the context information, a source message queue identification, or a custom attribute, the routing scope including at least one routing domain.
  16. The method of claim 15, wherein the data field comprises a second event information field comprising custom information for an application layer.
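
A sketch of the event message layout of claims 14-16; the Go field names are assumptions mapping the message attribute information field, the message length field, and the data field with its first and second event information fields.

```go
package main

import "fmt"

// EventMessage mirrors the three top-level fields of claims 14-16.
type EventMessage struct {
	Attr   MessageAttr // message attribute information: routing info incl. target queue id
	Length int         // total length of the event message
	Data   DataField   // payload plus the first/second event information fields
}

type MessageAttr struct {
	TargetQueueID string
}

type DataField struct {
	RoutingScope  []string          // one or more routing domains
	ContextID     int               // identification of the context information
	SourceQueueID string            // source message queue identification
	CustomAttr    string            // custom attribute of the first event information field
	AppCustom     map[string]string // second event information field: application-layer custom information
	Payload       []byte
}

func main() {
	msg := EventMessage{
		Attr: MessageAttr{TargetQueueID: "q-engine-2"},
		Data: DataField{
			RoutingScope:  []string{"server-1"},
			ContextID:     7,
			SourceQueueID: "q-engine-1",
			AppCustom:     map[string]string{"trace": "abc"},
			Payload:       []byte("payload"),
		},
	}
	msg.Length = len(msg.Data.Payload) // stand-in for the total-length computation
	fmt.Println("route to queue:", msg.Attr.TargetQueueID, "length:", msg.Length)
}
```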
  17. A message processing apparatus, comprising:
    The first operation module is used for: processing a first event message through a first processing unit to obtain a second event message, wherein the first event message is received by the first processing unit or is generated by the first processing unit based on a processing request of an application program;
    Sending, by the first processing unit, the second event message to a second processing unit according to context information, the context information including routing information of the first processing unit to the second processing unit, the context information being generated based on a processing request of the application;
    the first processing unit is a first engine, the second processing unit is a second accelerator, or the first processing unit is a first accelerator, the second processing unit is a second engine, or the first processing unit is a first engine, the second processing unit is a second engine, or the first processing unit is a first accelerator, and the second processing unit is a second accelerator.
  18. A message processing apparatus, comprising a processor and a memory,
    The memory is used for storing an executable program;
    The processor is used for executing the executable program in the memory, such that the method of any of claims 1-16 is performed.
  19. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer executable program which, when called by a computer, causes the computer to perform the method according to any of claims 1-16.
  20. A chip, comprising: logic circuitry and an input-output interface, wherein the input-output interface is used for receiving code instructions or information, and the logic circuitry is used for executing the code instructions or performing the method of any of claims 1-16 in accordance with the information.
  21. A computer program product comprising computer instructions which, when executed by a computing device, cause the computing device to perform the method of any of claims 1-16.
CN202180104290.2A 2021-11-25 2021-11-25 Message processing method and device Pending CN118265973A (en)

Applications Claiming Priority (1)

Application Number: PCT/CN2021/133267 (published as WO2023092415A1); Priority Date: 2021-11-25; Filing Date: 2021-11-25; Title: Message processing method and apparatus

Publications (1)

Publication Number Publication Date
CN118265973A (en) 2024-06-28

Family

ID=86538534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180104290.2A Pending CN118265973A (en) 2021-11-25 2021-11-25 Message processing method and device

Country Status (2)

Country Link
CN (1) CN118265973A (en)
WO (1) WO2023092415A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116579914B (en) * 2023-07-14 2023-12-12 南京砺算科技有限公司 Execution method and device of graphic processor engine, electronic equipment and storage medium
CN117076140B (en) * 2023-10-17 2024-01-23 浪潮(北京)电子信息产业有限公司 Distributed computing method, device, equipment, system and readable storage medium
CN117294347B (en) * 2023-11-24 2024-01-30 成都本原星通科技有限公司 Satellite signal receiving and processing method

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012071533A1 (en) * 2010-11-24 2012-05-31 LogRhythm Inc. Advanced intelligence engine
US20180011792A1 (en) * 2016-07-06 2018-01-11 Intel Corporation Method and Apparatus for Shared Virtual Memory to Manage Data Coherency in a Heterogeneous Processing System
PL3812900T3 (en) * 2016-12-31 2024-04-08 Intel Corporation Systems, methods, and apparatuses for heterogeneous computing
US10769050B2 (en) * 2018-05-16 2020-09-08 Texas Instruments Incorporated Managing and maintaining multiple debug contexts in a debug execution mode for real-time processors
US11204745B2 (en) * 2019-05-23 2021-12-21 Xilinx, Inc. Dataflow graph programming environment for a heterogenous processing system
US11025544B2 (en) * 2019-06-07 2021-06-01 Intel Corporation Network interface for data transport in heterogeneous computing environments
EP3796167B1 (en) * 2019-09-23 2023-05-03 SAS Institute Inc. Router management by an event stream processing cluster manager

Also Published As

Publication number Publication date
WO2023092415A1 (en) 2023-06-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination