CN116719764A - Data synchronization method, system and related device

Info

Publication number: CN116719764A
Application number: CN202310983578.2A
Authority: CN
Inventors: 杨龚轶凡; 郑瀚寻; 闯小明; 张家诚
Original assignee: Suzhou Yangsiping Semiconductor Co ltd
Current assignee: Suzhou Yangsiping Semiconductor Co ltd
Priority date: 2023-08-07
Filing date: 2023-08-07
Publication date: 2023-09-08
Anticipated expiration: 2043-08-07
Also published as: CN116719764B

Abstract

The embodiments of the application provide a data synchronization method, a data synchronization system and a related device. The method comprises: receiving a data handling task from a Central Processing Unit (CPU), wherein the data handling task includes a descriptor; sending a data handling request to an external device based on the descriptor, wherein the data handling request carries a first instruction sequence number that indicates the execution order of the data handling request; receiving request data returned by the external device based on the data handling request, wherein a second instruction sequence number carried in the request data matches the first instruction sequence number carried in the corresponding data handling request; and sending, according to the second instruction sequence number carried by the request data, synchronization information corresponding to the data handling task to the CPU, wherein the synchronization information indicates the execution progress of the data handling task. The application can realize data synchronization between the DMA engine and the CPU, improve data transmission efficiency, and improve the computing power of the device.

Description

Data synchronization method, system and related device

Technical Field

The embodiment of the application relates to the technical field of data processing, in particular to a data synchronization method, a data synchronization system and a related device.

Background

Currently, more and more fields need to rely on artificial intelligence technology. Artificial intelligence, while providing powerful computing functionality, is also accompanied by high computational power demands on hardware devices.

How to run the instructions of a central processing unit (Central Processing Unit, CPU) in a hardware device more efficiently, and thereby improve the computing power of the device, is one of the current research directions. In the related art, direct memory access (Direct Memory Access, DMA) transmission technology is adopted to realize high-speed data transmission between an external device and a memory, which improves the data transmission efficiency of the hardware device and reduces the consumption of CPU computing resources.

A DMA engine is a hardware component that completes data reading and writing between a memory and an external device, and between memories, without intervention of the CPU. Because the reading and writing of different data complete at different times, the DMA engine generally receives the data fed back at different times in a burst mode to improve transmission efficiency. However, this mode reduces the execution efficiency of CPU instructions and affects the computing power of the device. Taking the Advanced eXtensible Interface (AXI) bus protocol as an example, assume that an instruction of the CPU needs to read four pieces of data, data_0, data_1, data_2 and data_3, before it can begin performing operations. Then, even if data_1, data_2 and data_3 are read faster and are fed back to the DMA engine before data_0, the CPU still has to wait until data_0 has been read completely before executing the instruction, which greatly reduces the execution efficiency of the CPU instruction.

Therefore, a new solution is needed to overcome the technical problems caused by the out-of-order receiving and transmitting mode, and to optimize the data transmission efficiency and the energy consumption of the node.

Disclosure of Invention

The embodiment of the application provides an improved data synchronization method, an improved data synchronization system and a related device, which are used for realizing data synchronization between a DMA engine and a CPU, improving data transmission efficiency and improving equipment computing power.

The embodiment of the application aims to provide a data synchronization method, a data synchronization system and a related device.

In a first aspect of the present application, there is provided a data synchronization method applied to a DMA engine, comprising:

receiving a data handling task from a Central Processing Unit (CPU), wherein the data handling task includes a descriptor;

sending a data handling request to an external device based on the descriptor, wherein the data handling request carries a first instruction sequence number, and the first instruction sequence number indicates the execution order of the data handling request;

receiving request data returned by the external device based on the data handling request, wherein a second instruction sequence number carried in the request data matches the first instruction sequence number carried in the corresponding data handling request;

sending, according to the second instruction sequence number carried by the request data, synchronization information corresponding to the data handling task to the CPU, wherein the synchronization information is used for indicating the execution progress of the data handling task.

In a second aspect of the present application, there is provided a DMA engine for implementing the data synchronization method described in the first aspect; the DMA engine includes:

a transceiver module configured to receive a data handling task from a Central Processing Unit (CPU), wherein the data handling task includes a descriptor; to send a data handling request to an external device based on the descriptor, wherein the data handling request carries a first instruction sequence number, and the first instruction sequence number indicates the execution order of the data handling request; and to receive request data returned by the external device based on the data handling request, wherein a second instruction sequence number carried in the request data matches the first instruction sequence number carried in the corresponding data handling request;

a processing module configured to send synchronization information corresponding to the data handling task to the CPU according to the second instruction sequence number carried by the request data, wherein the synchronization information is used for indicating the execution progress of the data handling task.

In a third aspect of the present application, there is provided a data synchronization system comprising a CPU and a DMA engine, wherein:

the CPU is configured to send a data handling task to the DMA engine; wherein the data handling task includes a descriptor;

the DMA engine is configured to receive the data handling task from the CPU, wherein the data handling task includes a descriptor; to send a data handling request to an external device based on the descriptor, wherein the data handling request carries a first instruction sequence number, and the first instruction sequence number indicates the execution order of the data handling request; to receive request data returned by the external device based on the data handling request, wherein a second instruction sequence number carried in the request data matches the first instruction sequence number carried in the corresponding data handling request; and to send, according to the second instruction sequence number carried by the request data, synchronization information corresponding to the data handling task to the CPU, wherein the synchronization information is used for indicating the execution progress of the data handling task;

the CPU is further configured to receive synchronization information corresponding to the data handling task.

In a fourth aspect of the application, there is provided a computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the data synchronization method described in the first aspect.

In a fifth aspect of the application, there is provided a computing device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the data synchronization method described in the first aspect when executing the computer program.

The technical solution provided by the embodiment of the application is a data synchronization method applied to a DMA engine. First, the DMA engine receives a data handling task from the CPU, the data handling task including a descriptor. Next, a data handling request is sent to the external device based on the descriptor, the data handling request carrying a first instruction sequence number that indicates the execution order of the data handling request. Then, request data returned by the external device based on the data handling request is received, and the second instruction sequence number carried in the request data matches the first instruction sequence number carried in the corresponding data handling request. Finally, according to the second instruction sequence number carried by the request data, synchronization information corresponding to the data handling task is sent to the CPU, the synchronization information being used for indicating the execution progress of the data handling task.

Compared with the out-of-order receiving and out-of-order feedback mode in the prior art, in the embodiment of the application the data handling request carries a first instruction sequence number indicating the task execution order, and the request data carries a second instruction sequence number matched with the first instruction sequence number. In this way, after receiving the request data out of order, the DMA engine can obtain the execution progress of the data handling task based on the second instruction sequence number, and then inform the CPU of that progress through the synchronization information. This realizes data synchronization between the DMA engine and the CPU, overcomes problems caused by the out-of-order receiving and transmitting mode, such as low CPU instruction execution efficiency and storage space being occupied for too long, improves data transmission efficiency, and improves the computing power of the device.

Drawings

The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present application are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:

FIG. 1 schematically shows a flow diagram of a data synchronization method according to the present application;

FIG. 2 schematically illustrates an architecture diagram of a data synchronization system in accordance with the present application;

FIG. 3 schematically illustrates a schematic diagram of a data synchronization method according to the present application;

FIG. 4 schematically illustrates another schematic diagram of a data synchronization method in accordance with the present application;

FIG. 5 schematically shows a further schematic diagram of a data synchronization method according to the present application;

FIG. 6 schematically illustrates an interactive schematic of a data synchronization system in accordance with the present application;

FIG. 7 schematically shows a schematic structural diagram of a data synchronization device according to the present application;

FIG. 8 schematically illustrates a structural diagram of a computing device in accordance with the present application;

FIG. 9 schematically shows a schematic structural diagram of a server according to the present application.

In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.

Detailed Description

The principles and spirit of the present application will be described below with reference to several exemplary embodiments. It should be understood that these examples are given solely to enable those skilled in the art to better understand and practice the present application and are not intended to limit the scope of the present application in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Those skilled in the art will appreciate that embodiments of the application may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.

How to run CPU instructions more efficiently in hardware devices, improving device computing power, is one of the research directions. In the related art, a DMA transmission technology is adopted to realize high-speed data transmission between the external equipment and the memory, allow the direct reading and writing of data between the external equipment and the memory, improve the data transmission efficiency of hardware equipment, reduce the consumption of CPU computing resources and assist in improving the computing power of the equipment.

The DMA engine is a hardware component used for completing data reading and writing between the memory and the external device and data reading and writing between the memories under the condition of no CPU intervention. In the related art, because the completion time of reading and writing of different data is inconsistent, in order to improve the transmission efficiency, the DMA engine generally receives the data fed back at different times in a burst transmission manner, however, the execution efficiency of the CPU instruction is reduced in this manner, and the calculation power of the device is affected. Taking the AXI bus protocol as an example, assume that an instruction of the CPU needs to read four data, namely data_0, data_1, data_2 and data_3, to start executing an operation. Then, even if the data_1, data_2 and data_3 are read faster and are preferentially fed back to the DMA engine before the data_0, the CPU still needs to wait until the data_0 is read completely to execute the instruction, which greatly reduces the execution efficiency of the CPU instruction.

In order to overcome the technical problems, according to the embodiments of the present application, a data synchronization method, system and related device are provided.

Compared with the out-of-order receiving and out-of-order feedback mode in the prior art, in the embodiment of the application the data handling request carries a first instruction sequence number indicating the task execution order, and the request data carries a second instruction sequence number matched with the first instruction sequence number. In this way, after receiving the request data out of order, the DMA engine can obtain the execution progress of the data handling task based on the second instruction sequence number, and then inform the CPU of that progress through the synchronization information. This realizes data synchronization between the DMA engine and the CPU, overcomes problems caused by the out-of-order receiving and transmitting mode, such as low CPU instruction execution efficiency and storage space being occupied for too long, improves data transmission efficiency, and improves the computing power of the device.

As an alternative embodiment, the number of data synchronization devices is one or more. In some examples, the data synchronization device may be implemented as a logic unit deployed inside a chip; in other examples, it may be deployed in other forms within a digital circuit structure, and the application is not limited in this respect. For example, the data synchronization device may be provided in the processing device of various devices, such as terminal devices and servers.

Any number of elements in the figures are for illustration and not limitation, and any naming is used for distinction only, and not for any limiting sense.

Exemplary method

A data synchronization method according to an exemplary embodiment of the present application is described below with reference to fig. 1 in conjunction with a specific application scenario. It should be noted that the application scenario is shown only to facilitate understanding of the spirit and principle of the present application, and the embodiments of the present application are not limited in this respect. Rather, embodiments of the application may be applied to any applicable scenario.

The following describes the execution of the data synchronization method in conjunction with the following examples. Fig. 1 is a flowchart of a data synchronization method according to an embodiment of the present application. The method is applied to a DMA engine. As shown in fig. 1, the method comprises the steps of:

step 101, receiving a data handling task from a Central Processing Unit (CPU);

step 102, sending a data handling request to an external device based on the descriptor;

step 103, receiving request data returned by the external device based on the data handling request;

step 104, sending, according to the second instruction sequence number carried by the request data, the synchronization information corresponding to the data handling task to the CPU.

First, before describing the specific embodiments of steps 101 to 104, the execution subject, i.e., the DMA engine, according to the embodiments of the present application will be described.

The DMA engine is a hardware component that completes data reading and writing between the memory and the external device, and between memories, without CPU intervention. Specifically, the DMA engine can realize high-speed data transmission between the external device and the memory, or between one memory and another, without CPU intervention. This effectively improves the data transmission efficiency of the hardware device, spares the CPU from spending additional computing power on data transmission, reduces the consumption of CPU computing resources, and improves the execution efficiency of CPU instructions.

The external device according to the embodiment of the present application may refer to a data register (DR) of the external device, for example, a data register of an ADC, a data register of a serial port, and the like. Memories, including but not limited to a run memory (e.g., SRAM) and a program memory (e.g., Flash), are storage spaces for storing variables, arrays, and program code.

In the related art, because the reading and writing of different data complete at different times, the DMA engine often receives the data fed back at different times out of order to improve transmission efficiency, which reduces the execution efficiency of CPU instructions and affects the computing power of the device. Taking the AXI bus protocol as an example, assume that an instruction of the CPU needs to read four pieces of data, namely data_0, data_1, data_2 and data_3, before it can start executing an operation. Then, even if data_1, data_2 and data_3 are read faster and are fed back to the DMA engine before data_0, the CPU still has to wait until data_0 has been read completely before executing the instruction, which greatly reduces the execution efficiency of the CPU instruction. In addition, while the CPU is waiting, storage space is occupied by the partial data already received, which affects the storage and retrieval of subsequent data.

In order to solve at least one of the above technical problems, the embodiment of the application provides a data synchronization method for realizing data synchronization between a DMA engine and a CPU. Request data fed back to the DMA engine out of order is associated, through an instruction sequence number, with the execution order of the data handling task. This helps the CPU side obtain request data consistent with the execution order of the data handling task, which in turn improves data transmission efficiency, reduces the consumption of CPU computing resources, and improves the computing power of the device.

Specifically, in step 101, the DMA engine receives a data handling task from the CPU.

In an embodiment of the present application, the data handling task includes a descriptor. The descriptor is used to indicate the various configurations involved in the data handling task, such as the data handling order, the start point of the data handling, the end point of the data handling, and the amount of data to be handled. The types and number of configurations listed here are only examples and can be adjusted according to the actual situation.
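As a rough illustration of the configurations listed above, a descriptor can be pictured as a small record. The following is a minimal sketch only: the class names `Descriptor` and `DataHandlingTask`, the field names, and the reading of "start point" and "end point" as source and destination addresses are all assumptions made for illustration, not a layout given in the application.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical descriptor layout; every field name is illustrative."""
    order: int      # position of this transfer in the data handling order
    src_addr: int   # assumed meaning of "start point of data handling"
    dst_addr: int   # assumed meaning of "end point of data handling"
    length: int     # amount of data to be handled, in bytes


@dataclass(frozen=True)
class DataHandlingTask:
    """Hypothetical task record that the CPU hands to the DMA engine."""
    task_id: int
    descriptors: Tuple[Descriptor, ...]


task = DataHandlingTask(
    task_id=0,
    descriptors=(
        Descriptor(order=0, src_addr=0x1000, dst_addr=0x8000, length=64),
        Descriptor(order=1, src_addr=0x1040, dst_addr=0x8040, length=64),
    ),
)
total_bytes = sum(d.length for d in task.descriptors)
```

A real descriptor format would be fixed by the hardware; the sketch only shows that order, addresses, and length travel together with the task.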

In step 102, the DMA engine sends a data handling request to the external device based on the descriptor.

In an embodiment of the present application, the data handling requests include a data read request (read responses) and/or a data write request (write responses). It will be appreciated that a data read request is used by the DMA engine to fetch data from its own memory space in order to perform the operation indicated by the CPU, and a data write request is used by the DMA engine to write data into its own memory space under the direction of the CPU. The memory space here includes the various types of memory or storage media that the DMA engine can access.

In the embodiment of the application, in order to cope with problems such as the efficiency reduction caused by out-of-order data transmission and reception, each data handling request carries an instruction sequence number (command ID). For ease of distinction, this instruction sequence number is referred to herein as the first instruction sequence number, and it is used to indicate the execution order of the data handling requests. The first instruction sequence number thus indicates the task execution order corresponding to each data handling request and provides the basis for marking the request data that is returned later.

Continuing with the AXI bus protocol as an example, assume that one instruction of the CPU needs to read four pieces of data, data_0, data_1, data_2 and data_3, in sequence in order to perform the related operation. After receiving the instruction, the DMA engine extracts the descriptor corresponding to the data reading task from the instruction, and then generates data handling requests for reading the data based on the data reading requirement in the descriptor, for example, data read request 0 (corresponding to data_0), data read request 1 (corresponding to data_1), data read request 2 (corresponding to data_2), and data read request 3 (corresponding to data_3). Each data read request carries a first instruction sequence number consistent with the order in which the CPU reads the data: data read request 0 carries first instruction sequence number 0, data read request 1 carries first instruction sequence number 1, data read request 2 carries first instruction sequence number 2, and data read request 3 carries first instruction sequence number 3. The instruction sequence numbers are ordered from small to large by value, which corresponds to the execution order of the data reading tasks from first to last. The DMA engine then sends the four generated data read requests to the corresponding external device side or other memory side, respectively.

Further optionally, so that the data that needs to be read first is returned first, the sending timing of the data read requests may be set based on their execution order. Of course, the data read requests may also be sent simultaneously or out of order. In any of these cases, the first instruction sequence number carried by each data read request indicates the position that request actually requires in the execution order, so that the external device or other memory carries the corresponding second instruction sequence number when returning the request data, thereby indicating the position of the request data in the execution order of the original data reading task.

Of course, in practical applications, a different number of data read requests may be generated depending on storage locations or other factors, and the examples are not limiting. For example, three data read requests may be generated, including data read request x (corresponding to data_0 and data_1), data read request y (corresponding to data_2), and data read request z (corresponding to data_3). In this case, the first instruction sequence numbers may likewise be assigned in ascending order of the sequence numbers of the data to be read, corresponding to the execution order of the data reading tasks from first to last.
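The numbering in this example can be sketched as follows. This is a minimal illustration under stated assumptions: the `ReadRequest` record and the `build_read_requests` helper are hypothetical names, and the requests are modeled as plain Python objects rather than bus transactions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReadRequest:
    """Hypothetical model of a data read request."""
    seq: int     # first instruction sequence number
    target: str  # name of the datum the request reads


def build_read_requests(targets: List[str]) -> List[ReadRequest]:
    """Number the requests in the order the CPU consumes the data."""
    return [ReadRequest(seq=i, target=name) for i, name in enumerate(targets)]


requests = build_read_requests(["data_0", "data_1", "data_2", "data_3"])
seqs = [r.seq for r in requests]  # ascending numbers mirror the execution order
```

Because the order is carried in `seq` itself, the requests could be sent in any order without losing the information about which datum the CPU needs first.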

In step 103, the DMA engine receives request data returned by the external device based on the data handling request.

In the embodiment of the application, the second instruction sequence number carried in the request data matches the first instruction sequence number carried in the corresponding data handling request. The second instruction sequence number is mainly used to indicate the position of the request data in the execution order of the original data handling task.

It will be appreciated that, in order to reflect more intuitively the position in the execution order of the data handling task that corresponds to the request data, the second instruction sequence number carried in the request data may be set to the same value as the first instruction sequence number carried in the corresponding data handling request.

Continuing with the AXI bus protocol as an example, assume that the data read requests, in order of execution from first to last, are: data read request 0 (corresponding to data_0), data read request 1 (corresponding to data_1), data read request 2 (corresponding to data_2), and data read request 3 (corresponding to data_3). The DMA engine then receives, out of order, the request data returned by the external device in response to the data read requests, namely data_1 (carrying second instruction sequence number 1), data_0 (carrying second instruction sequence number 0), data_3 (carrying second instruction sequence number 3), and data_2 (carrying second instruction sequence number 2). From the second instruction sequence numbers carried, it can be seen that, in the execution order of the corresponding data reading task, the data is arranged from first to last as data_0, data_1, data_2 and data_3.
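The recovery of the execution order from out-of-order arrivals can be sketched as below. This is only an illustration: the `Response` record, the `outstanding` table, and the `restore_order` helper are hypothetical, and the second instruction sequence number is assumed to equal the first, which the description presents as one option.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Response:
    """Hypothetical model of returned request data."""
    seq: int      # second instruction sequence number echoed by the device
    payload: str  # the returned datum


def restore_order(outstanding: Dict[int, str], arrivals: List[Response]) -> List[str]:
    """Match each response to an issued request, then list payloads in execution order."""
    for resp in arrivals:
        if resp.seq not in outstanding:
            raise KeyError(f"no outstanding request with sequence number {resp.seq}")
    return [resp.payload for resp in sorted(arrivals, key=lambda r: r.seq)]


# Requests issued earlier, keyed by first instruction sequence number.
outstanding = {0: "data_0", 1: "data_1", 2: "data_2", 3: "data_3"}
# Arrival order taken from the example in the text.
arrivals = [Response(1, "data_1"), Response(0, "data_0"),
            Response(3, "data_3"), Response(2, "data_2")]
ordered = restore_order(outstanding, arrivals)
```

Sorting the whole batch is a simplification for the sketch; hardware would more plausibly place each response into a slot indexed by its sequence number as it arrives.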

It should be noted that the matching relationship between the first instruction sequence number and the second instruction sequence number may be maintained by a mapping table in the DMA engine. Alternatively, a sequence number allocator for maintaining this relationship may be provided in the DMA engine. As shown in fig. 2, in the infrastructure of the DMA engine, the sequence number allocator is mainly used to manage the instruction sequence numbers that are to be allocated and those that have been allocated. Specifically, before data handling tasks start, the sequence number allocator allocates to-be-allocated instruction sequence numbers to the different data handling tasks; after a data handling task finishes, it recovers the allocated instruction sequence numbers from the completed task, sets them back to the to-be-allocated state, and waits for the next DMA data transmission task.

Further optionally, it is also possible to monitor whether any instruction sequence number is in the to-be-allocated state. If no instruction sequence number in the to-be-allocated state is found, the currently unfinished data transmission tasks exceed the processing capacity of the DMA engine, and the data transmission task needs to be suspended to avoid an overlong CPU waiting time. In this case, a stop instruction may be generated to suspend the generation of data handling requests. This avoids the overlong CPU waiting time that would result from out-of-order request data not yet having been fed back, ensures data transmission efficiency, and helps improve the execution efficiency of CPU instructions.

After receiving the request data, in step 104, the DMA engine sends synchronization information (sync message) corresponding to the data handling task to the CPU according to the second instruction sequence number carried by the request data.

In the embodiment of the application, the synchronization information is used for indicating the execution progress of the data handling task. For example, the execution progress may be an identifier corresponding to a data segment that has been returned in the data handling task. Alternatively, the execution progress may be the proportion of the reorder buffer's total storage space in which task completion marks have been written. Alternatively, the number of data handling tasks that have a task completion mark may be used directly as the execution progress.
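The allocate, recover, and suspend behavior described above can be sketched as a small free-list allocator. This is a minimal model under stated assumptions: the class name `SequenceAllocator`, its methods, and the use of a `None` return value as a stand-in for the stop instruction are all hypothetical.

```python
from collections import deque
from typing import Optional


class SequenceAllocator:
    """Hypothetical sequence number allocator with a fixed pool of numbers."""

    def __init__(self, capacity: int) -> None:
        self._free = deque(range(capacity))  # numbers in the to-be-allocated state
        self._allocated = set()              # numbers currently in use

    def allocate(self) -> Optional[int]:
        """Hand out a free number; None stands in for the stop instruction."""
        if not self._free:
            return None  # no free number: request generation should pause
        seq = self._free.popleft()
        self._allocated.add(seq)
        return seq

    def release(self, seq: int) -> None:
        """Recover a number from a completed task and mark it free again."""
        if seq not in self._allocated:
            raise ValueError(f"sequence number {seq} is not allocated")
        self._allocated.remove(seq)
        self._free.append(seq)


alloc = SequenceAllocator(capacity=2)
first, second, third = alloc.allocate(), alloc.allocate(), alloc.allocate()
alloc.release(first)      # the task that used `first` has completed
reused = alloc.allocate()  # the recovered number becomes available again
```

With a pool of two numbers, the third allocation fails until a number is recovered, which is the point at which the text suspends the generation of data handling requests.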

Continuing with the AXI bus protocol as an example, assume that the DMA engine receives request data out of order from the external device, respectively: data_3 (carrying the second instruction number 3), data_2 (carrying the second instruction number 2).

Based on the above assumption, it can be known from the second instruction sequence numbers that the currently received request data data_2 and data_3 occupy the third and fourth positions, respectively, in the execution order of the data handling task. Based on this, synchronization information corresponding to the data handling task may be generated and sent to the CPU to indicate that the request data in the third and fourth positions of the execution order has been received.

In the above or the following embodiments, in order to further improve the data transmission efficiency of the DMA engine and achieve data synchronization between the DMA engine and the CPU, the embodiments of the present application further provide the following data synchronization methods.

Specifically, in an alternative embodiment, it is assumed that a Reorder Buffer (ROB) is provided in the DMA engine. Based on this, an alternative implementation of step 104, in which synchronization information corresponding to the data handling task is sent to the CPU according to the second instruction sequence number carried by the request data, includes the following steps, as shown in fig. 3:

step 301, recording the request data received by the DMA engine in a reorder buffer based on a second command sequence number carried by the request data, to obtain the execution progress of the data handling task;

step 302, generating synchronization information according to the execution progress of the data handling task, and transmitting the synchronization information to the CPU according to the transmission sequence indicated by the second instruction sequence number.

In the above steps 301 to 302, by introducing the ROB into the DMA engine, the synchronization information is released according to the execution sequence of the data handling task, so that the CPU can trigger the execution flow of the program in advance according to the execution progress indicated by the synchronization information. Therefore, by introducing the ROB, the progress tracking of the DMA engine on the data handling tasks (such as reading/writing data) can be realized, so that the data synchronization between the DMA engine and the CPU is realized, the execution progress of the data handling tasks is kept consistent with the execution sequence of the CPU, and the execution efficiency of CPU instructions is effectively improved.
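Steps 301 to 302 can be sketched as a toy software model. This is only an illustration under stated assumptions: the class `ReorderBuffer`, its methods, and the non-wrapping `head` pointer are hypothetical, and the application describes a hardware structure rather than this code.

```python
class ReorderBuffer:
    """Toy reorder buffer: one completion flag per sequence number."""

    def __init__(self, size):
        self.done = [False] * size
        self.head = 0  # next sequence number whose sync info is due

    def record(self, seq):
        """Step 301: record that the request data with this number arrived."""
        self.done[seq] = True

    def release_in_order(self):
        """Step 302: release sync info strictly in sequence-number order."""
        released = []
        while self.head < len(self.done) and self.done[self.head]:
            released.append(self.head)
            self.head += 1
        return released


rob = ReorderBuffer(size=4)
rob.record(1)                    # arrives early, must wait for 0
first = rob.release_in_order()   # nothing can be released yet
rob.record(0)
second = rob.release_in_order()  # 0 and 1 are released together
print(first, second)  # [] [0, 1]
```

Holding back number 1 until number 0 has arrived is what keeps the released synchronization information aligned with the execution order expected by the CPU.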

Based on the above description, as an alternative embodiment, step 301, in which the request data received by the DMA engine is recorded in the reorder buffer based on the second instruction sequence number carried by the request data to obtain the execution progress of the data handling task, may be implemented as follows, as shown in fig. 4:

step 401, obtaining an instruction sequence number from request data;

step 402, using the instruction sequence number as an index address, searching a storage space corresponding to the index address in a reordering buffer;

step 403, writing a task completion mark in a storage space corresponding to the index address; the task completion mark is used for indicating that the data carrying task corresponding to the request data is completed;

step 404, the writing condition of the task completion mark in the reorder buffer is used as the execution progress of the data handling task.
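Steps 401 to 404 can be sketched as follows. The sketch assumes that the reorder buffer is modeled as a Python list of 0/1 flags, that the request data is a dictionary with a `seq` field, and that progress is reported as the fraction of flagged slots; all of these names and choices are hypothetical.

```python
def record_completion(rob, request_data):
    """Write a task completion mark for one piece of request data and
    return the resulting progress as a fraction of flagged slots."""
    seq = request_data["seq"]          # step 401: obtain the sequence number
    if not 0 <= seq < len(rob):        # step 402: use it as the index address
        raise IndexError(f"sequence number {seq} is outside the buffer")
    rob[seq] = 1                       # step 403: write the completion mark
    return sum(rob) / len(rob)         # step 404: marks written so far


rob = [0] * 8
progress = [record_completion(rob, {"seq": s, "payload": b""}) for s in (3, 2)]
print(rob)       # [0, 0, 1, 1, 0, 0, 0, 0]
print(progress)  # [0.125, 0.25]
```

Because the sequence number is used directly as the index, no search is needed when request data returns out of order.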

Further, it is assumed that a plurality of storage areas are partitioned in the reorder buffer. An alternative implementation of generating the synchronization information according to the execution progress of the data handling task in step 302, as shown in fig. 4, may be:

step 405, monitoring the task completion mark writing situation corresponding to each storage area in the reorder buffer;

step 406, if it is detected that all the storage spaces in one storage area have written task completion marks, generating synchronization information of the data handling task corresponding to each task completion mark in the storage area.

Through the steps 401 to 406, the progress tracking of the data handling task (such as reading/writing data) can be further improved, so that the execution progress of the data handling task is consistent with the execution sequence of the CPU, and the execution efficiency of the CPU instruction is effectively improved.

Further alternatively, after generating the synchronization information of the data handling task corresponding to each task completion flag in the storage area, the instruction sequence number corresponding to each task completion flag in the storage area may also be set to a state to be allocated.
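The region-based release described in steps 405 to 406, together with returning the sequence numbers to the to-be-allocated state, can be sketched as follows. The function `sweep_regions`, the region size of 4, and the use of a Python set for the to-be-allocated sequence numbers are assumptions made for illustration.

```python
REGION_SIZE = 4  # hypothetical number of storage spaces per storage area


def sweep_regions(flags, free_seqs, region_size=REGION_SIZE):
    """Release every storage area whose slots all hold a completion mark.

    Returns the sequence numbers for which synchronization information is
    generated; those numbers are cleared and added to free_seqs.
    """
    sync_info = []
    for start in range(0, len(flags), region_size):
        region = range(start, min(start + region_size, len(flags)))
        if all(flags[i] for i in region):       # steps 405 and 406
            for i in region:
                sync_info.append(i)
                flags[i] = 0                    # slot can be reused
                free_seqs.add(i)                # back to "to be allocated"
    return sync_info


flags = [1, 1, 1, 1, 1, 0, 1, 1]
free_seqs = set()
released = sweep_regions(flags, free_seqs)
print(released)  # [0, 1, 2, 3]
print(flags)     # [0, 0, 0, 0, 1, 0, 1, 1]
```

The second area is not released because slot 5 has no completion mark yet, even though later slots in that area are already flagged.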

In the embodiment of the application, the ROB can write data through a plurality of ports in the same period and empty a plurality of storage areas in the reorder buffer in the same period.

To meet the operating requirements of different application scenarios, the relevant parameters of the reorder buffer are dynamically configurable through a control and status register (CSR), which further improves the data read/write efficiency of the DMA engine and the execution efficiency of CPU instructions. The relevant parameters include, but are not limited to: the number of storage units contained in a storage area, and the number of storage units that can be released at a single time. At least one task completion mark may be written to each storage unit. For example, the number of storage units contained in a storage area may be configured as 16, 32, 64, or 128 addresses.
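A software view of these configurable parameters might look like the sketch below. The class `RobCsrConfig`, its field names, and the validation rules are hypothetical; the application names the two parameters and gives 16, 32, 64, and 128 only as example sizes, so treating them as the full set of allowed values is an assumption.

```python
from dataclasses import dataclass

# Example region sizes named in the text; treated here as the allowed set.
ALLOWED_REGION_SIZES = (16, 32, 64, 128)


@dataclass(frozen=True)
class RobCsrConfig:
    """Hypothetical CSR view of the reorder buffer parameters."""

    region_size: int    # storage units contained in one storage area
    release_count: int  # storage units releasable at a single time

    def __post_init__(self):
        if self.region_size not in ALLOWED_REGION_SIZES:
            raise ValueError(f"unsupported region size: {self.region_size}")
        if self.release_count <= 0:
            raise ValueError("release_count must be positive")


config = RobCsrConfig(region_size=32, release_count=8)
print(config)
```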

As an alternative embodiment, as shown in fig. 5, a counter may be provided in the DMA engine to record the number of scan cycles. Specifically, the storage areas of the ROB are pointed to in turn, following the execution order of the data handling tasks.

Assuming that the scan period is set to 8 cycles, every 8 cycles a check is made as to whether all the storage units in each storage area of the ROB have been written with the task completion mark (e.g., the storage unit is set to 1). If the data handling tasks corresponding to the storage units (either entire data handling tasks or portions of data handling tasks) are completed, the ROB releases the synchronization signals corresponding to those data handling tasks and returns the second instruction sequence numbers corresponding to the synchronization signals to the instruction generator (Command Generation Module).
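The periodic scan can be modeled with a short simulation. It assumes a single storage area, one possible arrival per cycle, and cycle numbering from 1; the names `ScanCounter` and `simulate` are hypothetical.

```python
class ScanCounter:
    """Counts cycles and signals when a scan is due."""

    def __init__(self, period=8):
        self.period = period
        self.count = 0

    def tick(self):
        """Advance one cycle; return True on every period-th cycle."""
        self.count += 1
        if self.count == self.period:
            self.count = 0
            return True
        return False


def simulate(arrivals, region_size, total_cycles, period=8):
    """Return the cycles at which a fully flagged storage area is released.

    arrivals maps a cycle number to the slot index flagged in that cycle.
    """
    flags = [0] * region_size
    counter = ScanCounter(period)
    released_at = []
    for cycle in range(1, total_cycles + 1):
        if cycle in arrivals:
            flags[arrivals[cycle]] = 1
        if counter.tick() and all(flags):
            released_at.append(cycle)
            flags = [0] * region_size  # slots become reusable
    return released_at


release_cycles = simulate({1: 2, 3: 0, 9: 1, 10: 3}, region_size=4, total_cycles=24)
print(release_cycles)  # [16]
```

The area is complete after cycle 10 but is only released at cycle 16, the next scan point, which shows the latency trade-off of a longer scan period.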

In some examples, based on the infrastructure of the DMA engine shown in fig. 2, if it is detected that all storage space in the ROB is full, the sequence number allocator may feed this back to the request generator (Request Generator) so that the request generator stops generating requests. Further, if available storage space is detected in the ROB, the progress tracker (Progress Tracker) issues credits to the sequence number allocator, indicating that there is a free storage area in the ROB.

In the above or following embodiments, the reorder buffer is monitored for storage space that is not bound to any data handling task. Storage space bound to a data handling task is used to record the execution progress of that task. If all storage space in the reorder buffer is detected to be bound to data handling tasks, the currently outstanding data handling tasks have exceeded the processing capacity of the DMA engine, and the data handling tasks need to be suspended to avoid excessive CPU latency. In this case, a stop instruction may be generated to instruct that the generation of data handling requests be suspended.
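The back-pressure behaviour described above can be sketched with a small credit-based allocator. The class `SequenceAllocator`, its `stopped` property, and the one-credit-per-slot accounting are assumptions for illustration and are not taken from the application.

```python
from collections import deque


class SequenceAllocator:
    """Hands out sequence numbers while free reorder buffer slots remain."""

    def __init__(self, rob_slots):
        self.free = deque(range(rob_slots))  # one credit per free slot

    @property
    def stopped(self):
        """True when every slot is bound, i.e. a stop instruction applies."""
        return not self.free

    def allocate(self):
        """Bind a slot to a new request; return None if generation must pause."""
        if not self.free:
            return None
        return self.free.popleft()

    def credit(self, seq):
        """Called when a slot is released: return its number as a credit."""
        self.free.append(seq)


allocator = SequenceAllocator(rob_slots=2)
issued = [allocator.allocate(), allocator.allocate()]
blocked = allocator.allocate()   # no free slot, request generation pauses
allocator.credit(0)              # a slot is released and credited back
resumed = allocator.allocate()
print(issued, blocked, resumed)  # [0, 1] None 0
```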

In this embodiment, the data handling request carries a first instruction sequence number indicating the execution order of the task, and the request data carries a second instruction sequence number matching the first instruction sequence number. After receiving request data out of order, the DMA engine can therefore determine the execution progress of the data handling task from the second instruction sequence number and inform the CPU of that progress through the synchronization information. This realizes data synchronization between the DMA engine and the CPU, addresses problems caused by out-of-order transmission and reception, such as low execution efficiency of CPU instructions and excessive occupation of storage space, improves data transmission efficiency, and improves the computing power of the device.

Having described the method of an embodiment of the present application, a data synchronization system of an embodiment of the present application is described below with reference to fig. 6. The data synchronization system shown in fig. 6 includes at least a CPU and a DMA engine.

In this data synchronization system, the CPU is mainly configured to perform the following functions, namely: transmitting a data handling task to the DMA engine; wherein the data handling task includes a descriptor; and receiving the synchronous information corresponding to the data carrying task.

The DMA engine is mainly configured to perform the following functions: receiving a data handling task from a central processing unit (CPU), wherein the data handling task includes a descriptor; sending a data handling request to an external device based on the descriptor, wherein the data handling request carries a first instruction sequence number that indicates the execution order of the data handling request; receiving request data returned by the external device based on the data handling request, wherein the second instruction sequence number carried in the request data matches the first instruction sequence number carried in the corresponding data handling request; and sending synchronization information corresponding to the data handling task to the CPU according to the second instruction sequence number carried by the request data, wherein the synchronization information indicates the execution progress of the data handling task.

In an alternative embodiment, a reorder buffer is provided in the DMA engine;

the DMA engine is configured to, when sending the synchronization information corresponding to the data carrying task to the CPU according to the second instruction sequence number carried by the request data:

recording the request data received by the DMA engine in the reordering buffer based on a second instruction sequence number carried by the request data to obtain the execution progress of the data handling task;

and generating the synchronous information according to the execution progress of the data carrying task, and transmitting the synchronous information to a CPU according to the transmission sequence indicated by the second instruction sequence number.

In an alternative embodiment, when recording, in the reorder buffer, the request data received by the DMA engine based on the second instruction sequence number carried by the request data to obtain the execution progress of the data handling task, the DMA engine is configured to:

acquiring the instruction sequence number from the request data;

using the instruction sequence number as an index address, and searching a storage space corresponding to the index address in the reordering buffer;

writing a task completion mark in a storage space corresponding to the index address; the task completion mark is used for indicating that the data carrying task corresponding to the request data is completed;

and taking the writing situation of the task completion mark in the reorder buffer as the execution progress of the data handling task.

In an alternative embodiment, the reorder buffer is divided into a plurality of storage areas;

the DMA engine, when generating the synchronization information according to the execution progress of the data handling task, is configured to:

monitoring task completion mark writing conditions corresponding to all storage areas in the reordering buffer;

if it is detected that all the storage spaces in one storage area are written with task completion marks, synchronous information of the data carrying task corresponding to each task completion mark in the storage area is generated.

In an alternative embodiment, the DMA engine is further configured to:

after the synchronous information of the data handling task corresponding to each task completion mark in the storage area is generated, the instruction sequence number corresponding to each task completion mark in the storage area is set to be in a state to be allocated.

In an alternative embodiment, the DMA engine is further configured to:

detecting the proportion of instruction sequence numbers in the to-be-allocated state among all instruction sequence numbers, and dynamically configuring the relevant parameters of the reorder buffer through a control and status register based on the proportion;

wherein the relevant parameters comprise the number of storage units contained in the storage area and the number of storage units that can be released at a single time.

In an alternative embodiment, the descriptor includes an execution sequence of a plurality of data handling tasks;

the DMA engine, when sending a data handling request to an external device based on the descriptor, is configured to:

acquiring the execution sequence of the plurality of data handling tasks from the descriptor;

according to the execution sequence of the plurality of data carrying tasks, respectively configuring the instruction serial numbers in the state to be allocated to the plurality of data carrying tasks as first instruction serial numbers corresponding to the plurality of data carrying tasks; the first instruction sequence number allocated to the data carrying task corresponds to the execution sequence of the data carrying task;

generating data carrying requests corresponding to the data carrying tasks based on the allocated first instruction sequence numbers and the configuration information of the plurality of data carrying tasks in the descriptors;

and sending the data carrying requests corresponding to the data carrying tasks to the external equipment.
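The allocation of first instruction sequence numbers according to the execution order in the descriptor can be sketched as follows. The descriptor layout (a `tasks` list with `order`, `addr`, and `length` fields) and the rule of handing out the smallest free numbers first are hypothetical choices made for this illustration.

```python
def build_requests(descriptor, free_seqs):
    """Assign free sequence numbers to tasks in execution order and
    return one data handling request per task."""
    tasks = sorted(descriptor["tasks"], key=lambda t: t["order"])
    if len(free_seqs) < len(tasks):
        raise RuntimeError("not enough sequence numbers to be allocated")
    # Smaller numbers go to earlier tasks, so the number reflects the order.
    chosen = sorted(free_seqs)[: len(tasks)]
    requests = []
    for task, seq in zip(tasks, chosen):
        free_seqs.remove(seq)  # the number is now bound to a task
        requests.append({"seq": seq, "addr": task["addr"], "length": task["length"]})
    return requests


descriptor = {
    "tasks": [
        {"order": 1, "addr": 0x2000, "length": 64},
        {"order": 0, "addr": 0x1000, "length": 64},
    ]
}
free_seqs = {5, 2, 7}
requests = build_requests(descriptor, free_seqs)
print(requests)
print(free_seqs)  # {7}
```

The task that executes first (`order` 0) receives sequence number 2 and the next task receives 5, so comparing sequence numbers recovers the execution order.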

In an alternative embodiment, the system further comprises a progress tracker configured to:

monitoring whether the reorder buffer contains storage space that is not bound to any data handling task, wherein storage space bound to a data handling task is used for recording the execution progress of that data handling task;

and if all the storage spaces in the reordering buffer are detected to be bound with the data carrying task, generating a stop instruction, wherein the stop instruction is used for indicating to suspend the generation process of the data carrying request.

In the above or following embodiments, an embodiment of the present application further provides a DMA engine, which is applied to a data synchronization system;

The DMA engine is used for implementing each function in the data synchronization method shown in fig. 1, and is not described herein in detail.

Having described the method and system of embodiments of the present application, a description of a data synchronization apparatus of embodiments of the present application follows with reference to fig. 7.

The data synchronization device 70 shown in fig. 7 in the embodiment of the present application can implement the steps corresponding to the data synchronization method in the embodiment corresponding to fig. 1. The functions performed by the data synchronizing device 70 may be realized by hardware, or may be realized by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above, which may be software and/or hardware. The data synchronization device 70 is applied to a server device or a terminal device. The data synchronization device 70 may refer to the operations performed in the embodiment corresponding to fig. 1, which are not described herein.

In some embodiments, the data synchronization device 70 includes a transceiver module 701 and a processing module 702.

A transceiver module 701 configured to receive a data handling task from a central processing unit CPU; wherein the data handling task includes a descriptor; sending a data handling request to an external device based on the descriptor; the data carrying request carries a first instruction sequence number; the first instruction sequence number indicates an execution order of the data handling requests; receiving request data returned by the external equipment based on the data handling request; the second instruction sequence number carried in the request data is matched with the first instruction sequence number carried in the corresponding data carrying request;

the processing module 702 is configured to send synchronization information corresponding to the data handling task to the CPU according to the second instruction sequence number carried by the request data; the synchronization information is used for indicating the execution progress of the data handling task.

In some embodiments, a reorder buffer is provided in the data synchronization device 70;

the processing module 702 is configured to, when sending the synchronization information corresponding to the data handling task to the CPU according to the second instruction sequence number carried by the request data:

recording the request data received by the data synchronization device 70 in the reorder buffer based on the second instruction sequence number carried by the request data, so as to obtain the execution progress of the data handling task;

In some embodiments, the processing module 702, based on the second instruction sequence number carried by the request data, records, in the reorder buffer, the request data received by the data synchronization device 70, and when obtaining the execution progress of the data handling task, is configured to:

acquiring the instruction sequence number from the request data;

In some embodiments, the reorder buffer is partitioned into a plurality of storage regions;

the processing module 702, when generating the synchronization information according to the execution progress of the data handling task, is configured to:

In some embodiments, the processing module 702 is further configured to:

after the instruction sequence numbers corresponding to the task completion marks in the storage area are set to the to-be-allocated state, detecting the proportion of instruction sequence numbers in the to-be-allocated state among all instruction sequence numbers, and dynamically configuring the relevant parameters of the reorder buffer through a control and status register based on the proportion;

In some embodiments, the descriptor includes an order of execution of a plurality of data-handling tasks;

the transceiver module 701, when sending a data handling request to an external device based on the descriptor, is configured to:

In some embodiments, the apparatus further comprises a progress tracker configured to:

Having described the methods, systems, and apparatus of embodiments of the present application, a description will now be made of a computer-readable storage medium of embodiments of the present application, which may be an optical disk having a computer program (i.e., a program product) stored thereon that, when executed by a processor, performs the steps described in the above-described method embodiments, e.g., receiving a data-carrying task from a central processing unit CPU; wherein the data handling task includes a descriptor; sending a data handling request to an external device based on the descriptor; the data carrying request carries a first instruction sequence number; the first instruction sequence number indicates an execution order of the data handling requests; receiving request data returned by the external equipment based on the data handling request; the second instruction sequence number carried in the request data is matched with the first instruction sequence number carried in the corresponding data carrying request; according to the second command sequence number carried by the request data, synchronous information corresponding to the data carrying task is sent to a CPU; the synchronization information is used for indicating the execution progress of the data handling task. The specific implementation of each step is not repeated here.

It should be noted that examples of the computer readable storage medium may also include, but are not limited to, a phase change memory (PRAM), a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or other optical or magnetic storage medium, which will not be described in detail herein.

The data synchronization device 70 in the embodiment of the present application is described above from the point of view of a modularized functional entity, and the server and the terminal device for performing the data synchronization method in the embodiment of the present application are described below from the point of view of hardware processing, respectively.

It should be noted that, in the embodiment of the data synchronization apparatus of the present application, the entity device corresponding to the transceiver module 701 shown in fig. 7 may be an input/output unit, a transceiver, a radio frequency circuit, a communication module, an input/output (I/O) interface, etc., and the entity device corresponding to the processing module 702 may be a processor. The data synchronization device 70 shown in fig. 7 may have a structure as shown in fig. 8, and when the data synchronization device 70 shown in fig. 7 has a structure as shown in fig. 8, the processor and the transceiver in fig. 8 can implement the same or similar functions as the processing module 702 and the transceiver module 701 provided in the foregoing device embodiment corresponding to the device, and the memory in fig. 8 needs to be called when executing the data synchronization method described above.

Fig. 9 is a schematic diagram of a server structure provided in an embodiment of the present application. The server 1100 may vary considerably in configuration or performance, and may include one or more central processing units (CPU) 1122 (e.g., one or more processors), memory 1132, and one or more storage media 1130 (e.g., one or more mass storage devices) storing applications 1142 or data 1144. The memory 1132 and the storage medium 1130 may be transitory or persistent storage. The program stored on the storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 1122 may be configured to communicate with the storage medium 1130 and execute, on the server 1100, the series of instruction operations in the storage medium 1130.

The server 1100 may also include one or more power supplies 1127, one or more wired or wireless network interfaces 1180, one or more input/output interfaces 1159, and/or one or more operating systems 1141, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like.

The steps performed by the server in the above embodiments may be based on the structure of the server 1100 shown in fig. 9. For example, the steps performed by the data synchronization device 70 shown in fig. 7 in the above-described embodiment may be based on the server structure shown in fig. 9. For example, the CPU 1122 may perform the following operations by calling instructions in the memory 1132:

receiving data handling tasks from a Central Processing Unit (CPU) through an input-output interface 1159 of the sub-connection unit; wherein the data handling task includes a descriptor; sending a data handling request to an external device based on the descriptor; the data carrying request carries a first instruction sequence number; the first instruction sequence number indicates an execution order of the data handling requests; receiving request data returned by the external equipment based on the data handling request; the second instruction sequence number carried in the request data is matched with the first instruction sequence number carried in the corresponding data carrying request; according to the second command sequence number carried by the request data, synchronous information corresponding to the data carrying task is sent to a CPU; the synchronization information is used for indicating the execution progress of the data handling task.

In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and modules described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein.

In the embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium.

In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer program is loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)).

The technical solutions provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the embodiments, and the above description is intended only to help in understanding the methods and core ideas of the embodiments. Meanwhile, those skilled in the art may, following the ideas of the embodiments of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the embodiments of the present application.

Claims

1. A method of data synchronization, applied to a DMA engine, comprising:

2. The method of claim 1, wherein a reorder buffer is provided in the DMA engine;

the step of sending the synchronization information corresponding to the data carrying task to the CPU according to the second instruction sequence number carried by the request data comprises the following steps:

3. The method according to claim 2, wherein the recording the request data received by the DMA engine in the reorder buffer based on the second instruction sequence number carried by the request data, to obtain the execution progress of the data handling task, comprises:

acquiring the instruction sequence number from the request data;

4. The method of claim 3, wherein the reorder buffer is partitioned into a plurality of storage regions;

the generating the synchronization information according to the execution progress of the data handling task includes:

5. The method of claim 4, wherein, after generating the synchronization information of the data handling tasks corresponding to the task completion marks in the storage area, the method further comprises:

setting the instruction sequence numbers corresponding to the task completion marks in the storage area to the to-be-allocated state.

6. The method of claim 5, wherein after setting the instruction sequence number corresponding to each task completion flag in the storage area to the to-be-allocated state, further comprising:

7. The method of claim 1, wherein the descriptor includes an order of execution of a plurality of data-handling tasks;

the sending a data handling request to an external device based on the descriptor includes:

8. The method as recited in claim 1, further comprising:

9. A DMA engine, the DMA engine comprising:

10. A data synchronization system, the system comprising a CPU and a DMA engine; wherein,

11. The system of claim 10, wherein a reorder buffer is provided in the DMA engine;

12. The system of claim 11, wherein the DMA engine, when recording the request data received by the DMA engine in the reorder buffer based on the second instruction sequence number carried by the request data, is configured to:

acquiring the instruction sequence number from the request data;

13. The system of claim 12, wherein the reorder buffer is partitioned into a plurality of storage regions;

14. The system of claim 13, wherein the DMA engine is further configured to:

15. The system of claim 14, wherein the DMA engine is further configured to:

16. The system of claim 10, wherein the descriptor includes an order of execution of a plurality of data-handling tasks;

17. The system of claim 10, further comprising a progress tracker configured to:

18. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the data synchronization method of any one of claims 1-8.

19. A computing device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the data synchronization method according to any one of claims 1-8 when executing the computer program.