CN111158936A - Method and system for exchanging information through a queue - Google Patents

Method and system for exchanging information through a queue

Info

Publication number
CN111158936A
CN111158936A (application CN201911415907.3A)
Authority
CN
China
Prior art keywords
queue
dma
pointer
message
present application
Prior art date
Legal status
Granted
Application number
CN201911415907.3A
Other languages
Chinese (zh)
Other versions
CN111158936B (en)
Inventor
黄好城
王祎磊
伍德斌
兰彤
Current Assignee
Beijing Starblaze Technology Co ltd
Original Assignee
Beijing Starblaze Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Starblaze Technology Co ltd filed Critical Beijing Starblaze Technology Co ltd
Priority to CN201911415907.3A
Publication of CN111158936A
Application granted
Publication of CN111158936B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues

Abstract

The present application discloses a method and a system for exchanging information through a queue. The disclosed method comprises: a first producer writes a first message into a queue; a first consumer obtains the first message from the queue; the first consumer writes the processing result of the first message into the first message in the queue, the processing result forming a second message; and a second consumer retrieves the second message from the queue. Because a single queue in the memory serves both for the first producer to submit the first message to the first consumer and for the first consumer to submit the second message to the second consumer, the storage-space requirement on the memory is reduced.

Description

Method and system for exchanging information through a queue
Technical Field
The present application relates to the field of electronic devices, and in particular, to a method and a system for exchanging information through a queue.
Background
DMA (Direct Memory Access) transmission may be performed between the electronic device and an external device. As shown in fig. 1, the electronic device 100 includes a physical layer (PHY) module 110, a DMA module 120, a memory 130, and a Central Processing Unit (CPU) 140.
The external device 300 is coupled to the electronic device 100 through the PHY module 110, enabling DMA transfers between the electronic device 100 and the external device 300. The electronic device 100 is also coupled to a memory 400. During a DMA transfer, under the control of the DMA module 120, data in the memory 400 is transferred to the external device 300, or data provided by the external device 300 is stored in the memory 400. By way of example, the memory 400 is a DRAM (dynamic random access memory) and has a larger storage capacity than the memory 130.
The CPU 140 instructs the DMA module 120 to perform DMA transfers by generating DMA commands (or DMA descriptors), each indicating one or more DMA transfers between the electronic device 100 and the external device 300, and writing them to the memory 130. In response to completion of a DMA transfer, the DMA module 120 writes the execution result of the DMA command to the memory 130, so that the CPU 140 learns that processing of the DMA command is complete and obtains its execution result. Thus, in transferring DMA commands, the CPU 140 and the DMA module 120 form one producer-consumer pair; in transferring the execution results of the DMA commands, the DMA module 120 and the CPU 140 form another.
In the prior art, two queues are provided in the memory 130: one for transferring DMA commands, and the other for transferring their execution results. Using two queues imposes a large storage-space requirement on the memory 130 and reduces data processing speed.
Disclosure of Invention
The present application aims to provide a method for exchanging information through a queue and a system for processing a queue, so as to reduce the storage-space requirement on the memory and improve data processing speed.
According to a first aspect of the present application, there is provided a first method of exchanging information through a queue according to the first aspect of the present application, comprising:
the first producer writes the first message into a queue;
the first consumer obtains a first message from the queue;
the first consumer writes the processing result of the first message into the first message in the queue; wherein, the processing result of the first message forms a second message;
the second consumer retrieves the second message from the queue.
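As a hedged illustration, the four steps of this first method can be sketched in Python. The `Queue` class, the tagged tuples, and the `upper()` stand-in for real message processing are illustrative assumptions, not part of the patent:

```python
# Illustrative sketch: one queue slot carries the first message in,
# and the same slot carries the second message (the result) out.
class Queue:
    def __init__(self, size):
        self.slots = [None] * size

def producer_write(queue, index, msg):
    # Step 1: the first producer writes the first message into the queue.
    queue.slots[index] = ("first", msg)

def consumer_process(queue, index):
    # Steps 2-3: the first consumer obtains the first message, then writes
    # its processing result into the same slot, forming the second message.
    _, payload = queue.slots[index]
    result = payload.upper()            # stand-in for real processing
    queue.slots[index] = ("second", result)

def second_consumer_read(queue, index):
    # Step 4: the second consumer retrieves the second message.
    return queue.slots[index]

q = Queue(4)
producer_write(q, 0, "dma-cmd")
consumer_process(q, 0)
```

The key point the sketch shows is that no second queue is needed: the result overwrites the command in place.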
According to a first method of exchanging information through a queue according to a first aspect of the present application, there is provided a second method of exchanging information through a queue according to the first aspect of the present application, wherein a first producer writes a first message into the queue in accordance with a write pointer of the queue;
the first consumer obtains a first message from the queue according to the read pointer of the queue;
the first consumer writes the second message to the queue according to the read pointer.
According to a first method of exchanging information through a queue according to a first aspect of the present application, there is provided a third method of exchanging information through a queue according to the first aspect of the present application, wherein the first producer writes the first message into the queue in accordance with a write pointer of the queue;
the first consumer obtains a first message from the queue according to the read pointer of the queue;
the first consumer records the position of the first message in the queue and writes a second message to the queue according to the recorded position.
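A hedged sketch of this third-method variant, in which the first consumer remembers the position it read from and writes the result back there even after the read pointer has moved on. The `RecordingConsumer` name and list-backed queue are illustrative assumptions:

```python
# Illustrative: the consumer records each message's slot at read time,
# so the result can be written to that recorded position later.
class RecordingConsumer:
    def __init__(self, queue):
        self.queue = queue
        self.pending = []                 # (recorded position, message) pairs

    def fetch(self, read_ptr):
        msg = self.queue[read_ptr]
        self.pending.append((read_ptr, msg))   # record where it came from
        return msg

    def complete(self):
        # Write the second message into the recorded position.
        pos, msg = self.pending.pop(0)
        self.queue[pos] = ("result", msg)

queue = ["cmd-a", "cmd-b", None, None]
consumer = RecordingConsumer(queue)
consumer.fetch(0)
consumer.fetch(1)          # the read pointer has moved past slot 0
consumer.complete()        # yet the result still lands in slot 0
```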
According to one of the first to third methods of exchanging information through a queue of the first aspect of the present application, there is provided a fourth method of exchanging information through a queue according to the first aspect of the present application, wherein the second consumer retrieves the second message from the queue in accordance with the read completion pointer of the queue.
According to one of the second to fourth methods of exchanging information through a queue of the first aspect of the present application, there is provided a fifth method of exchanging information through a queue according to the first aspect of the present application, comprising:
updating a write pointer in response to the first message being written to the queue;
the read pointer is updated in response to the second message being written to the queue.
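A minimal sketch of these pointer-update rules (together with the read-completion update of the sixth method below them in the claim sequence). The dict-based state and modular increment are illustrative assumptions:

```python
# Illustrative pointer-update rules for the single shared queue:
# - write pointer advances when a first message is enqueued,
# - read pointer advances when the second message (the result) is written back,
# - read-completion pointer advances when the second consumer retrieves it.
QUEUE_SIZE = 8

state = {"write": 0, "read": 0, "read_done": 0}

def on_first_message_written(state):
    state["write"] = (state["write"] + 1) % QUEUE_SIZE

def on_second_message_written(state):
    state["read"] = (state["read"] + 1) % QUEUE_SIZE

def on_second_message_retrieved(state):
    state["read_done"] = (state["read_done"] + 1) % QUEUE_SIZE
```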
According to a fourth method of exchanging information through a queue of the first aspect of the present application, there is provided a sixth method of exchanging information through a queue of the first aspect of the present application, wherein the method includes: the read completion pointer is updated in response to the second consumer retrieving the second message from the queue.
According to a fourth method of exchanging information through a queue of the first aspect of the present application, there is provided a seventh method of exchanging information through a queue of the first aspect of the present application, wherein the method comprises: the read completion pointer is written to memory, from which the first producer retrieves the read completion pointer.
According to a fourth method of exchanging information through a queue of the first aspect of the present application, there is provided an eighth method of exchanging information through a queue of the first aspect of the present application, wherein the method comprises: the second consumer provides the read completion pointer to the first producer.
The second or third method for exchanging information via a queue according to the first aspect of the present application provides a ninth method for exchanging information via a queue according to the first aspect of the present application, wherein the method comprises:
the first producer providing a write pointer to the first consumer;
the first consumer provides the read pointer to the second consumer.
According to one of the first to ninth methods of exchanging information through a queue of the first aspect of the present application, there is provided a tenth method of exchanging information through a queue according to the first aspect of the present application, wherein the write pointer includes a wrap-around flag or information on the number of times wrap-around has occurred.
According to one of the fourth to ninth methods of exchanging information through a queue of the first aspect of the present application, there is provided the eleventh method of exchanging information through a queue of the first aspect of the present application, wherein the second consumer suspends retrieving the second message from the queue in response to the read completion pointer and the read pointer pointing to the same address.
According to one of the fourth to eleventh methods of exchanging information through a queue of the first aspect of the present application, there is provided the twelfth method of exchanging information through a queue of the first aspect of the present application, wherein the first producer suspends writing the first message to the queue in response to the read completion pointer leading the write pointer.
According to one of the fourth to twelfth methods of exchanging information through a queue of the first aspect of the present application, there is provided the thirteenth method of exchanging information through a queue of the first aspect of the present application, wherein the read completion pointer points to a head of the queue; the write pointer points to the tail of the queue.
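The two suspension conditions above (second consumer stalls when read-completion equals the read pointer; producer stalls when the read-completion pointer at the head would be overtaken by the write pointer at the tail) can be sketched with free-running counters. The monotonically increasing pointer representation is an illustrative assumption:

```python
# Illustrative: pointers are free-running counters; slot = ptr % QUEUE_SIZE.
QUEUE_SIZE = 4

def can_consume_result(read_ptr, read_done_ptr):
    # The second consumer suspends when the read-completion pointer has
    # caught up with the read pointer: no second message is pending.
    return read_done_ptr != read_ptr

def can_produce(write_ptr, read_done_ptr):
    # The first producer suspends when one more write would overtake the
    # read-completion pointer at the head of the queue (queue full).
    return write_ptr - read_done_ptr < QUEUE_SIZE
```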
According to one of the first to thirteenth methods of exchanging information through a queue of the first aspect of the present application, there is provided a fourteenth method of exchanging information through a queue according to the first aspect of the present application, wherein the queue is provided in a memory.
A fifteenth method of exchanging information through a queue according to the first aspect of the present application is provided according to one of the first through fourteenth methods of exchanging information through a queue of the first aspect of the present application, wherein the first producer and the second consumer are direct memory access accelerators;
the first consumer is a direct memory access processing module;
the first message is a direct memory access command and the second message is a processing result of the direct memory access command.
According to a fourteenth method of exchanging information via a queue of the first aspect of the present application, there is provided the sixteenth method of exchanging information via a queue of the first aspect of the present application, wherein the memory includes a plurality of queues independent of each other, and the first producer, the first consumer, and the second consumer exchange information via the plurality of queues.
According to a second aspect of the present application, there is provided a system of processing a queue according to the second aspect of the present application, comprising:
the first producer writes the first message into the queue;
the first consumer acquires a first message from the queue; writing the processing result of the first message into the first message in the queue; wherein, the processing result of the first message forms a second message;
and the second consumer acquires the second message from the queue.
According to the first system for processing a queue of the second aspect of the present application, there is provided a second system for processing a queue according to the second aspect of the present application, wherein the queue comprises:
a write pointer indicating an address to write a message to the queue;
a read pointer to indicate an address at which the first consumer reads the message from the queue.
According to the second system for processing a queue of the second aspect of the present application, there is provided a third system for processing a queue according to the second aspect of the present application, wherein the first producer writes the first message to the queue according to the write pointer of the queue;
the first consumer obtains a first message from the queue according to the read pointer of the queue;
the first consumer writes the second message to the queue according to the read pointer.
According to the second system for processing a queue of the second aspect of the present application, there is provided a fourth system for processing a queue according to the second aspect of the present application, wherein the first producer writes the first message into the queue according to the write pointer of the queue;
the first consumer obtains a first message from the queue according to the read pointer of the queue;
the first consumer records the position of the first message in the queue and writes a second message to the queue according to the recorded position.
The system of the fifth processing queue according to the second aspect of the present application is provided according to one of the systems of the first to fourth processing queues according to the second aspect of the present application, wherein the queue further comprises a read completion pointer indicating an address at which the second consumer reads a message from the queue.
According to the fifth system for processing a queue of the second aspect of the present application, there is provided a sixth system for processing a queue according to the second aspect of the present application, wherein the second consumer retrieves the second message from the queue according to the read completion pointer of the queue.
According to one of the second to sixth systems for processing a queue of the second aspect of the present application, there is provided a seventh system for processing a queue according to the second aspect of the present application, wherein the write pointer is updated in response to the first message being written to the queue;
the read pointer is updated in response to the second message being written to the queue.
The system of the eighth processing queue according to the second aspect of the present application is provided according to one of the systems of the fifth to seventh processing queues according to the second aspect of the present application, wherein the read completion pointer is updated in response to the second consumer retrieving the second message from the queue.
According to the fifth system for processing a queue of the second aspect of the present application, there is provided a ninth system for processing a queue according to the second aspect of the present application, wherein the system further comprises a memory, the read completion pointer is written into the memory, and the first producer acquires the read completion pointer from the memory.
According to the fifth system for processing a queue of the second aspect of the present application, there is provided a tenth system for processing a queue according to the second aspect of the present application, wherein the second consumer provides the read completion pointer to the first producer.
According to the third or fourth system for processing a queue of the second aspect of the present application, there is provided an eleventh system for processing a queue according to the second aspect of the present application, wherein the first producer provides the write pointer to the first consumer;
the first consumer provides the read pointer to the second consumer.
According to one of the first to eleventh systems for processing a queue of the second aspect of the present application, there is provided a twelfth system for processing a queue according to the second aspect of the present application, wherein the write pointer includes a wrap-around flag or information on the number of times wrap-around has occurred.
According to one of the fifth to twelfth systems for processing a queue of the second aspect of the present application, there is provided a thirteenth system for processing a queue according to the second aspect of the present application, wherein, in response to the read completion pointer and the read pointer pointing to the same address, the second consumer suspends fetching the second message from the queue.
According to one of the fifth to thirteenth systems for processing a queue of the second aspect of the present application, there is provided a fourteenth system for processing a queue according to the second aspect of the present application, wherein the first producer suspends writing the first message to the queue in response to the read completion pointer leading the write pointer.
According to one of the fifth to fourteenth systems for processing a queue of the second aspect of the present application, there is provided a fifteenth system for processing a queue according to the second aspect of the present application, wherein the read completion pointer points to the head of the queue and the write pointer points to the tail of the queue.
According to one of the first to fifteenth systems for processing a queue of the second aspect of the present application, there is provided a sixteenth system for processing a queue according to the second aspect of the present application, wherein the queue is provided in a memory.
According to one of the first to sixteenth systems for processing a queue of the second aspect of the present application, there is provided a seventeenth system for processing a queue according to the second aspect of the present application, wherein the first producer and the second consumer are direct memory access accelerators;
the first consumer is a direct memory access processing module;
the first message is a direct memory access command and the second message is a processing result of the direct memory access command.
According to the sixteenth system for processing a queue of the second aspect of the present application, there is provided an eighteenth system for processing a queue according to the second aspect of the present application, wherein the memory includes a plurality of mutually independent queues, through which the first producer, the first consumer, and the second consumer exchange information.
According to a third aspect of the present application, there is provided a first electronic device according to the third aspect of the present application, comprising a physical layer module, a direct memory access module, a memory, a direct memory access accelerator, and a central processor, the physical layer module being coupled to the direct memory access module, the memory being coupled to the direct memory access module and the direct memory access accelerator, the direct memory access accelerator being coupled to the central processor;
the direct memory access accelerator converts a data packet provided by the central processing unit into a direct memory access command and writes the direct memory access command into the memory; and obtaining an execution result of the direct memory access command in the memory;
the direct memory access module initiates direct memory access transmission according to a direct memory access command acquired from the memory and writes an execution result of the direct memory access command into the memory;
the electronic device communicates with an external device of the electronic device through the physical layer module.
According to a first electronic device of the third aspect of the present application, there is provided the second electronic device of the third aspect of the present application, wherein the direct memory access accelerator is provided with a streaming interface or a first-in first-out interface for central processor access.
According to a first electronic device of the third aspect of the present application, there is provided the third electronic device according to the third aspect of the present application, the direct memory access accelerator is provided with a streaming write interface and a streaming read interface for central processor access.
According to one of the first to third electronic devices of the third aspect of the present application, there is provided the fourth electronic device according to the third aspect of the present application, wherein the direct memory access accelerator includes a direct memory access command receiving unit and a first processing unit; wherein:
the direct memory access command receiving unit is coupled to the central processing unit and the first processing unit and receives the data packet provided by the central processing unit; and
the first processing unit is coupled with the memory, and the first processing unit acquires the data packet provided by the central processing unit from the direct memory access command receiving unit, converts the data packet into a direct memory access command, and writes the direct memory access command into the memory.
According to one of the first to fourth electronic devices of the third aspect of the present application, there is provided the fifth electronic device according to the third aspect of the present application, wherein the direct memory access accelerator includes a direct memory access command completion unit and a second processing unit; wherein:
the second processing unit is coupled with the memory, and the second processing unit acquires the execution result of the direct memory access command from the memory; and
the direct memory access command completion unit is coupled to the second processing unit and the central processor, acquires an execution result of the direct memory access command from the second processing unit, and provides the execution result of the direct memory access command to the central processor.
According to the fourth to fifth electronic devices of the third aspect of the present application, there is provided the sixth electronic device of the third aspect of the present application, wherein the direct memory access command receiving unit and/or the direct memory access command completing unit is provided with a buffer.
According to one of the first to sixth electronic devices of the third aspect of the present application, there is provided the seventh electronic device according to the third aspect of the present application, wherein the plurality of direct memory access commands stored in the memory are organized as a queue, and the first processing unit writes the direct memory access commands to the memory in accordance with a write pointer of the queue.
According to a seventh electronic device of the third aspect of the present application, there is provided the eighth electronic device of the third aspect of the present application, wherein the second processing unit acquires the execution result of the direct memory access command from the memory in accordance with the read pointer of the queue.
According to an eighth electronic device of the third aspect of the present application, there is provided the ninth electronic device according to the third aspect of the present application, wherein the second processing unit updates the read completion pointer in accordance with a result of execution of the direct memory access command being received by the central processing unit.
According to a ninth electronic device of the third aspect of the present application, there is provided the tenth electronic device of the third aspect of the present application, wherein the direct memory access accelerator further comprises a pointer manager, the pointer manager being coupled with the first processing unit and the second processing unit, the first processing unit obtaining the write pointer from the pointer manager, the second processing unit obtaining the read pointer from the pointer manager and updating the read completion pointer to the pointer manager.
According to a ninth electronic device of the third aspect of the present application, there is provided the eleventh electronic device of the third aspect of the present application, wherein the first processing unit is coupled with the second processing unit, and the first processing unit acquires the read completion pointer from the second processing unit.
According to a ninth electronic device of the third aspect of the present application, there is provided the twelfth electronic device of the third aspect of the present application, wherein the second processing unit writes the read completion pointer to the memory, and the first processing unit acquires the read completion pointer from the memory.
According to one of the first to sixth electronic devices of the third aspect of the present application, there is provided the thirteenth electronic device according to the third aspect of the present application, wherein the plurality of direct memory access commands stored in the memory are organized as a linked list, a linear table, or an array.
According to one of the first to thirteenth electronic devices of the third aspect of the present application, there is provided the fourteenth electronic device of the third aspect of the present application, wherein the direct memory access command instructs the direct memory access module to transmit the data to be transmitted indicated by the data packet in a plurality of data frames.
According to a fourteenth electronic device of the third aspect of the present application, there is provided the fifteenth electronic device of the third aspect of the present application, wherein the size of the data frame is 512 bytes.
According to a fourteenth electronic device of the third aspect of the present application, there is provided the sixteenth electronic device according to the third aspect of the present application, wherein the size of the data frame is a data block size encrypted with an advanced encryption standard.
According to a fourteenth electronic device of the third aspect of the present application, there is provided the seventeenth electronic device of the third aspect of the present application, wherein the size of the data frame is a size of a data block checked with a cyclic redundancy check code.
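The fourteenth to seventeenth devices above transmit a packet's data as a sequence of fixed-size frames (e.g. 512 bytes, the AES block size, or the CRC block size). A hedged sketch of the splitting, where the function name and `(offset, length)` pairs are illustrative assumptions:

```python
# Illustrative: split a transfer of `length` bytes starting at `offset`
# into frames of at most `frame_size` bytes each.
FRAME_SIZE = 512

def split_into_frames(offset, length, frame_size=FRAME_SIZE):
    frames = []
    end = offset + length
    while offset < end:
        n = min(frame_size, end - offset)
        frames.append((offset, n))       # (start of frame, bytes in frame)
        offset += n
    return frames
```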
According to one of the first to seventeenth electronic devices of the third aspect of the present application, there is provided the eighteenth electronic device according to the third aspect of the present application, wherein data transfer is performed between the direct memory access accelerator and the central processor through a plurality of mutually independent streams.
According to one of the first to seventeenth electronic devices of the third aspect of the present application, there is provided the nineteenth electronic device of the third aspect of the present application, wherein the data packet indicates an identifier from which the direct memory access accelerator determines a storage address of the direct memory access command in the memory.
According to one of the second to seventeenth electronic devices of the third aspect of the present application, there is provided the twentieth electronic device of the third aspect of the present application, wherein the direct memory access accelerator includes one or more streaming interfaces or first-in first-out interfaces, and the direct memory access accelerator determines a storage address of the direct memory access command in the memory in accordance with the streaming interface or the first-in first-out interface that receives the data packet.
According to a fourth aspect of the present application, there is provided a first direct memory access command processing method according to the fourth aspect of the present application, comprising the steps of:
receiving a data packet;
converting the data packet into a direct memory access command and writing the direct memory access command into a memory;
in response to the execution result of the direct memory access command in the memory being updated, the updated execution result of the direct memory access command is obtained.
According to a first direct memory access command processing method of a fourth aspect of the present application, there is provided a second direct memory access command processing method of the fourth aspect of the present application, comprising: and carrying out data transmission through the streaming interface.
According to a second direct memory access command processing method of the fourth aspect of the present application, there is provided a third direct memory access command processing method of the fourth aspect of the present application, wherein the method includes:
providing a status flag for the streaming interface;
if the status flag of the streaming interface is an available status flag, a data packet is received from the streaming interface or the execution result of the direct memory access command is written to the streaming interface.
According to the first direct memory access command processing method of the fourth aspect of the present application, there is provided a fourth direct memory access command processing method of the fourth aspect of the present application, including: and data transmission is carried out through a first-in first-out interface.
According to a fourth direct memory access command processing method of the fourth aspect of the present application, there is provided a fifth direct memory access command processing method of the fourth aspect of the present application, wherein the method includes:
providing the state of a first-in first-out interface;
if the first-in first-out queue in the first-in first-out interface is not full, the data packet is received from the first-in first-out interface.
According to a fourth direct memory access command processing method of the fourth aspect of the present application, there is provided a sixth direct memory access command processing method of the fourth aspect of the present application, including:
providing the state of a first-in first-out interface;
and if the first-in first-out queue in the first-in first-out interface is not empty, writing the execution result of the direct memory access command into the first-in first-out interface.
According to one of the first to sixth direct memory access command processing methods of the fourth aspect of the present application, there is provided a seventh direct memory access command processing method according to the fourth aspect of the present application, wherein the method includes:
writing the direct memory access command into the memory according to the write pointer;
updating the write pointer and writing the updated write pointer into the memory;
acquiring a reading completion pointer;
the write pointer points to the tail of the queue in the memory for storing the direct memory access command, and the read completion pointer points to the head of the queue in the memory for storing the direct memory access command;
determining that the queue for storing direct memory access commands in the memory is not full based on the read completion pointer lagging behind the write pointer; the direct memory access command is written to the memory only when the queue in the memory is not full.
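The enqueue path of this seventh method can be sketched as follows. The `memory` dict standing in for the shared memory, the key names, and the free-running pointer counters are illustrative assumptions:

```python
# Illustrative: write the DMA command at the write pointer (queue tail)
# only when the read-completion pointer (queue head) lags by less than
# the queue depth, then update and publish the write pointer.
QUEUE_DEPTH = 4

def try_enqueue_command(memory, command):
    write_ptr = memory["write_ptr"]
    read_done = memory["read_done_ptr"]
    if write_ptr - read_done >= QUEUE_DEPTH:
        return False                        # queue full: suspend the write
    memory["queue"][write_ptr % QUEUE_DEPTH] = command
    memory["write_ptr"] = write_ptr + 1     # update the write pointer
    return True
```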
According to a seventh direct memory access command processing method of the fourth aspect of the present application, there is provided the eighth direct memory access command processing method of the fourth aspect of the present application, wherein the direct memory access module acquires the direct memory access command from a head of a queue in the memory;
in response to the direct memory access command being completed, the direct memory access module writes the execution result of the direct memory access command to the memory and updates the read pointer in the memory.
According to a seventh direct memory access command processing method of the fourth aspect of the present application, there is provided the ninth direct memory access command processing method of the fourth aspect of the present application, wherein in response to the read pointer leading the read completion pointer, an execution result of the direct memory access command is fetched from the memory in accordance with the read completion pointer;
updating a read completion pointer;
wherein the read pointer is updated in response to a result of the execution of the direct memory access command being written to the memory.
According to a seventh direct memory access command processing method of the fourth aspect of the present application, there is provided a tenth direct memory access command processing method of the fourth aspect of the present application, wherein the method includes:
in response to the read pointer being different from the read completion pointer, a result of the execution of the direct memory access command is retrieved from the memory.
According to one of the seventh to tenth direct memory access command processing methods of the fourth aspect of the present application, there is provided the eleventh direct memory access command processing method according to the fourth aspect of the present application, further comprising: the read completion pointer is written to memory.
According to one of the first to eleventh direct memory access command processing methods of the fourth aspect of the present application, there is provided the twelfth direct memory access command processing method according to the fourth aspect of the present application, wherein the data transfer is performed by a plurality of mutually independent streams, and each stream is in one-to-one correspondence with a queue in the memory.
According to a twelfth direct memory access command processing method of the fourth aspect of the present application, there is provided the thirteenth direct memory access command processing method of the fourth aspect of the present application, further comprising:
providing a status flag for each stream;
if the status flag of a stream indicates that the stream is available, receiving a data packet through the stream;
writing a direct memory access command into the queue in the memory corresponding to the stream indicated by the data packet;
providing the execution result of a completed direct memory access command to the central processing unit through the stream corresponding to the queue in which the command was completed.
According to one of the first to eleventh direct memory access command processing methods of the fourth aspect of the present application, there is provided the fourteenth direct memory access command processing method according to the fourth aspect of the present application, wherein data transmission is performed through a plurality of mutually independent streams and a plurality of streaming interfaces; the streams correspond to the streaming interfaces one to one;
the direct memory access command processing method comprises the following steps:
identifying the stream to which a data packet belongs according to the streaming interface on which the data packet is received;
writing the execution result of a completed direct memory access command to the streaming interface corresponding to the stream in which the command was completed.
According to one of the first to fourteenth direct memory access command processing methods of the fourth aspect of the present application, there is provided a fifteenth direct memory access command processing method according to the fourth aspect of the present application, wherein the direct memory access command instructs the direct memory access module to transmit the data to be transmitted, which is indicated by the data packet, in a plurality of data frames.
According to a fifteenth direct memory access command processing method of the fourth aspect of the present application, there is provided the sixteenth direct memory access command processing method according to the fourth aspect of the present application, wherein the size of the data frame is a data block size encrypted with an advanced encryption standard.
According to a fifteenth direct memory access command processing method of the fourth aspect of the present application, there is provided the seventeenth direct memory access command processing method according to the fourth aspect of the present application, wherein the size of the data frame is the size of the data block checked with the cyclic redundancy check code.
The technical scheme of the application obtains the following beneficial effects:
(1) A single queue in the memory is used both by the first producer to submit the first information to the first consumer and by the first consumer, acting as the second producer, to submit the second information to the second consumer, which reduces the storage space required in the memory.
(2) By maintaining the read pointer, the write pointer, and the read completion pointer of the same queue, the method and device effectively manage and operate one or more relatively independent queues in the memory and improve the data transfer speed.
(3) By maintaining the queue in the memory, the first processing unit of the DMA accelerator and the DMA module form one producer-consumer pair of the queue, while the DMA module and the second processing unit of the DMA accelerator form the other producer-consumer pair. The single queue is used both for the DMA accelerator to submit DMA commands to the DMA module and for the DMA module to submit the execution results of DMA commands to the DMA accelerator, which reduces the storage space required in the memory.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a system block diagram of a prior art DMA transfer;
FIG. 2 is a system block diagram of a DMA transfer according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a streaming write interface provided by a direct memory access accelerator according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a streaming read interface provided by a direct memory access accelerator according to an embodiment of the present application;
FIG. 5 is a block diagram of a direct memory access accelerator according to an embodiment of the present application;
FIGS. 6-10 are schematic diagrams of pointers to a single queue according to an embodiment of the present application;
FIG. 11 is a structural diagram of a direct memory access accelerator according to a second embodiment of the present application;
FIG. 12 is a flowchart of a method for processing DMA commands according to a third embodiment of the present application; and
FIG. 13 is a flowchart illustrating a CPU performing a DMA operation through a DMA accelerator according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 2 is a system configuration diagram of DMA transfer according to an embodiment of the present application. As shown in fig. 2, the electronic device 200 includes a physical layer module (PHY module) 210, a direct memory access module (DMA module) 220, a memory 230, a direct memory access accelerator (DMA accelerator) 240, and a Central Processing Unit (CPU) 250. The PHY module 210 is coupled to the DMA module 220, the memory 230 is coupled to the DMA module 220 and the DMA accelerator 240, and the DMA accelerator 240 is coupled to the CPU 250.
The DMA accelerator 240 converts the data packet supplied from the CPU250 into a DMA command and writes the DMA command to the memory 230, and acquires the execution result of the DMA command in the memory 230. Wherein the DMA command instructs the DMA module 220 to transfer the data to be transferred indicated by the data packet in a plurality of data frames.
Specifically, the DMA accelerator 240 generates, on behalf of the CPU 250, a DMA command that meets the format, transfer length, and other requirements of the DMA module 220. As one embodiment, the data bus width of the CPU 250 interface is 32 bits, 4KB of data is to be transferred to the external device 300, and the size of the data frame is the data block size encrypted with the Advanced Encryption Standard (AES). The DMA command accepted by the DMA module 220 is 16 or 32 bytes, and the data block length supported by AES encryption is 512 bytes. Thus, the DMA accelerator 240 generates a DMA command from the plurality of 32-bit data words received from the CPU 250, the command indicating 8 DMA transfers of 512 bytes each so as to comply with the AES encryption requirement. As another example, the size of the data frame may also be the size of the data block checked with the cyclic redundancy check code.
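The frame-splitting arithmetic of the embodiment above can be sketched as follows. This is an illustrative example only; the function name and the fixed 512-byte frame size are assumptions drawn from the embodiment, not an interface defined by the present application.

```c
#include <stdint.h>

#define AES_BLOCK_BYTES 512u  /* data frame = AES data-block size in the embodiment */

/* Illustrative helper: number of data frames a DMA command must describe so
 * that each frame matches the AES data-block size (rounding up). */
static inline uint32_t dma_frame_count(uint32_t transfer_bytes)
{
    return (transfer_bytes + AES_BLOCK_BYTES - 1u) / AES_BLOCK_BYTES;
}
```

For the 4KB transfer of the embodiment, `dma_frame_count(4096)` yields the 8 transfers of 512 bytes described above.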
The DMA accelerator 240 maintains a data structure of the DMA command stored in the memory 230 instead of the CPU250, writes the DMA command to the memory 230, and monitors status update of the DMA command in the memory 230 and provides an execution result of the DMA command to the CPU 250.
The DMA module 220 maintains DMA transfers between the external device 300 and the electronic device 200 through the PHY module 210. For example, the DMA module 220 acquires a DMA command indicating a DMA transfer from the memory 230, then initiates the DMA transfer with the external device 300 in accordance with the DMA command, and writes the execution result of the DMA command to the memory 230.
The electronic device 200 performs DMA transfer with an external device (e.g., the external device 300) of the electronic device through the PHY module 210. The PHY module 210 may be a PCIe PHY module for processing the PCIe underlying protocol, an FC PHY module for processing the FC underlying protocol, or an Ethernet PHY module for processing the Ethernet underlying protocol.
The electronic device 200 is also coupled to a memory 400. In the DMA transfer, data of the memory 400 is transferred to the external device 300 or data provided from the external device 300 is stored in the memory 400 under the control of the DMA module 220. For one embodiment, memory 400 is a DRAM (dynamic random access memory), and memory 400 has a larger storage capacity than memory 230.
As one embodiment, the DMA accelerator is provided with a streaming interface for CPU access. FIG. 3 is a schematic diagram of a streaming write interface of a DMA accelerator according to an embodiment of the present application; FIG. 4 is a diagram of a streaming read interface of a DMA accelerator according to an embodiment of the application.
As shown in fig. 3, the DMA accelerator provides a streaming write interface with an accessible address and a status flag indicating whether the streaming interface is available. When the CPU recognizes that the streaming write interface is available through the available status flag, the data associated with the DMA command is written to the streaming write interface in a specified width (e.g., 32 bits).
As one example, the CPU provides data using the streaming write interface provided by the DMA accelerator through a code segment as follows. The variable stream indicates the state of the streaming write interface obtained from its available state flag; when it indicates that the streaming write interface is available, the DMA command data (indicated by the variable CMD) is written to the accessible address of the streaming interface by the DMA_Write(CMD) procedure. If the DMA command data CMD is large, the CPU splits it into a plurality of 32-bit words by executing a program and supplies them to the streaming write interface.
If(stream!=full)
DMA_Write(CMD)
As shown in fig. 4, the DMA accelerator also provides a streaming read interface. The CPU accesses the available status flag provided by the streaming read interface; when the flag indicates that the streaming read interface is available, there is readable data. Accordingly, the CPU reads out data (indicating the execution result of the DMA command) from the accessible address provided by the streaming read interface.
As one example, the CPU obtains data using the streaming read interface provided by the DMA accelerator through a code segment as follows. The variable stream indicates the state of the streaming read interface obtained from its available state flag; when it indicates that the streaming read interface is not empty, the execution result of the DMA command (indicated by the variable CMD_Status) is obtained from the accessible address of the streaming read interface by the DMA_Read(CMD_Status) procedure. If the execution result of the DMA command is large, the CPU obtains it from the streaming read interface by reading data multiple times in a predetermined width (for example, 32 bits).
If(stream!=empty)
DMA_Read(CMD_Status)
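The write-side polling pattern of the snippets above can be sketched in C as follows. The register layout, field names, and function are illustrative assumptions made for the sketch, not the interface defined by the present application.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register view of the streaming write interface: one status
 * flag and one fixed accessible address, as described in the text. */
typedef struct {
    volatile uint32_t wr_full;  /* available-state flag: nonzero means "full" */
    volatile uint32_t wr_data;  /* single accessible address for 32-bit writes */
} dma_stream_regs_t;

/* Write one DMA command split into 32-bit words: check the flag, then push
 * each word to the same fixed address (the CPU never manages addresses). */
static int dma_write_cmd(dma_stream_regs_t *regs, const uint32_t *cmd, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        if (regs->wr_full)       /* stream == full: caller retries later */
            return -1;
        regs->wr_data = cmd[i];  /* DMA_Write(CMD), one 32-bit word at a time */
    }
    return 0;
}
```

The streaming read side is symmetric: poll a not-empty flag, then read `CMD_Status` words from a single fixed address.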
Optionally, the DMA accelerator provides an interrupt to the CPU. In response to the interrupt, the CPU knows that the streaming read interface has readable data, and the CPU reads the data from the streaming read interface.
Alternatively, the streaming write interface and the streaming read interface may not use the available status flag. The CPU writes data directly to and reads data directly from the accessible addresses provided by the streaming write interface.
By using the streaming interface, the CPU writes data to the stream without concern for the storage address or data structure of the data, thereby reducing the load on the CPU. Likewise, when reading out data, the CPU acquires data from the stream through the streaming interface without concern for the storage address or data structure of the data. Although the streaming interface provides an accessible address, it is a single or designated address; even when multiple pieces of data are accessed, the CPU neither updates the address nor performs memory management.
It will be appreciated that the accessible address may also be an identifier of the accessed stream, so that, as seen by software, the data is provided to the DMA accelerator or the DMA module by an operation that adds the data to the stream.
As another embodiment, the DMA accelerator is provided with a first-in-first-out interface (FIFO interface) for CPU access. The DMA accelerator provides a FIFO write interface to the CPU. When the FIFO queue provided by the DMA accelerator is not full, the CPU may write data to the tail of the FIFO queue. The DMA accelerator provides a FIFO read interface to the CPU. When the FIFO queue is not empty, the CPU can read data from the head of the FIFO queue.
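The not-full/not-empty semantics of the FIFO interface described above can be modeled minimally as follows. The depth, types, and function names are illustrative assumptions; a real FIFO interface would be hardware registers rather than a software structure.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 8u  /* illustrative depth */

typedef struct {
    uint32_t entry[FIFO_DEPTH];
    uint32_t head, tail, count;
} fifo_t;

/* CPU -> FIFO write interface: the write is accepted only when not full. */
static bool fifo_push(fifo_t *f, uint32_t v)
{
    if (f->count == FIFO_DEPTH) return false;  /* full: write refused */
    f->entry[f->tail] = v;                     /* data goes to the tail */
    f->tail = (f->tail + 1u) % FIFO_DEPTH;
    f->count++;
    return true;
}

/* FIFO read interface -> CPU: the read succeeds only when not empty. */
static bool fifo_pop(fifo_t *f, uint32_t *v)
{
    if (f->count == 0u) return false;          /* empty: nothing readable */
    *v = f->entry[f->head];                    /* data comes from the head */
    f->head = (f->head + 1u) % FIFO_DEPTH;
    f->count--;
    return true;
}
```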
Example one
Fig. 5 is a block diagram of a DMA accelerator 240 (see also fig. 2) according to an embodiment of the present application. As shown in fig. 5, the DMA accelerator 240 includes a DMA command reception unit 501, a processing unit 502, a DMA command completion unit 504, and a processing unit 503.
As shown in fig. 5, the DMA command receiving unit 501 is coupled to the CPU250 and the processing unit 502, respectively, and the DMA command receiving unit 501 receives a packet provided from the CPU 250.
The processing unit 502 is coupled to the memory 230, and the processing unit 502 obtains the data packet provided by the CPU250 from the DMA command receiving unit 501, converts the data packet to a format acceptable to the DMA module 220, forms a DMA command and writes to the memory 230, and maintains a data structure (e.g., a queue) acceptable to the DMA module 220.
The processing unit 503 is coupled to the memory 230, and the processing unit 503 retrieves the DMA command execution result from the memory 230. The DMA command completion unit 504 is coupled to the processing unit 503 and the CPU250, respectively, and the DMA command completion unit 504 acquires the execution result of the DMA command from the processing unit 503 and supplies the execution result of the DMA command to the CPU 250.
For one embodiment, the DMA command reception unit 501 and/or the DMA command completion unit 504 are provided with a buffer area to buffer the data packet from the CPU250 and/or the execution result of the DMA command from the processing unit 503.
As one embodiment, as shown in fig. 5, the DMA command reception unit 501 provides a streaming write interface to receive the data packets provided by the CPU 250. Likewise, the DMA command completion unit 504 buffers the status of DMA commands supplied by the processing unit 503 and indicates to the CPU 250, through its streaming read interface, that the interface is available or that there is data to be read; the CPU 250 then reads the execution result of the DMA command from the DMA command completion unit 504 through that streaming read interface.
By providing the streaming interface for the CPU250 to access the DMA accelerator 240, the CPU250 does not need to maintain the data structure of the DMA command stored in the memory 230, does not need to care about the format of the DMA command received by the DMA module 220, does not need to adapt to a specific format of different types of DMA commands, simplifies the interface for the CPU250 to access the DMA command, and reduces the load when the CPU250 processes the DMA command.
The plurality of DMA commands stored in memory 230 may be organized as a queue, linked list, linear table, array, or the like.
As one embodiment, the plurality of DMA commands stored in memory 230 are organized as a queue. In this embodiment, data transfer between the CPU250 and the DMA accelerator 240 is performed by a single stream, and a queue (single queue) is provided in the memory 230. The processing unit 502 writes the DMA command to the memory 230 according to the write pointer of the queue, and the processing unit 503 obtains the execution result of the DMA command from the memory 230 according to the read pointer of the queue, and updates the read completion pointer according to the execution result of the DMA command received by the CPU 250.
As shown in FIG. 5, the processing unit 502 maintains a read completion pointer and a write pointer. The processing unit 502 writes DMA commands to the memory 230 at the address indicated by the write pointer in the process indicated by markers ①②③. Next, the processing unit 502 updates the write pointer it maintains so that the write pointer points to the updated tail of the queue, and writes the updated write pointer to the memory 230 in the process indicated by markers ④⑤⑥ to record the position of the tail of the queue in the memory 230. The processing unit 502 also monitors the read completion pointer (indicating the position of the head of the queue) in the memory 230 through the process indicated by markers ⑦⑧⑨ and records its latest value inside the processing unit 502. By maintaining the read completion pointer and the write pointer, the processing unit 502 knows whether the queue in the memory 230 is full.
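The producer-side bookkeeping of the processing unit 502 can be sketched as follows. All names are illustrative; for simplicity this sketch reserves one queue slot to distinguish full from empty, which is one common convention (recording wrap-arounds, as described elsewhere in this application, is an alternative).

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_ENTRIES 16u  /* matches the 16-entry queue of FIGS. 6-10 */

typedef struct {
    uint32_t cmd[QUEUE_ENTRIES];
    uint32_t write_ptr;          /* tail: next slot to write a DMA command */
    uint32_t read_complete_ptr;  /* head: oldest result not yet consumed   */
} cmd_queue_t;

/* Not full as long as advancing the write pointer would not collide with
 * the read completion pointer (one slot kept unused). */
static bool queue_not_full(const cmd_queue_t *q)
{
    return ((q->write_ptr + 1u) % QUEUE_ENTRIES) != q->read_complete_ptr;
}

/* Write a DMA command at the tail, then publish the updated write pointer. */
static bool queue_push_cmd(cmd_queue_t *q, uint32_t dma_cmd)
{
    if (!queue_not_full(q)) return false;               /* suspend adding */
    q->cmd[q->write_ptr] = dma_cmd;                     /* steps ①②③    */
    q->write_ptr = (q->write_ptr + 1u) % QUEUE_ENTRIES; /* steps ④⑤⑥    */
    return true;
}
```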
With continued reference to FIG. 5, the DMA module 220 retrieves the DMA command from the memory 230. The DMA module 220 determines whether there is an added DMA command in the queue from the write pointer and the read pointer recorded in the memory 230 and acquires the DMA command from the head of the queue (indicated by the read pointer). The DMA module 220 initiates a DMA transfer between the external device and the electronic device according to the fetched DMA command, and updates the read pointer in the memory 230 after the DMA transfer corresponding to the DMA command is completed and the execution result of the DMA command is written into the memory 230, so as to indicate that the DMA command is completely processed by the DMA module 220.
It is to be understood that, in the queue as operated by the processing unit 502, the write pointer indicates the tail of the queue and the read completion pointer indicates the head. In the queue as operated by the processing unit 503, the read pointer indicates the tail of the queue and the read completion pointer indicates the head.
The processing unit 503 monitors the read pointer in the memory 230. In response to a change in the read pointer of the memory 230 or a difference in the read pointer of the memory 230 and the read completion pointer recorded by the processing unit 503, the processing unit 503 knows that a new DMA command is completed by the DMA module. When the read completion pointer is different from the read pointer of the memory 230, the processing unit 503 acquires the execution result of the processed DMA command from the memory 230 according to the read completion pointer recorded by itself, and supplies it to the DMA command completion unit 504. The processing unit 503 also updates the read completion pointer maintained by itself in response to providing the result of the execution of the DMA command to the DMA command completion unit 504.
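The consumer side described above can be sketched as follows. The names are illustrative: a new execution result is available whenever the DMA module's read pointer differs from the locally held read completion pointer, and fetching a result advances the read completion pointer.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_ENTRIES 16u

/* The DMA module has completed a command whenever its read pointer leads
 * (differs from) the read completion pointer held by processing unit 503. */
static bool result_available(uint32_t read_ptr, uint32_t read_complete_ptr)
{
    return read_ptr != read_complete_ptr;
}

/* Fetch one execution result from the head of the queue and advance the
 * read completion pointer to the next entry. */
static uint32_t fetch_result(const uint32_t *queue, uint32_t *read_complete_ptr)
{
    uint32_t result = queue[*read_complete_ptr];
    *read_complete_ptr = (*read_complete_ptr + 1u) % QUEUE_ENTRIES;
    return result;
}
```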
For one embodiment, processing unit 503 may write a read completion pointer to memory 230. The processing unit 502 monitors the read completion pointer in the memory 230 and takes the read completion pointer as the position of the head of the queue. The processing unit 502 retrieves the read completion pointer from the memory 230.
As another embodiment, the processing unit 502 is coupled to the processing unit 503, and the processing unit 502 obtains the read completion pointer directly from the processing unit 503, and uses the read completion pointer as the position of the head of the queue, without obtaining the read completion pointer from the memory 230.
As yet another embodiment, the DMA accelerator 240 further includes a pointer manager (see pointer manager 1105 in fig. 11) respectively coupled to the processing unit 502 and the processing unit 503, the processing unit 502 obtaining a read completion pointer from the pointer manager and updating a write pointer to the pointer manager, and the processing unit 503 obtaining a read pointer from the pointer manager and updating a read completion pointer to the pointer manager.
As yet another example, processing unit 502 provides the write pointer directly to DMA module 220, and DMA module 220 provides the read pointer directly to processing unit 503.
Fig. 6 to 10 are schematic diagrams of pointers of a single queue according to an embodiment of the present application. The pointers associated with the single queue include a read pointer, a write pointer, and a read completion pointer. The queues in memory 230 and pointers associated with the queues may be accessed by DMA accelerator 240 or DMA module 220.
Alternatively, the DMA accelerator 240 and DMA module 220 may maintain copies of pointers associated with the queues.
FIG. 6 shows the queue and pointers in an initial state. The queue includes 16 entries (numbered 0-15, respectively) that can accommodate 16 DMA commands. Alternatively, the DMA commands may be of the same or different sizes.
In the initial state, after the electronic device 200 is powered on or reset, no DMA command is written in the queue (the queue is empty), and the read pointer, the write pointer, and the read completion pointer are all 0 and point to entry 0 of the queue.
The processing unit 502 of the DMA accelerator 240 adds a DMA command to the queue (write pointer 0 in fig. 6) according to the write pointer and updates the write pointer in the memory 230 after writing the DMA command to the queue. Referring to FIG. 7, the queue entry numbered 0 is written to a DMA command and the write pointer is updated to 1 (pointing to the queue entry numbered 1). While the read pointer and read completion pointer remain 0.
The DMA module 220 identifies that the queue was written with a DMA command based on the read pointer lagging the write pointer. The DMA module 220 fetches the DMA command from the queue and processes it according to the read pointer (pointing to the queue entry numbered 0 in fig. 7).
It will be appreciated that the process of the processing unit 502 of the DMA accelerator 240 adding a DMA command to the queue may be concurrent with the process of the DMA module retrieving a DMA command from the queue and may not affect each other.
Referring to FIG. 8, the DMA module 220 processes DMA commands more slowly than the DMA accelerator 240 adds commands to the queue. The processing unit 502 of the DMA accelerator 240 has continued to add DMA commands to the queue, and the write pointer has been updated to 10 (queue entries numbered 0 through 9 have all been written with DMA commands). The DMA module 220 has processed the DMA commands in queue entries numbered 0 through 3, and the read pointer has been updated to 4. After the DMA module 220 finishes processing a DMA command, the DMA command in the queue is updated so that the execution result is recorded in it; that is, the DMA module 220 writes the execution result of the DMA command to the entry of the queue. In FIG. 8, the DMA commands in queue entries numbered 0 through 3 have all been processed by the DMA module 220.
Optionally, in this process, the read pointer indicates a queue entry at which the DMA module 220 writes the execution result of the DMA command to the queue. After the DMA module 220 finishes writing the execution result of the DMA command to the queue entry, the read pointer in the memory 230 is updated.
Alternatively, the DMA module 220 records the position of the DMA command in the queue, and writes the execution result of the DMA command into the queue according to the recorded position after the DMA command is processed.
The processing unit 503 of the DMA accelerator 240 obtains the execution result of the DMA command from the entry of the queue. The processing unit 503 of the DMA accelerator 240 recognizes that the read pointer leads the read completion pointer, knows the status of the DMA command written by the DMA module 220 in the queue, and obtains the execution result of the DMA command according to the entry indicated by the read completion pointer. After the processing unit 503 obtains the execution result of the DMA command from the queue entry, it also updates the read completion pointer to point to the next entry of the queue. Referring to FIG. 8, the processing unit 503 of the DMA accelerator 240 obtains the results of the execution of DMA commands with queue entries numbered 0 and 1 from the read completion pointer and updates the read completion pointer to point to the queue entry numbered 2.
The processing unit 502 of the DMA accelerator 240 continues to add DMA commands to the queue. When the entry numbered 15 is written to the DMA command, the write pointer wraps around and points to the entry numbered 0 (see FIG. 9) since the queue has a total of 16 entries. Meanwhile, DMA module 220 continues to process DMA commands in the queue and writes the DMA command execution status to the queue and updates the read pointer (pointing to entry number 11 in FIG. 9). The processing unit 503 of the DMA accelerator 240 continues to fetch the results of the execution of the DMA command from the queue and updates the read completion pointer (pointing to entry number 7 in fig. 9).
Preferably, wrap-arounds of the write pointer are also recorded, and whether the write pointer leads the read pointer is recognized based on a wrap-around flag or the number of wrap-arounds that have occurred.
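One common realization of this wrap-around flag (an assumption for illustration, not spelled out by the text) is to let each pointer carry one extra bit: the low four bits index the 16 entries, and bit 4 toggles on every wrap. Equal entry index with differing wrap flags then means the leading pointer has lapped the lagging one (queue full); fully equal pointers mean the queue is empty.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_ENTRIES 16u

/* Pointers run modulo twice the queue depth: low bits select the entry,
 * the next bit toggles on every wrap-around and acts as the wrap flag. */
static inline uint32_t entry_index(uint32_t p) { return p % QUEUE_ENTRIES; }
static inline uint32_t wrap_flag(uint32_t p)   { return (p / QUEUE_ENTRIES) & 1u; }

/* Full: same entry index, different wrap flags (one pointer has lapped). */
static bool queue_full(uint32_t lead, uint32_t lag)
{
    return entry_index(lead) == entry_index(lag) &&
           wrap_flag(lead) != wrap_flag(lag);
}

/* Empty: index and wrap flag both equal. */
static bool queue_empty(uint32_t lead, uint32_t lag)
{
    return (lead % (2u * QUEUE_ENTRIES)) == (lag % (2u * QUEUE_ENTRIES));
}
```

This variant uses all 16 entries, unlike schemes that reserve one slot to tell full from empty.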
The processing unit 502 of the DMA accelerator 240 identifies whether the queue is full based on the write pointer and the read completion pointer. If the write pointer, having wrapped around, catches up with the read completion pointer, the queue is full of DMA commands that have been added but whose execution results have not yet been fetched (some of which may already have been processed by the DMA module 220). The processing unit 502 of the DMA accelerator 240 then suspends adding DMA commands to the queue while waiting for the processing unit 503 of the DMA accelerator 240 to fetch execution results of DMA commands from the queue.
The process of the processing unit 503 of the DMA accelerator 240 acquiring the execution result of the DMA command from the queue and the process of the DMA module 220 adding the execution state of the DMA command to the queue can be executed in parallel without affecting each other.
Referring to FIG. 10, the processing unit 503 of the DMA accelerator 240 fetches the execution results of the DMA commands from the queue faster than the DMA module 220 adds the execution results of the DMA commands to the queue so that the read completion pointer gradually catches up with the read pointer. In FIG. 10, with respect to FIG. 9, both the read completion pointer and the read pointer wrap around, and the read completion pointer and the read pointer point to the same location (queue entry numbered 2). In response, the processing unit 503 of the DMA accelerator 240 recognizes that the DMA module 220 has not yet generated an updated DMA command execution result, and thus the processing unit 503 of the DMA accelerator 240 suspends fetching DMA command execution results from the queue.
During the use of the queue, the processing unit 502 of the DMA accelerator and the DMA module 220 are one producer-consumer pair of the queue, and the DMA module 220 and the processing unit 503 of the DMA accelerator are the other producer-consumer pair. The single queue is used both for the DMA accelerator to submit DMA commands to the DMA module and for the DMA module to submit the execution results of DMA commands to the DMA accelerator, which reduces the storage space required in the memory.
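The interplay of the three pointers over FIGS. 6-8 can be replayed end to end as follows. Pointer values are entry numbers in the 16-entry queue; the function names are illustrative only.

```c
#include <stdint.h>

/* The three pointers of the single queue (entry numbers, 16-entry queue). */
typedef struct { uint32_t write_ptr, read_ptr, read_complete_ptr; } queue_ptrs_t;

/* Pair 1 producer: processing unit 502 adds n DMA commands at the tail. */
static void accel_add_cmds(queue_ptrs_t *q, uint32_t n)
{
    q->write_ptr = (q->write_ptr + n) % 16u;
}

/* Pair 1 consumer / pair 2 producer: the DMA module completes n commands
 * in order and publishes their execution results. */
static void dma_complete(queue_ptrs_t *q, uint32_t n)
{
    q->read_ptr = (q->read_ptr + n) % 16u;
}

/* Pair 2 consumer: processing unit 503 drains n execution results. */
static void accel_fetch_results(queue_ptrs_t *q, uint32_t n)
{
    q->read_complete_ptr = (q->read_complete_ptr + n) % 16u;
}
```

Starting from the all-zero state of FIG. 6, one added command yields the pointers of FIG. 7 (write = 1, read = 0, read completion = 0); nine more commands, four completions, and two fetched results yield FIG. 8 (write = 10, read = 4, read completion = 2).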
Example two
Fig. 11 is a structural diagram of a direct memory access accelerator according to a second embodiment of the present application. As shown in fig. 11, the DMA accelerator 240 includes a DMA command reception unit 1101, a processing unit 1102, a DMA command completion unit 1104, and a processing unit 1103.
As shown in fig. 11, the DMA command receiving unit 1101 is coupled to the CPU250 and the processing unit 1102, respectively, and the DMA command receiving unit 1101 receives a packet supplied from the CPU 250.
The processing unit 1102 is coupled to the memory 230. The processing unit 1102 obtains the data packets provided by the CPU 250 from the DMA command receiving unit 1101, converts each packet into a format acceptable to the DMA module 220 to form a DMA command, writes the command to the memory 230, and maintains a data structure (e.g., a queue) acceptable to the DMA module 220.
The processing unit 1103 is coupled to the memory 230, and the processing unit 1103 retrieves the DMA command execution result from the memory 230. The DMA command completion unit 1104 is coupled to the processing unit 1103 and the CPU250, respectively, and the DMA command completion unit 1104 acquires the execution result of the DMA command from the processing unit 1103 and supplies the execution result of the DMA command to the CPU 250.
The difference between the second embodiment and the first embodiment is that: the DMA accelerator 240 and the CPU250 perform data transfer via a plurality of independent streams, and the memory 230 is provided with queues corresponding to the respective streams. The DMA module independently processes DMA commands provided by the CPU250 in the respective streams.
As an example, the DMA command reception unit 1101 is provided with a single port (such as a streaming write interface). The streaming write interface of the DMA command reception unit 1101 provides an availability flag for each stream, and the CPU 250 can query the availability flags of the streams independently or jointly. Depending on which streams are available, the CPU 250 sends the port packets marked with the stream to be accessed. The DMA command reception unit 1101 identifies the stream to which a DMA command belongs from the mark in its packet. The processing unit 1102 obtains, according to the stream to which the DMA command belongs, the pointer of the queue corresponding to that stream (i.e., the storage address of the DMA command in the memory 230) and writes the DMA command into that queue.
The processing unit 1103 monitors the pointers of the queues corresponding to the respective streams. When the execution status of a completed DMA command appears in a queue, the processing unit 1103 acquires the execution result of the DMA command and supplies it to the CPU 250 through the DMA command completion unit 1104. Specifically, the streaming read interface of the DMA command completion unit 1104 provides an availability flag for each stream; the CPU 250 can query the availability flags of the streams independently or jointly and acquire the execution results of DMA commands from each stream through the streaming read interface.
The DMA accelerator 240 ensures that the stream to which the DMA command submitted by the CPU250 belongs and the stream to which the execution result of the DMA command acquired by the CPU250 belongs are the same stream. Even if the CPU250 submits the DMA command to a plurality of streams at the same time, the execution result of the DMA command acquired by the CPU250 from each stream appears in the same stream as the DMA command submitted by the CPU 250. For example, the DMA accelerator 240 provides 4 streams (S0, S1, S2, and S3), the CPU250 submits the DMA commands C1 and C2 to the stream S1, and the DMA commands C3 and C4 to the stream S2, and the CPU250 accordingly acquires the execution results of the DMA commands C1 and C2 from the stream S1 provided by the DMA command completion unit 1104, and acquires the execution results of the DMA commands C3 and C4 from the stream S2 provided by the DMA command completion unit 1104.
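The per-stream affinity in this example (commands C1-C4 on streams S1 and S2) can be modeled as below; the function names and the ":done" result encoding are hypothetical, introduced only for the sketch.

```python
# Four independent streams, S0..S3, as in the text; each stream keeps its
# own submission order, and results come back only on the submitting stream.
streams = {"S%d" % i: [] for i in range(4)}

def submit(stream_id, command):
    """CPU side: submit a DMA command to one stream."""
    streams[stream_id].append(command)

def complete(stream_id):
    """Completion side: return results only for this stream's commands."""
    return [cmd + ":done" for cmd in streams[stream_id]]

submit("S1", "C1"); submit("S1", "C2")
submit("S2", "C3"); submit("S2", "C4")
assert complete("S1") == ["C1:done", "C2:done"]   # S1 results stay on S1
assert complete("S2") == ["C3:done", "C4:done"]   # S2 results stay on S2
```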
Optionally, within the same stream, the execution results of DMA commands are supplied to the CPU 250 in the order in which the commands were submitted to the stream. Also optionally, within the same stream, the execution results of DMA commands may be supplied to the CPU 250 out of order.
As another specific embodiment, the DMA command receiving unit 1101 is provided with a plurality of ports (such as streaming interfaces or first-in first-out interfaces), the number of ports is the same as the number of streams, and the ports correspond to the streams one to one. The DMA command reception unit 1101 receives a packet of the CPU250 through a port corresponding one-to-one to the stream. The DMA command reception unit 1101 identifies the stream to which the DMA command belongs according to the port from which the packet is received. The processing unit 1102 obtains a pointer of a queue corresponding to the stream (i.e., a storage address of the DMA command in the memory 230) according to the stream to which the DMA command belongs, and writes the DMA command into the queue corresponding to the stream. The DMA command completion unit 1104 supplies the DMA command execution result to the CPU250 from the port corresponding to the stream to which the DMA command belongs.
As yet another example, the packet indicates an identifier (sID, also referred to as a flow identifier) that indicates the flow to which the packet belongs. The DMA accelerator 240 determines the memory address of the DMA command in the memory 230 based on the identifier.
For one embodiment, the DMA accelerator 240 also includes a pointer manager 1105. The pointer manager 1105 is coupled to the processing unit 1102 and the processing unit 1103, respectively, and records the read pointer, write pointer, and read completion pointer of each queue in the memory 230 (e.g., one set of pointers for the stream with stream identifier S0 and another set for the stream with stream identifier S1). The processing unit 1102 updates the write pointer of the queue corresponding to each stream in the pointer manager 1105, while the processing unit 1103 obtains the read pointer of the queue corresponding to each stream from the pointer manager 1105 and updates the read completion pointer of the queue corresponding to each stream in the pointer manager 1105. The pointer manager thus manages the plurality of queues in the memory and lets the processing unit 1102 and the processing unit 1103 exchange pointers: according to the stream identifier, each of them reads the pointers corresponding to a stream from the pointer manager 1105 or writes them back to it.
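The pointer manager can be pictured as a small table of per-stream pointer records keyed by stream identifier (sID); the field names below are illustrative assumptions, not terms from the application.

```python
# Sketch of pointer manager 1105: one record of three pointers per stream.
class PointerManager:
    def __init__(self, stream_ids):
        self.table = {sid: {"write": 0, "read": 0, "read_completion": 0}
                      for sid in stream_ids}

    def update(self, sid, name, value):
        """Publish a pointer value for the queue of stream `sid`."""
        self.table[sid][name] = value

    def get(self, sid, name):
        """Look up a pointer value for the queue of stream `sid`."""
        return self.table[sid][name]

mgr = PointerManager(["S0", "S1"])
mgr.update("S1", "write", 3)        # e.g. unit 1102 publishes a write pointer
assert mgr.get("S1", "write") == 3  # e.g. unit 1103 reads it back by sID
assert mgr.get("S0", "write") == 0  # streams remain independent
```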
Optionally, memory 230 provides storage space for queues corresponding to each flow and pointers corresponding to the queues. The memory 230 is shown in fig. 11 as accommodating two queues, and a read pointer and a write pointer for each of the two queues. The flows are in one-to-one correspondence with queues in memory 230.
Embodiment Three
Fig. 12 is a flowchart of a DMA command processing method according to a third embodiment of the present application. As shown in fig. 12, the method for processing the DMA command by the DMA accelerator includes the following steps:
step 1201: a data packet is received from the CPU.
Step 1202: the data packet is converted to a DMA command and written to memory.
Step 1203: in response to the execution result of the DMA command in the memory being updated, the updated execution result of the DMA command is obtained.
Step 1204: the result of the execution of the DMA command is provided to the CPU.
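Steps 1201-1204 can be sketched end to end as follows. The packet fields and the DMA command layout are invented for the example, since the application does not fix a concrete format.

```python
def packet_to_dma_command(packet):
    """Step 1202: convert a CPU packet into a DMA-module command.
    The field names here are hypothetical."""
    return {"src": packet["source_addr"],
            "dst": packet["dest_addr"],
            "len": packet["length"]}

memory = []                         # stands in for the memory holding commands

def accelerator_handle(packet):
    """Steps 1201-1202: receive the packet, convert, write to memory."""
    memory.append(packet_to_dma_command(packet))

def dma_module_run():
    """Steps 1203-1204: execute the command and report the result back."""
    cmd = memory.pop(0)
    return {"status": "OK", "bytes_moved": cmd["len"]}

accelerator_handle({"source_addr": 0x1000, "dest_addr": 0x2000, "length": 512})
result = dma_module_run()
assert result == {"status": "OK", "bytes_moved": 512}
```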
For one embodiment, data transfers are made between the CPU and the DMA accelerator over a streaming interface, with the DMA accelerator providing a status flag for each stream.
In step 1201, the DMA accelerator provides a status flag for the streaming interface that indicates whether the streaming interface can receive data. If the CPU finds that the status flag indicates availability, the CPU sends a data packet to the streaming interface.
In step 1204, the DMA accelerator provides a status flag for the streaming interface that indicates whether the streaming interface has data to output. If the CPU finds that the status flag indicates availability, the CPU reads the data packet from the streaming interface.
As another embodiment, data transmission between the CPU and the DMA accelerator is performed through a first-in first-out interface (FIFO interface).
In step 1201, the DMA accelerator provides the status of the FIFO interface. If the CPU identifies that the first-in first-out queue is not full, the CPU sends a data packet to the first-in first-out interface.
In step 1204, the DMA accelerator provides the status of the FIFO interface. If the CPU identifies that the FIFO queue is not empty, the CPU reads the data packet from the FIFO interface.
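The FIFO handshake of steps 1201 and 1204 can be sketched as follows, with `collections.deque` standing in for the hardware first-in first-out queue and a hypothetical depth of 2.

```python
from collections import deque

FIFO_DEPTH = 2
fifo = deque()

def fifo_full():
    return len(fifo) >= FIFO_DEPTH

def fifo_empty():
    return len(fifo) == 0

def cpu_send(packet):
    """Step 1201: the CPU writes only while the FIFO is not full."""
    if fifo_full():
        return False            # back-pressure: hold off until not full
    fifo.append(packet)
    return True

def cpu_read():
    """Step 1204: the CPU reads only while the FIFO is not empty."""
    if fifo_empty():
        return None             # nothing to read yet
    return fifo.popleft()

assert cpu_send("p0") and cpu_send("p1")
assert not cpu_send("p2")       # FIFO full, the send is refused
assert cpu_read() == "p0"       # first in, first out
```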
Optionally, the plurality of DMA commands stored in the memory are organized as a queue. The queue is maintained by a write pointer, a read pointer, and a read completion pointer. The write pointer points to the tail of the queue in memory where the DMA command is stored, and the read completion pointer points to the head of the queue in memory where the DMA command is stored.
The DMA accelerator determines that the queue in memory storing the DMA command is not full based on the read completion pointer lagging the write pointer. The DMA accelerator writes a DMA command to memory according to the write pointer only when the queue in memory is not full, and then updates the write pointer.
And the DMA module acquires the DMA command from the queue in the memory according to the read pointer and processes the DMA command. In response to the DMA command being completed, the DMA module writes the results of the execution of the DMA command to memory and updates a read pointer in the memory.
In response to the read pointer being different from the read completion pointer, the DMA accelerator retrieves from memory a result of the execution of the DMA command. In response to the read pointer leading the read completion pointer, the DMA accelerator fetches the result of the execution of the DMA command from memory in accordance with the read completion pointer and updates the read completion pointer.
Optionally, the DMA accelerator writes the read completion pointer to the memory, so that the read completion pointer in the memory can be monitored and treated as marking the position of the head of the queue.
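Assuming monotonically increasing pointers and a depth-4 queue (a simplification of the wrap-flag scheme), the occupancy tests described above reduce to two comparisons:

```python
QUEUE_DEPTH = 4

def queue_full(write_ptr, read_completion_ptr):
    """The queue is full when the write pointer leads the read completion
    pointer (the head of the queue) by the full queue depth."""
    return write_ptr - read_completion_ptr >= QUEUE_DEPTH

def result_available(read_ptr, read_completion_ptr):
    """A result is pending while the read pointer leads the read
    completion pointer."""
    return read_ptr > read_completion_ptr

assert not queue_full(write_ptr=3, read_completion_ptr=0)    # room for one more
assert queue_full(write_ptr=4, read_completion_ptr=0)        # writer must wait
assert result_available(read_ptr=2, read_completion_ptr=1)   # one result pending
assert not result_available(read_ptr=1, read_completion_ptr=1)
```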
According to one embodiment, data transmission is performed between the DMA accelerator and the CPU through a plurality of mutually independent streams, and queues which correspond to the streams one to one are arranged in a memory. The DMA module independently processes DMA commands provided by the CPU in the respective streams. The DMA accelerator provides a status flag for each stream.
Fig. 13 is a flowchart illustrating how the CPU performs a DMA operation through the DMA accelerator according to an embodiment of the present application. As shown in fig. 13, the CPU performs the DMA operation through the DMA accelerator in the following steps:
step 1310: the CPU fetches the available streams from the DMA accelerator. For example, the CPU accesses a status flag provided by the DMA accelerator as to whether a streaming interface is available to obtain an available stream. The status flag indicates whether the corresponding stream can write data or read data.
If the CPU is to provide DMA commands to the DMA accelerator, then step 1320 is executed: the packet is sent to the stream available for write DMA commands.
If the CPU is to obtain the execution result of the DMA command from the DMA accelerator, step 1330 is executed: the result of the execution of the DMA command is obtained from the stream available for reading data.
By using the streaming interface, the CPU writes data to a stream without concerning itself with the data's storage address or data structure, which reduces the load on the CPU. Likewise, when reading, the CPU acquires data from a stream through the streaming interface without concerning itself with the data's storage address or data structure. Although the streaming interface exposes an accessible address, it is a single, designated address: even when multiple pieces of data are accessed, the CPU neither has to update the address nor perform memory management.
The technical scheme of the application obtains the following beneficial effects:
(1) The present application uses a single queue both for the first producer to submit the first message to the first consumer and for the first consumer to submit the second message to the second consumer, which reduces the demand on the storage space of the memory.
(2) The present application effectively manages and operates one or more mutually independent queues in the memory through the read pointer, the write pointer, and the read completion pointer, improving the data transfer speed.
(3) By maintaining the queue in the memory, the first processing unit of the DMA accelerator and the DMA module form one producer-consumer pair of the queue, and the DMA module and the second processing unit of the DMA accelerator form the other producer-consumer pair, so that a single queue serves both for the DMA accelerator to submit DMA commands to the DMA module and for the DMA module to submit the execution results of the DMA commands to the DMA accelerator, reducing the demand on the storage space of the memory.
The above description covers only specific embodiments of the present application, but the scope of the present application is not limited thereto; any changes or substitutions that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for exchanging information via a queue, comprising:
the first producer writes the first message into a queue;
the first consumer obtains the first message from the queue;
the first consumer writes a processing result of the first message into the first message in the queue; wherein the processing result of the first message forms a second message;
the second consumer retrieves the second message from the queue.
2. The method of claim 1, wherein the first producer writes the first message to the queue in accordance with a write pointer of the queue;
the first consumer obtains the first message from the queue according to a read pointer of the queue;
the first consumer records the position of the first message in the queue and writes the second message into the queue according to the recorded position.
3. The method of any of claims 1-2, wherein the second consumer retrieves the second message from the queue in accordance with a read completion pointer of the queue.
4. The method of claim 3, wherein the method further comprises:
writing the read completion pointer to a memory, the first producer obtaining the read completion pointer from the memory; or,
the second consumer provides the read completion pointer to the first producer.
5. The method of any one of claims 1-4, wherein the write pointer includes a wrap around flag or information of a number of times a wrap around occurred.
6. The method of any one of claims 3-5, further comprising:
in response to the read completion pointer and the read pointer pointing to the same address, the second consumer suspends fetching the second message from the queue.
7. The method of exchanging information via a queue of any of claims 1-6, wherein the memory comprises a plurality of the queues independent of one another, the first producer, the first consumer, and the second consumer exchanging information via the plurality of the queues.
8. A system for processing a queue, comprising:
the first producer writes the first message into the queue;
a first consumer obtaining the first message from the queue; and writing a processing result of the first message to the first message in the queue; wherein the processing result of the first message forms a second message;
a second consumer obtaining the second message from the queue.
9. The system of claim 8, wherein the queue further comprises a read completion pointer that indicates an address for the second consumer to read a message from the queue.
10. The system of claim 8 or 9, wherein the system further comprises a memory, the memory comprising a plurality of separate queues, the first producer, the first consumer, and the second consumer exchanging information via the plurality of queues.
CN201911415907.3A 2017-06-15 2017-06-15 Method and system for exchanging information by queues Active CN111158936B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911415907.3A CN111158936B (en) 2017-06-15 2017-06-15 Method and system for exchanging information by queues

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710451236.0A CN109144742B (en) 2017-06-15 2017-06-15 Method for exchanging information through queue and system for processing queue
CN201911415907.3A CN111158936B (en) 2017-06-15 2017-06-15 Method and system for exchanging information by queues

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201710451236.0A Division CN109144742B (en) 2017-06-15 2017-06-15 Method for exchanging information through queue and system for processing queue

Publications (2)

Publication Number Publication Date
CN111158936A true CN111158936A (en) 2020-05-15
CN111158936B CN111158936B (en) 2024-04-09

Family

ID=64829783

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201710451236.0A Active CN109144742B (en) 2017-06-15 2017-06-15 Method for exchanging information through queue and system for processing queue
CN201911415907.3A Active CN111158936B (en) 2017-06-15 2017-06-15 Method and system for exchanging information by queues

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201710451236.0A Active CN109144742B (en) 2017-06-15 2017-06-15 Method for exchanging information through queue and system for processing queue

Country Status (1)

Country Link
CN (2) CN109144742B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110213256B (en) * 2019-05-28 2021-09-28 哈尔滨工程大学 Firewall control method based on producer consumer mode
CN112416826B (en) * 2020-11-20 2023-09-22 成都海光集成电路设计有限公司 Special computing chip, DMA data transmission system and method

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050166206A1 (en) * 2004-01-26 2005-07-28 Parson Dale E. Resource management in a processor-based system using hardware queues
US20070288931A1 (en) * 2006-05-25 2007-12-13 Portal Player, Inc. Multi processor and multi thread safe message queue with hardware assistance
CN101105786A (en) * 2006-07-14 2008-01-16 中兴通讯股份有限公司 Double CPU communication method based on shared memory
CN101208671A (en) * 2005-06-27 2008-06-25 起元软件有限公司 Managing message queues
CN101409715A (en) * 2008-10-22 2009-04-15 中国科学院计算技术研究所 Method and system for communication using InfiniBand network
US20110153877A1 (en) * 2009-12-23 2011-06-23 King Steven R Method and apparatus to exchange data via an intermediary translation and queue manager
CN102541779A (en) * 2011-11-28 2012-07-04 曙光信息产业(北京)有限公司 System and method for improving direct memory access (DMA) efficiency of multi-data buffer
CN102970353A (en) * 2012-11-08 2013-03-13 大唐软件技术股份有限公司 Method and system for business data processing
CN103631624A (en) * 2013-11-29 2014-03-12 华为技术有限公司 Method and device for processing read-write request
CN103645942A (en) * 2013-12-12 2014-03-19 北京奇虎科技有限公司 Message queue based write and read method and system of shared memory
CN103761141A (en) * 2013-12-13 2014-04-30 北京奇虎科技有限公司 Method and device for realizing message queue
CN103914341A (en) * 2013-01-06 2014-07-09 中兴通讯股份有限公司 Data queue de-queuing control method and device
CN105095365A (en) * 2015-06-26 2015-11-25 北京奇虎科技有限公司 Information flow data processing method and device
CN105183665A (en) * 2015-09-08 2015-12-23 福州瑞芯微电子股份有限公司 Data-caching access method and data-caching controller
CN105608223A (en) * 2016-01-12 2016-05-25 北京中交兴路车联网科技有限公司 Hbase database entering method and system for kafka
US20160149801A1 (en) * 2013-06-13 2016-05-26 Tsx Inc. Apparatus and method for failover of device interconnect using remote memory access with segmented queue

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100395737C (en) * 2006-06-08 2008-06-18 杭州华三通信技术有限公司 Method for transmitting data between internal memory and digital signal processor
CN103150278B (en) * 2013-03-05 2014-03-05 中国人民解放军国防科学技术大学 Submission method of descriptor of network interface card (NIC) based on mixing of PIO (process input output) and DMA (direct memory access)


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GERRY BAUER 等: "A Comprehensive Zero-Copy Architecture for High Performance Distributed Data Acquisition Over Advanced Network Technologies for the CMS Experiment", 《IEEE TRANSACTIONS ON NUCLEAR SCIENCE》, vol. 60, no. 6, pages 4595 - 4602, XP011534125, DOI: 10.1109/TNS.2013.2282340 *
丁颖 等: "高速信息交换系统中DMA接收控制器的设计", 《国外电子测量技术》, vol. 24, no. 11, pages 16 - 18 *

Also Published As

Publication number Publication date
CN109144742A (en) 2019-01-04
CN111158936B (en) 2024-04-09
CN109144742B (en) 2020-02-07

Similar Documents

Publication Publication Date Title
US6611883B1 (en) Method and apparatus for implementing PCI DMA speculative prefetching in a message passing queue oriented bus system
US9672143B2 (en) Remote memory ring buffers in a cluster of data processing nodes
US6622193B1 (en) Method and apparatus for synchronizing interrupts in a message passing queue oriented bus system
TWI526838B (en) Memory device
US9678866B1 (en) Transactional memory that supports put and get ring commands
US8972630B1 (en) Transactional memory that supports a put with low priority ring command
KR101121592B1 (en) Processing apparatus with burst read write operations
US7849214B2 (en) Packet receiving hardware apparatus for TCP offload engine and receiving system and method using the same
US9678891B2 (en) Efficient search key controller with standard bus interface, external memory interface, and interlaken lookaside interface
WO2020000482A1 (en) Nvme-based data reading method, apparatus and system
US9594702B2 (en) Multi-processor with efficient search key processing
JP2006338538A (en) Stream processor
CN109144742B (en) Method for exchanging information through queue and system for processing queue
US9727521B2 (en) Efficient CPU mailbox read access to GPU memory
CN110737614B (en) Electronic equipment with DMA accelerator and DMA command processing method thereof
US11243767B2 (en) Caching device, cache, system, method and apparatus for processing data, and medium
US20160011995A1 (en) Island-based network flow processor with efficient search key processing
US20150177985A1 (en) Information processing device
US20110283068A1 (en) Memory access apparatus and method
JP2007207249A (en) Method and system for cache hit under miss collision handling, and microprocessor
US9632959B2 (en) Efficient search key processing method
US7284075B2 (en) Inbound packet placement in host memory
US10261700B1 (en) Method and apparatus for streaming buffering to accelerate reads
KR100898345B1 (en) Packet receiver hardware apparatus for tcp offload engine and system and method based on ??? packet receive hardware
TWI556102B (en) System and method for accessing data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant