CN117196929B - Software and hardware interaction system based on fixed-length data packet - Google Patents

Software and hardware interaction system based on fixed-length data packet

Info

Publication number
CN117196929B
CN117196929B
Authority
CN
China
Prior art keywords
fixed
length data
target
data packet
sequence number
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311242401.3A
Other languages
Chinese (zh)
Other versions
CN117196929A (en)
Inventor
高卫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Muxi Integrated Circuit Shanghai Co ltd
Original Assignee
Muxi Integrated Circuit Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Muxi Integrated Circuit Shanghai Co ltd filed Critical Muxi Integrated Circuit Shanghai Co ltd
Priority to CN202311242401.3A priority Critical patent/CN117196929B/en
Publication of CN117196929A publication Critical patent/CN117196929A/en
Application granted granted Critical
Publication of CN117196929B publication Critical patent/CN117196929B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Transfer Systems (AREA)

Abstract

The invention relates to the field of computer technology, and in particular to a software and hardware interaction system based on fixed-length data packets. The system comprises software, GPU hardware, and a storage area accessible to both, wherein the software comprises a driver and M processes, and the fixed-length data packet corresponding to each process has length R. The GPU hardware comprises a doorbell interface, N processing modules, and W groups of status registers; the doorbell interface is connected to the N processing modules, each processing module is connected to X groups of status registers, and each processing module can process the fixed-length data packets corresponding to X processes by time-division multiplexing. During system initialization, the driver sets up W ring buffers in the storage area and establishes a mapping between the ring buffers and doorbell addresses; during fixed-length data packet storage and processing, the target write pointer sequence number and the target read pointer sequence number are updated. The invention can reasonably issue fixed-length data packets to GPU hardware resources for processing, thereby improving GPU hardware performance.

Description

Software and hardware interaction system based on fixed-length data packet
Technical Field
The present invention relates to the field of computer technology, and in particular to a software and hardware interaction system based on fixed-length data packets.
Background
When a computer handles complex workloads such as artificial intelligence (AI) computation, software and graphics processing unit (GPU) hardware must cooperate: the software issues fixed-length data packets of equal length to the GPU hardware for processing. The software may include thousands of processes, each corresponding to a number of fixed-length data packets, while GPU hardware resources are limited. Existing software-hardware interaction mechanisms are complex, and GPU hardware performance suffers as a result. Therefore, how to reasonably issue the fixed-length data packets of software processes to GPU hardware resources for processing, so as to improve GPU hardware performance, has become a technical problem to be solved.
Disclosure of Invention
The object of the invention is to provide a software and hardware interaction system based on fixed-length data packets, which can reasonably issue fixed-length data packets to GPU hardware resources for processing, thereby improving GPU hardware performance.
According to one aspect of the invention, a software and hardware interaction system based on fixed-length data packets is provided. The system comprises software, GPU hardware, and a storage area accessible to both the software and the GPU hardware, wherein the software comprises a driver and M processes, and the fixed-length data packet corresponding to each process has length R; the GPU hardware comprises a doorbell interface, N processing modules, and W groups of status registers, where M > W > N, the doorbell interface is connected to the N processing modules, each processing module is connected to X groups of status registers, each processing module can process the fixed-length data packets corresponding to X processes by time-division multiplexing, and W = N × X;
during system initialization, the driver is configured to set up W ring buffers in the storage area, set corresponding state information in the W groups of status registers, and establish a mapping between the ring buffers and doorbell addresses, wherein each ring buffer corresponds to one doorbell address, the state information comprises a usage state, a doorbell address, a ring buffer start address, a read pointer sequence number and a write pointer sequence number, the initial values of the read pointer sequence number and the write pointer sequence number are 0, the ring buffer length L = Y × R, and Y is the number of fixed-length data packets the ring buffer can hold;
during fixed-length data packet storage, the driver selects, based on the usage states in the W groups of status registers, a target ring buffer corresponding to a process to be processed, and determines the number of distributable fixed-length data packets corresponding to the process to be processed according to a target read pointer sequence number and a target write pointer sequence number corresponding to the target ring buffer; the target processing module corresponding to the target ring buffer stores the fixed-length data packets to be distributed for the process to be processed into the target ring buffer according to the number of distributable fixed-length data packets, and updates the target write pointer sequence number according to the number of stored fixed-length data packets through the doorbell interface;
and during fixed-length data packet processing, the target processing module is configured to read fixed-length data packets from the target ring buffer for processing when the target read pointer sequence number and the target write pointer sequence number are unequal, and, as processing completes, to update the target read pointer sequence number in real time according to the number of processed fixed-length data packets through the doorbell interface.
Compared with the prior art, the invention has obvious advantages and beneficial effects. With the above technical solution, the software and hardware interaction system based on fixed-length data packets achieves considerable technical progress and practicality, has broad industrial value, and provides at least the following benefits:
according to the system, the mapping relation between the W doorbell addresses and the annular buffer zone arranged in the storage area which can be accessed by both software and GPU hardware is established through the W group of state registers, the cooperative interaction of the software and GPU hardware resources is realized based on the information in the state registers, fixed-length data packets can be reasonably issued to the GPU hardware resources for processing, and the GPU hardware performance is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for describing the embodiments are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present invention, and that other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic diagram of a software and hardware interaction system based on a fixed-length data packet according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of the present invention.
The embodiment of the invention provides a software and hardware interaction system based on fixed-length data packets, suitable for application scenarios that issue fixed-length data packets, such as AI computation; a fixed-length data packet is a data packet of fixed size. As shown in Fig. 1, the system includes software, GPU hardware, and a storage area accessible to both the software and the GPU hardware. The software runs on a central processing unit (CPU) and includes a driver and M processes, and the fixed-length data packet corresponding to each process has length R. The GPU hardware includes a doorbell interface, N processing modules, and W groups of status registers, with M > W > N. It should be noted that hardware resources are limited, so W is typically on the order of ten, while there may be thousands of software processes, so M is much larger than W. The doorbell interface is connected to the N processing modules, each processing module is connected to X groups of status registers, and each processing module can process the fixed-length data packets corresponding to X processes by time-division multiplexing, where W = N × X. The processing modules are based on the RISC-V instruction set; RISC-V is an open instruction set architecture (ISA) built on reduced instruction set computing (RISC) principles, where V denotes the fifth generation of RISC, four generations of RISC processor prototype chips having preceded it.
During system initialization, the driver is configured to set up W ring buffers in the storage area, set corresponding state information in the W groups of status registers, and establish a mapping between the ring buffers and doorbell addresses, with each ring buffer corresponding to one doorbell address. The state information includes a usage state, a doorbell address, a ring buffer start address, a read pointer sequence number, and a write pointer sequence number; the initial values of the read pointer sequence number and write pointer sequence number are 0, the ring buffer length is L = Y × R, and Y is the number of fixed-length data packets the ring buffer can hold. It should be noted that the usage state stored in a status register indicates whether the corresponding ring buffer can be reassigned to another process. The doorbell address together with the ring buffer start address defines the mapping between the ring buffer and the doorbell address; the next write start address is determined by combining the write pointer sequence number with the ring buffer start address and the fixed-length data packet length R, and the next read start address is determined by combining the read pointer sequence number with the ring buffer start address and R.
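To make the initialization step concrete, the following C sketch models one group of status registers and derives the next write and read start addresses from the sequence numbers. It is illustrative only: the structure, field, and function names, and the example values of R and Y, are assumptions and not definitions taken from the patent.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of one group of status registers (one group per ring buffer). */
typedef struct {
    bool     in_use;         /* usage state: false = selectable, true = non-selectable */
    uint64_t doorbell_addr;  /* doorbell address mapped to this ring buffer            */
    uint64_t ring_base;      /* ring buffer start address in the shared storage area   */
    uint32_t rd_seq;         /* read pointer sequence number D, initially 0            */
    uint32_t wr_seq;         /* write pointer sequence number C, initially 0           */
} status_regs_t;

#define R  64u   /* fixed-length packet size in bytes (example value)   */
#define Y  256u  /* packets per ring buffer, so ring length L = Y * R   */

/* Driver-side initialization of the W register groups (sketch). */
static void init_status_regs(status_regs_t *regs, int w,
                             uint64_t storage_base, const uint64_t *doorbells)
{
    for (int i = 0; i < w; i++) {
        regs[i].in_use        = false;                              /* selectable        */
        regs[i].doorbell_addr = doorbells[i];                       /* doorbell mapping  */
        regs[i].ring_base     = storage_base + (uint64_t)i * Y * R; /* ring of length Y*R */
        regs[i].rd_seq        = 0;
        regs[i].wr_seq        = 0;
    }
}

/* Next write / read start addresses derived from the sequence numbers. */
static uint64_t next_write_addr(const status_regs_t *s) { return s->ring_base + (uint64_t)s->wr_seq * R; }
static uint64_t next_read_addr (const status_regs_t *s) { return s->ring_base + (uint64_t)s->rd_seq * R; }
```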
During fixed-length data packet storage, the driver selects, based on the usage states in the W groups of status registers, a target ring buffer corresponding to a process to be processed, and determines the number of distributable fixed-length data packets corresponding to the process to be processed according to the target read pointer sequence number and target write pointer sequence number corresponding to the target ring buffer; the target processing module corresponding to the target ring buffer stores the fixed-length data packets to be distributed for the process to be processed into the target ring buffer according to the number of distributable fixed-length data packets, and updates the target write pointer sequence number according to the number of stored fixed-length data packets through the doorbell interface.
During fixed-length data packet processing, the target processing module is configured to read fixed-length data packets from the target ring buffer for processing when the target read pointer sequence number and the target write pointer sequence number are unequal, and, as processing completes, to update the target read pointer sequence number in real time according to the number of processed fixed-length data packets through the doorbell interface.
The system updates the target read pointer sequence number and target write pointer sequence number in real time according to how fixed-length data packets are written and processed, and then determines the target read start address and target write start address from these sequence numbers, thereby realizing cooperative software-hardware interaction; fixed-length data packets can be reasonably issued to GPU hardware resources for processing, improving GPU hardware performance.
In one embodiment, each ring buffer can hold the fixed-length data packets of only one process at a time, and the usage state of a ring buffer is either selectable or non-selectable. When fixed-length data packets are stored in the ring buffer and have not all been processed, the corresponding usage state is set to non-selectable; when no fixed-length data packets are stored in the ring buffer, or all fixed-length data packets of the corresponding process have been processed, the corresponding usage state is set to selectable. It will be understood that the ring buffer is a circular, end-to-end first-in first-out queue: while fixed-length data packets of one process are allocated in the ring buffer and not all of them have been processed, packets of other processes cannot be stored; once all of them have been processed, packets of another process can be stored without emptying the ring buffer. Only the corresponding status register needs to be changed to update the mapping between the ring buffer and the doorbell address, and the fixed-length data packets of the new process simply overwrite those already in the ring buffer.
In one embodiment, during fixed-length data packet storage, if at least one ring buffer has a selectable usage state, the driver selects a ring buffer whose usage state is selectable as the target ring buffer corresponding to the process to be processed. That is, when there is a ring buffer to which no process has yet been allocated, one can be selected directly as the target ring buffer for the process to be processed.
In one embodiment, each process carries priority information. During fixed-length data packet storage, if the usage states in all W groups of status registers are non-selectable and, among the processes currently distributing fixed-length data packets, there is a process whose priority is lower than that of the process to be processed, the driver allocates an additional ring buffer in the storage area as the target ring buffer corresponding to the process to be processed, selects a process to be adjusted from among the processes distributing fixed-length data packets, sets the ring buffer start address in the corresponding status register of the process to be adjusted to the start address of the target ring buffer, and resets the corresponding read pointer and write pointer sequence numbers to 0. It should be noted that, during processing, a low-priority process may depend on the result of a high-priority process, yet during distribution the low-priority process may have been distributed first; when hardware resources are insufficient to handle a newly added ring buffer, a ring buffer can be added directly by adjusting the mapping, and the newly added ring buffer can be processed preferentially.
In one embodiment, the system further includes a preset buffer. The driver is further configured to store, in the preset buffer, the ring buffer start address, read pointer sequence number, and write pointer sequence number corresponding to the process to be adjusted. When the usage state of at least one group of status registers is updated to selectable, the driver selects a group of status registers whose usage state is selectable and writes into it the ring buffer start address, read pointer sequence number, and write pointer sequence number saved for the process to be adjusted. By saving these values in the preset buffer, the mapping between the ring buffer of the process to be adjusted and a doorbell address can be re-established quickly once a group of status registers becomes selectable, the process to be adjusted can continue executing, and software-hardware interaction efficiency is improved.
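For illustration, and reusing the hypothetical status_regs_t structure from the initialization sketch above, saving and restoring the context of the process to be adjusted might look as follows; all names are assumptions rather than elements defined by the patent.

```c
/* Snapshot of the suspended process's ring context, held in the preset buffer. */
typedef struct {
    uint64_t ring_base;   /* ring buffer start address       */
    uint32_t rd_seq;      /* read pointer sequence number D  */
    uint32_t wr_seq;      /* write pointer sequence number C */
} ring_ctx_t;

/* Park the process to be adjusted: save its context into the preset buffer. */
static void park_process(ring_ctx_t *preset, const status_regs_t *regs)
{
    preset->ring_base = regs->ring_base;
    preset->rd_seq    = regs->rd_seq;
    preset->wr_seq    = regs->wr_seq;
}

/* Resume it later: copy the saved context into a status-register group that has
 * become selectable, re-establishing the ring-to-doorbell mapping quickly. */
static void resume_process(status_regs_t *regs, const ring_ctx_t *preset)
{
    regs->ring_base = preset->ring_base;
    regs->rd_seq    = preset->rd_seq;
    regs->wr_seq    = preset->wr_seq;
    regs->in_use    = true;   /* mark the group non-selectable again */
}
```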
In one embodiment, during fixed-length data packet storage, the driver is further configured to generate a fixed-length data packet distribution instruction based on the target doorbell address corresponding to the selected target ring buffer and on the information of the fixed-length data packets to be distributed, and to send the instruction through the doorbell interface to the target processing module corresponding to the target doorbell address, where the information of the fixed-length data packets to be distributed includes the number K of fixed-length data packets to be distributed and the fixed-length data packet data to be distributed.
In one embodiment, during fixed-length data packet storage, the driver determines the number of distributable fixed-length data packets corresponding to the process to be processed according to the target read pointer sequence number and target write pointer sequence number of the target ring buffer as follows: obtain the target read pointer sequence number D and the target write pointer sequence number C, which correspond to the sequence numbers of storage units in the target ring buffer and are therefore maintained in a cyclic counting mode. If D < C, the target write pointer is ahead of the target read pointer, and the number of distributable fixed-length data packets is E = Y − (C − D); if D > C, the target write pointer has wrapped around behind the target read pointer, and E = D − C.
In another embodiment, the values of D and C are maintained as accumulated (monotonically increasing) counts. If C − D < Y, the number of distributable fixed-length data packets is E = Y − (C − D); if C = D + Y, that is, C − D = Y and the ring buffer is full, then E = 0.
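As an illustration only, the two ways of computing E described above can be sketched in C as follows; the function names, and the handling of the D = C case in the cyclic variant, are assumptions not specified by the patent.

```c
#include <stdint.h>

/* Cyclic counting: D and C are slot indices within the target ring buffer. */
static uint32_t distributable_cyclic(uint32_t D, uint32_t C, uint32_t Y)
{
    if (D < C)                    /* write pointer ahead of read pointer          */
        return Y - (C - D);       /* E = Y - (C - D)                              */
    if (D > C)                    /* write pointer has wrapped behind read pointer */
        return D - C;             /* E = D - C                                    */
    return Y;                     /* D == C: assumed empty (not specified above)  */
}

/* Accumulated counting: D and C increase monotonically, so C - D packets are pending. */
static uint32_t distributable_accumulated(uint32_t D, uint32_t C, uint32_t Y)
{
    uint32_t pending = C - D;     /* 0 <= pending <= Y                            */
    return Y - pending;           /* E = Y - (C - D); E = 0 when the ring is full */
}
```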
During fixed-length data packet storage, the target processing module stores the fixed-length data packets to be distributed into the target ring buffer according to the number of distributable fixed-length data packets as follows: the target processing module parses the information of the fixed-length data packets to be distributed to obtain the number K of packets to be distributed and their data. If the current E ≥ K, the free space is sufficient for the K packets, and the K fixed-length data packets to be distributed are stored sequentially in the target ring buffer; if E < K, the free space is insufficient, and the module waits until E ≥ K before storing the K fixed-length data packets sequentially in the target ring buffer.
In one embodiment, updating the target write pointer sequence number through the doorbell interface according to the number of stored fixed-length data packets comprises:
if C + K ≤ Y, the K stored fixed-length data packets to be distributed do not cross the start of the target ring buffer, and C is updated to C + K; if C + K > Y, the K stored packets wrap past the start of the target ring buffer, and C is updated to C + K − Y.
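The E ≥ K check and the write pointer update can be combined into one routine. The following C sketch is illustrative only: the names and the byte-array view of the ring buffer are assumptions, and it simply mirrors the update C = C + K, or C + K − Y on wrap, described above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Store K fixed-length packets of R bytes each into the target ring buffer and
 * advance the write pointer sequence number C through the doorbell (sketch). */
static bool store_packets(uint8_t *ring, uint32_t *C, uint32_t Y, uint32_t R,
                          const uint8_t *pkts, uint32_t K, uint32_t E)
{
    if (E < K)
        return false;                       /* free space insufficient: wait until E >= K */

    for (uint32_t i = 0; i < K; i++) {
        uint32_t slot = (*C + i) % Y;       /* slot index within the ring (wraps at Y) */
        memcpy(ring + (size_t)slot * R, pkts + (size_t)i * R, R);
    }

    /* Doorbell update of C: C = C + K, or C + K - Y if the write crossed the ring start. */
    *C = (*C + K <= Y) ? (*C + K) : (*C + K - Y);
    return true;
}
```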
In one embodiment, during fixed-length data packet processing, the target processing module reading fixed-length data packets from the target ring buffer for processing when the target read pointer sequence number and target write pointer sequence number are unequal, and updating the target read pointer sequence number in real time through the doorbell interface according to the number of processed packets, comprises: when the target read pointer sequence number and target write pointer sequence number are unequal and the computing resources of the target processing module suffice to process at least one fixed-length data packet, packets are read and processed one by one starting from the packet addressed by the target read pointer sequence number; if D + 1 ≤ Y, D is updated to D + 1, and if D + 1 > Y, D is updated to D + 1 − Y. It should be noted that existing GPU hardware technology can obtain the current hardware resource occupancy in real time and determine whether the computing resources suffice to process at least one fixed-length data packet, which is not described further here. After a fixed-length data packet has been processed, in addition to updating the corresponding target read pointer sequence number, the processing module can actively send information about the packet just read to the driver, so that the driver can quickly learn whether the corresponding data packets can be distributed to the target ring buffer.
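A matching sketch of the processing side is given below. The names and the resource-availability callback are illustrative assumptions; the loop simply follows the rule of reading one packet at a time while D ≠ C and updating D = D + 1, or D + 1 − Y past the ring end.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

typedef void (*process_fn)(const uint8_t *pkt, uint32_t len);  /* per-packet handler (assumed) */
typedef bool (*resources_fn)(void);  /* true while compute resources suffice (assumed)         */

/* Read and process packets one by one while the read and write sequence numbers differ. */
static void drain_ring(const uint8_t *ring, uint32_t *D, uint32_t C,
                       uint32_t Y, uint32_t R,
                       process_fn process, resources_fn resources_ok)
{
    while (*D != C && resources_ok()) {
        const uint8_t *pkt = ring + (size_t)(*D % Y) * R;   /* packet at the read pointer */
        process(pkt, R);
        /* Doorbell update of the read pointer: D = D + 1, or D + 1 - Y past the ring end. */
        *D = (*D + 1 <= Y) ? (*D + 1) : (*D + 1 - Y);
    }
}
```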
The system provided by the embodiment of the invention establishes, through the W groups of status registers, a mapping between the W doorbell addresses and the ring buffers set up in the storage area accessible to both the software and the GPU hardware, and realizes cooperative interaction between the software and GPU hardware resources based on the information in the status registers, so that fixed-length data packets can be reasonably issued to GPU hardware resources for processing and GPU hardware performance is improved.
The present invention is not limited to the above-mentioned embodiments; any modifications, equivalent substitutions, and improvements made without departing from the spirit and scope of the invention fall within the scope of the invention.

Claims (8)

1. A software and hardware interaction system based on fixed-length data packets, characterized in that:
the system comprises software, GPU hardware, and a storage area accessible to both the software and the GPU hardware, wherein the software comprises a driver and M processes, and the fixed-length data packet corresponding to each process has length R; the GPU hardware comprises a doorbell interface, N processing modules, and W groups of status registers, where M > W > N, the doorbell interface is connected to the N processing modules, each processing module is connected to X groups of status registers, each processing module can process the fixed-length data packets corresponding to X processes by time-division multiplexing, and W = N × X;
during system initialization, the driver is configured to set up W ring buffers in the storage area, set corresponding state information in the W groups of status registers, and establish a mapping between the ring buffers and doorbell addresses, wherein each ring buffer corresponds to one doorbell address, the state information comprises a usage state, a doorbell address, a ring buffer start address, a read pointer sequence number and a write pointer sequence number, the initial values of the read pointer sequence number and the write pointer sequence number are 0, the ring buffer length L = Y × R, and Y is the number of fixed-length data packets the ring buffer can hold;
during fixed-length data packet storage, the driver selects, based on the usage states in the W groups of status registers, a target ring buffer corresponding to a process to be processed, and determines the number of distributable fixed-length data packets corresponding to the process to be processed according to a target read pointer sequence number and a target write pointer sequence number corresponding to the target ring buffer; the target processing module corresponding to the target ring buffer stores the fixed-length data packets to be distributed for the process to be processed into the target ring buffer according to the number of distributable fixed-length data packets, and updates the target write pointer sequence number according to the number of stored fixed-length data packets through the doorbell interface;
and during fixed-length data packet processing, the target processing module is configured to read fixed-length data packets from the target ring buffer for processing when the target read pointer sequence number and the target write pointer sequence number are unequal, and, as processing completes, to update the target read pointer sequence number in real time according to the number of processed fixed-length data packets through the doorbell interface.
2. The system according to claim 1, characterized in that:
each ring buffer can hold the fixed-length data packets of only one process at a time, and the usage state of a ring buffer is either selectable or non-selectable; when fixed-length data packets are stored in the ring buffer and have not all been processed, the corresponding usage state is set to non-selectable; when no fixed-length data packets are stored in the ring buffer, or all fixed-length data packets of the corresponding process have been processed, the corresponding usage state is set to selectable.
3. The system according to claim 2, characterized in that:
in the process of storing the fixed-length data packet, if at least one ring buffer with the use state being the optional state exists, the driver program selects the ring buffer with the use state being the optional state as a target ring buffer corresponding to the process to be processed.
4. The system according to claim 2, characterized in that:
each process carries priority information; during fixed-length data packet storage, if the usage states in all W groups of status registers are non-selectable and, among the processes currently distributing fixed-length data packets, there is a process whose priority is lower than that of the process to be processed, the driver allocates an additional ring buffer in the storage area as the target ring buffer corresponding to the process to be processed, selects a process to be adjusted from among the processes distributing fixed-length data packets, sets the ring buffer start address in the corresponding status register of the process to be adjusted to the start address of the target ring buffer, and resets the corresponding read pointer and write pointer sequence numbers to 0.
5. The system according to claim 4, characterized in that:
the system also comprises a preset buffer zone, the driver is further used for storing a ring buffer zone starting address, a read pointer sequence number and a write pointer sequence number corresponding to the process to be adjusted in the preset buffer zone, when the use state of at least one group of state registers is updated to be the selectable state, the driver selects one group of state registers with the use state being the selectable state, and updates the ring buffer zone starting address, the read pointer sequence number and the write pointer sequence number in the selected state registers to the ring buffer zone starting address, the read pointer sequence number and the write pointer sequence number corresponding to the process to be adjusted.
6. The system according to claim 1, characterized in that:
in the process of storing the fixed-length data packets, the driver is further used for generating a fixed-length data packet distribution instruction based on a target doorbell address corresponding to the selected target annular buffer zone and fixed-length data packet information to be distributed, the fixed-length data packet distribution instruction is sent to a target processing module corresponding to the target doorbell address through a doorbell interface, and the fixed-length data packet information to be distributed comprises the number K of the fixed-length data packets to be distributed and the fixed-length data packet data to be distributed.
7. The system according to claim 6, characterized in that:
in the process of storing the fixed-length data packets, the driver determines the number of the distributable fixed-length data packets corresponding to the to-be-processed process according to the target read pointer sequence number and the target write pointer sequence number corresponding to the target annular buffer area, and the method comprises the following steps:
acquiring a target read pointer sequence number D and a target write pointer sequence number C, and if D is smaller than C, determining the number E=Y- (C-D) of distributable fixed-length data packets; if D > C, the number of distributable fixed-length packets e=d-C.
8. The system according to claim 7, characterized in that:
in the process of storing the fixed-length data packets, the target processing module stores the fixed-length data packets to be distributed into the target annular buffer area according to the number of the distributable fixed-length data packets, and the method comprises the following steps:
the target processing module analyzes the fixed-length data packet information to be distributed, acquires the number K of the fixed-length data packets to be distributed and the fixed-length data packet data to be distributed, sequentially stores the K fixed-length data packets to be distributed in a target annular buffer area if the current E is more than or equal to K, and sequentially stores the K fixed-length data packets to be distributed in the target annular buffer area if the current E is more than or equal to K otherwise.
CN202311242401.3A 2023-09-25 2023-09-25 Software and hardware interaction system based on fixed-length data packet Active CN117196929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311242401.3A CN117196929B (en) 2023-09-25 2023-09-25 Software and hardware interaction system based on fixed-length data packet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311242401.3A CN117196929B (en) 2023-09-25 2023-09-25 Software and hardware interaction system based on fixed-length data packet

Publications (2)

Publication Number Publication Date
CN117196929A CN117196929A (en) 2023-12-08
CN117196929B (en) 2024-03-08

Family

ID=88984892

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311242401.3A Active CN117196929B (en) 2023-09-25 2023-09-25 Software and hardware interaction system based on fixed-length data packet

Country Status (1)

Country Link
CN (1) CN117196929B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2546343A (en) * 2016-01-15 2017-07-19 Stmicroelectronics (Grenoble2) Sas Apparatus and methods implementing dispatch mechanisms for offloading executable functions

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102006241A (en) * 2010-12-17 2011-04-06 天津曙光计算机产业有限公司 Method for receiving message through buffer area shared by multiple applications
CN107124286A (en) * 2016-02-24 2017-09-01 深圳市知穹科技有限公司 A kind of mass data high speed processing, the system and method for interaction
CN110998649A (en) * 2017-08-04 2020-04-10 微软技术许可有限责任公司 Flexible buffer size adjustment in a graphics processor
CN109933438A (en) * 2019-01-31 2019-06-25 西南电子技术研究所(中国电子科技集团公司第十研究所) High speed shared drive data receiving-transmitting system
CN110167197A (en) * 2019-04-16 2019-08-23 武汉虹信通信技术有限责任公司 GTP downlink data transmission optimization method and device
CN114008588A (en) * 2019-06-26 2022-02-01 Ati科技无限责任公司 Sharing multimedia physical functions in a virtualized environment of processing units
CN110704335A (en) * 2019-09-03 2020-01-17 苏州浪潮智能科技有限公司 Data reading and writing method and device based on asynchronous ring buffer
CN110855610A (en) * 2019-09-30 2020-02-28 视联动力信息技术股份有限公司 Data packet processing method and device and storage medium
CN111292222A (en) * 2020-01-22 2020-06-16 中国科学院新疆天文台 Pulsar de-dispersion device and method
CN113535395A (en) * 2021-07-14 2021-10-22 西安电子科技大学 Descriptor queue and memory optimization method, system and application of network storage service
CN116523729A (en) * 2023-06-27 2023-08-01 深流微智能科技(深圳)有限公司 Graphics processing device, graphics rendering pipeline distribution method and related devices

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Cadence advanced packaging EDA tools efficiently enable CoWoS-S silicon interposer design and sign-off; Gu Yu et al.; China Integrated Circuit; 2023-10-31; Vol. 32, No. 10; pp. 76-82 *
GPU Synthesis of RF Channeliser Outputs for a Variable Bandwidth Microwave Digital Receiver;Simon Faulkner等;《2018 12th International Conference on Signal Processing and Communication Systems (ICSPCS)》;20190203;第1-8页 *
Design of a guide device for the blind based on embedded image processing and ultrasonic-assisted detection; Gao Wei; China Masters' Theses Full-text Database; 2021-02-28; Engineering Science and Technology II, C028-163 *
Design and implementation of a GPU graphics acceleration driver on the Phytium platform; Li Rongzhen et al.; Computer Engineering and Applications; 2014-03-31; Vol. 50, No. 5; pp. 126-131 *

Also Published As

Publication number Publication date
CN117196929A (en) 2023-12-08

Similar Documents

Publication Publication Date Title
US6401155B1 (en) Interrupt/software-controlled thread processing
US7913034B2 (en) DRAM access command queuing
US7752349B2 (en) Apparatus and method for performing DMA data transfer
CN114185818B (en) GPU (graphics processing Unit) memory access self-adaptive optimization method and device based on extended page table
US20040148606A1 (en) Multi-thread computer
CN113836184A (en) Service persistence method and device
US6457121B1 (en) Method and apparatus for reordering data in X86 ordering
CN114610472A (en) Multi-process management method in heterogeneous computing and computing equipment
US8972693B2 (en) Hardware managed allocation and deallocation evaluation circuit
CN110515872B (en) Direct memory access method, device, special computing chip and heterogeneous computing system
JPH0358150A (en) Memory controller
CN117196929B (en) Software and hardware interaction system based on fixed-length data packet
JPH1196072A (en) Memory access control circuit
EP2437159A1 (en) Operation apparatus and control method thereof
TW201351276A (en) Scheduling and execution of compute tasks
US8307165B1 (en) Sorting requests to the DRAM for high page locality
CN111863139B (en) Gene comparison acceleration method and system based on near-memory computing structure
JPH08212178A (en) Parallel computer
CN112711442A (en) Host command writing method, device and system and readable storage medium
JP2009059276A (en) Data processing apparatus and program
US9367487B1 (en) Mitigating main crossbar load using dedicated connections for certain traffic types
US5875299A (en) disk access apparatus for performing a stride processing of data
US6633928B2 (en) Fast process context switching buffer
CN114265694A (en) Memory replacement method and device
CN118151837A (en) FPGA-based dispersion/aggregation processing method, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant