CN116188247B - Register information processing method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN116188247B
Authority
CN
China
Prior art keywords
dma command
storage
address information
command
register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310097876.1A
Other languages
Chinese (zh)
Other versions
CN116188247A (en)
Inventor
汪莹
武凤霞
王子鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Granfei Intelligent Technology Co.,Ltd.
Original Assignee
Glenfly Tech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Glenfly Tech Co Ltd
Priority to CN202310097876.1A
Publication of CN116188247A
Application granted
Publication of CN116188247B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/28 Indexing scheme for image data processing or generation, in general involving image processing hardware
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bus Control (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The application relates to a register information processing method and apparatus, a computer device, and a storage medium. The method comprises: receiving a recovery DMA command issued by the graphics processor; acquiring first address information corresponding to an unexecuted storage DMA command; acquiring second address information corresponding to the recovery DMA command, and comparing the second address information with the first address information; if the second address information is consistent with the first address information, skipping both the unexecuted storage DMA command and the recovery DMA command, that is, executing neither of them; if the second address information is inconsistent with the first address information, processing the register information of the graphics processor according to the unexecuted storage DMA command and then according to the recovery DMA command, that is, executing the storage DMA command first and the recovery DMA command second. The method avoids redundant storage and recovery of GPU register information and thereby improves the overall performance of the GPU.

Description

Register information processing method, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of chip development technologies, and in particular, to a register information processing method, an apparatus, a computer device, a storage medium, and a computer program product.
Background
In the design of a graphics processing unit (GPU) chip, the GPU must execute multiple tasks concurrently while the number of commands generated by the software driver layer is kept as small as possible. To this end, the software driver layer can issue a save DMA command when a task switch is needed, so that the GPU hardware writes the full chip image into a storage unit. When the task is next resumed, the command stream parser reads all the register contents back from the storage unit and returns them to the GPU by means of a restore DMA command. Storing and restoring all register information with a save + restore DMA command pair means that the software driver layer does not have to reissue every register configuration command when switching back from another task to the current task, which greatly reduces both the workload of the software driver layer and the number of commands the hardware must execute.
However, once the register cache and the save + restore DMA commands are introduced, the GPU hardware layer may receive storage and recovery command sequences frequently whenever command optimization in the software driver layer is neglected, because 3D rendering scenes are complex and variable. In some cases the save DMA command of a task has only just been received when the restore DMA command for the same task arrives. If every received command is nevertheless executed one after another, a significant portion of the store and restore actions may be redundant, which severely impacts the overall performance of the GPU.
Existing GPUs therefore perform redundant storage and recovery actions, which degrades their overall performance.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a register information processing method, apparatus, computer device, computer readable storage medium, and computer program product that can improve the overall performance of a GPU.
In a first aspect, the present application provides a register information processing method. The method comprises the following steps:
receiving a recovery DMA command issued by the graphics processor;
acquiring first address information corresponding to an unexecuted storage DMA command;
acquiring second address information corresponding to the recovery DMA command, and comparing the second address information with the first address information;
if the second address information is consistent with the first address information, skipping both the unexecuted storage DMA command and the recovery DMA command;
and if the second address information is inconsistent with the first address information, processing the register information of the graphics processor according to the unexecuted storage DMA command, and then processing the register information of the graphics processor according to the recovery DMA command.
In one embodiment, before acquiring the first address information corresponding to the unexecuted storage DMA command, the method further includes:
judging whether an unexecuted storage DMA command exists;
and if the unexecuted storage DMA command exists, executing the step of acquiring the first address information corresponding to the unexecuted storage DMA command.
In one embodiment, the method further comprises:
and if the unexecuted storage DMA command does not exist, processing the register information of the graphic processor according to the recovery DMA command.
In one embodiment, processing register information of a graphics processor in accordance with a recovery DMA command includes:
and restoring the register information in the registers of the graphics processor according to a recovery command list corresponding to the recovery DMA command.
In one embodiment, the method further comprises:
receiving a storage DMA command issued by a graphics processor;
judging whether the storage DMA command carries a forced execution mark;
if the storage DMA command carries a forced execution mark, processing the register information of the graphics processor according to the storage DMA command;
if the storage DMA command does not carry a forced execution mark, taking the storage DMA command as an unexecuted storage DMA command.
In one embodiment, processing register information for a graphics processor in accordance with a store DMA command includes:
and storing the register information in the registers of the graphics processor according to a storage command list corresponding to the storage DMA command.
In one embodiment, processing register information for a graphics processor in accordance with a store DMA command includes:
acquiring third address information corresponding to a storage DMA command and fourth address information corresponding to a last executed storage DMA command;
comparing the third address information with the fourth address information;
if the third address information is consistent with the fourth address information, determining, from among all registers of the graphics processor, the target registers whose register information has been updated;
and reading target register information in the target register, and storing the target register information according to a storage command list corresponding to the storage DMA command.
In one embodiment, the method further comprises:
if the third address information and the fourth address information are inconsistent, the register information in each register is read from all registers of the graphics processor, and the register information in each register is stored according to a storage command list corresponding to the storage DMA command.
In one embodiment, after processing the register information of the graphics processor according to the storage DMA command, the method further comprises:
destroying a storage command list corresponding to the storage DMA command, and recording a corresponding destruction mark;
receiving a new storage DMA command and identifying a storage space corresponding to the new storage DMA command;
if the storage space corresponding to the new storage DMA command is the same as the storage space corresponding to the storage DMA command, and the recorded destruction mark can be queried, taking all registers of the graphics processor as target registers, and clearing the recorded destruction mark;
and reading target register information in the target register, and storing the target register information.
In a second aspect, the present application further provides a register information processing apparatus. The device comprises:
the receiving module is used for receiving a recovery DMA command issued by the graphic processor;
the acquisition module is used for acquiring first address information corresponding to an unexecuted storage DMA command;
the comparison module is used for acquiring second address information corresponding to the recovery DMA command and comparing the second address information with the first address information;
the execution module is used for skipping both the unexecuted storage DMA command and the recovery DMA command if the second address information is consistent with the first address information;
the execution module is further configured to, if the second address information is inconsistent with the first address information, process the register information of the graphics processor according to the unexecuted storage DMA command and then process the register information of the graphics processor according to the recovery DMA command.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
receiving a recovery DMA command issued by the graphics processor;
acquiring first address information corresponding to an unexecuted storage DMA command;
acquiring second address information corresponding to the recovery DMA command, and comparing the second address information with the first address information;
if the second address information is consistent with the first address information, skipping both the unexecuted storage DMA command and the recovery DMA command;
and if the second address information is inconsistent with the first address information, processing the register information of the graphics processor according to the unexecuted storage DMA command, and then processing the register information of the graphics processor according to the recovery DMA command.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
receiving a recovery DMA command issued by the graphics processor;
acquiring first address information corresponding to an unexecuted storage DMA command;
acquiring second address information corresponding to the recovery DMA command, and comparing the second address information with the first address information;
if the second address information is consistent with the first address information, skipping both the unexecuted storage DMA command and the recovery DMA command;
and if the second address information is inconsistent with the first address information, processing the register information of the graphics processor according to the unexecuted storage DMA command, and then processing the register information of the graphics processor according to the recovery DMA command.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
receiving a recovery DMA command issued by the graphics processor;
acquiring first address information corresponding to an unexecuted storage DMA command;
acquiring second address information corresponding to the recovery DMA command, and comparing the second address information with the first address information;
if the second address information is consistent with the first address information, skipping both the unexecuted storage DMA command and the recovery DMA command;
and if the second address information is inconsistent with the first address information, processing the register information of the graphics processor according to the unexecuted storage DMA command, and then processing the register information of the graphics processor according to the recovery DMA command.
With the register information processing method, apparatus, computer device, storage medium, and computer program product described above, a recovery DMA command issued by the graphics processor is received; first address information corresponding to an unexecuted storage DMA command is acquired; second address information corresponding to the recovery DMA command is acquired and compared with the first address information; if the second address information is consistent with the first address information, both the unexecuted storage DMA command and the recovery DMA command are skipped, that is, neither is executed; if the second address information is inconsistent with the first address information, the register information of the graphics processor is processed according to the unexecuted storage DMA command and then according to the recovery DMA command, that is, the storage DMA command is executed first and the recovery DMA command second. This avoids redundant storage and recovery of GPU register information and thereby improves the overall performance of the GPU.
Drawings
FIG. 1 is a diagram of an application environment for a register information processing method in one embodiment;
FIG. 2 is a schematic diagram of a register cache in one embodiment;
FIG. 3 is a diagram illustrating a register cache update process according to one embodiment;
FIG. 4 is a flow chart of a register information processing method according to an embodiment;
FIG. 5 is a diagram of a store command list template in one embodiment;
FIG. 6 is a diagram of a resume command list template in one embodiment;
FIG. 7 is a block diagram of the internal architecture of a controller in one embodiment;
FIG. 8 is a flow diagram of processing a resume DMA command in one embodiment;
FIG. 9 is a flow diagram of processing a store DMA command in one embodiment;
FIG. 10 is a schematic diagram of an improved register cache architecture in one embodiment;
FIG. 11 is a flow chart illustrating the use of register state cache 2 in one embodiment;
FIG. 12 is a schematic diagram of the processing flow of two consecutive storage DMA commands in one embodiment;
FIG. 13 is a diagram showing how the chip image in register data cache 2 and in the storage unit changes in one embodiment;
FIG. 14 is a block diagram showing a configuration of a register information processing apparatus in one embodiment;
FIG. 15 is an internal structural view of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The register information processing method provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1. The controller 102 is configured in the command stream parser 104 of the graphics processor and can communicate with the command stream parser 104. The command stream parser 104 is mounted on the graphics processor chip. A graphics processor involves many modules in the graphics rendering process, including the well-known output assembly units, vertex shaders, pixel shaders, rasterizing components, depth culling units, output merging units, and so forth. In hardware design, an appropriate amount of register information is generally defined in advance for each module. The software driver layer controls the workflow of each module by configuring this register information so as to achieve the final drawing effect. To filter out redundant register configuration instructions, the hardware design adds a register cache to the command stream parser. This cache stores almost all of the register information involved in the pipeline stages, commonly referred to as the GPU chip image information.
One possible register cache structure is shown in FIG. 2. The register cache is divided into two major parts: a register data cache (register data buffer) and a register state cache (register status buffer). Each minimum unit of the register data cache is a double word (dword) that records the value of one register. Each minimum unit of the register state cache is one bit that records the state of one register, namely whether its value has changed; 1 indicates that the value has changed, that is, the register is dirty. The minimum units of the two caches correspond one to one. As shown in FIG. 3, when a new register configuration command is received, both the register data cache and the register state cache are updated. The register state cache serves to filter out registers that have not changed between adjacent drawing commands (draw/gp/2d commands), and it also relieves, to some extent, the hardware burden caused by redundant register configuration commands sent by the software driver layer. Each time a drawing command is encountered, the command stream parser uses the state information in the register state cache to select only the dirty register data for delivery to the subsequent pipeline modules, and clears the corresponding state bits. Concretely, when the command stream parser receives a drawing command, it first issues a read command to the register state cache to obtain the dirty state of the corresponding registers. For each register with dirty = 1, a read of the register data cache is triggered, the corresponding register data is read out and sent to the subsequent modules, and that register's state bit is reset.
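The interplay of the two caches can be illustrated with a minimal Python sketch. It is a software toy model rather than the hardware design: the class name `RegisterCache`, its method names, and the register count are hypothetical, and it assumes that a configuration command that rewrites an unchanged value leaves the dirty bit untouched, following the description of the state bit as recording whether the value has changed.

```python
class RegisterCache:
    """Toy model of the register cache: one dword and one dirty bit per register."""

    def __init__(self, num_registers):
        self.data = [0] * num_registers        # register data cache (one dword each)
        self.dirty = [False] * num_registers   # register state cache (one bit each)

    def write(self, index, value):
        """Model a register configuration command."""
        value &= 0xFFFFFFFF                    # keep the value within one dword
        if self.data[index] != value:          # assumption: only real changes set dirty
            self.data[index] = value
            self.dirty[index] = True

    def flush_for_draw(self):
        """Model a drawing command: emit dirty registers and clear their state bits."""
        sent = []
        for index, is_dirty in enumerate(self.dirty):
            if is_dirty:
                sent.append((index, self.data[index]))
                self.dirty[index] = False
        return sent


cache = RegisterCache(8)
cache.write(2, 0x10)
cache.write(5, 0x20)
cache.write(2, 0x10)                  # redundant configuration command, filtered out
first_draw = cache.flush_for_draw()   # registers 2 and 5 are forwarded
second_draw = cache.flush_for_draw()  # nothing changed since the last draw
```

In this model the second drawing command forwards no register data, which is the filtering effect the register state cache is intended to provide.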
In one embodiment, as shown in fig. 4, a register information processing method is provided, which is described by taking an example that the method is applied to the controller 102 in fig. 1, and includes the following steps:
Step 402, a recovery DMA command issued by a graphics processor is received.
The recovery DMA command is generally denoted "restore DMA command".
Optionally, the controller receives a recovery DMA command issued by a command stream parser of the graphics processor.
Step 404, obtaining the first address information corresponding to the unexecuted storage DMA command.
Address information is generally denoted "context base address". It characterizes the starting storage address associated with a DMA command, from which the task that the current command belongs to can be determined. For ease of repeated invocation, the software driver layer may wrap the store and restore command streams into two nested commands (known as DMA commands). A DMA command containing store or restore commands carries the following information, the last item of which distinguishes it from an ordinary DMA command:
DMA address: the address from which all the commands are read from the storage unit. This information applies to all DMA commands;
DMA dw size: the total number of dwords (dw) contained in the nested command. This information applies to all DMA commands;
Context base address: the start address, prepared by software for the current task, at which the full chip image information belonging to that task is stored. This information is specific to save and restore DMA commands.
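The three fields listed above can be pictured as a small record. The following Python sketch is illustrative only: the names `DmaKind`, `DmaCommand`, and `same_task`, and the example addresses, are hypothetical and do not reflect the actual command encoding.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DmaKind(Enum):
    NORMAL = 0
    SAVE = 1
    RESTORE = 2


@dataclass(frozen=True)
class DmaCommand:
    kind: DmaKind
    dma_address: int                             # where the nested command list is read from
    dma_dw_size: int                             # number of dwords in the nested command list
    context_base_address: Optional[int] = None   # only carried by save/restore DMA commands

    def __post_init__(self):
        # Hypothetical sanity check: save/restore commands must name a chip image location.
        needs_base = self.kind in (DmaKind.SAVE, DmaKind.RESTORE)
        if needs_base and self.context_base_address is None:
            raise ValueError("save/restore DMA commands need a context base address")


def same_task(a, b):
    """Two save/restore DMA commands belong to the same task if their bases match."""
    return (a.context_base_address is not None
            and a.context_base_address == b.context_base_address)


save_a = DmaCommand(DmaKind.SAVE, dma_address=0x8000, dma_dw_size=64,
                    context_base_address=0x10000)
restore_a = DmaCommand(DmaKind.RESTORE, dma_address=0x9000, dma_dw_size=48,
                       context_base_address=0x10000)
restore_b = DmaCommand(DmaKind.RESTORE, dma_address=0x9000, dma_dw_size=48,
                       context_base_address=0x20000)
```

Comparing context base addresses, as `same_task` does here, is the check on which the method's skip decision rests.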
A DMA command containing storage commands (hereinafter collectively referred to as a save DMA command) generally includes one save command (for storing the register information held in the register cache into memory) and several query dump commands (for storing the register information of registers that are not placed in the register cache module), followed by an fe + wait command pair that guarantees the preceding storage commands have finished executing. So that the commands can be invoked repeatedly, all commands contained in the save DMA command are templates: each command no longer carries a specific address, but rather the offset, within the full chip image, of the register information to be stored, as shown in FIG. 5. The start address of the specific full chip image is recorded by the context base address. Correspondingly, a DMA command containing recovery commands (hereinafter collectively referred to as a restore DMA command) generally includes one restore command and several set register with address commands. Only the context base address is carried on the restore DMA command itself, and all the commands it contains carry only offset information, as shown in FIG. 6.
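The template mechanism can be sketched in a few lines of Python. The sketch assumes a template is a list of (command name, offset) pairs; the command names, offsets, and base addresses are made up for illustration and are not taken from FIG. 5 or FIG. 6.

```python
# Hypothetical save command list template: every entry carries only an offset
# into the full chip image, never an absolute address.
SAVE_TEMPLATE = [("save", 0x000), ("query_dump", 0x400), ("query_dump", 0x440)]


def resolve_template(template, context_base_address):
    """Turn template entries into executable ones by adding the context base address."""
    return [(opcode, context_base_address + offset) for opcode, offset in template]


# The same template serves two tasks whose chip images live at different addresses.
task_a = resolve_template(SAVE_TEMPLATE, 0x10000)
task_b = resolve_template(SAVE_TEMPLATE, 0x20000)
```

Because only the base address differs between tasks, one cached template can be reused for every save or restore DMA command.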
Optionally, the controller queries whether an unexecuted storage DMA command exists in its local cache, and if one exists, identifies the first address information from that unexecuted storage DMA command.
Step 406, obtaining the second address information corresponding to the recovery DMA command, and comparing the second address information with the first address information.
Optionally, the controller identifies the second address information from the recovery DMA command, and then compares the second address information with the first address information to determine whether the two are identical.
Step 408, if the second address information is consistent with the first address information, skipping both the unexecuted storage DMA command and the recovery DMA command; and if the second address information is inconsistent with the first address information, processing the register information of the graphics processor according to the unexecuted storage DMA command, and then processing the register information of the graphics processor according to the recovery DMA command.
Optionally, if the second address information is consistent with the first address information, the controller determines that the currently received recovery DMA command and the unexecuted storage DMA command belong to the same task, and skips both commands so that neither is executed. If the second address information is inconsistent with the first address information, the controller determines that the currently received recovery DMA command and the unexecuted storage DMA command belong to different tasks; it first executes the cached, unexecuted storage DMA command and then executes the currently received recovery DMA command.
In the register information processing method described above, a recovery DMA command issued by the graphics processor is received; first address information corresponding to an unexecuted storage DMA command is acquired; second address information corresponding to the recovery DMA command is acquired and compared with the first address information; if the second address information is consistent with the first address information, both the unexecuted storage DMA command and the recovery DMA command are skipped, that is, neither is executed; if the second address information is inconsistent with the first address information, the register information of the graphics processor is processed according to the unexecuted storage DMA command and then according to the recovery DMA command, that is, the storage DMA command is executed first and the recovery DMA command second. This avoids redundant storage and recovery of GPU register information and thereby improves the overall performance of the GPU.
In one embodiment, the internal structure of the controller is shown in FIG. 7. The controller is a small module embedded in the command stream parser of the GPU and includes the following parts:
The input to the controller includes two parts:
Storage DMA commands or recovery DMA commands (commonly referred to as save/restore DMA command): when the command stream parser encounters a save/restore DMA command, it sends the command to the controller, which determines how the DMA command is to be executed;
The command template sequence in a storage DMA command or in a recovery DMA command (commonly referred to as save/restore DMA command list template data): the command list in a save/restore DMA command is typically in template form. So that later updates of the templates by software do not affect the hardware, two sufficiently large caches may be configured to store the save and restore DMA command list templates. When a save/restore DMA command is received for the first time, its command list template is read from the storage unit and stored in the corresponding cache.
Master (commonly referred to as Controller): controls the internal state machine. For an input save DMA command, it decides whether the command is executed or temporarily held locally; for an input restore DMA command, it decides whether the locally held save DMA command is woken up and executed together with it, or whether both DMA commands are skipped.
Two caches (commonly referred to as Save/Restore DMA command list template buffer): temporarily store the command template sequence of the storage DMA command and that of the recovery DMA command, respectively. When sizing these caches, some margin beyond the size of the software's save/restore command lists can be allowed to improve adaptability.
A cache for the start storage address of the previous storage DMA command (commonly referred to as Last save DMA Context Base address, LC_Base for short): a local 64-bit variable that records the context base address carried by the last save DMA command, so that it can later be compared with that of a restore DMA command to determine whether the two DMA commands belong to the same task.
A cache indicating whether the previous storage DMA command is being held (commonly referred to as Last save DMA stall Status, LS_Status for short): a local 1-bit variable that records whether the last save DMA command is held locally without having been executed. "0" indicates that it has been executed; "1" indicates that it has not been executed and that the context base address it carries has been recorded in LC_Base.
In one embodiment, before acquiring the first address information corresponding to the unexecuted storage DMA command, the method further includes: judging whether a non-executed storage DMA command exists or not; and if the unexecuted storage DMA command exists, executing the step of acquiring the first address information corresponding to the unexecuted storage DMA command. And if the unexecuted storage DMA command does not exist, processing the register information of the graphic processor according to the recovery DMA command.
Processing the register information of the graphics processor according to the recovery DMA command includes: restoring the register information in the registers of the graphics processor according to a recovery command list corresponding to the recovery DMA command. The recovery command list may also be referred to as the restore DMA command list.
Optionally, with the controller structure illustrated in fig. 7 and the flow shown in fig. 8, when the main controller receives a restore DMA command for the first time after power-up, it reads the restore DMA command list template from the storage unit and stores it in the local cache "Restore DMA command list template buffer". On receiving a restore DMA command, the controller does not immediately execute the command list it contains. Instead, it uses the local "LS_Status" and "LC_Base" to determine whether an unexecuted save DMA command is pending locally, and whether that save DMA command and the current restore DMA command need to be executed. LS_Status takes the value 1 or 0; LS_Status = 1 indicates that an unexecuted save DMA command is currently held locally. LC_Base holds the context base address carried by that locally held, unexecuted save DMA command.
The criteria are as follows. If LS_Status = 1 and the context base address carried by the restore DMA command is the same as LC_Base, a save DMA command for a task was received first and a restore DMA command for the same task was received next; both DMA commands are skipped and the subsequent commands are executed. LS_Status must be cleared at this point to indicate that no save DMA command is held locally any longer. If LS_Status = 1 but the context base address carried by the restore DMA command differs from LC_Base, the received save and restore DMA commands belong to different tasks; the command list of the locally held save DMA command must be executed first, followed by the command list of the restore DMA command just received. During execution, LC_Base is added to the addresses in the save DMA command list, and the context base address carried by the restore DMA command is added to the addresses in the restore DMA command list. Finally, LS_Status must be cleared.
If LS_Status = 0, no unexecuted save DMA command is held locally, and the restore DMA command list can be executed directly.
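The decision made on receipt of a restore DMA command can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design: the class `ControllerState`, the function `on_restore`, and the `executed` log are hypothetical names introduced here, and "executing a command list" is reduced to appending a tuple to a log.

```python
from dataclasses import dataclass, field


@dataclass
class ControllerState:
    """Hypothetical model of the controller's local state."""
    ls_status: int = 0   # 1 while an unexecuted save DMA command is held locally
    lc_base: int = 0     # context base address carried by the held save command
    executed: list = field(default_factory=list)  # log of executed command lists


def on_restore(state: ControllerState, restore_base: int) -> None:
    """Handle a restore DMA command carrying context base address restore_base."""
    if state.ls_status == 1:
        if restore_base != state.lc_base:
            # Different tasks: run the held save first, then the restore.
            state.executed.append(("save", state.lc_base))
            state.executed.append(("restore", restore_base))
        # Same task: both commands are skipped, so nothing is executed.
        state.ls_status = 0  # in both cases the held save is consumed
    else:
        # No held save: execute the restore command list directly.
        state.executed.append(("restore", restore_base))
```

With `ls_status = 1` and matching base addresses, the log stays empty, which corresponds to the redundant save/restore pair being skipped.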
In this embodiment, it is determined whether an unexecuted storage DMA command exists. If one exists, the step of acquiring the first address information corresponding to the unexecuted storage DMA command is executed; if none exists, the register information of the graphics processor is processed according to the recovery DMA command. This avoids redundant register information storage and recovery actions by the GPU, thereby improving the overall performance of the GPU.
In one embodiment, the method further comprises: receiving a storage DMA command issued by the graphics processor; determining whether the storage DMA command has a corresponding forced execution mark; if the storage DMA command has the corresponding forced execution mark, processing the register information of the graphics processor according to the storage DMA command; if the storage DMA command does not have the corresponding forced execution mark, taking the storage DMA command as an unexecuted storage DMA command.
Processing register information of the graphics processor according to the storage DMA command includes: storing the register information in the registers of the graphics processor according to the storage command list corresponding to the storage DMA command. The storage command list may also be referred to as the save DMA command list.
Optionally, with the controller structure illustrated in fig. 7 and the flow shown in fig. 9, when the main controller receives a save DMA command for the first time after power-up, it reads the save DMA command list template from the storage unit and stores it in the local cache "Save DMA command list buffer". To prevent the hardware from skipping the last save DMA command, the software driver layer marks the last save DMA command with a forced execution flag. On seeing this flag, the "Save/Restore DMA controller" forcibly executes the received save DMA command: it adds the context base address to the offsets of all commands and sends them to the parser in sequence for parsing. If the forced execution flag is not set on the current save DMA command, the local "LC_Base" is updated with the context base address carried by the current save DMA command and "LS_Status" is set to 1; the parser is then informed that parsing of this DMA command has ended and that it can start parsing the next command. In other words, a save DMA command without the forced execution flag is not executed immediately on entering the "Save/Restore DMA controller"; it is held locally, and whether it needs to be executed is decided when the next DMA command arrives. A save DMA command marked with the forced execution flag (typically the last save DMA command) is executed directly.
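The handling of a save DMA command described above can be sketched as follows. This is a minimal illustration, not the actual controller: `SAVE_TEMPLATE_OFFSETS`, `ControllerState`, `on_save`, and `sent_to_parser` are hypothetical names, and the template is assumed to be a plain list of address offsets. The sketch does not model what happens when a second deferred save arrives while one is already held, since the text does not describe that case.

```python
from dataclasses import dataclass, field

# Hypothetical save DMA command list template: address offsets only.
SAVE_TEMPLATE_OFFSETS = [0x00, 0x40, 0x80]


@dataclass
class ControllerState:
    """Hypothetical model of the controller's local state."""
    ls_status: int = 0   # 1 while an unexecuted save DMA command is held locally
    lc_base: int = 0     # context base address of the held save command
    sent_to_parser: list = field(default_factory=list)  # rebased addresses


def on_save(state: ControllerState, context_base: int, force_execute: bool) -> None:
    """Handle a save DMA command carrying context_base."""
    if force_execute:
        # Forced: add the context base address to every template offset
        # and send the resulting commands to the parser in sequence.
        for offset in SAVE_TEMPLATE_OFFSETS:
            state.sent_to_parser.append(context_base + offset)
    else:
        # Deferred: remember the base address and mark a save as held.
        state.lc_base = context_base
        state.ls_status = 1
```

A deferred save costs only two local register updates; the register traffic is paid later, and only if the next DMA command shows that the save is actually needed.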
In this embodiment, a storage DMA command issued by the graphics processor is received, and it is determined whether the storage DMA command has a corresponding forced execution mark. If it does, the register information of the graphics processor is processed according to the storage DMA command; if it does not, the storage DMA command is taken as an unexecuted storage DMA command. This avoids redundant register information storage and recovery actions by the GPU, thereby improving the overall performance of the GPU.
In one embodiment, processing register information of the graphics processor according to the storage DMA command includes: acquiring third address information corresponding to the storage DMA command and fourth address information corresponding to the last executed storage DMA command; comparing the third address information with the fourth address information; if the third address information is consistent with the fourth address information, determining, from all registers of the graphics processor, the target registers whose register information has been updated; and reading the target register information in the target registers and storing it according to the storage command list corresponding to the storage DMA command. If the third address information and the fourth address information are inconsistent, the register information in each register is read from all registers of the graphics processor and stored according to the storage command list corresponding to the storage DMA command.
Optionally, a new "register state cache 2" is added to the "register cache" of fig. 2, together with a 64-bit register (denoted LastSaveAddr); the structure is shown in fig. 10. State information in "register state cache 2" is updated in the same way as in "register state cache 1" (i.e. the "register cache" in fig. 2): after register configuration information is obtained, the comparison result of the comparator determines whether the state information needs to be updated to dirty. The difference is that the read and clear actions of "register state cache 2" are triggered by the storage command. The 64-bit variable LastSaveAddr records the fourth address information, that is, the address carried by the last received storage command, and is initialized to 0xffff_ffff. As shown in fig. 11, each time a storage command is received, the third address information CurrentSaveCmdAddr carried by the current storage command is compared with the fourth address information stored in LastSaveAddr. If the two addresses are the same, only the register information that has changed, as indicated by "register state cache 2", is read out of the register data cache and written to the storage unit. If the two addresses are different, all register data in the register data cache are written to the storage unit in sequence, the storage address carried by the current storage command is written into the local LastSaveAddr for the next use, and "register state cache 2" is cleared.
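The address comparison and dirty-flag filtering can be illustrated with a short software model. It is a sketch under assumptions rather than the hardware implementation: the class `RegisterSaver` and its method names are hypothetical, the storage unit is modelled as a dictionary keyed by save address, and the dirty flags are cleared after every save (the text states that the clear action is triggered by the storage command, but spells it out only for the differing-address case).

```python
class RegisterSaver:
    """Hypothetical model of the register data cache plus 'register state cache 2'."""

    def __init__(self, num_regs: int) -> None:
        self.regs = [0] * num_regs         # register data cache
        self.dirty = [False] * num_regs    # "register state cache 2"
        self.last_save_addr = 0xFFFF_FFFF  # LastSaveAddr, initial value from the text
        self.memory = {}                   # storage unit: save address -> chip image

    def write_reg(self, index: int, value: int) -> None:
        # The comparator marks a register dirty only when its value changes.
        if self.regs[index] != value:
            self.regs[index] = value
            self.dirty[index] = True

    def save(self, current_save_cmd_addr: int) -> list:
        """Execute a storage command; return the indices actually written."""
        image = self.memory.setdefault(current_save_cmd_addr, [0] * len(self.regs))
        if current_save_cmd_addr == self.last_save_addr:
            # Same address as the last save: write only the changed registers.
            targets = [i for i, is_dirty in enumerate(self.dirty) if is_dirty]
        else:
            # Different address: write every register and remember the address.
            targets = list(range(len(self.regs)))
            self.last_save_addr = current_save_cmd_addr
        for i in targets:
            image[i] = self.regs[i]
        self.dirty = [False] * len(self.regs)  # assumption: cleared on every save
        return targets
```

Saving twice to the same address with one register changed in between writes that single register on the second save, while a save to a new address writes the full set.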
In this embodiment, third address information corresponding to the storage DMA command and fourth address information corresponding to the last executed storage DMA command are acquired and compared. If the third address information is consistent with the fourth address information, the target registers whose register information has been updated are determined from all registers of the graphics processor, and the target register information in the target registers is read and stored according to the storage command list corresponding to the storage DMA command. This prevents the GPU from performing redundant register information storage actions, thereby improving the overall performance of the GPU.
In one embodiment, after processing the register information of the graphics processor according to the storage DMA command, the method further comprises: destroying the storage command list corresponding to the storage DMA command, and recording a corresponding destruction mark; receiving a new storage DMA command and identifying the storage space corresponding to the new storage DMA command; if the storage space corresponding to the new storage DMA command is the same as the storage space corresponding to the storage DMA command and the recorded destruction mark is found, taking all registers of the graphics processor as target registers and clearing the recorded destruction mark; and reading the target register information in the target registers and storing the target register information.
Optionally, while an application is running, command sequences (command lists) are destroyed (destroy) and re-created (create), and the virtual address space that stored the chip image of a previous command sequence may be allocated to a newly created command sequence to store the chip image of the new command sequence. When the software driver layer allocates storage space for the newly created command sequence, it typically initializes the storage space to 0. To handle this case, a new command, "ForceSaveAll command", is introduced. As shown in fig. 12, when the software driver layer performs the destroy action after processing the previous command list1, it records the destroy action and generates a destroy flag. When a new command list2 is next sent to the hardware, a "ForceSaveAll command" is inserted before command list2 is executed, based on the previously recorded destroy action; the storage command is then inserted and the destroy flag is cleared. When the command stream parser parses the command sequence and encounters the "ForceSaveAll command", it unconditionally sets all flag information in "register state cache 2" to dirty and then parses all subsequent commands normally. When parsing the storage command, the command stream parser still attempts the optimization using the flow shown in fig. 11. Because "register state cache 2" was set entirely to dirty on receipt of the "ForceSaveAll command", the result is correct even if the address of the current storage command is found to be the same as that of the last storage command. The changes to the register data cache and to the chip image in the storage unit are shown in fig. 13.
In the left column of the figure, each cell represents an entry in the register data cache, and a shaded cell indicates that the corresponding entry is marked dirty in "register state cache 2". In the right column, each cell represents the chip image in the storage unit, and a shaded cell indicates data updated in the current step. Inserting steps 1204 and 1205 ensures that the contents of the register data cache at step 1207 are identical to those at step 1206.
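The reason the "ForceSaveAll command" is needed can be shown with a small example. This is an illustrative sketch only: the functions `save` and `force_save_all` and the three-register layout are hypothetical, and the reused storage space is assumed to have been zero-initialized by the driver as described above.

```python
def save(regs, dirty, image, same_address):
    """Write registers to the chip image; only dirty ones if the address matches."""
    for i, value in enumerate(regs):
        if not same_address or dirty[i]:
            image[i] = value
    for i in range(len(dirty)):
        dirty[i] = False  # flags are cleared once the save has run


def force_save_all(dirty):
    """Model of the ForceSaveAll command: unconditionally mark everything dirty."""
    for i in range(len(dirty)):
        dirty[i] = True


regs = [5, 6, 7]
dirty = [False, False, False]  # nothing changed since command list1 was saved

# Without ForceSaveAll: the reused, zeroed space keeps its zeros.
stale_image = [0, 0, 0]
save(regs, list(dirty), stale_image, same_address=True)

# With ForceSaveAll inserted before the storage command: full image is written.
fresh_image = [0, 0, 0]
force_save_all(dirty)
save(regs, dirty, fresh_image, same_address=True)
```

In the first case the incremental path finds no dirty registers and leaves the zeroed space untouched, so a later restore would load zeros; in the second case the same incremental path writes every register because all flags were set beforehand.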
In this embodiment, the storage command list corresponding to the storage DMA command is destroyed and a corresponding destruction mark is recorded; a new storage DMA command is received and the storage space corresponding to it is identified; if that storage space is the same as the storage space corresponding to the earlier storage DMA command and the recorded destruction mark is found, all registers of the graphics processor are taken as target registers and the recorded destruction mark is cleared; the target register information in the target registers is then read and stored. This prevents register information from being stored incorrectly.
In one embodiment, a register information processing method includes:
A storage DMA command issued by the graphics processor is received.
It is determined whether the storage DMA command has a corresponding forced execution mark.
If the storage DMA command has the corresponding forced execution mark, the register information in the registers of the graphics processor is stored according to the storage command list corresponding to the storage DMA command.
If the storage DMA command does not have the corresponding forced execution mark, the storage DMA command is taken as an unexecuted storage DMA command.
A recovery DMA command issued by the graphics processor is received.
It is determined whether an unexecuted storage DMA command exists.
If no unexecuted storage DMA command exists, the register information in the registers of the graphics processor is restored according to the recovery command list corresponding to the recovery DMA command.
If an unexecuted storage DMA command exists, first address information corresponding to the unexecuted storage DMA command is acquired; second address information corresponding to the recovery DMA command is acquired and compared with the first address information; if the second address information is consistent with the first address information, the unexecuted storage DMA command and the recovery DMA command are both skipped; if the second address information is inconsistent with the first address information, the register information of the graphics processor is processed according to the unexecuted storage DMA command, and the register information in the registers of the graphics processor is then restored according to the recovery command list corresponding to the recovery DMA command.
In one embodiment, a method of executing a storage DMA command includes:
When the storage DMA command can be executed, third address information corresponding to the storage DMA command and fourth address information corresponding to the last executed storage DMA command are acquired.
The third address information is compared with the fourth address information.
If the third address information is consistent with the fourth address information, the target registers whose register information has been updated are determined from all registers of the graphics processor; the target register information in the target registers is read and stored according to the storage command list corresponding to the storage DMA command.
If the third address information and the fourth address information are inconsistent, the register information in each register is read from all registers of the graphics processor and stored according to the storage command list corresponding to the storage DMA command.
After the register information of the graphics processor is processed according to the storage DMA command, the storage command list corresponding to the storage DMA command is destroyed, and a corresponding destruction mark is recorded.
A new storage DMA command is received, and the storage space corresponding to the new storage DMA command is identified.
If the storage space corresponding to the new storage DMA command is the same as the storage space corresponding to the storage DMA command and the recorded destruction mark is found, all registers of the graphics processor are taken as target registers and the recorded destruction mark is cleared; the target register information in the target registers is read and stored.
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited to that order, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially either, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiments of the application also provide a register information processing device for implementing the register information processing method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations in the one or more register information processing device embodiments provided below, reference may be made to the limitations of the register information processing method above, which are not repeated here.
In one embodiment, as shown in fig. 14, there is provided a register information processing apparatus 1400, comprising: a receiving module 1401, an acquiring module 1402, a comparing module 1403 and an executing module 1404, wherein:
The receiving module 1401 is configured to receive a recovery DMA command issued by the graphics processor.
The acquiring module 1402 is configured to acquire first address information corresponding to an unexecuted storage DMA command.
The comparison module 1403 is configured to acquire second address information corresponding to the recovery DMA command, and compare the second address information with the first address information.
The execution module 1404 is configured to skip the unexecuted storage DMA command and the recovery DMA command if the second address information is consistent with the first address information.
The execution module 1404 is further configured to process the register information of the graphics processor according to the unexecuted storage DMA command if the second address information and the first address information are inconsistent, and process the register information of the graphics processor according to the recovery DMA command.
In one embodiment, the acquiring module 1402 is further configured to determine whether an unexecuted storage DMA command exists; and if an unexecuted storage DMA command exists, execute the step of acquiring the first address information corresponding to the unexecuted storage DMA command.
In one embodiment, if no unexecuted storage DMA command exists, the execution module 1404 is further configured to process the register information of the graphics processor according to the recovery DMA command.
In one embodiment, the execution module 1404 is further configured to restore the register information in the registers of the graphics processor according to the recovery command list corresponding to the recovery DMA command.
In one embodiment, the receiving module 1401 is further configured to receive a storage DMA command issued by the graphics processor.
The execution module 1404 is further configured to determine whether a corresponding forced execution flag exists for the storage DMA command; if the corresponding forced execution mark exists in the storage DMA command, processing the register information of the graphic processor according to the storage DMA command; if the corresponding forced execution mark does not exist in the storage DMA command, the storage DMA command is taken as an unexecuted storage DMA command.
In one embodiment, the executing module 1404 is further configured to store register information in a register of the graphics processor according to a storage command list corresponding to the storage DMA command.
In one embodiment, the execution module 1404 is further configured to acquire third address information corresponding to the storage DMA command and fourth address information corresponding to the last executed storage DMA command; compare the third address information with the fourth address information; if the third address information is consistent with the fourth address information, determine, from all registers of the graphics processor, the target registers whose register information has been updated; and read the target register information in the target registers and store it according to the storage command list corresponding to the storage DMA command.
In one embodiment, the execution module 1404 is further configured to, if the third address information and the fourth address information are inconsistent, read the register information in each register from all registers of the graphics processor, and store the register information in each register according to the storage command list corresponding to the storage DMA command.
In one embodiment, the execution module 1404 is further configured to destroy the storage command list corresponding to the storage DMA command and record a corresponding destruction mark; receive a new storage DMA command and identify the storage space corresponding to the new storage DMA command; if the storage space corresponding to the new storage DMA command is the same as the storage space corresponding to the storage DMA command and the recorded destruction mark is found, take all registers of the graphics processor as target registers and clear the recorded destruction mark; and read the target register information in the target registers and store the target register information.
The modules in the above register information processing device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded, in hardware form, in a processor of the computer device or be independent of that processor, or may be stored, in software form, in a memory of the computer device, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 15. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing DMA command data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a register information processing method.
It will be appreciated by those skilled in the art that the structure shown in fig. 15 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the computer device to which the present application is applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory storing a computer program, and the processor performing the following steps when executing the computer program: receiving a recovery DMA command issued by the graphics processor; acquiring first address information corresponding to an unexecuted storage DMA command; acquiring second address information corresponding to the recovery DMA command, and comparing the second address information with the first address information; if the second address information is consistent with the first address information, skipping the unexecuted storage DMA command and the recovery DMA command; and if the second address information is inconsistent with the first address information, processing the register information of the graphics processor according to the unexecuted storage DMA command, and processing the register information of the graphics processor according to the recovery DMA command.
In one embodiment, the processor further performs the following steps when executing the computer program: determining whether an unexecuted storage DMA command exists; and if an unexecuted storage DMA command exists, executing the step of acquiring the first address information corresponding to the unexecuted storage DMA command.
In one embodiment, the processor further performs the following step when executing the computer program: if no unexecuted storage DMA command exists, processing the register information of the graphics processor according to the recovery DMA command.
In one embodiment, the processor further performs the following step when executing the computer program: restoring the register information in the registers of the graphics processor according to the recovery command list corresponding to the recovery DMA command.
In one embodiment, the processor further performs the following steps when executing the computer program: receiving a storage DMA command issued by the graphics processor; determining whether the storage DMA command has a corresponding forced execution mark; if the storage DMA command has the corresponding forced execution mark, processing the register information of the graphics processor according to the storage DMA command; if the storage DMA command does not have the corresponding forced execution mark, taking the storage DMA command as an unexecuted storage DMA command.
In one embodiment, the processor further performs the following step when executing the computer program: storing the register information in the registers of the graphics processor according to the storage command list corresponding to the storage DMA command.
In one embodiment, the processor further performs the following steps when executing the computer program: acquiring third address information corresponding to the storage DMA command and fourth address information corresponding to the last executed storage DMA command; comparing the third address information with the fourth address information; if the third address information is consistent with the fourth address information, determining, from all registers of the graphics processor, the target registers whose register information has been updated; and reading the target register information in the target registers and storing it according to the storage command list corresponding to the storage DMA command.
In one embodiment, the processor further performs the following step when executing the computer program: if the third address information and the fourth address information are inconsistent, reading the register information in each register from all registers of the graphics processor, and storing the register information in each register according to the storage command list corresponding to the storage DMA command.
In one embodiment, the processor further performs the following steps when executing the computer program: destroying the storage command list corresponding to the storage DMA command, and recording a corresponding destruction mark; receiving a new storage DMA command and identifying the storage space corresponding to the new storage DMA command; if the storage space corresponding to the new storage DMA command is the same as the storage space corresponding to the storage DMA command and the recorded destruction mark is found, taking all registers of the graphics processor as target registers and clearing the recorded destruction mark; and reading the target register information in the target registers and storing the target register information.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related countries and regions.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the computer program may carry out the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric random access memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory may include random access memory (Random Access Memory, RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum-computing-based data processing logic units, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to be within the scope of this description.
The above examples represent only a few embodiments of the present application. They are described in relative detail, but are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art may make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (12)

1. A register information processing method, the method comprising:
receiving a recovery DMA command issued by the graphics processor;
acquiring first address information corresponding to a non-executed storage DMA command;
acquiring second address information corresponding to the recovery DMA command, and comparing the second address information with the first address information;
if the second address information is consistent with the first address information, skipping the unexecuted storage DMA command and the recovery DMA command;
and if the second address information is inconsistent with the first address information, processing the register information of the graphics processor according to the unexecuted storage DMA command, and processing the register information of the graphics processor according to the recovery DMA command.
2. The method of claim 1, further comprising, prior to the obtaining the first address information corresponding to the unexecuted storage DMA command:
judging whether a non-executed storage DMA command exists or not;
and if the unexecuted storage DMA command exists, executing the step of acquiring the first address information corresponding to the unexecuted storage DMA command.
3. The method according to claim 2, wherein the method further comprises:
and if the unexecuted storage DMA command does not exist, processing the register information of the graphics processor according to the recovery DMA command.
4. The method according to claim 1 or 3, wherein said processing register information of said graphics processor in accordance with said recovery DMA command comprises:
and restoring the register information in the register of the graphics processor according to the recovery command list corresponding to the recovery DMA command.
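The decision flow of claims 1 to 4 can be sketched as follows. This is only an illustrative model, not the patented implementation: the names `DmaCommand`, `Gpu`, `handle_recovery`, the dictionary used as memory, and the returned status strings are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DmaCommand:
    kind: str      # "storage" or "recovery" (labels are assumptions)
    address: int   # address information of the register image in memory


@dataclass
class Gpu:
    registers: Dict[str, int] = field(default_factory=dict)
    memory: Dict[int, Dict[str, int]] = field(default_factory=dict)
    pending_store: Optional[DmaCommand] = None   # the unexecuted storage DMA command
    log: List[str] = field(default_factory=list)

    def execute_store(self, cmd: DmaCommand) -> None:
        # Hypothetical stand-in for running the storage command list.
        self.memory[cmd.address] = dict(self.registers)
        self.log.append(f"store@{cmd.address:#x}")

    def execute_recovery(self, cmd: DmaCommand) -> None:
        # Hypothetical stand-in for running the recovery command list.
        self.registers = dict(self.memory[cmd.address])
        self.log.append(f"recovery@{cmd.address:#x}")


def handle_recovery(gpu: Gpu, recovery: DmaCommand) -> str:
    pending = gpu.pending_store
    gpu.pending_store = None
    if pending is None:
        # Claim 3: no unexecuted storage command, so just recover.
        gpu.execute_recovery(recovery)
        return "recovered"
    if pending.address == recovery.address:
        # Claim 1: same address, so the registers already hold what
        # would be saved and reloaded; skip both commands.
        return "skipped"
    # Claim 1: different addresses, so run the deferred store first,
    # then the recovery.
    gpu.execute_store(pending)
    gpu.execute_recovery(recovery)
    return "stored_then_recovered"
```

In this model, a recovery that targets the same address as the deferred store touches neither memory nor registers, which is where the saving in DMA traffic would come from.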
5. The method according to claim 1, wherein the method further comprises:
receiving a storage DMA command issued by the graphics processor;
determining whether the storage DMA command has a corresponding forced execution mark;
if the storage DMA command has the corresponding forced execution mark, processing the register information of the graphics processor according to the storage DMA command;
and if the storage DMA command does not have the corresponding forced execution mark, taking the storage DMA command as an unexecuted storage DMA command.
6. The method of claim 5, wherein said processing register information of said graphics processor in accordance with said storage DMA command comprises:
and storing the register information in the register of the graphics processor according to a storage command list corresponding to the storage DMA command.
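The handling of a storage DMA command in claims 5 and 6 might be modelled as below. The names `StoreCommand`, `StoreQueue` and `receive_store` are hypothetical, and so is the choice to let a newly deferred command overwrite an earlier one; the claims do not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StoreCommand:
    address: int
    forced: bool = False   # models the "forced execution mark" of claim 5


class StoreQueue:
    def __init__(self, registers: Dict[str, int]) -> None:
        self.registers = registers
        self.memory: Dict[int, Dict[str, int]] = {}
        self.pending: Optional[StoreCommand] = None   # unexecuted storage command

    def receive_store(self, cmd: StoreCommand) -> str:
        if cmd.forced:
            # Claim 6: save the register contents right away.
            self.memory[cmd.address] = dict(self.registers)
            return "executed"
        # Claim 5: no forced mark, so defer the command. Overwriting any
        # earlier deferred command is an assumption of this sketch.
        self.pending = cmd
        return "deferred"
```

Deferring the unmarked command is what later allows a recovery DMA command with the same address to cancel it without any memory access.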
7. The method of claim 5, wherein said processing register information of said graphics processor in accordance with said storage DMA command comprises:
acquiring third address information corresponding to the storage DMA command and fourth address information corresponding to the last executed storage DMA command;
comparing the third address information with the fourth address information;
if the third address information is consistent with the fourth address information, determining, from all registers of the graphics processor, a target register whose register information has changed;
and reading target register information in the target register, and storing the target register information according to a storage command list corresponding to the storage DMA command.
8. The method of claim 7, wherein the method further comprises:
and if the third address information is inconsistent with the fourth address information, reading the register information in each register of the graphics processor, and storing the register information of each register according to a storage command list corresponding to the storage DMA command.
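Claims 7 and 8 describe an incremental save: only changed registers are written when the target address matches that of the last executed store, and all registers otherwise. A minimal sketch follows. The class `IncrementalSaver` and the way changes are tracked (a set filled in by `write_register`) are assumptions; the claims do not state how changed registers are detected.

```python
from typing import Dict, List, Optional, Set


class IncrementalSaver:
    def __init__(self, registers: Dict[str, int]) -> None:
        self.registers = dict(registers)
        self.changed: Set[str] = set()           # registers written since the last store
        self.memory: Dict[int, Dict[str, int]] = {}
        self.last_address: Optional[int] = None  # address of the last executed store

    def write_register(self, name: str, value: int) -> None:
        if self.registers.get(name) != value:
            self.registers[name] = value
            self.changed.add(name)

    def execute_store(self, address: int) -> List[str]:
        if address == self.last_address:
            # Claim 7: same address as the last store, so the image there
            # is current except for the changed registers.
            targets = sorted(self.changed)
        else:
            # Claim 8: different address, so save every register.
            targets = sorted(self.registers)
        image = self.memory.setdefault(address, {})
        for name in targets:
            image[name] = self.registers[name]
        self.changed.clear()
        self.last_address = address
        return targets   # names of the registers actually written
```

Clearing the changed set after every store keeps it relative to the most recent image, which is why a store to a different address always has to be a full one.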
9. The method of claim 5, further comprising, after the processing of the register information of the graphics processor according to the storage DMA command:
destroying a storage command list corresponding to the storage DMA command and recording a corresponding destruction mark;
receiving a new storage DMA command, and identifying a storage space corresponding to the new storage DMA command;
if the storage space corresponding to the new storage DMA command is the same as the storage space corresponding to the storage DMA command, and the recorded destruction mark is found, taking all registers of the graphics processor as target registers, and clearing the recorded destruction mark;
and reading target register information in the target register, and storing the target register information.
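The destruction mark of claim 9 can be pictured as a per-storage-space flag that forces one full save. In the sketch below, `DestructionTracker` and its methods are hypothetical names. The fallback for the case where no mark is found (save only the registers the caller reports as changed) is an assumption borrowed from claim 7, since claim 9 itself is silent on that case.

```python
from typing import Iterable, List, Set


class DestructionTracker:
    def __init__(self) -> None:
        # Storage spaces whose storage command list has been destroyed.
        self.marks: Set[int] = set()

    def record_destruction(self, space: int) -> None:
        # Called after the storage command list for this space is destroyed.
        self.marks.add(space)

    def select_targets(self, space: int, all_registers: Iterable[str],
                       changed: Set[str]) -> List[str]:
        registers = list(all_registers)
        if space in self.marks:
            # Same storage space and a mark is found: save every register
            # and clear the mark.
            self.marks.discard(space)
            return registers
        # Assumed fallback: only the changed registers.
        return [name for name in registers if name in changed]
```

Because the mark is cleared as soon as it is consumed, only the first store into a recreated storage space pays for a full save in this model.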
10. A register information processing apparatus, the apparatus comprising:
a receiving module, configured to receive a recovery DMA command issued by a graphics processor;
an acquisition module, configured to acquire first address information corresponding to an unexecuted storage DMA command;
a comparison module, configured to acquire second address information corresponding to the recovery DMA command and to compare the second address information with the first address information;
an execution module, configured to skip the unexecuted storage DMA command and the recovery DMA command if the second address information is consistent with the first address information;
the execution module being further configured to process the register information of the graphics processor according to the unexecuted storage DMA command, and to process the register information of the graphics processor according to the recovery DMA command, if the second address information is inconsistent with the first address information.
11. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 9.
12. A computer readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 9.
CN202310097876.1A 2023-02-06 2023-02-06 Register information processing method, device, computer equipment and storage medium Active CN116188247B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310097876.1A CN116188247B (en) 2023-02-06 2023-02-06 Register information processing method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310097876.1A CN116188247B (en) 2023-02-06 2023-02-06 Register information processing method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116188247A CN116188247A (en) 2023-05-30
CN116188247B true CN116188247B (en) 2024-04-12

Family

ID=86445686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310097876.1A Active CN116188247B (en) 2023-02-06 2023-02-06 Register information processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116188247B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63191239A (en) * 1987-02-04 1988-08-08 Hitachi Ltd Method and device for instruction restart processing
US4797812A (en) * 1985-06-19 1989-01-10 Kabushiki Kaisha Toshiba System for continuous DMA transfer of virtually addressed data blocks
JP2006202271A (en) * 2004-12-22 2006-08-03 Nec Electronics Corp Stream processor and information processor
CN102937939A (en) * 2012-10-10 2013-02-20 无锡众志和达存储技术股份有限公司 DMA (Direct Memory Access) address couple pre-reading method based on SATA (Serial Advanced Technology Attachment) controller
CN106951374A (en) * 2016-01-06 2017-07-14 北京忆芯科技有限公司 Method and its device for checking block page address
CN109522254A (en) * 2017-10-30 2019-03-26 上海寒武纪信息科技有限公司 Arithmetic unit and method
CN111861858A (en) * 2020-07-10 2020-10-30 浪潮(北京)电子信息产业有限公司 Method and device for starting image compression function verification and related equipment
CN111880916A (en) * 2020-07-27 2020-11-03 长沙景嘉微电子股份有限公司 Multi-drawing task processing method, device, terminal, medium and host in GPU
CN114416614A (en) * 2022-01-19 2022-04-29 安徽芯纪元科技有限公司 Interrupt processing module for protecting and restoring site
CN114879584A (en) * 2022-07-05 2022-08-09 成都智明达电子股份有限公司 DMA controller boundary alignment method based on FPGA and circuit thereof
CN115237599A (en) * 2022-07-27 2022-10-25 格兰菲智能科技(武汉)有限公司 Rendering task processing method and device
CN115563021A (en) * 2022-05-16 2023-01-03 苏州忆联信息系统有限公司 Method and device for improving repeated reading performance based on solid state disk and computer equipment
CN115686625A (en) * 2021-07-31 2023-02-03 华为技术有限公司 Integrated chip and instruction processing method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5057360B2 (en) * 2006-08-08 2012-10-24 ルネサスエレクトロニクス株式会社 Semiconductor device, data processing device, and access method to storage device

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4797812A (en) * 1985-06-19 1989-01-10 Kabushiki Kaisha Toshiba System for continuous DMA transfer of virtually addressed data blocks
JPS63191239A (en) * 1987-02-04 1988-08-08 Hitachi Ltd Method and device for instruction restart processing
JP2006202271A (en) * 2004-12-22 2006-08-03 Nec Electronics Corp Stream processor and information processor
CN102937939A (en) * 2012-10-10 2013-02-20 无锡众志和达存储技术股份有限公司 DMA (Direct Memory Access) address couple pre-reading method based on SATA (Serial Advanced Technology Attachment) controller
CN114968099A (en) * 2016-01-06 2022-08-30 北京忆芯科技有限公司 NVM (non-volatile memory) access method and NVM controller
CN106951374A (en) * 2016-01-06 2017-07-14 北京忆芯科技有限公司 Method and its device for checking block page address
CN109522254A (en) * 2017-10-30 2019-03-26 上海寒武纪信息科技有限公司 Arithmetic unit and method
CN111861858A (en) * 2020-07-10 2020-10-30 浪潮(北京)电子信息产业有限公司 Method and device for starting image compression function verification and related equipment
CN111880916A (en) * 2020-07-27 2020-11-03 长沙景嘉微电子股份有限公司 Multi-drawing task processing method, device, terminal, medium and host in GPU
CN115686625A (en) * 2021-07-31 2023-02-03 华为技术有限公司 Integrated chip and instruction processing method
CN114416614A (en) * 2022-01-19 2022-04-29 安徽芯纪元科技有限公司 Interrupt processing module for protecting and restoring site
CN115563021A (en) * 2022-05-16 2023-01-03 苏州忆联信息系统有限公司 Method and device for improving repeated reading performance based on solid state disk and computer equipment
CN114879584A (en) * 2022-07-05 2022-08-09 成都智明达电子股份有限公司 DMA controller boundary alignment method based on FPGA and circuit thereof
CN115237599A (en) * 2022-07-27 2022-10-25 格兰菲智能科技(武汉)有限公司 Rendering task processing method and device

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
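Implementation of DMA transfer between the TRS-80 microcomputer and an image memory; 朱定华, 刘启茂, 刘海龙; Application of Electronic Technique; 19850307 (No. 03); full text *
Yi-Fu Wang, et al. Design of 2D DMA Controller for Crypto-engine in Video Stream Data Encryption Card. 2022 IEEE 16th International Conference on Solid-State & Integrated Circuit Technology (ICSICT). 2022, full text. *
An address register promotion method based on feedback information; 张超; 吕方; 王蕾; 冯晓兵; Journal of Computer Research and Development; 20090415 (No. 04); full text *
Indirect access to the ADSP-BF533 instruction memory based on MemDMA; 杨波, 杨俊, 王跃科, 周永彬; Electronics Quality; 20041230 (No. 12); full text *
Implementation of synchronization and communication for an on-chip heterogeneous multi-core DSP; 刘建; 陈杰; 敖天勇; 许汉荆; Journal of University of Electronic Science and Technology of China; 20100730 (No. 04); full text *
Design and verification of DMA for sparse matrix-vector multiplication; 曹亚松 et al.; Computer & Digital Engineering; 20191231; full text *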

Also Published As

Publication number Publication date
CN116188247A (en) 2023-05-30

Similar Documents

Publication Publication Date Title
US7545381B2 (en) Interruptible GPU and method for context saving and restoring
KR101636870B1 (en) Method and apparatus for generating minimal boot image
CN111279384B (en) Compression and decompression of indices in a graphics pipeline
CN111208933A (en) Data access method, device, equipment and storage medium
CN104216771A (en) Restarting method and device for software program
JP5057887B2 (en) Data update device, data update method, and data update program
JP2023553220A (en) Process mining for multi-instance processes
US20240045787A1 (en) Code inspection method under weak memory ordering architecture and corresponding device
CN116188247B (en) Register information processing method, device, computer equipment and storage medium
CN112882859A (en) Virtual machine synthetic backup method and system
GB2601354A (en) Apparatus and method
CN115328704A (en) File backup method, file recovery method, device, equipment and storage medium
CN111651124B (en) Parallel reconstruction method, device, equipment and medium for SSD mapping table multi-core partition
CN114153690A (en) Program memory monitoring method and device, computer equipment and storage medium
JP2009020695A (en) Information processing apparatus and system
JP2008140236A (en) Memory management system, information processor and memory management method
US7333106B1 (en) Method and apparatus for Z-buffer operations
JP2004062851A (en) Storage system for directly and redundantly writable nonvolatile memory without needing use of redundant field, and writing method therefor
CN114185645B (en) Memory processing method and device
CN112463460B (en) Erasure code-based small block data recovery method and system
CN111176572B (en) Method, device, equipment and medium for protecting stored data
CN116069536A (en) Data restoration method, device, computer equipment, storage medium and program product
JP2009271828A (en) Storage device, storage device control system, method for controlling storage device, and program
CN117349042A (en) Multi-process data communication method and communication device based on shared memory
CN118296078A (en) Data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 200135, 11th Floor, Building 3, No. 889 Bibo Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee after: Granfei Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: 200135 Room 201, No. 2557, Jinke Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee before: Gryfield Intelligent Technology Co.,Ltd.

Country or region before: China