CN114661354A - Instruction processing method, instruction processing device, computer equipment and storage medium

Info

Publication number
CN114661354A
Authority
CN
China
Prior art keywords
instruction
hardware
current
instructions
hardware operation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210390691.5A
Other languages
Chinese (zh)
Inventor
曾丞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Jingmei Integrated Circuit Design Co ltd
Changsha Jingjia Microelectronics Co ltd
Original Assignee
Changsha Jingmei Integrated Circuit Design Co ltd
Changsha Jingjia Microelectronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Jingmei Integrated Circuit Design Co ltd, Changsha Jingjia Microelectronics Co ltd
Priority to CN202210390691.5A
Publication of CN114661354A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802Instruction prefetching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Stored Programmes (AREA)

Abstract

The application discloses an instruction processing method, an instruction processing device, computer equipment and a storage medium, and relates to the field of computers. The instruction processing method determines the number of current hardware operation instructions in real time and, when that number is larger than a preset number, sends a plurality of hardware operation instructions to the kernel driver simultaneously, so that the kernel driver controls the graphics processor to execute hardware acceleration based on the operation instructions. This solves the technical problem that the execution efficiency of hardware acceleration is currently low and achieves the technical effect of improving the execution efficiency of hardware acceleration.

Description

Instruction processing method, instruction processing device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an instruction processing method and apparatus, a computer device, and a storage medium.
Background
With the development of computer technology and image processing technology, hardware acceleration has come into use. Hardware acceleration refers to a technology in which an operating system assigns a large amount of work to dedicated hardware, such as a GPU (graphics processing unit), for processing, so as to reduce the amount of computation performed by the CPU (central processing unit) and improve processing efficiency.
Currently, a tool commonly used for hardware acceleration is Xorg (open source software for image processing). While Xorg calls the exa (Ex-kaa Architecture) driver, the out-of-core driver and the in-core driver need to exchange hardware acceleration instructions frequently, and the interval between two adjacent hardware acceleration instructions is long, so the execution efficiency of hardware acceleration is low.
Disclosure of Invention
In order to solve the foregoing technical problem, embodiments of the present application provide a method, an apparatus, a computer device, and a storage medium for instruction processing.
In a first aspect of an embodiment of the present application, there is provided an instruction processing method, including:
determining the instruction number of the current hardware operation instruction in real time;
if the current instruction number is larger than the preset number, the preset number of the hardware operation instructions are simultaneously sent to the kernel driver so that the kernel driver can control the graphics processor to execute hardware acceleration based on the operation instructions.
In an optional embodiment of the present application, determining the instruction number of the current hardware operation instruction in real time includes:
acquiring operation information for executing hardware operation aiming at a register in real time;
generating a corresponding hardware operation instruction according to the operation information;
the instruction number of the current hardware operation instruction is determined.
In an optional embodiment of the present application, determining the instruction number of the current hardware operation instruction includes:
caching the generated hardware operation instruction into an instruction queue;
the queue length of the current instruction queue is determined.
In an optional embodiment of the present application, if the current instruction number is greater than the preset number, simultaneously sending the preset number of hardware operation instructions to the kernel driver, so that the kernel driver controls the graphics processor to execute hardware acceleration based on each operation instruction, includes:
if the queue length of the current instruction queue is larger than the preset queue length, the hardware operation instruction in the current instruction queue is simultaneously sent to the kernel driver, so that the kernel driver controls the graphics processor to execute hardware acceleration based on the instruction information.
In an optional embodiment of the present application, the method further comprises:
if the current instruction number is not larger than the preset number, acquiring new operation information for executing hardware operation aiming at the register;
generating a corresponding new hardware operation instruction based on the new operation information;
updating the instruction number based on the new hardware operation instruction until the current instruction number is larger than the preset number, and then simultaneously sending the current hardware operation instructions to the kernel driver so that the kernel driver executes hardware acceleration based on the operation instructions.
In an optional embodiment of the present application, the operation information includes at least one of an operation source address, an operation target address and operation content.
In an optional embodiment of the present application, the method further comprises:
determining the highest threshold of the number of hardware operation instructions that the kernel driver executes per unit time;
and determining the preset number according to the highest threshold value.
In an alternative embodiment of the present application, the hardware operation instructions include: at least one of a copy instruction, a fill instruction, and a fuse instruction.
In a second aspect of the embodiments of the present application, there is provided an instruction processing apparatus, including: an instruction determining module and an instruction sending module,
the instruction determining module is configured to determine the instruction number of the current hardware operation instructions in real time;
the instruction sending module is configured to simultaneously send the preset number of hardware operation instructions to the kernel driver if the current instruction number is greater than the preset number, so that the kernel driver controls the graphics processor to execute hardware acceleration based on each of the operation instructions.
In a third aspect of embodiments of the present application, there is provided a computer device, comprising a memory and a processor, the memory storing a computer program, and the processor implementing the steps of the method as described in any one of the above when executing the computer program.
In a fourth aspect of the embodiments of the present application, there is provided a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method as in any one of the above.
According to the instruction processing method, the number of operation instructions is determined in real time, and when the number of current hardware operation instructions is larger than the preset number, a plurality of hardware operation instructions are simultaneously sent to the kernel driver so that the kernel driver controls the graphics processor to execute hardware acceleration based on the operation instructions. This avoids the situation in which the out-of-core driver sends only one hardware operation instruction at a time, the kernel driver processes only one hardware operation instruction at a time and must wait to receive the next one, and the out-of-core driver and the kernel driver therefore interact frequently. Because the kernel driver can execute a plurality of hardware operation instructions at the same time, the technical problem that the execution efficiency of current hardware acceleration is low is solved, and the technical effect of improving the instruction processing efficiency is achieved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic view of an application scenario of an instruction processing method according to an embodiment of the present application;
FIG. 2 is a diagram illustrating a fusion operation in an instruction processing method according to an embodiment of the present application;
FIG. 3 is a first flowchart of an instruction processing method according to an embodiment of the present application;
FIG. 4 is a second flowchart of an instruction processing method according to an embodiment of the present application;
FIG. 5 is a third flowchart of an instruction processing method according to an embodiment of the present application;
FIG. 6 is a fourth flowchart of an instruction processing method according to an embodiment of the present application;
FIG. 7 is a fifth flowchart of an instruction processing method according to an embodiment of the present application;
FIG. 8 is a first block diagram of an instruction processing apparatus according to an embodiment of the present application;
FIG. 9 is a second block diagram of an instruction processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In carrying out the present application, the inventors found that hardware acceleration instructions are currently executed with poor performance. In view of this problem, an embodiment of the present application provides an instruction processing method to improve the processing efficiency of hardware acceleration instructions.
The scheme in the embodiment of the application can be implemented in various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.
In order to make the technical solutions and advantages of the embodiments of the present application more apparent, the exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. Clearly, the described embodiments are only a part of the embodiments of the present application and are not exhaustive of all embodiments. It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other without conflict.
In the prior art, a tool commonly used for hardware acceleration is Xorg. While Xorg calls the exa driver, the out-of-core driver and the in-core driver need to exchange hardware acceleration instructions frequently, and the interval between two adjacent hardware acceleration instructions is long, so the execution efficiency of hardware acceleration is low.
In view of this, the embodiment of the present application provides an instruction processing method, which determines the number of operation instructions in real time and, when the number of current hardware operation instructions is greater than a preset number, sends multiple hardware operation instructions to the in-core driver at the same time, so that the in-core driver controls the graphics processor to execute hardware acceleration based on each operation instruction. This solves the technical problem that the execution efficiency of current hardware acceleration is low and achieves the technical effect of improving the instruction processing efficiency.
The following briefly describes an application environment of the instruction processing method provided by the embodiment of the present application:
Referring to FIG. 1, the instruction processing method provided in an embodiment of the present application is applied to an instruction processing system 10. The instruction processing system 10 includes at least a driver module, which configures, exchanges and processes hardware acceleration instruction information, and a graphics processor 103, which performs the acceleration operations. The driver module includes at least a user-mode out-of-core driver 101 and an in-core driver 102. The out-of-core driver 101 refers to a user-mode driver for running general user applications, and the in-core driver 102 refers to a driver that runs in the core portion of the operating system. Programs in the two drivers are generally independent of each other and are not shared, so when performing hardware acceleration the out-of-core driver 101 needs to send the generated hardware acceleration instructions to the in-core driver 102, so as to instruct a program inside the in-core driver 102 or the graphics processor 103 to perform the hardware acceleration.
The hardware acceleration instructions and the corresponding hardware acceleration operations are of various kinds. Examples include a copy operation that copies a source image into a target region, a fill operation that fills a source region with a designated color, and a fusion operation that fuses an image of the source region with an image of the target region; the list is not exhaustive. For the fusion operation, referring to FIG. 2, the source image is cut according to the mask size and then fused into the target image to obtain the resulting image.
Referring to FIG. 3, the following embodiment takes the out-of-core driver 101 as the execution subject and describes how the method provided by the embodiment of the present application is applied in the out-of-core area to speed up the handling of hardware acceleration instructions. The instruction processing method provided by the embodiment of the application includes the following steps 301 to 302:
Step 301, the out-of-core driver determines the instruction number of the current hardware operation instructions in real time.
The hardware operation instruction may be generated by the out-of-core driver in real time according to the current configuration information, or may be sent by another device, such as a core processor. After the out-of-core driver obtains a hardware operation instruction, it may store the instruction in a list, a set or the like by loading or caching, and then determine the number of current hardware operation instructions. The out-of-core driver may determine the number of operation instructions in several ways, for example by keeping a separate running count, by storing the operation instructions in a queue and taking the queue length, or by storing the operation instructions in a buffer area and counting the entries in the buffer area. For example, the number of hardware operation instructions received so far is updated in real time by counting; or the total capacity of the list or set is fixed, each hardware operation instruction obtained is filled into the list or set in real time, and once the list or set is full the current instruction number is known to be its total capacity, for example 20 or 50.
Step 302, if the number of the current instructions is greater than a preset number, the out-of-core driver simultaneously sends the hardware operation instructions of the preset number to the in-core driver, so that the in-core driver controls the graphics processor to execute hardware acceleration based on each operation instruction.
After receiving the hardware operation instructions, the in-core driver determines and acquires, in parallel, the operation data corresponding to each hardware operation instruction, and sends the operation data to the graphics processor so that the graphics processor can execute the hardware acceleration operation corresponding to each instruction. The preset number may be set according to user requirements, according to the efficiency or capability with which the out-of-core driver obtains hardware operation instructions, or according to the performance or efficiency with which the in-core driver executes hardware operation instructions per unit time, and so on. When sending operation instructions to the in-core driver, the present application may also send them grouped by hardware operation instruction type; after receiving operation instructions of various types, the in-core driver can process a plurality of operation instructions of the same type per unit time, which improves the instruction processing efficiency.
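As an illustration only, steps 301 and 302 can be sketched in Python as follows. The class name `InstructionBatcher`, the `submit` callback standing in for the hand-off to the in-core driver, and the use of strings as instructions are all assumptions made for this sketch and do not come from the application. The sketch takes the wording of step 302 literally: once the pending count exceeds the preset number, exactly the preset number of instructions is handed over in one call and the remainder stays pending.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class InstructionBatcher:
    """Hypothetical out-of-core side buffer for hardware operation instructions."""
    preset_number: int
    submit: Callable[[List[str]], None]  # assumed stand-in for the in-core driver hand-off
    pending: List[str] = field(default_factory=list)

    def add(self, instruction: str) -> None:
        # Step 301: keep the current instruction number up to date.
        self.pending.append(instruction)
        # Step 302: only when the count exceeds the preset number,
        # send the preset number of instructions in a single call.
        if len(self.pending) > self.preset_number:
            batch = self.pending[:self.preset_number]
            self.pending = self.pending[self.preset_number:]
            self.submit(batch)


if __name__ == "__main__":
    sent_batches = []
    batcher = InstructionBatcher(preset_number=3, submit=sent_batches.append)
    for i in range(5):
        batcher.add(f"instr-{i}")
    print(sent_batches)      # one batch of three instructions
    print(batcher.pending)   # the two instructions still waiting
```

With a preset number of 3, nothing is submitted until the fourth instruction arrives, so the number of hand-offs to the in-core driver is much smaller than the number of instructions.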
According to the embodiment of the application, the number of operation instructions is determined in real time, and when the number of current hardware operation instructions is larger than the preset number, a plurality of hardware operation instructions are simultaneously sent to the in-core driver so that the in-core driver controls the graphics processor to execute hardware acceleration based on the operation instructions. This avoids the situation in which the out-of-core driver sends only one hardware operation instruction at a time, the in-core driver processes only one hardware operation instruction at a time and must wait to receive the next one, and the out-of-core driver and the in-core driver therefore interact frequently. Because the in-core driver can execute a plurality of hardware operation instructions at the same time, the technical problem that the execution efficiency of current hardware acceleration is low is solved, and the technical effect of improving the instruction processing efficiency is achieved.
Referring to FIG. 4, in an alternative embodiment of the present application, step 301, in which the out-of-core driver determines the instruction number of the current hardware operation instructions in real time, includes the following steps 401 to 403:
Step 401, the out-of-core driver acquires, in real time, operation information for executing a hardware operation on a register.
The operation information refers to the basic configuration information corresponding to the hardware operation. For example, if the hardware operation is an image fusion operation, the corresponding operation information may include a source image address, a target address, a fusion size, a fusion shape, and the like; if the hardware operation is a data copy operation, the corresponding operation information may include a source data address, a target address, a content identifier or range identifier of the data to be copied, and the like; if the hardware operation is an image filling operation, the corresponding operation information may include a source image address, a filling color, a filling size, filling content, and the like. These examples are not exhaustive, and the operation information can be set according to the actual situation.
Step 402, the out-of-core driver generates the corresponding hardware operation instruction according to the operation information.
After acquiring the operation information in step 401, the out-of-core driver parses the operation information to determine the source object of the operation, the action to be executed, and the target object of the operation, and then determines the operation instruction that corresponds to them. For example, if parsing the operation information shows that the source of the operation is the original image, the executed action is a copy, and the target of the operation is the target address, a copy instruction is generated based on the operation information.
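Steps 401 and 402 can be pictured with the following minimal Python sketch. The dictionary keys `action`, `source_address`, `target_address` and `content`, and the `HardwareInstruction` record, are hypothetical names chosen for illustration; the application does not specify a data layout for the operation information or for the generated instruction.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple

# Instruction kinds named in the application: copy, fill and fuse.
SUPPORTED_ACTIONS = ("copy", "fill", "fuse")


@dataclass(frozen=True)
class HardwareInstruction:
    """Hypothetical record for one generated hardware operation instruction."""
    kind: str
    source: int
    target: int
    content: Tuple[Tuple[str, Any], ...]


def generate_instruction(info: Dict[str, Any]) -> HardwareInstruction:
    """Parse operation information into source, action and target (step 402)."""
    action = info["action"]
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported hardware operation: {action!r}")
    # Optional, operation-specific fields (for example a fill color) are kept
    # as a sorted tuple so that the instruction stays immutable.
    extra = tuple(sorted(info.get("content", {}).items()))
    return HardwareInstruction(
        kind=action,
        source=info["source_address"],
        target=info["target_address"],
        content=extra,
    )


if __name__ == "__main__":
    op_info = {"action": "copy", "source_address": 0x1000, "target_address": 0x2000}
    print(generate_instruction(op_info))
```

Rejecting unknown actions early is a design choice of this sketch, so that malformed operation information never reaches the instruction buffer.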
Step 403, the out-of-core driver determines the instruction number of the current hardware operation instructions.
The out-of-core driver may determine the instruction number in the same manner as in step 301, which is not described again here. When determining the number of operation instructions, the method provided by the application may also first determine the types of the operation instructions and then determine the number of operation instructions of each type separately.
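Counting the instructions of each type separately could look like the following sketch. Representing an instruction as a `(kind, operands)` tuple and keeping a per-type preset in the dictionary `preset_by_type` are assumptions of this example, not details given in the application.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Instruction = Tuple[str, object]  # hypothetical (kind, operands) pair


def count_by_type(instructions: Iterable[Instruction]) -> Counter:
    """Determine the type of each instruction first, then count per type."""
    return Counter(kind for kind, _operands in instructions)


def types_over_preset(instructions: Iterable[Instruction],
                      preset_by_type: Dict[str, int]) -> List[str]:
    """Return, in sorted order, the types whose count exceeds their preset number."""
    counts = count_by_type(instructions)
    return sorted(
        kind for kind, number in counts.items()
        # Types without a configured preset are never reported in this sketch.
        if kind in preset_by_type and number > preset_by_type[kind]
    )


if __name__ == "__main__":
    pending = [("copy", 1), ("fill", 2), ("copy", 3), ("copy", 4)]
    print(count_by_type(pending))
    print(types_over_preset(pending, {"copy": 2, "fill": 2}))
```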
By determining the instruction number of the current hardware operation instructions, the present application can determine how many instructions the out-of-core driver has stored or cached, and can use that number to judge when to send all the received instructions to the in-core driver, which improves the efficiency of instruction processing.
In an alternative embodiment of the present application, the present application also discloses how to determine the instruction number of the current hardware operation instruction, please refer to fig. 5, where the step 403 includes steps 501 to 502:
Step 501, the out-of-core driver caches the generated hardware operation instruction into an instruction queue.
The out-of-core driver caches the hardware operation instruction in a queue, or in a preset instruction queue whose storage space may be set in advance. Alternatively, the instruction queue may be built in real time from the operation instructions as they are received, that is, the queue is updated once each time an operation instruction is received.
Step 502, the out-of-core driver determines the queue length of the current instruction queue.
The length of the queue can be determined in several ways. Two are listed here, but the present application is not limited to them:
The first way: if the instruction queue has a preset queue length, then once the received operation instructions fill the preset queue, the queue length can be taken directly from the preset length of the preset instruction queue. This way determines the length of the instruction queue from the preset length alone, without separately counting the cached operation instructions, which improves calculation efficiency and saves computing resources.
The second way: if the instruction queue is built in real time from the operation instructions as they are received, the queue length may be determined from the operation instructions received so far; in this manner the queue length is calculated in real time.
Calculating the number of instructions through a queue, rather than counting the instructions separately, keeps the instructions ordered and makes the count clearer, which helps improve the efficiency of calculating the instruction number.
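The two ways of determining the queue length can be contrasted in a short sketch. The class names `PresetLengthQueue` and `DynamicQueue` are hypothetical, and Python containers always track their own size, so the sketch only mirrors the logical difference: in the first way the length is read from the preset value once the queue is full, and in the second way it is measured each time an instruction arrives.

```python
from collections import deque
from typing import Any, Deque, List


class PresetLengthQueue:
    """First way: the queue length is fixed in advance."""

    def __init__(self, preset_length: int) -> None:
        self.preset_length = preset_length
        self._items: List[Any] = []

    def push(self, instruction: Any) -> bool:
        """Cache one instruction; return True once the preset queue is full."""
        if len(self._items) >= self.preset_length:
            raise OverflowError("preset instruction queue is already full")
        self._items.append(instruction)
        return len(self._items) == self.preset_length

    def length_when_full(self) -> int:
        # Once full, the length is simply the preset length.
        return self.preset_length


class DynamicQueue:
    """Second way: the queue is built as instructions are received."""

    def __init__(self) -> None:
        self._items: Deque[Any] = deque()

    def push(self, instruction: Any) -> int:
        """Cache one instruction and return the queue length measured right now."""
        self._items.append(instruction)
        return len(self._items)


if __name__ == "__main__":
    fixed = PresetLengthQueue(3)
    print([fixed.push(name) for name in ("copy", "fill", "fuse")])
    grown = DynamicQueue()
    print([grown.push(name) for name in ("copy", "fill")])
```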
In an optional embodiment of the present application, after determining the queue length of the current instruction queue, the present application further includes:
if the queue length of the current instruction queue is larger than the preset queue length, the hardware operation instruction in the current instruction queue is simultaneously sent to the kernel driver, so that the kernel driver controls the graphics processor to execute hardware acceleration based on the instruction information.
In this method, whether the number of hardware operation instructions is greater than the preset number is judged by means of a queue. Compared with real-time counting, judging by queue length is more intuitive and places lower demands on timeliness, so the computing resources of the out-of-core driver can be greatly saved and the instruction processing efficiency is further improved.
Meanwhile, judging by means of the queue whether the number of hardware operation instructions is greater than the preset number makes it possible to determine when to send all the received instructions to the in-core driver. This avoids the waste of resources caused when the out-of-core driver sends too few operation instructions to the in-core driver, and also avoids overloading the in-core driver when the out-of-core driver sends too many, so the instruction processing efficiency is improved.
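The queue-based variant, in which everything in the current instruction queue is sent once the queue length exceeds the preset queue length, might be sketched as below. The function name `enqueue_and_maybe_flush` and the `send_to_in_core_driver` callback are placeholders invented for this example; a real driver would hand the batch to the kernel through whatever interface it actually exposes.

```python
from collections import deque
from typing import Any, Callable, Deque, List


def enqueue_and_maybe_flush(queue: Deque[Any],
                            instruction: Any,
                            preset_queue_length: int,
                            send_to_in_core_driver: Callable[[List[Any]], None]) -> bool:
    """Cache one instruction; flush the whole queue if it is now too long.

    Returns True when a batch was sent, False otherwise.
    """
    queue.append(instruction)
    if len(queue) > preset_queue_length:
        batch = list(queue)   # every instruction in the current queue
        queue.clear()
        send_to_in_core_driver(batch)
        return True
    return False


if __name__ == "__main__":
    instruction_queue: Deque[int] = deque()
    sent_batches: List[List[int]] = []
    flags = [
        enqueue_and_maybe_flush(instruction_queue, n, 2, sent_batches.append)
        for n in range(4)
    ]
    print(flags)
    print(sent_batches, list(instruction_queue))
```

Because the whole queue is drained in one hand-off, the preset queue length bounds how much work reaches the in-core driver per interaction, which is the balance between too few and too many instructions described above.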
In an alternative embodiment of the present application, the present application further discloses how to preset the number of hardware operation instructions, please refer to fig. 6, wherein step 302 includes the following steps 601-602:
Step 601, the out-of-core driver determines the highest threshold of the number of hardware operation instructions that the in-core driver executes per unit time.
The out-of-core driver may determine a highest threshold of the number of hardware operation instructions executed by the in-core driver in a unit time by acquiring hardware parameters of the in-core driver, where the hardware parameters at least include an instruction type, an instruction length, and the like.
If a plurality of in-core drivers interact with the out-of-core driver, the out-of-core driver determines, from the plurality of in-core drivers, the target in-core driver corresponding to the current hardware operation instruction, or selects a currently idle in-core driver from them, or a worker selects a target in-core driver from them based on experience; this embodiment is not particularly limited in this respect.
In the method provided by the application, the highest threshold of the number of hardware operation instructions may also be determined per type of hardware operation instruction. Because different types of operation instructions require different processing resources, different highest thresholds are set for different types. For example, a higher highest threshold can be set for operation instructions with a larger operation amount, so that the processing requirements of different operation types are taken into account, and a lower highest threshold can be set for operation instructions with a smaller operation amount, so that computing resources are saved.
Step 602, the out-of-core driver determines the preset number according to the highest threshold.
The instruction length of each type of operation instruction is different, the required processing resources are also different, and correspondingly, the highest threshold value of each type of operation instruction is also different, so that different preset quantities can be determined according to each type of operation instruction.
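The application does not state how the preset number is derived from the highest threshold. Purely as an assumed example, the sketch below takes a fixed percentage of each type's highest threshold, rounded down and never below one; the function name `preset_numbers`, the `percent` parameter and the sample threshold values are all hypothetical.

```python
from typing import Dict


def preset_numbers(highest_thresholds: Dict[str, int], percent: int = 100) -> Dict[str, int]:
    """Derive a per-type preset number from per-type highest thresholds.

    Assumption of this sketch: preset = floor(threshold * percent / 100),
    with a minimum of 1 so that every type can still be sent.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    return {
        kind: max(1, limit * percent // 100)
        for kind, limit in highest_thresholds.items()
    }


if __name__ == "__main__":
    # Hypothetical highest thresholds per unit time for each instruction type.
    thresholds = {"copy": 50, "fill": 80, "fuse": 20}
    print(preset_numbers(thresholds, percent=80))
```

Keeping the preset number at or below the highest threshold means a single batch never exceeds what the in-core driver can execute in one unit of time.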
When operation instructions are sent to the kernel driver, the condition for sending them is determined according to the processing capacity of the kernel driver. This allows a larger number of operation instructions to be sent per unit time while preventing the operation instructions sent per unit time from exceeding the processing threshold of the kernel driver, so the instruction processing efficiency can be improved.
Referring to FIG. 7, in an optional embodiment of the present application, after step 301, in which the out-of-core driver determines the instruction number of the current hardware operation instructions in real time, the instruction processing method further includes the following steps 701 to 703:
Step 701, if the current instruction number is less than or equal to the preset number, the out-of-core driver continues to wait for new operation information.
The out-of-core driver may determine whether the number of currently received instructions is greater than a preset number after receiving a plurality of hardware operation instructions, and continue to wait for receiving new operation information if the number of currently received instructions is less than or equal to the preset number.
Step 702, generate a corresponding new hardware operation instruction based on the new operation information.
The extracore driver may generate a corresponding new hardware operation instruction based on the same manner as in step 401, which is not described herein again. After the out-of-core driver generates the hardware operation instruction, the out-of-core driver may further store or cache the hardware operation instruction, and determine the number of the current hardware operation instruction.
And 703, updating the instruction number based on the new hardware operation instruction until the current instruction number is larger than the preset number, and simultaneously sending the current hardware operation instruction to the intra-core driver so that the intra-core driver executes hardware acceleration based on the instruction operation.
After updating the number of instructions, the extranuclear driver also judges the number of instructions, and if the current number of instructions is less than the preset number, the step 701 is continued until the current number of instructions is greater than the preset number.
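Steps 701 to 703 can be sketched as a simple accumulation loop. The sketch is illustrative only: the function name run_batching_loop, the dictionary used as a stand-in for a generated instruction, and the send_to_in_core_driver callback are assumptions, and a real driver would cross the user/kernel boundary (for example through a system call) instead of calling a Python function.

```python
from typing import Callable, Iterable, List


def run_batching_loop(
    operation_infos: Iterable[dict],
    preset_number: int,
    send_to_in_core_driver: Callable[[List[dict]], None],
) -> List[dict]:
    """Accumulate instructions and send them together once the count
    exceeds the preset number.

    Returns the instructions still pending when the input is exhausted.
    """
    pending: List[dict] = []
    for info in operation_infos:              # step 701: wait for new operation information
        instruction = {"op": info}            # step 702: stand-in for instruction generation
        pending.append(instruction)           # step 703: update the instruction number
        if len(pending) > preset_number:
            send_to_in_core_driver(pending)   # send the whole batch at one time
            pending = []                      # start a fresh batch
    return pending
```

With a preset number of 3 and nine pieces of operation information, this loop sends two batches of four instructions and leaves one instruction pending.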
According to the method, when the out-of-core driver acquires a new operation instruction, it does not send the operation instruction directly to the in-core driver, but first checks the number of operation instructions and then decides whether to send them. If the number of acquired hardware operation instructions does not exceed the preset number, the out-of-core driver waits to receive a new operation instruction. This avoids the time wasted when the out-of-core driver sends only one operation instruction to the in-core driver at a time and the in-core driver has to wait for the next hardware operation instruction.
Meanwhile, if the number of hardware operation instructions is greater than the preset number, the out-of-core driver sends a plurality of instructions to the in-core driver at one time. Because the in-core driver acquires the plurality of instructions at one time, the plurality of hardware operation instructions can be executed at the same time, which solves the technical problem of low hardware acceleration execution efficiency in the prior art and achieves the technical effect of improving instruction processing efficiency.
By this method, the time the in-core driver spends waiting for the out-of-core driver to send hardware acceleration instructions can be greatly shortened, the number of interactions between the user-mode driver and the kernel-mode driver is reduced, the time the GPU spends waiting for the in-core driver is reduced, and the utilization rate of the hardware is ultimately improved.
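The reduction in the number of interactions can be illustrated with a short calculation. The sketch assumes, as an illustration only, that every batch is sent as soon as the instruction number exceeds the preset number, so that each batch holds the preset number plus one instructions; the function name count_submissions is hypothetical.

```python
from typing import Tuple


def count_submissions(total_instructions: int, preset_number: int) -> Tuple[int, int, int]:
    """Return (submissions without batching, submissions with batching,
    instructions left pending after the last full batch)."""
    if total_instructions < 0 or preset_number < 0:
        raise ValueError("arguments must not be negative")
    batch_size = preset_number + 1  # a batch is sent once the count exceeds the preset number
    batched, pending = divmod(total_instructions, batch_size)
    return total_instructions, batched, pending
```

Under these assumptions, 100 instructions with a preset number of 9 need 10 submissions to the in-core driver instead of 100.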
In an optional embodiment of the present application, the hardware operation instruction processed by the out-of-core driver includes: at least one of a copy instruction, a fill instruction, and a fuse instruction.
The method provided by the application can process different types of hardware operation instructions, can meet the control requirements of users on different hardware or the processing requirements on different data, and improves the data processing compatibility.
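How a batch containing different instruction types might be handled can be sketched with a dispatch table. This is a software stand-in only: a bytearray plays the role of device memory, the field names type, src, dst, length and value are hypothetical, and because the application does not specify what a fuse instruction computes, the sketch assumes a per-byte average of source and target purely for illustration.

```python
def do_copy(mem: bytearray, ins: dict) -> None:
    """Copy `length` bytes from `src` to `dst`."""
    mem[ins["dst"]:ins["dst"] + ins["length"]] = mem[ins["src"]:ins["src"] + ins["length"]]


def do_fill(mem: bytearray, ins: dict) -> None:
    """Fill `length` bytes at `dst` with the byte `value`."""
    mem[ins["dst"]:ins["dst"] + ins["length"]] = bytes([ins["value"]]) * ins["length"]


def do_fuse(mem: bytearray, ins: dict) -> None:
    """Assumed semantics: average source and target bytes into the target."""
    for i in range(ins["length"]):
        mem[ins["dst"] + i] = (mem[ins["src"] + i] + mem[ins["dst"] + i]) // 2


HANDLERS = {"copy": do_copy, "fill": do_fill, "fuse": do_fuse}


def execute_batch(mem: bytearray, batch: list) -> None:
    """Run every instruction of a mixed-type batch in order."""
    for ins in batch:
        HANDLERS[ins["type"]](mem, ins)
```

A single batch can therefore mix copy, fill and fuse instructions, and supporting a further type only requires registering another handler.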
It should be understood that, although the steps in the flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and described, and may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Referring to fig. 8, an embodiment of the present application provides an instruction processing apparatus 80, including: an instruction determination module 801 and an instruction sending module 802, wherein:
the instruction determining module 801 is configured to determine the instruction number of the current hardware operation instruction in real time;
the instruction sending module 802 is configured to send the preset number of hardware operation instructions to the in-core driver simultaneously if the current instruction number is greater than the preset number, so that the in-core driver controls the graphics processor to execute hardware acceleration based on each operation instruction.
In an optional embodiment of the present application, the instruction determining module 801 in the instruction processing apparatus 80 is further configured to:
acquire, in real time, operation information for executing a hardware operation on a register;
generate a corresponding hardware operation instruction according to the operation information;
determine the instruction number of the current hardware operation instructions.
In an optional embodiment of the present application, to determine the instruction number of the current hardware operation instructions, the instruction determining module 801 in the instruction processing apparatus 80 is further configured to:
cache the generated hardware operation instruction into an instruction queue;
determine the queue length of the current instruction queue.
In an optional embodiment of the present application, to send the preset number of hardware operation instructions to the in-core driver simultaneously when the current instruction number is greater than the preset number, so that the in-core driver controls the graphics processor to execute hardware acceleration based on each operation instruction, the instruction sending module 802 in the instruction processing apparatus 80 is further configured to:
if the queue length of the current instruction queue is greater than the preset queue length, send the hardware operation instructions in the current instruction queue to the in-core driver simultaneously, so that the in-core driver controls the graphics processor to execute hardware acceleration based on each operation instruction.
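The division of work between modules 801 and 802 in the queue-based embodiment can be sketched as two small classes sharing an instruction queue. The class names, the method names cache and maybe_send, and the send callback are hypothetical; the sketch only illustrates the queue-length comparison and is not the implementation of the apparatus.

```python
from collections import deque
from typing import Callable, Deque, List


class InstructionDeterminingModule:
    """Counterpart of module 801: caches instructions and reports the queue length."""

    def __init__(self) -> None:
        self.queue: Deque[dict] = deque()

    def cache(self, instruction: dict) -> int:
        """Append one generated instruction and return the current queue length."""
        self.queue.append(instruction)
        return len(self.queue)


class InstructionSendingModule:
    """Counterpart of module 802: sends the queued instructions together
    once the queue is longer than the preset queue length."""

    def __init__(self, preset_queue_length: int, send: Callable[[List[dict]], None]) -> None:
        self.preset_queue_length = preset_queue_length
        self.send = send  # stand-in for the hand-off to the in-core driver

    def maybe_send(self, queue: Deque[dict]) -> bool:
        """Send and empty the queue if it is long enough; report whether it was sent."""
        if len(queue) <= self.preset_queue_length:
            return False
        batch = list(queue)
        queue.clear()
        self.send(batch)
        return True
```

Keeping the queue in the determining module and the comparison in the sending module mirrors the module split described above; a single class would serve equally well in a small driver.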
Referring to fig. 9, in an alternative embodiment of the present application, the instruction processing apparatus 80 further includes a hardware instruction update module 803, where the hardware instruction update module 803 is configured to:
if the current instruction number is not larger than the preset number, acquiring new operation information for executing hardware operation aiming at the register;
generating a corresponding new hardware operation instruction based on the new operation information;
updating the instruction number based on the new hardware operation instruction until the current instruction number is greater than the preset number, and then sending the current hardware operation instructions to the in-core driver simultaneously, so that the in-core driver executes hardware acceleration based on each operation instruction.
In an optional embodiment of the present application, the operation information in the instruction processing apparatus 80 includes at least one of an operation source address, an operation target address, and operation content.
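The operation information and the instruction generated from it can be sketched as two small data structures. The field names follow the three items listed above; the class names, the kind field and the function generate_instruction are hypothetical, and the check that at least one field is present reflects the "at least one of" wording.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperationInfo:
    """Operation information: at least one of the three fields must be set."""
    source_address: Optional[int] = None
    target_address: Optional[int] = None
    content: Optional[bytes] = None

    def __post_init__(self) -> None:
        if (self.source_address is None
                and self.target_address is None
                and self.content is None):
            raise ValueError("operation information needs at least one field")


@dataclass(frozen=True)
class HardwareInstruction:
    """A generated hardware operation instruction (hypothetical layout)."""
    kind: str
    info: OperationInfo


def generate_instruction(kind: str, info: OperationInfo) -> HardwareInstruction:
    """Wrap operation information into a hardware operation instruction."""
    return HardwareInstruction(kind=kind, info=info)
```

For instance, a copy would carry a source address and a target address, whereas a fill might carry only a target address and the content to write.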
In an optional embodiment of the present application, the instruction sending module 802 in the instruction processing apparatus 80 is further configured to:
determining the highest threshold of the number of hardware operation instructions executed by the in-core driver per unit time;
and determining the preset number according to the highest threshold value.
With this apparatus, the time the in-core driver spends waiting for the out-of-core driver to send hardware acceleration instructions can be greatly shortened, the number of interactions between the user-mode driver and the kernel-mode driver is reduced, the time the GPU spends waiting for the in-core driver is reduced, and the utilization rate of the hardware is ultimately improved.
For specific limitations on the instruction processing apparatus 80, reference may be made to the above limitations on the instruction processing method, which are not described herein again. The modules in the instruction processing apparatus 80 may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or be independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, the internal structure of which may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement the instruction processing method described above. That is, the computer device comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements any step of the instruction processing method when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a processor, may implement any of the steps of the above instruction processing method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (11)

1. An instruction processing method, comprising:
determining the instruction quantity of the current hardware operation instruction in real time;
if the current instruction number is greater than a preset number, simultaneously sending the preset number of the hardware operation instructions to an in-core driver, so that the in-core driver controls a graphics processor to execute hardware acceleration based on each operation instruction.
2. The instruction processing method according to claim 1, wherein determining the instruction number of the current hardware operation instruction in real time comprises:
acquiring operation information for executing hardware operation aiming at a register in real time;
generating the corresponding hardware operation instruction according to the operation information;
determining the instruction number of the current hardware operation instruction.
3. The instruction processing method according to claim 2, wherein said determining the instruction number of the current hardware operation instruction comprises:
caching the generated hardware operation instruction into an instruction queue;
determining a queue length of the current instruction queue.
4. The instruction processing method according to claim 3, wherein, if the current instruction number is greater than the preset number, simultaneously sending the preset number of the hardware operation instructions to the in-core driver, so that the in-core driver controls the graphics processor to execute hardware acceleration based on each operation instruction, comprises:
if the queue length of the current instruction queue is greater than a preset queue length, simultaneously sending the hardware operation instructions in the current instruction queue to the in-core driver, so that the in-core driver controls the graphics processor to execute hardware acceleration based on each operation instruction.
5. The instruction processing method according to claim 2, further comprising:
if the current instruction number is not larger than the preset number, acquiring new operation information for executing hardware operation on a register;
generating a corresponding new hardware operation instruction based on the new operation information;
updating the instruction number based on the new hardware operation instruction until the current instruction number is greater than the preset number, and then simultaneously sending the current hardware operation instructions to the in-core driver, so that the in-core driver executes hardware acceleration based on each operation instruction.
6. The instruction processing method according to claim 2, wherein the operation information includes at least one of an operation source address, an operation target address, and operation content.
7. The instruction processing method of claim 1, further comprising:
determining a highest threshold of the number of hardware operation instructions executed by the in-core driver per unit time;
and determining the preset number according to the highest threshold value.
8. The instruction processing method according to claim 1, wherein the hardware operation instruction comprises: at least one of a copy instruction, a fill instruction, and a fuse instruction.
9. An instruction processing apparatus, comprising: an instruction determining module and an instruction sending module,
the instruction determining module is used for determining the instruction quantity of the current hardware operation instruction in real time;
the instruction sending module is configured to send the preset number of the hardware operation instructions to an in-core driver simultaneously if the current instruction number is greater than a preset number, so that the in-core driver controls a graphics processor to execute hardware acceleration based on each operation instruction.
10. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 8 when executing the computer program.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN202210390691.5A 2022-04-14 2022-04-14 Instruction processing method, instruction processing device, computer equipment and storage medium Pending CN114661354A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210390691.5A CN114661354A (en) 2022-04-14 2022-04-14 Instruction processing method, instruction processing device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210390691.5A CN114661354A (en) 2022-04-14 2022-04-14 Instruction processing method, instruction processing device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114661354A true CN114661354A (en) 2022-06-24

Family

ID=82034642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210390691.5A Pending CN114661354A (en) 2022-04-14 2022-04-14 Instruction processing method, instruction processing device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114661354A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination