CN114816652A - Command processing device and method, electronic device, and computer storage medium - Google Patents


Info

Publication number
CN114816652A
CN114816652A CN202110127623.5A CN202110127623A CN114816652A CN 114816652 A CN114816652 A CN 114816652A CN 202110127623 A CN202110127623 A CN 202110127623A CN 114816652 A CN114816652 A CN 114816652A
Authority
CN
China
Prior art keywords
command
users
user
buffer
commands
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110127623.5A
Other languages
Chinese (zh)
Inventor
冷祥纶
孙海涛
周俊
王文强
张国栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Power Tensors Intelligent Technology Co Ltd
Original Assignee
Shanghai Power Tensors Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Power Tensors Intelligent Technology Co Ltd
Priority to CN202110127623.5A
Priority to PCT/CN2021/108396 (WO2022160626A1)
Publication of CN114816652A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45504 Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
    • G06F9/45516 Runtime code conversion or optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines

Abstract

The present disclosure provides a command processing apparatus and method, an electronic device, and a computer storage medium. The command processing apparatus includes a microcontroller, a command distributor corresponding to each of at least one user, and an arithmetic unit corresponding to each command distributor. The microcontroller is configured to read commands from the buffer corresponding to the at least one user and store them in a command queue corresponding to the at least one user; the command distributor is configured to read commands from its corresponding command queue and distribute them to its corresponding arithmetic unit; and the arithmetic unit is configured to execute the commands distributed by its corresponding command distributor. Such a command processing apparatus can improve the efficiency of command processing.

Description

Command processing device and method, electronic device, and computer storage medium
Technical Field
The present disclosure relates to the technical field of computer science, and in particular, to a command processing apparatus and method, an electronic device, and a computer-readable storage medium.
Background
With the rapid development of cloud computing, the cloud can convert physical resources into logically manageable resources through virtualization, thereby improving the utilization of the physical resources of cloud servers. Artificial intelligence (AI) computing is an important application scenario for cloud computing; as an important contributor to AI computation, a graphics processing unit (GPU) or an AI chip is often virtualized in actual use.
Existing methods for virtualizing an AI chip or a GPU suffer from poor security.
Disclosure of Invention
The embodiment of the disclosure provides at least a command processing device and method, an electronic device and a computer readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a command processing apparatus, including: a microcontroller, a command distributor corresponding to each of at least one user, and an arithmetic unit corresponding to each command distributor; the microcontroller is configured to read commands from the buffer corresponding to the at least one user and store them in a command queue corresponding to the at least one user; the command distributor is configured to read commands from its corresponding command queue and distribute them to its corresponding arithmetic unit; and the arithmetic unit is configured to execute the commands distributed by its corresponding command distributor.
In this way, a dedicated command distributor and arithmetic unit are allocated to each user to process that user's commands, so the computing resources that process different users' commands are physically isolated. This reduces the security risks caused by sharing computing resources among users and improves user security.
In an alternative embodiment, the microcontroller is configured to read the commands from the different buffers at different time slices, for a case where there are multiple users and different users correspond to different buffers.
In an alternative embodiment, the microcontroller is configured to: sequentially taking a plurality of time slices as current time slices respectively, executing the following monitoring process in the current time slices until the current time slices are finished, and switching to the next time slice; the monitoring process comprises the following steps: monitoring whether a command exists in a buffer corresponding to a current time slice or not in the current time slice; and in response to the condition that the command exists in the buffer corresponding to the current time slice, reading the command from the buffer corresponding to the current time slice.
In this way, the commands in different buffers are processed in different time slices, which helps the pending commands stored in the different buffers to be processed in an orderly manner. Moreover, because the commands are processed by time division multiplexing, the commands that the host issues to the buffers for different users are separated by time slice, which effectively avoids errors that might otherwise occur when commands are delivered for the corresponding users.
In an optional embodiment, for a case that there are multiple users and different users correspond to different buffers, the microcontroller is configured to poll buffers corresponding to the multiple users, and read a command from a currently polled buffer in response to that the buffer corresponding to the currently polled user is not empty.
Therefore, by polling the buffers corresponding to the plurality of users, the commands of the users can be quickly and efficiently transmitted to the command queues corresponding to the users, so that the command distributor can acquire the commands from the corresponding command queues in time and distribute the commands to the corresponding operation units, efficient processing of the commands is guaranteed, and processing efficiency is improved.
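The polling scheme described above can be sketched in a few lines of Python. This is only an illustrative software model, not the patented hardware: the function name `poll_buffers`, the use of `deque` objects as buffers and command queues, and the choice to read at most one command per visit are all assumptions made for the example.

```python
from collections import deque

def poll_buffers(buffers, queues, rounds=1):
    """Visit every user's buffer in turn, `rounds` times.

    Assumption: at most one command is read per visit.
    """
    for _ in range(rounds):
        for user, buffer in buffers.items():
            if buffer:  # only read when the polled buffer is not empty
                queues[user].append(buffer.popleft())

# Hypothetical users and commands, for illustration only.
buffers = {"VM1": deque(["a1", "a2"]), "VM2": deque(), "VM3": deque(["c1"])}
queues = {user: deque() for user in buffers}
poll_buffers(buffers, queues, rounds=2)
```

After two polling rounds every pending command has moved into the queue of the user that issued it, and the empty buffer of VM2 was simply skipped on each visit.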
In an optional embodiment, for a case where different users correspond to different buffers and one user corresponds to multiple buffers, the microcontroller is configured to poll the multiple buffers corresponding to each user in a time slice corresponding to each user, and read a command from the currently polled buffer.
In this way, when a user corresponds to multiple buffers, commands can be read from each of those buffers, which increases the number of commands that can be obtained for the user and allows more of the user's commands to be processed.
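As a rough illustration of the case where one user owns several buffers, the sketch below gives each user a time slice and, inside that slice, polls the user's buffers in turn. The function name `run_user_slices`, the fixed number of polls per slice, and the buffer names are assumptions for the example; the disclosure does not fix these details.

```python
from collections import deque

def run_user_slices(user_buffers, slice_order, polls_per_slice):
    """Within each user's time slice, poll that user's buffers in turn.

    Returns a log of (user, buffer_name, command) in the order commands were read.
    """
    log = []
    for user in slice_order:
        owned = user_buffers[user]          # list of (buffer_name, deque)
        for poll in range(polls_per_slice):
            name, buffer = owned[poll % len(owned)]
            if buffer:                      # read only when a command is present
                log.append((user, name, buffer.popleft()))
    return log

# Hypothetical layout: VM1 owns two buffers, VM2 owns one.
user_buffers = {
    "VM1": [("RBUF1", deque(["a1"])), ("RBUF2", deque(["a2"]))],
    "VM2": [("RBUF3", deque(["b1", "b2"]))],
}
read_log = run_user_slices(user_buffers, ["VM1", "VM2"], polls_per_slice=2)
```

No buffer belonging to VM2 is touched during the slice of VM1, and vice versa, which is the separation the time slices are meant to provide.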
In an alternative embodiment, the microcontroller is configured to read the command from the same buffer corresponding to the plurality of users, in case that there are a plurality of users and different users correspond to the same buffer.
Therefore, a plurality of users correspond to the same buffer, the storage space of the buffer can be fully utilized, and the acquisition efficiency of the microcontroller on the commands of the users is improved.
In an optional embodiment, for a case that there are multiple users and different users correspond to the same buffer, the microcontroller is configured to access, at different time slices, storage locations corresponding to the multiple users in the same buffer, respectively, and read a command; or polling the storage positions corresponding to the plurality of users in the same buffer respectively, and reading the command.
In an alternative embodiment, the buffer comprises a ring buffer; the ring buffer comprises a storage entry corresponding to each of at least one command stream of the same user; and the microcontroller is configured to read the commands corresponding to each command stream based on the storage entry corresponding to that command stream in the ring buffer.
In this way, commands of different command streams are stored in the ring buffer through the storage entries corresponding to those command streams, so that the microcontroller can monitor multiple entries of the ring buffer at the same time and pull commands from different command streams within one processing period, which improves the efficiency of command acquisition. In addition, the ring buffer can provide the communicating programs with mutually exclusive access to the buffer, which helps avoid the extra system overhead that a storage queue would incur under frequent command distribution.
In an alternative embodiment, the microcontroller is configured to store commands corresponding to at least one command stream in the same user into a command queue corresponding to the at least one command stream.
In an optional embodiment, in the case that one user corresponds to a plurality of buffers, when determining the command queue for the user, the microcontroller determines the command queue corresponding to different buffers for different buffers corresponding to the user.
In an alternative embodiment, there are M users, and N target users among the M users share the same command queue and the same command distributor; wherein M is an integer greater than 1; n is an integer less than or equal to M and greater than 1; and the microcontroller is used for reading the commands from the buffers corresponding to the N target users respectively in different time slices and storing the commands into a command queue shared by the N target users.
In an optional implementation manner, the command distributor is configured to read, in the different time slices, commands respectively corresponding to different target users from a command queue shared by the N target users, and send the commands respectively corresponding to the different target users to an arithmetic unit corresponding to the command distributor; and the operation unit corresponding to the command distributor is used for executing the commands corresponding to the N target users in different time slices.
Therefore, on the basis of space division multiplexing, multiplexing of a certain command distributor and a corresponding operation unit can be realized by a plurality of users through time division multiplexing, so that the number of users deployed on the same equipment is increased, and the flexibility of GPU resource utilization is improved.
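The time-division sharing of one command queue, one command distributor, and one arithmetic unit among N target users might be modelled as follows. This is a simplified sketch under stated assumptions: time slices are assigned to target users round-robin, at most `budget` commands are moved per slice, and each queue entry is tagged with its user; none of these specifics come from the disclosure.

```python
from collections import deque

def run_shared_queue(buffers, target_users, num_slices, budget=1):
    """Simulate N target users sharing one command queue and one distributor.

    Returns a log of (slice_index, user, command) as seen by the arithmetic unit.
    """
    shared_queue = deque()
    executed = []
    for t in range(num_slices):
        owner = target_users[t % len(target_users)]  # user that owns slice t
        # Microcontroller: move the owner's commands into the shared queue.
        for _ in range(budget):
            if not buffers[owner]:
                break
            shared_queue.append((owner, buffers[owner].popleft()))
        # Distributor: forward this slice's commands to the shared unit.
        while shared_queue:
            user, command = shared_queue.popleft()
            executed.append((t, user, command))
    return executed

# Hypothetical target users VM1 and VM2.
tdm_buffers = {"VM1": deque(["a1", "a2"]), "VM2": deque(["b1"])}
tdm_log = run_shared_queue(tdm_buffers, ["VM1", "VM2"], num_slices=4)
```

In this model a slice whose owner has no pending command simply passes without any work, so the shared arithmetic unit only ever executes commands of the user that owns the current slice.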
In a second aspect, an embodiment of the present disclosure further provides a command processing method applied to a command processing apparatus, where the command processing apparatus includes: a microcontroller, a command distributor corresponding to each of at least one user, and an arithmetic unit corresponding to each command distributor; the command processing method comprises the following steps: the microcontroller reads commands from the buffer corresponding to the at least one user and stores them in a command queue corresponding to the at least one user; the command distributor reads commands from its corresponding command queue and distributes them to its corresponding arithmetic unit; and the arithmetic unit executes the commands distributed by its corresponding command distributor.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including: a host, a buffer, and a command processing device; the host is used for issuing a command to be executed and storing the command in a buffer corresponding to at least one user;
the command processing device is used for executing the command processing method provided by any embodiment of the second aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a microcontroller, a command distributor, and an arithmetic unit, implements the command processing method provided in any embodiment of the second aspect.
For the description of the effects of the above command processing method, reference is made to the description of the above command processing device, which is not repeated here.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for use in the embodiments will be briefly described below, and the drawings herein incorporated in and forming a part of the specification illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It is appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope, for those skilled in the art will be able to derive additional related drawings therefrom without the benefit of the inventive faculty.
FIG. 1 is a schematic diagram of a command processing apparatus provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a command queue provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a specific command processing apparatus provided in an embodiment of the present disclosure;
FIG. 4 is a flowchart of a command processing method provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Research shows that when the GPU or the AI chip is virtualized, a time slice scheduling strategy can be adopted for context switching so as to support work of a plurality of users; for example, in the case of virtualizing an AI chip into multiple users, the AI chip executes the relevant instructions of the current user within a certain time slice; once the time slice is used up, the AI chip will immediately switch to the next user and execute the relevant instructions of the next user. Because the method realizes the isolation among different users through software, the AI chip is required to frequently switch contexts (switch users), the time overhead of scheduling is large, and the processing efficiency of instructions in the users is low; meanwhile, if a malicious attack to a certain user occurs, different users share the same computing resource, so that security threats to other users are easily caused, and the security is poor.
Based on the above research, the present disclosure provides a command processing apparatus, a command processing method, a computer device, and a storage medium, which can physically isolate computing resources for executing computing tasks of different users, reduce potential safety hazards caused by sharing of the computing resources among the users, and improve the safety of the users.
The drawbacks described above were identified by the inventors through practice and careful study; therefore, both the discovery of these problems and the solutions proposed below should be regarded as contributions made by the inventors in the course of the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiment, the command processing apparatus disclosed in the embodiments of the present disclosure is described in detail first.
The command processing device provided by the embodiment of the disclosure can be used for a GPU, an artificial intelligence chip or other instruction processing equipment comprising an instruction processor and an execution unit.
The command processing apparatus provided by the embodiment of the present disclosure is described below by taking an example of applying the command processing apparatus provided by the embodiment of the present disclosure to a GPU, but may be applied to other types of command processing apparatuses.
Referring to FIG. 1, which is a schematic structural diagram of a command processing apparatus provided in an embodiment of the present disclosure, the apparatus includes: a microcontroller 10, a command distributor 20 corresponding to each of at least one user (FIG. 1 shows a plurality of command distributors, from command distributor 0 to command distributor n), and an arithmetic unit 30 corresponding to each command distributor 20 (FIG. 1 shows a plurality of arithmetic units, from arithmetic unit 0 to arithmetic unit n);
the microcontroller 10 is configured to read a command from a buffer corresponding to each user, and store the command in a command queue corresponding to each user;
a command distributor 20 for reading commands from the corresponding command queue and distributing the commands to the arithmetic unit 30 corresponding to the command distributor 20;
and an arithmetic unit 30 for executing the commands distributed by the corresponding command distributors 20.
In the disclosed embodiment, the command processing apparatus includes a microcontroller 10, a command distributor 20 corresponding to each of at least one user, and an arithmetic unit 30 corresponding to each command distributor 20. After the host issues the commands of different users to the buffers corresponding to those users, the microcontroller 10 reads the commands from the buffer corresponding to a given user and stores them in the command queue corresponding to the same user. The command distributor 20 then reads the commands from its corresponding command queue and distributes them to the arithmetic unit 30 corresponding to that command distributor 20, which executes them. Because a dedicated command distributor 20 and arithmetic unit 30 are allocated to each user, the computing resources that process different users' commands are physically isolated, which reduces the security risks caused by sharing computing resources among users and improves user security.
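The data flow just described (buffer, then command queue, then command distributor, then arithmetic unit, all per user) can be modelled in software as below. This is a minimal sketch for illustration only: the class names, the `build_device` helper, and the use of `deque` for buffers and queues are assumptions, and the real apparatus is hardware rather than Python objects.

```python
from collections import deque

class ArithmeticUnit:
    """Hypothetical stand-in for an arithmetic unit; records what it executes."""
    def __init__(self, unit_id):
        self.unit_id = unit_id
        self.executed = []

    def execute(self, command):
        self.executed.append(command)

class CommandDistributor:
    """Reads from one command queue and forwards only to its own unit."""
    def __init__(self, queue, unit):
        self.queue = queue
        self.unit = unit

    def dispatch_all(self):
        while self.queue:
            self.unit.execute(self.queue.popleft())

class Microcontroller:
    """Moves commands from each user's buffer into that user's command queue."""
    def __init__(self, buffers, queues):
        self.buffers = buffers  # user -> buffer filled by the host
        self.queues = queues    # user -> command queue

    def transfer(self):
        for user, buffer in self.buffers.items():
            while buffer:
                self.queues[user].append(buffer.popleft())

def build_device(users):
    """Allocate one buffer, queue, distributor, and unit per user."""
    buffers = {u: deque() for u in users}
    queues = {u: deque() for u in users}
    units = {u: ArithmeticUnit(i) for i, u in enumerate(users)}
    distributors = {u: CommandDistributor(queues[u], units[u]) for u in users}
    return buffers, Microcontroller(buffers, queues), distributors, units

# Hypothetical users and commands.
host_buffers, mcu, distributors, units = build_device(["VM1", "VM2"])
host_buffers["VM1"].extend(["conv", "pool"])
host_buffers["VM2"].append("copy")
mcu.transfer()
for distributor in distributors.values():
    distributor.dispatch_all()
```

Because each distributor holds a reference to exactly one queue and one unit, a command issued for VM1 can never reach the unit allocated to VM2 in this model, which mirrors the isolation argument made in the text.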
The user in the embodiments of the present disclosure may include, for example, any one of the following: a virtual machine, a computer container (container), an application, or a different function in an application.
The microcontroller 10, the command distributor 20, and the arithmetic unit 30 are described in detail below, taking the case where the user is a virtual machine as an example.
When a virtual machine is created, corresponding computing resources need to be allocated to the virtual machine; in the embodiment of the present disclosure, the computing resources allocated to the virtual machine include the command distributor 20 and the arithmetic unit 30 in the GPU. The number of the arithmetic units 30 is determined according to the computing resource configuration information determined when the virtual machine is created. In addition, when creating a virtual machine, a buffer corresponding to the virtual machine may be allocated to the virtual machine.
When the virtual machine runs, one or more command streams can exist in an application layer of the virtual machine at the same time; each stream includes at least one command, and the commands in the stream are issued from the host to the buffer corresponding to the virtual machine, for example.
Illustratively, when a piece of software runs in virtual machine VM1, one or more of its software functions may be running. When virtual machine VM1 runs only one of the software functions, the host can generate a command stream for running that function, and the command stream produces commands when a processing task is executed; the host stores the commands in the stream in the buffer corresponding to virtual machine VM1.
When selecting the buffer, note that multiple streams may exist (that is, multiple command streams, with different command streams corresponding to different streams) and each stream may contain multiple commands, so commands in different streams may be processed in parallel while commands in the same stream may have a processing order. To facilitate parallel processing of commands in multiple streams, and sequential processing of the commands in one stream according to the time at which they were issued, the embodiments of the present disclosure use a ring buffer (RBUF). This addresses the problem of frequent lock calls (i.e., frequent operations such as storing, moving, and releasing commands): frequent memory allocation is not needed, the buffer can be reused, and waste of storage space in the buffer is reduced.
In the embodiment of the present disclosure, the ring buffer includes a storage entry corresponding to each command stream in at least one command stream; and at least one command stream runs in the same user.
Illustratively, the buffers correspond to the users one-to-one;
and/or, one buffer may correspond to multiple users;
and/or one user may also correspond to multiple buffers.
Illustratively, the determined buffers may include, for example, n buffers (n is a positive integer), denoted RBUF1, RBUF2, ..., RBUFn. Meanwhile, there are m virtual machines (m is a positive integer), which may be denoted, for example, VM1, VM2, ..., VMm.
Taking virtual machine VM1 as an example:
(1): in the case of one buffer for one user, VM1 may correspond to one buffer, for example, VM1 may correspond to RBUF1, and commands in all streams in VM1 may be stored in RBUF1.
Here, commands may be issued to buffer RBUF1, for example, by: determining at least one command stream corresponding to the same user; determining at least one command corresponding to each command stream based on each command stream in the at least one command stream; and storing at least one command corresponding to each command stream in the at least one command stream into the buffer by using a storage entry which is included in the buffer and corresponds to the at least one command stream in the same user.
In particular implementations, the command streams of different applications differ. For example, when a user runs an application for image processing, the command streams used for operation scheduling in a process of the operating system may include the following three: image preprocessing, target recognition on the processed image using a convolutional neural network, and determination of a target object in the image. For example, for the command stream that performs target recognition on the processed image using a convolutional neural network, the required operations such as convolution and pooling can be divided into corresponding commands such as operators (kernels), data migration, and synchronization operations. In this case, for example, H (H is a positive integer) command streams may be determined and numbered sequentially as 1, 2, ..., H. Taking the first command stream (numbered 1) as an example, it may include K commands, and the i-th of the K commands may be denoted P_{1-i} (i ∈ [1, K]).
(2): in the case of multiple buffers for one user, VM1 may correspond to s buffers, for example RBUF1, RBUF2, ..., RBUFs; each of RBUF1, RBUF2, ..., RBUFs is used to store the commands of at least one stream of VM1.
Here, commands may be issued to buffers RBUF1 to RBUFs, for example, in the following manner: determining the command streams corresponding to the user; dividing the command streams corresponding to the user into a plurality of command stream groups based on the number of buffers corresponding to the user, such that each buffer corresponding to the user corresponds to one command stream group; and, for each command stream group, storing at least one command of each command stream in the group into the buffer corresponding to the group, using the storage entry corresponding to that command stream in that buffer.
For example, in the example in (1) above, if s is 2, H command streams may be divided into 2 command stream groups, where the two command stream groups correspond to RBUF1 and RBUF2, respectively, and instructions in each command stream are issued to RBUF1 or RBUF2 according to the command stream group to which each command stream belongs in the H command streams.
(3): in the case where one buffer corresponds to a plurality of users, for example, VM1 and VM2 may share one buffer RBUF1, where RBUF1 includes the storage entries of each stream in VM1 and the storage entries of each stream in VM2.
In this case, after at least one command corresponding to each command stream is determined, the at least one command may be stored in buffer RBUF1. When storing commands in buffer RBUF1, the storage entry corresponding to each command stream in buffer RBUF1 may also be determined, for example.
In one possible implementation, the storage entries in buffer RBUF1 may be determined, for example, according to the amount of storage for command data in buffer RBUF1 (hereinafter, for convenience, "data corresponding to a command" is simply called a "command", although in substance it still includes the corresponding data): the storage space may be allocated equally among the command streams, and the data transfer entry of the first storage space allocated to each command stream is used as the storage entry of that command stream. Because the storage location of a command in buffer RBUF1 is determined by addressing with the read pointer and the write pointer, the i-th address in buffer RBUF1 is denoted Loc-i below, to make it easier to describe the locations at which commands are stored and read.
For example, in the case that there are 3 command streams and 6 groups of commands can be stored in the set buffer RBUF1, the addresses corresponding to the storage spaces of the buffer RBUF1 storing 6 groups of commands may include Loc-1, Loc-2, … …, and Loc-6, respectively. When the memory space is allocated evenly for 3 command streams, for example, addresses Loc-1, Loc-2 may be allocated to the first command stream, addresses Loc-3, Loc-4 may be allocated to the second command stream, and addresses Loc-5, Loc-6 may be allocated to the third command stream. At this time, the first address Loc-1 of the first command stream may be used as the storage entry corresponding to the first command stream, the first address Loc-3 of the second command stream may be used as the storage entry corresponding to the second command stream, and the first address Loc-5 of the third command stream may be used as the storage entry corresponding to the third command stream.
It should be noted that the number of command storage spaces corresponding to different buffers may be the same or different, and is not limited herein; the corresponding commands may include, for example, P_{1-1}, P_{1-2}, P_{2-1}, P_{2-2}, P_{3-1}, and P_{3-2}.
In another possible implementation, when determining the storage entries in buffer RBUF1, the storage space may also be allocated to the command streams, for example, according to experience or actual requirements. The detailed description of this method is omitted here.
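The equal-allocation example above (six addresses Loc-1 to Loc-6 shared by three command streams) can be reproduced with a short sketch. The function name `allocate_storage_entries` and the restriction to an exact even split are assumptions made for the illustration; the disclosure does not say how a remainder would be handled.

```python
def allocate_storage_entries(num_slots, num_streams):
    """Split addresses Loc-1..Loc-num_slots evenly among command streams.

    Returns {stream_number: (storage_entry, [addresses])}.
    Assumption: only an exact even split is supported in this sketch.
    """
    if num_streams <= 0 or num_slots % num_streams != 0:
        raise ValueError("this sketch only handles an exact even split")
    per_stream = num_slots // num_streams
    layout = {}
    for stream in range(1, num_streams + 1):
        first = (stream - 1) * per_stream + 1
        addresses = [f"Loc-{i}" for i in range(first, first + per_stream)]
        # The first address of each stream's region serves as its storage entry.
        layout[stream] = (addresses[0], addresses)
    return layout

layout = allocate_storage_entries(num_slots=6, num_streams=3)
```

For six slots and three streams this yields storage entries Loc-1, Loc-3, and Loc-5, matching the allocation given in the text.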
After the host issues the command to the buffer, the microcontroller 10 in the command processing apparatus can read the command from the buffer corresponding to the user and store the command in the command queue corresponding to each user.
When the buffer is a ring buffer, the microcontroller 10, when reading a command from the buffer corresponding to each user, can read the command corresponding to each command stream based on the memory entry corresponding to each command stream in the ring buffer.
Here, the storage entry of buffer RBUF1 may also be used as a read entry. Commands are read in a manner similar to the way they are stored: the command to be read is located by addressing with the read pointer and the write pointer, and the specific addressing process is not described again here.
When the microcontroller 10 reads a command from the buffer:
for the case where there is only one user, the microcontroller 10 reads the command from the buffer and distributes the command to the command distributor 20 corresponding to the user.
For the case where there are multiple users:
A: In the case that different users correspond to different buffers, and one user corresponds to one buffer:
The microcontroller 10 may read commands from the buffers respectively corresponding to the plurality of users, for example, in either of the following manners:
(a1): Commands are read from different buffers in different time slices.
Here, the time slice has a preset duration, for example, a time required to read out at least one command in the buffer.
The microcontroller 10 sequentially takes the plurality of time slices as the current time slice, and executes the monitoring process in each time slice until the current time slice is finished, and switches to the next time slice. Wherein, the monitoring process includes: monitoring whether a command exists in a buffer corresponding to the current time slice or not in the current time slice; and in response to the condition that the command exists in the buffer corresponding to the current time slice, reading the command from the buffer corresponding to the current time slice.
At this time, within the current time slice, if there is no readable command in the buffer being read, the microcontroller 10 keeps monitoring whether a command has been stored in the buffer corresponding to the current time slice until the current time slice ends.
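The monitoring process of manner (a1) can be sketched as below. This is a minimal sketch under stated assumptions: a time slice is modeled as a fixed number of discrete ticks rather than a wall-clock duration, one command is read per tick, and the names read_by_time_slices and slice_ticks are hypothetical.

```python
from collections import deque


def read_by_time_slices(buffers, slice_ticks, num_slices):
    """Give each buffer a time slice in turn and read commands during it.

    Assumption: a slice lasts slice_ticks ticks and one command can be read
    per tick. Returns a log of (slice_index, buffer_index, command).
    """
    log = []
    for slice_index in range(num_slices):
        current = slice_index % len(buffers)  # buffer owning this slice
        buf = buffers[current]
        for _ in range(slice_ticks):
            if buf:
                # A command exists in the current buffer: read it.
                log.append((slice_index, current, buf.popleft()))
            # Otherwise keep monitoring this buffer until the slice ends.
    return log


buffers = [deque(["P1-1"]), deque(["P2-1", "P2-2", "P2-3"])]
log = read_by_time_slices(buffers, slice_ticks=2, num_slices=3)
print(log)
```

In this run the second buffer still holds one unread command after its slice, because the slice ended before it could be read; the microcontroller does not borrow time from another slice.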
(a2): The buffers respectively corresponding to the plurality of users are polled, and when the buffer corresponding to a user is polled, commands are read from that buffer.
Here, when polling the buffers corresponding to the plurality of users, the microcontroller 10 reads the commands stored in the buffers if the polled buffers store the commands; if the polled buffer is currently empty, then the next buffer is continuously polled.
Here, to prevent a buffer that holds too many commands from occupying the microcontroller 10 for so long that the normal processing of other users is affected, a maximum duration may be set for polling: if the time spent reading commands from a polled buffer reaches the maximum duration, the next buffer is polled.
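Manner (a2), including the maximum duration, can be sketched as follows. The sketch assumes that the maximum duration is expressed as a maximum number of reads per visit; the names poll_buffers and max_reads_per_visit are hypothetical.

```python
from collections import deque


def poll_buffers(buffers, max_reads_per_visit, rounds):
    """Poll each user's buffer in turn and read commands from non-empty ones.

    Assumption: the maximum polling duration is modeled as a cap on the
    number of commands read in one visit. Returns (user, command) pairs in
    the order they were read.
    """
    read_order = []
    for _ in range(rounds):
        for user, buf in buffers.items():
            reads = 0
            # An empty buffer is skipped at once; a full one is cut off
            # when the cap is reached so other users are not starved.
            while buf and reads < max_reads_per_visit:
                read_order.append((user, buf.popleft()))
                reads += 1
    return read_order


buffers = {
    "VM1": deque(["a1", "a2", "a3"]),
    "VM2": deque(),
    "VM3": deque(["c1"]),
}
read_order = poll_buffers(buffers, max_reads_per_visit=2, rounds=2)
print(read_order)
```

The third command of VM1 is only read in the second round, after VM3 has been served.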
Illustratively, in the case where there are 6 storage spaces in each of buffer SBUF1 and buffer SBUF2, if SBUF1 stores the command P1-1 only in the storage space at address Loc-1, with no readable commands in the other storage spaces of SBUF1, and SBUF2 holds a readable command P1-2 at address Loc-1, then at time 0 the command is read from the first entry location of buffer RBUF1; after the first time slice ends, the command is read from the first entry location of buffer RBUF2.
At this time, for the remaining users VM2, VM3, ..., VMm, the process by which the microcontroller 10 reads commands is similar to that for user VM1, and is not described again.
B: in the case that different users correspond to different buffers, and one user corresponds to multiple buffers:
the microcontroller 10 polls a plurality of buffers for each user within a time slice corresponding to each user and reads commands from the currently polled buffers.
The microcontroller 10 may also read commands from the buffers in the same manner as in A above.
When commands are read from the buffers in the same manner as in A above, the buffers corresponding to all users may be numbered, and either the different buffers are polled in numerical order or commands are read from the different buffers in different time slices.
When numbering the buffers corresponding to all users, the buffers corresponding to the same user may, for example, be numbered in an interleaved manner. For example, suppose there are 3 users, VM1, VM2 and VM3, each corresponding to 3 buffers: the 3 buffers corresponding to VM1 are numbered 1, 4 and 7; the 3 buffers corresponding to VM2 are numbered 2, 5 and 8; and the 3 buffers corresponding to VM3 are numbered 3, 6 and 9. This ensures that at least part of each user's instructions can be executed in time. Alternatively, the buffers corresponding to the same user may be numbered consecutively, for example 1, 2 and 3 for the 3 buffers corresponding to VM1, which reduces the overhead of context switching by the microcontroller 10.
The specific mode can be determined according to actual needs.
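The two numbering schemes described above can be compared with a short sketch. The function name number_buffers and the flag interleaved are hypothetical, and every user is assumed to have the same number of buffers.

```python
def number_buffers(users, buffers_per_user, interleaved):
    """Assign global numbers 1..N to the buffers of all users.

    interleaved=True spreads each user's buffers across the sequence;
    interleaved=False gives each user a consecutive block of numbers.
    """
    numbering = {user: [] for user in users}
    number = 1
    if interleaved:
        for _ in range(buffers_per_user):
            for user in users:
                numbering[user].append(number)
                number += 1
    else:
        for user in users:
            for _ in range(buffers_per_user):
                numbering[user].append(number)
                number += 1
    return numbering


users = ["VM1", "VM2", "VM3"]
spread = number_buffers(users, buffers_per_user=3, interleaved=True)
blocks = number_buffers(users, buffers_per_user=3, interleaved=False)
print(spread)
print(blocks)
```

Polling in numerical order with the interleaved numbering visits every user once per three buffers, whereas the consecutive numbering finishes one user before switching to the next.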
The following describes the process of microcontroller 10 reading commands in detail, taking the case where VM1 corresponds to RBUF1 and RBUF2 as an example.
In particular implementations, microcontroller 10, upon reading commands from each user's corresponding buffer RBUF1, may read commands corresponding to each command stream based on, for example, a memory entry in buffer RBUF1 corresponding to the command stream.
For RBUF1 and RBUF2 among the plurality of buffers, when the time slice is set to 2T, only two commands in buffer RBUF1 can be read in the first time slice. In the next time slice, for example, commands are no longer read from buffer RBUF1; instead, a buffer context switch is performed and two commands are read from the next buffer, RBUF2.
C: under the condition that a plurality of users correspond to the same buffer, the microcontroller 10 accesses the storage positions of the plurality of users corresponding to the same buffer respectively at different time slices, and reads the commands; or polling the storage positions corresponding to the same buffer by a plurality of users respectively, and reading the commands.
Illustratively, users VM1, VM2, and VM3 may, for example, correspond to the same buffer RBUF1, and users VM4 and VM5 may, for example, correspond to the same buffer RBUF2. The number of users corresponding to each shared buffer may be the same or different, and the specific correspondence may be determined according to actual conditions, which is not limited herein.
The following description will take the same buffer RBUF1 for user VM1, VM2, and VM3 as an example: microcontroller 10 may access the respective memory locations of VM1, VM2, and VM3 on the buffer during a read command, for example, at different time slices. For example, in the first time slice, accessing the corresponding storage location of the VM1 on the buffer, and sending the command corresponding to the VM1 to the command distributor 20 corresponding to the VM 1; accessing a corresponding storage position of the VM2 on the buffer in the second time slice, and sending a command corresponding to the VM2 to a command distributor 20 corresponding to the VM 2; and accessing the corresponding storage position of the VM3 on the buffer in the third time slice, and sending the command corresponding to the VM3 to the command distributor 20 corresponding to the VM 3.
In addition, when reading the command, the microcontroller 10 may access the storage locations of the VM1, the VM2, and the VM3 corresponding to the buffers in the same time slice, read the commands corresponding to the VM1, the VM2, and the VM3 in the same time slice, and transmit the command corresponding to the VM1 to the command distributor 20 corresponding to the VM 1; sending a command corresponding to the VM2 to a command distributor 20 corresponding to the VM 2; the command corresponding to VM3 is sent to command distributor 20 corresponding to VM 3.
Alternatively, the microcontroller 10 may poll different entries in the buffer, and when polling an entry, if a command is stored in a storage location corresponding to the entry, the command is read and issued to the command distributor 20 corresponding to the entry.
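Case C with time slices can be sketched as follows. The sketch makes several assumptions that are not in the disclosure: the shared buffer is a plain list in which None marks an empty storage location, each user owns a fixed range of locations, and each command distributor is modeled as a list that collects the commands sent to it.

```python
def dispatch_from_shared_buffer(shared_buffer, user_slots, num_slices):
    """Visit one user's storage locations per time slice in a shared buffer.

    shared_buffer: list of commands, None where a location is empty.
    user_slots: mapping of user -> indices of that user's locations.
    Returns a mapping of user -> commands sent to that user's distributor.
    """
    users = list(user_slots)
    distributors = {user: [] for user in users}
    for slice_index in range(num_slices):
        user = users[slice_index % len(users)]  # user owning this slice
        for slot in user_slots[user]:
            command = shared_buffer[slot]
            if command is not None:
                distributors[user].append(command)
                shared_buffer[slot] = None  # mark the location as read
    return distributors


shared_buffer = ["v1-a", None, "v2-a", "v2-b", None, "v3-a"]
user_slots = {"VM1": range(0, 2), "VM2": range(2, 4), "VM3": range(4, 6)}
distributors = dispatch_from_shared_buffer(shared_buffer, user_slots, num_slices=3)
print(distributors)
```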
In addition, since there may be a case where a new command is stored in the buffer when the microcontroller 10 reads a command from the buffer, that is, the command in the buffer is updated in real time, the microcontroller 10 may continuously read the new command from the buffer; specifically, the method for updating the commands in the buffer in real time is not described herein again.
The microcontroller 10 can store the commands in the command queue corresponding to each user after reading the commands from the buffer corresponding to each user.
Here, for example, at least one command queue may be set for the user in advance, and the maximum numbers of commands that different command queues can store may be the same or different, which is not limited herein; generally, different command queues store the commands of different streams of a user. The microcontroller 10 may read the commands out of the corresponding buffer and store them in the command queue (Stream Queue) corresponding to each user. There may be, for example, s command queues, denoted SQ1, SQ2, ..., SQs.
For example, if there are 3 streams in the user, commands in the 3 streams are stored in SQ1, SQ2, and SQ3, respectively. Here, in the case where the command queue is a hardware queue, the command queue may be allocated to the user according to the number of command queues in the GPU and the number of users deployed in the GPU; in the case where the command queue is a software queue, the command queue can be dynamically created for the user according to the number of streams in the user.
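For the software-queue case, creating one queue per stream on demand can be sketched as below. The function name route_to_stream_queues and the representation of a command as a (stream_id, command) pair are assumptions made for illustration only.

```python
from collections import deque


def route_to_stream_queues(commands):
    """Store each command in the queue of its stream.

    commands: iterable of (stream_id, command) pairs.
    A queue named SQ<stream_id> is created the first time a stream is seen,
    which models dynamically created software queues.
    """
    queues = {}
    for stream_id, command in commands:
        queues.setdefault(f"SQ{stream_id}", deque()).append(command)
    return queues


commands = [(1, "P1-1"), (2, "P2-1"), (1, "P1-2"), (3, "P3-1")]
queues = route_to_stream_queues(commands)
print(queues)
```

Commands of the same stream keep their relative order inside their queue, while commands of different streams are separated.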
The following details the process of determining a command queue for a user:
in determining the corresponding command queue for each user, for example, the method described in the following (b1), or (b2) may be employed:
(b1): A command queue is determined for the user according to the number of buffers corresponding to the user. In this case, the command distributors 20 of different users are different, and each command distributor 20 reads commands from the command queue dedicated to its user. In this case, the GPU is multiplexed by space division multiplexing.
In this case, when a plurality of buffers correspond to one user, the command queues determined for the user may, for example, be determined separately for each of the buffers corresponding to the user. For example, user VM1 corresponds to buffer 1 and buffer 2, and each buffer can hold the commands of 3 command streams in VM1; the command queues determined for buffer 1 may then include SQ1, SQ2 and SQ3, and the command queues determined for buffer 2 may include SQ4, SQ5 and SQ6.
Illustratively, in the case where 4 commands are stored in RBUF1 and 2 commands are stored in RBUF2, the command queue SQ1 corresponding to RBUF1 stores 4 commands, which may include, for example, P1-1, P1-2, P1-3 and P1-4; the command queue SQ4 corresponding to RBUF2 stores 2 commands, which may include, for example, P2-1 and P2-2. Fig. 2 is a schematic diagram of a command queue according to an embodiment of the present disclosure, in which 21 denotes the command queue SQ1, 22 denotes the space indicated by SQ4, and 23 denotes a space for storing commands in the command queue.
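The per-buffer queue assignment described for user VM1 can be sketched as follows. The function name assign_command_queues is hypothetical, and the sketch assumes that every buffer holds the same number of command streams.

```python
def assign_command_queues(num_buffers, streams_per_buffer):
    """Give each buffer of a user its own block of command queues.

    Queues are named SQ1, SQ2, ... in the order they are assigned, one
    queue per command stream held in the buffer.
    """
    assignment = {}
    next_queue = 1
    for buffer_index in range(1, num_buffers + 1):
        assignment[f"buffer {buffer_index}"] = [
            f"SQ{next_queue + i}" for i in range(streams_per_buffer)
        ]
        next_queue += streams_per_buffer
    return assignment


assignment = assign_command_queues(num_buffers=2, streams_per_buffer=3)
print(assignment)
```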
(b2): The same shared command queue is allocated to N target users among M users, where M is an integer greater than 1 and N is an integer greater than 1 and less than or equal to M. In this case, the N target users use the same command distributor 20 and the same set of command queues in different time slices. When N is smaller than M, the GPU is multiplexed by a combination of time division multiplexing and space division multiplexing; when N is equal to M, the GPU is multiplexed by time division multiplexing only.
In this case, when the N users share the same queue, the command processing apparatus processes the commands by using a time division multiplexing method, that is, processes the commands corresponding to the time slices in different time slices. The microcontroller 10 reads the commands from the buffers corresponding to the N target users at different time slices, and stores the commands in a command queue shared by the N target users.
Illustratively, where the N target users include virtual machines VM1, VM2 and VM3, time slices may be allocated periodically, for example in the order VM1, VM2, VM3. In the first time slice, the microcontroller 10 reads the commands corresponding to VM1 from the buffer and stores them in the command queue shared by the N target users; at this time, the shared command queue contains only commands corresponding to VM1. After the first time slice ends and the tasks in the shared command queue have been executed, in the second time slice the microcontroller 10 reads the commands corresponding to VM2 from the buffer and stores them in the shared command queue, which then contains only commands corresponding to VM2. After the second time slice ends and the tasks in the shared command queue have been executed, in the third time slice the microcontroller 10 reads the commands corresponding to VM3 from the buffer and stores them in the shared command queue, which then contains only commands corresponding to VM3. After the third time slice ends and the tasks in the command queue have been executed, the user corresponding to the fourth time slice is again VM1: the microcontroller 10 reads the commands corresponding to VM1 from the buffer and stores them in the shared command queue, which then contains only commands corresponding to VM1.
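The rotation of time slices over a shared command queue can be sketched as below. This is a simplified sketch under stated assumptions: each user's buffer is drained completely within its own time slice, execution is modeled by recording the command, and the name run_shared_queue is hypothetical.

```python
from collections import deque


def run_shared_queue(user_buffers, num_slices):
    """Time-division sharing of one command queue among target users.

    In each time slice only the owning user's commands are moved into the
    shared queue, and the queue is emptied before the next slice begins.
    Returns a log of (slice_index, user, command) in execution order.
    """
    users = list(user_buffers)
    shared_queue = deque()
    executed = []
    for slice_index in range(num_slices):
        user = users[slice_index % len(users)]  # VM1, VM2, VM3, VM1, ...
        source = user_buffers[user]
        while source:
            shared_queue.append((user, source.popleft()))
        # The shared queue now holds commands of the current user only.
        while shared_queue:
            owner, command = shared_queue.popleft()
            executed.append((slice_index, owner, command))
    return executed


user_buffers = {
    "VM1": deque(["a1"]),
    "VM2": deque(["b1", "b2"]),
    "VM3": deque(),
}
executed = run_shared_queue(user_buffers, num_slices=4)
print(executed)
```

Because the queue is emptied before the next slice begins, commands of different users never coexist in the shared queue.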
After the microcontroller 10 stores the commands in the command queue shared by the N target users corresponding to each target user, the command distributor 20 shared by the N target users can read the commands from the command queue shared by the corresponding N target users, and distribute the commands to the operation units 30 corresponding to the command distributors 20 shared by the N target users.
In a specific implementation, the command distributor 20 may read commands from the command queue and issue the commands to the operation unit 30 corresponding to the command distributor 20 in the following two manners (c1) and (c2), for example:
(c1): Corresponding to (b1) above, a command distributor 20 is assigned to each user, and the command distributors 20 corresponding to different users are different.
The command distributor 20 extracts the user's commands to be processed from the command queue, and then directly distributes them to the operation unit 30 corresponding to the user.
Illustratively, where the N users include virtual machines VM1, VM2, VM3, command distributor D1 may be assigned to virtual machine VM1, command distributor D2 may be assigned to virtual machine VM2, and command distributor D3 may be assigned to virtual machine VM3. Since each of the three virtual machines has its own command queue and command distributor 20, the command queues and command distributors 20 corresponding to different virtual machines are entirely separate from one another.
Taking the virtual machine VM1 as an example, the corresponding command queue is, for example, SQ1, and the corresponding command distributor 20 is D1. When there is a command to be processed in the command queue, the command distributor D1 may distribute the command to the arithmetic unit 30 corresponding to the virtual machine VM1 for processing. Similarly, since the virtual machines VM2 and VM3 each have their own command queue and command distributor 20, when there is a command to be processed in the command queue corresponding to VM2 or VM3, the corresponding command distributor 20 can directly distribute the command to the corresponding arithmetic unit 30 for processing.
(c2): Corresponding to (b2) above, a command distributor 20 shared by the N target users is assigned to the N target users.
The command distributor 20 shared by the N target users is configured to read the commands respectively corresponding to the different target users from the command queue shared by the N target users at different time slices, and send the commands respectively corresponding to the different target users to the computing unit 30 corresponding to the command distributor 20 shared by the N target users.
For example, in the case where the N target users include virtual machines VM1, VM2, VM3, the virtual machines VM1, VM2, VM3 may be jointly assigned the same command distributor D4. As can be seen from (b2), since the target users share the same command queue, the command distributor D4 only needs to read commands from the command queue shared by the N target users and issue them to the arithmetic unit 30.
After the command distributor 20 shared by the N target users distributes a command to its corresponding arithmetic unit 30, the arithmetic unit 30 may execute the command. The arithmetic units 30, denoted ALU1, ALU2, ..., ALUm, may, for example, correspond to users. Each arithmetic unit 30 may include a plurality of minimum arithmetic units for processing commands, for example 32 threads running in the arithmetic unit 30, which execute the commands distributed by the corresponding command distributor 20 shared by the N target users. An arithmetic unit 30 may include, for example, u threads (u is a positive integer); taking the arithmetic unit ALU1 as an example, it may include a threads (a is a positive integer), denoted A1-1, A1-2, ..., A1-a.
When the arithmetic unit 30 executes the command distributed by the command distributor 20 and/or the command distributor 20 shared by the N target users, either of the following two methods (d1) and (d2) may be adopted:
(d1): An arithmetic unit 30 is determined for each user, that is, each user corresponds to a separate arithmetic unit 30. This further isolates the data transmission links of the plurality of users from one another, so that different users do not interfere with each other when processing commands, and an attack on the data transmission link of one user is effectively prevented from threatening the work of the other users.
(d2): The arithmetic unit 30 is shared by the target users.
In this case, since the commands corresponding to different target users are processed in different time slices, and different target users share the command distributor 20 shared by the same N target users, different target users can process the commands sequentially according to the sequence of the corresponding time slices, and only one common operation unit 30 needs to be determined. By using the method, the number of the operation units 30 determined for different users can be reduced, the waste of computing resources can be effectively reduced, the computing resources are more centralized, and the efficiency of command processing is improved.
After the operation unit 30 executes the command distributed by the command distributor 20 and/or the command distributor 20 shared by the N target users, the command processing device can complete the processing of the command to be processed.
Referring to fig. 3, an embodiment of the present disclosure further provides a schematic diagram of a specific command processing apparatus. In this example, user 1, user 2, and user 3 correspond to different ring buffers 31: the ring buffer 311 is the ring buffer RBUF1 corresponding to user 1; the ring buffer 312 is the ring buffer RBUF2 corresponding to user 2, and RBUF1 and RBUF2 already store the commands of user 1 and user 2 issued by the host; the ring buffer 313 is the ring buffer RBUFM corresponding to user 3. The microcontroller 32 stores the commands in a plurality of command queues 33. Among them, 331 includes the command queues SQ1 and SQ2 shared by user 1 and user 2 in different time slices, together with the command distributor D1; the command distributor D1 corresponds to the arithmetic unit ALU1. For example, within a time slice belonging to user 1, SQ1 and SQ2 store the commands corresponding to different entries in RBUF1; within a time slice belonging to user 2, SQ1 and SQ2 store the commands corresponding to different entries in RBUF2. 332 is the command queue SQs corresponding to user 3. 34 denotes the command distributor and arithmetic unit shared by the N target users, namely user 1 and user 2, where 341 is the command distributor D1 shared by the N target users, 342 is the arithmetic unit ALU1 corresponding to user 1, and 343 denotes the a threads included in the arithmetic unit, A1-1 to A1-a. 35 denotes the command distributor and arithmetic unit corresponding to user 3, where 351 is the command distributor Dm and 352 is the arithmetic unit ALUm corresponding to user 3.
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same inventive concept, the embodiment of the present disclosure further provides a command processing method corresponding to the command processing apparatus, and since the principle of the apparatus in the embodiment of the present disclosure for solving the problem is similar to that of the command processing apparatus in the embodiment of the present disclosure, the implementation of the method may refer to the implementation of the apparatus, and repeated details are not repeated.
Referring to fig. 4, a flowchart of a command processing method provided in an embodiment of the present disclosure includes:
S401: the microcontroller reads the command from the buffer corresponding to at least one user and stores the command into the command queue corresponding to the at least one user;
S402: the command distributor reads the command from the corresponding command queue and distributes the command to the operation unit corresponding to the command distributor;
S403: the arithmetic unit executes the command distributed by the corresponding command distributor.
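Steps S401 to S403 can be sketched end to end for a single user as follows. This is an illustrative sketch only: the three function names are hypothetical, the buffer, command queue and arithmetic-unit inbox are modeled as in-memory queues, and a command is modeled as a zero-argument callable.

```python
from collections import deque


def microcontroller_step(buffer, command_queue):
    """S401: read commands from the buffer into the command queue."""
    while buffer:
        command_queue.append(buffer.popleft())


def distributor_step(command_queue, unit_inbox):
    """S402: read commands from the queue and distribute them to the unit."""
    while command_queue:
        unit_inbox.append(command_queue.popleft())


def arithmetic_unit_step(unit_inbox):
    """S403: execute the distributed commands and collect their results."""
    results = []
    while unit_inbox:
        command = unit_inbox.popleft()
        results.append(command())  # assumption: a command is a callable
    return results


buffer = deque([lambda: 1 + 2, lambda: "x" * 2])
command_queue, unit_inbox = deque(), deque()

microcontroller_step(buffer, command_queue)
distributor_step(command_queue, unit_inbox)
results = arithmetic_unit_step(unit_inbox)
print(results)
```

The three stages are run one after another here for clarity; nothing in this sketch models the concurrency of the actual hardware.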
In an optional embodiment, for a case that there are multiple users and different users correspond to different buffers, the microcontroller reads a command from the buffer corresponding to the at least one user, including:
the microcontroller reads the commands from different buffers at different time slices.
In an alternative embodiment, the microcontroller reads the commands from different buffers respectively at different time slices, and the method includes:
the microcontroller sequentially takes the plurality of time slices as current time slices respectively, and executes the following monitoring process in each time slice until the current time slice is finished, and switches to the next time slice;
the monitoring process comprises the following steps: monitoring whether a command exists in a buffer corresponding to a current time slice or not in the current time slice; and responding to the command existing in the buffer corresponding to the current time slice, and reading the command from the buffer corresponding to the current time slice.
In an optional embodiment, for a case where there are multiple users and different users correspond to different buffers, the microcontroller reads a command from the buffer corresponding to the at least one user, including:
the microcontroller polls the buffers respectively corresponding to the plurality of users and, in response to the currently polled buffer being non-empty, reads the command from the currently polled buffer.
In an optional embodiment, for a case where different users correspond to different buffers and one user corresponds to multiple buffers, the microcontroller reads a command from the buffer corresponding to the at least one user, including:
the microcontroller polls the plurality of buffers corresponding to each user within the time slice corresponding to that user, and reads a command from the currently polled buffer.
In an optional embodiment, for a case where there are multiple users and different users correspond to the same buffer, the microcontroller reads a command from the buffer corresponding to the at least one user, including:
the microcontroller reads the commands from the same buffer corresponding to the plurality of users.
In an optional embodiment, for a case where there are multiple users and different users correspond to the same buffer, the microcontroller reads the command from the same buffer corresponding to the multiple users, including:
the microcontroller accesses the storage positions respectively corresponding to the plurality of users in the same buffer in different time slices, and reads the commands; or
the microcontroller polls the storage positions respectively corresponding to the plurality of users in the same buffer, and reads the commands.
In an alternative embodiment, the buffer comprises: a ring buffer; the ring buffer comprises storage entries corresponding to at least one command stream in the same user respectively;
the microcontroller reads a command from a buffer corresponding to the at least one user, including:
the microcontroller reads the command corresponding to each command stream based on the memory entry in the ring buffer corresponding to each command stream.
In an optional embodiment, the storing, by the microcontroller, the command in a command queue corresponding to the at least one user includes:
the microcontroller stores commands corresponding to at least one command stream in the same user into a command queue corresponding to the at least one command stream.
In an optional embodiment, in the case that one user corresponds to multiple buffers, the microcontroller stores the command in a command queue corresponding to the at least one user, and the method includes:
when determining the command queue for the user, the microcontroller determines, for each of the different buffers corresponding to the user, the command queue corresponding to that buffer.
In an alternative embodiment, there are M users, and N target users among the M users share the same command queue and the same command distributor; wherein M is an integer greater than 1; n is an integer less than or equal to M and greater than 1; the microcontroller reads the commands from the buffer corresponding to the at least one user and stores the commands in the command queue corresponding to the at least one user, and the method comprises the following steps:
the microcontroller reads commands from the buffers respectively corresponding to the N target users in different time slices and stores the commands into the command queue shared by the N target users.
In an optional implementation, the reading of the command from the corresponding command queue by the command distributor and the distribution of the command to the arithmetic unit corresponding to the command distributor include:
the command distributor reads the commands respectively corresponding to different target users from the command queue shared by the N target users in different time slices, and sends the commands respectively corresponding to different target users to the operation unit corresponding to the command distributor;
the arithmetic unit executes the command distributed by the corresponding command distributor, and comprises:
the operation unit corresponding to the command distributor executes the commands respectively corresponding to the N target users in the different time slices.
An embodiment of the present disclosure further provides an electronic device, including: a host, a buffer, and a command processing device;
the host is used for issuing a command to be executed and storing the command in a buffer corresponding to at least one user;
the command processing device is used for executing the method provided by any command processing method embodiment of the disclosure.
The command processing device provided by the embodiment of the disclosure may include a chip, an AI chip, and the like. The electronic device provided by the embodiment of the present disclosure may include an intelligent terminal such as a mobile phone, or may also be other devices, servers, and the like that have a camera and can perform image processing, and is not limited herein.
The embodiment of the disclosure also provides a computer readable storage medium on which a computer program is stored; when the program is executed by the microcontroller, the command distributor and the arithmetic unit, the method provided by any one of the command processing method embodiments of the disclosure is performed.
The description of the command processing method may refer to the related description in the above method embodiments, and will not be described in detail here.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the command processing method in the foregoing method embodiments, which may be referred to specifically in the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a processor-executable, non-volatile computer-readable storage medium. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present disclosure, used to illustrate its technical solutions rather than to limit them, and the scope of protection of the present disclosure is not limited thereto. Although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope of the present disclosure, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (26)

1. A command processing apparatus, comprising: a microcontroller, at least one command distributor corresponding to a user, and an arithmetic unit corresponding to each command distributor;
the microcontroller is configured to read a command from a buffer corresponding to the at least one user and store the command into a command queue corresponding to the at least one user;
the command distributor is configured to read the commands from the corresponding command queue and distribute the commands to the arithmetic unit corresponding to the command distributor;
and the arithmetic unit is configured to execute the commands distributed by the corresponding command distributor.
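For illustration only, the three stages recited in claim 1 can be sketched in software as follows. This is a minimal model, not the claimed hardware: the function name `run_pipeline`, the use of `collections.deque` for buffers and queues, and the `execute` callback standing in for an arithmetic unit are all assumptions introduced here.

```python
from collections import deque


def run_pipeline(buffers, execute):
    """Model the claim 1 data path for each user (hypothetical helper).

    buffers: dict mapping a user id to a deque of pending commands.
    execute: callable standing in for the arithmetic unit.
    Returns a dict mapping each user id to the list of execution results.
    """
    # Microcontroller stage: move every command from a user's buffer
    # into the command queue that corresponds to that user.
    queues = {user: deque() for user in buffers}
    for user, buf in buffers.items():
        while buf:
            queues[user].append(buf.popleft())

    # Command distributor stage: read each queue in order and hand the
    # command to the arithmetic unit paired with that distributor.
    results = {user: [] for user in buffers}
    for user, queue in queues.items():
        while queue:
            results[user].append(execute(queue.popleft()))
    return results


demo_buffers = {"user_a": deque([1, 2]), "user_b": deque([3])}
demo_results = run_pipeline(demo_buffers, lambda cmd: cmd * 10)
```

In this sketch each user has its own queue, so commands of different users never interleave inside one queue.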
2. The command processing apparatus according to claim 1, wherein, for the case where there are a plurality of said users and different users correspond to different buffers,
the microcontroller is used for reading the commands from different buffers respectively at different time slices.
3. The command processing apparatus of claim 2, wherein the microcontroller is configured to:
take each of a plurality of time slices in turn as the current time slice, execute the following monitoring process within the current time slice until the current time slice ends, and then switch to the next time slice;
the monitoring process comprises:
monitoring, within the current time slice, whether a command exists in the buffer corresponding to the current time slice;
and in response to a command existing in the buffer corresponding to the current time slice, reading the command from that buffer.
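As a rough sketch of the time-slice monitoring in claims 2 and 3, the loop below gives each buffer one slice per round and keeps checking that buffer until the slice ends. A slice is modeled as a fixed number of ticks rather than wall-clock time so the behavior is deterministic; the tick model, the function name `read_by_time_slices`, and the one-command-per-tick rate are assumptions, not details from the disclosure.

```python
from collections import deque


def read_by_time_slices(buffers, ticks_per_slice, rounds=1):
    """Read commands slice by slice (hypothetical helper).

    buffers: dict mapping a user id to a deque of pending commands.
    Returns the list of (user, command) pairs in the order they were read.
    """
    read_log = []
    for _ in range(rounds):
        for user, buf in buffers.items():
            # Current time slice belongs to `user`: monitor only this buffer.
            for _tick in range(ticks_per_slice):
                if buf:
                    read_log.append((user, buf.popleft()))
                # If the buffer is empty, keep monitoring until the slice
                # ends; the slice is not handed to another user early.
    return read_log


slice_buffers = {"u0": deque(["a", "b", "c"]), "u1": deque(["x"])}
slice_log = read_by_time_slices(slice_buffers, ticks_per_slice=2)
```

With two ticks per slice, `u0` is served twice and its third command waits for the next round, which is the fairness property time slicing is meant to provide.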
4. The command processing apparatus according to claim 1, wherein, for the case where there are a plurality of said users and different users correspond to different buffers,
the microcontroller is used for polling the buffers corresponding to the users respectively and reading commands from the currently polled buffer in response to the buffer corresponding to the currently polled user not being empty.
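The polling alternative of claim 4 differs from time slicing in that an empty buffer is skipped immediately instead of being watched for a fixed period. A minimal sketch follows; the name `poll_buffers` and the choice to read one command per visit are assumptions made for illustration, since the claim does not fix how many commands are read per poll.

```python
from collections import deque


def poll_buffers(buffers):
    """Round-robin polling over per-user buffers (hypothetical helper).

    buffers: dict mapping a user id to a deque of pending commands.
    Returns the list of (user, command) pairs in the order they were read.
    """
    read_log = []
    while any(buffers.values()):  # an empty deque is falsy
        for user, buf in buffers.items():
            if buf:  # read only when the polled buffer is not empty
                read_log.append((user, buf.popleft()))
    return read_log


poll_input = {"u0": deque([1, 2, 3]), "u1": deque(), "u2": deque([9])}
poll_log = poll_buffers(poll_input)
```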
5. The command processing apparatus of claim 1, wherein for the case where different buffers are associated with different users and one user is associated with a plurality of buffers,
the microcontroller is used for polling a plurality of buffers corresponding to each user in a time slice corresponding to each user and reading commands from the currently polled buffers.
6. The command processing apparatus according to any one of claims 1-5, wherein, for the case where there are a plurality of said users and different users correspond to the same buffer, the microcontroller is configured to read the commands from the same buffer corresponding to the plurality of users.
7. The command processing apparatus according to claim 6, wherein, for the case where there are a plurality of users and different users correspond to the same buffer, the microcontroller is configured to access, in different time slices, the storage locations in the same buffer that respectively correspond to the plurality of users, and read commands; or
to poll the storage locations in the same buffer that respectively correspond to the plurality of users, and read commands.
8. The command processing apparatus according to any one of claims 1 to 7, wherein the buffer comprises: a ring buffer; the ring buffer comprises storage entries respectively corresponding to at least one command stream of the same user;
and the microcontroller is configured to read the command corresponding to each command stream based on the storage entry corresponding to that command stream in the ring buffer.
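One possible reading of claim 8 is that each command stream of a user owns its own entry into the ring buffer, with independent read and write positions. The sketch below models that reading; the class names `StreamRing` and `UserRingBuffer`, the fixed per-stream capacity, and the reject-when-full policy are assumptions, and the disclosure may organize the storage entries differently.

```python
class StreamRing:
    """Fixed-capacity ring for one command stream (hypothetical)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0   # index of the next slot to read
        self.tail = 0   # index of the next slot to write
        self.count = 0  # number of commands currently stored

    def push(self, cmd):
        if self.count == len(self.slots):
            return False  # ring is full; the writer must retry later
        self.slots[self.tail] = cmd
        self.tail = (self.tail + 1) % len(self.slots)  # wrap around
        self.count += 1
        return True

    def pop(self):
        if self.count == 0:
            return None  # nothing to read for this stream
        cmd = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)  # wrap around
        self.count -= 1
        return cmd


class UserRingBuffer:
    """One user's ring buffer with a storage entry per command stream."""

    def __init__(self, stream_ids, capacity_per_stream):
        self.entries = {sid: StreamRing(capacity_per_stream) for sid in stream_ids}

    def write(self, stream_id, cmd):
        return self.entries[stream_id].push(cmd)

    def read(self, stream_id):
        return self.entries[stream_id].pop()


rb = UserRingBuffer(["s0", "s1"], capacity_per_stream=2)
```

Keeping separate positions per stream means a stalled stream does not block reads from the other streams of the same user.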
9. The command processing apparatus according to any of claims 1-8, wherein the microcontroller is configured to store commands corresponding to at least one command stream in the same user into a command queue corresponding to the at least one command stream.
10. The command processing apparatus according to claim 9, wherein in a case where a user corresponds to a plurality of buffers, when the microcontroller determines a command queue for the user, the microcontroller determines a command queue corresponding to different buffers for the user, respectively.
11. The command processing apparatus according to any one of claims 1-10, wherein there are M said users, and N target users among the M said users share the same command queue and the same command distributor; wherein M is an integer greater than 1; n is an integer less than or equal to M and greater than 1;
and the microcontroller is used for reading commands from the buffers corresponding to the N target users respectively at different time slices and storing the commands into a command queue shared by the N target users.
12. The command processing apparatus of claim 11, wherein the command distributor is configured to read, in the different time slices, commands respectively corresponding to different target users from the command queue shared by the N target users, and send the commands respectively corresponding to different target users to the arithmetic unit corresponding to the command distributor;
and the arithmetic unit corresponding to the command distributor is configured to execute, in the different time slices, the commands respectively corresponding to the N target users.
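The sharing arrangement of claims 11 and 12, where N target users feed one command queue, one command distributor, and one arithmetic unit, can be approximated as below. Tagging each queued command with its user id is an assumption added here so that the order of service is visible; the names `fill_shared_queue` and `drain_shared_queue` and the tick-based slice are likewise hypothetical.

```python
from collections import deque


def fill_shared_queue(buffers, ticks_per_slice):
    """Microcontroller side: one time slice per target user, repeated
    until every buffer is empty. Returns the shared command queue."""
    shared = deque()
    while any(buffers.values()):
        for user, buf in buffers.items():
            for _tick in range(ticks_per_slice):
                if buf:
                    shared.append((user, buf.popleft()))
    return shared


def drain_shared_queue(shared, execute):
    """Distributor side: read the shared queue in order and pass each
    command to the single arithmetic unit, modeled by `execute`."""
    results = []
    while shared:
        user, cmd = shared.popleft()
        results.append((user, execute(cmd)))
    return results


target_buffers = {"u0": deque(["a", "b", "c"]), "u1": deque(["x", "y"])}
shared_queue = fill_shared_queue(target_buffers, ticks_per_slice=2)
shared_results = drain_shared_queue(shared_queue, str.upper)
```

Because the queue preserves slice order, the arithmetic unit ends up serving the target users in the same alternating pattern in which their commands were read.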
13. A command processing method, applied to a command processing device, the command processing device comprising: a microcontroller, at least one command distributor corresponding to a user, and an arithmetic unit corresponding to each command distributor; the command processing method comprising:
the microcontroller reads the commands from the buffer corresponding to the at least one user and stores the commands into the command queue corresponding to the at least one user;
the command distributor reads the commands from the corresponding command queue and distributes the commands to the arithmetic unit corresponding to the command distributor;
the arithmetic unit executes the command distributed by the corresponding command distributor.
14. The method according to claim 13, wherein for the case that there are a plurality of users and different users correspond to different buffers, the microcontroller reads the command from the buffer corresponding to the at least one user, comprising:
the microcontroller reads the commands from different buffers at different time slices.
15. The method of claim 14, wherein the microcontroller reads the commands from different buffers at different time slices, respectively, comprising:
the microcontroller takes each of the plurality of time slices in turn as the current time slice, executes the following monitoring process within the current time slice until the current time slice ends, and then switches to the next time slice;
the monitoring process comprises: monitoring, within the current time slice, whether a command exists in the buffer corresponding to the current time slice; and in response to a command existing in the buffer corresponding to the current time slice, reading the command from that buffer.
16. The method according to claim 13, wherein for the case that there are a plurality of users and different users correspond to different buffers, the microcontroller reads the command from the buffer corresponding to the at least one user, comprising:
the microcontroller polls the buffers respectively corresponding to the plurality of users and, in response to the buffer corresponding to the currently polled user not being empty, reads the command from the currently polled buffer.
17. The method according to claim 13, wherein for the case that different users correspond to different buffers and one user corresponds to multiple buffers, the microcontroller reads the command from the buffer corresponding to the at least one user, comprising:
and the microcontroller polls the plurality of buffers corresponding to each user in the time slice corresponding to each user and reads a command from the currently polled buffer.
18. The command processing method according to any one of claims 13-17, wherein for a case where there are a plurality of users and different users correspond to the same buffer, the microcontroller reads the command from the buffer corresponding to the at least one user, including:
the microcontroller reads the commands from the same buffer corresponding to the plurality of users.
19. The method of claim 18, wherein for the case that there are multiple users and different users correspond to the same buffer, the microcontroller reads the command from the same buffer corresponding to the multiple users, and the method comprises:
the microcontroller accesses, in different time slices, the storage locations in the same buffer that respectively correspond to the plurality of users, and reads commands; or
the microcontroller polls the storage locations in the same buffer that respectively correspond to the plurality of users, and reads commands.
20. The command processing method according to any one of claims 13 to 19, wherein the buffer comprises: a ring buffer; the ring buffer comprises storage entries corresponding to at least one command stream in the same user respectively;
the microcontroller reads a command from a buffer corresponding to the at least one user, including:
the microcontroller reads the command corresponding to each command stream based on the storage entry corresponding to that command stream in the ring buffer.
21. The command processing method of any one of claims 13-20, wherein the storing of the command by the microcontroller into the command queue corresponding to the at least one user comprises:
the microcontroller stores commands corresponding to at least one command stream in the same user into a command queue corresponding to the at least one command stream.
22. The method of claim 21, wherein in the case that one user corresponds to a plurality of buffers, the storing of the command into the command queue corresponding to the at least one user by the microcontroller comprises:
when determining the command queue for the user, the microcontroller determines a separate command queue for each of the different buffers corresponding to the user.
23. The command processing method according to any one of claims 13-22, wherein there are M said users, and N target users among the M said users share the same command queue and the same command distributor; wherein M is an integer greater than 1; n is an integer less than or equal to M and greater than 1; the microcontroller reads the command from the buffer corresponding to the at least one user and stores the command in the command queue corresponding to the at least one user, and the method comprises the following steps:
and the microcontroller reads commands from the buffers corresponding to the N target users respectively at different time slices and stores the commands into a command queue shared by the N target users.
24. The command processing method of claim 23, wherein the command distributor reads the command from the corresponding command queue and distributes the command to the arithmetic unit corresponding to the command distributor, comprising:
the command distributor reads, in different time slices, the commands respectively corresponding to different target users from the command queue shared by the N target users, and sends the commands respectively corresponding to different target users to the arithmetic unit corresponding to the command distributor;
the arithmetic unit executes the command distributed by the corresponding command distributor, and comprises:
the arithmetic unit corresponding to the command distributor executes, in the different time slices, the commands respectively corresponding to the N target users.
25. An electronic device, comprising: a host, a buffer, and a command processing device;
the host is used for issuing a command to be executed and storing the command in a buffer corresponding to at least one user;
the command processing apparatus is configured to execute the command processing method of any one of claims 13 to 24.
26. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a microcontroller, a command distributor, an arithmetic unit, implements the command processing method of any one of claims 13 to 24.
CN202110127623.5A 2021-01-29 2021-01-29 Command processing device and method, electronic device, and computer storage medium Pending CN114816652A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110127623.5A CN114816652A (en) 2021-01-29 2021-01-29 Command processing device and method, electronic device, and computer storage medium
PCT/CN2021/108396 WO2022160626A1 (en) 2021-01-29 2021-07-26 Command processing apparatus and method, electronic device, and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110127623.5A CN114816652A (en) 2021-01-29 2021-01-29 Command processing device and method, electronic device, and computer storage medium

Publications (1)

Publication Number Publication Date
CN114816652A (en) 2022-07-29

Family

ID=82525691

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110127623.5A Pending CN114816652A (en) 2021-01-29 2021-01-29 Command processing device and method, electronic device, and computer storage medium

Country Status (2)

Country Link
CN (1) CN114816652A (en)
WO (1) WO2022160626A1 (en)

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100345132C (en) * 2003-07-28 2007-10-24 华为技术有限公司 Parallel processing method and system
JP2008123045A (en) * 2006-11-08 2008-05-29 Matsushita Electric Ind Co Ltd Processor
US20120229481A1 (en) * 2010-12-13 2012-09-13 Ati Technologies Ulc Accessibility of graphics processing compute resources
US9176794B2 (en) * 2010-12-13 2015-11-03 Advanced Micro Devices, Inc. Graphics compute process scheduling
US10719237B2 (en) * 2016-01-11 2020-07-21 Micron Technology, Inc. Apparatuses and methods for concurrently accessing multiple partitions of a non-volatile memory
US10387992B2 (en) * 2017-04-07 2019-08-20 Intel Corporation Apparatus and method for dynamic provisioning, quality of service, and prioritization in a graphics processor
CN107943686A (en) * 2017-10-30 2018-04-20 北京奇虎科技有限公司 A kind of test dispatching method, apparatus, server and storage medium
CN110083388B (en) * 2019-04-19 2021-11-12 上海兆芯集成电路有限公司 Processing system for scheduling and access method thereof
CN111708639A (en) * 2020-06-22 2020-09-25 中国科学技术大学 Task scheduling system and method, storage medium and electronic device
CN113138801B (en) * 2021-04-29 2023-08-04 上海阵量智能科技有限公司 Command distribution device, method, chip, computer device and storage medium

Also Published As

Publication number Publication date
WO2022160626A1 (en) 2022-08-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination