WO2022160628A1 - Command processing apparatus, method, electronic device, and computer-readable storage medium

Info

Publication number
WO2022160628A1
Authority
WO
WIPO (PCT)
Prior art keywords
command
executed
processing block
processing
microcontroller
Prior art date
Application number
PCT/CN2021/108430
Other languages
English (en)
French (fr)
Inventor
冷祥纶
胡延隆
张国栋
徐宁仪
Original Assignee
上海阵量智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海阵量智能科技有限公司
Publication of WO2022160628A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode

Definitions

  • the present disclosure relates to the technical field of computer science, and in particular, to a command processing apparatus, method, electronic device, and computer-readable storage medium.
  • Embodiments of the present disclosure provide at least a command processing apparatus, method, electronic device, and computer-readable storage medium.
  • an embodiment of the present disclosure provides a command processing device, including a microcontroller and an arithmetic unit. The microcontroller is used to obtain a command to be executed after the current time slice arrives; the command to be executed carries a first processing block identifier, which indicates which of the multiple processing blocks corresponding to the command needs to be processed in the current time slice. The arithmetic unit is used to obtain the command to be executed and, based on the first processing block identifier, execute the processing task corresponding to the command.
  • the first processing block identifier, which is determined from the processing result of the command to be executed in its previous corresponding time slice, is used to determine the starting processing block to be executed in the current time slice, and the processing task of the command is executed from that starting processing block. Because the multiple processing operations are divided into multiple processing blocks of finer granularity, when context switching is performed by means of time division multiplexing, the waiting time caused by a command that has not finished processing is reduced, which improves the efficiency of command processing.
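
The resume-from-identifier behaviour described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `Command` class, the `run_time_slice` function, and the `block_budget` parameter (which stands in for the length of a time slice) are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    num_blocks: int          # total processing blocks in the command
    first_block_id: int = 0  # starting block for the current time slice


def run_time_slice(cmd: Command, block_budget: int) -> Optional[int]:
    """Run at most `block_budget` blocks, starting at cmd.first_block_id.

    Returns the ID of the most recently executed block, or None if
    nothing ran. `block_budget` is an assumed stand-in for slice length.
    """
    last_executed = None
    end = min(cmd.first_block_id + block_budget, cmd.num_blocks)
    for block_id in range(cmd.first_block_id, end):
        # A real arithmetic unit would perform the block's operations here.
        last_executed = block_id
    return last_executed
```

Because execution starts directly at `first_block_id`, no scan over already finished blocks is needed when the command is resumed in a later slice.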
  • the processing device further includes a command distributor; the microcontroller is used to acquire the command to be executed after the current time slice arrives, and store the command to be executed in a command queue;
  • the command distributor is configured to obtain the command to be executed from the command queue and distribute it to the arithmetic unit; the arithmetic unit is configured to, after receiving the command to be executed distributed by the command distributor, execute the processing task corresponding to the command based on the first processing block identifier.
  • the arithmetic unit is configured to: determine, based on the first processing block identifier, the starting processing block to be executed in the current time slice from the plurality of processing blocks, and, based on the starting processing block, execute the processing task corresponding to the command to be executed.
  • the arithmetic unit can directly locate the first processing block that needs to be processed by using the first processing block identifier. There is no need to search the multiple processing blocks of the command to be executed to determine which one needs to be processed, so the efficiency of command processing can be effectively improved.
  • the microcontroller is configured to: read the command to be executed from the target cache corresponding to the current time slice; or acquire the command to be executed from the host.
  • the microcontroller can flexibly acquire commands to be executed from the target cache or the host according to different time slices and execution conditions of different commands to be executed.
  • when acquiring the command to be executed from the host, the microcontroller is used to: determine whether the command queue is idle; when the command queue is idle, monitor whether the buffer stores a command to be executed; and, when a command to be executed is found in the buffer, read it from the buffer.
  • the microcontroller can determine whether the command queue is idle and, when it is, whether there is a command to be executed in the buffer. As long as the buffer keeps receiving commands to be executed, they can be continuously issued into the command queue, so the command processing apparatus stays efficient when processing a continuous supply of newly issued commands.
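
The three-step acquisition from the host (check the queue, monitor the buffer, read the command) can be sketched as below. The function name `poll_host_buffer`, the use of `collections.deque`, and the modelling of "idle" as "has free capacity" are assumptions made for illustration only.

```python
from collections import deque


def poll_host_buffer(command_queue, buffer, queue_capacity=1):
    """Move one command from the buffer into the command queue.

    Returns the command that was moved, or None when the queue is not
    idle or the buffer holds no command.
    """
    # Step 1: "idle" is modelled here as having free capacity (assumption).
    if len(command_queue) >= queue_capacity:
        return None
    # Step 2: monitor whether the buffer stores a command to be executed.
    if not buffer:
        return None
    # Step 3: read the command from the buffer and issue it to the queue.
    cmd = buffer.popleft()
    command_queue.append(cmd)
    return cmd
```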
  • the buffer includes a ring buffer; the ring buffer has multiple entries, and it stores the commands to be executed of different command streams through different entries. The microcontroller is used to: determine a target entry from the buffer based on the command stream corresponding to the current time slice; and monitor, based on the determined target entry, whether a command to be executed is stored in the buffer.
  • the commands corresponding to different command streams are stored in the ring buffer through the target entries corresponding to those command streams, so the microcontroller can monitor multiple entries of the ring buffer synchronously and can issue commands from different command streams within one processing cycle, which improves the efficiency of command acquisition.
  • the ring buffer can provide the communicating programs with mutually exclusive access to the buffer, which helps avoid the increased system overhead of a storage queue when commands are allocated frequently.
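
A ring buffer with one entry per command stream might look like the sketch below. The disclosure does not state how a stream is mapped to its entry; wrapping the stream ID around the number of entries is an assumption, as are the class and method names.

```python
from collections import deque


class RingBuffer:
    """Ring buffer with a fixed number of entries, one per command stream."""

    def __init__(self, num_entries):
        self.entries = [deque() for _ in range(num_entries)]

    def target_entry(self, stream_id):
        # Assumed mapping: wrap the stream ID around the ring.
        return stream_id % len(self.entries)

    def push(self, stream_id, command):
        self.entries[self.target_entry(stream_id)].append(command)

    def poll(self, stream_id):
        """Return the stream's next command, or None if its entry is empty."""
        entry = self.entries[self.target_entry(stream_id)]
        return entry.popleft() if entry else None
```

Polling only the target entry means the microcontroller never has to inspect commands that belong to streams other than the one scheduled for the current time slice.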
  • the microcontroller is used to: determine whether there is a command to be executed in the target cache corresponding to the current time slice; and, when there is no command to be executed in the target cache corresponding to the current time slice, obtain the command to be executed from the host.
  • the microcontroller is further configured to read the command to be executed from the target cache when a command to be executed exists in the target cache corresponding to the current time slice.
  • when the microcontroller detects that there are commands to be executed in the target cache corresponding to the current time slice, it can read them from the target cache and continue their processing. The command processing device can therefore process commands in order, and a backlog of unexecuted commands belonging to the same user is prevented from overloading the target cache.
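
The cache-first choice between the two sources can be sketched as follows. `acquire_command`, the list used as the target cache, and the `fetch_from_host` callable are hypothetical stand-ins rather than interfaces named in the disclosure.

```python
def acquire_command(target_cache, fetch_from_host):
    """Return (command, source) for the current time slice.

    target_cache: list of partially executed commands saved for the user
    that owns this time slice.
    fetch_from_host: zero-argument callable returning a new command or None.
    """
    if target_cache:
        # Unfinished work from an earlier slice takes priority.
        return target_cache.pop(0), "cache"
    return fetch_from_host(), "host"
```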
  • the arithmetic unit is further configured to report the second processing block identifier, corresponding to the most recently executed processing block, to the microcontroller after the current time slice ends; the microcontroller is further configured to update the command to be executed in the command queue based on the second processing block identifier after receiving it from the arithmetic unit.
  • the arithmetic unit is configured to: after the processing task corresponding to the currently executing processing block is completed, take the currently executing processing block as the most recently executed processing block, and report the second processing block identifier corresponding to the most recently executed processing block to the microcontroller.
  • since the arithmetic unit can quickly determine the second processing block identifier corresponding to the most recently executed processing block after processing the command to be executed in the current time slice, it is more convenient for the arithmetic unit to report the second processing block identifier to the microcontroller.
  • the microcontroller updates the command to be executed in the command queue after receiving the second processing block identifier reported by the computing unit, so that when the updated command to be executed is processed in the next corresponding time slice, it can Process directly according to the updated command to be executed.
  • the arithmetic unit is configured to send the second processing block identifier to the command distributor; the command distributor is further configured to send the second processing block identifier to the microcontroller.
  • the arithmetic unit can deliver the second processing block identifier to the microcontroller through the command distributor, so that no communication channel needs to be established between the arithmetic unit and the microcontroller, which reduces the occupation of the microcontroller's data interfaces.
  • the microcontroller is configured to: when the second processing block identifier is the identifier of the last processing block in the plurality of processing blocks, delete the command to be executed from the command queue; when the second processing block identifier is not the identifier of the last processing block in the plurality of processing blocks, determine a target processing block identifier based on the second processing block identifier, and replace the first processing block identifier in the command to be executed with the target processing block identifier to generate a new command to be executed; wherein the target processing block identifier is the identifier of the processing block that follows the most recently executed processing block.
  • the microcontroller can know the processing situation of the command to be executed more accurately and easily according to the second processing block identifier.
  • when the second processing block identifier belongs to the last processing block, the microcontroller can determine that the command to be executed has been fully processed and delete it from the command queue, which effectively avoids a possible runtime error in which an already processed command is processed again from the command queue.
  • the microcontroller is further configured to, after generating the new command to be executed, store the new command to be executed in the target cache corresponding to the command to be executed.
  • the new command to be executed can be stored in the target cache, and there is no need to generate a separate processing scheme for handling it. This keeps the work of the microcontroller simple and stable, which in turn makes the command processing device more stable when processing commands and reduces the probability of runtime errors.
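
The update performed by the microcontroller after a time slice ends can be sketched as below. The dictionary layout of a command and the function name `update_command` are hypothetical. The sketch also assumes that an unfinished command leaves the command queue and is parked in the target cache as a new command; the text above only states that the new command is stored in the target cache.

```python
def update_command(command_queue, target_cache, cmd, second_block_id):
    """Update `cmd` after a time slice ends.

    cmd: dict with 'num_blocks' and 'first_block_id' keys (assumed layout).
    second_block_id: ID of the most recently executed processing block.
    Returns True if the command finished, False if it must be resumed.
    """
    command_queue.remove(cmd)
    if second_block_id == cmd["num_blocks"] - 1:
        # The last block ran: the command is complete and is simply dropped.
        return True
    # Otherwise resume from the block after the most recently executed one.
    new_cmd = dict(cmd, first_block_id=second_block_id + 1)
    target_cache.append(new_cmd)
    return False
```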
  • an embodiment of the present disclosure further provides another command processing device, including a microcontroller and an arithmetic unit. The arithmetic unit is configured to, in response to the end of the current time slice, report to the microcontroller the first processing block identifier of the current processing block of the target command executed in the current time slice, wherein the current processing block is any one of the at least one processing block in the target command. The microcontroller is used to update the target command by using the first processing block identifier after receiving it from the arithmetic unit.
  • an embodiment of the present disclosure further provides a command processing method, which is applied to a command processing device that includes a microcontroller and an arithmetic unit. The command processing method includes: the microcontroller obtains a command to be executed after the current time slice arrives, where the command to be executed carries a first processing block identifier indicating which of the plurality of processing blocks corresponding to the command needs to be processed in the current time slice; and the arithmetic unit acquires the command to be executed and executes the processing task corresponding to the command based on the first processing block identifier.
  • the command processing apparatus further includes a command distributor; the microcontroller obtaining the command to be executed after the current time slice arrives includes: the microcontroller obtains the command to be executed after the current time slice arrives and stores it in the command queue. The command processing method further includes: the command distributor obtains the command to be executed from the command queue and distributes it to the arithmetic unit. The arithmetic unit acquiring the command to be executed and executing the corresponding processing task based on the first processing block identifier includes: after receiving the command to be executed distributed by the command distributor, the arithmetic unit executes the processing task corresponding to the command based on the first processing block identifier.
  • executing, based on the first processing block identifier, the processing task corresponding to the command to be executed includes: the arithmetic unit determines, based on the first processing block identifier, the starting processing block to be executed in the current time slice from the plurality of processing blocks, and executes the processing task corresponding to the command to be executed based on the starting processing block.
  • the microcontroller obtaining the command to be executed after the current time slice arrives includes: the microcontroller reads the command to be executed from the target cache corresponding to the current time slice, or obtains the command to be executed from the host.
  • the microcontroller obtaining the command to be executed after the current time slice arrives includes: the microcontroller determines whether the command queue is idle; when the command queue is idle, monitors whether a command to be executed is stored in the buffer; and, when a command to be executed is found in the buffer, reads it from the buffer.
  • the buffer includes a ring buffer; the ring buffer has multiple entries, and it stores the commands to be executed of different command streams through different entries. The microcontroller obtaining the command to be executed after the current time slice arrives includes: the microcontroller determines a target entry from the buffer based on the command stream corresponding to the current time slice, and monitors, based on the determined target entry, whether a command to be executed is stored in the buffer.
  • the microcontroller obtains the command to be executed after the current time slice arrives, including: the microcontroller determines whether there is a command to be executed in the target cache corresponding to the current time slice; If there is no command to be executed in the target cache corresponding to the current time slice, the command to be executed is obtained from the host.
  • the microcontroller obtaining the command to be executed after the current time slice arrives includes: when a command to be executed exists in the target cache corresponding to the current time slice, the microcontroller reads the command to be executed from the target cache.
  • the method further includes: after the current time slice ends, the arithmetic unit reports the second processing block identifier corresponding to the most recently executed processing block to the microcontroller; after receiving the second processing block identifier reported by the arithmetic unit, the microcontroller updates the command to be executed in the command queue based on the second processing block identifier.
  • the arithmetic unit reporting the second processing block identifier corresponding to the most recently executed processing block to the microcontroller includes: after the processing task corresponding to the currently executing processing block is completed, the arithmetic unit takes the currently executing processing block as the most recently executed processing block and reports the corresponding second processing block identifier to the microcontroller.
  • the method further includes: the arithmetic unit sends the second processing block identifier to the command distributor; the command distributor sends the second processing block identifier to the microcontroller.
  • updating the command to be executed in the command queue based on the second processing block identifier includes: when the second processing block identifier is the identifier of the last processing block in the plurality of processing blocks, the microcontroller deletes the command to be executed from the command queue; when the second processing block identifier is not the identifier of the last processing block in the plurality of processing blocks, the microcontroller determines a target processing block identifier based on the second processing block identifier and replaces the first processing block identifier in the command to be executed with the target processing block identifier to generate a new command to be executed; wherein the target processing block identifier is the identifier of the processing block that follows the most recently executed processing block.
  • the microcontroller after generating the new to-be-executed command, stores the new to-be-executed command into a target cache corresponding to the to-be-executed command.
  • an embodiment of the present disclosure further provides a command processing method, which is applied to a command processing apparatus that includes a microcontroller and an arithmetic unit. The command processing method includes: in response to the end of the current time slice, the arithmetic unit reports to the microcontroller the first processing block identifier of the current processing block of the target command executed in the current time slice, wherein the current processing block is any one of the at least one processing block in the target command; after receiving the first processing block identifier reported by the arithmetic unit, the microcontroller updates the target command by using the first processing block identifier.
  • an embodiment of the present disclosure further provides an electronic device, including a host, a buffer, and a command processing device; the host is configured to issue commands to be executed, and the buffer is configured to store the commands to be executed;
  • the command processing apparatus is configured to execute the method described in any one of the implementation manners of the third aspect or the fourth aspect.
  • an embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a microcontroller or an arithmetic unit, the method described in any implementation of the third aspect or the fourth aspect is implemented.
  • FIG. 1 shows a schematic diagram of a command processing apparatus provided by an embodiment of the present disclosure
  • Fig. 2a and Fig. 2b are respectively schematic state diagrams of command queues provided by an embodiment of the present disclosure
  • FIG. 3 shows a schematic diagram of a command processing apparatus provided by another embodiment of the present disclosure
  • FIG. 4 shows a flowchart of a command processing method provided by an embodiment of the present disclosure
  • FIG. 5 shows a flowchart of a command processing method provided by another embodiment of the present disclosure.
  • in time division multiplexing, a time slice is usually set, and the commands in the command stream corresponding to one user are executed in the current time slice.
  • when the current time slice ends, context switching between users is performed, that is, the device switches from the current user to the next user, and the next time slice executes the commands in the command stream corresponding to the next user. Since a command of the current user may still be executing at the end of the time slice, the switch has to wait for that command to finish before moving to the next user, which makes the context switch take a long time. The commands of the next user therefore wait a long time before they can be processed, resulting in low command processing efficiency.
  • the present disclosure provides a command processing apparatus, method, electronic device, and computer-readable storage medium.
  • the starting processing block to be executed in the current execution cycle is determined according to the first processing block identifier determined in the previous execution cycle of the command to be executed, and the processing task of the command is executed from that starting processing block. In this way the commands are divided at a finer granularity, and time division multiplexing is performed at that finer granularity, which reduces the waiting time required for context switching and improves the efficiency of command processing.
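
The effect of finer granularity on switching delay can be illustrated with a small calculation. This is a simplified model, not a measurement from the disclosure: it assumes a command is split into blocks of equal duration and that a context switch must wait only for the unit of work currently in flight. The function name and the figures in the usage note are hypothetical.

```python
def worst_case_switch_wait(command_time_ms, num_blocks=1):
    """Upper bound on the wait before a context switch can happen.

    The switch must wait for the unit of work in flight to finish, so the
    bound is the duration of one unit: the whole command when num_blocks
    is 1, or one block when the command is split into equal blocks.
    """
    return command_time_ms / num_blocks
```

Under these assumptions, a command that takes 16 ms bounds the wait at 16 ms when it is switched at command granularity, and at 1 ms when it is split into 16 blocks.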
  • the command processing apparatus provided by the embodiments of the present disclosure may be applied to a GPU, an artificial intelligence chip, or other command processing apparatuses including a command processor and an execution unit.
  • the command processing apparatus provided by the embodiment of the present disclosure will be described below by taking the command processing apparatus provided by the embodiment of the present disclosure applied to the GPU as an example.
  • a schematic structural diagram of a command processing apparatus provided by an embodiment of the present disclosure includes: a microcontroller 10 and an arithmetic unit 20; wherein,
  • the microcontroller 10 is used to obtain the command to be executed after the current time slice arrives; the command to be executed carries a first processing block identifier, which indicates which of the plurality of processing blocks corresponding to the command needs to be processed in the current time slice;
  • the operation unit 20 is configured to acquire the command to be executed, and based on the first processing block identifier, execute the processing task corresponding to the command to be executed.
  • the microcontroller 10 obtains the command to be executed after the current time slice arrives; the command to be executed carries a first processing block identifier indicating which of the processing blocks corresponding to the command needs to be processed in the current time slice, so the arithmetic unit 20 can execute the processing task corresponding to the command based on the first processing block identifier it carries. In this process, the multiple processing operations corresponding to the command are divided into multiple processing blocks, and the first processing block identifier indicates the processing block to be executed. The command is thereby divided at a finer granularity, and time division multiplexing is performed at that finer granularity, which reduces the waiting time required for context switching and improves the efficiency of command processing.
  • in CUDA (Compute Unified Device Architecture), processing blocks may be equivalent to thread blocks.
  • a command distributor 30 is further included; the microcontroller 10, after acquiring the to-be-executed command, can also store the to-be-executed command in a command queue.
  • the command distributor 30 obtains the command to be executed from the command queue and distributes it to the arithmetic unit, which executes the command based on the first processing block identifier in the command to be executed.
  • the command distributor 30 may be implemented by software (eg, running in the microcontroller 10 ) or implemented by hardware.
  • the arithmetic unit 20 may be implemented by hardware, such as a graphics processing cluster (Graphics Processing Cluster, GPC for short) or a streaming multiprocessor (Streaming Multiprocessor, SM for short).
  • the microcontroller 10, the arithmetic unit 20, and the command distributor 30 will be described in detail below, respectively.
  • the user in the embodiment of the present disclosure may include, for example, any of the following: a virtual machine, a computer container, an application, or different functions in the application.
  • the application program can generate a main process and multiple sub-processes for performing different processing tasks when running; each process generates a corresponding command stream when performing its processing task. Thus, when an application is run, multiple command streams are generated, and each command stream includes at least one command. The command processing device provided by the embodiments of the present disclosure uses time division multiplexing to perform time-sliced processing on the different command streams of different applications.
  • the application can generate multiple processes when it is run; each process can include multiple threads; each thread can perform corresponding processing tasks and generate commands corresponding to the threads.
  • different functions of the application program can be regarded as different users respectively, and the command processing apparatus provided by the implementation of the present disclosure can perform time-sliced processing on different functions in the same application program based on the time division multiplexing technology.
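
Time-sliced processing across users can be sketched with a simple scheduler. The disclosure does not specify the order in which users receive time slices; the round-robin order used here is an assumption, and `schedule` is a hypothetical helper.

```python
from itertools import cycle, islice


def schedule(users, num_slices):
    """Return the user that owns each of the next `num_slices` time slices."""
    # Assumed policy: plain round-robin over the users, in list order.
    return list(islice(cycle(users), num_slices))
```

A "user" in this sketch can be any of the entities listed above: a virtual machine, a container, an application, or a function within an application.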
  • multiple command streams (which can be referred to as streams for short) can exist simultaneously in the application layer of the virtual machine; the commands in the streams are delivered to the buffer corresponding to the user.
  • for a user (that is, a virtual machine, which can be represented as VM1, for example), the host can generate a thread for running the software function.
  • when the thread executes the processing task, it generates a command stream (stream); the user's CPU stores the commands in the stream into the buffer corresponding to the user VM1.
  • a software function corresponding to the virtual machine VM1 may include, for example, image processing functions such as recognizing a target object in an image or determining the pose of a target object in an image, or general data processing tasks such as storing or retrieving an image.
  • the software functions specifically implemented by the virtual machine can be determined according to the actual situation, and details are not described herein again.
  • the command to be executed is any command in the command stream.
  • the virtual machine VM1 can perform image processing on different multiple images, such as Pic1 and Pic2.
  • one command stream among the multiple command streams corresponding to the virtual machine VM1 may include, for example, multiple commands to be executed for performing image processing on the image Pic1, such as convolution operations, pooling operations, and fully connected operations on the image Pic1.
  • the command to be executed includes: address information for pointing to an operand; indication information for indicating a processing task of a specific operator (kernel), and a first processing block identifier for indicating the first processing block corresponding to the command to be executed.
  • when the arithmetic unit 20 executes the command to be executed, it can obtain the operand by using the address information and process the operand according to the indication information.
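
The three fields carried by a command to be executed, and their use by the arithmetic unit, can be sketched as follows. The field names, the `KERNELS` operator table, and the `MEMORY` dictionary are all hypothetical and exist only to make the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class PendingCommand:
    operand_address: int     # address information pointing to the operand
    kernel: str              # indication of the operator (kernel) to run
    first_block_id: int = 0  # processing block to start from


# Hypothetical operator table and memory, for illustration only.
KERNELS = {"double": lambda x: [2 * v for v in x]}
MEMORY = {0x1000: [1, 2, 3]}


def execute(cmd: PendingCommand):
    operand = MEMORY[cmd.operand_address]  # fetch the operand via the address
    return KERNELS[cmd.kernel](operand)    # process it as the kernel dictates
```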
  • the processing can be decomposed into multiple processing operations. Different processing operations correspond to different data in the operands (such as different pixels in the image); or, different processing operations are different operations on the operands (such as first multiplying the pixel value of each pixel by a weight, and then summing the products of the different pixels and their weights).
  • the multiple processing operations included in a command to be executed are divided into multiple processing blocks (Blocks).
  • the command to be executed is executed through thread blocks, and each thread block includes multiple threads. After the processing operations in all of the processing blocks have been performed, the processing of the command to be executed is complete.
  • the first processing block identifier included in the command to be executed points to one of the processing blocks of the command that has not yet been processed. In one implementation, if there is no requirement on the execution order of the processing blocks, the first processing block identifier can point to any one of the unprocessed processing blocks in the command to be executed. If the execution process requires a particular execution order, the first processing block identifier can point to the first processing block that needs to be executed among the processing blocks that have not yet been processed.
  • The command K1 to be executed may include, for example, performing a convolution operation on the image Pic1. When the size of Pic1 is 128*128, the convolution performed on each pixel of Pic1 can be treated as one processing operation; that is, the to-be-executed command K1 contains 128*128 processing operations. Every 32*32 processing operations are grouped into one processing block, so the to-be-executed command includes 4*4 processing blocks.
  • the 16 processing blocks can be numbered sequentially, for example, the numbers are: 0 to 15 respectively.
  • every 32*32 processing operations are allocated to one thread block, and the threads in the thread block perform operations, that is, the commands to be executed are allocated to 4*4 thread blocks to perform operations.
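  • The block division described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `split_into_blocks` and the row-major numbering of the blocks are assumptions made for the example; the source only states that the 16 blocks can be numbered 0 to 15.

```python
# Illustrative sketch only. The function name and the row-major block
# numbering are assumptions; the source only says blocks are numbered 0..15.
def split_into_blocks(height, width, block_h, block_w):
    """Assign each per-pixel processing operation to a numbered processing block."""
    blocks_per_col = height // block_h   # number of block rows
    blocks_per_row = width // block_w    # number of block columns
    blocks = {}
    for row in range(height):
        for col in range(width):
            block_id = (row // block_h) * blocks_per_row + (col // block_w)
            blocks.setdefault(block_id, []).append((row, col))
    return blocks_per_col, blocks_per_row, blocks


# Command K1: one processing operation per pixel of a 128*128 image,
# grouped into blocks of 32*32 operations.
grid_rows, grid_cols, blocks = split_into_blocks(128, 128, 32, 32)
print(grid_rows, grid_cols, len(blocks), len(blocks[0]))
```

  • Under these assumptions the command is split into a 4*4 grid of 16 blocks with 32*32 = 1024 processing operations each, and each block can then be handed to one thread block.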
  • The microcontroller 10 may acquire the to-be-executed command from the command stream sent by the host; at this time, the first processing block identifier in the to-be-executed command may be in its default state, indicating that processing of the command K1 needs to start from the first processing block, that is, from the processing block numbered 0;
  • The first processing block identifier points to a processing block that has not yet been processed in the command to be executed. For example, if the processing blocks numbered 0 to 12 were processed before the current time slice, the first processing block identifier of the command K1 in the current time slice is 13.
  • the processing tasks corresponding to different virtual machines may be the same or different; when the same virtual machine performs the same processing tasks on different images, it is not necessary to make consistent restrictions on image attributes such as the pixel size of the images. That is, different images Pic1, Pic2, etc. can be of different pixel sizes.
  • When acquiring the to-be-executed command, the microcontroller 10 is configured to read the to-be-executed command from the target cache corresponding to the current time slice, or to acquire the to-be-executed command from the host. Specifically, the methods by which the microcontroller 10 obtains the command to be executed include the following (a1) and (a2):
  • (a1) In the case where the microcontroller 10 obtains the command to be executed from the host, the microcontroller 10 determines whether the command queue is idle. When the command queue is idle, the microcontroller 10 monitors the buffer: it determines the target entry in the buffer based on the command stream corresponding to the current time slice and, based on the determined target entry, monitors whether the to-be-executed command has been stored in the buffer.
  • each command queue corresponds to a command flow.
  • Different command streams of different users can share a command queue in different time segments.
  • the command queue can be a software queue or a hardware queue; when the command queue is a software queue, the microcontroller 10 can control the generation and deletion of the command queue; when the microcontroller 10 acquires the command to be executed, The number of command streams corresponding to the user can be determined; when the number of command queues that have been generated currently is less than the number of command streams corresponding to the user, the microcontroller 10 generates a new command queue, so that the number of command queues is greater than or equal to the The number of user command streams; if a command queue has not been used for a long time, the microcontroller 10 can delete the command queue.
  • The maximum numbers of command streams corresponding to the virtual machines VM1 and VM2 may differ. For example, when the maximum number of command streams corresponding to VM1 is 10 and that corresponding to VM2 is 6, comparing the two maxima determines that there are 10 command queues. In this case the commands of VM2 occupy only 6 command queues, and the remaining command queues can stay vacant during the time slice corresponding to VM2.
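  • The sizing rule in this example can be written down in a few lines. The sketch below is only an illustration; the helper names `required_queue_count` and `vacant_queues` and the dictionary layout are assumptions, not part of the disclosure.

```python
# Illustrative sketch; helper names and the dict layout are assumptions.
def required_queue_count(max_streams_per_vm):
    """Enough command queues for the virtual machine with the most command streams."""
    return max(max_streams_per_vm.values())


def vacant_queues(max_streams_per_vm, vm):
    """Queues left unused during the time slice of the given virtual machine."""
    return required_queue_count(max_streams_per_vm) - max_streams_per_vm[vm]


limits = {"VM1": 10, "VM2": 6}
print(required_queue_count(limits), vacant_queues(limits, "VM2"))
```

  • With the numbers from the text, 10 command queues are generated and 4 of them stay vacant while VM2 holds the time slice.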
  • the time slice may include, for example, the time allocated to different virtual machines for processing commands.
  • the microcontroller 10 may determine the command queue for the user to use when processing the user's command according to the number of the user's command stream.
  • The command queue corresponding to a command stream stores that command stream of the virtual machine VM1, which includes, for example, the commands K1 and K2. Since the virtual machine VM1 can correspond to multiple command streams, the virtual machine stores each command stream in its corresponding command queue.
  • When the microcontroller 10 determines whether a command queue is idle, it can, for example, use the read pointer and write pointer of each of the multiple command queues. When the read pointer and the write pointer of a command queue indicate the same position, that command queue is idle.
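  • The pointer comparison can be illustrated with a small circular queue. This is a sketch under stated assumptions: the class `CommandQueue`, its method names, and the convention of keeping one slot free to tell a full queue from an idle one are all choices made for the example and are not specified in the source.

```python
# Illustrative sketch; class and method names are hypothetical.
# Assumption: one slot is always kept free so that "full" and "idle"
# never look the same (the source does not specify this convention).
class CommandQueue:
    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.read_ptr = 0
        self.write_ptr = 0

    def is_idle(self):
        # Idle exactly when both pointers indicate the same position.
        return self.read_ptr == self.write_ptr

    def push(self, command):
        next_write = (self.write_ptr + 1) % len(self.slots)
        if next_write == self.read_ptr:
            raise OverflowError("command queue is full")
        self.slots[self.write_ptr] = command
        self.write_ptr = next_write

    def pop(self):
        if self.is_idle():
            raise IndexError("command queue is idle")
        command = self.slots[self.read_ptr]
        self.slots[self.read_ptr] = None
        self.read_ptr = (self.read_ptr + 1) % len(self.slots)
        return command
```

  • A queue built this way reports idle right after creation, stops being idle once a command such as K1 is pushed, and becomes idle again after every pushed command has been popped.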
  • the command to be executed may be obtained from the buffer.
  • the buffer is used to store the command issued by the host.
  • FIG. 2a and FIG. 2b are schematic diagrams of states of command queues provided by embodiments of the present disclosure.
  • When the command queue 21 is idle, it waits to receive the to-be-executed command sent by the host 22 to the buffer 23, and the microcontroller 10 sends the to-be-executed command received in the buffer 23 to the command queue 21.
  • the dashed box 211 in the command queue 21 indicates that the command queue is free.
  • A solid-line frame 241 represents a pending command; in this case, the pending command can be processed first, and a new pending command sent by the host 22 is received afterwards.
  • the buffer may include, for example, a ring buffer (Ring Buffer) or a storage queue.
  • A ring buffer is a first-in-first-out circular buffer. It can give communicating programs mutually exclusive access to the buffer, and it avoids the system overhead that a storage queue incurs when commands are allocated frequently.
  • a circular buffer is selected as the command queue.
  • Any ring buffer has multiple entries for receiving the to-be-executed commands of different command streams sent from the host.
  • the host sends to the virtual machine VM1 two command streams for performing target recognition processing on a group of images Pic1 and Pic2 respectively, which can be represented as s1 and s2, for example.
  • the command stream s1 includes commands K1 and K2 for performing target recognition processing on the image Pic1.
  • the specific descriptions of the commands K1 and K2 have been described in detail above and will not be repeated here.
  • For example, commands may be issued to the buffer in the following manner: determining at least one command stream corresponding to the user; determining, based on each command to be executed in the at least one command stream, at least one command to be executed corresponding to each thread; and, using the storage entry corresponding to each command stream in the buffer, storing the at least one command to be executed of each command stream into the buffer.
  • The read pointer corresponding to the buffer can be used to query a plurality of preset target entries; when the target entry corresponding to the command to be executed is addressed, the write pointer corresponding to the buffer is used to write in the command to be executed issued by the host. Alternatively, the read pointer corresponding to the buffer can be used to poll the multiple command storage spaces in the buffer, the entry of a command storage space that can store the command to be executed is taken as the target entry, and the command to be executed is then written using the write pointer corresponding to the buffer.
  • the specific method for storing the command to be executed in the buffer can be determined according to the actual situation, and details are not repeated here.
  • While the microcontroller 10 is monitoring the buffer, the host keeps issuing new commands to the buffer and the arithmetic unit 20 keeps processing the commands that have been passed from the buffer to the command queue (Stream Queue). Therefore, when the read pointer of the buffer is used to poll the position of the target entry, there may be no corresponding command to be executed. In that case, the microcontroller 10 continues to monitor the buffer until a command to be executed is detected at the corresponding position, and then reads the command to be executed from the buffer.
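  • The interplay between the host writing into per-stream entries and the microcontroller polling a target entry might look like the sketch below. It is deliberately simplified: the class `EntryBuffer` is hypothetical, each entry is modeled as a Python deque rather than as a region of a hardware ring buffer with explicit read and write pointers, and a single `poll` call stands for one monitoring pass.

```python
from collections import deque


# Illustrative sketch; EntryBuffer is a hypothetical name. Each entry is
# modeled as a deque instead of a pointer-managed region of a ring buffer.
class EntryBuffer:
    def __init__(self, stream_ids):
        # One entry per command stream.
        self.entries = {stream_id: deque() for stream_id in stream_ids}

    def issue(self, stream_id, command):
        """Host side: store a command in the entry of its command stream."""
        self.entries[stream_id].append(command)

    def poll(self, stream_id):
        """Microcontroller side: one monitoring pass over the target entry."""
        entry = self.entries[stream_id]
        return entry.popleft() if entry else None


buffer = EntryBuffer(["s1", "s2"])
first_poll = buffer.poll("s1")    # nothing issued yet, keep monitoring
buffer.issue("s1", "K1")
buffer.issue("s1", "K2")
second_poll = buffer.poll("s1")   # the command has now arrived
print(first_poll, second_poll)
```

  • The first pass finds the target entry empty, which corresponds to the case in which the microcontroller keeps monitoring; once the host has issued K1 to the entry of stream s1, the next pass reads it.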
  • The target cache may include, for example, device memory or double data rate synchronous dynamic random access memory (DDR SDRAM). It is used to store the to-be-executed commands of users that do not correspond to the current time slice, or to store the to-be-executed command read from the command queue by the microcontroller 10.
  • Because the microcontroller 10 transmits the to-be-executed commands of different users to the command queue in different time slices, so that different users time-division multiplex the command processing device, a command that has not finished executing within its time slice is temporarily stored by the microcontroller 10 in the corresponding target cache after that time slice ends. When the next time slice for processing it arrives, the to-be-executed command stored in the target cache is retransmitted to the command queue, so that the command distributor 30 can redistribute it to the operation unit 20 for subsequent processing.
  • When the microcontroller 10 acquires the to-be-executed command, it is configured to determine whether a to-be-executed command exists in the target cache corresponding to the current time slice. Taking two virtual machines VM1 and VM2 as an example, determining whether there is a command to be executed in the target cache corresponding to the current time slice covers the following two cases, (b1) and (b2):
  • The context at the end of the time slice preceding the current time slice can be saved to the target cache corresponding to that preceding time slice; the context information includes, among other things, the processing block identifier of the last processing block executed when execution ended. The context information saved in the target cache corresponding to the current time slice is then acquired, and execution continues in the current time slice.
  • (b1) The i-th time slice (i is a positive integer) is allocated to the virtual machine VM1, the (i+1)-th time slice is allocated to the virtual machine VM2, and the (i+2)-th time slice is again allocated to the virtual machine VM1. In the i-th time slice, if the command A corresponding to VM1 has not finished executing, command A is temporarily stored in its corresponding target cache; after the (i+1)-th time slice ends, the microcontroller 10 pulls command A from the corresponding target cache into the command queue in the (i+2)-th time slice. The specific processing is not repeated here.
  • (b2) The i-th time slice (i is a positive integer) is allocated to the virtual machine VM1, the (i+1)-th time slice is allocated to the virtual machine VM2, and the (i+2)-th time slice is again allocated to the virtual machine VM1. In the i-th time slice, if the command A corresponding to VM1 has finished executing, then after the (i+1)-th time slice ends, the microcontroller 10 detects in the (i+2)-th time slice that there is no command to be executed in the target cache corresponding to the current time slice, and acquires the newly issued command to be executed from the host.
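  • The two cases amount to a simple lookup order when a time slice begins: first the target cache of the virtual machine that owns the slice, then the host. The sketch below illustrates that order only; the function `acquire_command`, the dictionaries, and the command name B are assumptions introduced for the example.

```python
# Illustrative sketch; acquire_command and the data layout are assumptions.
def acquire_command(vm, target_caches, host_streams):
    """Return (command, source) for the virtual machine owning the current time slice."""
    cached = target_caches.get(vm)
    if cached:
        # An unfinished command was parked here when the VM's earlier
        # time slice ended, so it is resumed first.
        return cached.pop(0), "target_cache"
    pending = host_streams.get(vm)
    if pending:
        # Nothing parked: take a newly issued command from the host.
        return pending.pop(0), "host"
    return None, "none"


target_caches = {"VM1": ["A"]}   # command A did not finish in time slice i
host_streams = {"VM1": ["B"]}    # B is a hypothetical newly issued command
first = acquire_command("VM1", target_caches, host_streams)
second = acquire_command("VM1", target_caches, host_streams)
print(first, second)
```

  • The first call resumes the parked command A from the target cache; once the cache is empty, the second call falls back to the host and returns the newly issued command.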
  • the to-be-executed instruction obtained from the host may be stored in a command queue, and multiple processing operations corresponding to the to-be-executed instruction may be divided into multiple processing blocks during execution, and processed according to the processing methods provided in the embodiments of the present application.
  • Corresponding target caches are allocated as needed. For example, for N virtual machines (N is a positive integer greater than 1), corresponding target caches need to be allocated, so that if the virtual machine corresponding to any time slice has not finished executing its command to be executed, the unfinished command is stored, during the next time slice that does not correspond to this virtual machine, in the target cache corresponding to this virtual machine.
  • When the target cache includes the device memory, the data transmission channel only allows data transmission in one direction, so N memory units can be allocated for the N virtual machines to hold the unexecuted commands to be executed of the different virtual machines;
  • When the target cache includes the double data rate synchronous dynamic random access memory, data is transmitted bidirectionally; the data transfer channel, such as Direct Memory Access (DMA), allows bidirectional data transfer, so two virtual machines can share one target cache. For example, the same target cache can be set for both virtual machines.
  • the command distributor 30 can acquire the to-be-executed command from the command queue, and distribute the to-be-executed command to the computing unit 20 .
  • The to-be-executed command carries a first processing block identifier that indicates a processing block of the to-be-executed command. The computing unit 20 can parse the first processing block identifier from the to-be-executed command, take the processing block indicated by the first processing block identifier as the starting processing block to be executed in the current time slice, and execute, based on the starting processing block, the processing task corresponding to the command to be executed.
  • For example, a scheduler is deployed in the computing unit 20, and the command to be executed is sent to the scheduler. The scheduler determines the first processing block identifier from the to-be-executed command and uses it to determine, among the processing blocks of the to-be-executed command, the initial processing block that needs to be executed first. It then takes each processing operation in the processing block corresponding to the processing block identifier as a task and assigns the tasks to the threads running in the computing unit 20; each thread executes the computing task corresponding to its processing operation.
  • The task issued by the scheduler carries the address information of the pixel to be processed; the thread obtains the corresponding operand from the storage space according to that address information, and then performs the corresponding processing on the operand.
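  • As a rough software analogy, the dispatch of one processing block to threads could be sketched as below. This is not how the hardware scheduler works; it only mirrors the described flow (one task per processing operation, each task carrying an address, each thread fetching the operand and processing it). The function `run_block`, the dictionary used as storage space, and the multiply-by-weight operation are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


# Illustrative sketch; run_block, the dict-based "storage space" and the
# multiply-by-weight operation are assumptions made for the example.
def run_block(memory, block_addresses, weight, workers=4):
    """Run every processing operation of one processing block on a thread pool."""

    def task(address):
        operand = memory[address]         # fetch the operand via its address
        return address, operand * weight  # process the operand

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(task, block_addresses))


memory = {0: 1, 1: 2, 2: 3, 3: 4}
result = run_block(memory, [2, 3], weight=10)
print(result)
```

  • Only the addresses belonging to the selected processing block are dispatched, so the operands of other blocks remain untouched until their block is scheduled.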
  • each operator may include multiple threads (threads); during hardware execution, a certain number of threads may be combined into processing blocks (blocks) or thread blocks, as the minimum scheduling granularity .
  • the block scheduler inside the command dispatcher 30 schedules the processing blocks in sequence. When the time slot ends, the scheduler enters the stop mode. After the block is completed, it reports to the microcontroller (MCU).
  • There is then no need to wait until the current operator is completed; it is only necessary to wait for the processing blocks that have already been distributed to finish executing, which shortens the waiting time of the current task and improves the efficiency of context switching.
  • For example, when the to-be-executed command K1 contains 128*128 processing operations and every 32*32 processing operations are taken as one processing block, K1 is divided into 16 processing blocks. If the total processing time for the operation unit 20 to execute K1 is 0.16s, the processing time of each processing block is 0.01s; with a time slice of 0.1s, the operation unit 20 can complete the first 10 processing blocks of K1 within the time slice, after which the context switch between virtual machines is performed.
  • Without block-level division, the computing unit 20 would continue until K1 is finished; only after finishing K1 would it report the processed information to the microcontroller 10, and only then would the microcontroller 10 perform the context switch, that is, switch to the next virtual machine and execute its command to be executed. The command processing of the next virtual machine would thus be delayed by 0.06s. With multiple virtual machines, the delays of the context switches from the first to the (N-1)-th virtual machine accumulate, so that the N-th virtual machine, which is farther from the first one, may stall or show similar problems.
  • In the embodiments of the present disclosure, a command to be executed is divided at a finer granularity, so that after the current time slice ends, the virtual machine context switch can be performed as soon as the processing block currently being processed has finished executing. This reduces the context-switching delay that arises when the command being processed in the current time slice still has to be processed after the time slice has ended.
  • the corresponding first processing block identifier can be represented as K1_0, K1_2, . -5 processing block
  • the arithmetic unit 20 will report the processed information to the microcontroller 10 after the processing block marked as K1_5 is completed, and the microcontroller 10 will perform context switching; at this time, the delay is at most 0.01 s.
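  • The delay figures in this passage can be checked with a little arithmetic. The sketch below works in integer milliseconds to avoid floating-point rounding and assumes that processing blocks run back to back from the start of the time slice, each taking the same time; the function name `overrun_ms` is hypothetical.

```python
# Illustrative sketch; overrun_ms is a hypothetical helper.
# Assumption: blocks run back to back from the start of the time slice
# and every block takes the same time.
def overrun_ms(total_blocks, block_ms, slice_ms, block_granular):
    """Time by which execution runs past the end of the time slice."""
    total_ms = total_blocks * block_ms
    if slice_ms >= total_ms:
        return 0  # the whole command fits into the time slice
    if block_granular:
        # Only the block in flight when the slice expires is finished.
        return (-slice_ms) % block_ms
    # Command granularity: the whole remainder of the command is finished.
    return total_ms - slice_ms


# K1: 16 blocks of 10 ms each (0.16 s in total), time slice of 100 ms.
print(overrun_ms(16, 10, 100, block_granular=False))
print(overrun_ms(16, 10, 100, block_granular=True))
print(overrun_ms(16, 10, 95, block_granular=True))
```

  • Switching only at command granularity yields the 60 ms (0.06s) overrun mentioned above, whereas switching at block granularity never overruns by a full block time, which is consistent with the stated bound of at most 0.01s.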
  • the computing unit 20 when reporting the processed information to the microcontroller 10 , the computing unit 20 will report the second processing block identifier corresponding to the most recently executed processing block to the microcontroller 10 .
  • When reporting the second processing block identifier corresponding to the most recently executed processing block to the microcontroller 10, the computing unit 20 is configured to: after the processing task of the processing block currently being executed has finished, take that processing block as the most recently executed processing block, and report the corresponding second processing block identifier to the microcontroller 10.
  • For example, when the to-be-executed command K1 has not yet been processed by the operation unit 20, its first processing block identifier is K1_0. Suppose that at the end of a time slice the operation unit 20 is executing the tenth processing block of K1. The operation unit 20 continues until the tenth processing block has finished executing, and reports its processing block identifier K1_9 to the microcontroller 10 as the second processing block identifier of the most recently executed processing block.
  • the computing unit 20 may send the second processing block identifier to the command distributor; the command distributor sends the second processing block identifier to the microcontroller 10 .
  • This approach can also directly reuse the data transmission channel already established between the operation unit 20 and the command distributor, and the one between the command distributor and the microcontroller 10, which further improves the multiplexing rate of the data channels.
  • the microcontroller 10 After receiving the second processing block identifier reported by the computing unit 20, the microcontroller 10 updates the command to be executed in the command queue based on the second processing block identifier.
  • When the microcontroller 10 updates the command to be executed in the command queue: if the second processing block identifier is the processing block identifier of the last processing block among the plurality of processing blocks, the to-be-executed command is deleted from the command queue; otherwise, a target processing block identifier is determined based on the second processing block identifier, and the first processing block identifier in the command to be executed is replaced with the target processing block identifier to generate a new command to be executed. The target processing block identifier is the processing block identifier of the processing block that follows the most recently executed processing block.
  • When the microcontroller 10 updates the command to be executed in the command queue, if the first processing block identifier in the command to be executed is replaced with the target processing block identifier, the new command to be executed is stored in the target cache corresponding to the command to be executed.
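  • The update rule can be captured in a short function. The sketch below is only an illustration: the `Command` dataclass, its field names, and the use of plain integers for processing block identifiers are assumptions; returning `None` stands for deleting the command from the command queue.

```python
from dataclasses import dataclass, replace


# Illustrative sketch; Command and its fields are hypothetical, and block
# identifiers are modeled as integers (K1_7 becomes first_block=7).
@dataclass(frozen=True)
class Command:
    name: str
    total_blocks: int
    first_block: int = 0  # the "first processing block identifier"


def update_command(command, second_block):
    """Apply the reported "second processing block identifier" to a command."""
    if second_block == command.total_blocks - 1:
        return None  # the last block finished: the command is deleted
    # Otherwise resume at the block after the most recently executed one.
    return replace(command, first_block=second_block + 1)


k1 = Command("K1", total_blocks=16)
k1_next = update_command(k1, second_block=6)
print(k1_next)
print(update_command(Command("K2", 8, first_block=5), second_block=7))
```

  • In this model a reported identifier of 6 for K1 produces a new command that starts at block 7, which would then be parked in the target cache, while a report of the last block of K2 removes the command.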
  • After the microcontroller 10 stores the new command to be executed in the target cache corresponding to the command to be executed, it takes the next time slice as the new current time slice, and either reads the command to be executed corresponding to another virtual machine from the target cache corresponding to the new current time slice, or obtains the command to be executed corresponding to the other virtual machine from the host, thereby completing the context switch of the virtual machine.
  • the multiple commands issued from the host may also include multiple command streams with associated sequential execution sequences.
  • For example, the command corresponding to the first command stream includes target recognition on the image, and the command corresponding to the second command stream includes determining, after the target recognition result is obtained, the pose information of the multiple target objects identified in the image.
  • the address of the pixel point of the corresponding image contained in the processing block corresponding to the command to be executed can be set to the address of the pixel point of the image obtained after the first command stream ends, that is, using
  • the command processing apparatus can not only complete the processing of multiple images corresponding to one virtual machine, but also complete multiple consecutive processing tasks of the same image by multiple virtual machines.
  • the embodiment of the present disclosure also provides a specific process example of using the command processing apparatus provided by the embodiment of the present disclosure to process a command.
  • In this example, a virtual machine VM1 and a virtual machine VM2 are included; one command stream s1 among the multiple command streams corresponding to VM1 includes a command K1, and one command stream s2 among the multiple command streams corresponding to VM2 includes a command K2.
  • the command K1 includes 128*128 processing operations, and the 128*128 processing operations are divided into 4*4 processing blocks, and the processing block identifiers of each processing block are: K1_0 to K1_15.
  • the command K2 includes 64*128 processing operations, the 64*128 processing operations are divided into 2*4 processing blocks, and the processing block identifiers of each processing block are K2_0 to K2_7 respectively.
  • The microcontroller pulls the command K1 from the buffer RBUF1 corresponding to VM1 and sends it to the command queue SQ1 corresponding to VM1; at this time, the first processing block identifier carried in K1 is K1_0.
  • the command distributor obtains the command K1 from SQ1, and distributes the command K1 to the operation unit;
  • After parsing the first processing block identifier K1_0 from K1, the operation unit takes the first of the 4*4 processing blocks as the initial processing block and processes it.
  • the command distributor reports the second processing block identifier K1_6 to the microcontroller.
  • The microcontroller determines that K1_7 is the target processing block identifier, replaces the first processing block identifier K1_0 carried in the command K1 with K1_7 to generate a new command K1', and stores the command K1' in the target cache corresponding to the command stream s1.
  • After storing the command K1' in the target cache corresponding to the command stream s1, the microcontroller pulls the command K2 from the buffer RBUF2 corresponding to VM2 and sends it to the command queue SQ1 corresponding to VM2; at this time, the first processing block identifier carried in K2 is K2_0.
  • the command distributor obtains the command K2 from SQ1, and distributes the command K2 to the operation unit;
  • After parsing the first processing block identifier K2_0 from K2, the operation unit takes the first of the 2*4 processing blocks as the initial processing block and processes it.
  • the command distributor reports the second processing block identifier K2_4 to the microcontroller.
  • The microcontroller determines that K2_5 is the target processing block identifier, replaces the first processing block identifier K2_0 carried in the command K2 with K2_5 to generate a new command K2', and stores the command K2' in the target cache corresponding to the command stream s2.
  • The microcontroller reads the command K1' from the target cache corresponding to the command stream s1 and sends it to the command queue SQ1 corresponding to VM1; at this time, the first processing block identifier carried in K1' is K1_7.
  • the command distributor obtains the command K1' from SQ1, and distributes the command K1' to the operation unit;
  • After parsing the first processing block identifier K1_7 from K1', the operation unit takes the eighth of the 4*4 processing blocks as the initial processing block and processes it.
  • the command distributor reports the second processing block identifier K1_13 to the microcontroller.
  • the microcontroller determines that K1_14 is the target processing block identifier, and replaces the first processing block identifier K1_7 carried in the command K1' with K1_14, generates a new command K1", and stores the command K1" into the target buffer corresponding to the command stream s1.
  • The microcontroller reads the command K2' from the target cache corresponding to the command stream s2 and sends it to the command queue SQ1 corresponding to VM2; at this time, the first processing block identifier carried in K2' is K2_5.
  • the command distributor obtains the command K2' from SQ1, and distributes the command K2' to the operation unit;
  • After parsing the first processing block identifier K2_5 from K2', the operation unit takes the sixth of the 2*4 processing blocks as the initial processing block and processes it.
  • the command distributor reports the second processing block identifier K2_7 to the microcontroller.
  • the microcontroller determines that the command K2 has been executed, and deletes it from the command queue Q1.
  • the microcontroller can monitor the RBUF2 corresponding to the command stream s2, and if there is a new command, it will continue to pull down to the command queue Q1 or the target buffer corresponding to the command stream s2.
  • After the end of time slice t5 and the start of the third time slice t6, the microcontroller reads the command K1" from the target cache corresponding to the command stream s1 and sends it to the command queue SQ1 corresponding to VM1; at this time, the first processing block identifier carried in K1" is K1_14.
  • the command distributor obtains the command K1" from SQ1, and distributes the command K1" to the operation unit;
  • After parsing the first processing block identifier K1_14 from K1", the operation unit takes the 15th of the 4*4 processing blocks as the initial processing block and processes it.
  • the command distributor reports the second processing block identifier K1_15 to the microcontroller.
  • the microcontroller determines that the command K1 has been executed, and deletes it from the command queue Q1.
  • the microcontroller can monitor the RBUF1 corresponding to the command stream s1, and if there is a new command, it will continue to pull it down to the command queue Q1 or the target buffer corresponding to the command stream s1.
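  • The sequence of reported identifiers in this walkthrough can be reproduced with a small simulation. It is a sketch under explicit assumptions: the function `simulate` is hypothetical, and the per-slice budgets (7 blocks per time slice for K1, 5 for K2) are not stated in the text; they are chosen only because they reproduce the identifiers reported above.

```python
# Illustrative sketch; simulate() is hypothetical, and the per-slice block
# budgets are assumptions chosen to match the identifiers in the walkthrough.
def simulate(total_blocks, blocks_per_slice, slice_order):
    """Return the second processing block identifier reported after each time slice."""
    next_block = {name: 0 for name in total_blocks}  # first processing block ids
    reports = []
    for name in slice_order:
        start = next_block[name]
        if start is None:
            continue  # command already finished and deleted from the queue
        last = min(start + blocks_per_slice[name], total_blocks[name]) - 1
        reports.append(f"{name}_{last}")
        finished = last == total_blocks[name] - 1
        next_block[name] = None if finished else last + 1
    return reports


reports = simulate(
    total_blocks={"K1": 16, "K2": 8},
    blocks_per_slice={"K1": 7, "K2": 5},
    slice_order=["K1", "K2", "K1", "K2", "K1"],
)
print(reports)
```

  • Under these assumed budgets the simulation reports K1_6, K2_4, K1_13, K2_7 and K1_15 in that order, matching the identifiers reported by the command distributor in the example, with K2 finishing in its second time slice and K1 in its third.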
  • FIG. 3 is a schematic diagram of a command processing apparatus according to another embodiment of the present disclosure.
  • 31 denotes a microcontroller
  • 32 denotes a plurality of command queues, wherein 321 denotes a command queue SQ1, and 322 denotes a command queue SQ2
  • 33 represents the command distributor
  • 34 represents the operation unit, which includes the operation unit 0, the operation unit 1, . . . and the operation unit n
  • For the target cache, 352 represents the target cache corresponding to the command stream s2.
  • the commands stored in the corresponding target cache are represented as eSQ1, eSQ2, . . . , and eSQm, respectively.
  • a flowchart of a command processing method provided by an embodiment of the present disclosure includes:
  • the microcontroller obtains the command to be executed after the current time slice arrives; the command to be executed carries a first processing block corresponding to the processing block that needs to be processed in the current time slice among the plurality of processing blocks corresponding to the command to be executed identification;
  • the computing unit acquires the command to be executed, and executes the processing task corresponding to the command to be executed based on the first processing block identifier.
  • the command processing apparatus further includes a command distributor
  • the microcontroller obtains the command to be executed after the current time slice arrives, including:
  • the microcontroller obtains the command to be executed after the current time slice arrives, and stores the command to be executed in the command queue;
  • the command processing method further includes:
  • the command distributor obtains the command to be executed from the command queue, and distributes the command to be executed to the operation unit;
  • the operation unit acquiring the command to be executed and executing, based on the first processing block identifier, the processing task corresponding to the command to be executed includes:
  • the operation unit acquires the command to be executed and, after receiving the command to be executed distributed by the command distributor, executes the processing task corresponding to the command to be executed based on the first processing block identifier.
  • the executing, based on the first processing block identifier, of the processing task corresponding to the command to be executed includes: the operation unit determines, based on the first processing block identifier, the starting processing block to be executed in the current time slice from the plurality of processing blocks, and executes the processing task corresponding to the command to be executed based on the starting processing block.
  • the microcontroller obtaining the command to be executed after the current time slice arrives includes: the microcontroller reads the command to be executed from the target cache corresponding to the current time slice;
  • the microcontroller obtaining the command to be executed after the current time slice arrives includes: the microcontroller determines whether the command queue is idle;
  • when the command queue is idle and the command to be executed is detected in the buffer, the command to be executed is read from the buffer.
  • the buffer includes a circular buffer; the circular buffer has multiple entries; the circular buffer stores commands to be executed in different command streams through different entries;
  • the microcontroller obtains the command to be executed after the current time slice arrives, including:
  • the microcontroller determines a target entry from the buffer based on the command stream corresponding to the current time slice; based on the determined target entry, monitors whether the to-be-executed command is stored in the buffer.
  • the microcontroller obtains the command to be executed after the current time slice arrives, including: the microcontroller determining whether there is a command to be executed in the target cache corresponding to the current time slice;
  • the command to be executed is acquired from the host.
  • the microcontroller obtaining the command to be executed after the current time slice arrives includes: when the command to be executed exists in the target cache corresponding to the current time slice, the microcontroller reads the command to be executed from the target cache.
  • the method further includes: after the current time slice ends, the operation unit reports the second processing block identifier corresponding to the most recently executed processing block to the microcontroller;
  • after receiving the second processing block identifier reported by the operation unit, the microcontroller updates the command to be executed in the command queue based on the second processing block identifier.
  • the operation unit reporting the second processing block identifier corresponding to the most recently executed processing block to the microcontroller includes: after the processing task corresponding to the currently executing processing block is completed, the operation unit takes the currently executing processing block as the most recently executed processing block, and reports the second processing block identifier corresponding to the most recently executed processing block to the microcontroller.
  • the method further includes: the operation unit sends the second processing block identifier to the command distributor;
  • the command distributor sends the second processing block identifier to the microcontroller.
  • the updating of the command to be executed in the command queue based on the second processing block identifier includes: when the second processing block identifier is the processing block identifier of the last processing block in the plurality of processing blocks, the microcontroller deletes the command to be executed from the command queue;
  • when the second processing block identifier is not the processing block identifier of the last processing block, the target processing block identifier is determined based on the second processing block identifier, and the first processing block identifier in the command to be executed is replaced with the target processing block identifier to generate a new command to be executed;
  • the target processing block identifier is the processing block identifier corresponding to the next processing block of the most recently executed processing block.
  • the microcontroller after generating the new to-be-executed command, stores the new to-be-executed command into a target cache corresponding to the to-be-executed command.
  • referring to FIG. 5, which is a flowchart of a command processing method provided by another embodiment of the present disclosure, the method includes:
  • S501: in response to the end of the current time slice, the operation unit reports to the microcontroller the first processing block identifier of the current processing block of the target command executed in the current time slice; wherein the current processing block is any processing block of at least one processing block in the target command;
  • S502: after receiving the first processing block identifier reported by the operation unit, the microcontroller uses the first processing block identifier to update the target command.
  • Embodiments of the present disclosure further provide an electronic device, including a host, a buffer, and a command processing apparatus; the host is used to issue a command to be executed, and the buffer is used to store the command to be executed; the command processing apparatus is used to execute the method described in any embodiment of the command processing method of the present disclosure.
  • the command processing apparatus provided by the embodiments of the present disclosure may include a chip, an AI chip, and the like.
  • the electronic devices provided by the embodiments of the present disclosure may include smart terminals such as mobile phones, or may also be other devices, servers, etc. that have a camera and can perform image processing, which are not limited here.
  • An embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a microcontroller and an operation unit, the method described in any of the command processing method embodiments of the present disclosure is implemented.
  • the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
  • the embodiment of the present disclosure also provides a command processing method corresponding to the command processing apparatus. For the implementation of the method, reference may be made to the implementation of the apparatus, and repeated details are omitted.
  • Embodiments of the present disclosure further provide a computer program product, where the computer program product carries program code, and the commands included in the program code can be used to execute the steps of the command processing methods described in the foregoing method embodiments. For details, refer to the foregoing method embodiments, which are not repeated here.
  • the above-mentioned computer program product can be specifically implemented by means of hardware, software or a combination thereof.
  • in one optional embodiment, the computer program product is embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK).
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a processor-executable non-volatile computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several commands that cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in various embodiments of the present disclosure.
  • the aforementioned storage medium includes: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk, optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

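The present disclosure provides a command processing apparatus, method, electronic device, and computer-readable storage medium. The apparatus includes a microcontroller and an operation unit. The microcontroller is configured to acquire a command to be executed after a current time slice arrives; the command to be executed carries a first processing block identifier indicating, among the multiple processing blocks corresponding to the command to be executed, the processing block that needs to be processed in the current time slice. The operation unit is configured to acquire the command to be executed and, based on the first processing block identifier, execute the processing task corresponding to the command to be executed. Such a command processing apparatus can improve the efficiency of command processing.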

Description

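Command processing apparatus, method, electronic device, and computer-readable storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202110130200.9, filed with the China Patent Office on January 29, 2021 and entitled "Command processing apparatus, method, electronic device, and computer-readable storage medium", the entire contents of which are incorporated into the present disclosure by reference.
Technical Field
The present disclosure relates to the technical field of computer science, and in particular, to a command processing apparatus, method, electronic device, and computer-readable storage medium.
Background
In the field of computer science, virtualization can turn physical resources into logically manageable resources, which improves the utilization of a server's physical resources. Currently, when virtualization is implemented by deploying a graphics processing unit (GPU) or an artificial intelligence (AI) chip, time-division multiplexing is usually used among multiple processes so that the commands of those processes are executed synchronously. Current command processing approaches suffer from low processing efficiency.
Summary
Embodiments of the present disclosure provide at least a command processing apparatus, method, electronic device, and computer-readable storage medium.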
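In a first aspect, an embodiment of the present disclosure provides a command processing apparatus, including a microcontroller and an operation unit. The microcontroller is configured to acquire a command to be executed after a current time slice arrives; the command to be executed carries a first processing block identifier indicating, among the multiple processing blocks corresponding to the command to be executed, the processing block that needs to be processed in the current time slice. The operation unit is configured to acquire the command to be executed and, based on the first processing block identifier, execute the processing task corresponding to the command to be executed.
In this way, the multiple processing operations of a command are divided into multiple processing blocks. In any time slice assigned to a command to be executed, the starting processing block for the current time slice is determined from the first processing block identifier, which was itself determined from the processing result of the previous time slice assigned to that command, and the processing task of the command is executed from that starting processing block. Because the processing operations are divided into finer-grained processing blocks, context switching under time-division multiplexing spends less time waiting for an unfinished command, which improves command processing efficiency.
In an optional implementation, the processing apparatus further includes a command distributor. The microcontroller is configured to acquire the command to be executed after the current time slice arrives and store it in a command queue. The command distributor is configured to fetch the command to be executed from the command queue and distribute it to the operation unit. The operation unit is configured to acquire the command to be executed and, after receiving it from the command distributor, execute the corresponding processing task based on the first processing block identifier.
In an optional implementation, the operation unit is configured to determine, based on the first processing block identifier, the starting processing block to be executed in the current time slice from among the multiple processing blocks, and to execute the processing task corresponding to the command to be executed from that starting processing block.
In this way, because the first processing block identifier can indicate the first not-yet-processed block among the processing blocks of the command to be executed, the operation unit can go straight to that block without searching the command's processing blocks again to find the one that needs processing, which effectively improves command processing efficiency.
In an optional implementation, the microcontroller is configured to read the command to be executed from a target cache corresponding to the current time slice, or to acquire the command to be executed from a host.
In this way, the microcontroller can flexibly acquire the command to be executed from the target cache or from the host, depending on the time slice and on the execution status of different commands.
In an optional implementation, when acquiring the command to be executed from the host, the microcontroller is configured to: determine whether the command queue is idle; if the command queue is idle, monitor whether a buffer stores the command to be executed; and, on detecting that the buffer holds the command to be executed, read it from the buffer.
In this way, by having the microcontroller check whether the command queue is idle and, when it is, whether the buffer holds a command to be executed, commands can be delivered to the command queue without interruption while the buffer keeps receiving them, so that the command processing apparatus keeps processing newly issued commands efficiently.
In an optional implementation, the buffer includes a ring buffer; the ring buffer has multiple entries and stores the commands to be executed of different command streams through different entries. The microcontroller is configured to: determine a target entry in the buffer based on the command stream corresponding to the current time slice; and, based on the determined target entry, monitor whether the buffer stores the command to be executed.
In this way, commands of different command streams are stored in the ring buffer through the target entries corresponding to those streams, so the microcontroller can monitor the multiple entries of the ring buffer simultaneously and pull commands from different command streams within one processing cycle, which improves the efficiency of command acquisition. In addition, the ring buffer can give communicating programs mutually exclusive access to the buffer, which helps avoid the system overhead that a storage queue incurs under frequent command allocation.
In an optional implementation, the microcontroller is configured to: determine whether the target cache corresponding to the current time slice holds a command to be executed; and, if it does not, acquire a command to be executed from the host.
In this way, once the commands in the target cache have been executed, new commands can be acquired from the host, so the command processing apparatus keeps working dynamically and processes newly issued commands more smoothly, which makes command processing more efficient.
In an optional implementation, the microcontroller is further configured to read the command to be executed from the target cache when the target cache corresponding to the current time slice holds that command.
In this way, because the target cache stores commands whose execution is unfinished, the microcontroller, on detecting such a command in the target cache corresponding to the current time slice, can read it out and continue processing it. The command processing apparatus can thus process commands in order, and a backlog of one user's unfinished commands is prevented from overloading the target cache.
In an optional implementation, the operation unit is further configured to report to the microcontroller, after the current time slice ends, a second processing block identifier corresponding to the most recently completed processing block; the microcontroller is further configured to update the command to be executed in the command queue based on the second processing block identifier after receiving it from the operation unit.
In an optional implementation, the operation unit is configured to: finish the processing task of the processing block currently being executed, take that block as the most recently completed processing block, and report the corresponding second processing block identifier to the microcontroller.
In this way, because the operation unit can quickly determine the second processing block identifier of the most recently completed block after processing the command in the current time slice, it is convenient for the operation unit to report this identifier to the microcontroller. Moreover, because the microcontroller updates the command in the command queue after receiving the identifier, the updated command can be processed directly when its next time slice arrives.
In an optional implementation, the operation unit is configured to send the second processing block identifier to the command distributor, and the command distributor is further configured to send the second processing block identifier to the microcontroller.
In this way, the operation unit can deliver the second processing block identifier to the microcontroller over the channel between the operation unit and the command distributor and the channel between the command distributor and the microcontroller. No separate channel between the operation unit and the microcontroller is needed, which reduces the use of the microcontroller's data interfaces.
In an optional implementation, the microcontroller is configured to: delete the command to be executed from the command queue if the second processing block identifier is the identifier of the last of the multiple processing blocks; and, if it is not, determine a target processing block identifier based on the second processing block identifier and replace the first processing block identifier in the command with the target processing block identifier to generate a new command to be executed. The target processing block identifier is the identifier of the processing block that follows the most recently completed processing block.
In this way, the microcontroller can learn the processing status of the command more accurately and more easily from the second processing block identifier. When that identifier is the identifier of the last processing block, the microcontroller can conclude that the command has been fully processed and delete it from the command queue, which effectively avoids the error of a command being processed repeatedly in the command queue.
In an optional implementation, the microcontroller is further configured to store the new command to be executed in the target cache corresponding to the command to be executed after generating it.
In this way, the new command can be placed in the target cache without a dedicated processing scheme; it is handled in the same way as any other command in the target cache. This keeps the microcontroller's work simple and stable, so command processing is more stable and runtime errors are less likely.
In a second aspect, an embodiment of the present disclosure further provides another command processing apparatus, including a microcontroller and an operation unit. The operation unit is configured to report to the microcontroller, in response to the end of a current time slice, a first processing block identifier of the current processing block of a target command executed in that time slice, where the current processing block is any one of at least one processing block in the target command. The microcontroller is configured to update the target command using the first processing block identifier after receiving it from the operation unit.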
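In a third aspect, an embodiment of the present disclosure further provides a command processing method applied to a command processing apparatus that includes a microcontroller and an operation unit. The method includes: the microcontroller acquiring a command to be executed after a current time slice arrives, the command carrying a first processing block identifier indicating, among the multiple processing blocks corresponding to the command, the processing block that needs to be processed in the current time slice; and the operation unit acquiring the command to be executed and, based on the first processing block identifier, executing the corresponding processing task.
In an optional implementation, the command processing apparatus further includes a command distributor. The microcontroller acquiring the command to be executed after the current time slice arrives includes: the microcontroller acquiring the command after the current time slice arrives and storing it in a command queue. The method further includes: the command distributor fetching the command from the command queue and distributing it to the operation unit. The operation unit acquiring the command and executing the corresponding processing task based on the first processing block identifier includes: the operation unit acquiring the command and, after receiving it from the command distributor, executing the corresponding processing task based on the first processing block identifier.
In an optional implementation, executing the processing task based on the first processing block identifier includes: the operation unit determining, based on the first processing block identifier, the starting processing block to be executed in the current time slice from among the multiple processing blocks, and executing the processing task corresponding to the command from that starting processing block.
In an optional implementation, the microcontroller acquiring the command to be executed after the current time slice arrives includes: the microcontroller reading the command from the target cache corresponding to the current time slice, or acquiring the command from the host.
In an optional implementation, the microcontroller acquiring the command to be executed after the current time slice arrives includes: the microcontroller determining whether the command queue is idle; if the command queue is idle, monitoring whether the buffer stores the command to be executed; and, on detecting that the buffer holds the command, reading it from the buffer.
In an optional implementation, the buffer includes a ring buffer; the ring buffer has multiple entries and stores the commands to be executed of different command streams through different entries. The microcontroller acquiring the command to be executed after the current time slice arrives includes: the microcontroller determining a target entry in the buffer based on the command stream corresponding to the current time slice, and, based on the determined target entry, monitoring whether the buffer stores the command to be executed.
In an optional implementation, the microcontroller acquiring the command to be executed after the current time slice arrives includes: the microcontroller determining whether the target cache corresponding to the current time slice holds a command to be executed, and, if it does not, acquiring a command to be executed from the host.
In an optional implementation, the microcontroller acquiring the command to be executed after the current time slice arrives includes: the microcontroller reading the command from the target cache when the target cache corresponding to the current time slice holds that command.
In an optional implementation, the method further includes: the operation unit reporting to the microcontroller, after the current time slice ends, a second processing block identifier corresponding to the most recently completed processing block; and the microcontroller updating the command to be executed in the command queue based on the second processing block identifier after receiving it from the operation unit.
In an optional implementation, the operation unit reporting the second processing block identifier after the current time slice ends includes: the operation unit finishing the processing task of the processing block currently being executed, taking that block as the most recently completed processing block, and reporting the corresponding second processing block identifier to the microcontroller.
In an optional implementation, the method further includes: the operation unit sending the second processing block identifier to the command distributor, and the command distributor sending the second processing block identifier to the microcontroller.
In an optional implementation, updating the command to be executed in the command queue based on the second processing block identifier includes: the microcontroller deleting the command from the command queue if the second processing block identifier is the identifier of the last of the multiple processing blocks; and, if it is not, determining a target processing block identifier based on the second processing block identifier and replacing the first processing block identifier in the command with the target processing block identifier to generate a new command to be executed. The target processing block identifier is the identifier of the processing block that follows the most recently completed processing block.
In an optional implementation, after generating the new command to be executed, the microcontroller stores it in the target cache corresponding to the command to be executed.
In a fourth aspect, an embodiment of the present disclosure further provides a command processing method applied to a command processing apparatus that includes a microcontroller and an operation unit. The method includes: the operation unit reporting to the microcontroller, in response to the end of a current time slice, a first processing block identifier of the current processing block of a target command executed in that time slice, where the current processing block is any one of at least one processing block in the target command; and the microcontroller updating the target command using the first processing block identifier after receiving it from the operation unit.
In a fifth aspect, an embodiment of the present disclosure further provides an electronic device, including a host, a buffer, and a command processing apparatus. The host is configured to issue commands to be executed, the buffer is configured to store them, and the command processing apparatus is configured to perform the method of any implementation of the third or fourth aspect.
In a sixth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a microcontroller and an operation unit, implements the method of any implementation of the third or fourth aspect.
For the effects of the above command processing methods, refer to the description of the command processing apparatus above; they are not repeated here.
To make the above objects, features, and advantages of the present disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.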
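Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings are incorporated into and form part of this specification; they show embodiments consistent with the present disclosure and, together with the specification, serve to explain its technical solutions. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from them without creative effort.
FIG. 1 is a schematic diagram of a command processing apparatus provided by an embodiment of the present disclosure;
FIG. 2a and FIG. 2b are schematic diagrams of states of a command queue provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a command processing apparatus provided by another embodiment of the present disclosure;
FIG. 4 is a flowchart of a command processing method provided by an embodiment of the present disclosure;
FIG. 5 is a flowchart of a command processing method provided by another embodiment of the present disclosure.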
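Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments described and shown here can generally be arranged and designed in a variety of configurations. The following detailed description is therefore not intended to limit the claimed scope of the present disclosure but merely represents selected embodiments. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the scope of protection of the present disclosure.
Research has found that when commands in the command streams of different users are processed by time-division multiplexing, time slices are usually set. Within the current time slice, commands in one user's command stream are executed; when the time slice ends, the context is switched between users, that is, from the current user to the next user, and the next time slice executes commands in the next user's command stream. Because a command of the current user may still be executing when the time slice ends, the context switch must wait for that command to finish before moving to the next user. This introduces a long delay at the context switch, so the next user's commands wait a long time before being processed, which lowers command processing efficiency.
Based on the above research, the present disclosure provides a command processing apparatus, method, electronic device, and computer-readable storage medium. The multiple processing operations of a command are divided into multiple processing blocks. When a command to be executed gets its turn, the starting processing block for the current execution period is determined from the first processing block identifier determined in the previous execution period of that command, and the processing task is executed from that starting block. The command is thus divided at a finer granularity, and time-division multiplexing operates at that finer granularity, which reduces the waiting time required for context switching and improves command processing efficiency.
The defects of the above approach were identified by the inventors through practice and careful study. Therefore, both the discovery of the above problem and the solutions proposed below should be regarded as contributions made by the inventors in the course of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
To facilitate understanding of this embodiment, the command processing apparatus disclosed in the embodiments of the present disclosure is first described in detail.
The command processing apparatus provided by the embodiments of the present disclosure can be applied to a GPU, an artificial intelligence chip, or any other command processing apparatus that includes a command processor and an execution unit.
The following describes the command processing apparatus provided by the embodiments of the present disclosure, taking its application to a GPU as an example.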
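Referring to FIG. 1, which is a schematic structural diagram of the command processing apparatus provided by an embodiment of the present disclosure, the apparatus includes a microcontroller 10 and an operation unit 20, wherein:
the microcontroller 10 is configured to acquire a command to be executed after the current time slice arrives; the command to be executed carries a first processing block identifier indicating, among the multiple processing blocks corresponding to the command to be executed, the processing block that needs to be processed in the current time slice;
the operation unit 20 is configured to acquire the command to be executed and, based on the first processing block identifier, execute the processing task corresponding to the command to be executed.
In the embodiments of the present disclosure, the microcontroller 10 acquires the command to be executed after the current time slice arrives; the command carries a first processing block identifier indicating which of its processing blocks needs to be processed in the current time slice. After receiving the command, the operation unit 20 executes the corresponding processing task based on that identifier. In this process, the processing operations of a command are divided into multiple processing blocks and the first processing block identifier indicates the block to execute, so the command is divided at a finer granularity and time-division multiplexing operates at that granularity, which reduces the waiting time required for context switching and improves command processing efficiency. In one embodiment, the division into processing blocks may follow the division into thread blocks in the Compute Unified Device Architecture (CUDA); a processing block may be equivalent to a thread block.
Another embodiment of the present disclosure further includes a command distributor 30. After acquiring the command to be executed, the microcontroller 10 may also store it in a command queue. The command distributor 30 fetches the command from the command queue and distributes it to the operation unit. Here, when a command to be executed gets its turn, the command distributor 30 fetches it from the command queue and distributes it to the operation unit, and the operation unit executes it based on the first processing block identifier in the command. In the embodiments of the present disclosure, the command distributor 30 may be implemented in software (for example, running on the microcontroller 10) or in hardware. The operation unit 20 may be implemented in hardware, for example as a Graphics Processing Cluster (GPC) or a Streaming Multiprocessor (SM).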
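The microcontroller 10, the operation unit 20, and the command distributor 30 are described in detail below.
A user in the embodiments of the present disclosure may be, for example, any of the following: a virtual machine, a container, an application, or a distinct function within an application.
Taking an application as the user: when run, an application may create a main process and multiple child processes that perform different processing tasks, and each process generates a corresponding command stream when performing its task. An application therefore generates multiple command streams when run, each including at least one command. The command processing apparatus provided by the embodiments of the present disclosure uses time-division multiplexing to process the different command streams of different applications in separate time slices.
Taking distinct functions within an application as users: when run, an application may create multiple processes, each of which may include multiple threads; each thread performs a processing task and generates a corresponding command stream. The different functions of the application can then be treated as different users, and the command processing apparatus uses time-division multiplexing to process the different functions of the same application in separate time slices.
Taking a virtual machine as the user: the application layer of a virtual machine may have one or more command streams (streams) at the same time. Each stream includes at least one command, and the commands in a stream are issued, for example, from the host to the buffer corresponding to the user.
Taking one user (one virtual machine, denoted VM1) as an example: when software runs in VM1, one or more of its functions may be run. When the user runs only one of these functions, the host may create a thread for it; when the thread performs its processing task, it generates a command stream, and the user's CPU stores the commands in that stream in the buffer corresponding to VM1.
In a specific implementation, a software function of VM1 may include, for example, image processing functions such as recognizing a target object in an image or determining the pose of a target object in an image, or general data processing tasks such as storing or retrieving images. The software functions actually implemented by a virtual machine depend on the actual situation and are not described further here.
A command to be executed is any command in a command stream. Taking VM1 as an example, VM1 may perform image processing on multiple different images such as Pic1 and Pic2. One of VM1's command streams may then include multiple commands to be executed for processing image Pic1, for example convolution, pooling, and fully connected operations on Pic1.
A command to be executed includes: address information pointing to the operands; indication information specifying the processing task of a particular operator (kernel); and the first processing block identifier corresponding to the command.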
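When executing the command, the operation unit 20 can use the address information to fetch the operands and process them according to the indication information. The processing can be decomposed into multiple processing operations. Different processing operations may correspond to different data in the operands (such as different pixels of an image), or to different stages of operating on the operands (such as first multiplying pixel values by weights and then summing the products for different pixels).
The multiple processing operations of a command to be executed are divided into multiple processing blocks (blocks). In one implementation, the command is executed through thread blocks, each including multiple threads; once the processing operations in all processing blocks have been executed, processing of the command is complete.
The first processing block identifier included in the command to be executed points to one of the processing blocks of the command that have not yet been processed. In one implementation, if execution imposes no requirement on the order of processing blocks, the identifier may point to any unprocessed block. If execution does impose an order, the identifier may point to the first block that needs to be executed among the unprocessed blocks.
For example, a command K1 may perform a convolution on image Pic1. If Pic1 is 128*128 pixels, the convolution for each pixel can be treated as one processing operation, so K1 contains 128*128 processing operations. Every 32*32 processing operations form one processing block, so the command includes 4*4 processing blocks. The 16 blocks can be numbered sequentially, for example 0 to 15. In one implementation, every 32*32 processing operations are assigned to one thread block whose threads perform the operations, so the command is assigned to 4*4 thread blocks.
In the first time slice in which command K1 is processed, the microcontroller 10 may acquire the command from the command stream issued by the host. At this point the first processing block identifier in the command may be in a default state, indicating that processing starts from the first processing block, that is, the block numbered 0.
In time slices other than the first in which K1 is processed, some blocks of K1 have already been processed, so the first processing block identifier of K1 points to a block that has not yet been processed. For example, if the blocks numbered 0 to 12 were completed before the current time slice, the first processing block identifier of K1 in the current time slice is 13.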
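The K1 example can be sketched in a few lines of Python. This is only an illustrative model, not the disclosed hardware: the names `PendingCommand`, `num_blocks`, and `block_of` are hypothetical, and the row-major numbering of blocks is an assumption (the text only says the 16 blocks are numbered 0 to 15).

```python
from dataclasses import dataclass
from typing import Optional

BLOCK_EDGE = 32  # each processing block covers 32*32 processing operations


def num_blocks(height: int, width: int, edge: int = BLOCK_EDGE) -> int:
    """Number of processing blocks needed to cover a height*width grid of operations."""
    rows = -(-height // edge)  # ceiling division
    cols = -(-width // edge)
    return rows * cols


def block_of(y: int, x: int, width: int, edge: int = BLOCK_EDGE) -> int:
    """Block number of the operation at pixel (y, x), assuming row-major numbering."""
    cols = -(-width // edge)
    return (y // edge) * cols + (x // edge)


@dataclass
class PendingCommand:
    operand_addr: int                      # address information pointing to the operands
    kernel: str                            # indication of the operator (kernel) task
    total_blocks: int                      # hypothetical field: number of processing blocks
    first_block_id: Optional[int] = None   # None models the "default state"

    def start_block(self) -> int:
        """Block at which execution starts in the current time slice."""
        return 0 if self.first_block_id is None else self.first_block_id

    def remaining(self) -> range:
        """Block numbers still to be processed, assuming in-order execution."""
        return range(self.start_block(), self.total_blocks)
```

With a 128*128 image, `num_blocks(128, 128)` gives 16, a fresh command starts at block 0, and a command whose identifier is 13 has blocks 13, 14, and 15 left.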
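Here, the processing tasks of different virtual machines may be the same or different. When one virtual machine performs the same processing task on different images, image attributes such as pixel size need not be uniform; that is, images Pic1, Pic2, and so on may have different pixel sizes.
When acquiring the command to be executed, the microcontroller 10 reads it from the target cache corresponding to the current time slice or acquires it from the host. Specifically, the ways in which the microcontroller 10 acquires the command include (a1) and (a2) below:
(a1): When the microcontroller acquires the command to be executed from the host, the microcontroller 10 determines whether the command queue is idle. If the command queue is idle, the microcontroller 10 monitors the buffer: it determines a target entry in the buffer based on the command stream corresponding to the current time slice and, based on that target entry, monitors whether the buffer stores a command to be executed.
There are multiple command queues, each corresponding to one command stream. Different command streams of different users may share one command queue in different time slices.
Here, a command queue may be a software queue or a hardware queue. If the command queues are software queues, the microcontroller 10 can control their creation and deletion. When acquiring commands to be executed, the microcontroller 10 can determine the number of command streams of the user; if the number of command queues already created is smaller than that number, the microcontroller 10 creates new command queues so that the number of command queues is greater than or equal to the number of the user's command streams. If a command queue has not been used for a long time, the microcontroller 10 may delete it.
For example, when virtual machines VM1 and VM2 have the same maximum number of command streams, say 6 each, it can be determined that there are 6 command queues.
When VM1 and VM2 have different maximum numbers of command streams, say 10 for VM1 and 6 for VM2, comparing the two maxima gives 10 command queues. In this case the commands of VM2 may occupy only 6 command queues; the remaining queues simply stay empty during VM2's time slices, and the microcontroller 10 may, for example, skip checking the empty queues. A time slice here may be, for example, the time allotted to a given virtual machine for processing its commands.
If the command queues are hardware queues, the microcontroller 10 can determine, from the number of the user's command streams, which command queues are used when processing that user's commands.
When the command streams of a virtual machine are stored in command queues, take VM1 as an example: if a command stream performs image processing on image Pic1, then during VM1's time slice the command queue corresponding to that stream holds the commands of that stream, for example K1 and K2. Because VM1 may have multiple command streams, each command stream is stored in its own corresponding command queue.
For example, the microcontroller may determine whether a command queue is idle from the read pointer and write pointer of each command queue: a command queue is idle when its read pointer and write pointer indicate the same position.
If a command queue is idle, the commands previously acquired from the command stream corresponding to that queue are considered fully executed, and new commands need to be acquired from the command stream.
If a command queue is not idle, the commands acquired from the corresponding command stream have not all been executed; the commands already acquired must be processed before further commands are acquired from the host.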
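The read-pointer/write-pointer idle test can be illustrated with a small circular queue. This is a sketch under assumptions: the class name `CommandQueue` and its methods are hypothetical, and keeping one slot unused so that "full" and "idle" are distinguishable is a common convention rather than something the text specifies.

```python
class CommandQueue:
    """Circular command queue; idle when the read and write pointers coincide."""

    def __init__(self, capacity: int):
        self.slots = [None] * capacity
        self.read_ptr = 0
        self.write_ptr = 0

    def is_idle(self) -> bool:
        # Both pointers indicating the same position means nothing is pending.
        return self.read_ptr == self.write_ptr

    def push(self, cmd) -> None:
        nxt = (self.write_ptr + 1) % len(self.slots)
        if nxt == self.read_ptr:
            # One slot is deliberately left unused (assumption) so that a full
            # queue never looks idle.
            raise OverflowError("command queue full")
        self.slots[self.write_ptr] = cmd
        self.write_ptr = nxt

    def pop(self):
        if self.is_idle():
            raise IndexError("command queue idle")
        cmd = self.slots[self.read_ptr]
        self.slots[self.read_ptr] = None
        self.read_ptr = (self.read_ptr + 1) % len(self.slots)
        return cmd
```

In this model the microcontroller would only go back to the host for new commands once `is_idle()` returns true for the queue in question.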
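When commands are acquired from the host, they may, for example, be acquired from the buffer, which stores the commands issued by the host.
FIG. 2a and FIG. 2b are schematic diagrams of states of a command queue provided by an embodiment of the present disclosure. In FIG. 2a, the command queue 21 is idle: it waits for commands to be executed that the host 22 issues to the buffer 23, and the microcontroller 10 delivers the commands received in the buffer 23 to the command queue 21. The dashed box 211 in the command queue 21 indicates that the queue is idle. In FIG. 2b, the command queue 24 is not idle: for example, it contains a pending command shown by the solid box 241. That pending command is processed first, and only then are new commands issued by the host 22 received.
Here, the buffer may include, for example, a ring buffer or a storage queue. A ring buffer is a first-in-first-out circular buffer that can give communicating programs mutually exclusive access to the buffer and avoids the system overhead that a storage queue incurs under frequent command allocation; the embodiments of the present disclosure therefore use a ring buffer as the buffer.
Each ring buffer has multiple entries for receiving the commands of different command streams issued by the host. For example, the host issues to VM1 two command streams, denoted s1 and s2, that perform target recognition on images Pic1 and Pic2 respectively. Taking s1 as an example, it includes commands K1 and K2 for target recognition on Pic1; K1 and K2 have been described in detail above and are not described again here.
Here, commands may be delivered to the buffer as follows, for example: determine at least one command stream of the user; based on each command to be executed in the at least one command stream, determine at least one command to be executed for each thread; and, using the storage entry in the buffer that corresponds to each command stream, store the commands to be executed of each command stream in the buffer.
When a command is stored in the buffer, the buffer's read pointer may, for example, be used to look through multiple preset target entries; once the target entry corresponding to the command is located, the buffer's write pointer is used to write the command issued by the host. Alternatively, the read pointer may be used to poll the command storage spaces of the buffer, the entry of a storage space that can hold the command is taken as the target entry, and the write pointer is then used to write the command. The specific method can be chosen according to the actual situation and is not described further here.
While the microcontroller 10 monitors the buffer, the host keeps issuing new commands to the buffer and the operation unit 20 keeps processing the commands that have been delivered from the buffer to the command queue (Stream Queue). As a result, when the microcontroller 10 polls the position of the target entry with the buffer's read pointer, there may be no corresponding command yet. In that case the microcontroller 10 keeps monitoring the buffer until a command appears at that position and then reads it from the buffer.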
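A rough software model of a buffer with one entry per command stream is sketched below. It is an assumption-laden stand-in: `MultiEntryBuffer`, `host_write`, and `poll` are hypothetical names, and a bounded `collections.deque` per stream replaces the actual ring layout and pointer handling, which the text leaves open.

```python
from collections import deque


class MultiEntryBuffer:
    """Buffer with one FIFO entry per command stream (stand-in for the ring buffer)."""

    def __init__(self, stream_ids, depth: int = 8):
        self._entries = {sid: deque(maxlen=depth) for sid in stream_ids}

    def host_write(self, stream_id: str, cmd) -> bool:
        """Host side: store a command through the entry of its command stream."""
        entry = self._entries[stream_id]
        if len(entry) == entry.maxlen:
            # A deque with maxlen would silently drop the oldest item, so a full
            # entry is rejected explicitly instead.
            return False
        entry.append(cmd)
        return True

    def poll(self, stream_id: str):
        """Microcontroller side: check the target entry; None means keep monitoring."""
        entry = self._entries[stream_id]
        return entry.popleft() if entry else None
```

The microcontroller would call `poll` with the stream that owns the current time slice and retry while it returns `None`.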
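(a2): When the microcontroller 10 reads the command to be executed from the target cache corresponding to the current time slice, the microcontroller 10 reads the command from that target cache.
The target cache may include, for example, device memory or double data rate synchronous dynamic random access memory (DDR SDRAM). It stores commands to be executed that belong to users other than the one owning the current time slice, or commands that the microcontroller 10 has read back from the command queue.
Here, the microcontroller 10 delivers the commands of different users to the command queue in different time slices so that the users share the command processing apparatus by time-division multiplexing. Therefore, for a command that is not finished within a time slice, the microcontroller 10 temporarily stores it in the corresponding target cache when that time slice ends. When the next time slice for that command arrives, the command in the target cache is delivered to the command queue again so that the command distributor 30 can redistribute it to the operation unit 20 for further processing.
Specifically, when acquiring the command to be executed, the microcontroller 10 determines whether the target cache corresponding to the current time slice holds a command to be executed. Taking two virtual machines VM1 and VM2 as an example, this determination leads to one of the two cases (b1) and (b2) below:
(b1): If the target cache corresponding to the current time slice holds a command to be executed, the command in the target cache is delivered to the command queue.
In practice, the context at the end of the previous time slice can be saved to the target cache corresponding to that previous time slice; the context information includes, among other things, the processing block identifier of the last processing block executed. The context information saved in the target cache corresponding to the current time slice is then fetched, and execution continues in the current time slice.
For example, suppose the i-th time slice (i being a positive integer) is assigned to VM1, the (i+1)-th to VM2, and the (i+2)-th again to VM1. If command A of VM1 is not finished in the i-th time slice, it is temporarily stored in the target cache corresponding to command A. After the (i+1)-th time slice ends, the microcontroller 10 pulls command A from that target cache into the command queue in the (i+2)-th time slice. The detailed processing is not repeated here.
(b2): If the target cache corresponding to the current time slice holds no command to be executed, a command to be executed is acquired from the host.
Specifically, if all commands in the command queue have been processed by the end of a time slice, no command needs to be placed in the target cache to wait for the next corresponding time slice; when the user's next time slice arrives, newly issued commands are simply acquired from the host.
For example, again with virtual machines as users, suppose the i-th time slice (i being a positive integer) is assigned to VM1, the (i+1)-th to VM2, and the (i+2)-th again to VM1. If command A of VM1 is finished in the i-th time slice, then after the (i+1)-th time slice ends the microcontroller 10 finds in the (i+2)-th time slice that the target cache corresponding to the current time slice holds no command to be executed, and acquires a newly issued command from the host. A command acquired from the host can be stored in the command queue; during execution its processing operations are divided into multiple processing blocks and processed in the manner provided by the embodiments of the present disclosure.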
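Cases (b1) and (b2) amount to a simple priority rule, sketched below. The function name `acquire_command` and the use of plain dictionaries for the target caches and the host-side buffer are illustrative assumptions only.

```python
def acquire_command(target_cache: dict, host_buffer: dict, stream_id: str):
    """Return (command, source) for the stream that owns the current time slice.

    target_cache maps a stream to its unfinished command, if any (case b1);
    host_buffer maps a stream to a list of newly issued commands (case b2).
    """
    saved = target_cache.pop(stream_id, None)
    if saved is not None:
        return saved, "target_cache"   # (b1): resume the unfinished command
    pending = host_buffer.get(stream_id, [])
    if pending:
        return pending.pop(0), "host"  # (b2): fetch a newly issued command
    return None, "none"                # nothing yet; keep monitoring
```

For instance, if stream s1 has a saved command with first processing block identifier 7, that command is returned first; only on a later turn, when the cache is empty, is a new command taken from the host.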
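In addition, when there are multiple virtual machines, for example N virtual machines (N being a positive integer greater than 1), target caches are allocated as needed; in one embodiment at most N target caches are required. If the virtual machine owning a time slice has not finished its command, then during the next time slice, which does not belong to that virtual machine, its unfinished command is stored in the target cache corresponding to that virtual machine.
When the target cache includes device memory, data transfer is unidirectional, that is, during one transfer the data channel allows data to move in one direction only; N memory units can therefore be allocated to the N virtual machines to hold their respective unfinished commands. When the target cache includes DDR SDRAM, data transfer is bidirectional, that is, during one transfer the data channel allows data to move in both directions, for example with direct memory access (DMA); two virtual machines can therefore share one target cache, and the same target cache may, for example, be configured for two virtual machines. The specific configuration is not described further here.
After the microcontroller stores the command to be executed in the command queue, the command distributor 30 can fetch the command from the command queue and distribute it to the operation unit 20.
The command to be executed carries the first processing block identifier corresponding to the command. After receiving the command from the command distributor 30, the operation unit 20 can parse the first processing block identifier from the command, take the processing block it indicates as the starting processing block for the current time slice, and execute the processing task corresponding to the command from that starting block.
For example, a scheduler is deployed in the operation unit 20, and the command to be executed is delivered to the scheduler. The scheduler determines the first processing block identifier from the command, uses it to identify the starting processing block among the command's processing blocks, and then assigns the processing operations in that block as tasks to the threads running in the operation unit 20, which perform the corresponding computations.
Taking an image processing task as an example, when a thread handles a processing operation, the task issued by the scheduler carries the address information of the pixel to be processed. The thread can use this address information to fetch the operand from the storage space holding the image and then process the operand accordingly.
In a GPU-based embodiment, each operator (kernel) may include multiple threads; for hardware execution, a certain number of threads can be grouped into a processing block (block), or thread block, as the smallest scheduling granularity.
The block scheduler inside the command distributor 30 schedules processing blocks in order. When the time slice (time slot) ends, the scheduler enters stop mode; once the in-flight block completes, the identifier of the completed processing block (block_id) is reported to the microcontroller (MCU). The user context is then switched, and when the operator is invoked again the reported processing block identifier is passed in. Switching is thus performed at processing block level: a context switch no longer waits for the current operator to finish, only for the already dispatched processing blocks to finish, which shortens the waiting time of the current task and improves context switching efficiency.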
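The stop-mode behaviour can be modelled as a loop that keeps dispatching blocks until the time slice expires, always letting the dispatched block finish. This is a simplified sketch: the function name `run_time_slice` is hypothetical, blocks are assumed to run one at a time with a fixed duration, and times are integers in milliseconds.

```python
def run_time_slice(first_block_id: int, total_blocks: int,
                   slice_ms: int, block_ms: int):
    """Simulate one time slice of in-order block scheduling.

    Returns (last_done, elapsed_ms): the identifier of the most recently
    completed block (reported to the microcontroller) and the time actually used.
    """
    elapsed = 0
    block_id = first_block_id
    last_done = None
    # Dispatch only while the slice has not expired (not yet in stop mode);
    # a block that has been dispatched always runs to completion.
    while block_id < total_blocks and elapsed < slice_ms:
        elapsed += block_ms
        last_done = block_id
        block_id += 1
    return last_done, elapsed
```

With 16 blocks of 10 ms each and a 100 ms slice, the loop reports block 9 as the last completed block; with a 95 ms slice it still reports block 9 but overruns the slice by 5 ms, which is the residual switching delay in this model.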
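For example, if command K1 contains 128*128 processing operations and every 32*32 processing operations form one processing block, K1 is divided into 16 processing blocks. Suppose the operation unit 20 needs 0.16 s in total for K1, that is, 0.01 s per processing block, and a time slice is 0.1 s. With processing blocks, the first 10 blocks of K1 are completed at 0.1 s, and the context can then be switched between virtual machines.
Without this finer-grained division of K1, that is, in the prior art, the end of the time slice must be followed by a wait for K1 to finish: a further 0.06 s passes before the operation unit 20 completes K1. Only then does the operation unit 20 report completion to the microcontroller 10, and only then does the microcontroller 10 switch context to the next virtual machine and execute its commands. The next virtual machine's commands are thus delayed by 0.06 s. With many virtual machines, these delays accumulate across context switches from the first to the (N-1)-th virtual machine, so the N-th virtual machine, far from the first, may stutter.
In the embodiments of the present disclosure, a command to be executed is divided at a finer granularity, so that after the current time slice ends only the processing block currently being processed has to finish before the virtual machine context is switched. This reduces the delay caused by having to finish the whole in-progress command after the time slice ends.
In the above example, the processing block identifiers of K1 can be written, for example, as K1_0, K1_1, ..., K1_15. If the operation unit 20 is processing the block identified as K1_5 when the current time slice ends, it finishes K1_5 and then reports completion to the microcontroller 10, which switches context; the delay is then at most 0.01 s.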
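The delay figures in this example can be checked with a few lines of arithmetic. The helper names below are hypothetical, times are expressed as integer milliseconds to avoid floating-point noise, and blocks are assumed to run strictly one after another.

```python
def whole_command_delay_ms(total_blocks: int, block_ms: int, slice_ms: int) -> int:
    """Extra wait after the slice ends if the whole command must finish first."""
    return max(total_blocks * block_ms - slice_ms, 0)


def block_level_delay_ms(total_blocks: int, block_ms: int, slice_ms: int) -> int:
    """Extra wait after the slice ends if only the in-flight block must finish."""
    if slice_ms >= total_blocks * block_ms:
        return 0  # the command finished within the slice
    into_block = slice_ms % block_ms  # time already spent in the in-flight block
    return 0 if into_block == 0 else block_ms - into_block
```

For K1 (16 blocks of 10 ms) and a 100 ms slice, waiting for the whole command costs 60 ms, whereas waiting only for the in-flight block costs less than one block time, consistent with the "at most 0.01 s" bound stated above.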
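For example, when the operation unit 20 reports completion to the microcontroller 10, it reports the second processing block identifier corresponding to the most recently completed processing block.
To report this identifier, the operation unit 20 finishes the processing task of the processing block currently being executed, takes that block as the most recently completed processing block, and reports the corresponding second processing block identifier to the microcontroller 10.
For example, before K1 has been processed by the operation unit 20, its first processing block identifier is K1_0. When a time slice ends, the operation unit 20 is executing the 10th processing block of K1. The operation unit 20 finishes the 10th processing block of K1 and reports its identifier, K1_9, to the microcontroller 10 as the second processing block identifier of the most recently completed processing block.
Here, when reporting the second processing block identifier, the operation unit 20 may send it to the command distributor, which forwards it to the microcontroller 10. In this way the microcontroller 10 needs no additional interface for data transfer with the operation unit 20, which lowers interface overhead and keeps the operation unit 20 and the microcontroller 10 isolated in hardware. This approach also reuses the existing data channels between the operation unit 20 and the command distributor and between the command distributor and the microcontroller 10, which further increases channel reuse.
After receiving the second processing block identifier reported by the operation unit 20, the microcontroller 10 updates the command to be executed in the command queue based on it.
Here, when updating the command in the command queue, the microcontroller 10 deletes the command from the command queue if the second processing block identifier is the identifier of the last of the multiple processing blocks. If it is not, the microcontroller 10 determines a target processing block identifier based on the second processing block identifier and replaces the first processing block identifier in the command with the target processing block identifier to generate a new command to be executed.
The target processing block identifier is the identifier of the processing block that follows the most recently completed processing block.
After updating the command in the command queue, if the first processing block identifier was replaced with the target processing block identifier, the microcontroller 10 stores the new command to be executed in the target cache corresponding to that command.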
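The update rule can be written as a small pure function. The names `Command` and `update_command` are hypothetical, and the sketch assumes that blocks are executed in order with consecutive integer identifiers, so that "the next block" is simply the reported identifier plus one.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Command:
    name: str
    total_blocks: int
    first_block_id: int = 0


def update_command(cmd: Command, second_block_id: int) -> Optional[Command]:
    """Apply the microcontroller's update after a time slice.

    Returns None when the reported block was the last one (the command is
    deleted from the queue); otherwise returns a new command whose first
    processing block identifier points at the next block.
    """
    if second_block_id == cmd.total_blocks - 1:
        return None
    return replace(cmd, first_block_id=second_block_id + 1)
```

A non-`None` result corresponds to the new command that is then stored in the target cache until the stream's next time slice.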
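After storing the new command to be executed in the corresponding target cache, the microcontroller 10 takes the next time slice as the new current time slice, and reads the command of another virtual machine from the target cache corresponding to the new current time slice or acquires that virtual machine's command from the host, which completes the context switch between virtual machines.
In addition, the commands issued by the host may belong to multiple command streams that have an associated execution order. In one implementation, the command to be executed carries a starting processing block identifier indicating the next processing block to be executed among the command's processing blocks. In another implementation, the processing result of one command can be set as the operand of the next. For example, the commands of the first command stream perform target recognition on an image, and the commands of the second command stream determine, once the recognition result is available, the pose information of the recognized target objects in the image. In the second command stream, the pixel addresses contained in the processing blocks of the command to be executed can then be set to the pixel addresses of the image obtained when the first command stream finishes. The command processing apparatus can therefore handle not only multiple images of one virtual machine but also consecutive processing tasks that multiple virtual machines perform on the same image.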
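The embodiments of the present disclosure further provide a concrete example of command processing with the command processing apparatus. The example involves virtual machines VM1 and VM2. One of VM1's command streams, s1, includes command K1; one of VM2's command streams, s2, includes command K2. K1 includes 128*128 processing operations divided into 4*4 processing blocks with identifiers K1_0 to K1_15. K2 includes 64*128 processing operations divided into 2*4 processing blocks with identifiers K2_0 to K2_7.
(1): When the first time slice t1 arrives, the microcontroller pulls command K1 from the buffer RBUF1 corresponding to VM1 and delivers it to the command queue SQ1 corresponding to VM1; at this point the first processing block identifier carried in K1 is K1_0.
The command distributor fetches K1 from SQ1 and distributes it to the operation unit.
The operation unit parses the first processing block identifier K1_0 from K1, takes the first of the 4*4 processing blocks as the initial processing block, and processes it.
(2): When time slice t1 ends and the second time slice t2 begins, the operation unit has finished K1_0 to K1_5 and is executing K1_6. The operation unit finishes K1_6 and reports K1_6 to the command distributor as the second processing block identifier.
The command distributor reports the second processing block identifier K1_6 to the microcontroller.
Based on K1_6, the microcontroller determines K1_7 as the target processing block identifier, replaces the first processing block identifier K1_0 carried in K1 with K1_7 to generate a new command K1', and stores K1' in the target cache corresponding to command stream s1.
After storing K1' in the target cache corresponding to s1, the microcontroller pulls command K2 from the buffer RBUF2 corresponding to VM2 and delivers it to the command queue SQ1 corresponding to VM2; at this point the first processing block identifier carried in K2 is K2_0.
The command distributor fetches K2 from SQ1 and distributes it to the operation unit.
The operation unit parses the first processing block identifier K2_0 from K2, takes the first of the 2*4 processing blocks as the initial processing block, and processes it.
(3): When time slice t2 ends and the third time slice t3 begins, the operation unit has finished K2_0 to K2_3 and is executing K2_4. The operation unit finishes K2_4 and reports K2_4 to the command distributor as the second processing block identifier.
The command distributor reports the second processing block identifier K2_4 to the microcontroller.
Based on K2_4, the microcontroller determines K2_5 as the target processing block identifier, replaces the first processing block identifier K2_0 carried in K2 with K2_5 to generate a new command K2', and stores K2' in the target cache corresponding to command stream s2.
The microcontroller reads K1' from the target cache corresponding to s1 and delivers it to the command queue SQ1 corresponding to VM1; at this point the first processing block identifier carried in K1' is K1_7.
The command distributor fetches K1' from SQ1 and distributes it to the operation unit.
The operation unit parses the first processing block identifier K1_7 from K1', takes the 8th of the 4*4 processing blocks as the initial processing block, and processes it.
(4): When time slice t3 ends and the fourth time slice t4 begins, the operation unit has finished K1_7 to K1_12 and is executing K1_13. The operation unit finishes K1_13 and reports K1_13 to the command distributor as the second processing block identifier.
The command distributor reports the second processing block identifier K1_13 to the microcontroller.
Based on K1_13, the microcontroller determines K1_14 as the target processing block identifier, replaces the first processing block identifier K1_7 carried in K1' with K1_14 to generate a new command K1'', and stores K1'' in the target cache corresponding to command stream s1.
The microcontroller reads K2' from the target cache corresponding to s2 and delivers it to the command queue SQ1 corresponding to VM2; at this point the first processing block identifier carried in K2' is K2_5.
The command distributor fetches K2' from SQ1 and distributes it to the operation unit.
The operation unit parses the first processing block identifier K2_5 from K2', takes the 6th of the 2*4 processing blocks as the initial processing block, and processes it.
(5): When time slice t4 ends and the fifth time slice t5 begins, the operation unit has finished K2_5 to K2_7. It reports K2_7 to the command distributor as the second processing block identifier.
The command distributor reports the second processing block identifier K2_7 to the microcontroller.
Based on K2_7, the microcontroller determines that command K2 has been fully executed and deletes it from the command queue SQ1.
At this point the microcontroller can monitor RBUF2, which corresponds to command stream s2; if there is a new command, it is pulled into the command queue SQ1 or into the target cache corresponding to s2.
(6): When time slice t5 ends and the sixth time slice t6 begins, the microcontroller reads K1'' from the target cache corresponding to s1 and delivers it to the command queue SQ1 corresponding to VM1; at this point the first processing block identifier carried in K1'' is K1_14.
The command distributor fetches K1'' from SQ1 and distributes it to the operation unit.
The operation unit parses the first processing block identifier K1_14 from K1'', takes the 15th of the 4*4 processing blocks as the initial processing block, and processes it.
(7): When time slice t6 ends and the seventh time slice t7 begins, the operation unit has finished K1_14 to K1_15. It reports K1_15 to the command distributor as the second processing block identifier.
The command distributor reports the second processing block identifier K1_15 to the microcontroller.
Based on K1_15, the microcontroller determines that command K1 has been fully executed and deletes it from the command queue SQ1.
At this point the microcontroller can monitor RBUF1, which corresponds to command stream s1; if there is a new command, it is pulled into the command queue SQ1 or into the target cache corresponding to s1.
Through the above process, VM1 and VM2 share the command processing apparatus by time-division multiplexing.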
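The sequence of first and second processing block identifiers in this walkthrough can be reproduced with a short round-robin simulation. It is a simplification under stated assumptions: `simulate` is a hypothetical helper, each command is assumed to complete a fixed number of blocks per turn (7 for K1 and 5 for K2, chosen to match the figures above), and turns are listed in order without modelling slice numbering or idle slices.

```python
from collections import deque


def simulate(commands: dict, blocks_per_slice: dict) -> list:
    """Round-robin over commands; returns (name, first_id, second_id) per turn.

    commands maps a command name to its total number of processing blocks;
    blocks_per_slice maps it to the number of blocks completed in one turn.
    """
    target_cache = {name: 0 for name in commands}  # saved first block identifiers
    order = deque(commands)                        # round-robin order of the streams
    trace = []
    while order:
        name = order.popleft()
        first = target_cache[name]
        last = min(first + blocks_per_slice[name], commands[name]) - 1
        trace.append((name, first, last))
        if last == commands[name] - 1:
            del target_cache[name]          # last block done: command is deleted
        else:
            target_cache[name] = last + 1   # store the updated command for later
            order.append(name)
    return trace
```

Calling `simulate({"K1": 16, "K2": 8}, {"K1": 7, "K2": 5})` yields the turns (K1, 0, 6), (K2, 0, 4), (K1, 7, 13), (K2, 5, 7), (K1, 14, 15), matching the identifiers K1_6, K2_4, K1_13, K2_7, and K1_15 reported in the steps above.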
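FIG. 3 is a schematic diagram of a command processing apparatus provided by another embodiment of the present disclosure, in which: 31 denotes the microcontroller; 32 denotes multiple command queues, where 321 denotes command queue SQ1 and 322 denotes command queue SQ2; 33 denotes the command distributor; 34 denotes the operation units, including operation unit 0, operation unit 1, ..., and operation unit n; 35 denotes the target caches corresponding to the command streams, where 351 denotes the target cache for command stream s1 and 352 the target cache for command stream s2. For any command stream, the commands stored in the corresponding target cache are denoted eSQ1, eSQ2, ..., and eSQm.
FIG. 4 is a flowchart of a command processing method provided by an embodiment of the present disclosure, which includes:
S401: the microcontroller acquires a command to be executed after the current time slice arrives; the command carries a first processing block identifier indicating, among the multiple processing blocks corresponding to the command, the processing block that needs to be processed in the current time slice;
S402: the operation unit acquires the command to be executed and, based on the first processing block identifier, executes the processing task corresponding to the command.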
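In an optional implementation, the command processing apparatus further includes a command distributor;
the microcontroller acquiring the command to be executed after the current time slice arrives includes:
the microcontroller acquiring the command to be executed after the current time slice arrives and storing it in a command queue;
the command processing method further includes:
the command distributor fetching the command to be executed from the command queue and distributing it to the operation unit;
the operation unit acquiring the command to be executed and executing the corresponding processing task based on the first processing block identifier includes:
the operation unit acquiring the command to be executed and, after receiving it from the command distributor, executing the corresponding processing task based on the first processing block identifier.
In an optional implementation, executing the processing task based on the first processing block identifier includes: the operation unit determining, based on the first processing block identifier, the starting processing block to be executed in the current time slice from among the multiple processing blocks, and executing the processing task corresponding to the command from that starting processing block.
In an optional implementation, the microcontroller acquiring the command to be executed after the current time slice arrives includes: the microcontroller reading the command from the target cache corresponding to the current time slice;
or acquiring the command to be executed from the host.
In an optional implementation, the microcontroller acquiring the command to be executed after the current time slice arrives includes: the microcontroller determining whether the command queue is idle;
if the command queue is idle, monitoring whether the buffer stores the command to be executed;
on detecting that the buffer holds the command to be executed, reading it from the buffer.
In an optional implementation, the buffer includes a ring buffer; the ring buffer has multiple entries and stores the commands to be executed of different command streams through different entries;
the microcontroller acquiring the command to be executed after the current time slice arrives includes:
the microcontroller determining a target entry in the buffer based on the command stream corresponding to the current time slice, and, based on the determined target entry, monitoring whether the buffer stores the command to be executed.
In an optional implementation, the microcontroller acquiring the command to be executed after the current time slice arrives includes: the microcontroller determining whether the target cache corresponding to the current time slice holds a command to be executed;
if the target cache corresponding to the current time slice holds no command to be executed, acquiring a command to be executed from the host.
In an optional implementation, the microcontroller acquiring the command to be executed after the current time slice arrives includes: the microcontroller reading the command from the target cache when the target cache corresponding to the current time slice holds that command.
In an optional implementation, the method further includes: the operation unit reporting to the microcontroller, after the current time slice ends, the second processing block identifier corresponding to the most recently completed processing block;
the microcontroller updating the command to be executed in the command queue based on the second processing block identifier after receiving it from the operation unit.
In an optional implementation, the operation unit reporting the second processing block identifier after the current time slice ends includes: the operation unit finishing the processing task of the processing block currently being executed, taking that block as the most recently completed processing block, and reporting the corresponding second processing block identifier to the microcontroller.
In an optional implementation, the method further includes: the operation unit sending the second processing block identifier to the command distributor;
the command distributor sending the second processing block identifier to the microcontroller.
In an optional implementation, updating the command to be executed in the command queue based on the second processing block identifier includes: the microcontroller deleting the command from the command queue if the second processing block identifier is the identifier of the last of the multiple processing blocks;
if the second processing block identifier is not the identifier of the last of the multiple processing blocks, determining a target processing block identifier based on the second processing block identifier and replacing the first processing block identifier in the command with the target processing block identifier to generate a new command to be executed;
wherein the target processing block identifier is the identifier of the processing block that follows the most recently completed processing block.
In an optional implementation, after generating the new command to be executed, the microcontroller stores it in the target cache corresponding to the command to be executed.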
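FIG. 5 is a flowchart of a command processing method provided by another embodiment of the present disclosure, which includes:
S501: in response to the end of the current time slice, the operation unit reports to the microcontroller the first processing block identifier of the current processing block of the target command executed in that time slice, where the current processing block is any one of at least one processing block in the target command;
S502: after receiving the first processing block identifier reported by the operation unit, the microcontroller updates the target command using the first processing block identifier.
An embodiment of the present disclosure further provides an electronic device, including a host, a buffer, and a command processing apparatus. The host is configured to issue commands to be executed, the buffer is configured to store them, and the command processing apparatus is configured to perform the method described in any command processing method embodiment of the present disclosure.
The command processing apparatus provided by the embodiments of the present disclosure may include a chip, an AI chip, and the like. The electronic device provided by the embodiments of the present disclosure may include a smart terminal such as a mobile phone, or another device, server, or the like that has a camera and can perform image processing; no limitation is imposed here.
An embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a microcontroller and an operation unit, implements the method described in any command processing method embodiment of the present disclosure.
Those skilled in the art will understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, the embodiments of the present disclosure also provide a command processing method corresponding to the command processing apparatus. Because the method solves the problem on a principle similar to that of the command processing apparatus described above, the implementation of the method may refer to the implementation of the apparatus, and repeated details are omitted.
An embodiment of the present disclosure further provides a computer program product carrying program code; the commands included in the program code can be used to perform the steps of the command processing method described in the above method embodiments. For details, refer to the above method embodiments, which are not repeated here.
The above computer program product may be implemented in hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided by the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through communication interfaces, apparatuses, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented as software functional units and sold or used as stand-alone products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several commands that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
Finally, it should be noted that the above embodiments are only specific implementations of the present disclosure, intended to illustrate rather than limit its technical solutions, and the scope of protection of the present disclosure is not limited to them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed here, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent replacements of some of their technical features. Such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all fall within the scope of protection of the present disclosure. The scope of protection of the present disclosure shall therefore be subject to the scope of protection of the claims.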

Claims (30)

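  1. A command processing apparatus, characterized by comprising: a microcontroller and a computing unit;
    wherein the microcontroller is configured to acquire a command to be executed after a current time slice arrives; the command to be executed carries a first processing block identifier indicating, among a plurality of processing blocks corresponding to the command to be executed, the processing block to be processed in the current time slice;
    the computing unit is configured to acquire the command to be executed and, based on the first processing block identifier, execute a processing task corresponding to the command to be executed.
  2. The command processing apparatus according to claim 1, characterized in that the command processing apparatus further comprises a command dispatcher;
    the microcontroller is configured to acquire the command to be executed after the current time slice arrives and store the command to be executed in a command queue;
    the command dispatcher is configured to acquire the command to be executed from the command queue and dispatch the command to be executed to the computing unit;
    the computing unit is configured to acquire the command to be executed and, after receiving the command to be executed dispatched by the command dispatcher, execute the processing task corresponding to the command to be executed based on the first processing block identifier.
  3. The command processing apparatus according to claim 1 or 2, characterized in that the computing unit is configured to:
    determine, based on the first processing block identifier, a starting processing block to be executed in the current time slice from among the plurality of processing blocks, and
    execute, based on the starting processing block, the processing task corresponding to the command to be executed.
  4. The command processing apparatus according to any one of claims 1-3, characterized in that the microcontroller is configured to:
    read the command to be executed from a target cache corresponding to the current time slice;
    or acquire the command to be executed from a host.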
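  5. The command processing apparatus according to claim 1, characterized in that, when acquiring the command to be executed from a host, the microcontroller is configured to:
    determine whether the command queue is idle;
    if the command queue is idle, monitor whether the command to be executed is stored in a buffer;
    upon detecting that the command to be executed exists in the buffer, read the command to be executed from the buffer.
  6. The command processing apparatus according to claim 5, characterized in that the buffer comprises a ring buffer; the ring buffer has a plurality of entries; the ring buffer stores commands to be executed of different command streams through different entries;
    the microcontroller is configured to:
    determine a target entry in the buffer based on the command stream corresponding to the current time slice;
    monitor, based on the determined target entry, whether the command to be executed is stored in the buffer.
  7. The command processing apparatus according to any one of claims 1-6, characterized in that the microcontroller is configured to:
    determine whether a command to be executed exists in the target cache corresponding to the current time slice;
    if no command to be executed exists in the target cache corresponding to the current time slice, acquire the command to be executed from the host.
  8. The command processing apparatus according to any one of claims 1-6, characterized in that the microcontroller is further configured to: if the command to be executed exists in the target cache corresponding to the current time slice, read the command to be executed from the target cache.
  9. The command processing apparatus according to any one of claims 1-8, characterized in that the computing unit is further configured to report to the microcontroller, after the current time slice ends, a second processing block identifier corresponding to the most recently completed processing block;
    the microcontroller is further configured to update, after receiving the second processing block identifier reported by the computing unit, the command to be executed in the command queue based on the second processing block identifier.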
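  10. The command processing apparatus according to claim 9, characterized in that the computing unit is configured to:
    after finishing the processing task corresponding to the processing block currently being executed, treat the processing block currently being executed as the most recently completed processing block, and report to the microcontroller the second processing block identifier corresponding to the most recently completed processing block.
  11. The command processing apparatus according to claim 9 or 10, characterized in that the computing unit is configured to send the second processing block identifier to a command dispatcher;
    the command dispatcher is further configured to send the second processing block identifier to the microcontroller.
  12. The command processing apparatus according to any one of claims 9-11, characterized in that the microcontroller is configured to:
    if the second processing block identifier is the processing block identifier of the last of the plurality of processing blocks, delete the command to be executed from the command queue;
    if the second processing block identifier is not the processing block identifier of the last of the plurality of processing blocks, determine a target processing block identifier based on the second processing block identifier, and replace the first processing block identifier in the command to be executed with the target processing block identifier to generate a new command to be executed;
    wherein the target processing block identifier is the processing block identifier of the processing block following the most recently completed processing block.
  13. The command processing apparatus according to claim 12, characterized in that the microcontroller is further configured to store, after generating the new command to be executed, the new command to be executed in a target cache corresponding to the command to be executed.
  14. A command processing apparatus, characterized by comprising: a microcontroller and a computing unit;
    the computing unit is configured to report to the microcontroller, in response to the end of a current time slice, a first processing block identifier of a current processing block of a target command executed in the current time slice; wherein the current processing block is any one of at least one processing block in the target command;
    the microcontroller is configured to update, after receiving the first processing block identifier reported by the computing unit, the target command using the first processing block identifier.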
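  15. A command processing method, characterized in that it is applied to a command processing apparatus comprising a microcontroller and a computing unit; the command processing method comprises:
    the microcontroller acquiring a command to be executed after a current time slice arrives; the command to be executed carries a first processing block identifier indicating, among a plurality of processing blocks corresponding to the command to be executed, the processing block to be processed in the current time slice;
    the computing unit acquiring the command to be executed and, based on the first processing block identifier, executing a processing task corresponding to the command to be executed.
  16. The command processing method according to claim 15, characterized in that the command processing apparatus further comprises a command dispatcher;
    the microcontroller acquiring the command to be executed after the current time slice arrives comprises:
    the microcontroller acquiring the command to be executed after the current time slice arrives and storing the command to be executed in a command queue;
    the command processing method further comprises:
    the command dispatcher acquiring the command to be executed from the command queue and dispatching the command to be executed to the computing unit;
    the computing unit acquiring the command to be executed and, based on the first processing block identifier, executing the processing task corresponding to the command to be executed comprises:
    the computing unit acquiring the command to be executed and, after receiving the command to be executed dispatched by the command dispatcher, executing the processing task corresponding to the command to be executed based on the first processing block identifier.
  17. The command processing method according to claim 15 or 16, characterized in that executing the processing task corresponding to the command to be executed based on the first processing block identifier comprises: the computing unit determining, based on the first processing block identifier, a starting processing block to be executed in the current time slice from among the plurality of processing blocks, and executing, based on the starting processing block, the processing task corresponding to the command to be executed.
  18. The command processing method according to any one of claims 15-17, characterized in that the microcontroller acquiring the command to be executed after the current time slice arrives comprises: the microcontroller reading the command to be executed from a target cache corresponding to the current time slice;
    or acquiring the command to be executed from a host.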
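  19. The command processing method according to claim 15, characterized in that the microcontroller acquiring the command to be executed after the current time slice arrives comprises: the microcontroller determining whether the command queue is idle;
    if the command queue is idle, monitoring whether the command to be executed is stored in a buffer;
    upon detecting that the command to be executed exists in the buffer, reading the command to be executed from the buffer.
  20. The command processing method according to claim 19, characterized in that the buffer comprises a ring buffer; the ring buffer has a plurality of entries; the ring buffer stores commands to be executed of different command streams through different entries;
    the microcontroller acquiring the command to be executed after the current time slice arrives comprises:
    the microcontroller determining a target entry in the buffer based on the command stream corresponding to the current time slice, and monitoring, based on the determined target entry, whether the command to be executed is stored in the buffer.
  21. The command processing method according to any one of claims 15-20, characterized in that the microcontroller acquiring the command to be executed after the current time slice arrives comprises: the microcontroller determining whether a command to be executed exists in the target cache corresponding to the current time slice;
    if no command to be executed exists in the target cache corresponding to the current time slice, acquiring the command to be executed from the host.
  22. The command processing method according to any one of claims 15-20, characterized in that the microcontroller acquiring the command to be executed after the current time slice arrives comprises: if the command to be executed exists in the target cache corresponding to the current time slice, the microcontroller reading the command to be executed from the target cache.
  23. The command processing method according to any one of claims 15-22, characterized by further comprising: after the current time slice ends, the computing unit reporting to the microcontroller a second processing block identifier corresponding to the most recently completed processing block;
    after receiving the second processing block identifier reported by the computing unit, the microcontroller updating the command to be executed in the command queue based on the second processing block identifier.
  24. The command processing method according to claim 23, characterized in that the computing unit reporting to the microcontroller, after the current time slice ends, the second processing block identifier corresponding to the most recently completed processing block comprises: after finishing the processing task corresponding to the processing block currently being executed, the computing unit treating the processing block currently being executed as the most recently completed processing block and reporting to the microcontroller the second processing block identifier corresponding to the most recently completed processing block.
  25. The command processing method according to claim 23 or 24, characterized by further comprising: the computing unit sending the second processing block identifier to the command dispatcher;
    the command dispatcher sending the second processing block identifier to the microcontroller.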
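  26. The command processing method according to any one of claims 23-25, characterized in that updating the command to be executed in the command queue based on the second processing block identifier comprises: if the second processing block identifier is the processing block identifier of the last of the plurality of processing blocks, the microcontroller deleting the command to be executed from the command queue;
    if the second processing block identifier is not the processing block identifier of the last of the plurality of processing blocks, determining a target processing block identifier based on the second processing block identifier, and replacing the first processing block identifier in the command to be executed with the target processing block identifier to generate a new command to be executed;
    wherein the target processing block identifier is the processing block identifier of the processing block following the most recently completed processing block.
  27. The command processing method according to claim 26, characterized by further comprising: after generating the new command to be executed, the microcontroller storing the new command to be executed in a target cache corresponding to the command to be executed.
  28. A command processing method, characterized in that it is applied to a command processing apparatus comprising a microcontroller and a computing unit;
    the command processing method comprises:
    in response to the end of a current time slice, the computing unit reporting to the microcontroller a first processing block identifier of a current processing block of a target command executed in the current time slice; wherein the current processing block is any one of at least one processing block in the target command;
    after receiving the first processing block identifier reported by the computing unit, the microcontroller updating the target command using the first processing block identifier.
  29. An electronic device, characterized by comprising a host, a buffer, and a command processing apparatus;
    the host is configured to issue a command to be executed;
    the buffer is configured to store the command to be executed;
    the command processing apparatus is configured to perform the command processing method according to any one of claims 15 to 28.
  30. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a microcontroller and a computing unit, implements the command processing method according to any one of claims 15 to 28.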
PCT/CN2021/108430 2021-01-29 2021-07-26 Command processing apparatus, method, electronic device, and computer-readable storage medium WO2022160628A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110130200.9A CN114816777A (zh) 2021-01-29 2021-01-29 Command processing apparatus, method, electronic device, and computer-readable storage medium
CN202110130200.9 2021-01-29

Publications (1)

Publication Number Publication Date
WO2022160628A1 true WO2022160628A1 (zh) 2022-08-04

Family

ID=82526771

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/108430 WO2022160628A1 (zh) 2021-01-29 2021-07-26 Command processing apparatus, method, electronic device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN114816777A (zh)
WO (1) WO2022160628A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
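CN115878521B (zh) * 2023-01-17 2023-07-21 北京象帝先计算技术有限公司 Command processing system, electronic apparatus, and electronic device
CN116521376B (zh) * 2023-06-29 2023-11-21 南京砺算科技有限公司 Resource scheduling method and apparatus for a physical graphics card, storage medium, and terminal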

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103262037A (zh) * 2010-12-13 2013-08-21 超威半导体公司 Accessibility of graphics processing compute resources
US20180293701A1 (en) * 2017-04-07 2018-10-11 Abhishek R. Appu Apparatus and method for dynamic provisioning, quality of service, and prioritization in a graphics processor
CN110046053A (zh) * 2019-04-19 2019-07-23 上海兆芯集成电路有限公司 Processing system for distributing tasks and memory access method thereof
CN110083388A (zh) * 2019-04-19 2019-08-02 上海兆芯集成电路有限公司 Processing system for scheduling and memory access method thereof
CN111708639A (zh) * 2020-06-22 2020-09-25 中国科学技术大学 Task scheduling system and method, storage medium, and electronic device

Also Published As

Publication number Publication date
CN114816777A (zh) 2022-07-29

Similar Documents

Publication Publication Date Title
CN110489213B (zh) Task processing method and processing apparatus, and computer system
US9678497B2 (en) Parallel processing with cooperative multitasking
US11550627B2 (en) Hardware accelerated dynamic work creation on a graphics processing unit
US8209690B2 (en) System and method for thread handling in multithreaded parallel computing of nested threads
US9009711B2 (en) Grouping and parallel execution of tasks based on functional dependencies and immediate transmission of data results upon availability
US9898338B2 (en) Network computer system and method for dynamically changing execution sequence of application programs
WO2022160628A1 (zh) Command processing apparatus, method, electronic device, and computer-readable storage medium
WO2006059543A1 (ja) Scheduling method, scheduling device, and multiprocessor system
US10402223B1 (en) Scheduling hardware resources for offloading functions in a heterogeneous computing system
CN113051057A (zh) Lock-free multithreaded data processing method and apparatus, and electronic device
US20150287159A1 (en) Process synchronization between engines using data in a memory location
US20210311782A1 (en) Thread scheduling for multithreaded data processing environments
WO2016202153A1 (zh) GPU resource allocation method and system
CN114116155A (zh) Lock-free work-stealing thread scheduler
Bautin et al. Graphic engine resource management
JP6372262B2 (ja) Printing apparatus and program
CN110245024B (zh) Dynamic allocation system and method for static memory blocks
JP7122299B2 (ja) Method, apparatus, device, and storage medium for executing processing tasks
JP2002287957A (ja) Method and apparatus for speeding up the operand access stage in CPU design using a cache-like structure
CN108845969B (zh) Operation control method and operating system for incompletely symmetric multiprocessing microcontrollers
US20220300322A1 (en) Cascading of Graph Streaming Processors
CN115618966A (zh) Method, apparatus, device, and medium for training a machine learning model
US9015719B2 (en) Scheduling of tasks to be performed by a non-coherent device
JP2021060707A (ja) Synchronization control system and synchronization control method
US20170357540A1 (en) Dynamic range-based messaging

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21922225; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21922225; Country of ref document: EP; Kind code of ref document: A1)