WO2021179224A1 - Data processing device, data processing method and accelerator - Google Patents

Data processing device, data processing method and accelerator

Info

Publication number
WO2021179224A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
control instruction
module
control
data
Prior art date
Application number
PCT/CN2020/078876
Other languages
French (fr)
Chinese (zh)
Inventor
韩峰
李鹏
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to CN202080004332.0A (publication CN112602094A)
Priority to PCT/CN2020/078876 (publication WO2021179224A1)
Publication of WO2021179224A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • This application relates to the field of computer data processing, and in particular to a data processing device, a data processing method, and an accelerator.
  • A convolutional neural network (CNN) is a complex, non-linear hypothesis model.
  • Its model parameters are obtained through training, and it has the ability to fit data.
  • Convolutional neural network algorithms can be applied to scenarios such as machine vision and natural language processing.
  • When a CNN algorithm is implemented in an embedded system, the large resource consumption of neural network processing means that computing resources and real-time performance must be fully considered. It is therefore necessary to improve the utilization of computing resources in neural network processing.
  • One of the objectives of the embodiments of the present application is to provide a data processing device, a data processing method, and an accelerator.
  • According to a first aspect, a data processing device is provided, which includes a control module, a data loading module, and a processing module.
  • The data loading module, in response to a control instruction of the control module, loads the data to be processed for processing by the processing module.
  • The processing module, in response to a control instruction of the control module, processes the data to be processed.
  • The control module controls the data loading module and the processing module to execute different control instructions at the same time.
  • A data processing method is provided, which is applied to a data processing device that includes a data loading module and a processing module. The method includes:
  • performing data processing by the processing module, wherein the data loading module and the processing module execute different control instructions at the same time.
  • An accelerator is provided, including the device described in any implementation of the first aspect.
  • The control module controls the data loading module and the processing module to execute different control instructions at the same time. While the processing module processes the data to be processed corresponding to the current control instruction, the data loading module may be loading the data to be processed corresponding to the next control instruction, which helps improve the utilization of processing resources and avoids the waste of processing resources caused by waiting.
  • Fig. 1 is a schematic structural diagram of a first data processing device according to an exemplary embodiment of the present application.
  • Fig. 2 is a schematic structural diagram of a second data processing device according to an exemplary embodiment of the present application.
  • Fig. 3 is a schematic diagram showing an execution flow of a control instruction according to an exemplary embodiment of the present application.
  • Fig. 4 is a schematic structural diagram of a third data processing device according to an exemplary embodiment of the present application.
  • Fig. 5 is a schematic structural diagram of a fourth data processing device according to an exemplary embodiment of the present application.
  • Fig. 6 is a schematic diagram showing the execution of the first control instruction according to an exemplary embodiment of the present application.
  • Fig. 7 is a schematic diagram showing the execution of a second type of control instruction according to an exemplary embodiment of the present application.
  • Fig. 8 is a schematic structural diagram of a fifth data processing device according to an exemplary embodiment of the present application.
  • Fig. 9 is a schematic diagram showing an execution flow of a control instruction according to an exemplary embodiment of the present application.
  • Fig. 10 is a schematic diagram showing the execution of a third control instruction according to an exemplary embodiment of the present application.
  • Fig. 11 is a schematic flowchart of a data processing method according to an exemplary embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of a first data processing apparatus according to an exemplary embodiment of the present application.
  • The device includes a control module 11, a data loading module 12, and a processing module 13.
  • The number of data loading modules is not limited to one; there may be multiple data loading modules.
  • For example, the data processing device includes two data loading modules, namely a feature map loading module and a weight loading module.
  • The data loading module 12, in response to a control instruction of the control module 11, loads the data to be processed for the processing module 13 to process.
  • The processing module 13, in response to a control instruction of the control module 11, processes the data to be processed.
  • The control module 11 controls the data loading module 12 and the processing module 13 to execute different control instructions at the same time.
  • The control module 11 controls the data loading module 12 to process a control instruction in advance, without waiting for the processing of the previous control instruction to finish in the entire data processing device.
  • For example, without waiting for the processing module 13 to finish processing control instruction x0, the control module 11 can control the data loading module 12 to perform the operation corresponding to control instruction x1 in advance, where control instruction x0 and control instruction x1 are control instructions executed in sequence.
  • The control module 11 receives a control instruction from an external module and sends the control instruction to the data loading module 12 and the processing module 13. The data loading module 12, in response to the control instruction, loads the data to be processed for processing by the processing module 13, and the processing module 13, in response to the control instruction, processes the data to be processed. To further improve the overall utilization of processing resources, the control module 11 can control the data loading module 12 and the processing module 13 to execute different control instructions at the same time. After the data loading module 12 has loaded the data to be processed corresponding to the current control instruction, it can directly load the data corresponding to the next control instruction under the control of the control module 11. That is, while the processing module 13 processes the data to be processed corresponding to the current control instruction, the data loading module 12 is loading the data to be processed corresponding to the next control instruction, which helps improve the utilization of processing resources and avoids the waste of processing resources caused by waiting.
  • The control module 11 sends the (i+1)-th control instruction to the data loading module 12 in response to the data loading module 12 completing the execution of the i-th control instruction, and sends the (i+1)-th control instruction to the processing module 13 in response to the processing module 13 completing the execution of the i-th control instruction, where i is an integer.
  • Completion of the execution of the i-th control instruction by the data loading module 12 means that the data loading module 12 has finished loading the data to be processed corresponding to the i-th control instruction. Completion of the execution of the i-th control instruction by the processing module 13 means that the processing module 13 has finished processing the data to be processed corresponding to the i-th control instruction.
  • The data loading module 12 loads the data to be processed corresponding to the i-th control instruction in response to the i-th control instruction.
  • The processing module 13 processes the data to be processed corresponding to the i-th control instruction to obtain a processing result.
  • While the processing module 13 processes the data to be processed corresponding to the i-th control instruction in response to the i-th control instruction, if the data loading module 12 has finished loading the data to be processed corresponding to the i-th control instruction, the data loading module 12 does not need to wait for the processing module 13 to complete the execution of the i-th control instruction. It can directly receive the (i+1)-th control instruction sent by the control module 11 and, in response, load the data to be processed corresponding to the (i+1)-th control instruction. This embodiment further reduces the time the data loading module 12 spends waiting for the next control instruction and avoids the waste of processing resources caused by waiting.
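The dispatch rule described above (a module receives the (i+1)-th control instruction as soon as that same module has finished the i-th one, regardless of the other module's progress) can be sketched as follows. This is only an illustrative model under stated assumptions, not the implementation of the application: the class `ControlModule`, its methods `in_flight` and `on_done`, and the module names `"load"` and `"process"` are hypothetical.

```python
class ControlModule:
    """Hypothetical model of the per-module dispatch rule."""

    def __init__(self, instructions, modules=("load", "process")):
        self.instructions = list(instructions)
        # Index of the instruction each module is currently executing.
        self.current = {name: 0 for name in modules}

    def in_flight(self, module):
        """Return the instruction the module is executing, or None if none is left."""
        i = self.current[module]
        return self.instructions[i] if i < len(self.instructions) else None

    def on_done(self, module):
        """The module finished instruction i, so instruction i+1 is sent to it."""
        self.current[module] += 1
        return self.in_flight(module)


ctrl = ControlModule(["x0", "x1", "x2"])
# The loader finishes x0 and receives x1 while processing is still on x0.
sent_to_loader = ctrl.on_done("load")
snapshot = (ctrl.in_flight("load"), ctrl.in_flight("process"))
```

In this sketch the two modules advance independently, so `snapshot` shows them holding different control instructions at the same moment.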
  • The processing module 13 includes a systolic array. In response to the i-th control instruction, the processing module 13 writes the data to be processed corresponding to the i-th control instruction into the systolic array.
  • The systolic array performs operations on the data to be processed to obtain the processing result.
  • The processing of the data to be processed is thus implemented in hardware, which helps improve processing efficiency.
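For readers unfamiliar with systolic arrays, the following sketch simulates one common arrangement, an output-stationary array computing a matrix product. The text does not specify the dataflow, dimensions, or operation of the systolic array in the processing module 13, so everything here (the function name `systolic_matmul`, the skewed-wavefront timing, the choice of matrix multiplication) is an assumption made for illustration.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b.

    Processing element (i, j) keeps a running sum. With skewed inputs, the
    operand pair a[i][k], b[k][j] reaches that element at cycle t = i + j + k.
    """
    n, depth, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]
    cycles = n + m + depth - 2  # the last operand pair arrives at cycle n+m+depth-3
    for t in range(cycles):
        for i in range(n):
            for j in range(m):
                k = t - i - j
                if 0 <= k < depth:
                    acc[i][j] += a[i][k] * b[k][j]
    return acc, cycles


product, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Each processing element performs at most one multiply-accumulate per cycle, which is the property that makes this structure attractive for hardware.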
  • FIG. 2 is a schematic structural diagram of a second data processing device according to an exemplary embodiment of this application.
  • The device includes a control module 11, a data loading module 12, a processing module 13, and a data write-back module 14.
  • The data loading module 12, in response to a control instruction of the control module 11, loads the data to be processed for the processing module 13 to process.
  • The processing module 13, in response to a control instruction of the control module 11, processes the data to be processed.
  • The data write-back module 14, in response to a control instruction of the control module 11, writes the processing result of the data to be processed into the external storage module.
  • The control module 11 controls the data loading module 12, the processing module 13, and the data write-back module 14 to execute different control instructions at the same time.
  • The control module 11 receives a control instruction from an external module and sends the control instruction to the data loading module 12, the processing module 13, and the data write-back module 14. The data loading module 12, in response to the control instruction, loads the data to be processed for the processing module 13; the processing module 13, in response to the control instruction, processes the data to be processed to obtain a processing result; and the data write-back module 14, in response to the control instruction, writes the processing result into the external storage module.
  • The control module 11 may control the data loading module 12, the processing module 13, and the data write-back module 14 to execute different control instructions at the same time. After the data loading module 12 has loaded the data to be processed corresponding to the current control instruction, it does not need to wait for the processing module 13 to process that data; it can directly load the data to be processed corresponding to the next control instruction under the control of the control module 11. Likewise, after the processing module 13 has processed the data corresponding to the current control instruction, it does not need to wait for the data write-back module 14 to write the corresponding processing result into the external storage module; it can directly process the data to be processed corresponding to the next control instruction under the control of the control module 11. That is, while the data write-back module 14 is writing the processing result corresponding to the previous control instruction, the processing module 13 may be processing the data corresponding to the current control instruction and the data loading module 12 may be loading the data corresponding to the next control instruction, which helps improve the utilization of processing resources and avoids the waste of processing resources caused by waiting.
  • The control module 11 sends the (i+1)-th control instruction to the data loading module 12 in response to the data loading module 12 completing the execution of the i-th control instruction; sends the (i+1)-th control instruction to the processing module 13 in response to the processing module 13 completing the execution of the i-th control instruction; and sends the (i+1)-th control instruction to the data write-back module 14 in response to the data write-back module 14 completing the execution of the i-th control instruction, where i is an integer.
  • Completion of the execution of the i-th control instruction by the data loading module 12 means that the data loading module 12 has finished loading the data to be processed corresponding to the i-th control instruction. Completion by the processing module 13 means that the processing module 13 has finished processing the data to be processed corresponding to the i-th control instruction. Completion by the data write-back module 14 means that the data write-back module 14 has finished writing the processing result corresponding to the i-th control instruction into the external storage module.
  • The data loading module 12 loads the data to be processed corresponding to the i-th control instruction in response to the i-th control instruction.
  • The processing module 13 processes the data to be processed corresponding to the i-th control instruction to obtain a processing result.
  • The data write-back module 14 writes the processing result corresponding to the i-th control instruction into the external storage module.
  • While the processing module 13 processes the data to be processed corresponding to the i-th control instruction in response to the i-th control instruction, if the data loading module 12 has finished loading the data to be processed corresponding to the i-th control instruction, the data loading module 12 can directly receive the (i+1)-th control instruction sent by the control module 11 and load the data to be processed corresponding to the (i+1)-th control instruction. This embodiment further reduces the waiting time of the data loading module 12 and avoids the waste of processing resources caused by waiting.
  • While the data write-back module 14 writes the processing result corresponding to the i-th control instruction into the external storage module in response to the i-th control instruction, if the processing module 13 has finished processing the data to be processed corresponding to the i-th control instruction, the processing module 13 does not need to wait for the data write-back module 14 to complete the execution of the i-th control instruction. It can directly receive the (i+1)-th control instruction sent by the control module 11 and, in response, process the data to be processed corresponding to the (i+1)-th control instruction. This embodiment further reduces the waiting time of the processing module 13 and avoids the waste of processing resources caused by waiting.
  • FIG. 3 is a schematic diagram showing the processing time of the data loading module 12, the processing module 13, and the data write-back module 14. After the data loading module 12 has finished loading the data to be processed corresponding to control instruction 1, it does not need to wait for the processing module 13 to process that data or for the data write-back module 14 to write back the processing result corresponding to control instruction 1; it can directly obtain control instruction 2 and load the data to be processed corresponding to control instruction 2, which further reduces the time the data loading module 12 spends waiting for the next control instruction and avoids the waste of processing resources caused by waiting. Correspondingly, after the processing module 13 has processed the data to be processed corresponding to control instruction 1 and obtained the processing result, it does not need to wait for the data write-back module 14 to write the processing result into the external storage module; it can directly obtain control instruction 2 and process the data to be processed corresponding to control instruction 2, thereby further reducing the time the processing module 13 spends waiting.
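The timing benefit of this overlap can be estimated with a small recurrence. The sketch below is an assumption-laden illustration rather than a model taken from the application: it assumes each module starts an instruction once the module itself is free and the preceding module has finished the same instruction, the function name `pipeline_finish_times` is hypothetical, and the stage durations are made-up numbers.

```python
def pipeline_finish_times(durations):
    """durations[i] = (load, process, write_back) time units for instruction i.

    Returns finish[i][s], the time at which stage s completes instruction i.
    """
    stages = len(durations[0])
    finish = [[0] * stages for _ in durations]
    for i, row in enumerate(durations):
        for s, d in enumerate(row):
            module_free = finish[i - 1][s] if i > 0 else 0  # same module, previous instruction
            input_ready = finish[i][s - 1] if s > 0 else 0  # previous module, same instruction
            finish[i][s] = max(module_free, input_ready) + d
    return finish


# Illustrative durations only: three instructions, each 2 + 3 + 1 time units.
durations = [(2, 3, 1), (2, 3, 1), (2, 3, 1)]
overlapped_total = pipeline_finish_times(durations)[-1][-1]
serial_total = sum(sum(row) for row in durations)
```

With these example numbers the overlapped schedule finishes in 12 time units, against 18 if each instruction had to pass through all three modules before the next one started.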
  • The embodiments of the present application do not restrict the specific type of the external storage module, which can be chosen according to the actual application scenario.
  • For example, it can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • The data processing device can be applied to processing based on a convolutional neural network, a machine learning algorithm that is widely used in computer vision tasks such as target recognition, target detection, and image semantic segmentation.
  • The structure of a neural network usually includes an input layer, one or more hidden layers, and an output layer.
  • The operations in the hidden layers include, but are not limited to, convolution operations, pooling operations, and activation operations.
  • In general, the hidden layers in a convolutional neural network can be named according to the type of operation. For example, a hidden layer that performs convolution operations can be classified as a convolutional layer, a hidden layer that performs pooling operations as a pooling layer, and a hidden layer that performs activation operations as an activation layer.
  • The data processing device provided by the embodiments of the present application can perform the convolution operation of the convolutional layer, the pooling operation of the pooling layer, and the activation operation of the activation layer, so as to accelerate the operation of the deep neural network in hardware, reduce its computation time, and improve computational efficiency.
  • The objects processed by the convolutional neural network include, but are not limited to, images, audio, and text. Different types of hidden layers in the convolutional neural network correspond to different operating parameters: the operating parameter of a convolutional layer is a convolution kernel, the operating parameter of an activation layer is an activation function, and the operating parameter of a pooling layer is a pooling parameter.
  • The following description takes as an example a convolutional neural network applied to image processing, with the data processing device performing the convolution operation of a convolutional layer. The convolution operation of the convolutional layer performs vector inner product operations on the image to be processed based on the convolution kernel to obtain a feature map. In this scenario, the control module 11 sends a convolution operation control instruction to the data loading module 12, the processing module 13, and the data write-back module 14 respectively. The data loading module 12 loads the image to be processed and the convolution kernel in response to the convolution operation control instruction of the control module 11; the processing module 13, in response to the convolution operation control instruction of the control module 11, performs a convolution operation on the image to be processed using the convolution kernel to obtain a feature map; and the data write-back module 14, in response to the convolution operation control instruction of the control module 11, writes the feature map into the external storage module. Here the control module 11 controls the data loading module 12, the processing module 13, and the data write-back module 14 to execute different control instructions at the same time.
  • The data loading module 12 may include an object loading unit and a parameter loading unit.
  • The object loading unit is used to load the image to be processed, and the parameter loading unit is used to load the convolution kernel.
  • The object loading unit and the parameter loading unit load at the same time, which helps improve loading efficiency.
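The convolution operation described above, where each output element is the inner product of the convolution kernel with one patch of the image, can be illustrated in plain Python. This sketch assumes stride 1, no padding, a single channel, and the unflipped-kernel convention common in CNNs; the function name `conv2d_valid` and the sample values are hypothetical, and the code says nothing about how the hardware schedules the work.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and take an inner product at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Inner product of the kernel with the patch whose top-left corner is (r, c).
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        feature_map.append(row)
    return feature_map


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
feature_map = conv2d_valid(image, kernel)
```

In the terms used above, `image` plays the role of the data fetched by the object loading unit and `kernel` the data fetched by the parameter loading unit.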
  • The control module 11 includes at least one instruction slot for caching control instructions.
  • The instruction slot includes an instruction cache flag and a set of control state signals.
  • The instruction cache flag can indicate which control instruction is cached in the instruction slot and whether the cached control instruction is valid.
  • The set of control state signals is used to indicate the working state of the corresponding module, or the working state of other instruction slots related to the instruction slot.
  • The working state refers to whether the processing or operation of the corresponding module is completed, or whether the instruction caching operation of the corresponding instruction slot is completed.
  • In response to the data loading module 12 completing the execution of the i-th control instruction, the control module 11 sends the (i+1)-th control instruction cached in the instruction slot to the data loading module 12; in response to the processing module 13 completing the execution of the i-th control instruction, it sends the (i+1)-th control instruction cached in the instruction slot to the processing module 13; and in response to the data write-back module 14 completing the execution of the i-th control instruction, it sends the (i+1)-th control instruction cached in the instruction slot to the data write-back module 14.
  • The instruction slot can record the state of the control instruction cached in it, and the control module 11 can control the execution of control instructions according to the state recorded in the at least one instruction slot.
  • The data loading module 12, the processing module 13, and the data write-back module 14 each correspond to an instruction slot. The following takes the instruction slot corresponding to the data loading module 12 as an example.
  • The control module 11 can cache the (i+1)-th control instruction in the instruction slot.
  • The state of the control instruction recorded in the instruction slot is then changed to indicate that the (i+1)-th control instruction is valid, which means that the (i+1)-th control instruction has not yet been sent to the data loading module 12.
  • The state of the control instruction recorded in the instruction slot ensures that the control module 11 can cache different control instructions in an orderly manner, so that different control instructions are sent to the data loading module 12, the processing module 13, and the data write-back module 14 in an orderly manner.
  • The state of the control instruction recorded in the instruction slot may be represented by a control instruction status signal, which indicates whether the control instruction cached in the corresponding instruction slot is valid.
  • When the control module 11 caches the (i+1)-th control instruction in the instruction slot, the value of the control instruction status signal is set to a value that represents validity, indicating that the (i+1)-th control instruction cached in the instruction slot is valid.
  • When the data loading module 12, the processing module 13, and the data write-back module 14 each correspond to an instruction slot, whether the control instruction cached in an instruction slot is valid is related to whether the module corresponding to that instruction slot has completed the operation of the control instruction corresponding to the slot.
  • A control instruction completion signal may be used to indicate whether the module corresponding to the at least one instruction slot has completed the operation of the control instruction corresponding to the at least one instruction slot. Here, "whether the module corresponding to the at least one instruction slot has completed the operation of the control instruction corresponding to the at least one instruction slot" refers to whether that module has finished receiving the control instruction corresponding to the instruction slot.
  • The control module 11 includes at least one instruction slot for caching control instructions. The instruction slot includes a control instruction status signal and a control instruction completion signal. When the control instruction completion signal indicates that the corresponding module has completed the operation of the corresponding control instruction, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is invalid; otherwise, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is valid.
  • In this way, the control module 11 can cache different control instructions in an orderly manner, so that different control instructions are sent in sequence to the data loading module 12, the processing module 13, and the data write-back module 14.
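The relationship between the control instruction status signal and the control instruction completion signal can be sketched as a small state holder. This is a software analogy built on assumptions, not the circuit of the application: the class `InstructionSlot`, its fields `valid` and `done`, and the methods `cache` and `complete` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstructionSlot:
    """Hypothetical instruction slot holding one cached control instruction."""
    instruction: Optional[str] = None
    valid: bool = False  # models the control instruction status signal
    done: bool = True    # models the control instruction completion signal

    def cache(self, instruction: str) -> None:
        """Cache a new instruction; only allowed once the previous one is invalid."""
        if self.valid:
            raise RuntimeError("slot still holds a valid control instruction")
        self.instruction, self.valid, self.done = instruction, True, False

    def complete(self) -> None:
        """The module has received the instruction, so the cached copy becomes invalid."""
        self.done = True
        self.valid = False


slot = InstructionSlot()
slot.cache("instruction i")
valid_while_pending = slot.valid
slot.complete()
valid_after_completion = slot.valid
slot.cache("instruction i+1")  # the slot is free again, so the next one can be cached
```

Refusing to overwrite a valid entry is what keeps the instructions in order in this sketch.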
  • The data loading module 12, the processing module 13, and the data write-back module 14 each correspond to an instruction slot, and the caching of a control instruction is passed from slot to slot in the order of the data loading module 12, the processing module 13, and the data write-back module 14.
  • In one example, the control module 11 includes a second instruction slot 111 corresponding to the data loading module 12, a third instruction slot 112 corresponding to the processing module 13, and a fourth instruction slot 113 corresponding to the data write-back module 14. When the i-th control instruction in the second instruction slot 111 is invalid, the (i+1)-th control instruction is cached in the second instruction slot 111. When the i-th control instruction in the third instruction slot 112 is invalid, the (i+1)-th control instruction cached in the second instruction slot 111 is cached in the third instruction slot 112. When the i-th control instruction in the fourth instruction slot 113 is invalid, the (i+1)-th control instruction cached in the third instruction slot 112 is cached in the fourth instruction slot 113. The (i+1)-th control instruction is thus cached in the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113 in sequence.
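The slot-to-slot transfer can be pictured as a three-entry chain in which an instruction moves forward whenever the next slot holds nothing valid. The sketch below is a deliberate simplification under stated assumptions: `None` stands for "invalid", a transfer is modelled as a move, the module-side completion signal is ignored, and the function name `advance` is hypothetical.

```python
def advance(slots, incoming):
    """Advance a chain of slots ordered [111, 112, 113] by one step.

    slots[k] is the cached instruction, or None when the slot holds nothing valid.
    incoming is the list of instructions not yet cached anywhere.
    """
    # Walk from the last slot backwards so each instruction moves at most one slot.
    for k in range(len(slots) - 1, 0, -1):
        if slots[k] is None and slots[k - 1] is not None:
            slots[k], slots[k - 1] = slots[k - 1], None
    # The first slot takes the next instruction once it is free.
    if slots[0] is None and incoming:
        slots[0] = incoming.pop(0)
    return slots


slots = [None, None, None]
incoming = ["i1", "i2", "i3"]
history = [list(advance(slots, incoming)) for _ in range(3)]
```

After three steps the chain holds three different instructions at once, one per slot, mirroring the sequential caching in slots 111, 112, and 113.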
  • control instruction cached in the instruction slot is valid, in addition to whether the module corresponding to the at least one instruction slot completes the operation of the control instruction corresponding to the instruction slot, is also related to whether the at least one instruction slot completes the corresponding control
  • the operation of the instruction is related; the "whether at least one instruction slot completes the operation of the corresponding control instruction" refers to whether the control instruction in the instruction slot is cached in another instruction slot.
  • Two control instruction completion signals can be set: one indicates whether the module corresponding to the at least one instruction slot has completed the operation of the control instruction corresponding to that instruction slot, and the other indicates whether the at least one instruction slot has completed the operation of the corresponding control instruction.
  • The control module 11 includes at least one instruction slot for caching the control instruction; the instruction slot includes a control instruction status signal and at least two control instruction completion signals. When one control instruction completion signal indicates that the corresponding module has completed the operation of the corresponding control instruction, and another control instruction completion signal indicates that the at least one instruction slot has completed the operation of the corresponding control instruction, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is invalid; otherwise, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is valid.
  • This embodiment uses the control instruction status signal and the two control instruction completion signals to ensure that the control module 11 caches different control instructions in an orderly manner and therefore sends different control instructions to the data loading module 12, the processing module 13, and the data write-back module 14 in an orderly manner.
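  • As a minimal sketch of the validity rule described above, the following Python model derives the control instruction status signal from the two control instruction completion signals. The class name `InstructionSlot` and the field names `module_done` and `next_slot_done` are hypothetical and chosen only for illustration; the application does not prescribe any particular implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstructionSlot:
    """Hypothetical model of one instruction slot (names are illustrative)."""
    instruction: Optional[str] = None   # cached (decoded) control instruction
    module_done: bool = True            # the module has received the instruction
    next_slot_done: bool = True         # the instruction was cached in the next slot

    @property
    def valid(self) -> bool:
        # The cached instruction is invalid only when BOTH completion
        # signals indicate completion; otherwise it is still valid.
        return not (self.module_done and self.next_slot_done)

    def cache(self, instruction: str) -> None:
        # A new instruction may only replace an invalid one.
        if self.valid:
            raise RuntimeError("slot still holds a valid control instruction")
        self.instruction = instruction
        self.module_done = False
        self.next_slot_done = False


slot = InstructionSlot()
slot.cache("instr_i")
states = [slot.valid]          # valid right after caching
slot.module_done = True
states.append(slot.valid)      # still valid: not yet cached in the next slot
slot.next_slot_done = True
states.append(slot.valid)      # invalid: both completion signals are set
```

  • In this sketch the slot can accept the (i+1)-th control instruction only after the last state above is reached, which mirrors the ordering guarantee the embodiment relies on.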
  • The control module 11 includes a second instruction slot 111 corresponding to the data loading module 12, a third instruction slot 112 corresponding to the processing module 13, and a fourth instruction slot 113 corresponding to the data write-back module 14.
  • The second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113 are each used to cache the corresponding (i+1)-th control instruction after the control module 11 has sent the i-th control instruction, each at its own time.
  • the instruction slot can cache the instruction control word corresponding to the control instruction after decoding, and the corresponding control instruction does not need to be cached.
  • The second instruction slot 111 is used to cache the (i+1)-th control instruction to be sent to the data loading module 12 after the control module 11 has sent the i-th control instruction; the third instruction slot 112 is used to cache the (i+1)-th control instruction to be sent to the processing module 13 after the control module 11 has sent the i-th control instruction; and the fourth instruction slot 113 is used to cache the (i+1)-th control instruction to be sent to the data write-back module 14 after the control module 11 has sent the i-th control instruction. It should be noted that the time points at which the (i+1)-th control instruction is cached in each of the above instruction slots may differ.
  • Having sent the i-th control instruction means that, after the control module 11 sends the i-th control instruction to the data loading module 12 corresponding to the second instruction slot 111, the processing module 13 corresponding to the third instruction slot 112, or the data write-back module 14 corresponding to the fourth instruction slot 113, the data loading module 12, the processing module 13, or the data write-back module 14 has finished receiving the i-th control instruction.
  • For the second instruction slot 111, the control module 11 caches the (i+1)-th control instruction to the second instruction slot 111 in response to the i-th control instruction in the second instruction slot 111 being invalid, and sends the (i+1)-th control instruction in the second instruction slot 111 to the data loading module 12 in response to the data loading module 12 completing execution of the i-th control instruction.
  • The second instruction slot 111 includes a second valid signal, a data loading module completion signal, and a third instruction slot completion signal. The second valid signal indicates whether the control instruction cached in the second instruction slot 111 is valid; the data loading module completion signal indicates whether the data loading module 12 has finished receiving the control instruction (that is, whether the control module 11 has sent the control instruction to the data loading module 12); and the third instruction slot completion signal indicates whether the third instruction slot 112 has finished caching the control instruction (that is, whether the control module 11 has cached the control instruction in the second instruction slot 111 to the third instruction slot 112).
  • Once the second valid signal indicates invalid, the control module 11 may cache the (i+1)-th control instruction to the second instruction slot 111. When the (i+1)-th control instruction is cached in the second instruction slot 111, it has not yet been sent to the data loading module 12 or cached to the third instruction slot 112; therefore, the state value of the second valid signal is set to a value representing valid, and the state values of the data loading module completion signal and the third instruction slot completion signal are set to values representing not completed.
  • the control module 11 in response to the i-th control instruction in the third instruction slot 112 being invalid, caches the i+1-th control instruction in the second instruction slot 111 to all The third instruction slot 112; and in response to the processing module 13 completing the execution of the i-th control instruction, the i+1th control instruction in the third instruction slot 112 is sent to the processing module 13.
  • the third instruction slot 112 includes a third valid signal, a processing module completion signal, and a fourth instruction slot 113 completion signal.
  • The third valid signal indicates whether the control instruction cached in the third instruction slot 112 is valid; the processing module completion signal indicates whether the processing module 13 has finished receiving the control instruction (that is, whether the control module 11 has sent the control instruction to the processing module 13); and the fourth instruction slot 113 completion signal indicates whether the fourth instruction slot 113 has finished caching the control instruction (that is, whether the control module 11 has cached the control instruction in the third instruction slot 112 to the fourth instruction slot 113).
  • When the state values of the processing module completion signal and the fourth instruction slot 113 completion signal are both set to values representing completion, the state value of the third valid signal is set to a value representing invalid, and the control module 11 may then cache the (i+1)-th control instruction to the third instruction slot 112. When the (i+1)-th control instruction is cached to the third instruction slot 112, it has not yet been sent to the processing module 13 or cached to the fourth instruction slot 113; therefore, the state value of the third valid signal is set to a value representing valid, and the state values of the processing module completion signal and the fourth instruction slot 113 completion signal are set to values representing not completed.
  • In response to the i-th control instruction in the fourth instruction slot 113 being invalid, the control module 11 caches the (i+1)-th control instruction in the third instruction slot 112 to the fourth instruction slot 113; and in response to the data write-back module 14 completing execution of the i-th control instruction, it sends the (i+1)-th control instruction in the fourth instruction slot 113 to the data write-back module 14.
  • the fourth command slot 113 includes a fourth valid signal and a data write-back module 14 completion signal; the fourth valid signal is used to indicate whether the control command buffered in the fourth command slot 113 is valid; The data write-back module 14 completion signal is used to indicate whether the data write-back module 14 has completed receiving the control instruction (that is, whether the control module 11 has sent the control instruction to the data write-back module 14).
  • When the state value of the data write-back module 14 completion signal is set to a value representing completion, the state value of the fourth valid signal is set to a value representing invalid.
  • At this time, the control module 11 may cache the (i+1)-th control instruction to the fourth instruction slot 113. When the (i+1)-th control instruction is cached to the fourth instruction slot 113, it has not yet been sent by the control module 11 to the data write-back module 14; therefore, the state value of the fourth valid signal is set to a value representing valid, and the state value of the data write-back module 14 completion signal is set to a value representing not completed.
  • In this embodiment, providing the second instruction slot 111 corresponding to the data loading module 12, the third instruction slot 112 corresponding to the processing module 13, and the fourth instruction slot 113 corresponding to the data write-back module 14 enables the control module 11 to control the data loading module 12, the processing module 13, and the data write-back module 14 through the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113, respectively. This ensures that the data loading module 12, the processing module 13, and the data write-back module 14 can execute different control instructions at the same time, thereby reducing the corresponding waiting time and improving the utilization of processing resources.
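  • The reduction in waiting time can be illustrated with a small worked calculation. The sketch below compares executing instructions strictly one after another with letting the three modules work on different control instructions at the same time. The per-module durations (2, 3, and 1 time units) and the function names are assumptions made only for this illustration, and the model ignores the limited capacity of the instruction slots.

```python
def serial_time(n, durations):
    # Every instruction passes through all modules before the next one starts.
    return n * sum(durations)


def pipelined_time(n, durations):
    # finish[s] is the time at which module s finished its latest instruction.
    finish = [0] * len(durations)
    for _ in range(n):
        ready = 0  # time at which the current instruction leaves the previous module
        for s, d in enumerate(durations):
            # A module starts once the instruction has arrived and the
            # module itself has finished the previous instruction.
            start = max(ready, finish[s])
            finish[s] = start + d
            ready = finish[s]
    return finish[-1]


# Assumed durations for loading, processing, and write-back.
durations = [2, 3, 1]
serial = serial_time(4, durations)        # 4 * (2 + 3 + 1) = 24
pipelined = pipelined_time(4, durations)  # 15 under the same assumptions
```

  • Under these assumed durations, four control instructions take 24 time units when executed serially and 15 when the modules overlap, which is the effect the embodiment attributes to per-module instruction slots.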
  • FIG. 5 is a schematic structural diagram of a fourth data processing device according to an exemplary embodiment of this application.
  • the embodiment shown in FIG. 5 is based on the embodiment shown in FIG. 4
  • the control module 11 also includes a first instruction slot 114, which is used to cache decoded control instructions; wherein, the decoded control instructions can be represented by corresponding instruction control words.
  • The control module 11 decodes each control instruction; in response to the decoded i-th control instruction in the first instruction slot 114 being invalid, it caches the decoded (i+1)-th control instruction in the first instruction slot 114; and in response to the decoded i-th control instruction in the second instruction slot 111 being invalid, it caches the decoded (i+1)-th control instruction in the first instruction slot 114 to the second instruction slot 111.
  • The first instruction slot 114 includes a first valid signal and a second instruction slot completion signal. The first valid signal indicates whether the control instruction cached in the first instruction slot 114 is valid; the second instruction slot completion signal indicates whether the second instruction slot 111 has finished caching the control instruction (that is, whether the control module 11 has cached the control instruction in the first instruction slot 114 to the second instruction slot 111).
  • When the state value of the second instruction slot completion signal is set to a value representing completion, the state value of the first valid signal is set to a value representing invalid.
  • At this time, the control module 11 can cache the decoded (i+1)-th control instruction to the first instruction slot 114. When the decoded (i+1)-th control instruction is cached in the first instruction slot 114, it has not yet been cached by the control module 11 to the second instruction slot 111; therefore, the state value of the first valid signal is set to a value representing valid, and the state value of the second instruction slot completion signal is set to a value representing not completed.
  • control instructions cached in the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113 are decoded control instructions.
  • For the execution process of the decoded control instructions cached in the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113, refer to the description of the embodiment shown in FIG. 4; it is not repeated here.
  • The control module 11 can control the data loading module 12, the processing module 13, and the data write-back module 14 through the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113, respectively, ensuring that the data loading module 12, the processing module 13, and the data write-back module 14 can execute different control instructions at the same time, thereby reducing the corresponding waiting time and improving the utilization of processing resources.
  • The data processing device provided in this embodiment of the application can be applied to a convolutional neural network to perform, on target objects (including but not limited to images, audio, video, and text), the convolution operation of a convolutional layer, the pooling operation of a pooling layer, or the activation operation of an activation layer, thereby accelerating the computation of the deep neural network in hardware, reducing its computation time, and improving operational efficiency.
  • Take as an example a convolutional neural network applied to the field of image processing, with the data processing device performing the convolution operation of a convolutional layer: the control module 11 receives a convolution operation control instruction and distributes it to the data loading module 12, the processing module 13, and the data write-back module 14.
  • the distribution process of the control instruction by the control module 11 is divided into 4 stages, namely, a decoding stage, a loading stage, an execution stage, and a storage stage. This embodiment implements the control of the execution process of different control instructions through the above four stages.
  • the decoding stage corresponds to the first instruction slot 114, and the control module 11 caches the decoded control instruction into the first instruction slot 114.
  • the loading stage corresponds to the second instruction slot 111.
  • The control module 11 caches the decoded control instruction in the first instruction slot 114 into the second instruction slot 111, and then sends the decoded control instruction cached in the second instruction slot 111 to the data loading module 12.
  • the execution stage corresponds to the third instruction slot 112.
  • The control module 11 caches the decoded control instruction in the second instruction slot 111 into the third instruction slot 112, and then sends the decoded control instruction cached in the third instruction slot 112 to the processing module 13.
  • the storage stage corresponds to the fourth instruction slot 113.
  • The control module 11 caches the decoded control instruction in the third instruction slot 112 into the fourth instruction slot 113, and then sends the decoded control instruction cached in the fourth instruction slot 113 to the data write-back module 14.
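  • The correspondence between the four stages, the instruction slots, and the modules described above can be written down as a small lookup table. This is only an illustrative sketch; the `Stage` type and the `next_stage` helper are hypothetical names, while the reference numerals are those used in this description.

```python
from typing import NamedTuple, Optional


class Stage(NamedTuple):
    name: str
    slot: int                # reference numeral of the instruction slot
    module: Optional[str]    # module that receives the instruction, if any


PIPELINE = (
    Stage("decode", 114, None),
    Stage("load", 111, "data loading module 12"),
    Stage("execute", 112, "processing module 13"),
    Stage("store", 113, "data write-back module 14"),
)


def next_stage(stage: Stage) -> Optional[Stage]:
    # A decoded control instruction moves to the following stage's slot,
    # or leaves the pipeline after the storage stage.
    idx = PIPELINE.index(stage)
    return PIPELINE[idx + 1] if idx + 1 < len(PIPELINE) else None


# Order in which a decoded control instruction visits the instruction slots.
path = [stage.slot for stage in PIPELINE]
```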
  • each instruction slot buffers different control instructions at different times, and the execution of the control instruction is controlled through the control instruction status signal and the control instruction completion signal in the instruction slot.
  • FIG. 7 illustrates, as an example, the control instructions cached in each instruction slot during different time periods:
  • The control module 11 decodes the control instruction a and caches the decoded control instruction a in the first instruction slot 114 corresponding to the decoding stage; at this time, in the first instruction slot 114, the first valid signal indicates that the decoded control instruction a is valid, and the second instruction slot completion signal indicates that the second instruction slot 111 has not finished caching the decoded control instruction a.
  • The control module 11 caches the decoded control instruction a in the first instruction slot 114 into the second instruction slot 111 corresponding to the loading stage, and then sends the decoded control instruction a in the second instruction slot 111 to the data loading module 12; at this time, in the second instruction slot 111, the second valid signal indicates that the decoded control instruction a is valid, the data loading module completion signal indicates that the data loading module 12 has finished receiving the decoded control instruction a, and the third instruction slot completion signal indicates that the third instruction slot 112 has not finished caching the decoded control instruction a.
  • Because the decoded control instruction a in the first instruction slot 114 has been cached in the second instruction slot 111, the first valid signal in the first instruction slot 114 indicates that the decoded control instruction a is invalid, and the second instruction slot completion signal indicates that the second instruction slot 111 has finished caching the decoded control instruction a. The control module 11 can therefore decode the control instruction b and cache the decoded control instruction b in the first instruction slot 114; correspondingly, in the first instruction slot 114, the first valid signal indicates that the decoded control instruction b is valid, and the second instruction slot completion signal indicates that the second instruction slot 111 has not finished caching the decoded control instruction b.
  • The control module 11 caches the decoded control instruction a in the second instruction slot 111 into the third instruction slot 112 corresponding to the execution stage, and then sends the decoded control instruction a in the third instruction slot 112 to the processing module 13; at this time, in the third instruction slot 112, the third valid signal indicates that the decoded control instruction a is valid, the processing module completion signal indicates that the processing module 13 has finished receiving the decoded control instruction a, and the fourth instruction slot completion signal indicates that the fourth instruction slot 113 has not finished caching the decoded control instruction a.
  • Because the decoded control instruction a in the second instruction slot 111 has been sent to the data loading module 12 and cached in the third instruction slot 112, the second valid signal in the second instruction slot 111 indicates that the decoded control instruction a is invalid, the data loading module completion signal indicates that the data loading module 12 has finished receiving the decoded control instruction a, and the third instruction slot completion signal indicates that the third instruction slot 112 has finished caching the decoded control instruction a. The control module 11 can therefore cache the decoded control instruction b in the first instruction slot 114 into the second instruction slot 111.
  • In one possible case, the data loading module 12 is still executing the decoded control instruction a, so the decoded control instruction b cannot be issued; then, in the second instruction slot 111, the second valid signal indicates that the decoded control instruction b is valid, the data loading module completion signal indicates that the data loading module 12 has not finished receiving the decoded control instruction b, and the third instruction slot completion signal indicates that the third instruction slot 112 has not finished caching the decoded control instruction b.
  • Because the decoded control instruction b in the first instruction slot 114 has been cached in the second instruction slot 111, the control module 11 can decode the control instruction c and cache the decoded control instruction c in the first instruction slot 114; correspondingly, in the first instruction slot 114, the first valid signal indicates that the decoded control instruction c is valid, and the second instruction slot completion signal indicates that the second instruction slot 111 has not finished caching the decoded control instruction c.
  • The control module 11 caches the decoded control instruction a in the third instruction slot 112 into the fourth instruction slot 113 corresponding to the storage stage, and then sends the decoded control instruction a in the fourth instruction slot 113 to the data write-back module 14.
  • The decoded control instruction b still cannot be issued to the data loading module 12 and remains cached in the second instruction slot 111; in the second instruction slot 111, the second valid signal indicates that the decoded control instruction b is valid, the data loading module completion signal indicates that the data loading module 12 has not finished receiving the decoded control instruction b, and the third instruction slot completion signal indicates that the third instruction slot 112 has not finished caching the decoded control instruction b.
  • The data loading module 12 has finished executing the decoded control instruction a, and the control module 11 sends the decoded control instruction b in the second instruction slot 111 to the data loading module 12; in the second instruction slot 111, the second valid signal indicates that the decoded control instruction b is valid, the data loading module completion signal indicates that the data loading module 12 has finished receiving the decoded control instruction b, and the third instruction slot completion signal indicates that the third instruction slot 112 has not finished caching the decoded control instruction b.
  • The control module 11 caches the decoded control instruction b in the second instruction slot 111 into the third instruction slot 112, and then, after the processing module 13 finishes executing the decoded control instruction a, sends the decoded control instruction b cached in the third instruction slot 112 to the processing module 13; at this time, in the third instruction slot 112, the third valid signal indicates that the decoded control instruction b is valid, the processing module completion signal indicates that the processing module 13 has finished receiving the decoded control instruction b, and the fourth instruction slot completion signal indicates that the fourth instruction slot 113 has not finished caching the decoded control instruction b.
  • Because the decoded control instruction b in the second instruction slot 111 has been sent to the data loading module 12 and cached in the third instruction slot 112, the second valid signal in the second instruction slot 111 indicates that the decoded control instruction b is invalid, the data loading module completion signal indicates that the data loading module 12 has finished receiving the decoded control instruction b, and the third instruction slot completion signal indicates that the third instruction slot 112 has finished caching the decoded control instruction b. The control module 11 may therefore cache the decoded control instruction c in the first instruction slot 114 into the second instruction slot 111.
  • In one possible case, the data loading module 12 is still executing the decoded control instruction b, so the decoded control instruction c cannot be issued; then the second valid signal in the second instruction slot 111 indicates that the decoded control instruction c is valid, the data loading module completion signal indicates that the data loading module 12 has not finished receiving the decoded control instruction c, and the third instruction slot completion signal indicates that the third instruction slot 112 has not finished caching the decoded control instruction c.
  • Because the decoded control instruction c in the first instruction slot 114 has been cached in the second instruction slot 111, the control module 11 can decode the control instruction d and cache the decoded control instruction d in the first instruction slot 114; correspondingly, the first valid signal in the first instruction slot 114 indicates that the decoded control instruction d is valid, and the second instruction slot completion signal indicates that the second instruction slot 111 has not finished caching the decoded control instruction d.
  • The control module 11 caches the decoded control instruction b in the third instruction slot 112 into the fourth instruction slot 113 corresponding to the storage stage, and then, after the data write-back module 14 finishes executing the decoded control instruction a, sends the decoded control instruction b cached in the fourth instruction slot 113 to the data write-back module 14.
  • Meanwhile, in the second instruction slot 111, the second valid signal indicates that the decoded control instruction c is valid, the data loading module completion signal indicates that the data loading module 12 has not finished receiving the decoded control instruction c, and the third instruction slot completion signal indicates that the third instruction slot 112 has not finished caching the decoded control instruction c.
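  • The walkthrough above can be reproduced with a small tick-based simulation. This is a sketch under stated assumptions rather than the device's actual timing: the function `simulate`, the tick granularity, and the per-module busy times (3 ticks for loading, 1 for processing, 1 for write-back) are all hypothetical, and an empty entry stands for a slot whose control instruction is invalid. An instruction is forwarded to the next slot only after it has been sent to its own module, and a module accepts a new instruction only when it is idle.

```python
def simulate(program, busy, ticks):
    """Return one snapshot of the four instruction slots per tick.

    slots[0..3] stand for instruction slots 114, 111, 112 and 113.
    busy[s] is an assumed number of ticks module s needs per instruction.
    """
    last = 3
    slots = [None] * 4          # None models an invalid (empty) slot
    free_at = [0] * 4           # tick at which each module is idle again
    pending = list(program)
    timeline = []

    def issue(entry, s, t):
        # Send the instruction to module s if that module is idle.
        if free_at[s] <= t:
            entry["issued"] = True
            free_at[s] = t + busy[s]

    for t in range(1, ticks + 1):
        for s in range(last, -1, -1):      # handle the last stage first
            entry = slots[s]
            if entry is None:
                continue
            if not entry["issued"]:
                issue(entry, s, t)         # retry a stalled instruction
                continue
            if s == last:
                slots[s] = None            # received by write-back: invalid
            elif slots[s + 1] is None:
                moved = {"instr": entry["instr"], "issued": False}
                issue(moved, s + 1, t)     # cache and, if possible, send
                slots[s + 1] = moved
                slots[s] = None
        if slots[0] is None and pending:
            # The decode slot has no module, so it counts as already issued.
            slots[0] = {"instr": pending.pop(0), "issued": True}
        timeline.append("".join(e["instr"] if e else "-" for e in slots))
    return timeline


timeline = simulate("abcd", busy=[0, 3, 1, 1], ticks=7)
```

  • With these assumed busy times, each string in `timeline` lists the contents of slots 114, 111, 112, and 113 for one tick. Instruction b waits in the second instruction slot while the loading module is busy with a, and c waits behind b in the same way, which matches the stalls described above.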
  • The data to be processed includes the object to be processed and operating parameters; the object to be processed includes but is not limited to an image, audio, or text, and the operating parameters include but are not limited to convolution kernels, pooling parameters, or activation functions.
  • The processing module 13 includes a systolic array; in response to the i-th control instruction, the processing module 13 writes the object to be processed and the operating parameters corresponding to the i-th control instruction into the systolic array, and the systolic array performs the calculation on the object to be processed and the operating parameters to obtain the processing result.
  • Take the data processing device performing a convolution operation on an image as an example: the object to be processed is the image, the operating parameter is the convolution kernel, the processing module 13 writes the image and the convolution kernel into the systolic array, and the systolic array performs multiply-accumulate operations on the image and the convolution kernel to obtain the convolved image.
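  • The multiply-accumulate computation mentioned above can be sketched functionally as follows. This is not a cycle-accurate model of a systolic array; it only shows the arithmetic that such an array would carry out. The function name `conv2d_mac`, the absence of padding and stride, and the sample values are assumptions made for illustration.

```python
def conv2d_mac(image, kernel):
    """2D convolution without padding (cross-correlation form, as is usual in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            acc = 0
            # Multiply each kernel weight by the pixel under it and accumulate.
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            out[r][c] = acc
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
result = conv2d_mac(image, kernel)   # [[-4, -4], [-4, -4]]
```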
  • The data to be processed that needs to be loaded includes at least two parts: the object to be processed and the operating parameters. Therefore, to further improve the efficiency of data loading, refer to FIG. 8, which is a schematic structural diagram of a fifth data processing device according to an exemplary embodiment of this application.
  • the data loading module 12 includes an object loading unit 121 and a parameter loading unit 122.
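  • After the object loading unit 121 finishes executing the i-th control instruction, the control module 11 sends the (i+1)-th control instruction to the object loading unit 121; after the parameter loading unit 122 finishes executing the i-th control instruction, the control module 11 sends the (i+1)-th control instruction to the parameter loading unit 122.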
  • the object loading unit 121 and the parameter loading unit 122 respectively load the object to be processed and the operating parameter, and perform the loading at the same time, which is beneficial to improving the loading efficiency.
  • The completion of the execution of the i-th control instruction by the object loading unit 121 means that the object loading unit 121 has finished loading the object to be processed corresponding to the i-th control instruction; the completion of the execution of the i-th control instruction by the parameter loading unit 122 means that the parameter loading unit 122 has finished loading the operating parameter corresponding to the i-th control instruction.
  • In response to the i-th control instruction, the object loading unit 121 loads the object to be processed corresponding to the i-th control instruction, and the parameter loading unit 122 loads the operating parameter corresponding to the i-th control instruction.
  • While the processing module 13 processes the data to be processed corresponding to the i-th control instruction in response to the i-th control instruction, if the object loading unit 121 has finished loading the object to be processed corresponding to the i-th control instruction, the object loading unit 121 does not need to wait for the processing module 13 to complete the execution of the i-th control instruction; it can directly receive the (i+1)-th control instruction sent by the control module 11 and load the object to be processed corresponding to the (i+1)-th control instruction.
  • Similarly, if the parameter loading unit 122 has finished loading the operating parameter corresponding to the i-th control instruction, the parameter loading unit 122 can directly receive the (i+1)-th control instruction sent by the control module 11 and load the operating parameter corresponding to the (i+1)-th control instruction. This embodiment further reduces the time the object loading unit 121 and the parameter loading unit 122 wait for the (i+1)-th control instruction and avoids the waste of processing resources caused by excessive waiting.
  • The control module 11 includes a second instruction slot 111 corresponding to the data loading module 12, a third instruction slot 112 corresponding to the processing module 13, and a fourth instruction slot 113 corresponding to the data write-back module 14. When the data loading module 12 includes an object loading unit 121 and a parameter loading unit 122, then, for the second instruction slot 111, the control module 11 caches the (i+1)-th control instruction in the second instruction slot 111 in response to the i-th control instruction in the second instruction slot 111 being invalid, and sends the (i+1)-th control instruction in the second instruction slot 111 to the data loading module 12 in response to the object loading unit 121 and the parameter loading unit 122 completing execution of the i-th control instruction.
  • The second instruction slot 111 includes a second valid signal, an object loading unit completion signal, a parameter loading unit completion signal, and a third instruction slot completion signal. The second valid signal indicates whether the control instruction cached in the second instruction slot 111 is valid; the object loading unit completion signal indicates whether the object loading unit 121 has finished receiving the control instruction (that is, whether the control module 11 has sent the control instruction to the object loading unit 121); the parameter loading unit completion signal indicates whether the parameter loading unit 122 has finished receiving the control instruction (that is, whether the control module 11 has sent the control instruction to the parameter loading unit 122); and the third instruction slot completion signal indicates whether the third instruction slot 112 has finished caching the control instruction (that is, whether the control module 11 has cached the control instruction in the second instruction slot 111 to the third instruction slot 112).
  • When the state values of the object loading unit completion signal, the parameter loading unit completion signal, and the third instruction slot completion signal are all set to values representing completion, the state value of the second valid signal is set to a value representing invalid. At this time, the control module 11 may cache the (i+1)-th control instruction in the second instruction slot 111. When the (i+1)-th control instruction is cached in the second instruction slot 111, it has not yet been sent to the object loading unit 121 or the parameter loading unit 122, nor cached in the third instruction slot 112; therefore, the state value of the second valid signal is set to a value representing valid, and the state values of the object loading unit completion signal, the parameter loading unit completion signal, and the third instruction slot completion signal are set to values representing not completed.
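  • A minimal sketch of the second instruction slot with separate completion signals for the two loading units is given below. The class `SecondInstructionSlot`, its attribute names, and the `dispatch` helper are hypothetical. The sketch assumes that the instruction is sent to each loading unit as soon as that unit is idle, so the two units may receive it at different times.

```python
class SecondInstructionSlot:
    """Hypothetical model of instruction slot 111 with two loading units."""

    UNITS = ("object_loading_unit", "parameter_loading_unit")

    def __init__(self):
        self.instruction = None
        # Completion signals; True stands for "completed".
        self.received = {unit: True for unit in self.UNITS}
        self.copied_to_third_slot = True

    @property
    def valid(self):
        # The valid signal drops only once every completion signal is set.
        return not (all(self.received.values()) and self.copied_to_third_slot)

    def cache(self, instruction):
        if self.valid:
            raise RuntimeError("slot 111 still holds a valid instruction")
        self.instruction = instruction
        self.received = {unit: False for unit in self.UNITS}
        self.copied_to_third_slot = False

    def dispatch(self, idle_units):
        """Send the instruction to each idle unit that has not received it."""
        served = [u for u in self.UNITS
                  if u in idle_units and not self.received[u]]
        for unit in served:
            self.received[unit] = True
        return served


slot = SecondInstructionSlot()
slot.cache("instr_i_plus_1")
first = slot.dispatch({"object_loading_unit"})
second = slot.dispatch({"object_loading_unit", "parameter_loading_unit"})
still_valid = slot.valid            # not yet cached in slot 112
slot.copied_to_third_slot = True
```

  • In this sketch the object loading unit receives the instruction first because it became idle first; the slot turns invalid only after both units have received the instruction and the instruction has been cached in the third instruction slot.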
  • The target data to be processed may be divided into at least two parts, and the data to be processed is one part of the target data to be processed. The control module 11 performs control through at least two control instructions, each control instruction indicating one part of the target data to be processed, so as to realize the processing of the target data to be processed. Since the target data to be processed is divided into at least two parts, the data loading module 12 loads only a part of the target data to be processed when loading the data to be processed based on one of the control instructions, which improves loading efficiency. The processing module 13 therefore does not need to wait for the data loading module 12 to load the complete target data to be processed, can process the loaded data to be processed sooner, and processing efficiency is further improved.
  • Control is performed through at least two control instructions, each control instruction indicating one part of the target data to be processed, so as to realize the processing of the target data to be processed. On this basis, the control instructions include a result write-back control instruction and a result non-write-back control instruction.
  • the result non-write-back control instruction is used to instruct the processing module 13 to not send the processing result to the data write-back module 14 after obtaining the processing result, but to cache the processing result.
  • The processing result corresponding to a result non-write-back control instruction is not the final processing result that the data write-back module 14 ultimately writes into the external storage module, but a part of that final processing result. After receiving the result non-write-back control instruction, the processing module 13 processes the corresponding data to be processed according to the result non-write-back control instruction, obtains the processing result and caches it, and after the caching is completed generates an end signal and sends it to the control module 11. The control module 11 receives the end signal sent by the processing module 13 after caching the processing result; the end signal indicates that the result non-write-back control instruction has been executed in the data processing device.
  • the result write-back control instruction is used to instruct the processing module 13 to send all processing results related to the target data to be processed to the data write-back module 14.
  • the processing module 13 processes the corresponding data to be processed according to the result write-back control instruction to obtain the processing result, and integrates all processing results related to the target data to be processed and sends it to the data write-back Module 14, the data write-back module 14 writes to the external storage module.
  • the data write-back module 14 After the write operation is completed, the data write-back module 14 generates an end signal and sends it to the control module 11, and the control module 11 receives The data write-back module 14 sends an end signal after writing the processing result, and the end signal indicates that the result write-back control instruction has been executed in the data processing device.
  • each part is indicated by one control instruction, so when loading the data to be processed for a control instruction, the data loading module 12 only needs to load the corresponding part of the target data to be processed. This improves loading efficiency and reduces the time the processing module 13 waits for the data loading module 12 to load the target data to be processed, so the processing module 13 can process the loaded data sooner, which improves processing efficiency.
  • the control module 11 may return the end signal corresponding to the control instruction to the external control module to notify it that the control instruction has been executed, so that the external control module can proceed to the next processing step based on the final processing result written to the external storage module.
  • because the control instructions include a result write-back control instruction and a result non-write-back control instruction, the two kinds of instruction handle the obtained processing results differently, and their end signals are also sent in different ways.
  • the end signal of a result write-back control instruction is sent by the data write-back module 14 after the instruction completes, whereas the end signal of a result non-write-back control instruction is sent by the processing module 13 after the instruction completes. In an exemplary scenario, while the data write-back module 14 is still writing the final processing result A corresponding to the target data A to be processed into the external storage module, the processing module 13 may already have processed a part B1 of the target data B to be processed and generated an end signal B1 to be sent to the control module 11.
  • if the control module 11 returned the end signal B1 to the external control module at that point, the external control module would proceed directly to the next step on the basis of the end signal B1 and overlook the fact that the end signal A had not yet been received; the processing steps based on the final processing result A might then be skipped, which could lead to errors in the processing flow.
  • the control module 11 returns the end signals corresponding to the control instructions to the external control module in the order in which the control instructions were received, following the first-in-first-out principle.
  • if, according to the receiving order and the first-in-first-out principle, a currently received end signal is not yet due to be sent, it is buffered first and is returned to the external control module only when its turn comes.
  • processing the end signals in order ensures that the control instruction received first from the external control module has its end signal returned first, which keeps the data processing accurate and orderly.
  • in an example, the target data C to be processed is divided into two parts, the data to be processed c1 and the data to be processed c2, and the control instructions include a result non-write-back control instruction, which corresponds to the data to be processed c1, and a result write-back control instruction, which corresponds to the data to be processed c2.
  • the data loading module 12 loads the data to be processed c1 based on the result non-write-back control instruction, and the processing module 13 processes the data to be processed c1 based on the same instruction to obtain the processing result c1, and caches the processing result c1;
  • the data loading module 12 then loads the data to be processed c2 based on the result write-back control instruction, and the processing module 13 processes the data to be processed c2 based on the same instruction to obtain the processing result c2, then integrates the processing result c1 and the processing result c2 to obtain the final processing result (c1, c2) corresponding to the target data C to be processed; the data write-back module 14 then writes the final processing result (c1, c2) into the external storage module.
  • the data to be processed includes objects to be processed and operating parameters, where the objects to be processed include but are not limited to images, audio, or text; the operating parameters include but are not limited to convolution kernels, pooling parameters, or activation functions.
  • the target object to be processed may be divided into at least two parts and the target operating parameters may be divided into at least two parts.
  • the object to be processed is a part of the target object to be processed, and the operating parameter is a part of the target operating parameters.
  • the control instructions include a result write-back control instruction and a result non-write-back control instruction.
  • the result non-write-back control instruction is used to instruct the processing module 13 not to send the processing result to the data write-back module 14 after obtaining it, but to cache the processing result; the processing result corresponding to the result non-write-back control instruction is not the final processing result that the data write-back module 14 writes into the external storage module, but a part of that final processing result.
  • the result write-back control instruction is used to instruct the processing module 13 to send all processing results related to the target object to be processed and the target operating parameters to the data write-back module 14; according to the result write-back control instruction, the processing module 13 integrates all processing results related to the target object to be processed and the target operating parameters, and sends them to the data write-back module 14.
  • the data write-back module 14 writes them to the external storage module.
  • each part is indicated by one control instruction, so when loading the object to be processed and the operating parameters corresponding to a control instruction, the data loading module 12 only needs to load a part of the target object to be processed and of the target operating parameters. This improves loading efficiency and reduces the time the processing module 13 spends waiting for the data loading module 12 to load the target data to be processed, so the processing module 13 can process the loaded data sooner, which improves processing efficiency.
  • the control module 11 includes a second instruction slot 111 corresponding to the data loading module 12, a third instruction slot 112 corresponding to the processing module 13, and a fourth instruction slot 113 corresponding to the data write-back module 14; the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113 are used to cache control instructions.
  • when the control instruction is a result non-write-back control instruction, there is no need to cache it in the fourth instruction slot 113 or to send it to the data write-back module 14; that is, the data write-back module 14 does not need to execute the result non-write-back control instruction.
  • the data processing device provided in this embodiment of the application can be applied to a convolutional neural network to perform, on target objects (including but not limited to images, audio, video, and text), the convolution operation of a convolutional layer, the pooling operation of a pooling layer, or the activation operation of an activation layer, so as to accelerate the computation of the deep neural network in hardware, reduce its computation time, and improve operational efficiency.
  • the control module 11 receives a convolution operation control instruction and distributes it to the data loading module 12, the processing module 13, and the data write-back module 14; referring to FIG. 10, the control module 11 divides the distribution of control instructions into four stages, namely a decoding stage, a loading stage, an execution stage, and a storage stage.
  • the decoding stage corresponds to the first instruction slot 114.
  • the loading stage corresponds to the second instruction slot 111.
  • the execution stage corresponds to the third instruction slot 112.
  • the storage stage corresponds to the fourth instruction slot 113.
  • when the control instruction is a result non-write-back control instruction, there is no need to cache it in the fourth instruction slot 113 or to send it to the data write-back module 14, that is, the data write-back module 14 does not need to execute it; when the control instruction is a result write-back control instruction, it needs to be cached in the fourth instruction slot 113 and sent to the data write-back module 14 for execution.
  • the device embodiments described above are merely illustrative.
  • the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units.
  • some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
  • an embodiment of the present application also provides a data processing method, which is applied to a data processing device, and the data processing device includes a data loading module and a processing module; the method includes:
  • step S101 in response to a control instruction, data is loaded by the data loading module for the processing module to perform data processing.
  • step S102 in response to the control instruction, data processing is performed by the processing module; wherein the data loading module and the processing module execute different control instructions at the same time.
  • the method further includes:
  • the processing result of the to-be-processed data is written into an external storage module; wherein the data loading module, the processing module, and the data write-back module execute different control instructions at the same time.
  • the method further includes:
  • in response to the data loading module completing execution of the i-th control instruction, the (i+1)-th control instruction is sent to the data loading module; and in response to the processing module completing execution of the i-th control instruction, the (i+1)-th control instruction is sent to the processing module; where i is an integer.
  • the step S101 includes: in response to the i-th control instruction, loading the to-be-processed data corresponding to the i-th control instruction.
  • the step S102 includes: in response to the i-th control instruction, processing the to-be-processed data corresponding to the i-th control instruction to obtain a processing result.
  • the device further includes a data write-back module.
  • the method also includes:
  • the i+1-th control instruction is sent to the data write-back module.
  • the processing result corresponding to the i-th control instruction is written into the external storage module through the data write-back module.
  • it further includes:
  • while the processing module processes the data to be processed corresponding to the i-th control instruction in response to the i-th control instruction, the data loading module receives the (i+1)-th control instruction without waiting for the processing module to finish executing the i-th control instruction.
  • it further includes:
  • the processing module receives the i+1th control instruction.
  • it further includes:
  • the control instruction is cached through at least one instruction slot, and the execution process of the control instruction is controlled according to the state of the control instruction recorded in the at least one instruction slot.
  • it further includes:
  • the control instruction is cached by at least one instruction slot; the at least one instruction slot includes at least one control instruction status signal, wherein the control instruction status signal is used to indicate whether the control instruction cached in the corresponding instruction slot is valid.
  • it further includes:
  • the control instruction is cached by at least one instruction slot; the at least one instruction slot includes at least one control instruction completion signal; wherein, the at least one control instruction completion signal is used to indicate whether the module corresponding to the at least one instruction slot has completed the The operation of the control instruction corresponding to the at least one instruction slot, or the at least one control instruction completion signal is used to indicate whether the at least one instruction slot completes the operation of the corresponding control instruction.
  • it further includes:
  • the control instruction is cached through at least one instruction slot; the instruction slot includes a control instruction status signal and a control instruction completion signal; wherein, when the control instruction completion signal instructs the corresponding module to complete the operation of the corresponding control instruction, the The control instruction status signal indicates that the control instruction buffered in the corresponding instruction slot is invalid.
  • it further includes:
  • the control instruction is cached by at least one instruction slot; the instruction slot includes a control instruction status signal and a control instruction completion signal; wherein, when the control instruction completion signal indicates that the at least one instruction slot has completed the operation of the corresponding control instruction, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
  • it further includes:
  • the control instruction is cached by at least one instruction slot; wherein the i+1th control instruction sent to the data loading module, the processing module, and the data write-back module is obtained from the instruction slot.
  • it further includes:
  • the i+1-th control instruction is cached in the second instruction slot.
  • the sending the i+1th control instruction to the data loading module in response to the completion of the execution of the i-th control instruction by the data loading module includes:
  • the (i+1)th control instruction in the second instruction slot is sent to the data loading module.
  • it further includes:
  • the i+1-th control instruction in the second instruction slot is cached to the third instruction slot.
  • the sending the i+1th control instruction to the processing module in response to the completion of the execution of the i-th control instruction by the processing module includes:
  • the i+1-th control instruction in the third instruction slot is sent to the processing module.
  • it further includes:
  • the i+1-th control instruction in the third instruction slot is cached to the fourth instruction slot.
  • the sending the i+1th control instruction to the data write-back module in response to the completion of the execution of the i-th control instruction by the data write-back module includes:
  • the i+1-th control instruction in the fourth instruction slot is sent to the data write-back module.
  • it further includes:
  • decoding the control instructions; in response to the decoded i-th control instruction in the first instruction slot being invalid, the decoded (i+1)-th control instruction is cached in the first instruction slot; and,
  • the i+1-th control instruction decoded in the first instruction slot is cached in the second instruction slot.
  • control instruction includes a result write-back control instruction and a result non-write-back control instruction.
  • the to-be-processed data is a part of the to-be-processed target data; the to-be-processed target data is divided into at least two parts.
  • the result not writing back control instruction is used to instruct the processing module to cache the processing result
  • the result write-back control instruction is used to instruct the processing module to send all processing results related to the target data to be processed to the data write-back module.
  • the step S102 includes:
  • according to the result non-write-back control instruction, the corresponding data to be processed is processed, and the processing result is obtained and cached;
  • according to the result write-back control instruction, the corresponding data to be processed is processed to obtain the processing result, and all the processing results related to the target data to be processed are integrated and sent to the data write-back module.
  • it further includes:
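  • if the control instruction is a result non-write-back control instruction, receiving an end signal sent by the processing module after caching the processing result;
  • if the control instruction is a result write-back control instruction, receiving an end signal sent by the data write-back module after writing the processing result;
  • wherein the end signal indicates that the result non-write-back control instruction or the result write-back control instruction has finished executing in the data processing device.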
  • it further includes:
  • the end signal corresponding to the control instruction is returned to the external control module.
  • it further includes:
  • the currently received end signal is buffered.
  • the data to be processed includes the object to be processed and operating parameters.
  • the object to be processed includes any one of the following: image, audio, or text;
  • the operating parameter includes any one of the following: a convolution kernel, a pooling parameter, or an activation function.
  • the data loading module includes an object loading unit and a parameter loading unit.
  • the step S101 includes:
  • the operating parameter corresponding to the i-th control instruction is loaded by the parameter loading unit.
  • the step S102 includes:
  • the object to be processed and the operating parameters corresponding to the i-th control instruction are respectively written into the systolic array, and the systolic array performs the calculation on the object to be processed and the operating parameters to obtain the processing result.
  • an embodiment of the present application also provides an accelerator, which includes any of the above-mentioned devices.
  • the accelerator can be applied to various neural networks, such as convolutional neural networks.
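
The slot-based dispatch described above (a decoding, loading, execution, and storage stage, each with its own instruction slot, and result non-write-back instructions that never reach the storage stage) can be illustrated with a small step-by-step simulation. This is only a sketch under stated assumptions, not the patented hardware: the names `Instr`, `simulate`, and `STAGES` are hypothetical, every stage is assumed to finish in exactly one step, and an empty slot stands in for an invalid control instruction status signal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Stage names follow the text: slots 1-4 serve decode, load, execute, store.
STAGES = ["decode", "load", "execute", "store"]


@dataclass
class Instr:
    name: str
    write_back: bool  # False models a result non-write-back control instruction


def simulate(program: List[Instr]) -> List[Dict[str, Optional[str]]]:
    """Return, for each time step, which instruction every slot holds.

    Assumption (not from the source): every stage finishes in one step.
    """
    slots: List[Optional[Instr]] = [None] * len(STAGES)  # None = slot invalid
    pending = list(program)
    trace = []
    while pending or any(s is not None for s in slots):
        # Walk from the last stage backwards so a slot freed in this step
        # can immediately accept the instruction from the stage before it.
        for k in range(len(STAGES) - 1, -1, -1):
            ins = slots[k]
            if ins is None:
                continue
            slots[k] = None  # stage done: the slot is marked invalid again
            nxt = k + 1
            if nxt == len(STAGES):
                continue  # left the store stage: instruction is finished
            if STAGES[nxt] == "store" and not ins.write_back:
                continue  # non-write-back: never enters the store slot
            slots[nxt] = ins
        if pending:
            slots[0] = pending.pop(0)  # decode the next instruction
        if any(s is not None for s in slots):
            trace.append({st: (s.name if s else None)
                          for st, s in zip(STAGES, slots)})
    return trace


if __name__ == "__main__":
    # c1 is only cached by the processing module; c2 triggers the write-back.
    for step, row in enumerate(simulate([Instr("c1", False), Instr("c2", True)])):
        print(step, row)
```

In the third step of this run the load slot holds c2 while the execute slot holds c1, which is the "different control instructions at the same time" behaviour; c1 never appears in the store slot because it is a non-write-back instruction.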

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Stored Programmes (AREA)

Abstract

The embodiments of the present application provide a data processing device, a data processing method, and an accelerator. Said device comprises a control module, a data loading module and a processing module; the data loading module loads, in response to a control instruction of the control module, data to be processed so as to allow the processing module to process same; the processing module processes, in response to a control instruction of the control module, said data; and the control module controls the data loading module and the processing module to execute different control instructions at the same time. In this embodiment, the control module controls the data loading module and the processing module to execute different control instructions at the same time, increasing the utilization rate of processing resources and avoiding waste of processing resources caused by a waiting process.

Description

Data processing device, data processing method and accelerator
Technical field
This application relates to the field of computer data processing, and in particular to a data processing device, a data processing method, and an accelerator.
Background
As technology advances, the algorithms and program products that implement various products are becoming increasingly fine-grained, which makes their processing more complex and cumbersome and consumes substantial computing or processing resources. How to make full use of processing or computing resources has therefore become an urgent technical problem.
As an example, a convolutional neural network (CNN) is a complex, non-linear hypothesis model whose parameters are obtained through training, and it has the ability to fit data. CNN algorithms can be applied to scenarios such as machine vision and natural language processing. When a CNN algorithm is implemented in an embedded system, neural network processing consumes considerable resources, so computing resources and real-time performance must be fully considered. It is therefore necessary to improve the utilization of computing resources in neural network processing.
Summary of the invention
In view of this, one of the objectives of the embodiments of the present application is to provide a data processing device, a data processing method, and an accelerator.
First, according to a first aspect of the embodiments of the present application, a data processing device is provided, which includes a control module, a data loading module, and a processing module;
the data loading module, in response to a control instruction of the control module, loads data to be processed for processing by the processing module;
the processing module, in response to a control instruction of the control module, processes the data to be processed;
the control module controls the data loading module and the processing module to execute different control instructions at the same time.
According to a second aspect of the embodiments of the present application, a data processing method is provided, which is applied to a data processing device that includes a data loading module and a processing module; the method includes:
in response to a control instruction, loading data through the data loading module for the processing module to perform data processing; and
in response to the control instruction, performing data processing through the processing module; wherein the data loading module and the processing module execute different control instructions at the same time.
According to a third aspect of the embodiments of the present application, an accelerator is provided, including the device according to any implementation of the first aspect.
The embodiments of the present application have the following beneficial effects:
In this embodiment, the control module controls the data loading module and the processing module to execute different control instructions at the same time. While the processing module processes the data to be processed corresponding to the current control instruction, the data loading module may already be loading the data to be processed corresponding to the next control instruction, which improves the utilization of processing resources and avoids wasting them on waiting.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present application.
Description of the drawings
In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a first data processing device according to an exemplary embodiment of the present application.
Fig. 2 is a schematic structural diagram of a second data processing device according to an exemplary embodiment of the present application.
Fig. 3 is a schematic diagram of an execution flow of a control instruction according to an exemplary embodiment of the present application.
Fig. 4 is a schematic structural diagram of a third data processing device according to an exemplary embodiment of the present application.
Fig. 5 is a schematic structural diagram of a fourth data processing device according to an exemplary embodiment of the present application.
Fig. 6 is a schematic diagram of the execution progress of a first kind of control instruction according to an exemplary embodiment of the present application.
Fig. 7 is a schematic diagram of the execution progress of a second kind of control instruction according to an exemplary embodiment of the present application.
Fig. 8 is a schematic structural diagram of a fifth data processing device according to an exemplary embodiment of the present application.
Fig. 9 is a schematic diagram of an execution flow of a control instruction according to an exemplary embodiment of the present application.
Fig. 10 is a schematic diagram of the execution progress of a third kind of control instruction according to an exemplary embodiment of the present application.
Fig. 11 is a schematic flowchart of a data processing method according to an exemplary embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in this application, without creative effort, fall within the protection scope of this application.
In view of the problems in the related art, and referring to FIG. 1, an embodiment of the present application provides a data processing device; FIG. 1 is a schematic structural diagram of a first data processing device according to an exemplary embodiment of the present application. The device includes a control module 11, a data loading module 12, and a processing module 13. In another implementation, the data loading module is not limited to one; that is, there may be multiple data loading modules. For example, if the data processing device is used for the data processing of convolution operations, it includes two data loading modules, namely a feature map loading module and a weight loading module.
The data loading module 12, in response to a control instruction of the control module 11, loads the data to be processed for the processing module 13 to process.
The processing module 13, in response to a control instruction of the control module 11, processes the data to be processed.
The control module 11 controls the data loading module 12 and the processing module 13 to execute different control instructions at the same time. In another implementation, the control module 11 controls the data loading module 12 to process a control instruction ahead of time, without waiting for the previous control instruction to finish processing in the entire data processing device. For example, the control module 11 does not need to wait for the processing of control instruction x0 in the processing module 13 to finish before controlling the data loading module 12 to perform the operation corresponding to control instruction x1 ahead of time, where control instructions x0 and x1 are executed sequentially.
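The benefit of letting the data loading module start on x1 before x0 has finished in the processing module can be made concrete with a little timing arithmetic. The sketch below is illustrative only: the function names and the durations (2 time units to load, 3 to process) are hypothetical, and it assumes that loaded data can wait in a buffer until the processing module is free.

```python
from typing import Sequence


def sequential_time(load: Sequence[int], proc: Sequence[int]) -> int:
    """Total time when instruction k+1 is loaded only after k is fully processed."""
    return sum(load) + sum(proc)


def overlapped_time(load: Sequence[int], proc: Sequence[int]) -> int:
    """Total time when the loader starts the next load as soon as it is free.

    Assumption: loaded data can wait in a buffer until the processor is free.
    """
    load_done = 0  # time at which the loader finishes its current load
    proc_done = 0  # time at which the processor finishes its current job
    for t_load, t_proc in zip(load, proc):
        load_done += t_load
        # Processing starts once the data is loaded and the processor is idle.
        proc_done = max(load_done, proc_done) + t_proc
    return proc_done


# Hypothetical durations for x0 and x1: 2 time units to load, 3 to process.
print(sequential_time([2, 2], [3, 3]))  # 10
print(overlapped_time([2, 2], [3, 3]))  # 8
```

With these assumed durations the load of x1 is hidden entirely behind the processing of x0, so the two instructions finish in 8 time units instead of 10.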
In this embodiment, the control module 11 receives control instructions from an external module and sends them to the data loading module 12 and the processing module 13. The data loading module 12, in response to a control instruction, loads the data to be processed by the processing module 13; the processing module 13, in response to a control instruction, processes that data. To further improve the overall utilization of processing resources, the control module 11 may control the data loading module 12 and the processing module 13 to execute different control instructions at the same time; that is, the two modules need not be executing the same control instruction at any given moment. After the data loading module 12 has loaded the data corresponding to the current control instruction, it does not need to wait for the processing module 13 to finish processing that data; under the control of the control module 11, it can directly load the data corresponding to the next control instruction. In other words, while the processing module 13 is processing the data corresponding to the current control instruction, the data loading module 12 is loading the data corresponding to the next control instruction. This improves the utilization of processing resources and avoids wasting them on waiting.
In an embodiment, the control module 11 sends the (i+1)-th control instruction to the data loading module 12 in response to the data loading module 12 completing execution of the i-th control instruction, and sends the (i+1)-th control instruction to the processing module 13 in response to the processing module 13 completing execution of the i-th control instruction, where i is an integer.
It should be noted that the data loading module 12 completing execution of the i-th control instruction means that it has finished loading the data to be processed corresponding to the i-th control instruction, and the processing module 13 completing execution of the i-th control instruction means that it has finished processing the data to be processed corresponding to the i-th control instruction.
The data loading module 12, in response to the i-th control instruction, loads the data to be processed corresponding to that instruction. The processing module 13, in response to the i-th control instruction, processes that data to obtain a processing result. While the processing module 13 is processing the data corresponding to the i-th control instruction, if the data loading module 12 has already finished loading that data, the data loading module 12 does not need to wait for the processing module 13 to complete the i-th control instruction: it can directly receive the (i+1)-th control instruction from the control module 11 and, in response, load the data to be processed corresponding to the (i+1)-th control instruction. This embodiment further reduces the time the data loading module 12 spends waiting for the next control instruction and avoids the waste of processing resources caused by waiting.
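The overlap described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that every module takes exactly one time step per control instruction; the function name `trace` and the module labels are hypothetical and do not come from the application.

```python
def trace(num_instructions, modules=("load", "process")):
    """List, per time step, which instruction index each module executes.

    Assumption (not from the source): every module needs exactly one
    time step per control instruction, so module s handles instruction
    t - s at time step t. None means the module is idle.
    """
    rows = []
    for t in range(num_instructions + len(modules) - 1):
        row = {}
        for s, name in enumerate(modules):
            i = t - s
            row[name] = i if 0 <= i < num_instructions else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    for t, row in enumerate(trace(3)):
        print(t, row)
```

At time step 1 the loader is already working on instruction 1 while the processor is still on instruction 0, which is the "different control instructions at the same time" behaviour.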
In an embodiment, the processing module 13 includes a systolic array. In response to the i-th control instruction, the processing module 13 writes the data to be processed corresponding to the i-th control instruction into the systolic array, and the systolic array operates on that data to obtain the processing result. This embodiment carries out the processing of the data in hardware, which helps improve processing efficiency.
FIG. 2 is a schematic structural diagram of a second data processing device according to an exemplary embodiment of the present application. The device includes a control module 11, a data loading module 12, a processing module 13, and a data write-back module 14.
The data loading module 12, in response to a control instruction from the control module 11, loads the data to be processed so that the processing module 13 can process it.
The processing module 13, in response to a control instruction from the control module 11, processes the data to be processed.
The data write-back module 14, in response to a control instruction from the control module 11, writes the processing result of the data to be processed into an external storage module.
The control module 11 controls the data loading module 12, the processing module 13, and the data write-back module 14 to execute different control instructions at the same time.
In this embodiment, the control module 11 receives control instructions from an external module and sends them to the data loading module 12, the processing module 13, and the data write-back module 14. The data loading module 12, in response to a control instruction, loads the data to be processed by the processing module 13; the processing module 13, in response to a control instruction, processes that data to obtain a processing result; and the data write-back module 14, in response to a control instruction, writes the processing result into the external storage module.
To further improve the overall utilization of processing resources, the control module 11 may control the data loading module 12, the processing module 13, and the data write-back module 14 to execute different control instructions at the same time; that is, the three modules need not be executing the same control instruction at any given moment. After the data loading module 12 has loaded the data corresponding to the current control instruction, it does not need to wait for the processing module 13 to finish processing that data; under the control of the control module 11, it can directly load the data corresponding to the next control instruction. Likewise, after the processing module 13 has processed the data corresponding to the current control instruction, it does not need to wait for the data write-back module 14 to write the corresponding processing result into the external storage module; under the control of the control module 11, it can directly process the data corresponding to the next control instruction. In other words, while the data write-back module 14 is writing the processing result of the previous control instruction into the external storage module, the processing module 13 may be processing the data corresponding to the current control instruction and the data loading module 12 may be loading the data corresponding to the next control instruction. This improves the utilization of processing resources and avoids wasting them on waiting.
In an embodiment, the control module 11 sends the (i+1)-th control instruction to the data loading module 12 in response to the data loading module 12 completing execution of the i-th control instruction; sends the (i+1)-th control instruction to the processing module 13 in response to the processing module 13 completing execution of the i-th control instruction; and sends the (i+1)-th control instruction to the data write-back module 14 in response to the data write-back module 14 completing execution of the i-th control instruction, where i is an integer.
It should be noted that the data loading module 12 completing execution of the i-th control instruction means that it has finished loading the data to be processed corresponding to the i-th control instruction; the processing module 13 completing execution of the i-th control instruction means that it has finished processing the data to be processed corresponding to the i-th control instruction; and the data write-back module 14 completing execution of the i-th control instruction means that it has finished writing the processing result corresponding to the i-th control instruction into the external storage module.
The data loading module 12, in response to the i-th control instruction, loads the data to be processed corresponding to that instruction. The processing module 13, in response to the i-th control instruction, processes that data to obtain a processing result. The data write-back module 14, in response to the i-th control instruction, writes the processing result corresponding to that instruction into the external storage module.
While the processing module 13 is processing the data corresponding to the i-th control instruction, if the data loading module 12 has already finished loading that data, the data loading module 12 does not need to wait for the processing module 13 to complete the i-th control instruction: it can directly receive the (i+1)-th control instruction from the control module 11 and load the data to be processed corresponding to the (i+1)-th control instruction. This embodiment further reduces the waiting time of the data loading module 12 and avoids the waste of processing resources caused by waiting.
Further, while the data write-back module 14 is writing the processing result corresponding to the i-th control instruction into the external storage module, if the processing module 13 has already finished processing the data corresponding to the i-th control instruction, the processing module 13 does not need to wait for the data write-back module 14 to complete the i-th control instruction: it can directly receive the (i+1)-th control instruction from the control module 11 and, in response, process the data to be processed corresponding to the (i+1)-th control instruction. This embodiment further reduces the waiting time of the processing module 13 and avoids the waste of processing resources caused by waiting.
In an example, FIG. 3 is a timing diagram of how the data loading module 12, the processing module 13, and the data write-back module 14 process control instructions. After the data loading module 12 has loaded the data corresponding to control instruction 1, it does not need to wait for the processing module 13 to finish processing that data or for the data write-back module 14 to finish writing the corresponding processing result; it can directly obtain control instruction 2 and load the data corresponding to control instruction 2. This further reduces the time the data loading module 12 spends waiting for the next control instruction and avoids the waste of processing resources caused by waiting. Correspondingly, after the processing module 13 has processed the data corresponding to control instruction 1 and obtained the processing result, it does not need to wait for the data write-back module 14 to write that result into the external storage module; it can directly obtain control instruction 2 and process the data corresponding to control instruction 2. This further reduces the time the processing module 13 spends waiting for the next control instruction and avoids the waste of processing resources caused by waiting.
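The timing relationship of this example can be sketched numerically. The following is a minimal illustration, not the applicant's implementation: it assumes each module handles one instruction at a time, that an instruction's result can be buffered until the next module is free, and that the durations are arbitrary made-up numbers. The function name `pipeline_finish` is hypothetical.

```python
def pipeline_finish(durations):
    """Compute finish times for each (instruction, module) pair.

    durations[i] = (load_time, process_time, write_back_time) for
    instruction i. A module starts instruction i as soon as (a) the
    previous module has finished instruction i and (b) the module
    itself has finished instruction i - 1.
    Assumption (not from the source): results can wait in a buffer
    between modules, so an upstream module is never stalled.
    """
    n_stages = len(durations[0])
    prev = [0] * n_stages          # finish times of the previous instruction
    table = []
    for d in durations:
        cur = []
        for s in range(n_stages):
            ready = cur[s - 1] if s > 0 else 0   # upstream result available
            start = max(ready, prev[s])          # module itself is free
            cur.append(start + d[s])
        table.append(cur)
        prev = cur
    return table


if __name__ == "__main__":
    durations = [(2, 3, 1), (2, 3, 1)]           # illustrative values only
    table = pipeline_finish(durations)
    serial = sum(sum(d) for d in durations)
    print(table, "pipelined:", table[-1][-1], "serial:", serial)
```

With these illustrative durations, loading for the second instruction starts at time 2, while the first instruction is still being processed, and the pipelined schedule finishes at time 9 instead of 12 for strictly serial execution.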
It can be understood that the embodiments of the present application place no restriction on the specific type of the external storage module, which can be chosen according to the actual application scenario. For example, it can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
In an exemplary embodiment, the data processing device can be applied to processing based on a convolutional neural network. A convolutional neural network is a machine learning algorithm widely used in computer vision tasks such as object recognition, object detection, and semantic segmentation of images. The structure of a neural network usually includes an input layer, one or more hidden layers, and an output layer, where the operations in the hidden layers include, but are not limited to, convolution operations, pooling operations, and activation operations. In general, the hidden layers of a convolutional neural network are named after the type of operation they perform: a hidden layer that performs convolution is a convolutional layer, one that performs pooling is a pooling layer, and one that performs activation is an activation layer. The data processing device provided by the embodiments of the present application can perform the convolution operation of a convolutional layer, the pooling operation of a pooling layer, and the activation operation of an activation layer, thereby accelerating the computation of a deep neural network in hardware, reducing its computation time, and improving computational efficiency.
The objects processed by the convolutional neural network include, but are not limited to, images, audio, and text. Different types of hidden layers in the convolutional neural network correspond to different operating parameters: for example, the operating parameter of a convolutional layer is the convolution kernel, that of an activation layer is the activation function, and that of a pooling layer is the pooling parameter.
The following takes as an example a convolutional neural network applied to image processing, with the data processing device performing the convolution operation of a convolutional layer. The convolution operation of the convolutional layer computes vector inner products between the convolution kernel and the image to be processed to obtain a feature map. In this scenario, the control module 11 sends convolution operation control instructions to the data loading module 12, the processing module 13, and the data write-back module 14 respectively. The data loading module 12, in response to a convolution operation control instruction from the control module 11, loads the image to be processed and the convolution kernel; the processing module 13, in response to a convolution operation control instruction, convolves the image to be processed with the convolution kernel to obtain the feature map; and the data write-back module 14, in response to a convolution operation control instruction, writes the feature map into the external storage module. The control module 11 controls the data loading module 12, the processing module 13, and the data write-back module 14 so that they can execute different convolution operation control instructions at the same time. Processing the convolution operation control instructions in parallel in this way improves computational efficiency and also increases the utilization of computing resources.
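As a point of reference for "vector inner products between the convolution kernel and the image", here is a minimal software sketch of the arithmetic a convolutional layer performs. It is not the hardware data path of the device. It assumes stride 1, no padding, a single channel, and no kernel flip (the cross-correlation convention common in CNN practice); none of these choices is specified in the application, and the function name `conv2d_valid` is hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and take an inner product at
    each position, producing the feature map.

    Assumptions (not from the source): single channel, stride 1,
    no padding, kernel applied without flipping.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            # Inner product of the kernel with the image patch at (r, c).
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]
    print(conv2d_valid(image, kernel))
```

In the device described here, loading `image` and `kernel` would fall to the data loading module 12, the inner products to the processing module 13, and storing `feature_map` to the data write-back module 14.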
Further, to improve data loading efficiency, the data loading module 12 may include an object loading unit and a parameter loading unit. The object loading unit loads the image to be processed and the parameter loading unit loads the convolution kernel. Because the two units load simultaneously, loading efficiency improves.
In an embodiment, the control module 11 includes at least one instruction slot for caching the control instructions. In one implementation, the instruction slot includes an instruction cache flag and a set of control status signals. The instruction cache flag indicates whether the instruction cached in the instruction slot is valid; for example, it can indicate which control instruction the slot holds and whether that cached control instruction is valid. The set of control status signals indicates the working state of the corresponding module, or the working state of other instruction slots related to this slot. In one embodiment, the working state refers to whether the processing or operation of the corresponding module has completed, or whether the instruction caching operation of the corresponding instruction slot has completed. Providing at least one instruction slot makes it possible to coordinate the execution of the individual instructions, so that the sub-modules of the data processing device can process operations corresponding to different instructions at the same time. This improves the working efficiency of the data processing device.
The control module 11 sends the (i+1)-th control instruction cached in the instruction slot to the data loading module 12 in response to the data loading module 12 completing execution of the i-th control instruction; sends the (i+1)-th control instruction cached in the instruction slot to the processing module 13 in response to the processing module 13 completing execution of the i-th control instruction; and sends the (i+1)-th control instruction cached in the instruction slot to the data write-back module 14 in response to the data write-back module 14 completing execution of the i-th control instruction.
Further, an instruction slot can record the state of the control instruction cached in it, and the control module 11 can control the execution of the control instructions according to the state recorded in at least one instruction slot. In an example, the data loading module 12, the processing module 13, and the data write-back module 14 each correspond to one instruction slot; the slot corresponding to the data loading module 12 is used here for illustration. After the control module 11 has sent the i-th control instruction in the instruction slot to the data loading module 12, the state recorded in the slot indicates that the i-th control instruction is invalid, and the control module 11 can then cache the (i+1)-th control instruction into the slot. Once the (i+1)-th control instruction has been cached into the slot, the recorded state is changed to indicate that the (i+1)-th control instruction is valid, meaning that it has not yet been sent to the data loading module 12. In this embodiment, the state recorded in the instruction slot ensures that the control module 11 caches the different control instructions in order and therefore sends them in order to the data loading module 12, the processing module 13, and the data write-back module 14.
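The valid/invalid bookkeeping for a single instruction slot can be sketched as a small state holder. This is an illustrative software model under stated assumptions, not the hardware described in the application: the class name `InstructionSlot` and its methods `cache` and `issue` are hypothetical, and the model treats "sent to the module" as the only event that invalidates the slot, as in the example above.

```python
class InstructionSlot:
    """Software model of one instruction slot with a valid flag.

    Assumption (not from the source): sending the cached instruction
    to the module is what marks the slot invalid.
    """

    def __init__(self):
        self.instruction = None
        self.valid = False      # False: slot may accept a new instruction

    def cache(self, instruction):
        """Cache the next instruction; refuse while the slot is valid."""
        if self.valid:
            return False
        self.instruction = instruction
        self.valid = True       # cached but not yet sent to the module
        return True

    def issue(self):
        """Send the cached instruction to the module and invalidate."""
        if not self.valid:
            return None
        self.valid = False
        return self.instruction


if __name__ == "__main__":
    slot = InstructionSlot()
    slot.cache("instruction i")
    print(slot.cache("instruction i+1"))   # refused: slot still valid
    print(slot.issue())
    print(slot.cache("instruction i+1"))   # accepted after issue
```

Refusing to overwrite a valid slot is what keeps the instructions in order: instruction i+1 cannot enter until instruction i has left.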
In one implementation, the state of the control instruction recorded in the instruction slot can be represented by a control instruction status signal, which indicates whether the control instruction cached in the corresponding instruction slot is valid. In an example, when the control instruction status signal of an instruction slot indicates that the i-th control instruction cached in the slot is invalid, the control module 11 can cache the (i+1)-th control instruction into the slot and at the same time set the control instruction status signal to the value representing valid, indicating that the (i+1)-th control instruction cached in the slot is valid.
Where the data loading module 12, the processing module 13, and the data write-back module 14 each correspond to one instruction slot, whether the control instruction cached in an instruction slot is valid depends on whether the module corresponding to that slot has completed the operation of the control instruction corresponding to the slot.
As one implementation, a control instruction completion signal can indicate whether the module corresponding to the at least one instruction slot has completed the operation of the control instruction corresponding to that slot. Here, "whether the module corresponding to the at least one instruction slot has completed the operation of the control instruction corresponding to the at least one instruction slot" means whether that module has finished receiving the control instruction corresponding to the slot.
The control module 11 includes at least one instruction slot for caching the control instructions, and the instruction slot includes a control instruction status signal and a control instruction completion signal. When the control instruction completion signal indicates that the corresponding module has completed the operation of the corresponding control instruction, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is invalid; otherwise, the control instruction status signal indicates that the cached control instruction is valid. Through the control instruction status signal and the control instruction completion signal, this embodiment ensures that the control module 11 caches the different control instructions in order and therefore sends them in order to the data loading module 12, the processing module 13, and the data write-back module 14.
Further, the data loading module 12, the processing module 13, and the data write-back module 14 each correspond to one instruction slot, and a control instruction is passed from slot to slot in the order data loading module 12, processing module 13, data write-back module 14. In an example, the control module 11 includes a second instruction slot 111 corresponding to the data loading module 12, a third instruction slot 112 corresponding to the processing module 13, and a fourth instruction slot 113 corresponding to the data write-back module 14. When the i-th control instruction in the second instruction slot 111 is invalid, the (i+1)-th control instruction is cached into the second instruction slot 111; when the i-th control instruction in the third instruction slot 112 is invalid, the (i+1)-th control instruction cached in the second instruction slot 111 is cached into the third instruction slot 112; and when the i-th control instruction in the fourth instruction slot 113 is invalid, the (i+1)-th control instruction cached in the third instruction slot 112 is cached into the fourth instruction slot 113. The (i+1)-th control instruction is thus cached in the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113 in turn.
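The slot-to-slot hand-off can be sketched with a list standing in for instruction slots 111, 112, and 113. This is a deliberately simplified illustration: it represents an invalid slot as `None`, assumes a slot becomes invalid as soon as its instruction has been copied onward, and leaves it to the caller to clear the last slot when the write-back module is done. The function name `advance` is hypothetical.

```python
def advance(slots, incoming):
    """Move instructions one slot forward wherever the next slot is free.

    slots:    list modelling slots 111, 112, 113 (None = invalid/empty).
    incoming: list of instructions not yet cached in the first slot.
    Assumption (not from the source): copying an instruction into the
    next slot immediately frees the slot it came from.
    """
    # Walk from the last slot backwards so each instruction moves at
    # most one slot per call.
    for k in range(len(slots) - 1, 0, -1):
        if slots[k] is None and slots[k - 1] is not None:
            slots[k] = slots[k - 1]
            slots[k - 1] = None
    if slots[0] is None and incoming:
        slots[0] = incoming.pop(0)
    return slots


if __name__ == "__main__":
    slots = [None, None, None]
    incoming = ["instr 1", "instr 2"]
    for _ in range(3):
        print(advance(slots, incoming))
```

After three calls the two instructions occupy the second and third positions of the list in issue order, mirroring how instruction i+1 follows instruction i through slots 111, 112, and 113.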
In that case, whether the control instruction cached in an instruction slot is valid depends not only on whether the module corresponding to the slot has completed the operation of the control instruction corresponding to the slot, but also on whether the slot itself has completed the operation of the corresponding control instruction. Here, "whether the at least one instruction slot has completed the operation of the corresponding control instruction" means whether the control instruction in the slot has been cached into another instruction slot.
In one implementation, two control instruction completion signals may be provided. One control instruction completion signal indicates whether the module corresponding to the at least one instruction slot has completed the operation of the control instruction for that slot; the other indicates whether the at least one instruction slot has completed the operation of the corresponding control instruction.
The control module 11 includes at least one instruction slot for caching the control instruction. The instruction slot includes one control instruction status signal and at least two control instruction completion signals. When one control instruction completion signal indicates that the corresponding module has completed the operation of the corresponding control instruction, and the other control instruction completion signal indicates that the at least one instruction slot has completed the operation of the corresponding control instruction, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is invalid; otherwise, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is valid. By means of the control instruction status signal and the two control instruction completion signals, this embodiment ensures that the control module 11 can cache different control instructions in an orderly manner, and can therefore send different control instructions to the data loading module 12, the processing module 13 and the data write-back module 14 in an orderly manner.
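The validity rule described above can be sketched as follows. This is a minimal illustration only: the class name `InstructionSlot` and the field names `module_done` and `next_slot_done` are assumptions introduced here, not names from the application, and the hardware signals are modelled as plain Booleans.

```python
from dataclasses import dataclass


@dataclass
class InstructionSlot:
    """One instruction slot with two completion signals (illustrative names)."""

    instruction: object = None
    module_done: bool = True      # the corresponding module has received the instruction
    next_slot_done: bool = True   # the instruction has been cached in the next slot

    @property
    def valid(self) -> bool:
        # The cached instruction is invalid only when BOTH completion signals are set.
        return not (self.module_done and self.next_slot_done)

    def load(self, instruction) -> None:
        # A new instruction may only overwrite an invalid one.
        if self.valid:
            raise RuntimeError("slot still holds a valid instruction")
        self.instruction = instruction
        self.module_done = False
        self.next_slot_done = False
```

With this sketch, a freshly loaded slot stays valid until both completion signals have been set, after which the next control instruction can be loaded.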
In one example, referring to FIG. 4, the control module 11 includes a second instruction slot 111 corresponding to the data loading module 12, a third instruction slot 112 corresponding to the processing module 13, and a fourth instruction slot 113 corresponding to the data write-back module 14. In one embodiment, the second instruction slot 111, the third instruction slot 112 and the fourth instruction slot 113 are each used to cache the corresponding (i+1)-th control instruction after the control module 11 has finished sending the i-th control instruction, which happens at a different time for each slot. In another embodiment, an instruction slot may cache the instruction control word obtained by decoding the control instruction, in which case the control instruction itself need not be cached. The second instruction slot 111 is used to cache the (i+1)-th control instruction to be sent to the data loading module 12 after the control module 11 has finished sending the i-th control instruction; the third instruction slot 112 is used to cache the (i+1)-th control instruction to be sent to the processing module 13 after the control module 11 has finished sending the i-th control instruction; and the fourth instruction slot 113 is used to cache the (i+1)-th control instruction to be sent to the data write-back module 14 after the control module 11 has finished sending the i-th control instruction. It should be noted that the time points at which the individual instruction slots cache the (i+1)-th control instruction may differ from one another. "Having finished sending the i-th control instruction" means that, after the control module 11 sends the i-th control instruction to the data loading module 12 corresponding to the second instruction slot 111, to the processing module 13 corresponding to the third instruction slot 112, or to the data write-back module 14 corresponding to the fourth instruction slot 113, that module has finished receiving the i-th control instruction.
For the second instruction slot 111, the control module 11, in response to the i-th control instruction in the second instruction slot 111 being invalid, caches the (i+1)-th control instruction in the second instruction slot 111; and, in response to the data loading module 12 having finished executing the i-th control instruction, sends the (i+1)-th control instruction in the second instruction slot 111 to the data loading module 12.
The second instruction slot 111 includes a second valid signal, a data loading module completion signal and a third instruction slot completion signal. The second valid signal indicates whether the control instruction cached in the second instruction slot 111 is valid. The data loading module completion signal indicates whether the data loading module 12 has finished receiving the control instruction (that is, whether the control module 11 has sent the control instruction to the data loading module 12). The third instruction slot completion signal indicates whether the third instruction slot 112 has finished caching the control instruction (that is, whether the control module 11 has cached the control instruction held in the second instruction slot 111 in the third instruction slot 112).
When the i-th control instruction has been sent to the data loading module 12 and has been cached in the third instruction slot 112, the data loading module completion signal and the third instruction slot completion signal are set to the value representing "completed" and, at the same time, the second valid signal is set to the value representing "invalid". At this point the control module 11 may cache the (i+1)-th control instruction in the second instruction slot 111. When the (i+1)-th control instruction is cached in the second instruction slot 111, it has not yet been sent to the data loading module 12 or cached in the third instruction slot 112; therefore the second valid signal is set to the value representing "valid", and the data loading module completion signal and the third instruction slot completion signal are set to the value representing "not completed".
For the third instruction slot 112, the control module 11, in response to the i-th control instruction in the third instruction slot 112 being invalid, caches the (i+1)-th control instruction held in the second instruction slot 111 in the third instruction slot 112; and, in response to the processing module 13 having finished executing the i-th control instruction, sends the (i+1)-th control instruction in the third instruction slot 112 to the processing module 13.
The third instruction slot 112 includes a third valid signal, a processing module completion signal and a fourth instruction slot completion signal. The third valid signal indicates whether the control instruction cached in the third instruction slot 112 is valid. The processing module completion signal indicates whether the processing module 13 has finished receiving the control instruction (that is, whether the control module 11 has sent the control instruction to the processing module 13). The fourth instruction slot completion signal indicates whether the fourth instruction slot 113 has finished caching the control instruction (that is, whether the control module 11 has cached the control instruction held in the third instruction slot 112 in the fourth instruction slot 113).
When the i-th control instruction has been sent to the processing module 13 and has been cached in the fourth instruction slot 113, the processing module completion signal and the fourth instruction slot completion signal are set to the value representing "completed" and, at the same time, the third valid signal is set to the value representing "invalid"; the control module 11 may then cache the (i+1)-th control instruction in the third instruction slot 112. When the (i+1)-th control instruction is cached in the third instruction slot 112, it has not yet been sent to the processing module 13 or cached in the fourth instruction slot 113; the third valid signal is therefore set to the value representing "valid", and the processing module completion signal and the fourth instruction slot completion signal are set to the value representing "not completed".
For the fourth instruction slot 113, the control module 11, in response to the i-th control instruction in the fourth instruction slot 113 being invalid, caches the (i+1)-th control instruction held in the third instruction slot 112 in the fourth instruction slot 113; and, in response to the data write-back module 14 having finished executing the i-th control instruction, sends the (i+1)-th control instruction in the fourth instruction slot 113 to the data write-back module 14.
The fourth instruction slot 113 includes a fourth valid signal and a data write-back module completion signal. The fourth valid signal indicates whether the control instruction cached in the fourth instruction slot 113 is valid. The data write-back module completion signal indicates whether the data write-back module 14 has finished receiving the control instruction (that is, whether the control module 11 has sent the control instruction to the data write-back module 14).
When the i-th control instruction has been sent to the data write-back module 14, the data write-back module completion signal is set to the value representing "completed" and, at the same time, the fourth valid signal is set to the value representing "invalid"; the control module 11 may then cache the (i+1)-th control instruction in the fourth instruction slot 113. When the (i+1)-th control instruction is cached in the fourth instruction slot 113, it has not yet been sent by the control module 11 to the data write-back module 14; at this point the fourth valid signal is set to the value representing "valid", and the data write-back module completion signal is set to the value representing "not completed".
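The three slots differ only in which completion signals they carry: slots 111 and 112 each have a module completion signal and a next-slot completion signal, while slot 113 has only a module completion signal. The following sketch, under the assumption that each signal can be modelled as a named Boolean, shows one way to express that. The class `Slot` and the signal names are hypothetical and are not taken from the application.

```python
class Slot:
    """An instruction slot whose validity is derived from its completion signals."""

    def __init__(self, name, completion_signals):
        self.name = name
        self.instruction = None
        # True means "completed"; an empty slot starts with every signal completed.
        self.signals = {signal: True for signal in completion_signals}

    @property
    def valid(self):
        # Valid until every completion signal of this slot has been set.
        return not all(self.signals.values())

    def load(self, instruction):
        if self.valid:
            raise RuntimeError(f"{self.name} still holds a valid instruction")
        self.instruction = instruction
        for signal in self.signals:
            self.signals[signal] = False

    def complete(self, signal):
        self.signals[signal] = True


# Hypothetical signal names for the three slots of FIG. 4.
slot_111 = Slot("second slot 111", ["load_module_done", "third_slot_done"])
slot_112 = Slot("third slot 112", ["processing_module_done", "fourth_slot_done"])
slot_113 = Slot("fourth slot 113", ["write_back_module_done"])
```

In this model, slot 113 becomes invalid as soon as its single signal is completed, whereas slots 111 and 112 need both of theirs.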
In this embodiment, the control module 11 is provided with the second instruction slot 111 corresponding to the data loading module 12, the third instruction slot 112 corresponding to the processing module 13, and the fourth instruction slot 113 corresponding to the data write-back module 14. The control module 11 can thus control the data loading module 12, the processing module 13 and the data write-back module 14 through the second instruction slot 111, the third instruction slot 112 and the fourth instruction slot 113 respectively, so that the data loading module 12, the processing module 13 and the data write-back module 14 can execute different control instructions at the same moment. This reduces the corresponding waiting time and helps improve the utilization of processing resources.
In another example, refer to FIG. 5, which is a schematic structural diagram of a fourth data processing device according to an exemplary embodiment of this application. In the embodiment shown in FIG. 5, on the basis of the embodiment shown in FIG. 4, the control module 11 further includes a first instruction slot 114, which is used to cache decoded control instructions; a decoded control instruction may be represented by a corresponding instruction control word.
For the first instruction slot 114, the control module 11 decodes the i-th control instruction; in response to the decoded i-th control instruction in the first instruction slot 114 being invalid, caches the decoded (i+1)-th control instruction in the first instruction slot 114; and, in response to the decoded i-th control instruction in the second instruction slot 111 being invalid, caches the decoded (i+1)-th control instruction held in the first instruction slot 114 in the second instruction slot 111.
The first instruction slot 114 includes a first valid signal and a second instruction slot completion signal. The first valid signal indicates whether the control instruction cached in the first instruction slot 114 is valid. The second instruction slot completion signal indicates whether the second instruction slot 111 has finished caching the control instruction (that is, whether the control module 11 has cached the control instruction held in the first instruction slot 114 in the second instruction slot 111).
When the decoded i-th control instruction has been cached in the second instruction slot 111, the second instruction slot completion signal is set to the value representing "completed" and, at the same time, the first valid signal is set to the value representing "invalid"; the control module 11 may then cache the decoded (i+1)-th control instruction in the first instruction slot 114. When the decoded (i+1)-th control instruction is cached in the first instruction slot 114, it has not yet been cached by the control module 11 in the second instruction slot 111; the first valid signal is therefore set to the value representing "valid", and the second instruction slot completion signal is set to the value representing "not completed".
Correspondingly, the control instructions cached in the second instruction slot 111, the third instruction slot 112 and the fourth instruction slot 113 are decoded control instructions. For the execution process of the decoded control instructions cached in the second instruction slot 111, the third instruction slot 112 and the fourth instruction slot 113, refer to the description of the embodiment shown in FIG. 4; details are not repeated here.
In this embodiment, the control module 11 is provided with the first instruction slot 114, the second instruction slot 111 corresponding to the data loading module 12, the third instruction slot 112 corresponding to the processing module 13, and the fourth instruction slot 113 corresponding to the data write-back module 14. The control module 11 can thus control the data loading module 12, the processing module 13 and the data write-back module 14 through the second instruction slot 111, the third instruction slot 112 and the fourth instruction slot 113 respectively, so that the data loading module 12, the processing module 13 and the data write-back module 14 can execute different control instructions at the same moment. This reduces the corresponding waiting time and helps improve the utilization of processing resources.
In an exemplary embodiment, the data processing device provided in the embodiments of this application can be applied when a convolutional neural network processes a target object (including but not limited to an image, audio, video or text). The device can perform the convolution operation of a convolutional layer, the pooling operation of a pooling layer, or the activation operation of an activation layer, thereby accelerating the computation of a deep neural network in hardware, reducing its computation time and improving computational efficiency.
The following description takes as an example a convolutional neural network applied to image processing, with the data processing device performing the convolution operation of a convolutional layer. The control module 11 receives convolution operation control instructions and distributes them to the data loading module 12, the processing module 13 and the data write-back module 14. Referring to FIG. 6, the distribution of a control instruction by the control module 11 is divided into four stages: a decoding stage, a loading stage, an execution stage and a storage stage. This embodiment controls the execution process of different control instructions through these four stages.
The decoding stage corresponds to the first instruction slot 114: the control module 11 caches the decoded control instruction in the first instruction slot 114.
The loading stage corresponds to the second instruction slot 111: the control module 11 caches the decoded control instruction held in the first instruction slot 114 in the second instruction slot 111, and then sends the decoded control instruction cached in the second instruction slot 111 to the data loading module 12.
The execution stage corresponds to the third instruction slot 112: the control module 11 caches the decoded control instruction held in the second instruction slot 111 in the third instruction slot 112, and then sends the decoded control instruction cached in the third instruction slot 112 to the processing module 13.
The storage stage corresponds to the fourth instruction slot 113: the control module 11 caches the decoded control instruction held in the third instruction slot 112 in the fourth instruction slot 113, and then sends the decoded control instruction cached in the fourth instruction slot 113 to the data write-back module 14.
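The correspondence between the four stages, the instruction slots and the modules can be written down as a small lookup table. This is only an illustrative sketch: the reference numerals come from the description, while the names `Stage`, `STAGE_TO_SLOT`, `STAGE_TO_MODULE` and `route` are assumptions made here.

```python
from enum import Enum


class Stage(Enum):
    DECODE = 1
    LOAD = 2
    EXECUTE = 3
    STORE = 4


# Reference numeral of the instruction slot used by each stage.
STAGE_TO_SLOT = {
    Stage.DECODE: 114,   # first instruction slot
    Stage.LOAD: 111,     # second instruction slot
    Stage.EXECUTE: 112,  # third instruction slot
    Stage.STORE: 113,    # fourth instruction slot
}

# Reference numeral of the module that receives the instruction in each stage.
# The decoding stage only fills slot 114, so it has no receiving module here.
STAGE_TO_MODULE = {
    Stage.DECODE: None,
    Stage.LOAD: 12,      # data loading module
    Stage.EXECUTE: 13,   # processing module
    Stage.STORE: 14,     # data write-back module
}


def route():
    """Return (stage name, slot, module) in the order an instruction visits them."""
    return [(s.name, STAGE_TO_SLOT[s], STAGE_TO_MODULE[s]) for s in Stage]
```

Iterating over `Stage` follows definition order, so `route()` lists the stages in the order decoding, loading, execution, storage.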
Each instruction slot caches different control instructions at different times, and the execution progress of the control instructions is controlled through the control instruction status signal and the control instruction completion signals in the instruction slots. For ease of understanding, refer to FIG. 7; the following illustrates how the control instructions are cached in the instruction slots during different time periods:
In time period T0: the control module 11 decodes control instruction a and caches the decoded control instruction a in the first instruction slot 114, which corresponds to the decoding stage. At this point, in the first instruction slot 114, the first valid signal indicates that the decoded control instruction a is valid, and the second instruction slot completion signal indicates that the second instruction slot 111 has not finished caching the decoded control instruction a.
In time period T1: the control module 11 caches the decoded control instruction a held in the first instruction slot 114 in the second instruction slot 111, which corresponds to the loading stage, and then sends the decoded control instruction a in the second instruction slot 111 to the data loading module 12. At this point, in the second instruction slot 111, the second valid signal indicates that the decoded control instruction a is valid, the data loading module completion signal indicates that the data loading module 12 has finished receiving the decoded control instruction a, and the third instruction slot completion signal indicates that the third instruction slot 112 has not finished caching the decoded control instruction a.
Meanwhile, because the decoded control instruction a in the first instruction slot 114 has been cached in the second instruction slot 111, the first valid signal in the first instruction slot 114 indicates that the decoded control instruction a is invalid, and the second instruction slot completion signal indicates that the second instruction slot 111 has finished caching the decoded control instruction a. The control module 11 can therefore decode control instruction b and cache the decoded control instruction b in the first instruction slot 114. Correspondingly, in the first instruction slot 114, the first valid signal indicates that the decoded control instruction b is valid, and the second instruction slot completion signal indicates that the second instruction slot 111 has not finished caching the decoded control instruction b.
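The hand-over from the first instruction slot 114 to the second instruction slot 111 during T0 and T1 can be sketched as below. The sketch is deliberately simplified: each slot is reduced to a dictionary with one `valid` flag, decoding is a stub, and the helper names `make_slot`, `forward` and `decode_into` are hypothetical.

```python
def make_slot():
    """An empty slot; it holds no valid instruction yet."""
    return {"instruction": None, "valid": False}


def decode_into(slot, raw):
    """Decode `raw` into the slot if the slot is free (decoding is a stub)."""
    if slot["valid"]:
        return False
    slot["instruction"] = f"decoded({raw})"
    slot["valid"] = True
    return True


def forward(src, dst):
    """Cache src's instruction in dst if dst is free.

    Slot 114 has a single completion signal (the second instruction slot
    completion signal), so a successful copy immediately makes it invalid.
    """
    if not src["valid"] or dst["valid"]:
        return False
    dst["instruction"] = src["instruction"]
    dst["valid"] = True
    src["valid"] = False
    return True


slot_114 = make_slot()
slot_111 = make_slot()
decode_into(slot_114, "a")    # T0: a is decoded into slot 114
forward(slot_114, slot_111)   # T1: a is cached in slot 111, slot 114 becomes invalid
decode_into(slot_114, "b")    # T1: slot 114 is free again, so b can be decoded
```

After these three calls, slot 111 holds the decoded instruction a and slot 114 holds the decoded instruction b, matching the state described for T1.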
In time period T2: the control module 11 caches the decoded control instruction a held in the second instruction slot 111 in the third instruction slot 112, which corresponds to the execution stage, and then sends the decoded control instruction a in the third instruction slot 112 to the processing module 13. At this point, in the third instruction slot 112, the third valid signal indicates that the decoded control instruction a is valid, the processing module completion signal indicates that the processing module 13 has finished receiving the decoded control instruction a, and the fourth instruction slot completion signal indicates that the fourth instruction slot 113 has not finished caching the decoded control instruction a.
Meanwhile, because the decoded control instruction a in the second instruction slot 111 has been cached in the third instruction slot 112, in the second instruction slot 111 the second valid signal indicates that the decoded control instruction a is invalid, the data loading module completion signal indicates that the data loading module 12 has finished receiving the decoded control instruction a, and the third instruction slot completion signal indicates that the third instruction slot 112 has finished caching the decoded control instruction a. The control module 11 can therefore cache the decoded control instruction b held in the first instruction slot 114 in the second instruction slot 111. In one possible case, the data loading module 12 is still executing the decoded control instruction a, so the decoded control instruction b cannot be issued. Then, in the second instruction slot 111, the second valid signal indicates that the decoded control instruction b is valid, the data loading module completion signal indicates that the data loading module 12 has not finished receiving the decoded control instruction b, and the third instruction slot completion signal indicates that the third instruction slot 112 has not finished caching the decoded control instruction b.
Meanwhile, because the decoded control instruction b in the first instruction slot 114 has been cached in the second instruction slot 111, the first valid signal in the first instruction slot 114 indicates that the decoded control instruction b is invalid, and the second instruction slot completion signal indicates that the second instruction slot 111 has finished caching the decoded control instruction b. The control module 11 can therefore decode control instruction c and cache the decoded control instruction c in the first instruction slot 114. Correspondingly, in the first instruction slot 114, the first valid signal indicates that the decoded control instruction c is valid, and the second instruction slot completion signal indicates that the second instruction slot 111 has not finished caching the decoded control instruction c.
At time T3: the control module 11 caches the decoded control instruction a held in the third instruction slot 112 in the fourth instruction slot 113, which corresponds to the storage stage, and then sends the decoded control instruction a in the fourth instruction slot 113 to the data write-back module 14.
Meanwhile, because the data loading module 12 is still executing the decoded control instruction a, the decoded control instruction b cannot be issued to the data loading module 12 and remains cached in the second instruction slot 111. At this point, in the second instruction slot 111, the second valid signal indicates that the decoded control instruction b is valid, the data loading module completion signal indicates that the data loading module 12 has not finished receiving the decoded control instruction b, and the third instruction slot completion signal indicates that the third instruction slot 112 has not finished caching the decoded control instruction b.
At time T4: the data loading module 12 finishes executing the decoded control instruction a, and the control module 11 sends the decoded control instruction b in the second instruction slot 111 to the data loading module 12. At this point, in the second instruction slot 111, the second valid signal indicates that the decoded control instruction b is valid, the data loading module completion signal indicates that the data loading module 12 has finished receiving the decoded control instruction b, and the third instruction slot completion signal indicates that the third instruction slot 112 has not finished caching the decoded control instruction b.
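The wait seen at T3 and its release at T4 amount to a simple gate: a cached instruction is issued only once the module has finished the previous one. A minimal sketch of that gate follows, assuming a module handles one control instruction at a time. The class `Module`, the function `try_dispatch` and the dictionary keys are hypothetical names introduced for illustration.

```python
class Module:
    """A functional module that handles one control instruction at a time."""

    def __init__(self, name):
        self.name = name
        self.current = None

    @property
    def busy(self):
        return self.current is not None

    def finish(self):
        """Mark the current instruction as executed and return it."""
        done, self.current = self.current, None
        return done


def try_dispatch(slot, module):
    """Send the slot's instruction to the module if it is idle.

    On success the slot's module completion signal is set, which in the
    description means the module has finished receiving the instruction.
    """
    if slot["module_done"] or module.busy:
        return False
    module.current = slot["instruction"]
    slot["module_done"] = True
    return True


loader = Module("data loading module 12")
loader.current = "a"                                   # T3: still executing a
slot_111 = {"instruction": "b", "module_done": False}  # b waits in slot 111

blocked = try_dispatch(slot_111, loader)  # T3: loader busy, b is not issued
loader.finish()                           # T4: a is finished
sent = try_dispatch(slot_111, loader)     # T4: b is issued to the loader
```

Here `blocked` is `False` and `sent` is `True`. Slot 111 nevertheless remains valid in the description's terms, because b has not yet been cached in the third instruction slot 112.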
At time T5: the control module 11 caches the decoded control instruction b held in the second instruction slot 111 in the third instruction slot 112 and then, after the processing module 13 has finished executing the decoded control instruction a, sends the decoded control instruction b cached in the third instruction slot 112 to the processing module 13. At this point, in the third instruction slot 112, the third valid signal indicates that the decoded control instruction b is valid, the processing module completion signal indicates that the processing module 13 has finished receiving the decoded control instruction b, and the fourth instruction slot completion signal indicates that the fourth instruction slot 113 has not finished caching the decoded control instruction b.
Meanwhile, because the decoded control instruction b in the second instruction slot 111 has been cached in the third instruction slot 112, in the second instruction slot 111 the second valid signal indicates that the decoded control instruction b is invalid, the data loading module completion signal indicates that the data loading module 12 has finished receiving the decoded control instruction b, and the third instruction slot completion signal indicates that the third instruction slot 112 has finished caching the decoded control instruction b. The control module 11 can therefore cache the decoded control instruction c held in the first instruction slot 114 in the second instruction slot 111. In one possible case, the data loading module 12 is still executing the decoded control instruction b, so the decoded control instruction c cannot be issued to the data loading module 12. Then, in the second instruction slot 111, the second valid signal indicates that the decoded control instruction c is valid, the data loading module completion signal indicates that the data loading module 12 has not finished receiving the decoded control instruction c, and the third instruction slot completion signal indicates that the third instruction slot 112 has not finished caching the decoded control instruction c.
Meanwhile, because the decoded control instruction c in the first instruction slot 114 has been cached into the second instruction slot 111, the first valid signal in the first instruction slot 114 indicates that the decoded control instruction c is invalid, and the second instruction slot completion signal indicates that the second instruction slot 111 has completed caching the decoded control instruction c. The control module 11 can therefore decode control instruction d and cache the decoded control instruction d into the first instruction slot 114. Correspondingly, the first valid signal in the first instruction slot 114 indicates that the decoded control instruction d is valid, and the second instruction slot completion signal indicates that the second instruction slot 111 has not completed caching the decoded control instruction d.
During time period T6: the control module 11 caches the decoded control instruction b from the third instruction slot 112 into the fourth instruction slot 113, which corresponds to the storage stage, and then, once the data write-back module 14 has finished executing the decoded control instruction a, sends the decoded control instruction b cached in the fourth instruction slot 113 to the data write-back module 14.
Meanwhile, because the data loading module 12 is still executing the decoded control instruction b, the decoded control instruction c cannot be issued and remains cached in the second instruction slot 111. The second valid signal in the second instruction slot 111 therefore indicates that the decoded control instruction c is valid, the data loading module completion signal indicates that the data loading module 12 has not completed receiving the decoded control instruction c, and the third instruction slot completion signal indicates that the third instruction slot 112 has not completed caching the decoded control instruction c.
In an exemplary embodiment, the to-be-processed data includes an object to be processed and operating parameters. The object to be processed includes, but is not limited to, an image, audio, or text; the operating parameters include, but are not limited to, a convolution kernel, pooling parameters, or an activation function.
The processing module 13 includes a systolic array. In response to the i-th control instruction, the processing module 13 writes the object to be processed and the operating parameters corresponding to the i-th control instruction into the systolic array, and the systolic array performs the operation on the object to be processed and the operating parameters to obtain the processing result.
As an example, consider the case in which the data processing device performs a convolution operation on an image. The object to be processed is the image and the operating parameter is the convolution kernel. The processing module 13 writes the image and the convolution kernel into the systolic array, and the systolic array performs a multiply-accumulate operation on the image and the convolution kernel to obtain the convolved image.
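The multiply-accumulate operation in the example above can be sketched in software as follows. This is only an illustrative sketch: it uses plain nested loops rather than modelling the dataflow of a systolic array, it assumes no padding and a stride of 1, and it follows the usual neural-network convention of not flipping the kernel. The function name `conv2d_mac` and the sample values are hypothetical and do not come from the application.

```python
def conv2d_mac(image, kernel):
    """Sliding-window multiply-accumulate (no padding, stride 1, kernel not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            acc = 0  # accumulator for one output pixel
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            out[r][c] = acc
    return out


# Hypothetical 3x3 image and 2x2 kernel.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
feature_map = conv2d_mac(image, kernel)
```

Each output element is the sum of the element-wise products between the kernel and the image window beneath it, which is the work that the systolic array distributes across its processing elements.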
In an embodiment, in a convolutional neural network application scenario the to-be-processed data that must be loaded includes at least two parts: the object to be processed and the operating parameters. To further improve data loading efficiency, refer to FIG. 8, which is a schematic diagram of a fifth data processing device according to an exemplary embodiment of this application. The data loading module 12 includes an object loading unit 121 and a parameter loading unit 122. In response to the object loading unit 121 finishing execution of the i-th control instruction, the control module 11 sends the (i+1)-th control instruction to the object loading unit 121; and in response to the parameter loading unit 122 finishing execution of the i-th control instruction, the control module 11 sends the (i+1)-th control instruction to the parameter loading unit 122. In this embodiment the object loading unit 121 and the parameter loading unit 122 load the object to be processed and the operating parameters respectively and simultaneously, which helps improve loading efficiency.
It should be noted that the object loading unit 121 finishing execution of the i-th control instruction means that the object loading unit 121 has finished loading the object to be processed corresponding to the i-th control instruction; the parameter loading unit 122 finishing execution of the i-th control instruction means that the parameter loading unit 122 has finished loading the operating parameters corresponding to the i-th control instruction.
In response to the i-th control instruction, the object loading unit 121 loads the object to be processed corresponding to the i-th control instruction; in response to the i-th control instruction, the parameter loading unit 122 loads the operating parameters corresponding to the i-th control instruction.
While the processing module 13, in response to the i-th control instruction, is processing the to-be-processed data corresponding to the i-th control instruction, if the object loading unit 121 has already finished loading the object to be processed corresponding to the i-th control instruction, the object loading unit 121 can directly receive the (i+1)-th control instruction sent by the control module 11 and load the corresponding object to be processed, without waiting for the processing module 13 to finish executing the i-th control instruction. Likewise, if the parameter loading unit 122 has already finished loading the operating parameters corresponding to the i-th control instruction, the parameter loading unit 122 can directly receive the (i+1)-th control instruction sent by the control module 11 and load the corresponding operating parameters, without waiting for the processing module 13 to finish executing the i-th control instruction. This embodiment further reduces the time that the object loading unit 121 and the parameter loading unit 122 wait for the (i+1)-th control instruction, and avoids the waste of processing resources caused by excessive waiting.
In an embodiment, the control module 11 includes a second instruction slot 111 corresponding to the data loading module 12, a third instruction slot 112 corresponding to the processing module 13, and a fourth instruction slot 113 corresponding to the data write-back module 14. When the data loading module 12 includes the object loading unit 121 and the parameter loading unit 122, then, for the second instruction slot 111, the control module 11 caches the (i+1)-th control instruction into the second instruction slot 111 in response to the i-th control instruction in the second instruction slot 111 being invalid, and sends the (i+1)-th control instruction in the second instruction slot 111 to the data loading module 12 in response to the object loading unit 121 and the parameter loading unit 122 finishing execution of the i-th control instruction.
The second instruction slot 111 includes a second valid signal, an object loading unit completion signal, a parameter loading unit completion signal, and a third instruction slot completion signal. The second valid signal indicates whether the control instruction cached in the second instruction slot 111 is valid. The object loading unit completion signal indicates whether the object loading unit 121 has completed receiving the control instruction (that is, whether the control module 11 has already sent the control instruction to the object loading unit 121). The parameter loading unit completion signal indicates whether the parameter loading unit 122 has completed receiving the control instruction (that is, whether the control module 11 has already sent the control instruction to the parameter loading unit 122). The third instruction slot completion signal indicates whether the third instruction slot 112 has completed caching the control instruction (that is, whether the control module 11 has already cached the control instruction from the second instruction slot 111 into the third instruction slot 112).
When the i-th control instruction has been sent to the object loading unit 121 and the parameter loading unit 122 and has been cached into the third instruction slot 112, the state values of the object loading unit completion signal, the parameter loading unit completion signal, and the third instruction slot completion signal are set to the value representing completion, and at the same time the state value of the second valid signal is set to the value representing invalid. The control module 11 can then cache the (i+1)-th control instruction into the second instruction slot 111. When the (i+1)-th control instruction is cached into the second instruction slot 111, it has not yet been sent to the object loading unit 121 or the parameter loading unit 122, nor cached into the third instruction slot 112; therefore the state value of the second valid signal is set to the value representing valid, and the state values of the data loading module completion signal and the third instruction slot completion signal are set to the value representing not completed.
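The signal bookkeeping of the second instruction slot described above can be sketched as a small state holder. This is a minimal software sketch under stated assumptions, not the hardware implementation: the class name `InstructionSlot`, its method names, and the consumer labels are hypothetical, and the rule that the valid signal is cleared exactly when every completion signal is set is taken from the description above.

```python
class InstructionSlot:
    """A cached control instruction, a valid flag, and one completion flag per consumer."""

    def __init__(self, consumers):
        self.instr = None
        self.valid = False
        self.done = {name: False for name in consumers}

    def cache(self, instr):
        # A new instruction may only overwrite an invalid (already consumed) one.
        if self.valid:
            raise RuntimeError("slot still holds a valid instruction")
        self.instr = instr
        self.valid = True
        for name in self.done:
            self.done[name] = False  # nothing has received the new instruction yet

    def mark_done(self, consumer):
        if consumer not in self.done:
            raise KeyError(consumer)
        self.done[consumer] = True
        # Once every consumer has the instruction, the slot content is no longer needed.
        if all(self.done.values()):
            self.valid = False


# Second instruction slot with the three consumers named in the text (labels are hypothetical).
slot2 = InstructionSlot(("object_loader", "param_loader", "slot3"))
slot2.cache("instr_i")
slot2.mark_done("object_loader")
slot2.mark_done("param_loader")
still_valid = slot2.valid        # slot 3 has not cached the instruction yet
slot2.mark_done("slot3")
freed = not slot2.valid          # all three completion signals set
slot2.cache("instr_i_plus_1")    # the next instruction can now move in
```

Attempting to cache a new instruction while the valid flag is still set raises an error in this sketch, which mirrors the stall in which a control instruction stays in its slot until its consumers have taken it.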
For the operation of the third instruction slot 112 and the fourth instruction slot 113, refer to the embodiment shown in FIG. 3; the details are not repeated here.
In an embodiment, to further improve data processing efficiency, the target data to be processed may be divided into at least two parts, the to-be-processed data being one of those parts. The control module 11 exercises control through at least two control instructions, each of which designates one part of the target data to be processed, so that the target data as a whole is processed. Because the target data to be processed is divided into at least two parts, when the data loading module 12 loads to-be-processed data on the basis of one control instruction it loads only one part of the target data. This helps improve loading efficiency: the processing module 13 does not have to wait for the data loading module 12 to load the entire target data and can start processing the loaded to-be-processed data sooner, which further improves processing efficiency.
Because the target data to be processed is divided into at least two parts and the control module 11 exercises control through at least two control instructions, each designating one part of the target data, the control instructions include result write-back control instructions and result non-write-back control instructions.
A result non-write-back control instruction instructs the processing module 13, after obtaining the processing result, to cache the processing result instead of sending it to the data write-back module 14. The processing result corresponding to a result non-write-back control instruction is not the final processing result that the data write-back module 14 ultimately writes to the external storage module, but one part of that final result. After receiving a result non-write-back control instruction, the processing module 13 processes the corresponding to-be-processed data according to that instruction, obtains and caches the processing result, and, once caching is complete, generates an end signal to be sent to the control module 11. The control module 11 receives the end signal sent by the processing module 13 after the processing result has been cached; the end signal indicates that the result non-write-back control instruction has finished executing in the data processing device.
A result write-back control instruction instructs the processing module 13 to send all processing results related to the target data to be processed to the data write-back module 14. After receiving a result write-back control instruction, the processing module 13 processes the corresponding to-be-processed data according to that instruction, obtains the processing result, integrates all processing results related to the target data to be processed, and sends them to the data write-back module 14, which writes them to the external storage module. After completing the write operation, the data write-back module 14 generates an end signal and sends it to the control module 11. The control module 11 receives the end signal sent by the data write-back module 14 after the processing result has been written; the end signal indicates that the result write-back control instruction has finished executing in the data processing device.
In this embodiment, because the target data to be processed is divided into at least two parts, each designated by one control instruction, the data loading module 12 only needs to load one part of the target data when loading the to-be-processed data corresponding to a control instruction. This helps improve loading efficiency and reduces the time the processing module 13 spends waiting for the data loading module 12 to load the target data, so that the processing module 13 can process the loaded to-be-processed data sooner, which helps improve processing efficiency.
Further, the control module 11 may return the end signal corresponding to a control instruction to an external control module to notify it that the control instruction has been executed, so that the external control module can carry out the next processing step on the basis of the final processing result written to the external storage module.
Because the control instructions include result write-back control instructions and result non-write-back control instructions, and the two kinds handle the obtained processing result differently, their end signals are also sent differently: for a result write-back control instruction, the data write-back module 14 sends the end signal after completing the instruction; for a result non-write-back control instruction, the processing module 13 sends the end signal after completing the instruction. In an exemplary scenario, the data write-back module 14 is writing the final processing result A, corresponding to target data A, to the external storage module. The data write-back module 14 has not yet finished, so it has not generated the end signal A to be sent to the control module 11. Meanwhile, the processing module 13 may already have finished processing part B1 of target data B and generated the end signal B1 to be sent to the control module 11. If the control module 11 returned the end signal B1 to the external control module at this point, the external control module would proceed to the next step on the basis of the end signal B1 while overlooking the end signal A that it has not yet received. The external control module might then skip the processing step that depends on the final processing result A, which could cause an error in the processing flow.
Therefore, to ensure that the processing flow is correct, the control module 11 returns the end signals corresponding to the control instructions to the external control module in the order in which the control instructions were received, following the first-in-first-out principle. If, according to that order, the end signal just received is not the one due to be sent, the control module 11 first caches it and returns it to the external control module only when its turn comes. By keeping the end signals in order, this embodiment ensures that the end signal of a control instruction received earlier from the external control module is returned to the external control module earlier, which keeps the data processing flow correct and orderly.
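The order-preserving return of end signals can be sketched as a small reorder queue. This is an illustrative sketch only: the class `EndSignalReorderer` and its method names are hypothetical, and instruction identifiers are assumed to be unique.

```python
from collections import deque


class EndSignalReorderer:
    """Forwards end signals in the order their control instructions were received."""

    def __init__(self):
        self._pending = deque()  # instruction ids, oldest first
        self._finished = set()   # end signals that arrived early and are being held

    def on_instruction(self, instr_id):
        self._pending.append(instr_id)

    def on_end_signal(self, instr_id):
        """Record an end signal and return the ids that may now be forwarded, in order."""
        self._finished.add(instr_id)
        released = []
        while self._pending and self._pending[0] in self._finished:
            head = self._pending.popleft()
            self._finished.remove(head)
            released.append(head)
        return released


# Scenario from the text: instruction A was received before B1, but B1 finishes first.
reorderer = EndSignalReorderer()
reorderer.on_instruction("A")
reorderer.on_instruction("B1")
first = reorderer.on_end_signal("B1")   # held back, A is still outstanding
second = reorderer.on_end_signal("A")   # releases A, then the held B1
```

In this sketch the early end signal B1 is held until A completes, so the external control module never observes B1 before A.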
As an example, refer to FIG. 9. Suppose the target data C to be processed is divided into two parts, to-be-processed data c1 and to-be-processed data c2, and the control instructions include a result write-back control instruction and a result non-write-back control instruction. The result non-write-back control instruction corresponds to the to-be-processed data c1, and the result write-back control instruction corresponds to the to-be-processed data c2.
In the embodiment shown in FIG. 9, the data loading module 12 loads the to-be-processed data c1 on the basis of the result non-write-back control instruction, and the processing module 13 processes the to-be-processed data c1 on the basis of the result non-write-back control instruction, obtains the processing result c1, and caches it. The data loading module 12 then loads the to-be-processed data c2 on the basis of the result write-back control instruction, and the processing module 13 processes the to-be-processed data c2 on the basis of the result write-back control instruction, obtains the processing result c2, and integrates the processing results c1 and c2 into the final processing result (c1, c2) corresponding to the target data C. The data write-back module 14 then writes the final processing result (c1, c2) to the external storage module.
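The FIG. 9 flow can be sketched as follows. This is a hedged illustration, not the device's implementation: the class `ProcessingModule`, the `execute` method, and the use of `sum` as a stand-in for the real computation are all assumptions made for the example, as is representing the external storage module by a list.

```python
class ProcessingModule:
    """Caches partial results until a result write-back instruction arrives."""

    def __init__(self):
        self._partials = []

    def execute(self, data, write_back):
        result = sum(data)            # placeholder for the real computation
        self._partials.append(result)
        if not write_back:
            return None               # result non-write-back: keep the result cached
        # Result write-back: integrate all cached partial results and hand them on.
        merged, self._partials = tuple(self._partials), []
        return merged


processing = ProcessingModule()
external_storage = []                 # stands in for the external storage module

# Target data C split into c1 (result non-write-back) and c2 (result write-back).
parts = [([1, 2, 3], False), ([4, 5], True)]
for data, write_back in parts:
    final = processing.execute(data, write_back)
    if final is not None:
        external_storage.append(final)  # role of the data write-back module
```

Only the second instruction causes a write to external storage, and what is written is the integrated result of both parts.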
In an exemplary embodiment, the to-be-processed data includes an object to be processed and operating parameters. The object to be processed includes, but is not limited to, an image, audio, or text; the operating parameters include, but are not limited to, a convolution kernel, pooling parameters, or an activation function. To further improve data processing efficiency, the target object to be processed may be divided into at least two parts and the target operating parameters may be divided into at least two parts; the object to be processed is one part of the target object to be processed, and the operating parameters are one part of the target operating parameters.
Correspondingly, the control instructions include result write-back control instructions and result non-write-back control instructions. A result non-write-back control instruction instructs the processing module 13, after obtaining the processing result, to cache the processing result instead of sending it to the data write-back module 14; the processing result corresponding to a result non-write-back control instruction is not the final processing result that the data write-back module 14 ultimately writes to the external storage module, but one part of that final result. A result write-back control instruction instructs the processing module 13 to send all processing results related to the target object to be processed and the target operating parameters to the data write-back module 14; according to the result write-back control instruction, the processing module 13 integrates all processing results related to the target object to be processed and the target operating parameters and sends them to the data write-back module 14, which writes them to the external storage module.
In this embodiment, because the target object to be processed and the target operating parameters are each divided into at least two parts, each designated by one control instruction, the data loading module 12 only needs to load one part of the target object and one part of the target operating parameters when loading the object to be processed and the operating parameters corresponding to a control instruction. This helps improve loading efficiency and reduces the time the processing module 13 spends waiting for the data loading module 12 to load the target data, so that the processing module 13 can process the loaded to-be-processed data sooner, which helps improve processing efficiency.
It should be noted that, in the embodiments shown in FIG. 4 and FIG. 5, the control module 11 includes a second instruction slot 111 corresponding to the data loading module 12, a third instruction slot 112 corresponding to the processing module 13, and a fourth instruction slot 113 corresponding to the data write-back module 14. The second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113 are used to cache control instructions. When the control instruction is a result non-write-back control instruction, it does not need to be cached into the fourth instruction slot 113 or sent to the data write-back module 14; that is, the data write-back module 14 does not need to execute result non-write-back control instructions.
In an exemplary embodiment, the data processing device provided in the embodiments of this application can be applied when a convolutional neural network processes a target object (including, but not limited to, an image, audio, video, or text). It can perform the convolution operation of a convolutional layer, the pooling operation of a pooling layer, or the activation operation of an activation layer, thereby accelerating deep neural network computation in hardware, reducing computation time, and improving computational efficiency.
The following description takes as an example a convolutional neural network applied to image processing, with the data processing device performing the convolution operation of a convolutional layer. The control module 11 receives convolution control instructions and distributes them to the data loading module 12, the processing module 13, and the data write-back module 14. Referring to FIG. 10, the control module 11 distributes a control instruction in four stages: a decoding stage, a loading stage, an execution stage, and a storage stage. The decoding stage corresponds to the first instruction slot 114, the loading stage to the second instruction slot 111, the execution stage to the third instruction slot 112, and the storage stage to the fourth instruction slot 113. When the control instruction is a result non-write-back control instruction, it does not need to be cached into the fourth instruction slot 113 or sent to the data write-back module 14; that is, the data write-back module 14 does not need to execute result non-write-back control instructions. When the control instruction is a result write-back control instruction, it is cached into the fourth instruction slot 113 and sent to the data write-back module 14, which executes it.
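The stage routing described above, together with which module reports completion for each kind of instruction, can be summarised in a short sketch. The enum `Kind` and the helper names `stages_for` and `end_signal_source` are hypothetical labels chosen for illustration; only the four stage names and the routing rule come from the description.

```python
from enum import Enum


class Kind(Enum):
    RESULT_WRITE_BACK = "result write-back"
    RESULT_NO_WRITE_BACK = "result non-write-back"


# Stages in distribution order; they map to instruction slots 114, 111, 112 and 113.
STAGES = ("decode", "load", "execute", "store")


def stages_for(kind):
    """Stages a control instruction passes through, depending on its kind."""
    if kind is Kind.RESULT_NO_WRITE_BACK:
        return STAGES[:-1]  # never cached in the fourth slot, never sent to write-back
    return STAGES


def end_signal_source(kind):
    """Module that sends the end signal to the control module for this kind."""
    if kind is Kind.RESULT_NO_WRITE_BACK:
        return "processing module"
    return "data write-back module"


route_no_wb = stages_for(Kind.RESULT_NO_WRITE_BACK)
route_wb = stages_for(Kind.RESULT_WRITE_BACK)
```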
The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
Correspondingly, referring to FIG. 11, an embodiment of this application further provides a data processing method applied to a data processing device that includes a data loading module and a processing module. The method includes:
In step S101, in response to a control instruction, data is loaded by the data loading module for the processing module to process.
In step S102, in response to the control instruction, data processing is performed by the processing module, wherein the data loading module and the processing module execute different control instructions at the same moment.
In an embodiment, the method further includes:
In response to the control instruction, writing the processing result of the to-be-processed data to an external storage module, wherein the data loading module, the processing module, and the data write-back module execute different control instructions at the same moment.
In an embodiment, the method further includes:
In response to the data loading module finishing execution of the i-th control instruction, sending the (i+1)-th control instruction to the data loading module; and in response to the processing module finishing execution of the i-th control instruction, sending the (i+1)-th control instruction to the processing module, where i is an integer.
Step S101 includes: in response to the i-th control instruction, loading the data to be processed corresponding to the i-th control instruction.
Step S102 includes: in response to the i-th control instruction, processing the data to be processed corresponding to the i-th control instruction to obtain a processing result.
In an embodiment, the device further includes a data write-back module.
The method further includes:
In response to the data write-back module finishing execution of the i-th control instruction, sending the (i+1)-th control instruction to the data write-back module.
In response to the i-th control instruction, writing, by the data write-back module, the processing result corresponding to the i-th control instruction into the external storage module.
In an embodiment, the method further includes:
While the processing module is processing, in response to the i-th control instruction, the data to be processed corresponding to the i-th control instruction, the data loading module receives the (i+1)-th control instruction without waiting for the processing module to finish executing the i-th control instruction.
In an embodiment, the method further includes:
While the data write-back module is writing, in response to the i-th control instruction, the processing result corresponding to the i-th control instruction into the external storage module, the processing module receives the (i+1)-th control instruction without waiting for the data write-back module to finish executing the i-th control instruction.
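The overlap between modules described in the embodiments above can be illustrated with a small timing model. This is a minimal sketch under simplifying assumptions that are not stated in the application: every instruction passes through loading, processing, and write-back in that order, each module handles one instruction at a time, and buffering limits between modules are ignored. The function name `schedule` and the example durations are hypothetical.

```python
def schedule(durations):
    """Compute when each module works on each instruction.

    durations[i] is a (load, process, write_back) tuple of execution times
    for the i-th control instruction. The result holds, per instruction,
    a list of (start, end) pairs, one pair per module.
    """
    free = [0, 0, 0]  # time at which each module becomes idle
    timeline = []
    for times in durations:
        ready = 0  # time at which the previous stage of this instruction ended
        row = []
        for module, duration in enumerate(times):
            # A module starts an instruction once it is idle and the
            # upstream stage of the same instruction has finished.
            start = max(free[module], ready)
            end = start + duration
            free[module] = end
            ready = end
            row.append((start, end))
        timeline.append(row)
    return timeline
```

With two instructions of durations `(2, 3, 1)` each, the data loading module starts instruction 1 at time 2 while the processing module is still busy with instruction 0 until time 5, and both instructions finish at time 9 instead of the 12 time units a strictly sequential execution would need.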
In an embodiment, the method further includes:
Caching the control instruction in at least one instruction slot, and controlling the execution progress of the control instruction according to the state of the control instruction recorded in the at least one instruction slot.
In an embodiment, the method further includes:
Caching the control instruction in at least one instruction slot, the at least one instruction slot including at least one control instruction status signal, wherein the control instruction status signal indicates whether the control instruction cached in the corresponding instruction slot is valid.
In an embodiment, the method further includes:
Caching the control instruction in at least one instruction slot, the at least one instruction slot including at least one control instruction completion signal, wherein the at least one control instruction completion signal indicates whether the module corresponding to the at least one instruction slot has completed the operation of the control instruction corresponding to that slot, or indicates whether the at least one instruction slot has completed the operation of the corresponding control instruction.
In an embodiment, the method further includes:
Caching the control instruction in at least one instruction slot, the instruction slot including a control instruction status signal and a control instruction completion signal, wherein when the control instruction completion signal indicates that the corresponding module has completed the operation of the corresponding control instruction, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
In an embodiment, the method further includes:
Caching the control instruction in at least one instruction slot, the instruction slot including a control instruction status signal and a control instruction completion signal, wherein when the control instruction completion signal indicates that the at least one instruction slot has completed the operation of the corresponding control instruction, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
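The relationship between the status signal and the completion signal can be modelled as follows. This is a hedged sketch, not the circuit of the application: the class `InstructionSlot`, its attribute names, and the choice to raise an error on overwriting a valid slot are all assumptions made for illustration.

```python
class InstructionSlot:
    """One instruction slot with a status (valid) and a completion (done) signal."""

    def __init__(self):
        self.instruction = None
        self.valid = False  # control instruction status signal
        self.done = False   # control instruction completion signal

    def store(self, instruction):
        """Cache a new control instruction; only allowed when the slot is invalid."""
        if self.valid:
            raise RuntimeError("slot still holds a valid control instruction")
        self.instruction = instruction
        self.valid = True
        self.done = False

    def complete(self):
        """Called when the corresponding module finishes the cached instruction."""
        self.done = True
        # Completion marks the cached instruction invalid, which frees the
        # slot for the next control instruction.
        self.valid = False
```

For example, after `slot.store("conv0")` the slot is valid and not done; after `slot.complete()` it is done and invalid, so `slot.store("conv1")` succeeds.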
In an embodiment, the method further includes:
Caching the control instruction in at least one instruction slot, wherein the (i+1)-th control instruction sent to the data loading module, the processing module, and the data write-back module is obtained from the instruction slot.
In an embodiment, the method further includes:
Caching a control instruction corresponding to the data loading module in a second instruction slot, caching a control instruction corresponding to the processing module in a third instruction slot, and caching a control instruction corresponding to the data write-back module in a fourth instruction slot.
In an embodiment, the method further includes:
In response to the i-th control instruction in the second instruction slot being invalid, caching the (i+1)-th control instruction in the second instruction slot.
The sending of the (i+1)-th control instruction to the data loading module in response to the data loading module finishing execution of the i-th control instruction includes:
In response to the data loading module finishing execution of the i-th control instruction, sending the (i+1)-th control instruction in the second instruction slot to the data loading module.
In an embodiment, the method further includes:
In response to the i-th control instruction in the third instruction slot being invalid, caching the (i+1)-th control instruction from the second instruction slot in the third instruction slot.
The sending of the (i+1)-th control instruction to the processing module in response to the processing module finishing execution of the i-th control instruction includes:
In response to the processing module finishing execution of the i-th control instruction, sending the (i+1)-th control instruction in the third instruction slot to the processing module.
In an embodiment, the method further includes:
In response to the i-th control instruction in the fourth instruction slot being invalid, caching the (i+1)-th control instruction from the third instruction slot in the fourth instruction slot.
The sending of the (i+1)-th control instruction to the data write-back module in response to the data write-back module finishing execution of the i-th control instruction includes:
In response to the data write-back module finishing execution of the i-th control instruction, sending the (i+1)-th control instruction in the fourth instruction slot to the data write-back module.
In an embodiment, the method further includes:
Decoding the i-th control instruction; in response to the decoded i-th control instruction in the first instruction slot being invalid, caching the decoded (i+1)-th control instruction in the first instruction slot; and,
In response to the decoded i-th control instruction in the second instruction slot being invalid, caching the decoded (i+1)-th control instruction from the first instruction slot in the second instruction slot.
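The slot-to-slot hand-off in the preceding embodiments follows one rule: an instruction moves into the next slot once the instruction previously held there has become invalid. The sketch below models only that rule. It assumes, for illustration, that the four slots form a simple chain, that `None` stands for an invalid (free) slot, and that invalidation on completion happens elsewhere; the function name `advance` is hypothetical.

```python
def advance(slots):
    """Move each instruction one slot downstream where the next slot is invalid.

    slots[0] is the first (decode) slot and slots[-1] the fourth (storage)
    slot. The chain is walked from the last slot to the first, so a freed
    slot can be refilled in the same step but no instruction moves more
    than one slot per call.
    """
    slots = list(slots)  # work on a copy, leave the input unchanged
    for k in range(len(slots) - 1, 0, -1):
        if slots[k] is None and slots[k - 1] is not None:
            slots[k] = slots[k - 1]
            slots[k - 1] = None
    return slots
```

For example, `advance(["i2", "i1", "i0", None])` yields `[None, "i2", "i1", "i0"]`, while a chain with no invalid slot is returned unchanged, which models an instruction waiting until the downstream slot frees up.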
In an embodiment, the control instruction includes a result write-back control instruction and a result non-write-back control instruction.
The data to be processed is one part of target data to be processed; the target data to be processed is divided into at least two parts.
The result non-write-back control instruction instructs the processing module to cache the processing result;
The result write-back control instruction instructs the processing module to send all processing results related to the target data to be processed to the data write-back module.
In an embodiment, step S102 includes:
According to the result non-write-back control instruction, processing the corresponding data to be processed to obtain a processing result and caching the processing result; and,
According to the result write-back control instruction, processing the corresponding data to be processed to obtain a processing result, integrating all processing results related to the target data to be processed, and sending the integrated result to the data write-back module.
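The two instruction types can be illustrated with a toy processing module. The application does not specify how partial results are integrated; the sketch below assumes, purely for illustration, that each part is reduced with a sum and that integration is also a sum (as when partial sums of a convolution are accumulated). The class `ProcessingModule` and the list used as a stand-in for the data write-back module are hypothetical.

```python
class ProcessingModule:
    """Caches partial results until a result write-back instruction arrives."""

    def __init__(self, write_back_sink):
        self.cache = []                      # cached partial processing results
        self.write_back_sink = write_back_sink  # stand-in for the write-back module

    def execute(self, part, write_back):
        """Process one part of the target data.

        write_back=False: result non-write-back instruction, cache only.
        write_back=True: result write-back instruction, integrate and send.
        """
        result = sum(part)  # placeholder for the real computation
        self.cache.append(result)
        if write_back:
            # Integration is assumed to be a sum of all cached partial results.
            self.write_back_sink.append(sum(self.cache))
            self.cache.clear()
```

With `sink = []`, executing `[1, 2]` as non-write-back leaves `sink` empty, and executing `[3, 4]` as write-back then sends the single integrated value `10`.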
In an embodiment, the method further includes:
If the control instruction is the result non-write-back control instruction, receiving an end signal sent by the processing module after it caches the processing result; and,
If the control instruction is the result write-back control instruction, receiving an end signal sent by the data write-back module after it finishes writing the processing result, wherein the end signal indicates that the result non-write-back control instruction or the result write-back control instruction has finished executing in the data processing device.
In an embodiment, the method further includes:
Returning the end signal corresponding to each control instruction to an external control module according to the order in which the control instructions were received, on a first-in-first-out basis.
In an embodiment, the method further includes:
If it is determined, according to the order in which the control instructions were received and the first-in-first-out principle, that the currently received end signal is not the one due to be sent next, caching the currently received end signal.
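End signals can arrive out of order, for example when a result non-write-back instruction finishes in the processing module while an earlier result write-back instruction is still being written out. The in-order return described above can be sketched as follows; the class `EndSignalReorderer` and its method names are hypothetical, and instruction identifiers are assumed to be unique.

```python
from collections import deque


class EndSignalReorderer:
    """Forwards end signals in the order the control instructions were received."""

    def __init__(self):
        self.pending = deque()  # instruction ids in receiving order (FIFO)
        self.arrived = set()    # end signals cached because they came early

    def receive_instruction(self, instr_id):
        self.pending.append(instr_id)

    def receive_end_signal(self, instr_id):
        """Record an end signal and return those that may now be forwarded."""
        self.arrived.add(instr_id)
        released = []
        # Forward only while the oldest outstanding instruction has finished.
        while self.pending and self.pending[0] in self.arrived:
            head = self.pending.popleft()
            self.arrived.remove(head)
            released.append(head)
        return released
```

If instructions 0, 1, 2 are received and the end signal of instruction 1 arrives first, it is cached and nothing is forwarded; when the end signal of instruction 0 arrives, both 0 and 1 are forwarded in that order.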
In an embodiment, the data to be processed includes an object to be processed and operating parameters.
In an embodiment, the object to be processed includes any one of the following: an image, audio, or text; the operating parameters include any one of the following: a convolution kernel, a pooling parameter, or an activation function.
In an embodiment, the data loading module includes an object loading unit and a parameter loading unit.
Step S101 includes:
In response to the i-th control instruction, loading, by the object loading unit, the object to be processed corresponding to the i-th control instruction; and,
In response to the i-th control instruction, loading, by the parameter loading unit, the operating parameters corresponding to the i-th control instruction.
In an embodiment, step S102 includes:
In response to the i-th control instruction, writing the data to be processed corresponding to the i-th control instruction into a systolic array, and operating on the data to be processed by the systolic array to obtain a processing result.
In an embodiment, step S102 includes:
In response to the i-th control instruction, writing the object to be processed and the operating parameters corresponding to the i-th control instruction separately into the systolic array, and performing an operation on the object to be processed and the operating parameters by the systolic array to obtain a processing result.
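The application does not state which systolic dataflow is used. As one possible illustration, the sketch below simulates an output-stationary systolic array computing a matrix product, assuming the object to be processed and the operating parameters have already been arranged as matrices `a` and `b`. The function name `systolic_matmul` and the skewed-input timing are assumptions, not details from the application.

```python
def systolic_matmul(a, b):
    """Cycle-by-cycle model of an output-stationary systolic array.

    a is an n x k matrix (rows enter from the left), b is a k x m matrix
    (columns enter from the top). Processing element (i, j) keeps its own
    accumulator and, because the inputs are skewed, sees the operand pair
    a[i][k], b[k][j] at cycle t = i + j + k.
    """
    n, k_dim, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]
    total_cycles = n + m + k_dim - 2  # last pair reaches PE (n-1, m-1) at t = n+m+k-3
    for t in range(total_cycles):
        for i in range(n):
            for j in range(m):
                k = t - i - j
                if 0 <= k < k_dim:
                    # One multiply-accumulate per PE per cycle.
                    acc[i][j] += a[i][k] * b[k][j]
    return acc
```

For 2 x 2 inputs `[[1, 2], [3, 4]]` and `[[5, 6], [7, 8]]` the model returns `[[19, 22], [43, 50]]`, matching the ordinary matrix product.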
For specific implementations of the method embodiments, refer to the description of the device embodiments; details are not repeated here.
Correspondingly, an embodiment of the present application further provides an accelerator that includes the device according to any one of the above.
The accelerator can be applied to various neural networks, such as convolutional neural networks.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. The terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The method and device provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to aid understanding of the method of the present invention and its core idea. A person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (57)

  1. A data processing device, characterized by comprising a control module, a data loading module, and a processing module;
    the data loading module, in response to a control instruction of the control module, loads data to be processed for processing by the processing module;
    the processing module, in response to a control instruction of the control module, processes the data to be processed;
    the control module controls the data loading module and the processing module to execute different control instructions at the same time.
  2. The device according to claim 1, further comprising a data write-back module,
    the data write-back module, in response to a control instruction of the control module, writes the processing result of the data to be processed into an external storage module;
    the control module controls the data loading module, the processing module, and the data write-back module to execute different control instructions at the same time.
  3. The device according to claim 1, wherein:
    the control module is specifically configured to: in response to the data loading module finishing execution of the i-th control instruction, send the (i+1)-th control instruction to the data loading module; and in response to the processing module finishing execution of the i-th control instruction, send the (i+1)-th control instruction to the processing module; where i is an integer;
    the data loading module is specifically configured to: in response to the i-th control instruction, load the data to be processed corresponding to the i-th control instruction;
    the processing module is specifically configured to: in response to the i-th control instruction, process the data to be processed corresponding to the i-th control instruction to obtain a processing result.
  4. The device according to claim 3, wherein the device further comprises a data write-back module;
    the control module is further configured to: in response to the data write-back module finishing execution of the i-th control instruction, send the (i+1)-th control instruction to the data write-back module;
    the data write-back module is configured to: in response to the i-th control instruction, write the processing result corresponding to the i-th control instruction into an external storage module.
  5. The device according to claim 3, wherein:
    while the processing module is processing, in response to the i-th control instruction, the data to be processed corresponding to the i-th control instruction, the data loading module receives the (i+1)-th control instruction without waiting for the processing module to finish executing the i-th control instruction.
  6. The device according to claim 4, wherein:
    while the data write-back module is writing, in response to the i-th control instruction, the processing result corresponding to the i-th control instruction into the external storage module, the processing module receives the (i+1)-th control instruction without waiting for the data write-back module to finish executing the i-th control instruction.
  7. The device according to claim 1, wherein the control module comprises at least one instruction slot for caching the control instruction; the control module controls the execution progress of the control instruction according to the state of the control instruction recorded in the at least one instruction slot.
  8. The device according to claim 1, wherein the control module comprises at least one instruction slot for caching control instructions; the at least one instruction slot comprises at least one control instruction status signal, wherein the control instruction status signal indicates whether the control instruction cached in the corresponding instruction slot is valid.
  9. The device according to claim 1, wherein the control module comprises at least one instruction slot for caching control instructions; the at least one instruction slot comprises at least one control instruction completion signal; wherein the at least one control instruction completion signal indicates whether the module corresponding to the at least one instruction slot has completed the operation of the control instruction corresponding to that slot, or indicates whether the at least one instruction slot has completed the operation of the corresponding control instruction.
  10. The device according to claim 1, wherein the control module comprises an instruction slot for caching the control instruction; the instruction slot comprises a control instruction status signal and a control instruction completion signal; wherein when the control instruction completion signal indicates that the corresponding module has completed the operation of the corresponding control instruction, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
  11. The device according to claim 1, wherein the control module comprises an instruction slot for caching the control instruction; the instruction slot comprises a control instruction status signal and a control instruction completion signal; wherein when the control instruction completion signal indicates that the at least one instruction slot has completed the operation of the corresponding control instruction, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
  12. The device according to claim 4, wherein the control module comprises an instruction slot for caching the control instruction;
    the control module is specifically configured to: in response to the data loading module finishing execution of the i-th control instruction, send the (i+1)-th control instruction cached in the instruction slot to the data loading module; in response to the processing module finishing execution of the i-th control instruction, send the (i+1)-th control instruction cached in the instruction slot to the processing module; and in response to the data write-back module finishing execution of the i-th control instruction, send the (i+1)-th control instruction cached in the instruction slot to the data write-back module.
  13. The device according to claim 4, wherein the control module comprises a second instruction slot corresponding to the data loading module, a third instruction slot corresponding to the processing module, and a fourth instruction slot corresponding to the data write-back module;
    the second instruction slot, the third instruction slot, and the fourth instruction slot are each configured to: after the control module has finished sending the i-th control instruction at respective different times, respectively cache the corresponding (i+1)-th control instruction;
    the control module is specifically configured to: in response to the data loading module finishing execution of the i-th control instruction, send the (i+1)-th control instruction cached in the second instruction slot to the data loading module; in response to the processing module finishing execution of the i-th control instruction, send the (i+1)-th control instruction cached in the third instruction slot to the processing module; and in response to the data write-back module finishing execution of the i-th control instruction, send the (i+1)-th control instruction cached in the fourth instruction slot to the data write-back module.
  14. The device according to claim 4, wherein the control module comprises a second instruction slot corresponding to the data loading module;
    the second instruction slot is configured to cache the control instruction sent to the data loading module;
    the control module is further configured to: in response to the i-th control instruction in the second instruction slot being invalid, cache the (i+1)-th control instruction in the second instruction slot; and in response to the data loading module finishing execution of the i-th control instruction, send the (i+1)-th control instruction in the second instruction slot to the data loading module.
  15. The device according to claim 14, wherein the control module further comprises a third instruction slot corresponding to the processing module;
    the third instruction slot is configured to cache the control instruction sent to the processing module;
    the control module is further configured to: in response to the i-th control instruction in the third instruction slot being invalid, cache the (i+1)-th control instruction from the second instruction slot in the third instruction slot; and in response to the processing module finishing execution of the i-th control instruction, send the (i+1)-th control instruction in the third instruction slot to the processing module.
  16. The device according to claim 15, wherein the control module further comprises a fourth instruction slot corresponding to the data write-back module;
    the fourth instruction slot is configured to cache the control instruction sent to the data write-back module;
    the control module is further configured to: in response to the i-th control instruction in the fourth instruction slot being invalid, cache the (i+1)-th control instruction from the third instruction slot in the fourth instruction slot; and in response to the data write-back module finishing execution of the i-th control instruction, send the (i+1)-th control instruction in the fourth instruction slot to the data write-back module.
  17. The device according to claim 14, wherein the control module further comprises a first instruction slot;
    the first instruction slot is configured to cache decoded control instructions;
    the control module is further configured to: decode the i-th control instruction; in response to the decoded i-th control instruction in the first instruction slot being invalid, cache the decoded (i+1)-th control instruction in the first instruction slot; and in response to the decoded i-th control instruction in the second instruction slot being invalid, cache the decoded (i+1)-th control instruction from the first instruction slot in the second instruction slot.
  18. The device according to claim 2, wherein the control instruction comprises a result write-back control instruction and a result non-write-back control instruction;
    the data to be processed is one part of target data to be processed; the target data to be processed is divided into at least two parts;
    the result non-write-back control instruction instructs the processing module to cache the processing result;
    the result write-back control instruction instructs the processing module to send all processing results related to the target data to be processed to the data write-back module.
  19. The device according to claim 18, wherein:
    the processing module is specifically configured to: according to the result non-write-back control instruction, process the corresponding data to be processed to obtain a processing result and cache the processing result; and according to the result write-back control instruction, process the corresponding data to be processed to obtain a processing result, integrate all processing results related to the target data to be processed, and send the integrated result to the data write-back module.
  20. 根据权利要求18所述的装置,其特征在于,The device of claim 18, wherein:
    若所述控制指令为所述结果不写回控制指令,所述控制模块还用于:接收所述处理模块在缓存所述处理结果之后发送的结束信号;以及If the control instruction is a control instruction not to write back the result, the control module is further configured to: receive an end signal sent by the processing module after buffering the processing result; and
    若所述控制指令为所述结果写回控制指令,所述控制模块还用于:接收所述数据写回模块在将所述处理结果写入完成后发送的结束信号;If the control instruction is the result write-back control instruction, the control module is further configured to: receive an end signal sent by the data write-back module after the processing result is written;
    其中,所述结束信号表征所述结果不写回控制指令或所述结果写回控制指令在所述数据处理装置中执行完毕。Wherein, the end signal indicates that the result is not written back control instruction or that the result write back control instruction has been executed in the data processing device.
  21. The device according to claim 20, wherein:
    the control module is further configured to return the end signal corresponding to each control instruction to an external control module according to the order in which the control instructions were received and the first-in-first-out principle.
  22. The device according to claim 21, wherein:
    the control module is further configured to: if it determines, according to the order in which the control instructions were received and the first-in-first-out principle, that the currently received end signal is not the one due to be sent, buffer the currently received end signal.
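As an illustration of claims 21 and 22, the sketch below returns end signals in instruction-receive order and buffers any signal that arrives early. It is a minimal sketch, not the patented implementation: the class name `EndSignalReorderer`, its methods, and the use of integer instruction ids are assumptions introduced here. Out-of-order arrival is plausible because, per claim 20, a result non-write-back instruction ends at the processing module while a result write-back instruction ends later at the data write-back module.

```python
from collections import deque


class EndSignalReorderer:
    """Hypothetical helper: releases end signals in the order instructions were received."""

    def __init__(self):
        self.pending = deque()   # instruction ids in receive order (FIFO)
        self.arrived = set()     # end signals received but not yet returned

    def receive_instruction(self, instr_id):
        self.pending.append(instr_id)

    def receive_end_signal(self, instr_id):
        """Buffer the signal, then release every signal now at the head of the FIFO."""
        self.arrived.add(instr_id)
        released = []
        while self.pending and self.pending[0] in self.arrived:
            head = self.pending.popleft()
            self.arrived.remove(head)
            released.append(head)
        return released


r = EndSignalReorderer()
for i in (0, 1, 2):
    r.receive_instruction(i)
out_of_order = r.receive_end_signal(1)   # instruction 0 has not ended yet: signal 1 is buffered
in_order = r.receive_end_signal(0)       # releases 0, then the buffered 1
```

Here `out_of_order` is an empty list and `in_order` is `[0, 1]`, so the external control module always sees completions in the order it issued the instructions.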
  23. The device according to claim 1, wherein the data to be processed includes an object to be processed and operating parameters.
  24. The device according to claim 23, wherein the object to be processed includes any one of the following: an image, audio, or text;
    the operating parameters include any one of the following: a convolution kernel, a pooling parameter, or an activation function.
  25. The device according to claim 23, wherein the data loading module comprises an object loading unit and a parameter loading unit;
    the control module is specifically configured to: in response to the object loading unit finishing execution of the i-th control instruction, send the (i+1)-th control instruction to the object loading unit; and, in response to the parameter loading unit finishing execution of the i-th control instruction, send the (i+1)-th control instruction to the parameter loading unit;
    the object loading unit is configured to: in response to the i-th control instruction, load the object to be processed corresponding to the i-th control instruction;
    the parameter loading unit is configured to: in response to the i-th control instruction, load the operating parameters corresponding to the i-th control instruction.
  26. The device according to claim 1, wherein the processing module comprises a systolic array;
    the processing module is specifically configured to: in response to the i-th control instruction, write the data to be processed corresponding to the i-th control instruction into the systolic array, and operate on the data to be processed through the systolic array to obtain the processing result.
  27. The device according to claim 23, wherein the processing module comprises a systolic array;
    the processing module is specifically configured to: in response to the i-th control instruction, write the object to be processed and the operating parameters corresponding to the i-th control instruction into the systolic array respectively, and operate on the object to be processed with the operating parameters through the systolic array to obtain the processing result.
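To make the systolic-array step of claims 26 and 27 concrete, the sketch below simulates, cycle by cycle, a small array that multiplies an object matrix `a` by a parameter matrix `b`. The claims do not fix a dataflow, so the output-stationary arrangement (one accumulator per processing element, `a` streaming rightwards, `b` streaming downwards) and the function name `systolic_matmul` are assumptions made for illustration only.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing the product a @ b."""
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]      # one accumulator per processing element (PE)
    a_reg = [[0] * m for _ in range(n)]    # values of `a` moving rightwards through the PEs
    b_reg = [[0] * m for _ in range(n)]    # values of `b` moving downwards through the PEs
    for t in range(n + m + k - 2):
        # Visit PEs from bottom-right to top-left so each PE still sees
        # its left and upper neighbours' values from the previous cycle.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:
                    s = t - i                      # row i enters from the left, skewed by i cycles
                    a_in = a[i][s] if 0 <= s < k else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    s = t - j                      # column j enters from the top, skewed by j cycles
                    b_in = b[s][j] if 0 <= s < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
                acc[i][j] += a_in * b_in           # multiply-accumulate inside the PE
    return acc


result = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

For the 2x2 inputs above, `result` is `[[19, 22], [43, 50]]`, the ordinary matrix product. A convolution would first have to be lowered to such a matrix product, which this sketch does not show.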
  28. The device according to claim 1, wherein:
    the control module includes an instruction slot for caching the control instruction; the instruction slot includes an instruction cache flag and a set of control status signals;
    wherein the instruction cache flag is used to indicate whether the instruction cached in the instruction slot is valid; and the set of control status signals is used to indicate the working status of the corresponding module, or the working status of other instruction slots related to the instruction slot.
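The instruction slot of claim 28 can be pictured as a small record holding the cached instruction, a validity flag, and a set of status signals. The following is a minimal sketch under stated assumptions: the class `InstructionSlot`, the field names, and the `module_done` key are hypothetical, and the rule that completion invalidates the cached instruction is taken from the behaviour recited in method claims 38 and 39.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InstructionSlot:
    instruction: Optional[str] = None             # cached control instruction
    valid: bool = False                           # instruction cache flag
    status: dict = field(default_factory=dict)    # set of control status signals

    def load(self, instruction):
        """Cache a new instruction; only allowed when the slot holds no valid one."""
        if self.valid:
            raise RuntimeError("slot still holds a valid instruction")
        self.instruction = instruction
        self.valid = True
        self.status["module_done"] = False

    def complete(self):
        """Record that the corresponding module finished; the cached instruction becomes invalid."""
        self.status["module_done"] = True
        self.valid = False


slot = InstructionSlot()
slot.load("conv_tile_0")   # hypothetical instruction name
slot.complete()            # slot may now accept the next instruction
slot.load("conv_tile_1")
```

Keeping the flag separate from the instruction field means a stale instruction can remain in the slot without being mistaken for pending work.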
  29. A data processing method, applied to a data processing device, the data processing device including a data loading module and a processing module; the method comprising:
    in response to a control instruction, loading data through the data loading module for the processing module to process; and
    in response to the control instruction, performing data processing through the processing module; wherein the data loading module and the processing module execute different control instructions at the same time.
  30. The method according to claim 29, further comprising:
    in response to the control instruction, writing the processing result of the data to be processed into an external storage module; wherein the data loading module, the processing module, and the data write-back module execute different control instructions at the same time.
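The overlap recited in claims 29 and 30 (each module working on a different control instruction at the same moment) can be visualised with a simple schedule. This is a sketch under a simplifying assumption that every stage takes exactly one time step; the function name `pipeline_schedule` and the stage labels are introduced here for illustration.

```python
def pipeline_schedule(num_instructions, stages=("load", "process", "write_back")):
    """Return, per time step, the index of the instruction each module is executing."""
    schedule = []
    for t in range(num_instructions + len(stages) - 1):
        step = {}
        for depth, stage in enumerate(stages):
            i = t - depth                     # a stage at depth d lags the first stage by d steps
            if 0 <= i < num_instructions:
                step[stage] = i
        schedule.append(step)
    return schedule


schedule = pipeline_schedule(3)
steady_state = schedule[2]
```

With three instructions, `steady_state` is `{"load": 2, "process": 1, "write_back": 0}`: the loader is already fetching data for instruction 2 while instruction 1 is being processed and the result of instruction 0 is being written back.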
  31. The method according to claim 29, further comprising:
    in response to the data loading module finishing execution of the i-th control instruction, sending the (i+1)-th control instruction to the data loading module; and, in response to the processing module finishing execution of the i-th control instruction, sending the (i+1)-th control instruction to the processing module; wherein i is an integer;
    wherein loading data through the data loading module, in response to the control instruction, for the processing module to process includes:
    in response to the i-th control instruction, loading the data to be processed corresponding to the i-th control instruction;
    and wherein performing data processing through the processing module in response to the control instruction includes:
    in response to the i-th control instruction, processing the data to be processed corresponding to the i-th control instruction to obtain a processing result.
  32. The method according to claim 31, wherein the device further comprises a data write-back module;
    the method further comprising:
    in response to the data write-back module finishing execution of the i-th control instruction, sending the (i+1)-th control instruction to the data write-back module;
    in response to the i-th control instruction, writing the processing result corresponding to the i-th control instruction into an external storage module through the data write-back module.
  33. The method according to claim 31, further comprising:
    while the processing module, in response to the i-th control instruction, is processing the data to be processed corresponding to the i-th control instruction, receiving the (i+1)-th control instruction at the data loading module without waiting for the processing module to finish executing the i-th control instruction.
  34. The method according to claim 32, further comprising:
    while the data write-back module, in response to the i-th control instruction, is writing the processing result corresponding to the i-th control instruction into the external storage module, receiving the (i+1)-th control instruction at the processing module without waiting for the data write-back module to finish executing the i-th control instruction.
  35. The method according to claim 29, further comprising:
    caching the control instruction in at least one instruction slot, and controlling the execution progress of the control instruction according to the state of the control instruction recorded in the at least one instruction slot.
  36. The method according to claim 29, further comprising:
    caching the control instruction in at least one instruction slot; the at least one instruction slot including at least one control instruction status signal, wherein the control instruction status signal is used to indicate whether the control instruction cached in the corresponding instruction slot is valid.
  37. The method according to claim 29, further comprising:
    caching the control instruction in at least one instruction slot; the at least one instruction slot including at least one control instruction completion signal; wherein the at least one control instruction completion signal is used to indicate whether the module corresponding to the at least one instruction slot has completed the operation of the control instruction corresponding to that slot, or to indicate whether the at least one instruction slot has completed the operation of the corresponding control instruction.
  38. The method according to claim 29, further comprising:
    caching the control instruction in at least one instruction slot; the instruction slot including a control instruction status signal and a control instruction completion signal; wherein, when the control instruction completion signal indicates that the corresponding module has completed the operation of the corresponding control instruction, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
  39. The method according to claim 29, further comprising:
    caching the control instruction in at least one instruction slot; the instruction slot including a control instruction status signal and a control instruction completion signal; wherein, when the control instruction completion signal indicates that the at least one instruction slot has completed the operation of the corresponding control instruction, the control instruction status signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
  40. The method according to claim 31, further comprising:
    caching the control instruction in at least one instruction slot; wherein the (i+1)-th control instruction sent to the data loading module, the processing module, and the data write-back module is obtained from the instruction slot.
  41. The method according to claim 31, further comprising:
    caching the control instruction corresponding to the data loading module in a second instruction slot, caching the control instruction corresponding to the processing module in a third instruction slot, and caching the control instruction corresponding to the data write-back module in a third instruction slot.
  42. The method according to claim 32, further comprising:
    in response to the i-th control instruction in the second instruction slot being invalid, caching the (i+1)-th control instruction in the second instruction slot;
    wherein sending the (i+1)-th control instruction to the data loading module in response to the data loading module finishing execution of the i-th control instruction includes:
    in response to the data loading module finishing execution of the i-th control instruction, sending the (i+1)-th control instruction in the second instruction slot to the data loading module.
  43. The method according to claim 42, further comprising:
    in response to the i-th control instruction in the third instruction slot being invalid, caching the (i+1)-th control instruction from the second instruction slot in the third instruction slot;
    wherein sending the (i+1)-th control instruction to the processing module in response to the processing module finishing execution of the i-th control instruction includes:
    in response to the processing module finishing execution of the i-th control instruction, sending the (i+1)-th control instruction in the third instruction slot to the processing module.
  44. The method according to claim 43, further comprising:
    in response to the i-th control instruction in the fourth instruction slot being invalid, caching the (i+1)-th control instruction from the third instruction slot in the fourth instruction slot;
    wherein sending the (i+1)-th control instruction to the data write-back module in response to the data write-back module finishing execution of the i-th control instruction includes:
    in response to the data write-back module finishing execution of the i-th control instruction, sending the (i+1)-th control instruction in the fourth instruction slot to the processing module.
  45. The method according to claim 42, further comprising:
    decoding the i-th control instruction; in response to the decoded i-th control instruction in the first instruction slot being invalid, caching the decoded (i+1)-th control instruction in the first instruction slot; and
    in response to the decoded i-th control instruction in the second instruction slot being invalid, caching the decoded (i+1)-th control instruction from the first instruction slot in the second instruction slot.
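Claims 42 to 45 describe instructions moving from one instruction slot to the next whenever the downstream slot no longer holds a valid instruction. The sketch below models that hand-off with a list of four slots, where `None` stands for "invalid". It is a deliberately simplified model with labelled assumptions: the function name `advance` is hypothetical, a slot is treated as invalid as soon as its instruction has been copied downstream, and module execution is not modelled except that the caller clears the last slot when write-back is done.

```python
def advance(slots, incoming):
    """Move each instruction one slot downstream where the next slot is invalid (None).

    slots[0] plays the role of the first (decoded-instruction) slot.
    Returns the list of instructions still waiting to enter the chain.
    """
    # Walk from the last slot backwards so each instruction moves at most one slot per call.
    for k in range(len(slots) - 1, 0, -1):
        if slots[k] is None and slots[k - 1] is not None:
            slots[k] = slots[k - 1]
            slots[k - 1] = None
    if slots[0] is None and incoming:
        slots[0] = incoming.pop(0)
    return incoming


slots = [None, None, None, None]     # first, second, third, fourth instruction slot
queue = ["i0", "i1", "i2"]           # hypothetical decoded instructions
for _ in range(3):
    advance(slots, queue)
after_three_steps = list(slots)
```

After three calls `after_three_steps` is `["i2", "i1", "i0", None]`. Once the last slot is occupied and not cleared, upstream instructions stop moving, which is the back-pressure the slot validity flags provide.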
  46. The method according to claim 30, wherein the control instruction includes a result write-back control instruction and a result non-write-back control instruction;
    the data to be processed is one part of target data to be processed; the target data to be processed is divided into at least two parts;
    the result non-write-back control instruction is used to instruct the processing module to cache the processing result;
    the result write-back control instruction is used to instruct the processing module to send all processing results related to the target data to be processed to the data write-back module.
  47. The method according to claim 46, wherein performing data processing through the processing module in response to the control instruction includes:
    according to the result non-write-back control instruction, processing the corresponding data to be processed to obtain a processing result and caching it; and
    according to the result write-back control instruction, processing the corresponding data to be processed to obtain a processing result, integrating all processing results related to the target data to be processed, and sending them to the data write-back module.
  48. The method according to claim 47, further comprising:
    if the control instruction is the result non-write-back control instruction, receiving an end signal sent by the processing module after it caches the processing result; and
    if the control instruction is the result write-back control instruction, receiving an end signal sent by the data write-back module after it finishes writing the processing result; wherein the end signal indicates that the result non-write-back control instruction or the result write-back control instruction has finished executing in the data processing device.
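The split between result non-write-back and result write-back instructions in claims 46 to 48 can be sketched as follows. This is a hedged illustration only: the class `ProcessingModule`, the `execute` method, the use of `sum` as a stand-in computation, and the choice to "integrate" results by collecting them into one list are all assumptions, since the claims do not specify how partial results are combined.

```python
class ProcessingModule:
    """Hypothetical model of a processing module that caches partial results."""

    def __init__(self):
        self.cached = []     # partial results kept inside the processing module

    def execute(self, part, write_back):
        result = sum(part)   # placeholder for the real computation on this part
        self.cached.append(result)
        if not write_back:
            return None      # result non-write-back: keep the result cached
        merged = list(self.cached)   # result write-back: gather all cached results
        self.cached.clear()
        return merged        # what would be handed to the data write-back module


pm = ProcessingModule()
first = pm.execute([1, 2], write_back=False)   # first part of the target data
second = pm.execute([3, 4], write_back=True)   # last part triggers the write-back
```

Here `first` is `None` and `second` is `[3, 7]`, so only one transfer to the data write-back module is needed for the whole target data rather than one per part.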
  49. The method according to claim 48, further comprising:
    returning the end signal corresponding to each control instruction to an external control module according to the order in which the control instructions were received and the first-in-first-out principle.
  50. The method according to claim 49, further comprising:
    if it is determined, according to the order in which the control instructions were received and the first-in-first-out principle, that the currently received end signal is not the one due to be sent, buffering the currently received end signal.
  51. The method according to claim 29, wherein the data to be processed includes an object to be processed and operating parameters.
  52. The method according to claim 51, wherein the object to be processed includes any one of the following: an image, audio, or text;
    the operating parameters include any one of the following: a convolution kernel, a pooling parameter, or an activation function.
  53. The method according to claim 51, wherein the data loading module comprises an object loading unit and a parameter loading unit;
    wherein loading data through the data loading module, in response to the control instruction, for the processing module to process includes:
    in response to the i-th control instruction, loading the object to be processed corresponding to the i-th control instruction through the object loading unit; and
    in response to the i-th control instruction, loading the operating parameters corresponding to the i-th control instruction through the parameter loading unit.
  54. The method according to claim 29, wherein performing data processing through the processing module in response to the control instruction includes:
    in response to the i-th control instruction, writing the data to be processed corresponding to the i-th control instruction into a systolic array, and operating on the data to be processed through the systolic array to obtain a processing result.
  55. The method according to claim 51, wherein performing data processing through the processing module in response to the control instruction includes:
    in response to the i-th control instruction, writing the object to be processed and the operating parameters corresponding to the i-th control instruction into the systolic array respectively, and operating on the object to be processed with the operating parameters through the systolic array to obtain a processing result.
  56. The device according to claim 29, wherein:
    the control module includes an instruction slot for caching the control instruction; the instruction slot includes an instruction cache flag and a set of control status signals;
    wherein the instruction cache flag is used to indicate whether the instruction cached in the instruction slot is valid; and the set of control status signals is used to indicate the working status of the corresponding module, or the working status of other instruction slots related to the instruction slot.
  57. An accelerator, comprising the data processing device according to any one of claims 1 to 28.
PCT/CN2020/078876 2020-03-11 2020-03-11 Data processing device, data processing method and accelerator WO2021179224A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080004332.0A CN112602094A (en) 2020-03-11 2020-03-11 Data processing apparatus, data processing method, and accelerator
PCT/CN2020/078876 WO2021179224A1 (en) 2020-03-11 2020-03-11 Data processing device, data processing method and accelerator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/078876 WO2021179224A1 (en) 2020-03-11 2020-03-11 Data processing device, data processing method and accelerator

Publications (1)

Publication Number Publication Date
WO2021179224A1 true WO2021179224A1 (en) 2021-09-16

Family

ID=75208096

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/078876 WO2021179224A1 (en) 2020-03-11 2020-03-11 Data processing device, data processing method and accelerator

Country Status (2)

Country Link
CN (1) CN112602094A (en)
WO (1) WO2021179224A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1433538A (en) * 1999-12-03 2003-07-30 英特尔公司 Method and apparatus for constructing pre-scheduled instruction cache
CN104424158A (en) * 2013-08-19 2015-03-18 上海芯豪微电子有限公司 General unit-based high-performance processor system and method
CN108475347A (en) * 2017-11-30 2018-08-31 深圳市大疆创新科技有限公司 Method, apparatus, accelerator, system and the movable equipment of Processing with Neural Network
US20190012170A1 (en) * 2017-07-05 2019-01-10 Deep Vision, Inc. Deep vision processor
CN109937416A (en) * 2017-05-17 2019-06-25 谷歌有限责任公司 Low time delay matrix multiplication component
US20190236009A1 (en) * 2018-01-30 2019-08-01 Microsoft Technology Licensing, Llc Coupling wide memory interface to wide write back paths

Also Published As

Publication number Publication date
CN112602094A (en) 2021-04-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20924697

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20924697

Country of ref document: EP

Kind code of ref document: A1