CN116820733A - Data processing device and method - Google Patents

Data processing device and method

Info

Publication number: CN116820733A
Application number: CN202210264771.6A
Authority: CN (China)
Prior art keywords: selector, data, calculation, input end, module
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 李�诚, 位经传, 迟朋
Current assignee: Tsinghua University; Huawei Technologies Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Tsinghua University; Huawei Technologies Co., Ltd.
Application filed by Tsinghua University and Huawei Technologies Co., Ltd.

Classifications

    • G06F 9/5027 — Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 7/5443 — Evaluating functions by calculation; sum of products
    • G06F 9/5061 — Allocation of resources; partitioning or combining of resources
    • G06N 3/063 — Physical realisation, i.e. hardware implementation, of neural networks, neurons or parts of neurons using electronic means
    • G06T 3/4084 — Scaling of whole images or parts thereof, e.g. expanding or contracting, in the transform domain, e.g. fast Fourier transform [FFT] domain scaling

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Neurology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The application provides a data processing device and method, relating to the field of image processing. The data processing device comprises a control module and a calculation module. The control module is used for generating indication information according to a calculation process to be executed; the calculation module includes M reconfigurable calculation units and is used for performing the calculation process by at least one of the following calculation units in the M reconfigurable calculation units: a multiply-accumulate calculation unit, an activation quantization calculation unit, or a pooling calculation unit, the at least one calculation unit being determined according to the indication information. The solution of the embodiments of the application can complete a plurality of calculation processes in one device and achieves better resource utilization. In addition, when the calculation module includes a plurality of reconfigurable calculation units, the plurality of reconfigurable calculation units can calculate the data to be processed at the same time, so that the efficiency of target detection can be further improved.

Description

Data processing device and method
Technical Field
The present application relates to the field of image processing, and more particularly, to a data processing apparatus and method.
Background
Object detection is an emerging application in the field of computer vision. It refers to determining the locations of object images of interest in a given image and marking those locations. The calculation processes involved in a target detection network algorithm include convolution calculation, target region of interest pooling (region of interest pooling, RoI pooling) calculation, target region of interest calibration (region of interest align, RoI align) calculation, image scaling calculation, and the like.
In the prior art, a central processing unit (central processing unit, CPU) or dedicated hardware circuits are typically employed for target detection. However, performing target detection with a CPU reduces detection efficiency, while dedicated hardware circuits increase the area and power consumption of the chip and waste resources whenever a circuit is not in use. For example, if separate dedicated hardware circuits are used for convolution calculation and RoI calculation, the circuit used for RoI calculation may sit idle while the convolution calculation is performed. Therefore, how to improve detection efficiency and achieve better resource utilization in the target detection process is a problem to be solved.
Disclosure of Invention
The application provides a data processing device and a data processing method, which can both improve the efficiency of target detection and achieve better resource utilization.
In a first aspect, a data processing apparatus is provided, including a control module and a calculation module, where the calculation module includes M reconfigurable calculation units, and the control module is configured to generate indication information according to a calculation process to be executed; the calculation module is configured to perform the calculation process by at least one of the following calculation units in the M reconfigurable calculation units: a multiply-accumulate calculation unit, an activation quantization calculation unit, or a pooling calculation unit, the at least one calculation unit being determined according to the indication information, where M is a positive integer.
Based on the above technical solution, according to the indication information of the control module, the calculation module may perform a calculation process (for example, convolution calculation, target region of interest pooling calculation, target region of interest calibration calculation, or image scaling calculation) by at least one calculation unit of the M reconfigurable calculation units. For different calculation processes, the calculation module may multiplex a portion of the M reconfigurable calculation units. That is, different calculation processes are completed by different combinations of the calculation units in the reconfigurable calculation unit, which improves resource utilization while maintaining processing efficiency. In addition, when the calculation module includes a plurality of reconfigurable calculation units, the plurality of reconfigurable calculation units can calculate the data to be processed at the same time, so that the efficiency of target detection can be further improved.
With reference to the first aspect, in some implementations of the first aspect, the multiply-accumulate computing unit, the activation quantization computing unit, and the pooling computing unit are sequentially connected, the computing module further includes a plurality of selectors, the plurality of selectors includes a first selector, a second selector, and a third selector, wherein a first input terminal of the first selector is connected to the control module, a second input terminal of the first selector is connected to the second storage module, a first input terminal of the multiply-accumulate computing unit is connected to an output terminal of the first selector, and a second input terminal of the multiply-accumulate computing unit is connected to the first storage module; the first input end of the second selector is connected with the output end of the activation quantization calculation unit, the second input end of the second selector is connected with the first storage module, and the output end of the second selector is connected with the input end of the pooling calculation unit; the first input end of the third selector is connected with the output end of the pooling calculation unit, and the second input end of the third selector is connected with the output end of the multiply-accumulate calculation unit.
Based on the above technical solution, the multiply-accumulate computing unit, the activation quantization computing unit and the pooling computing unit in the computing module are sequentially connected, and the input ends (e.g., the first input end and the second input end) and the output end of each of the plurality of selectors (e.g., the first selector, the second selector, and the third selector) are respectively connected to one of the computing units (e.g., the multiply-accumulate computing unit, the activation quantization computing unit, or the pooling computing unit) or to a storage or control module. In the scheme of the embodiment of the application, the computing units participating in computation among the M reconfigurable computing units can be adjusted by adjusting the connection relation between the input ends and the output end of each selector. In this way, for different calculation processes, the selectors can be configured such that at least one calculation unit participates in the calculation, i.e., performs the respective calculation process. That is, different combinations of the calculation units are reconstructed through the selectors to complete different calculation processes, which improves resource utilization while maintaining processing efficiency.
With reference to the first aspect, in some implementations of the first aspect, the indication information includes configuration information of the selectors, where the configuration information of the selectors is used to indicate the connection relationships between the output ends and the input ends of the plurality of selectors. When the calculation process to be performed is convolution calculation, the output end of the first selector is connected to the second input end of the first selector, the output end of the second selector is connected to the first input end of the second selector, and the output end of the third selector is connected to the first input end of the third selector; when the calculation process to be performed is the target region of interest pooling calculation, the output end of the second selector is connected to the second input end of the second selector, and the output end of the third selector is connected to the first input end of the third selector; when the calculation process to be performed is the target region of interest calibration calculation, the output end of the first selector is connected to the first input end of the first selector, the output end of the second selector is connected to the first input end of the second selector, and the output end of the third selector is connected to the first input end of the third selector; or when the calculation process to be performed is image scaling calculation, the output end of the first selector is connected to the first input end of the first selector, and the output end of the third selector is connected to the second input end of the third selector.
Based on the above technical solution, each of the plurality of selectors may connect its output end with one of its input ends according to the configuration information, where when the output end of the first selector is connected with one of its input ends, the at least one calculation unit for performing the calculation process may include the multiply-accumulate calculation unit; when the output end of the second selector and/or the third selector is connected with one of its input ends, the at least one calculation unit for performing the calculation process may further include the calculation unit connected to that input end. The at least one calculation unit then participates in the calculation, that is, performs the corresponding calculation process, so that a plurality of calculation processes (for example, convolution calculation, target region of interest pooling calculation, target region of interest calibration calculation, or image scaling calculation) can be completed by one device, with better resource utilization.
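The per-process selector settings listed above can be read as a small configuration table. The following Python sketch merely restates them; the names ("sel1", "in1", and so on) are assumptions for illustration, and None marks a selector whose setting is not constrained for that process in the description above.

```python
# Illustrative sketch (not from the application): per-process selector configuration.
# "in1"/"in2" denote the first and second input ends of a selector; None means the
# selector's setting is not specified for that calculation process.
SELECTOR_CONFIG = {
    "convolution":   {"sel1": "in2", "sel2": "in1", "sel3": "in1"},
    "roi_pooling":   {"sel1": None,  "sel2": "in2", "sel3": "in1"},
    "roi_align":     {"sel1": "in1", "sel2": "in1", "sel3": "in1"},
    "image_scaling": {"sel1": "in1", "sel2": None,  "sel3": "in2"},
}

def configure_selectors(process: str) -> dict:
    """Return the (assumed) selector settings for a given calculation process."""
    return SELECTOR_CONFIG[process]
```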
With reference to the first aspect, in certain implementations of the first aspect, the control module is further configured to generate a first storage address and a first read-write control signal; the first storage module is configured to read M groups of first data according to the first storage address and the first read-write control signal; the first storage module is further configured to send the M groups of first data to the M reconfigurable computing units.
Based on the technical scheme, the plurality of reconfigurable computing units can compute the first data at the same time, so that the efficiency of target detection can be further improved.
With reference to the first aspect, in certain implementations of the first aspect, when the computing process to be performed is a target region of interest pooling computation, the first data is feature map data.
With reference to the first aspect, in certain implementation manners of the first aspect, the control module is further configured to generate a second storage address and a second read-write control signal; the second storage module is used for reading second data according to the second storage address and the second read-write control signal; the second storage module is further used for sending the second data to the M reconfigurable computing units through the first selector.
Based on the technical scheme, the plurality of reconfigurable computing units can simultaneously compute the second data, so that the efficiency of target detection can be further improved.
With reference to the first aspect, in certain implementations of the first aspect, when the computing process to be performed is a convolution computation, the first data is convolution kernel data and the second data is feature map data.
With reference to the first aspect, in certain implementation manners of the first aspect, the control module is further configured to generate third data; the control module is further used for sending the third data to the M reconfigurable computing units through the first selector.
In this way, the plurality of reconfigurable computing units can simultaneously calculate the third data, so that the efficiency of target detection can be further improved.
With reference to the first aspect, in certain implementations of the first aspect, when the calculation process to be performed is a target region of interest calibration calculation or an image scaling calculation, the first data is feature map data, and the third data is weight data.
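Taken together, the roles of the first, second, and third data in the implementations above depend on the calculation process to be performed. The following Python snippet only restates that correspondence as a lookup table; the dictionary layout and key names are illustrative assumptions, not part of the application.

```python
# Illustrative summary (not from the application) of which operand each kind of
# data carries for each calculation process, per the implementations above.
DATA_ROLES = {
    "convolution":   {"first_data": "convolution kernel data",
                      "second_data": "feature map data"},
    "roi_pooling":   {"first_data": "feature map data"},
    "roi_align":     {"first_data": "feature map data",
                      "third_data": "weight data"},
    "image_scaling": {"first_data": "feature map data",
                      "third_data": "weight data"},
}
```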
In a second aspect, a data processing method is provided, including: a control module generates indication information according to a calculation process to be executed; a calculation module includes M reconfigurable calculation units, and the calculation module performs the calculation process by at least one of the following calculation units in the M reconfigurable calculation units: a multiply-accumulate calculation unit, an activation quantization calculation unit, or a pooling calculation unit, the at least one calculation unit being determined according to the indication information, where M is a positive integer.
Based on the above technical solution, according to the indication information of the control module, the calculation module may perform a calculation process (for example, convolution calculation, target region of interest pooling calculation, target region of interest calibration calculation, or image scaling calculation) by at least one calculation unit of the M reconfigurable calculation units. For different calculation processes, the calculation module may multiplex a portion of the M reconfigurable calculation units. That is, different calculation processes are completed by different combinations of the calculation units in the reconfigurable calculation unit, which improves resource utilization while maintaining processing efficiency. In addition, when the calculation module includes a plurality of reconfigurable calculation units, the plurality of reconfigurable calculation units can calculate the data to be processed at the same time, so that the efficiency of target detection can be further improved.
With reference to the second aspect, in some implementations of the second aspect, the multiply-accumulate computing unit, the activation quantization computing unit, and the pooling computing unit are sequentially connected, and the computing module further includes a plurality of selectors, where the plurality of selectors includes a first selector, a second selector, and a third selector, a first input terminal of the first selector is connected to the control module, a second input terminal of the first selector is connected to the second storage module, a first input terminal of the multiply-accumulate computing unit is connected to an output terminal of the first selector, and a second input terminal of the multiply-accumulate computing unit is connected to the first storage module; the first input end of the second selector is connected with the output end of the activation quantization calculation unit, the second input end of the second selector is connected with the first storage module, and the output end of the second selector is connected with the input end of the pooling calculation unit; the first input end of the third selector is connected with the output end of the pooling calculation unit, and the second input end of the third selector is connected with the output end of the multiply-accumulate calculation unit.
With reference to the second aspect, in some implementations of the second aspect, the indication information includes configuration information of the selectors, where the configuration information of the selectors is used to indicate the connection relationships between the output ends and the input ends of the plurality of selectors. When the calculation process to be performed is convolution calculation, the output end of the first selector is connected to the second input end of the first selector, the output end of the second selector is connected to the first input end of the second selector, and the output end of the third selector is connected to the first input end of the third selector; when the calculation process to be performed is the target region of interest pooling calculation, the output end of the second selector is connected to the second input end of the second selector, and the output end of the third selector is connected to the first input end of the third selector; or when the calculation process to be performed is the target region of interest calibration calculation, the output end of the first selector is connected to the first input end of the first selector, the output end of the second selector is connected to the first input end of the second selector, and the output end of the third selector is connected to the first input end of the third selector; or when the calculation process to be performed is image scaling calculation, the output end of the first selector is connected to the first input end of the first selector, and the output end of the third selector is connected to the second input end of the third selector.
With reference to the second aspect, in certain implementations of the second aspect, the control module generates a first storage address and a first read-write control signal; the first storage module reads M groups of first data according to the first storage address and the first read-write control signal; and the first storage module sends the M groups of first data to the M reconfigurable computing units.
With reference to the second aspect, in certain implementations of the second aspect, when the computing process to be performed is a target region of interest pooling computation, the first data is feature map data.
With reference to the second aspect, in certain implementations of the second aspect, the control module generates a second storage address and a second read-write control signal; the second storage module reads the second data according to the second storage address and the second read-write control signal; and the second storage module sends the second data to the M reconfigurable computing units through the first selector.
With reference to the second aspect, in some implementations of the second aspect, when the calculation process to be performed is a convolution calculation, the first data is convolution kernel data, and the second data is feature map data.
With reference to the second aspect, in certain implementations of the second aspect, the control module generates third data; the control module sends the third data to the M reconfigurable computing units through the first selector.
With reference to the second aspect, in certain implementations of the second aspect, when the calculation process to be performed is a target region of interest calibration calculation or an image scaling calculation, the first data is feature map data, and the third data is weight data.
In a third aspect, there is provided a data processing apparatus comprising: a memory for storing a program; a processor for executing the program stored in the memory, the processor being configured to perform the method of the second aspect and any implementation manner of the second aspect when the program stored in the memory is executed.
The processor in the third aspect may be a central processing unit (central processing unit, CPU) or a combination of a CPU and a neural network operation processor, where the neural network operation processor may include a graphics processing unit (graphics processing unit, GPU), a neural network processing unit (neural-network processing unit, NPU), a tensor processing unit (tensor processing unit, TPU), and the like. The TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized for machine learning by Google.
In a fourth aspect, there is provided a computer readable storage medium storing program code for execution by a device, the program code comprising instructions for performing the method of the second aspect or any implementation of the second aspect.
In a fifth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of the implementations of the second or third aspects described above.
In a sixth aspect, a chip is provided, the chip including a processor and a data interface, the processor reading instructions stored on a memory through the data interface, performing the method of the second aspect or any implementation of the second aspect.
Optionally, as an implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, where the instructions, when executed, are configured to perform the method in the second aspect or any implementation manner of the second aspect.
A seventh aspect provides a system on a chip (SoC) comprising the data processing apparatus of the first aspect or any implementation of the first aspect.
An eighth aspect provides an electronic device comprising the data processing apparatus of the first aspect or any implementation of the first aspect.
Drawings
Fig. 1 shows a schematic configuration of a data processing apparatus 100 applied to an embodiment of the present application.
Fig. 2 shows a schematic block diagram of a data processing apparatus 200 according to an embodiment of the application.
Fig. 3 shows a schematic structural diagram of a data processing apparatus 300 according to an embodiment of the present application.
Fig. 4 shows a schematic flow chart of a convolution calculation process provided by an embodiment of the present application.
Fig. 5 shows a schematic flow chart of a target region of interest pooling calculation procedure according to an embodiment of the present application.
Fig. 6 is a schematic flowchart of a target region of interest calibration calculation procedure according to an embodiment of the present application.
Fig. 7 is a schematic flowchart of an image scaling calculation process according to an embodiment of the present application.
Fig. 8 is a schematic flow chart of a data processing method according to an embodiment of the present application.
Fig. 9 is a schematic diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
The technical scheme of the application will be described below with reference to the accompanying drawings.
Artificial intelligence (artificial intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision and reasoning, man-machine interaction, recommendation and search, basic AI theory, and the like.
In the AI field, object detection is an emerging application direction in computer vision, which refers to determining the positions of object images of interest in a given image and labeling those positions. There are various network algorithms for target detection, such as the region-based convolutional neural network (region-convolutional neural network, R-CNN) and the Fast region-based convolutional neural network (Fast region-convolutional neural network, Fast R-CNN). Target detection with Fast R-CNN is faster than with R-CNN and does not require additional storage space to preserve the extracted feature information.
The calculation processes involved in a target detection network algorithm include convolution calculation, target region of interest pooling (region of interest pooling, RoI pooling) calculation, target region of interest calibration (region of interest align, RoI align) calculation, and image scaling calculation. The RoI pooling calculation pools the region of the feature map corresponding to the position coordinates of a pre-selected box into a feature map of fixed size, for subsequent classification and bounding-box regression operations. Because the position of the pre-selected box is usually obtained by model regression and is generally a floating-point number, while the pooled feature map requires a fixed size, the RoI pooling calculation generally involves two quantization steps. Compared with the RoI pooling calculation, the RoI align calculation does not require this quantization operation; it obtains the image values at pixel points with floating-point coordinates by bilinear interpolation and then performs the pooling operation on the obtained image values, so the calculation accuracy can be further improved.
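To make the difference concrete, the following Python sketch contrasts the coordinate quantization used in RoI pooling with the bilinear interpolation used in RoI align for a single sampling point. It is a minimal illustration under the usual definitions of these operations and is not taken from the application; boundary handling is omitted.

```python
import math

def roi_pooling_coord(x: float) -> int:
    # RoI pooling quantizes a floating-point coordinate to an integer pixel index.
    return int(math.floor(x))

def bilinear_sample(feature, x: float, y: float) -> float:
    # RoI align keeps the floating-point coordinate and interpolates the image value
    # from the four surrounding pixels instead of quantizing the coordinate.
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = x0 + 1, y0 + 1
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * feature[y0][x0] +
            dx * (1 - dy) * feature[y0][x1] +
            (1 - dx) * dy * feature[y1][x0] +
            dx * dy * feature[y1][x1])
```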
In the prior art, a central processing unit (central processing unit, CPU) or dedicated hardware circuits are typically employed for target detection. Target detection with a CPU is slow, while target detection with dedicated hardware circuits requires a dedicated circuit to be designed for each calculation process involved in the target detection network algorithm, which increases the area and power consumption of the chip; moreover, a hardware circuit that is not currently used for target detection is left in an idle state, which further wastes resources.
In view of the above technical problems, the present application provides a data processing apparatus, by which the efficiency of target detection can be improved and better resource utilization can be achieved when the target detection is performed.
Various embodiments provided by the present application will be described in detail below with reference to the accompanying drawings.
Fig. 1 shows a schematic structural diagram of a data processing apparatus 100 to which an embodiment of the present application is applied. As shown in fig. 1, the data processing apparatus 100 includes a control module 110, a storage module 120, and a calculation module 130, where the control module 110 is connected to a master control part through an interface.
The master control part may pre-configure the data processing apparatus through the interface and read out the data buffered in the storage module 120. The pre-configuration may include pre-configuration of the storage address space and other pre-configuration information, which is not limited by the embodiments of the present application. Based on the pre-configuration by the master control part, the control module 110 may configure the storage module 120 and/or the calculation module 130. For example, after the master control part pre-configures the storage address space, the control module 110 may generate storage addresses, so that the storage module 120 can locate the corresponding data based on the storage addresses generated by the control module 110.
The control module 110 may be configured to configure the storage module 120 and/or the calculation module 130, and the master control part may read out the data buffered in the storage module 120 through the interface.
The calculation module 130 may be configured to process the data to be processed. In the embodiments of the present application, the calculation module 130 includes a plurality of reconfigurable calculation units. The calculation module 130 may perform the calculation process by at least one of the following calculation units in the plurality of reconfigurable calculation units: a multiply-accumulate calculation unit, an activation quantization calculation unit, and a pooling calculation unit. For example, when the calculation process is a convolution calculation, the calculation units for processing the data may include the multiply-accumulate calculation unit, the activation quantization calculation unit, and the pooling calculation unit; when the calculation process is a target region of interest pooling calculation, the calculation unit for processing the data may include the pooling calculation unit; when the calculation process is a target region of interest calibration calculation, the calculation units for processing the data may include the multiply-accumulate calculation unit and the pooling calculation unit; and when the calculation process is an image scaling calculation, the calculation unit for processing the data may include the multiply-accumulate calculation unit.
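The correspondence just enumerated between calculation processes and participating calculation units can be summarized as follows; the Python dictionary below is only an illustrative restatement of the paragraph above (the identifier names are assumptions), not a definitive hardware description.

```python
# Illustrative restatement (not from the application): calculation units of a
# reconfigurable computing unit that participate in each calculation process,
# as enumerated in the paragraph above.
ACTIVE_UNITS = {
    "convolution":   ("multiply_accumulate", "activation_quantization", "pooling"),
    "roi_pooling":   ("pooling",),
    "roi_align":     ("multiply_accumulate", "pooling"),
    "image_scaling": ("multiply_accumulate",),
}
```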
Based on the schematic structural diagram of the data processing apparatus shown in fig. 1, the processing flow of the data processing apparatus provided in the embodiment of the present application may include the following steps:
In the first step, the master control part may pre-configure the data processing apparatus through the interface, where the pre-configuration may include pre-configuration of the storage address space and other pre-configuration information, which is not limited by the embodiments of the present application. Based on the pre-configuration by the master control part, the control module 110 may configure the storage module 120 and/or the calculation module 130.
In the second step, the calculation module 130 obtains the data to be processed and calculates on it. The calculation module 130 obtains the data to be processed as follows: the control module 110 may send the generated storage address and read-write control signal to the storage module 120, and accordingly, the storage module 120 may read the data to be processed (for example, feature map data) based on the storage address and the read-write control signal and send it to the calculation module 130; alternatively, the control module 110 may directly generate the data to be processed (e.g., weight data) and send it to the calculation module 130.
In a possible manner, when the amount of data to be processed is large, the calculation module 130 may acquire the data to be processed in several passes. That is, the calculation module 130 may first acquire one part of the data to be processed, and after the calculation on that part is completed, acquire and calculate the next part, until the calculation on all of the data to be processed is completed.
Third, the storage module 120 caches the data obtained after calculation by the calculation module 130, and the master control part reads the data through the interface.
In a possible manner, when more than one calculation process is to be performed, the storage module 120 may buffer the data obtained by the calculation module 130 in the current calculation process; the master control part then pre-configures the next calculation process to be performed through the interface, the first and second steps are performed again, and the data obtained in that calculation process is buffered in the storage module 120. After all the calculation processes have been executed, the master control part reads the final calculated data through the interface.
Based on the above technical solution, the data processing apparatus provided in the embodiments of the present application can complete a plurality of calculation processes (for example, convolution calculation, target region of interest pooling calculation, target region of interest calibration calculation, or image scaling calculation) in one apparatus. When the amount of data to be processed is large, the calculation module can also complete the calculation in a block-processing manner (i.e., dividing the data to be processed into a plurality of parts for processing, as sketched below), which not only improves the efficiency of target detection but also achieves better resource utilization. In addition, when the calculation module includes a plurality of reconfigurable calculation units, the plurality of reconfigurable calculation units can calculate the data to be processed at the same time, so that the efficiency of target detection can be further improved.
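As a rough illustration of such block processing, the following Python sketch splits the data to be processed into parts and calculates them one after another. The block size and the compute function are placeholders assumed for illustration; the sketch is not taken from the application.

```python
def process_in_blocks(data, block_size, compute):
    # Illustrative sketch: fetch one part of the data to be processed, finish its
    # calculation, then fetch the next part, until all data has been processed.
    results = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]   # acquire one part of the data
        results.append(compute(block))           # calculate that part
    return results

# Usage sketch: sum each block of four elements.
print(process_in_blocks(list(range(10)), block_size=4, compute=sum))
```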
In an embodiment of the present application, the data processing apparatus may complete a plurality of calculation processes. The reconfigurable computing unit included in the computing module 130 is designed such that the computing module 130 can perform computation of a plurality of computing processes, which will be described in detail later with reference to fig. 2; the design of the specific connection of the modules in the data processing apparatus will be described in detail later with reference to fig. 3 to 7.
Fig. 2 is a schematic block diagram of a data processing apparatus 200 according to an embodiment of the present application. As shown in fig. 2, the data processing apparatus 200 may include a control module 210 and a calculation module 220. Wherein the control module 210 may be the control module 110 of fig. 1, and the calculation module 220 may be the calculation module 130 of fig. 1.
The control module 210 is configured to generate the indication information according to a calculation process to be performed.
The calculation process may include convolution calculation, target region of interest pooling calculation, target region of interest calibration calculation, and image scaling calculation, among others.
The computing module 220 includes M reconfigurable computing units; a computing module 220 for performing a computing process by at least one of the following of the M reconfigurable computing units: a multiply-accumulate computation unit, an activation quantization computation unit, or a pooling computation unit. The at least one calculation unit is determined according to the indication information generated by the control module 210, where M is a positive integer.
It should be understood that, in the embodiments of the present application, the number M of reconfigurable computing units used for performing the computing process may be less than or equal to the total number N of reconfigurable computing units, where N is a positive integer, and the embodiments of the present application do not specifically limit the number of reconfigurable computing units used for performing the computing process. For example, when M is smaller than N, the number of reconfigurable computing units used for performing the computing process is smaller than the total number of reconfigurable computing units, that is, N-M reconfigurable computing units are in an idle state; for another example, when M is equal to N, the number of reconfigurable computing units used for performing the computing process is equal to the total number of reconfigurable computing units, that is, no reconfigurable computing unit is in an idle state at this time.
Illustratively, that the computing module 220 performs the computing process by at least one computing unit of the M reconfigurable computing units may be understood as the computing module 220 performing the computing process by the M reconfigurable computing units, where the at least one computing unit participating in the computation is the same within each of the M reconfigurable computing units.
For ease of understanding and description, the different computing processes are exemplified below with M equal to 1, i.e., with the computing module 220 performing the computing process by at least one computing unit of one reconfigurable computing unit.
There are four possible ways to perform the calculation process by at least one calculation unit.
In a possible manner, when the calculation process to be performed is a convolution calculation, the calculation module 220 may perform the convolution calculation by the multiply-accumulate calculation unit, the activation quantization calculation unit, and the pooling calculation unit in the reconfigurable calculation unit, and accordingly, the calculation module 220 is configured as a calculation module for performing the convolution calculation.
In another possible manner, when the calculation process to be performed is a target region of interest pooling calculation, the calculation module 220 may perform the target region of interest pooling calculation by the pooling calculation unit of the reconfigurable calculation units at this time, and accordingly, the calculation module 220 is configured as a calculation module for performing the target region of interest pooling calculation.
In another possible manner, when the calculation procedure to be performed is the target region of interest calibration calculation, the calculation module 220 may perform the target region of interest calibration calculation by the multiply-accumulate calculation unit, the activation quantization calculation unit, and the pooling calculation unit in the reconfigurable calculation unit, and accordingly, the calculation module 220 is configured as a calculation module for performing the target region of interest calibration calculation.
Alternatively, when the calculation process to be performed is an image scaling calculation, the calculation module 220 may perform the image scaling calculation by a multiply-accumulate calculation unit of the reconfigurable calculation units, and accordingly, the calculation module 220 is configured as a calculation module for performing the image scaling calculation.
It should be noted that, when M is greater than 1, the at least one computing unit participating in the computation of the M reconfigurable computing units may refer to the above description, and in order to avoid repetition, a description thereof is omitted here.
Based on the above technical solution, according to the indication information of the control module, the calculation module may perform a calculation process (for example, convolution calculation, target region of interest pooling calculation, target region of interest calibration calculation, or image scaling calculation) by at least one calculation unit of the M reconfigurable calculation units. For different calculation processes, the calculation module may multiplex a portion of the M reconfigurable calculation units. That is, different calculation processes are completed by different combinations of the calculation units in the reconfigurable calculation unit, which improves resource utilization while maintaining processing efficiency. In addition, when the calculation module includes a plurality of reconfigurable calculation units, the plurality of reconfigurable calculation units can calculate the data to be processed at the same time, so that the efficiency of target detection can be further improved.
For ease of understanding, the structure of a data processing apparatus suitable for use in embodiments of the present application is described below with reference to fig. 3 to 7. Fig. 3 mainly describes the case where the data processing apparatus includes three modules, namely a control module, a storage module, and a calculation module, and a plurality of calculation processes are completed by this apparatus. That is, in the structure of the data processing apparatus shown in fig. 3, the control module generates the indication information, the storage module stores the data to be processed used by the calculation process to be performed, and the plurality of reconfigurable calculation units in the calculation module calculate on the data to be processed, so that a plurality of calculation processes (for example, convolution calculation, target region of interest pooling calculation, target region of interest calibration calculation, or image scaling calculation) can be completed. Fig. 4 to 7 mainly describe the specific flows of the respective calculation processes performed by the data processing apparatus shown in fig. 3: fig. 4 shows the specific flow of the convolution calculation process, fig. 5 the specific flow of the target region of interest pooling calculation process, fig. 6 the specific flow of the target region of interest calibration calculation process, and fig. 7 the specific flow of the image scaling calculation process.
In the following examples, it is assumed that the above-mentioned control module (e.g., the control module 110 or the control module 210) may be the control module #1 in fig. 3 to 7, the above-mentioned storage module (e.g., the storage module 120) may be the first storage module or the second storage module in fig. 3 to 7, the above-mentioned calculation module (e.g., the calculation module 130 or the calculation module 220) may be the calculation module in fig. 3 to 7, and the above-mentioned data to be processed may be the first data stored in the first storage module, the second data stored in the second storage module, or the third data generated by the control module. It should be understood that the naming of the modules, or any other naming, does not limit the scope of the embodiments of the present application.
Fig. 3 shows a schematic structural diagram of a data processing apparatus 300 according to an embodiment of the present application. The apparatus 300 may include the following modules.
(1) The control module #1 may be configured to configure a memory module (e.g., a first memory module, and also e.g., a second memory module) and/or a computing module.
In one possible manner, the control module #1 is configured to generate indication information according to a calculation process to be performed, where the indication information is used to determine at least one of the following M reconfigurable calculation units in the calculation module: a multiply-accumulate computation unit, an activation quantization computation unit, or a pooling computation unit.
In another possible manner, the control module #1 is further configured to generate the first memory address and the first read/write control signal. In this way, the first memory module may read M sets of first data (e.g., feature map data, and also, e.g., convolution kernel data) according to the first memory address and the first read/write control signal.
In another possible manner, the control module #1 is further configured to generate a second memory address and a second read/write control signal. In this way, the second memory module can read second data (such as feature map data) according to the second memory address and the second read/write control signal.
In another possible manner, the control module #1 is further configured to generate third data, and send the third data to the M reconfigurable computing units through the first selector. For example, when the calculation process to be performed is the target region of interest calibration calculation, the third data is the weight data generated by the control module #1, and the control module #1 may send the generated third data (i.e., the weight data) to the M multiply-accumulate calculation units through the first selector; for another example, when the calculation process to be performed is an image scaling calculation, the third data is weight data generated by the control module #1, and the control module #1 may multicast the generated third data (i.e., the weight data) to the M multiply-accumulate calculation units through the first selector. In this way, the plurality of reconfigurable computing units can simultaneously calculate the third data, so that the efficiency of target detection can be further improved.
The control module #1 sending the third data to the M reconfigurable computing units through the first selector may include: when the number of first selectors is equal to 1, the control module #1 may multicast the third data to the M reconfigurable computing units through that one first selector; when the number of first selectors is equal to N, the control module #1 may send the third data to the M reconfigurable computing units through M of the N first selectors.
(2) The first storage module may be configured to read M groups of first data according to the first storage address and the first read-write control signal, and send the M groups of first data to the M reconfigurable computing units. The first storage module reading M groups of first data may include: the first storage module has N memory banks, and M of the N memory banks read the M groups of first data. For example, when the calculation process to be performed is convolution calculation, the first data is convolution kernel data, and M memory banks in the first storage module may read M groups of first data (i.e., convolution kernel data) and send them to the M multiply-accumulate calculation units; when the calculation process to be performed is the target region of interest pooling calculation, the first data is feature map data, and M memory banks in the first storage module may read M groups of first data (i.e., feature map data) and send them to the M pooling calculation units through the second selectors; when the calculation process to be performed is the target region of interest calibration calculation, the first data is feature map data, and M memory banks in the first storage module may read M groups of first data (i.e., feature map data) and send them to the M multiply-accumulate calculation units; and when the calculation process to be performed is image scaling calculation, the first data is feature map data, and M memory banks in the first storage module may read M groups of first data (i.e., feature map data) and send them to the M multiply-accumulate calculation units. In this way, the plurality of reconfigurable computing units can calculate the first data at the same time, so that the efficiency of target detection can be further improved.
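The bank-parallel read described above can be pictured with the following Python sketch, in which M of the N memory banks each return one group of first data for one reconfigurable computing unit. The names, the bank layout, and the omission of the read-write control signal are all assumptions made for illustration only.

```python
class FirstStorageModule:
    """Illustrative model (not from the application): N memory banks, M of which
    feed the M reconfigurable computing units."""

    def __init__(self, banks):
        self.banks = banks  # list of N banks; each bank maps address -> data group

    def read_first_data(self, address, m):
        # Read one group of first data from each of the first M banks at the given
        # first storage address (the read-write control signal is omitted here).
        return [self.banks[i][address] for i in range(m)]


# Usage sketch: three banks, two active units (M = 2), each receiving its own group.
storage = FirstStorageModule([{0: "kernel_a"}, {0: "kernel_b"}, {0: "kernel_c"}])
for unit_index, group in enumerate(storage.read_first_data(address=0, m=2)):
    print(f"reconfigurable unit {unit_index} <- {group}")
```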
(3) The second storage module is used for reading second data according to a second storage address and a second read-write control signal and sending the second data to the M reconfigurable computing units through the first selector. For example, when the calculation process to be performed is convolution calculation, the second data is the feature map data read by the second storage module, and the second storage module may send the read second data (i.e., the feature map data) to the M multiply-accumulate calculation units through the first selector. In this way, the plurality of reconfigurable computing units can simultaneously calculate the second data, so that the efficiency of target detection can be further improved.
The second storage module sending the second data to the M reconfigurable computing units through the first selector may include: when the number of first selectors is equal to 1, the second storage module may multicast the second data to the M reconfigurable computing units through that one first selector; when the number of first selectors is equal to N, the second storage module may send the second data to the M reconfigurable computing units through M of the N first selectors.
(4) The computing module comprises M reconfigurable computing units; the computing module is used for executing a computing process through at least one computing unit of the M reconfigurable computing units: a multiply-accumulate computation unit, an activation quantization computation unit, or a pooling computation unit.
Optionally, the multiply-accumulate computing unit, the activation quantization computing unit and the pooling computing unit are connected in sequence, and the computing module further comprises a plurality of selectors, wherein the plurality of selectors comprises a first selector, a second selector and a third selector. The connection modes of the multiply-accumulate computing unit, the activation quantization computing unit and the pooling computing unit with the plurality of selectors can comprise:
i. the first input end of the first selector is connected with the control module #1, the second input end of the first selector is connected with the second storage module, the first input end of the multiply-accumulate calculating unit is connected with the output end of the first selector, and the second input end of the multiply-accumulate calculating unit is connected with the first storage module.
Optionally, the output end of the first selector being connected to the first input end of the multiply-accumulate calculation unit may include: when the number Y of first selectors is equal to 1, the output end of that one first selector is connected to the first input ends of a plurality of multiply-accumulate calculation units, the plurality being each of the N multiply-accumulate calculation units; when the number Y of first selectors is equal to N, the output end of each of the N first selectors is connected to the first input end of a corresponding one of the N multiply-accumulate calculation units.
ii. The first input end of the second selector is connected to the output end of the activation quantization calculation unit, the second input end of the second selector is connected to the first storage module, and the output end of the second selector is connected to the input end of the pooling calculation unit. The number of second selectors, memory banks in the first storage module, activation quantization calculation units and pooling calculation units is N, that is, the first input end of each second selector is connected to the output end of one activation quantization calculation unit, the second input end of each second selector is connected to one memory bank in the first storage module, and the output end of each second selector is connected to the input end of one pooling calculation unit.
iii. The first input end of the third selector is connected to the output end of the pooling calculation unit, the second input end of the third selector is connected to the output end of the multiply-accumulate calculation unit, and the output end of the third selector is connected to the third storage module and, through the third storage module and the control module #2, to the first storage module or the second storage module. The number of third selectors, multiply-accumulate calculation units and pooling calculation units is N, that is, the first input end of each third selector is connected to the output end of one pooling calculation unit, and the second input end of each third selector is connected to the output end of one multiply-accumulate calculation unit.
Based on the above technical solution, the multiply-accumulate computing unit, the activation quantization computing unit, and the pooling computing unit in the computing module are sequentially connected, and the input ends (e.g., the first input end and the second input end) and the output end of each of the plurality of selectors (e.g., the first selector, the second selector, and the third selector) are respectively connected with one of the computing units (e.g., the multiply-accumulate computing unit, the activation quantization computing unit, or the pooling computing unit) or one of the modules. In the scheme of the embodiment of the application, the computing units participating in computation among the M reconfigurable computing units can be adjusted by adjusting the connection relation between the input end and the output end of each selector. In this way, for different calculation processes, the selectors can be configured such that at least one computing unit participates in the calculation, i.e., performs the respective calculation process. That is, different combinations of the computing units are reconstructed through the selectors to complete different calculation processes, so that the resource utilization rate can be improved while the processing efficiency is ensured.
Illustratively, saying that the computing module may perform the computing process by at least one computing unit of the M reconfigurable computing units may be understood as follows: the computing module 220 may perform the computing process by the M reconfigurable computing units, and the at least one computing unit participating in the computation is the same within each of the M reconfigurable computing units.
Optionally, at least one calculation unit for performing the calculation process is determined according to the indication information generated by the control module #1, where the indication information includes configuration information of the selectors, and the configuration information of the selectors is used to indicate connection relationships between the output ends and the input ends of the plurality of selectors, respectively. In this way, the plurality of selectors may connect the output terminal with one of the input terminals according to the configuration information, wherein the at least one calculation unit for performing the calculation process may include a multiply-accumulate calculation unit when the output terminal of the first selector is connected with one of the input terminals; when the output of the second selector and/or the third selector is connected to one of the inputs, the at least one calculation unit for performing the calculation process may further comprise a calculation unit connected to the one of the inputs.
It should be understood that, in the embodiment of the present application, each of the plurality of selectors (for example, the first selector, the second selector, and the third selector) may include two input ends, denoted respectively as a first input end and a second input end; "one of the input ends" of a selector may thus be either the first input end or the second input end, and when the calculation process performed by the calculation module differs, the input end connected to the output end of each selector may also differ.
For ease of understanding and description, the different computing processes are described below by way of example with M equal to 1, i.e., the computing module performs the computing process by at least one computing unit of the single reconfigurable computing unit.
The computing process performed by the at least one computing unit may fall into the following four cases (summarized, for illustration only, in the sketch after Case #D).
Case #A: the configuration information of the selectors indicates that the output end of the first selector is connected to the second input end of the first selector, the output end of the second selector is connected to the first input end of the second selector, and the output end of the third selector is connected to the first input end of the third selector. Based on the connection relationship between the input ends of the plurality of selectors and the modules or computing units in this case (i.e., the second input end of the first selector is connected to the second storage module, the first input end of the second selector is connected to the output end of the activation quantization computing unit, and the first input end of the third selector is connected to the output end of the pooling computing unit), the at least one computing unit for executing the computing process includes the multiply-accumulate computing unit, the activation quantization computing unit, and the pooling computing unit. The specific implementation steps of the current computing process may include: first, the multiply-accumulate computing unit receives first data (i.e., convolution kernel data) from the first storage module and receives second data (i.e., feature map data) from the second storage module through the first selector; second, the activation quantization computing unit receives the data calculated by the multiply-accumulate computing unit; third, the pooling computing unit receives the data calculated by the activation quantization computing unit through the second selector; and finally, the pooling computing unit sends the calculated data to the third storage module through the third selector, and the third storage module caches the data calculated by the pooling computing unit. Accordingly, the calculation process performed by the calculation module in this case may be a convolution calculation.
Case #B: the configuration information of the selectors indicates that the output end of the second selector is connected to the second input end of the second selector and the output end of the third selector is connected to the first input end of the third selector. Based on the connection relationship between the input ends of the plurality of selectors and the modules or computing units in this case (i.e., the second input end of the second selector is connected to the first storage module, and the first input end of the third selector is connected to the output end of the pooling computing unit), the at least one computing unit for executing the computing process includes the pooling computing unit. The specific implementation steps of the current computing process may include: first, the pooling computing unit receives the first data (i.e., the feature map data) from the first storage module through the second selector; second, the pooling computing unit sends the calculated data to the third storage module through the third selector, and the third storage module caches the data calculated by the pooling computing unit. Accordingly, the calculation process performed by the calculation module in this case may be a pooling calculation for the target region of interest.
Case #C: the configuration information of the selectors indicates that the output end of the first selector is connected to the first input end of the first selector, the output end of the second selector is connected to the first input end of the second selector, and the output end of the third selector is connected to the first input end of the third selector. Based on the connection relationship between the input ends of the plurality of selectors and the modules or computing units in this case (i.e., the first input end of the first selector is connected to the control module, the first input end of the second selector is connected to the output end of the activation quantization computing unit, and the first input end of the third selector is connected to the output end of the pooling computing unit), the at least one computing unit for executing the computing process includes the multiply-accumulate computing unit, the activation quantization computing unit, and the pooling computing unit. The specific implementation steps of the current computing process may include: first, the multiply-accumulate computing unit receives first data (i.e., feature map data) from the first storage module and receives third data (i.e., weight data) from the control module through the first selector; second, the activation quantization computing unit receives the data calculated by the multiply-accumulate computing unit; third, the pooling computing unit receives the data calculated by the activation quantization computing unit through the second selector; and finally, the pooling computing unit sends the calculated data to the third storage module through the third selector, and the third storage module caches the data calculated by the pooling computing unit. Accordingly, the calculation process performed by the calculation module in this case may be a calibration calculation for the target region of interest.
Case #D: the configuration information of the selectors indicates that the output end of the first selector is connected to the first input end of the first selector and the output end of the third selector is connected to the second input end of the third selector. Based on the connection relationship between the input ends of the plurality of selectors and the modules or computing units in this case (i.e., the first input end of the first selector is connected to the control module, and the second input end of the third selector is connected to the output end of the multiply-accumulate computing unit), the at least one computing unit for executing the computing process includes the multiply-accumulate computing unit. The specific implementation steps of the current computing process may include: first, the multiply-accumulate computing unit receives first data (i.e., feature map data) from the first storage module and receives third data (i.e., weight data) from the control module through the first selector; second, the multiply-accumulate computing unit sends the calculated data to the third storage module through the third selector, and the third storage module caches the data calculated by the multiply-accumulate computing unit. Accordingly, the computing process performed by the computing module in this case may be an image scaling computation.
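The four cases can be summarized, purely for illustration, as a table of selector settings: 0 and 1 below stand for a selector's first and second input end, and None means the case above does not fix that selector. The derivation of the participating units follows the rule stated earlier for the first, second, and third selectors; the Python encoding is an assumption made only for readability.

```python
# Selector settings for Cases #A-#D (0 = first input end, 1 = second input end).
# "None" means the case above does not specify that selector. Illustrative only.
SELECTOR_CONFIG = {
    "case_A_convolution":     {"first": 1, "second": 0, "third": 0},
    "case_B_roi_pooling":     {"first": None, "second": 1, "third": 0},
    "case_C_roi_calibration": {"first": 0, "second": 0, "third": 0},
    "case_D_image_scaling":   {"first": 0, "second": None, "third": 1},
}

def participating_units(cfg):
    """Derive which computing units take part in the calculation."""
    units = []
    if cfg["first"] is not None or cfg["third"] == 1:
        units.append("multiply-accumulate")
    if cfg["second"] == 0:
        units.append("activation quantization")
    if cfg["third"] == 0:
        units.append("pooling")
    return units

for case, cfg in SELECTOR_CONFIG.items():
    print(case, "->", participating_units(cfg))
```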
It should be noted that, when M is greater than 1, the at least one computing unit participating in the computation of the M reconfigurable computing units may refer to the above description, and in order to avoid repetition, a description thereof is omitted here.
Based on the above technical solution, the plurality of selectors may connect the output terminal with one of the input terminals according to the configuration information, wherein when the output terminal of the first selector is connected with one of the input terminals, the at least one calculation unit for performing the calculation process may include a multiply-accumulate calculation unit; when the output of the second selector and/or the third selector is connected to one of the inputs, the at least one calculation unit for performing the calculation process may further comprise a calculation unit connected to the one of the inputs. At this time, the at least one computing unit participates in the computation, that is, performs a corresponding computation process, so that a plurality of computation processes (such as convolution computation, and for example, target region pooling computation, and for example, target region calibration computation, and for example, image scaling computation) can be completed through one device, and the resource utilization rate is better. In addition, when the calculation module includes a plurality of reconfigurable calculation units, the plurality of reconfigurable calculation units can calculate the data to be processed at the same time, so that the efficiency of target detection can be further improved.
(5) The control module #2 may be configured to send the data buffered in the third storage module to the first storage module or the second storage module, so that the master control portion can read the data stored in the first storage module or the second storage module through the interface.
(6) The third storage module can be used for caching the data calculated by the calculation module and sending the data to the first storage module or the second storage module through the control module #2, so that the master control portion can read the data stored in the first storage module or the second storage module through the interface.
Optionally, when the data processing apparatus is used to perform more than one calculation process, the first storage module and/or the second storage module may store the data of the calculation process that has already been performed, and the data processing apparatus is reconfigured by the control module #1 and/or the control module #2 so that it may continue with the next calculation process. After all calculation processes are completed, the data stored in the first storage module and/or the second storage module are the final data, and at this time the master control portion can read the final data stored in the first storage module or the second storage module through the interface.
Optionally, when the data to be processed is large, the reconfigurable computing unit may acquire the data to be processed multiple times; a specific computing unit in the reconfigurable computing unit may then compute the data multiple times and cache the data obtained by each computation in the first storage module and/or the second storage module until the final data computed by that computing unit is obtained, which increases the efficiency of the reconfigurable computing unit when computing a large data block. For example, when the data for performing convolution computation (i.e., the first data is convolution kernel data and the second data is feature map data) is relatively large, the multiply-accumulate computing unit may first acquire part of the data, perform multiply-accumulate computation on that part, and store the resulting partial result in the first storage module; by performing the multiply-accumulate computation on the data multiple times, a plurality of partial results are obtained and stored in the first storage module. After the multiply-accumulate computation of all the data is completed, the first storage module may send the cached partial results back to the multiply-accumulate computing unit, which performs a summation operation on the plurality of partial results, thereby completing the whole multiply-accumulate computation process.
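The tiled multiply-accumulate strategy described above can be sketched as follows. The block size and the use of NumPy are assumptions made only to show that summing the cached partial results reproduces the full multiply-accumulate result.

```python
import numpy as np

def tiled_multiply_accumulate(kernel, feature, block=64):
    """Process a large multiply-accumulate in blocks, caching partial results
    (as the first storage module would) and summing them at the end."""
    kernel = np.asarray(kernel, dtype=np.float64).ravel()
    feature = np.asarray(feature, dtype=np.float64).ravel()
    cached_partials = []                         # stand-in for the first storage module
    for start in range(0, kernel.size, block):
        k = kernel[start:start + block]
        f = feature[start:start + block]
        cached_partials.append(np.dot(k, f))     # partial multiply-accumulate result
    return float(np.sum(cached_partials))        # final summation over partial results

# The tiled result equals a single multiply-accumulate over all the data.
kernel, feature = np.ones(1000), np.arange(1000.0)
assert np.isclose(tiled_multiply_accumulate(kernel, feature), np.dot(kernel, feature))
```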
Based on the above technical solution, in the target detection process, the data processing apparatus provided by the embodiment of the present application may complete multiple calculation processes (such as convolution calculation, for example, target region pooling calculation, for example, target region calibration calculation, for example, image scaling calculation), and the first storage module may send multiple sets of first data to multiple reconfigurable calculation units, the second storage module may send second data to multiple reconfigurable calculation units through the first selector, the control module #1 may also send third data to multiple reconfigurable calculation units through the first selector, and the multiple reconfigurable calculation units may calculate the first data and/or the second data at the same time, or the multiple reconfigurable calculation units may calculate the first data and/or the third data at the same time, so that the data processing apparatus may not only improve the efficiency of target detection, but also may have a better resource utilization ratio. In addition, the data processing device provided by the embodiment of the application can be used as an accelerator chip of an independent neural network processor (neural-network processing unit, NPU), and also can be used as an intellectual property (intellectual property, IP) core to be integrated into a system on a chip (SoC) to realize the acceleration operation of a network algorithm (such as a target detection network algorithm).
Fig. 4 is a schematic flowchart illustrating a convolution calculation process according to an embodiment of the present application, where the data processing apparatus shown in fig. 4 is the data processing apparatus 300 in fig. 3, that is, the data processing apparatus 300 is used to perform convolution calculation, and a specific description may refer to the data processing apparatus 300, and in order to avoid repetition, a part of repeated description is omitted hereinafter.
The control module #1 generates instruction information according to convolution calculation to be performed, wherein the instruction information is used for determining the following calculation units in the M reconfigurable calculation units in the calculation module: a multiply-accumulate computing unit, an activation quantization computing unit and a pooling computing unit.
The control module #1 generates a first memory address and a first read-write control signal. In this way, the first memory module can read M sets of first data (i.e., convolution kernel data) according to the first memory address and the first read-write control signal.
The control module #1 generates a second memory address and a second read/write control signal. In this way, the second memory module can read the second data (i.e., the feature map data) according to the second memory address and the second read/write control signal.
The first storage module reads M groups of first data according to the first storage address and the first read-write control signal, and sends the M groups of first data to M reconfigurable computing units. For example, the first memory module has N memory banks, and M memory banks of the N memory banks can read M sets of first data.
The first data is convolution kernel data, and M memory banks in the first memory module can read M groups of first data (namely, convolution kernel data) and send the read M groups of first data (namely, convolution kernel data) to M multiply-accumulate computing units. In this way, the plurality of reconfigurable computing units can simultaneously compute the first data, so that the efficiency of target detection can be further improved.
The second storage module reads second data according to the second storage address and the second read-write control signal, and sends the second data to the M reconfigurable computing units through the first selector.
The second data is the feature map data read by the second storage module, and the second storage module can send the read second data (i.e. the feature map data) to the M multiply-accumulate computing units through the first selector. In this way, the plurality of reconfigurable computing units can simultaneously calculate the second data, so that the efficiency of target detection can be further improved.
For a specific process of the second storage module transmitting the second data to the M reconfigurable computing units through the first selector, reference may be made to the related description in fig. 3, and in order to avoid repetition, a description will be omitted herein.
The calculation module may perform convolution calculations by a multiply-accumulate calculation unit, an activation quantization calculation unit, and a pooling calculation unit of the M reconfigurable calculation units.
Illustratively, saying that the calculation module may perform the convolution calculation by a multiply-accumulate calculation unit, an activation quantization calculation unit, and a pooling calculation unit of the M reconfigurable calculation units may be understood as follows: the calculation module may perform the convolution calculation by the M reconfigurable calculation units, and the multiply-accumulate calculation unit, the activation quantization calculation unit, and the pooling calculation unit participating in the convolution calculation are the same within each of the M reconfigurable calculation units.
Optionally, the multiply-accumulate computation unit, the activation quantization computation unit, and the pooling computation unit for performing the convolution computation are determined according to the indication information generated by the control module #1, the indication information including configuration information of the selectors. The configuration information of the selectors indicates that the output end of the first selector is connected to the second input end of the first selector, the output end of the second selector is connected to the first input end of the second selector, and the output end of the third selector is connected to the first input end of the third selector. Based on the connection relationship between the input ends of the plurality of selectors and the modules or computation units in this case (i.e., the second input end of the first selector is connected to the second storage module, the first input end of the second selector is connected to the output end of the activation quantization computation unit, and the first input end of the third selector is connected to the output end of the pooling computation unit), the at least one computation unit for performing the computation includes the multiply-accumulate computation unit, the activation quantization computation unit, and the pooling computation unit. For example, as shown in fig. 4, the specific implementation steps of the current computing process may include: first, the multiply-accumulate computing unit receives first data (i.e., convolution kernel data) from the first storage module and receives second data (i.e., feature map data) from the second storage module through the first selector; second, the activation quantization computing unit receives the data calculated by the multiply-accumulate computing unit; third, the pooling computing unit receives the data calculated by the activation quantization computing unit through the second selector; and finally, the pooling computing unit sends the calculated data to the third storage module through the third selector, and the third storage module caches the data calculated by the pooling computing unit.
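To make the fig. 4 dataflow concrete, the sketch below chains the three computing units in software. The specific operators (valid convolution, ReLU with an 8-bit clamp, 2x2 max pooling) are assumptions, since the text does not fix them; only the order of the stages follows the steps above.

```python
import numpy as np

def convolution_flow(kernel, fmap):
    """Sketch of the fig. 4 datapath: multiply-accumulate -> activation
    quantization -> pooling, with the result cached by the third storage module."""
    kh, kw = kernel.shape
    oh, ow = fmap.shape[0] - kh + 1, fmap.shape[1] - kw + 1
    # Multiply-accumulate computing unit (valid convolution).
    mac_out = np.array([[np.sum(fmap[i:i + kh, j:j + kw] * kernel)
                         for j in range(ow)] for i in range(oh)])
    # Activation quantization computing unit (assumed ReLU + 8-bit clamp).
    act_out = np.clip(np.rint(np.maximum(mac_out, 0)), 0, 255)
    # Pooling computing unit (assumed 2x2 max pooling).
    ph, pw = act_out.shape[0] // 2, act_out.shape[1] // 2
    pooled = act_out[:ph * 2, :pw * 2].reshape(ph, 2, pw, 2).max(axis=(1, 3))
    return pooled

result = convolution_flow(np.ones((3, 3)), np.arange(36.0).reshape(6, 6))
```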
For the description of the control module #1 and the third storage module, reference may be made to the related description in fig. 3, and in order to avoid repetition, a description thereof will be omitted.
Based on the above technical scheme, in the process of performing target detection, the first storage module may send multiple sets of first data (i.e., convolution kernel data) to the multiple reconfigurable computing units, and the second storage module may send second data (i.e., feature map data) to the multiple reconfigurable computing units through the first selector, where the multiple reconfigurable computing units may calculate the first data and the second data at the same time, so that the data processing apparatus may complete the convolution computing process, and may further improve the efficiency of target detection.
Fig. 5 is a schematic flowchart of a process of pooling calculation of a target area of interest according to an embodiment of the present application, where the data processing apparatus shown in fig. 5 is the data processing apparatus 300 in fig. 3, that is, the data processing apparatus 300 is used to perform pooling calculation of a target area of interest, and for specific description, reference may be made to the data processing apparatus 300, and for avoiding repetition, a part of repeated description is omitted hereinafter.
The control module #1 generates indication information according to the pooling calculation of the target region of interest to be executed, wherein the indication information is used for determining the pooling calculation unit in the M reconfigurable calculation units in the calculation module.
The control module #1 generates a first memory address and a first read-write control signal. In this way, the first memory module can read M sets of first data (i.e., the feature map data) according to the first memory address and the first read/write control signal.
The first storage module reads M groups of first data according to the first storage address and the first read-write control signal, and sends the M groups of first data to M reconfigurable computing units. For example, the first memory module has N memory banks, and M memory banks of the N memory banks can read M sets of first data.
The first data is feature map data, and M memory banks in the first memory module may read M sets of first data (i.e., feature map data), and send the read M sets of first data (i.e., feature map data) to M pooling calculation units through the second selector. In this way, the plurality of reconfigurable computing units can simultaneously compute the first data, so that the efficiency of target detection can be further improved.
The calculation module may perform the target region of interest pooling calculation by a pooling calculation unit of the M reconfigurable calculation units.
Illustratively, saying that the computing module may perform the target region of interest pooling computation by a pooling computing unit of the M reconfigurable computing units may be understood as follows: the computing module 220 may perform the target region of interest pooling computation by the M reconfigurable computing units, and the pooling computing unit participating in the computation is the same within each of the M reconfigurable computing units.
Optionally, the pooling calculation unit for performing the target region of interest pooling calculation is determined according to the indication information generated by the control module #1, the indication information including the configuration information of the selectors. The configuration information of the selectors indicates that the output end of the second selector is connected to the second input end of the second selector and the output end of the third selector is connected to the first input end of the third selector. Based on the connection relationship between the input ends of the plurality of selectors and the modules or computing units in this case (i.e., the second input end of the second selector is connected to the first storage module, and the first input end of the third selector is connected to the output end of the pooling computing unit), the at least one computing unit for executing the computing process includes the pooling computing unit. For example, as shown in fig. 5, the specific implementation steps of the current computing process may include: first, the pooling computing unit receives the first data (i.e., the feature map data) from the first storage module through the second selector; second, the pooling computing unit sends the calculated data to the third storage module through the third selector, and the third storage module caches the data calculated by the pooling computing unit.
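As a sketch of the fig. 5 datapath, only the pooling computing unit is involved. The rectangular region of interest and the fixed 2x2 output grid below are assumptions used to illustrate pooling over a target region of interest; they are not fixed by the text.

```python
import numpy as np

def roi_pooling_flow(fmap, roi, out_size=2):
    """Sketch of the fig. 5 datapath: the pooling computing unit alone, fed with
    feature map data from the first storage module through the second selector."""
    y0, x0, y1, x1 = roi                                  # region of interest
    region = fmap[y0:y1, x0:x1]
    row_bins = np.array_split(np.arange(region.shape[0]), out_size)
    col_bins = np.array_split(np.arange(region.shape[1]), out_size)
    pooled = np.array([[region[np.ix_(rb, cb)].max() for cb in col_bins]
                       for rb in row_bins])
    return pooled                           # cached by the third storage module

pooled = roi_pooling_flow(np.arange(64.0).reshape(8, 8), roi=(1, 1, 7, 7))
```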
For the description of the control module #1 and the third storage module, reference may be made to the related description in fig. 3, and in order to avoid repetition, a description thereof will be omitted.
Based on the above technical scheme, in the process of performing target detection, the first storage module may send multiple sets of first data (i.e., feature map data) to multiple reconfigurable computing units, where the multiple reconfigurable computing units may simultaneously compute the first data, so that the data processing apparatus may complete the process of pooling computation of the target region of interest. The efficiency of target detection can be further improved.
Fig. 6 is a schematic flowchart of a process of calculating the target area of interest calibration according to an embodiment of the present application, where the data processing apparatus shown in fig. 6 is the data processing apparatus 300 in fig. 3, that is, the data processing apparatus 300 is used to perform the target area of interest calibration calculation, and a detailed description may refer to the data processing apparatus 300, and in order to avoid repetition, a part of repeated description is omitted hereinafter.
The control module #1 generates indication information according to target region of interest calibration calculation to be performed, wherein the indication information is used for determining the following calculation units in the M reconfigurable calculation units in the calculation module: a multiply-accumulate computing unit, an activation quantization computing unit and a pooling computing unit.
The control module #1 generates a first memory address and a first read-write control signal. In this way, the first memory module can read M sets of first data (i.e., the feature map data) according to the first memory address and the first read/write control signal.
The control module #1 generates third data and transmits the third data to the M reconfigurable computing units through the first selector.
The third data is weight data generated by the control module #1, and the control module #1 may send the generated third data (i.e., the weight data) to the M multiply-accumulate calculating units through the first selector. In this way, the plurality of reconfigurable computing units can simultaneously calculate the third data, so that the efficiency of target detection can be further improved.
For a specific process of the control module #1 transmitting the third data to the M reconfigurable computing units through the first selector, reference may be made to the related description in fig. 3, and in order to avoid repetition, a description thereof will be omitted herein.
The first storage module reads M groups of first data according to the first storage address and the first read-write control signal, and sends the M groups of first data to M reconfigurable computing units. For example, the first memory module has N memory banks, and M memory banks of the N memory banks can read M sets of first data.
The first data is feature map data, and M banks in the first memory module may read M sets of first data (i.e., feature map data) and send the read M sets of first data (i.e., feature map data) to M multiply-accumulate computing units. In this way, the plurality of reconfigurable computing units can simultaneously compute the first data, so that the efficiency of target detection can be further improved.
The calculation module may perform the target region of interest calibration calculation by a multiply-accumulate calculation unit, an activation quantization calculation unit, and a pooling calculation unit of the M reconfigurable calculation units.
Illustratively, saying that the calculation module may perform the target region of interest calibration calculation by a multiply-accumulate calculation unit, an activation quantization calculation unit, and a pooling calculation unit of the M reconfigurable calculation units may be understood as follows: the calculation module 220 may perform the target region of interest calibration calculation by the M reconfigurable calculation units, and the multiply-accumulate calculation unit, the activation quantization calculation unit, and the pooling calculation unit participating in the calculation are the same within each of the M reconfigurable calculation units.
Optionally, the multiply-accumulate computing unit, the activation quantization computing unit, and the pooling computing unit for performing the target region of interest calibration computation are determined according to the indication information generated by the control module #1, the indication information including configuration information of the selectors. The configuration information of the selectors indicates that the output end of the first selector is connected to the first input end of the first selector, the output end of the second selector is connected to the first input end of the second selector, and the output end of the third selector is connected to the first input end of the third selector. Based on the connection relationship between the input ends of the plurality of selectors and the modules or computing units in this case (i.e., the first input end of the first selector is connected to the control module, the first input end of the second selector is connected to the output end of the activation quantization computing unit, and the first input end of the third selector is connected to the output end of the pooling computing unit), the at least one computing unit for executing the computing process includes the multiply-accumulate computing unit, the activation quantization computing unit, and the pooling computing unit. For example, as shown in fig. 6, the specific implementation steps of the current computing process may include: first, the multiply-accumulate computing unit receives first data (i.e., feature map data) from the first storage module and receives third data (i.e., weight data) from the control module through the first selector; second, the activation quantization computing unit receives the data calculated by the multiply-accumulate computing unit; third, the pooling computing unit receives the data calculated by the activation quantization computing unit through the second selector; and finally, the pooling computing unit sends the calculated data to the third storage module through the third selector, and the third storage module caches the data calculated by the pooling computing unit.
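The sketch below mirrors the fig. 6 datapath. Interpreting the third data as bilinear interpolation weights applied by the multiply-accumulate unit to four neighbouring feature map samples is an assumption (a common realisation of region-of-interest calibration); the activation quantization and pooling operators are the same assumed ones as in the convolution sketch.

```python
import numpy as np

def calibration_flow(fmap, sample_points, weights):
    """Sketch of the fig. 6 datapath: multiply-accumulate over feature map samples
    and control-module weights, then activation quantization, then pooling."""
    # Multiply-accumulate computing unit: weighted sum of the neighbours per sample.
    mac_out = np.array([np.dot(fmap[ys, xs], w)
                        for (ys, xs), w in zip(sample_points, weights)])
    # Activation quantization computing unit (assumed ReLU + 8-bit clamp).
    act_out = np.clip(np.rint(np.maximum(mac_out, 0)), 0, 255)
    # Pooling computing unit (assumed max over the calibrated samples).
    return act_out.max()                     # cached by the third storage module

fmap = np.arange(16.0).reshape(4, 4)
points = [([1, 1, 2, 2], [1, 2, 1, 2])]      # four neighbours of one sampling point
weights = [np.full(4, 0.25)]                 # bilinear weights from control module #1
print(calibration_flow(fmap, points, weights))
```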
For the description of the control module #2 and the third storage module, reference may be made to the related description in fig. 3, and in order to avoid repetition, a description thereof will be omitted.
Based on the above technical solution, in the process of performing object detection, the first storage module may send multiple sets of first data (i.e., feature map data) to the multiple reconfigurable computing units, and the control module #1 may also send third data (i.e., weight data) to the multiple reconfigurable computing units through the first selector, where the multiple reconfigurable computing units may calculate the first data and the third data at the same time, so that the data processing apparatus may complete the process of calibrating and computing the object region of interest. The efficiency of target detection can be further improved.
Fig. 7 is a schematic flowchart of an image scaling calculation process according to an embodiment of the present application, where the data processing apparatus shown in fig. 7 is the data processing apparatus 300 in fig. 3, that is, the data processing apparatus 300 is used to perform the image scaling calculation, and a detailed description may refer to the data processing apparatus 300, and in order to avoid repetition, a part of repeated description is omitted hereinafter.
The control module #1 generates indication information according to the image scaling calculation to be performed, wherein the indication information is used for determining a multiply-accumulate calculation unit in the M reconfigurable calculation units in the calculation module.
The control module #1 generates a first memory address and a first read-write control signal. In this way, the first memory module can read M sets of first data (i.e., the feature map data) according to the first memory address and the first read/write control signal.
The control module #1 generates third data and transmits the third data to the M reconfigurable computing units through the first selector.
The third data is weight data generated by the control module #1, and the control module #1 may multicast the generated third data (i.e., the weight data) to the M multiply-accumulate computing units through the first selector. In this way, the plurality of reconfigurable computing units can simultaneously calculate the third data, so that the efficiency of target detection can be further improved.
For a specific process of the control module #1 transmitting the third data to the M reconfigurable computing units through the first selector, reference may be made to the related description in fig. 3, and in order to avoid repetition, a description thereof will be omitted herein.
The first storage module reads M groups of first data according to the first storage address and the first read-write control signal, and sends the M groups of first data to M reconfigurable computing units. For example, the first memory module has N memory banks, and M memory banks of the N memory banks can read M sets of first data.
The first data is feature map data, and M banks in the first memory module may read M sets of first data (i.e., feature map data) and send the read M sets of first data (i.e., feature map data) to M multiply-accumulate computing units. In this way, the plurality of reconfigurable computing units can simultaneously compute the first data, so that the efficiency of target detection can be further improved.
The calculation module may perform the image scaling calculation by a multiply-accumulate calculation unit of the M reconfigurable calculation units.
Illustratively, saying that the computing module may perform the image scaling computation by a multiply-accumulate computing unit of the M reconfigurable computing units may be understood as follows: the computing module 220 may perform the image scaling computation by the M reconfigurable computing units, and the multiply-accumulate computing unit participating in the computation is the same within each of the M reconfigurable computing units.
Optionally, the multiply-accumulate computation unit for performing the image scaling computation is determined according to the indication information generated by the control module #1, the indication information including configuration information of the selectors. The configuration information of the selectors indicates that the output end of the first selector is connected to the first input end of the first selector and the output end of the third selector is connected to the second input end of the third selector. Based on the connection relationship between the input ends of the plurality of selectors and the modules or computation units in this case (i.e., the first input end of the first selector is connected to the control module, and the second input end of the third selector is connected to the output end of the multiply-accumulate computation unit), the at least one computation unit for executing the computation process includes the multiply-accumulate computation unit. For example, as shown in fig. 7, the specific implementation steps of the current computing process may include: first, the multiply-accumulate computing unit receives first data (i.e., feature map data) from the first storage module and receives third data (i.e., weight data) from the control module through the first selector; second, the multiply-accumulate computing unit sends the calculated data to the third storage module through the third selector, and the third storage module caches the data calculated by the multiply-accumulate computing unit.
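For the fig. 7 datapath, only the multiply-accumulate computing unit is used. Treating the third data as bilinear resampling weights is again an assumption; the sketch shows how a scaled image can be produced purely from weighted sums of feature map samples.

```python
import numpy as np

def image_scaling_flow(fmap, out_shape):
    """Sketch of the fig. 7 datapath: the multiply-accumulate computing unit applies
    interpolation weights (third data) to feature map samples (first data)."""
    in_h, in_w = fmap.shape
    out_h, out_w = out_shape
    out = np.empty(out_shape)
    for oy in range(out_h):
        for ox in range(out_w):
            y = oy * (in_h - 1) / max(out_h - 1, 1)   # sampling position in the input
            x = ox * (in_w - 1) / max(out_w - 1, 1)
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, in_h - 1), min(x0 + 1, in_w - 1)
            dy, dx = y - y0, x - x0
            # Weights generated by control module #1, applied as a multiply-accumulate.
            w = np.array([(1 - dy) * (1 - dx), (1 - dy) * dx, dy * (1 - dx), dy * dx])
            s = np.array([fmap[y0, x0], fmap[y0, x1], fmap[y1, x0], fmap[y1, x1]])
            out[oy, ox] = np.dot(w, s)
    return out                                # cached by the third storage module

scaled = image_scaling_flow(np.arange(16.0).reshape(4, 4), (8, 8))
```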
For the description of the control module #2 and the third storage module, reference may be made to the related description in fig. 3, and in order to avoid repetition, a description thereof will be omitted.
Based on the above technical solution, in the process of performing object detection, the first storage module may send multiple sets of first data (i.e., feature map data) to the multiple reconfigurable computing units, and the control module #1 may also send third data (i.e., weight data) to the multiple reconfigurable computing units through the first selector, where the multiple reconfigurable computing units may calculate the first data and the third data at the same time, so that the data processing apparatus may complete the image scaling calculation process. The efficiency of target detection can be further improved.
It will be appreciated that the examples of fig. 2-7 in the embodiments of the present application are merely for convenience of those skilled in the art to understand the embodiments of the present application, and are not intended to limit the embodiments of the present application to the specific scenarios illustrated. It will be apparent to those skilled in the art from the examples of fig. 2-7 that various equivalent modifications or variations may be made, and such modifications or variations are intended to be within the scope of the embodiments of the present application. For example, the embodiment of the present application mainly exemplifies that four calculation processes, namely, convolution calculation, target region pooling calculation, target region of interest calibration calculation, and image scaling calculation, can be completed by using the data processing apparatus, however, the present application is not limited thereto, and any calculation process that can be completed by using the data processing apparatus is included in the scope of the embodiment of the present application.
The data processing method according to the embodiment of the present application will be described below with reference to fig. 8, the data processing method shown in fig. 8 may be performed by the data processing apparatus shown in any one of fig. 3 to 7, and the detailed description may refer to the foregoing description related to the data processing apparatus, and overlapping descriptions will be omitted when describing the data processing method according to the embodiment of the present application.
The method 800 shown in fig. 8 includes step 810 and step 820. Steps 810 and 820 are described below.
810, the control module generates indication information according to the calculation process to be performed.
820, the computing module, which comprises M reconfigurable computing units, performs the computing process through at least one of the following computing units among the M reconfigurable computing units: a multiply-accumulate computing unit, an activation quantization computing unit, or a pooling computing unit, the at least one computing unit being determined according to the indication information.
Wherein M is a positive integer.
Optionally, as an implementation manner, the multiply-accumulate computing unit, the activation quantization computing unit and the pooling computing unit are sequentially connected, the computing module further comprises a plurality of selectors, the plurality of selectors comprises a first selector, a second selector and a third selector, wherein a first input end of the first selector is connected with the control module, a second input end of the first selector is connected with the second storage module, a first input end of the multiply-accumulate computing unit is connected with an output end of the first selector, and a second input end of the multiply-accumulate computing unit is connected with the first storage module; the first input end of the second selector is connected with the output end of the activation quantization calculation unit, the second input end of the second selector is connected with the first storage module, and the output end of the second selector is connected with the input end of the pooling calculation unit; the first input end of the third selector is connected with the output end of the pooling calculation unit, and the second input end of the third selector is connected with the output end of the multiply-accumulate calculation unit.
Optionally, as an implementation manner, the indication information includes configuration information of a selector, where the configuration information of the selector is used to indicate connection relationships between output ends and input ends of a plurality of selectors, respectively, and when a calculation process to be performed is convolution calculation, an output end of a first selector is connected to a second input end of the first selector, an output end of the second selector is connected to a first input end of the second selector, and an output end of a third selector is connected to a first input end of the third selector; when the calculation process to be executed is the target region pooling calculation, the output end of the second selector is connected with the second input end of the second selector, and the output end of the third selector is connected with the first input end of the third selector; when the calculation process to be executed is the target region of interest calibration calculation, the output end of the first selector is connected with the first input end of the first selector, the output end of the second selector is connected with the first input end of the second selector, and the output end of the third selector is connected with the first input end of the third selector; or when the calculation process to be performed is image scaling calculation, the output end of the first selector is connected with the first input end of the first selector, and the output end of the third selector is connected with the second input end of the third selector.
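A minimal sketch of step 810 under the configurations listed above: the control module maps the calculation process to be performed onto selector configuration information. The process names and the 0/1 encoding of input ends are assumptions for illustration only.

```python
# Illustrative sketch of step 810 (0 = first input end, 1 = second input end).
def generate_indication_information(process: str) -> dict:
    selector_configurations = {
        "convolution":     {"first": 1, "second": 0, "third": 0},
        "roi_pooling":     {"second": 1, "third": 0},
        "roi_calibration": {"first": 0, "second": 0, "third": 0},
        "image_scaling":   {"first": 0, "third": 1},
    }
    return {"selector_configuration": selector_configurations[process]}

print(generate_indication_information("convolution"))
```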
Optionally, as an implementation manner, the control module generates a first storage address and a first read-write control signal; the first storage module reads M groups of first data according to the first storage address and the first read-write control signal; the first storage module sends M groups of first data to M reconfigurable computing units.
Alternatively, as an embodiment, when the calculation process to be performed is a target region of interest pooling calculation, the first data is feature map data.
Optionally, as an implementation manner, the control module generates a second storage address and a second read-write control signal; the second storage module reads the second data according to the second storage address and the second read-write control signal, and the second storage module sends the second data to the M reconfigurable computing units through the first selector.
Alternatively, as an embodiment, when the calculation process to be performed is a convolution calculation, the first data is convolution kernel data and the second data is feature map data.
Optionally, as an embodiment, the control module generates the third data; the control module sends the third data to the M reconfigurable computing units through the first selector.
Optionally, as an embodiment, when the calculation process to be performed is a target region of interest calibration calculation or an image scaling calculation, the first data is feature map data, and the third data is weight data.
Based on the above technical solution, in the process of performing target detection, the data processing method provided by the embodiment of the present application may complete multiple computing processes (such as convolution computation, for example, target region pooling computation, for example, target region calibration computation, for example, image scaling computation), and the first storage module may send multiple sets of first data to multiple reconfigurable computing units, the second storage module may send second data to multiple reconfigurable computing units through the first selector, the control module may also send third data to multiple reconfigurable computing units through the first selector, and the multiple reconfigurable computing units may simultaneously compute the first data and/or the second data, or the multiple reconfigurable computing units may simultaneously compute the first data and/or the third data, so that the data processing apparatus may not only improve the efficiency of target detection, but also may have a better resource utilization ratio.
Fig. 9 is a schematic hardware structure of a data processing apparatus according to an embodiment of the present application. The data processing apparatus 900 shown in fig. 9 (the data processing apparatus 900 may be a computer device in particular) includes a memory 910, a processor 920, a communication interface 930, and a bus 940. Wherein the memory 910, the processor 920, and the communication interface 930 implement communication connection therebetween through the bus 940.
The memory 910 may be a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a random access memory (random access memory, RAM). The memory 910 may store a program, and the processor 920 is configured to perform various steps of the data processing method of the embodiment of the present application when the program stored in the memory 910 is executed by the processor 920. In particular, the processor 920 may perform the method 800 above.
The processor 920 may include a control module #1, a calculation module, a first storage module, and a second storage module of any one of fig. 3 to 7.
The processor 920 may employ a general-purpose central processing unit (central processing unit, CPU), microprocessor, application specific integrated circuit (application specific integrated circuit, ASIC), graphics processor (graphics processing unit, GPU) or one or more integrated circuits for executing associated programs to perform the data processing methods of the method embodiments of the present application.
The processor 920 may also be an integrated circuit chip with signal processing capabilities.
The processor 920 may also be a general purpose processor, a digital signal processor (digital signal processing, DSP), an application specific integrated circuit (ASIC), a field programmable gate array (field programmable gate array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, which may implement or perform the methods, steps, and logical blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read only memory, a programmable read only memory, an electrically erasable programmable memory, or registers. The storage medium is located in the memory 910, and the processor 920 reads the information in the memory 910 and, in combination with its hardware, performs the functions required to be performed by the modules included in the apparatus shown in any of fig. 3 to 7, or performs the data processing method according to the method embodiment of the present application.
Communication interface 930 enables communication between apparatus 900 and other devices or communication networks using a transceiver device such as, but not limited to, a transceiver. For example, the data processing model may be obtained through the communication interface 930.
A bus 940 may include a path to transfer information between various components of the device 900 (e.g., the memory 910, the processor 920, the communication interface 930).
It should be noted that although the apparatus 900 described above shows only a memory, a processor, a communication interface, in a specific implementation, those skilled in the art will appreciate that the apparatus 900 may also include other devices necessary to achieve proper operation. Also, as will be appreciated by those of skill in the art, the apparatus 900 may also include hardware devices that implement other additional functions, as desired. Furthermore, it will be appreciated by those skilled in the art that the apparatus 900 may also include only the devices necessary to implement the embodiments of the present application, and not necessarily all of the devices shown in FIG. 9.
The embodiment of the application also provides a computer readable storage medium storing program code for device execution, the program code including instructions for performing the data processing method in the embodiment of the application.
Embodiments of the present application also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the data processing method of the embodiments of the present application.
The embodiment of the application also provides a chip, which comprises a processor and a data interface, wherein the processor reads the instructions stored in the memory through the data interface, and executes the data processing method in the embodiment of the application.
Optionally, as an implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, where the processor is configured to execute the data processing method in the embodiment of the present application when the instructions are executed.
The embodiment of the application also provides a system on chip (SoC), which comprises the data processing device in the embodiment of the application.
The embodiment of the application also provides electronic equipment which comprises the data processing device.
It is to be appreciated that the processor in embodiments of the application may be a central processing unit (central processing unit, CPU), but may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate arrays (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It should also be appreciated that the memory in embodiments of the present application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. The volatile memory may be random access memory (random access memory, RAM) which acts as an external cache. By way of example but not limitation, many forms of random access memory (random access memory, RAM) are available, such as Static RAM (SRAM), dynamic Random Access Memory (DRAM), synchronous Dynamic Random Access Memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced Synchronous Dynamic Random Access Memory (ESDRAM), synchronous Link DRAM (SLDRAM), and direct memory bus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded or executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner or in a wireless manner (e.g., infrared, radio, or microwave). The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the names referred to in the embodiments of the present application, such as the names of the modules and the names of the selectors, should not be construed as limiting the scope of the embodiments of the present application.
It should also be understood that the term "and/or" describes only an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may represent three cases: A alone, both A and B, and B alone, where A and B may be singular or plural. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects, but may also indicate an "and/or" relationship, which can be understood from the context.
It should also be understood that "at least one" means one or more, and "a plurality of" means two or more. "At least one of" or a similar expression means any combination of the listed items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c may each be singular or plural.
It should further be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the system, apparatus, and modules described above; details are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into modules is merely a logical function division, and there may be other division manners in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, devices, or modules, and may be implemented in electrical, mechanical, or other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or substitution readily figured out by a person skilled in the art within the scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (21)

1. A data processing apparatus, comprising a control module and a computing module, the computing module comprising M reconfigurable computing units, wherein
the control module is configured to generate indication information according to a calculation process to be executed; and
the computing module is configured to perform the calculation process by using at least one of the following computing units among the M reconfigurable computing units: a multiply-accumulate computing unit, an activation quantization computing unit, or a pooling computing unit, wherein the at least one computing unit is determined according to the indication information, and M is a positive integer.
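By way of illustration only (this sketch is not part of the claimed subject matter), the Python model below shows how a control module could turn the calculation process to be executed into indication information and how a computing module could then apply only the indicated computing units. The unit identifiers, the mode-to-unit table, and the toy per-unit functions are assumptions introduced here; the table loosely follows the selector settings recited later in claim 3.

from dataclasses import dataclass

@dataclass
class Indication:
    units: tuple  # identifiers of the computing units taking part in the calculation

class ControlModule:
    # hypothetical mapping from the calculation to be executed to the active units
    TABLE = {
        "convolution":   ("mac", "act_quant", "pool"),
        "roi_pooling":   ("pool",),
        "roi_align":     ("mac", "act_quant", "pool"),
        "image_scaling": ("mac",),
    }

    def generate_indication(self, calculation: str) -> Indication:
        return Indication(units=self.TABLE[calculation])

class ComputingModule:
    def __init__(self, m: int):
        assert m > 0  # M is a positive integer
        self.m = m    # kept only to mirror the M reconfigurable computing units of the claim

    # toy stand-ins for the multiply-accumulate, activation quantization, and pooling units
    def _mac(self, xs):       return [2.0 * x + 1.0 for x in xs]
    def _act_quant(self, xs): return [max(0, round(x)) for x in xs]  # ReLU, then round
    def _pool(self, xs):      return [max(xs)] * len(xs)             # toy max pooling

    def run(self, indication: Indication, data):
        stages = {"mac": self._mac, "act_quant": self._act_quant, "pool": self._pool}
        for name in indication.units:  # apply only the units named by the indication information
            data = stages[name](data)
        return data

ctrl, comp = ControlModule(), ComputingModule(m=4)
print(comp.run(ctrl.generate_indication("convolution"), [0.3, -1.2, 2.5, 0.0]))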
2. The apparatus of claim 1, wherein the multiply-accumulate computing unit, the activation quantization computing unit, and the pooling computing unit are sequentially connected, and the computing module further comprises a plurality of selectors, the plurality of selectors comprising a first selector, a second selector, and a third selector, wherein
a first input end of the first selector is connected with the control module, a second input end of the first selector is connected with a second storage module, a first input end of the multiply-accumulate computing unit is connected with an output end of the first selector, and a second input end of the multiply-accumulate computing unit is connected with a first storage module;
a first input end of the second selector is connected with an output end of the activation quantization computing unit, a second input end of the second selector is connected with the first storage module, and an output end of the second selector is connected with an input end of the pooling computing unit; and
a first input end of the third selector is connected with an output end of the pooling computing unit, and a second input end of the third selector is connected with an output end of the multiply-accumulate computing unit.
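Again for illustration only: the routing recited in claim 2 can be read as three 2:1 multiplexers wrapped around the chained multiply-accumulate, activation quantization, and pooling units. The sketch below assumes selector value 0 selects the first input end and 1 selects the second input end; the stage functions passed in are placeholders.

def mux(sel: int, in0, in1):
    # 2:1 selector: sel picks which input end drives the output end (0 = first, 1 = second)
    return in0 if sel == 0 else in1

def datapath(sel1, sel2, sel3, ctrl_data, storage2_data, storage1_data,
             mac, act_quant, pool):
    # first selector: first input end = control module, second input end = second storage module
    mac_in = mux(sel1, ctrl_data, storage2_data)
    mac_out = mac(mac_in, storage1_data)       # second MAC input end = first storage module
    aq_out = act_quant(mac_out)
    # second selector: first input end = activation quantization output, second = first storage module
    pool_out = pool(mux(sel2, aq_out, storage1_data))
    # third selector: first input end = pooling output, second = multiply-accumulate output
    return mux(sel3, pool_out, mac_out)

# example: route the second storage module into the MAC and take the pooled result
out = datapath(sel1=1, sel2=0, sel3=0,
               ctrl_data=0.0, storage2_data=3.0, storage1_data=2.0,
               mac=lambda a, b: a * b,
               act_quant=lambda x: max(0.0, x),
               pool=lambda x: x)
print(out)  # 6.0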
3. The apparatus of claim 2, wherein the indication information comprises selector configuration information, and the selector configuration information is used to indicate connection relationships between the output ends and the input ends of the plurality of selectors, wherein
when the calculation process to be executed is a convolution calculation, the output end of the first selector is connected with the second input end of the first selector, the output end of the second selector is connected with the first input end of the second selector, and the output end of the third selector is connected with the first input end of the third selector;
when the calculation process to be executed is a target region of interest pooling calculation, the output end of the second selector is connected with the second input end of the second selector, and the output end of the third selector is connected with the first input end of the third selector;
when the calculation process to be executed is a target region of interest calibration calculation, the output end of the first selector is connected with the first input end of the first selector, the output end of the second selector is connected with the first input end of the second selector, and the output end of the third selector is connected with the first input end of the third selector; or
when the calculation process to be executed is an image scaling calculation, the output end of the first selector is connected with the first input end of the first selector, and the output end of the third selector is connected with the second input end of the third selector.
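As a companion to the datapath sketch above (illustration only), the selector settings of claim 3 can be transcribed into a per-calculation configuration table. Here 0 denotes the first input end, 1 the second input end, and None marks a selector whose setting the claim leaves unspecified for that calculation; the dictionary keys are hypothetical mode names.

SELECTOR_CONFIG = {
    #                  first selector   second selector   third selector
    "convolution":    {"sel1": 1,       "sel2": 0,        "sel3": 0},
    "roi_pooling":    {"sel1": None,    "sel2": 1,        "sel3": 0},
    "roi_align":      {"sel1": 0,       "sel2": 0,        "sel3": 0},   # "calibration" in the claim
    "image_scaling":  {"sel1": 0,       "sel2": None,     "sel3": 1},
}

Used with the datapath sketch above, SELECTOR_CONFIG["convolution"], for example, feeds the second storage module into the multiply-accumulate computing unit and takes the result from the pooling computing unit.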
4. The apparatus of claim 3, wherein
the control module is further configured to generate a first storage address and a first read-write control signal;
the first storage module is configured to read M groups of first data according to the first storage address and the first read-write control signal; and
the first storage module is further configured to send the M groups of first data to the M reconfigurable computing units.
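Illustration only: a minimal model of claim 4, in which the first storage module serves M groups of first data, one group per reconfigurable computing unit, in response to a storage address and a read-write control signal from the control module. The per-unit banking layout, and the use of a single value per group, are assumptions made here for brevity.

from typing import List

class FirstStorageModule:
    def __init__(self, banks: List[List[float]]):
        self.banks = banks  # one bank per reconfigurable computing unit (assumed layout)

    def read(self, address: int, rw_ctrl: str) -> List[float]:
        # returns M groups of first data, one per computing unit, for a read request
        assert rw_ctrl == "read"
        return [bank[address] for bank in self.banks]

storage1 = FirstStorageModule(banks=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # M = 3
print(storage1.read(address=1, rw_ctrl="read"))  # [2.0, 4.0, 6.0], one value per unit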
5. The apparatus of claim 4, wherein, when the calculation process to be executed is the target region of interest pooling calculation, the first data is feature map data.
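Illustration only: when feature map data is the first data, target region of interest pooling reduces each bin of the region to a single value. The 2x2 output grid, the bin partitioning, and the use of max as the reduction are assumptions for this sketch; the claim itself only fixes what the first data is.

def roi_max_pool(feature_map, roi, out_h, out_w):
    # feature_map: 2-D list; roi = (y0, x0, y1, x1) in feature-map coordinates (end-exclusive)
    y0, x0, y1, x1 = roi
    out = [[float("-inf")] * out_w for _ in range(out_h)]
    for y in range(y0, y1):
        for x in range(x0, x1):
            by = min((y - y0) * out_h // (y1 - y0), out_h - 1)  # output bin row
            bx = min((x - x0) * out_w // (x1 - x0), out_w - 1)  # output bin column
            out[by][bx] = max(out[by][bx], feature_map[y][x])
    return out

fm = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(roi_max_pool(fm, (0, 0, 4, 4), 2, 2))  # [[6, 8], [14, 16]]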
6. The apparatus of claim 4 or 5, wherein
the control module is further configured to generate a second storage address and a second read-write control signal;
the second storage module is configured to read second data according to the second storage address and the second read-write control signal; and
the second storage module is further configured to send the second data to the M reconfigurable computing units through the first selector.
7. The apparatus of claim 6, wherein
when the calculation process to be executed is convolution calculation, the first data is convolution kernel data, and the second data is feature map data.
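Illustration only: in the convolution configuration, the multiply-accumulate computing unit combines convolution kernel data (the first data, from the first storage module) with feature map data (the second data, from the second storage module via the first selector). A one-dimensional version of that multiply-accumulate is sketched below; the real operation would be multi-dimensional, and the data layout is assumed.

def mac_conv1d(feature_map, kernel):
    # each output value is a multiply-accumulate of kernel data against a feature-map window
    k = len(kernel)
    return [sum(feature_map[i + j] * kernel[j] for j in range(k))
            for i in range(len(feature_map) - k + 1)]

print(mac_conv1d([1.0, 2.0, 3.0, 4.0], [0.5, -0.5]))  # [-0.5, -0.5, -0.5]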
8. The apparatus of claim 4 or 5, wherein
the control module is further configured to generate third data; and
the control module is further configured to send the third data to the M reconfigurable computing units through the first selector.
9. The apparatus of claim 8, wherein
when the calculation process to be executed is the target region of interest calibration calculation or the image scaling calculation, the first data is feature map data, and the third data is weight data.
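Illustration only: in claims 8 and 9 the control module generates weight data (the third data) that the multiply-accumulate computing unit combines with feature map data (the first data) for target region of interest calibration or image scaling. One common way to realize such calculations is bilinear interpolation, where each output sample is a weighted sum of four neighbouring feature-map samples; treating the four coefficients as the weight data is an assumption made only for this sketch.

def bilinear_sample(feature_map, x: float, y: float) -> float:
    x0, y0 = int(x), int(y)
    dx, dy = x - x0, y - y0
    # four interpolation coefficients: the role assumed here for the weight data
    weights = [(1 - dx) * (1 - dy), dx * (1 - dy), (1 - dx) * dy, dx * dy]
    samples = [feature_map[y0][x0],     feature_map[y0][x0 + 1],
               feature_map[y0 + 1][x0], feature_map[y0 + 1][x0 + 1]]
    # multiply-accumulate of weight data with feature map data
    return sum(w * s for w, s in zip(weights, samples))

fm = [[0.0, 1.0], [2.0, 3.0]]
print(bilinear_sample(fm, 0.5, 0.5))  # 1.5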
10. A data processing method, comprising:
a control module generates indication information according to a calculation process to be executed; and
a computing module comprising M reconfigurable computing units performs the calculation process by using at least one of the following computing units among the M reconfigurable computing units: a multiply-accumulate computing unit, an activation quantization computing unit, or a pooling computing unit, wherein the at least one computing unit is determined according to the indication information, and M is a positive integer.
11. The method of claim 10, wherein the multiply-accumulate computing unit, the activation quantization computing unit, and the pooling computing unit are sequentially connected, and the computing module further comprises a plurality of selectors, the plurality of selectors comprising a first selector, a second selector, and a third selector, wherein
a first input end of the first selector is connected with the control module, a second input end of the first selector is connected with a second storage module, a first input end of the multiply-accumulate computing unit is connected with an output end of the first selector, and a second input end of the multiply-accumulate computing unit is connected with a first storage module;
a first input end of the second selector is connected with an output end of the activation quantization computing unit, a second input end of the second selector is connected with the first storage module, and an output end of the second selector is connected with an input end of the pooling computing unit; and
a first input end of the third selector is connected with an output end of the pooling computing unit, and a second input end of the third selector is connected with an output end of the multiply-accumulate computing unit.
12. The method of claim 11, wherein the indication information comprises selector configuration information, and the selector configuration information is used to indicate connection relationships between the output ends and the input ends of the plurality of selectors, wherein
when the calculation process to be executed is a convolution calculation, the output end of the first selector is connected with the second input end of the first selector, the output end of the second selector is connected with the first input end of the second selector, and the output end of the third selector is connected with the first input end of the third selector;
when the calculation process to be executed is a target region of interest pooling calculation, the output end of the second selector is connected with the second input end of the second selector, and the output end of the third selector is connected with the first input end of the third selector;
when the calculation process to be executed is a target region of interest calibration calculation, the output end of the first selector is connected with the first input end of the first selector, the output end of the second selector is connected with the first input end of the second selector, and the output end of the third selector is connected with the first input end of the third selector; or
when the calculation process to be executed is an image scaling calculation, the output end of the first selector is connected with the first input end of the first selector, and the output end of the third selector is connected with the second input end of the third selector.
13. The method of claim 12, wherein
the control module generates a first storage address and a first read-write control signal;
the first storage module reads M groups of first data according to the first storage address and the first read-write control signal; and
the first storage module sends the M groups of first data to the M reconfigurable computing units.
14. The method of claim 13, wherein, when the calculation process to be executed is the target region of interest pooling calculation, the first data is feature map data.
15. The method according to claim 13 or 14, wherein,
the control module generates a second storage address and a second read-write control signal;
the second storage module reads second data according to the second storage address and the second read-write control signal;
the second storage module sends the second data to the M reconfigurable computing units through the first selector.
16. The method of claim 15, wherein
when the calculation process to be executed is convolution calculation, the first data is convolution kernel data, and the second data is feature map data.
17. The method according to claim 13 or 14, wherein,
the control module generates third data;
the control module sends the third data to the M reconfigurable computing units through the first selector.
18. The method of claim 17, wherein
when the calculation process to be executed is the target region of interest calibration calculation or the image scaling calculation, the first data is feature map data, and the third data is weight data.
19. A chip, comprising a processor and a data interface, wherein the processor reads, through the data interface, instructions stored in a memory, to perform the method according to any one of claims 10 to 18.
20. A system on chip (SoC), comprising the data processing apparatus according to any one of claims 1 to 9.
21. An electronic device, comprising the data processing apparatus according to any one of claims 1 to 9.