WO2022160706A1 - Data processing method and apparatus, computer device, and storage medium - Google Patents

Publication number
WO2022160706A1
WO2022160706A1 PCT/CN2021/115799 CN2021115799W WO2022160706A1 WO 2022160706 A1 WO2022160706 A1 WO 2022160706A1 CN 2021115799 W CN2021115799 W CN 2021115799W WO 2022160706 A1 WO2022160706 A1 WO 2022160706A1
Authority
WO
WIPO (PCT)
Prior art keywords
multiplier
adder
data processing
group
array
Application number
PCT/CN2021/115799
Other languages
French (fr)
Chinese (zh)
Inventor
周军
周亮
常亮
吴飞
Original Assignee
成都商汤科技有限公司
电子科技大学
Application filed by 成都商汤科技有限公司, 电子科技大学
Publication of WO2022160706A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present disclosure relates to the field of computer technology, and in particular, to a data processing method, apparatus, computer device, and storage medium.
  • the convolutional neural network mainly relies on the multiplier-adder array for convolution processing.
  • the multiplier-adder array stores the image data to be processed in the data processing task in the corresponding register array.
  • the current data processing method has the problems of low utilization rate of the multiplier-adder array and waste of computing resources.
  • Embodiments of the present disclosure provide at least a data processing method, apparatus, computer device, and storage medium.
  • an embodiment of the present disclosure provides a data processing method, including: grouping a plurality of multiplier-adders in a multiplier-adder array based on an operation step size to obtain a plurality of multiplier-adder groups; and using each multiplier-adder group of the plurality of multiplier-adder groups to execute, in parallel, the data processing task corresponding to that multiplier-adder group.
  • the multiplier-adder array can process multiple data processing tasks at the same time, which improves the processing efficiency of the multiplier-adder array for the data processing tasks.
  • the multiplier-adder array is grouped based on the operation step size, so that a multiplier-adder whose result was originally invalid for one data processing task produces a valid result for another data processing task, which improves the utilization rate of the multiplier-adder array and reduces the waste of computing resources.
  • in the same row of the multiplier-adder array, the interval between two adjacent multiplier-adders of the same group is the same and non-zero; in the same column of the multiplier-adder array, the interval between two adjacent multiplier-adders of the same group is the same and non-zero.
  • this grouping of the multiplier-adder array ensures that each multiplier-adder group handles a different data processing task, so that the multiplier-adder array can process multiple data processing tasks at the same time, which improves the processing efficiency of the multiplier-adder array for data processing tasks.
  • grouping the plurality of multiplier-adders in the multiplier-adder array based on the operation step size includes: determining the number of multiplier-adder groups based on the operation step size; and grouping the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups.
  • this ensures that each multiplier-adder group of the multiplier-adder array is effective in processing its own data processing task, so that the multiplier-adder array can process multiple data processing tasks at the same time, improving the processing efficiency of the multiplier-adder array for data processing tasks.
  • grouping the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups includes: determining, from the multiplier-adder array, the first target multiplier-adder in each multiplier-adder group; and determining, from the multiplier-adder array, the other target multiplier-adders in each multiplier-adder group except the first target multiplier-adder, based on the position of the first target multiplier-adder in the multiplier-adder array, the operation step size, and the size of the multiplier-adder array.
  • once the position of the first target multiplier-adder of each multiplier-adder group in the multiplier-adder array is determined, the positions of the other target multiplier-adders in each group can be determined from the multiplier-adder array based on that position, which improves the efficiency of grouping the multiplier-adder array.
  • determining, from the multiplier-adder array, the other target multiplier-adders in each multiplier-adder group except the first target multiplier-adder includes: for each multiplier-adder in each multiplier-adder group other than the first target multiplier-adder, determining, based on the operation step size, a first positional relationship in the multiplier-adder array between the multiplier-adder and the previous adjacent multiplier-adder in the same row; determining, based on the operation step size and the number of columns of the multiplier-adder array, a second positional relationship in the multiplier-adder array between the multiplier-adder and the previous adjacent multiplier-adder in the same column; and determining the target position of the multiplier-adder in the multiplier-adder array based on the position of the first target multiplier-adder of the group, the first positional relationship, and the second positional relationship.
  • determining the first target multiplier-adder in each multiplier-adder group from the multiplier-adder array includes: determining a target matrix based on the operation step size and the multiplier-adder array; and determining the position of the first target multiplier-adder of each multiplier-adder group in the multiplier-adder array according to the matrix element values of the target matrix.
  • using each multiplier-adder group of the plurality of multiplier-adder groups to execute the corresponding data processing tasks in parallel includes: storing the image data to be processed corresponding to each multiplier-adder group in the register array corresponding to that group, according to the position of each target multiplier-adder of the group in the multiplier-adder array; for each data processing cycle of a plurality of data processing cycles, reading, from the register array corresponding to each multiplier-adder group, the image data to be processed corresponding to that group in the data processing cycle, and processing the read image data in parallel to obtain the data processing result of each group in that cycle; and completing the data processing task corresponding to each group according to the data processing results of the group in the data processing cycles.
  • the multiplier-adder array ensures that each multiplier-adder group can process the corresponding data processing task by reading corresponding operands in different data processing cycles, and ensures the validity of the processing result of the multiplier-adder array for the data processing task.
  • storing the image data to be processed corresponding to each multiplier-adder group in the register array corresponding to that group includes: for each multiplier-adder group, determining the position, in the corresponding register array, of the register that each target multiplier-adder of the group reads fixedly; and for each multiplier-adder group, storing the image data to be processed corresponding to the group in the register array corresponding to the group according to the position of each target multiplier-adder of the group in the multiplier-adder array, the position of the register read by each target multiplier-adder of the group, and the processing order of the operands contained in the image data to be processed during data processing. In each data processing cycle, the operand stored in the fixed read register of each
  • for each data processing cycle of the plurality of data processing cycles, reading, from the register array corresponding to each multiplier-adder group, the image data to be processed corresponding to that group in the data processing cycle, and processing the read image data in parallel to obtain the data processing result of each group in the cycle, includes: for the first data processing cycle in which the image data to be processed is processed, controlling each target multiplier-adder in each multiplier-adder group to read, from its fixed read register, the operand corresponding to that target multiplier-adder in the first data processing cycle as the first operand; determining the matrix elements of the matrix operand corresponding to each multiplier-adder group in the first data processing cycle as the second operand; and determining the product of the first operand and the second operand of each target multiplier-adder in the first data processing cycle; and, for a non-first data processing cycle in which the image data to be processed is processed, controlling the image data to be processed to move by a preset step size in the register array according to the preset data movement mode corresponding to the data processing cycle.
  • the operands are shifted in an orderly manner in the register array as the data processing cycle changes, which ensures that the corresponding multiplier-adders in the multiplier-adder array obtain valid data and ensures the validity of the processing results of the data processing tasks.
  • completing the data processing tasks corresponding to each multiplier-adder group according to the data processing results corresponding to each multiplier-adder group in each data processing cycle includes: for each target multiplier-adder in each multiplier-adder group, adding the products obtained by the target multiplier-adder in each data processing cycle to obtain a sum value; and completing the data processing task corresponding to each multiplier-adder group based on the sum values corresponding to the target multiplier-adders contained in that group.
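The accumulation described above can be sketched for a single target multiplier-adder as follows. This is a minimal illustration, not the disclosed hardware: the function name `accumulate_products` and the sample operand values are hypothetical, and the sketch assumes one first operand (read from the register) and one second operand (a matrix element) per data processing cycle.

```python
def accumulate_products(first_operands, second_operands):
    """Sum the per-cycle products of one target multiplier-adder.

    first_operands[k]  : operand read from the fixed read register in cycle k
    second_operands[k] : matrix element (e.g. convolution kernel value) in cycle k
    Both names are illustrative assumptions, not terms defined by the disclosure.
    """
    if len(first_operands) != len(second_operands):
        raise ValueError("one first operand and one second operand are needed per cycle")
    total = 0
    for register_value, matrix_element in zip(first_operands, second_operands):
        total += register_value * matrix_element  # one multiply-add per cycle
    return total


# Four cycles of made-up data: 1*1 + 2*0 + 3*(-1) + 4*2 = 6
pixels = [1, 2, 3, 4]
weights = [1, 0, -1, 2]
print(accumulate_products(pixels, weights))
```

The sum value obtained this way corresponds to one output element; a group's task result would collect one such value per target multiplier-adder in the group.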
  • the data processing tasks include: convolution processing tasks; images to be processed corresponding to the convolution processing tasks of different multiplier-adder groups are different.
  • the multiplier-adder array can process multiple images to be processed at the same time, which improves the processing efficiency of the multiplier-adder array for the images to be processed.
  • an embodiment of the present disclosure further provides a data processing apparatus, including: a controller; the controller is configured to: group a plurality of multiplier-adders in a multiplier-adder array based on an operation step size to obtain a plurality of multiplier-adder groups; and use each multiplier-adder group of the plurality of multiplier-adder groups to execute, in parallel, the data processing task corresponding to that multiplier-adder group.
  • in the same row of the multiplier-adder array, the interval between two adjacent multiplier-adders of the same group is the same and non-zero; in the same column of the multiplier-adder array, the interval between two adjacent multiplier-adders of the same group is the same and non-zero.
  • the controller is specifically configured to determine the number of multiplier-adder groups based on the operation step size, and to group the multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups.
  • the controller is specifically configured to: determine, from the multiplier-adder array, the first target multiplier-adder in each multiplier-adder group; and determine, from the multiplier-adder array, the other target multiplier-adders in each multiplier-adder group except the first target multiplier-adder, based on the position of the first target multiplier-adder in the multiplier-adder array, the operation step size, and the size of the multiplier-adder array.
  • when determining, from the multiplier-adder array, the other target multiplier-adders in each multiplier-adder group except the first target multiplier-adder, the controller is specifically configured to: for each multiplier-adder in each multiplier-adder group other than the first target multiplier-adder, determine, based on the operation step size, a first positional relationship in the multiplier-adder array between the multiplier-adder and the previous adjacent multiplier-adder in the same row; determine, based on the operation step size and the number of columns of the multiplier-adder array, a second positional relationship in the multiplier-adder array between the multiplier-adder and the previous adjacent multiplier-adder in the same column; and determine the target position of the multiplier-adder in the multiplier-adder array based on the position of the first target multiplier-adder of the group in the multiplier-adder array, the first positional relationship, and the second positional relationship.
  • when determining the first target multiplier-adder in each multiplier-adder group from the multiplier-adder array, the controller is specifically configured to: determine a target matrix based on the operation step size and the multiplier-adder array; and determine the position of the first target multiplier-adder of each multiplier-adder group in the multiplier-adder array according to the matrix element values of the target matrix.
  • when using each multiplier-adder group of the plurality of multiplier-adder groups to execute the corresponding data processing tasks in parallel, the controller is specifically configured to: store the image data to be processed corresponding to each multiplier-adder group in the register array corresponding to that group; for each data processing cycle of the plurality of data processing cycles, read, from the register array corresponding to each multiplier-adder group, the image data to be processed corresponding to that group in the data processing cycle; process the read image data in parallel to obtain the data processing result of each multiplier-adder group in the data processing cycle; and complete the data processing task corresponding to each multiplier-adder group according to the data processing results of the group in the data processing cycles.
  • the controller is specifically configured to: for each multiplier-adder group, determine the position, in the corresponding register array, of the register that each target multiplier-adder of the group reads fixedly; and for each multiplier-adder group, store the image data to be processed corresponding to the group in the register array corresponding to the group according to the position of each target multiplier-adder of the group in the multiplier-adder array, the position of the register read by each target multiplier-adder of the group, and the processing order of the operands contained in the image data to be processed during data processing.
  • the controller is specifically configured to: for the first data processing cycle in which the image data to be processed is processed, control each target multiplier-adder in each multiplier-adder group to read, from its fixed read register, the operand corresponding to that target multiplier-adder in the first data processing cycle as the first operand; determine the matrix elements of the matrix operand corresponding to each multiplier-adder group in the first data processing cycle as the second operand; and determine the product of the first operand and the second operand of each target multiplier-adder in the first data processing cycle; and, for a non-first data processing cycle, control the image data to be processed to move by a preset step size in the register array according to the preset data movement mode corresponding to the data processing cycle; control each target multiplier-adder in each multiplier-adder group to read, from its fixed read register, the operand of that target multiplier-adder in the non-first data processing cycle as the first operand; determine the matrix elements of the matrix operand corresponding to each multiplier-adder group in the data processing cycle as the second operand; and determine the product of the first operand and the second operand of each target multiplier-adder in that data processing cycle.
  • when completing the data processing tasks corresponding to each multiplier-adder group according to the data processing results corresponding to each multiplier-adder group in each data processing cycle, the controller is specifically configured to: for each target multiplier-adder in each multiplier-adder group, add the products obtained by the target multiplier-adder in each data processing cycle to obtain a sum value; and complete the data processing task corresponding to each multiplier-adder group based on the sum values corresponding to the target multiplier-adders contained in that group.
  • the data processing tasks include: convolution processing tasks; images to be processed corresponding to the convolution processing tasks of different multiplier-adder groups are different.
  • an optional implementation of the present disclosure further provides a computer device, including a controller and a memory, where the memory stores machine-readable instructions executable by the controller, and the controller is configured to execute the machine-readable instructions stored in the memory; when the machine-readable instructions are executed by the controller, the controller performs the steps of the first aspect above, or of any possible implementation of the first aspect.
  • an optional implementation of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when run, the computer program executes the steps of the first aspect above, or of any possible implementation of the first aspect.
  • FIG. 1 shows a flowchart of a data processing method provided by an embodiment of the present disclosure
  • FIG. 2 shows an example diagram of a multiplier-adder array provided by an embodiment of the present disclosure
  • FIG. 3A, FIG. 3B, and FIG. 3C show example diagrams of movement based on an operation step size provided by an embodiment of the present disclosure
  • FIG. 4 shows an example diagram of dividing a multiplier-adder array into four multiplier-adder groups provided by the present disclosure
  • FIG. 5 shows an example diagram of a matrix for determining the position of the first target multiplier-adder in each multiplier-adder group in the multiplier-adder array provided by an embodiment of the present disclosure
  • FIG. 6A and FIG. 6B show exemplary diagrams of a multiplier-adder array and a corresponding register array provided by an embodiment of the present disclosure
  • FIG. 7 shows an example diagram of the register array a after the image data to be processed is shifted to the left by one step in the register array as a whole in an embodiment of the present disclosure
  • FIG. 8 shows a schematic diagram of a data processing apparatus provided by an embodiment of the present disclosure
  • FIG. 9 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
  • convolutional neural networks mainly rely on multiplier-adder arrays for convolution processing.
  • the image data to be processed is stored in the register array connected to the multiplier-adder array; the image data stored in the register array is moved within the register array in different data processing cycles; in each data processing cycle, each multiplier-adder of the multiplier-adder array reads the operands of that cycle from the register (belonging to the register array) connected to it, and performs multiplication and/or addition operations.
  • the multiplier-adder array outputs the result of convolution processing on the image data to be processed.
  • in some cases, the processing results of some of the multiplier-adders in the multiplier-adder array are not needed in the result of processing the image data to be processed, so the current data processing method suffers from a low utilization rate of the multiplier-adder array and wasted computing resources.
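One situation in which results go unused is a convolution whose step size is larger than 1. The sketch below is only an illustration under stated assumptions: it uses a hypothetical plain-Python `conv2d` helper (not part of the disclosure) and assumes that the array would otherwise compute one result per stride-1 position. It checks that the stride-2 result equals the stride-1 result sampled at every second row and column, so only a quarter of the stride-1 results are needed.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D correlation of a list-of-lists image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(
                image[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Made-up 6x6 image and 3x3 kernel, for illustration only.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

dense = conv2d(image, kernel, stride=1)      # 4x4 results, one per stride-1 position
strided = conv2d(image, kernel, stride=2)    # 2x2 results actually required
subsampled = [row[::2] for row in dense[::2]]

assert strided == subsampled
useful_fraction = (len(strided) * len(strided[0])) / (len(dense) * len(dense[0]))
print(useful_fraction)
```

Under these assumptions, three out of four stride-1 positions produce values that the stride-2 task never uses, which is the idle capacity the grouping scheme aims to reuse.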
  • the present disclosure provides a data processing method, apparatus, computer equipment and storage medium.
  • by grouping the multiplier-adder array based on the operation step size, multiple multiplier-adder groups are obtained, so that different multiplier-adder groups in the multiplier-adder array process, in parallel, data processing tasks corresponding to different image data to be processed; that is, the same multiplier-adder array can process multiple image data to be processed at the same time, with each multiplier-adder group processing one image data to be processed.
  • the multiplier-adders that are not used in processing one image data to be processed are used to process other image data to be processed, which improves the utilization rate of the multiplier-adder array, reduces the waste of computing resources, and improves the processing efficiency of the multiplier-adder array for the image data to be processed.
  • the device includes, for example, a terminal device, a server, or another processing device, and the terminal device can be a user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, etc.
  • the data processing method may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • FIG. 1 is a flowchart of a data processing method provided by an embodiment of the present disclosure, the method includes steps S101-S102, wherein:
  • the multiplier-adder array is grouped based on the operation step size to obtain multiple multiplier-adder groups, and the multiple multiplier-adder groups execute their corresponding data processing tasks in parallel; because the data processing tasks of the multiplier-adder groups differ, the multiplier-adder array can process multiple data processing tasks at the same time, which improves the processing efficiency of the multiplier-adder array.
  • the multiplier-adders that are not used in the data processing of one image to be processed are used to process the data of other images to be processed, which improves the utilization rate of the multiplier-adder array and reduces the waste of computing resources.
  • the multiplier-adder array is a matrix array composed of a plurality of multiplier-adders.
  • FIG. 2 shows an example diagram of a multiplier-adder array provided by the present disclosure.
  • the multiplier-adder array includes 16 multiplier-adders arranged in 4 rows and 4 columns.
  • the matrix operand includes, for example, the convolution kernel when the image data to be processed is processed; the operation step size is, for example, the moving step size of the convolution kernel.
  • $S_x$ represents the number of pixels moved in the horizontal direction
  • $S_y$ represents the number of pixels moved in the vertical direction.
  • the number of multiplier-adder groups can be determined based on the operation step size, and the plurality of multiplier-adders in the multiplier-adder array can be grouped based on the number of multiplier-adder groups.
  • the relationship between the operation step size and the number GN of multiplier-adder groups is:
  • an embodiment of the present disclosure provides a specific method for grouping multiple multiplier-adders in a multiplier-adder array based on the number of multiplier-adder groups to obtain multiple multiplier-adder groups, including: determining, from the multiplier-adder array, the first target multiplier-adder in each multiplier-adder group; and determining, from the multiplier-adder array, the target multiplier-adders other than the first target multiplier-adder in each multiplier-adder group, based on the position of the first target multiplier-adder in the multiplier-adder array, the operation step size, and the size of the multiplier-adder array.
  • the size of the multiplier-adder array is fixed, whereas the size of the image data to be processed varies according to the actual situation.
  • when the image data to be processed is processed in parallel, the utilization rate of the multiplier-adder array may, in many cases, not reach 100%.
  • the size information of the multiplier-adder array actually used is first determined based on the operation step size and the size of the multiplier-adder array, where the size of the multiplier-adder array includes its number of rows and number of columns, and the size information of the multiplier-adder array actually used includes the number of rows and the number of columns actually used.
  • the relationship among the size information of the multiplier-adder array actually used, the operation step size, and the size of the multiplier-adder array is:
  • $A'_x = A_x - A_x \,\%\, S_x$;
  • $A'_y = A_y - A_y \,\%\, S_y$;
  • where $A_x$ is the number of columns of the multiplier-adder array, $A_y$ is the number of rows of the multiplier-adder array, $A'_x$ is the number of columns of the multiplier-adder array actually used, $A'_y$ is the number of rows of the multiplier-adder array actually used, and $\%$ is the remainder operation.
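The two remainder formulas above can be checked with a short sketch. The function name `effective_size` and the sample sizes are illustrative assumptions; only the arithmetic comes from the text.

```python
def effective_size(rows, cols, s_y, s_x):
    """Return (rows_used, cols_used) of the multiplier-adder array actually used.

    rows, cols : A_y and A_x, the physical array size
    s_y, s_x   : vertical and horizontal components of the operation step size
    """
    if s_x <= 0 or s_y <= 0:
        raise ValueError("step sizes must be positive")
    rows_used = rows - rows % s_y   # A'_y = A_y - A_y % S_y
    cols_used = cols - cols % s_x   # A'_x = A_x - A_x % S_x
    return rows_used, cols_used


print(effective_size(4, 4, 2, 2))   # 4x4 array, step size 2: the whole array is used
print(effective_size(5, 7, 2, 3))   # hypothetical 5x7 array: trimmed to 4 rows, 6 columns
```

Trimming to a multiple of the step size in each direction means every group ends up with the same number of multiplier-adders per row and per column.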
  • the first target multiplier-adder of each multiplier-adder group is then determined in the multiplier-adder array actually used.
  • the first target multiplier-adder of each multiplier-adder group can be determined as follows: determining a target matrix based on the operation step size and the multiplier-adder array; and determining, according to the matrix element values of the target matrix, the position of the first target multiplier-adder of each multiplier-adder group in the multiplier-adder array, where each matrix element value represents the first target multiplier-adder of one multiplier-adder group.
  • for example, when the size information of the multiplier-adder array actually used is 4 rows and 4 columns and the operation step size is 2, the number of multiplier-adder groups is 4, which means that the target matrix contains two rows and two columns, a total of 4 multiplier-adders.
  • the first multiplier-adder in the multiplier-adder array is used as the first multiplier-adder of the target matrix, that is, the first target multiplier-adder of the first multiplier-adder group; the other multiplier-adders in the target matrix, that is, the first target multiplier-adders of the other multiplier-adder groups, are determined based on the first multiplier-adder of the target matrix.
  • the positional arrangement number of the actually used multiplier-adder array is:
  • if the first target multiplier-adder of the first multiplier-adder group, that is, the first multiplier-adder of the target matrix, is at position 0, then the target matrix with two rows and two columns can be determined based on the multiplier-adder at position 0.
  • the corresponding position of the matrix in the actual multiplier-adder array is numbered as
  • the positions of the first target multiplier-adders of the other three multiplier-adder groups in the actually used multiplier-adder array are respectively 1, 4, and 5, as shown in the target matrix in FIG. 4 .
  • the target position of the first target multiplier-adder of each multiplier-adder group in the actually used multiplier-adder array can also be determined with reference to the formula corresponding to each matrix element in the matrix shown in FIG. 5. That is to say, each matrix element in the matrix shown in FIG. 5 represents the position of the first target multiplier-adder of one multiplier-adder group in the actually used multiplier-adder array.
  • where $A'_x$ is the number of columns of the multiplier-adder array actually used and $A'_y$ is the number of rows of the multiplier-adder array actually used:
  • $A'_x = A_x - A_x \,\%\, S_x$;
  • $A'_y = A_y - A_y \,\%\, S_y$,
  • $A_x$ is the number of columns of the multiplier-adder array, and $A_y$ is the number of rows of the multiplier-adder array; $S_x$ is the horizontal movement step of the operation step size, and $S_y$ is the vertical movement step of the operation step size.
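The worked example (first targets at positions 0, 1, 4, and 5 for a 4x4 array with step size 2) is consistent with the sketch below. Several points are assumptions made for illustration, since the matrices and formulas of the figures are not reproduced in this text: positions are numbered row by row starting from 0, the target matrix has $S_y$ rows and $S_x$ columns anchored at position 0, and the group count is therefore $S_x \cdot S_y$. The name `first_targets` is hypothetical.

```python
def first_targets(cols_used, s_y, s_x):
    """Positions of the first target multiplier-adder of every group.

    Assumes row-major position numbering (0 in the top-left corner) and a
    target matrix of s_y rows by s_x columns in the top-left corner of the
    array actually used. These conventions are inferred from the worked
    example, not quoted from the figures.
    """
    return [[r * cols_used + c for c in range(s_x)] for r in range(s_y)]


target_matrix = first_targets(cols_used=4, s_y=2, s_x=2)
print(target_matrix)                          # one entry per multiplier-adder group
print(sum(len(row) for row in target_matrix)) # number of groups under the assumption above
```

For the 4x4, step-size-2 case this yields the same four starting positions as the example in the text; for other sizes the result depends on the numbering assumption stated above.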
  • the first divider-adder in each multiplier-adder group can be determined based on the methods described in the following steps 1 to 3 Target multiplier-adders other than target multiplier-adders:
  • Step 1 For each multiplier-adder in each multiplier-adder group except the first target multiplier-adder, based on the operation step size, determine the previous multiplier-adder that is in the same row and adjacent to the multiplier-adder. The first positional relationship of the multiplier-adder in the multiplier-adder array; wherein, each multiplier-adder PO(i) except the first target multiplier-adder in the multiplier-adder group is in the same row of the multiplier-adder and The first positional relationship of the adjacent previous multiplier-adder PO(i-1) in the multiplier-adder array is, for example:
  • the four different colors represent four multiplier-adder groups: the first multiplier-adder group in black, the second in white Multiplier-adder group, third multiplier-adder group in light gray, and fourth multiplier-adder group in dark gray.
  • the first multiplier-adder group as an example, the first target multiplier-adder of the first multiplier-adder group is at position 0, then the position PO(A) of another multiplier-adder A in the same group in this row is :
  • the multiplier-accumulator in this row and the multiplier-accumulator at position 0 belong to the same group only has the multiplier-accumulator at position 2. adder.
  • Step 2: For each multiplier-adder in each multiplier-adder group other than the first target multiplier-adder, determine, based on the operation step size and the number of columns of the multiplier-adder array, the second positional relationship in the multiplier-adder array between that multiplier-adder and the previous adjacent multiplier-adder in the same column. The second positional relationship between a multiplier-adder PO(j) of the group, other than the first target multiplier-adder, and the previous adjacent multiplier-adder PO(j-1) in the same column is, for example:
  • in this column, the only multiplier-adder that belongs to the same group as the multiplier-adder at position 0 is the one at position 8.
  • Step 3: Based on the position of the first target multiplier-adder of the multiplier-adder group in the multiplier-adder array, the first positional relationship, and the second positional relationship, determine the target positions in the multiplier-adder array of the target multiplier-adders of the group other than the first target multiplier-adder.
  • using the first positional relationship and the second positional relationship, the target positions in the actually used multiplier-adder array of the target multiplier-adders of the group other than the first target multiplier-adder can be calculated. Specifically, once the target positions of the first target multiplier-adder in each row and the first target multiplier-adder in each column have been calculated, the target positions of the other target multiplier-adders in the group can be calculated from the first positional relationship or the second positional relationship.
  • the first target multiplier-adder of the first multiplier-adder group is at position 0 and the position PO(A) is 2; the position PO(E) of the next multiplier-adder E in the same column as multiplier-adder A is then:
  • the position PO(C) of the next multiplier-adder C in the same column and group as the multiplier-adder at position 0 is 8; the position PO(E) of the next multiplier-adder E in the same group as multiplier-adder C is then:
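Steps 1 to 3 can be sketched as follows. The formulas for the two positional relationships are not reproduced in this text, so the sketch assumes PO(i) = PO(i-1) + S_x within a row and PO(j) = PO(j-1) + S_y · A_x within a column, with positions numbered row by row; these assumptions reproduce the positions 2 and 8 quoted in the example. It further assumes that the first target multiplier-adders occupy the top-left S_y × S_x block of the array and that the array has at least S_x columns and S_y rows. The function name `group_positions` is illustrative.

```python
def group_positions(a_x, a_y, s_x, s_y):
    """Map each group's first position to the row-major positions of all its members."""
    used_x = a_x - a_x % s_x  # columns actually used (A'_x)
    used_y = a_y - a_y % s_y  # rows actually used (A'_y)
    groups = {}
    for r0 in range(s_y):      # assumed: first targets fill the top-left s_y x s_x block
        for c0 in range(s_x):
            first = r0 * a_x + c0
            groups[first] = [
                first + dc + dr * a_x          # steps of s_x along a row, s_y * a_x down a column
                for dr in range(0, used_y, s_y)
                for dc in range(0, used_x, s_x)
            ]
    return groups


for first, members in group_positions(4, 4, 2, 2).items():
    print(first, members)
```

For a 4 × 4 array with a step of 2 in both directions, the group starting at position 0 contains positions 0, 2, 8 and 10, which matches the worked example above.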
  • the present disclosure provides an example diagram in which the multiplier-adder array is divided into four multiplier-adder groups, represented by four different colors: the first multiplier-adder group in black, the second in white, the third in light gray, and the fourth in dark gray. In the same row of the multiplier-adder array, two adjacent multiplier-adders of the same group are separated by the same, non-zero number of multiplier-adders of other groups; likewise, in the same column of the multiplier-adder array, two adjacent multiplier-adders of the same group are separated by the same, non-zero number of multiplier-adders of other groups.
  • the images to be processed corresponding to the convolution processing tasks of different multiplier-adder groups are different, for example, each multiplier-adder group convolves different data matrices respectively.
  • when each multiplier-adder group of the multiple multiplier-adder groups is used to execute in parallel the data processing task corresponding to that group, the image data to be processed corresponding to each multiplier-adder group is stored in the register array corresponding to that group, according to the position in the multiplier-adder array of each target multiplier-adder of the group.
  • the image data to be processed includes, for example, at least one of the following: the original image to be processed; a sub-image corresponding to any color channel of the original image to be processed; a feature map obtained by performing feature extraction on the original image; a feature sub-map corresponding to at least one channel of the feature map obtained by performing feature extraction on the original image; image data obtained by performing data filling on the sub-image corresponding to at least one color channel of the original image; image data obtained by performing data filling on the feature sub-map corresponding to at least one channel.
  • each register in at least some of the registers stores the feature value of one feature point in the image data to be processed, also called the operand required by the multiplier-adder.
  • the multiplier-adder array includes four multiplier-adder groups, which correspond respectively to the four register arrays shown in FIG. 6B: the black multiplier-adder group corresponds to the black register array a, the white multiplier-adder group corresponds to the white register array b, the light gray multiplier-adder group corresponds to the light gray register array c, and the dark gray multiplier-adder group corresponds to the dark gray register array d.
  • the target multiplier-adder PE0 reads the feature value stored in register A0
  • the target multiplier-adder PE1 reads the feature value stored in register B0
  • the target multiplier-adder PE2 reads the feature value stored in register A2
  • the target multiplier-adder PE3 reads the feature value stored in register B2
  • the target multiplier-adder PE4 reads the feature value stored in register C0
  • the target multiplier-adder PE5 reads the feature value stored in register D0
  • the target multiplier-adder PE6 reads the feature value stored in register C2
  • the target multiplier-adder PE7 reads the feature value stored in register D2
  • the target multiplier-adder PE8 reads the feature value stored in register A8
  • the target multiplier-adder PE9 reads the feature value stored in register B8
  • the target multiplier-adder PE10 reads the feature value stored in register A10
  • the target multiplier-adder PE11 reads the feature value stored in register B10
  • the target multiplier-adder PE12 reads the feature value stored in register C8
  • the target multiplier-adder PE13 reads the feature value stored in register D8, and so on for the remaining target multiplier-adders.
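The pairing of target multiplier-adders and registers listed above follows a regular pattern, which the sketch below reproduces. It is an inference from the listed pairs rather than a rule stated in the text: it assumes a multiplier-adder array with 4 columns, a step of 2 in both directions, row-by-row PE numbering, and a register index equal to the PE's position minus the position of its group's first target multiplier-adder. The function name `register_for` is illustrative.

```python
def register_for(pe, a_x=4, s_x=2, s_y=2, names="ABCD"):
    """Return the name of the register that target multiplier-adder `pe` reads."""
    row, col = divmod(pe, a_x)
    r0, c0 = row % s_y, col % s_x   # offset of the group's first multiplier-adder
    first = r0 * a_x + c0           # row-major position of that first multiplier-adder
    group = r0 * s_x + c0           # 0..3, i.e. register arrays a..d
    return f"{names[group]}{pe - first}"


print([register_for(pe) for pe in range(14)])
```

Under these assumptions PE6 maps to C2 and PE13 maps to D8, in agreement with the list.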
  • according to the processing order of the operands contained in the image data to be processed, the image data to be processed corresponding to the multiplier-adder group is stored in the register array corresponding to that group, so that in each data processing cycle the operand stored in the register fixedly read by each target multiplier-adder corresponds to the matrix element of the matrix operand for that processing cycle.
  • the matrix operand includes, for example, the convolution kernel in the convolution calculation, that is, a data matrix. By way of example, a matrix operand with two rows and two columns provided by the present disclosure includes the matrix elements $W_0$, $W_1$, $W_2$, and $W_3$.
  • the number of operands contained in the image data to be processed corresponding to each multiplier-adder group should be consistent.
  • the image data to be processed corresponding to the first multiplier-adder group shown in FIG. 6A is:
  • after the to-be-processed image data corresponding to each multiplier-adder group is stored in the register array corresponding to that group, for each data processing cycle of the multiple data processing cycles, the image data to be processed corresponding to each multiplier-adder group in that cycle is read from the register array corresponding to the group, and the read image data is processed to obtain the data processing result of each multiplier-adder group in that data processing cycle.
  • for the first data processing cycle, each target multiplier-adder in each multiplier-adder group is controlled to read, from the register it fixedly reads, the operand corresponding to the first data processing cycle as the first operand; the matrix element of the matrix operand corresponding to the first data processing cycle of each multiplier-adder group is determined as the second operand; and the product of the first operand and the second operand of each target multiplier-adder in the first data processing cycle is determined.
  • the target multiplier-adder PE0 reads the operand a0 from the register A0 that is fixedly read in the corresponding register array
  • the target multiplier-adder PE1 reads the operand b0 in the register B0
  • the operands read by the other target multiplier-adders are analogous and will not be repeated here; assuming that the matrix operand is:
  • the matrix element corresponding to this data processing cycle is $W_0$; $W_0$ is taken as the second operand, $W_0$ * a0 is calculated, and the result is stored in a register.
  • for a non-first data processing cycle in which the image data to be processed is processed, the image data to be processed is controlled to move by a preset step size in the register array according to the preset data movement mode corresponding to that data processing cycle; each target multiplier-adder in each multiplier-adder group is controlled to read, from the register it fixedly reads, its operand for the non-first data processing cycle as the first operand; the matrix element of the matrix operand corresponding to that data processing cycle of each multiplier-adder group is determined as the second operand; and the product of the first operand and the second operand of each target multiplier-adder in that data processing cycle is determined.
  • the preset data movement mode is to move to the left, and the preset step size is 1, as shown in FIG. 7.
  • the matrix element corresponding to this data processing cycle is $W_1$; $W_1$ is taken as the second operand, $W_1$ * a1 is calculated, and the result is stored in a register.
  • the image data to be processed can be moved up by one step on the basis of the position shown in Figure 7.
  • the operand a5 is then stored in register A0, the matrix element corresponding to this data processing cycle is $W_2$, and PE0 can calculate $W_2$ * a5;
  • the image data to be processed can be moved to the right as a whole based on the movement of the third data processing cycle.
  • the operand a4 is stored in register A0 at this time, the matrix element corresponding to this data processing cycle is $W_3$, and PE0 can calculate $W_3$ * a4; the same is true for the other PEs, which will not be repeated here.
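The four data processing cycles described above can be simulated with a small sketch. It assumes a 4 × 4 register array holding a0 to a15 row by row, that each move is by one step, and that data pushed off one edge wraps around to the opposite edge (the text does not say what happens at the edges). The names `shift` and `run_cycles` and the numeric kernel values are illustrative.

```python
def shift(regs, direction):
    """Return a copy of the register array with its data moved one step."""
    rows, cols = len(regs), len(regs[0])
    # (dr, dc) is the offset of the source register relative to the destination.
    dr, dc = {"left": (0, 1), "right": (0, -1), "up": (1, 0), "down": (-1, 0)}[direction]
    return [[regs[(r + dr) % rows][(c + dc) % cols] for c in range(cols)]
            for r in range(rows)]


def run_cycles(data, weights, moves=("left", "up", "right")):
    """Accumulate weight * operand at every fixed register position, cycle by cycle."""
    regs = [row[:] for row in data]
    acc = [[0] * len(data[0]) for _ in data]
    for cycle, w in enumerate(weights):
        if cycle > 0:  # no data movement before the first cycle
            regs = shift(regs, moves[cycle - 1])
        for r, row in enumerate(regs):
            for c, operand in enumerate(row):
                acc[r][c] += w * operand
    return acc


data = [[4 * r + c for c in range(4)] for r in range(4)]  # a_i = i for a0..a15
acc = run_cycles(data, weights=[1, 2, 3, 4])              # stand-ins for W0..W3
print(acc[0][0])  # 1*a0 + 2*a1 + 3*a5 + 4*a4 = 33
```

The register at the top-left corner plays the role of A0: it sees a0, a1, a5 and a4 in turn, as in the walkthrough for PE0.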
  • the corresponding convolution kernels may be different or the same. For example, if two pieces of to-be-processed image data are different feature sub-maps of the same feature map, their convolution kernels are different; if they are image data at different positions of the same feature sub-map, their convolution kernels are the same.
  • the data processing task corresponding to each multiplier-adder group can be completed according to the data processing results corresponding to each multiplier-adder group in each data processing cycle.
  • for each target multiplier-adder in each multiplier-adder group, the products obtained by the target multiplier-adder in each data processing cycle are added to obtain a sum value;
  • based on the sum value corresponding to each target multiplier-adder contained in each multiplier-adder group, the data processing task corresponding to each multiplier-adder group is completed.
  • the products calculated by PE0 in the four data processing cycles are $W_0$ * a0, $W_1$ * a1, $W_2$ * a5, and $W_3$ * a4; the four results are added:
  • the obtained sum is the result value of PE0, which belongs to the processing result matrix of the data processing task corresponding to the first multiplier-adder group.
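Written out from the four products listed above, the result value of PE0 would be the following sum, where $a_0$, $a_1$, $a_5$, $a_4$ stand for the operands a0, a1, a5, a4 and the symbol $R_{\mathrm{PE0}}$ is introduced here only for illustration:

```latex
R_{\mathrm{PE0}} = W_0 \, a_0 + W_1 \, a_1 + W_2 \, a_5 + W_3 \, a_4
```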
  • the resulting numerical arrangement is:
  • the feature map includes 16 channels, and the feature sub-maps corresponding to 4 channels are processed each time; that is, the feature sub-maps corresponding to the 16 channels need to be divided into four groups, and one group of feature sub-maps is processed each time. If the four groups of feature sub-maps are group a, group b, group c, and group d, then after the four feature sub-maps in group a are processed, their results are kept for accumulation; after the four feature sub-maps in group b are processed, the four results corresponding to group b are accumulated with the results corresponding to group a.
  • the obtained 4 output results corresponding to the group a are: a1, a2, a3 and a4 respectively.
  • the obtained 4 output results corresponding to group b are: b1, b2, b3 and b4 respectively.
  • a1 + b1 = O1
  • a2 + b2 = O2
  • a3 + b3 = O3
  • a4 + b4 = O4
  • the obtained 4 output results corresponding to group c are c1, c2, c3, and c4, and the following are then executed: O1+c1, O2+c2, O3+c3, O4+c4; and so on, finally giving a1+b1+c1+d1, a2+b2+c2+d2, a3+b3+c3+d3, and a4+b4+c4+d4. These four results are then accumulated together to obtain the accumulated sum of the convolution results corresponding to the 16 channels.
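The group-by-group accumulation described above can be sketched as a running element-wise sum followed by a final total. The function name `accumulate_groups` and the numeric values are illustrative; each inner list stands for the four outputs of one group (a1 to a4, b1 to b4, and so on).

```python
def accumulate_groups(group_outputs):
    """Element-wise running sum over groups, then the total of the final sums."""
    running = [0] * len(group_outputs[0])
    for outputs in group_outputs:  # [a1..a4], then [b1..b4], ...
        running = [o + x for o, x in zip(running, outputs)]
    return running, sum(running)


per_slot, total = accumulate_groups([
    [1, 2, 3, 4],              # group a
    [10, 20, 30, 40],          # group b
    [100, 200, 300, 400],      # group c
    [1000, 2000, 3000, 4000],  # group d
])
print(per_slot, total)  # [1111, 2222, 3333, 4444] 11110
```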
  • the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
  • the embodiment of the present disclosure also provides a data processing apparatus corresponding to the data processing method. For its implementation, reference may be made to the implementation of the method; repeated descriptions are omitted.
  • the apparatus includes a controller 801 .
  • the controller 801 is configured to: group a plurality of multiplier-adders in the multiplier-adder array based on an operation step size to obtain a plurality of multiplier-adder groups; and use each multiplier-adder group of the plurality of multiplier-adder groups to execute in parallel the data processing task corresponding to that group.
  • in the same row of the multiplier-adder array, two adjacent multiplier-adders of the same group are separated by the same, non-zero number of multiplier-adders of other groups, and in the same column of the multiplier-adder array, two adjacent multiplier-adders of the same group are separated by the same, non-zero number of multiplier-adders of other groups.
  • when grouping a plurality of multiplier-adders in the multiplier-adder array based on an operation step size, the controller 801 is specifically configured to: determine the number of multiplier-adder groups based on the operation step size; and group the multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups.
  • the controller 801 is specifically configured to: determine the first target multiplier-adder of each multiplier-adder group from the multiplier-adder array; and, based on the position of the first target multiplier-adder in the multiplier-adder array, the operation step size, and the size of the multiplier-adder array, determine from the multiplier-adder array the target multiplier-adders of each multiplier-adder group other than the first target multiplier-adder.
  • when determining from the multiplier-adder array the target multiplier-adders of each multiplier-adder group other than the first target multiplier-adder, the controller 801 is specifically configured to: for each multiplier-adder in each multiplier-adder group other than the first target multiplier-adder, determine, based on the operation step size, the first positional relationship in the multiplier-adder array between that multiplier-adder and the previous adjacent multiplier-adder in the same row; determine, based on the operation step size and the number of columns of the multiplier-adder array, the second positional relationship in the multiplier-adder array between that multiplier-adder and the previous adjacent multiplier-adder in the same column; and, based on the position of the first target multiplier-adder of the multiplier-adder group in the multiplier-adder array, the first positional relationship, and the second positional relationship, determine the target position of the multiplier-adder in the multiplier-adder array.
  • when determining the first target multiplier-adder of each multiplier-adder group from the multiplier-adder array, the controller 801 is specifically configured to: determine a target matrix based on the operation step size and the multiplier-adder array; and determine, according to the matrix element values of the target matrix, the position in the multiplier-adder array of the first target multiplier-adder of each multiplier-adder group.
  • when using each multiplier-adder group of the plurality of multiplier-adder groups to execute in parallel the data processing task corresponding to that group, the controller 801 is specifically configured to: store the to-be-processed image data corresponding to each multiplier-adder group in the register array corresponding to that group, according to the position in the multiplier-adder array of each target multiplier-adder of the group; for each data processing cycle of the multiple data processing cycles, read from the register array corresponding to each multiplier-adder group the to-be-processed image data corresponding to that group in the data processing cycle, and process the read image data in parallel to obtain the data processing result of each multiplier-adder group in the data processing cycle; and complete the data processing task corresponding to each multiplier-adder group according to the data processing results corresponding to each group in each data processing cycle.
  • the controller 801 is specifically configured to: for each multiplier-adder group, determine the position, in the corresponding register array, of the register fixedly read by each target multiplier-adder of the group; and, for each multiplier-adder group, store the to-be-processed image data corresponding to the group in the register array corresponding to the group, according to the position of each target multiplier-adder of the group in the multiplier-adder array, the position of the register fixedly read by each target multiplier-adder, and the processing order of the operands contained in the to-be-processed image data during data processing, so that in each data processing cycle the operand stored in the register fixedly read by each target multiplier-adder corresponds to the matrix element of the matrix operand for that processing cycle.
  • the controller 801 is specifically configured to: for the first data processing cycle of processing the to-be-processed image data, control each target multiplier-adder in each multiplier-adder group to read, from the register it fixedly reads, the operand corresponding to the first data processing cycle as the first operand; determine the matrix element of the matrix operand corresponding to the first data processing cycle of each multiplier-adder group as the second operand; determine the product of the first operand and the second operand of each target multiplier-adder in the first data processing cycle; and, for a non-first data processing cycle of processing the to-be-processed image data, control the to-be-processed image data to move in the register array according to the preset data movement mode corresponding to that data processing cycle.
  • when completing the data processing task corresponding to each multiplier-adder group according to the data processing results corresponding to each group in each data processing cycle, the controller 801 is specifically configured to: for each target multiplier-adder in each multiplier-adder group, add the products obtained by the target multiplier-adder in each data processing cycle to obtain a sum value; and complete the data processing task corresponding to each multiplier-adder group based on the sum value corresponding to each target multiplier-adder contained in the group.
  • the data processing tasks include: convolution processing tasks; images to be processed corresponding to the convolution processing tasks of different multiplier-adder groups are different.
  • the image processing apparatus may include a chip, an AI chip, and the like.
  • An embodiment of the present disclosure further provides a computer device.
  • a schematic structural diagram of the computer device provided by the embodiment of the present disclosure includes a controller 910 and a memory 920 .
  • the memory 920 stores machine-readable instructions executable by the controller 910
  • the controller 910 is configured to execute the machine-readable instructions stored in the memory 920 .
  • the controller 910 performs the following steps: grouping the multiplier-adders in the multiplier-adder array based on the operation step size to obtain a plurality of multiplier-adder groups; and using each of the plurality of multiplier-adder groups to execute in parallel the data processing task corresponding to that group.
  • the above-mentioned memory 920 includes an internal memory 921 and an external memory 922; the internal memory 921 is used to temporarily store the operation data in the controller 910 and the data exchanged with the external memory 922, such as a hard disk.
  • the external memory 922 performs data exchange.
  • the computer device provided by the embodiment of the present disclosure may include a smart terminal such as a mobile phone, or may also be other devices, servers, etc. that have a camera and can perform image processing, which is not limited here.
  • Embodiments of the present disclosure further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the steps of the data processing method described in the foregoing method embodiments are executed.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • Embodiments of the present disclosure further provide a computer program product, where the computer program product carries program code, and the instructions included in the program code can be used to execute the steps of the data processing method described in the foregoing method embodiments. For details, refer to the foregoing method embodiments, which are not repeated here.
  • the above-mentioned computer program product can be specifically implemented by means of hardware, software or a combination thereof.
  • the computer program product is embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK).
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a processor-executable non-volatile computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, and other media that can store program code.

Abstract

The present disclosure provides a data processing method and apparatus, a computer device, and a storage medium. The method comprises: grouping a plurality of multipliers in a multiplier array on the basis of an operation step size to obtain a plurality of multiplier groups; and performing in parallel, by using all of the plurality of multiplier groups, data processing tasks corresponding to all the multiplier groups. The present disclosure enables a multiplier array to process all of a plurality of data processing tasks, thereby improving the processing efficiency of the multiplier array on the data processing tasks. In addition, the multiplier array is grouped on the basis of an operation step size to enable a multiplier, which outputs an invalid result of processing a certain data processing task, to output a valid result of processing another data processing task, thereby increasing the utilization rate of the multiplier array, and reducing the waste of computing resources.

Description

A data processing method, apparatus, computer device, and storage medium
Cross-Reference to Related Applications
The present disclosure claims priority to Chinese patent application No. 202110132573.X, filed on January 31, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a data processing method, apparatus, computer device, and storage medium.
Background
At present, convolutional neural networks mainly rely on multiplier-adder arrays for convolution processing. The multiplier-adder array stores the image data to be processed of a data processing task in a corresponding register array, and the image data to be processed moves within the register array across different data processing cycles. However, current data processing approaches suffer from low utilization of the multiplier-adder array and wasted computing resources.
Summary
Embodiments of the present disclosure provide at least a data processing method, apparatus, computer device, and storage medium.
In a first aspect, an embodiment of the present disclosure provides a data processing method, including: grouping a plurality of multiplier-adders in a multiplier-adder array based on an operation step size to obtain a plurality of multiplier-adder groups; and using each multiplier-adder group of the plurality of multiplier-adder groups to execute in parallel the data processing task corresponding to that group.
In this way, by grouping the multiplier-adder array, the array can process multiple data processing tasks at the same time, which improves its efficiency in processing data processing tasks. In addition, grouping the array based on the operation step size means that a multiplier-adder whose result would be invalid for one data processing task produces a valid result for another data processing task, which improves the utilization of the multiplier-adder array and reduces wasted computing resources.
In a possible implementation, in the same row of the multiplier-adder array, two adjacent multiplier-adders of the same group are separated by the same, non-zero number of multiplier-adders of other groups, and in the same column of the multiplier-adder array, two adjacent multiplier-adders of the same group are separated by the same, non-zero number of multiplier-adders of other groups.
In this way, the grouping of the multiplier-adder array ensures that each multiplier-adder group handles a different data processing task, so that the array can process multiple data processing tasks at the same time, improving its efficiency in processing data processing tasks.
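The spacing property can be checked mechanically. The sketch below assumes an interleaved labeling in which the group of the multiplier-adder at row r, column c is (r mod S_y) · S_x + (c mod S_x); this labeling is an assumption chosen to match the four-group example, and the function names are illustrative.

```python
def group_grid(a_x, a_y, s_x, s_y):
    """Label each array position with a group id (assumed interleaved layout)."""
    return [[(r % s_y) * s_x + (c % s_x) for c in range(a_x)] for r in range(a_y)]


def gaps_equal_and_nonzero(line):
    """True if, for every group on this row or column, consecutive members are
    separated by the same non-zero number of other-group elements."""
    last_seen, gap_of = {}, {}
    for idx, g in enumerate(line):
        if g in last_seen:
            gap = idx - last_seen[g] - 1
            if gap == 0 or gap_of.setdefault(g, gap) != gap:
                return False
        last_seen[g] = idx
    return True


grid = group_grid(4, 4, 2, 2)
rows_ok = all(gaps_equal_and_nonzero(row) for row in grid)
cols_ok = all(gaps_equal_and_nonzero(col) for col in zip(*grid))
print(rows_ok, cols_ok)  # True True
```

With a step of 1 in either direction the check fails, because neighbouring multiplier-adders would then belong to the same group.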
In a possible implementation, grouping the plurality of multiplier-adders in the multiplier-adder array based on the operation step size includes: determining the number of multiplier-adder groups based on the operation step size; and grouping the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups.
In this way, it is ensured that the result produced by each multiplier-adder group for its own data processing task is valid, so that the multiplier-adder array can process multiple data processing tasks at the same time, improving its efficiency in processing data processing tasks.
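How the number of groups follows from the operation step size is not spelled out in this passage. A minimal sketch, assuming the number of groups is S_x · S_y (which agrees with the four-group example for a step of 2 in both directions) and assuming that, without grouping, only one multiplier-adder in every S_x × S_y block would produce a valid result:

```python
def group_count(s_x: int, s_y: int) -> int:
    """Assumed number of multiplier-adder groups for a given operation step size."""
    return s_x * s_y


def ungrouped_utilization(s_x: int, s_y: int) -> float:
    """Assumed fraction of multiplier-adders with valid results when no grouping is used."""
    return 1 / (s_x * s_y)


print(group_count(2, 2), ungrouped_utilization(2, 2))  # 4 0.25
```

Both function names are illustrative, and the utilization figure is an inference from the motivation given above rather than a number stated in the disclosure.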
In a possible implementation, grouping the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups includes: determining the first target multiplier-adder of each multiplier-adder group from the multiplier-adder array; and, based on the position of the first target multiplier-adder in the multiplier-adder array, the operation step size, and the size of the multiplier-adder array, determining from the multiplier-adder array the target multiplier-adders of each multiplier-adder group other than the first target multiplier-adder.
In this way, once the position in the multiplier-adder array of the first target multiplier-adder of each multiplier-adder group has been determined, the positions of the other target multiplier-adders of each group can be determined from it, which improves the efficiency of grouping the multiplier-adder array.
In a possible implementation, determining from the multiplier-adder array, based on the position of the first target multiplier-adder in the multiplier-adder array, the operation step size, and the size of the multiplier-adder array, the target multiplier-adders of each multiplier-adder group other than the first target multiplier-adder includes: for each multiplier-adder in each multiplier-adder group other than the first target multiplier-adder, determining, based on the operation step size, the first positional relationship in the multiplier-adder array between that multiplier-adder and the previous adjacent multiplier-adder in the same row; determining, based on the operation step size and the number of columns of the multiplier-adder array, the second positional relationship in the multiplier-adder array between that multiplier-adder and the previous adjacent multiplier-adder in the same column; and, based on the position in the multiplier-adder array of the first target multiplier-adder of the group, the first positional relationship, and the second positional relationship, determining the target position of the multiplier-adder in the multiplier-adder array.
In a possible implementation, determining the first target multiplier-adder in each multiplier-adder group from the multiplier-adder array includes: determining a target matrix based on the operation step size and the multiplier-adder array; and determining, according to the matrix element values of the target matrix, the position in the multiplier-adder array of the first target multiplier-adder in each multiplier-adder group.
In a possible implementation, using each of the multiple multiplier-adder groups to execute in parallel the data processing task corresponding to that group includes: storing the image data to be processed corresponding to each multiplier-adder group into the register array corresponding to that group, according to the positions in the multiplier-adder array of the target multiplier-adders in that group; for each of multiple data processing cycles, reading from the register array corresponding to each multiplier-adder group the image data to be processed corresponding to that group for the cycle, and processing the read data in parallel to obtain the data processing result of each multiplier-adder group in that cycle; and completing the data processing task corresponding to each multiplier-adder group according to the data processing results of that group in each data processing cycle.
In this way, by reading the corresponding operands in different data processing cycles, the multiplier-adder array ensures that each multiplier-adder group can process its corresponding data processing task and that the array's processing results for those tasks are valid.
In a possible implementation, storing the image data to be processed corresponding to a multiplier-adder group into the register array corresponding to that group, according to the positions in the multiplier-adder array of the target multiplier-adders in each group, includes: for each multiplier-adder group, determining the position, in the corresponding register array, of the register that each target multiplier-adder of the group fixedly reads; and, for each multiplier-adder group, storing the image data to be processed corresponding to the group into the register array corresponding to the group according to the positions in the multiplier-adder array of the target multiplier-adders in the group, the positions of the registers fixedly read by those target multiplier-adders, and the order in which the operands contained in the image data are processed during data processing, such that, in each data processing cycle, the operand stored in the register fixedly read by each target multiplier-adder corresponds to the matrix element of the matrix operand for that cycle.
In a possible implementation, for each of the multiple data processing cycles, reading from the register array corresponding to each multiplier-adder group the image data to be processed for that cycle, and processing the read data in parallel to obtain each group's data processing result in that cycle, includes: for the first data processing cycle in which the image data to be processed is processed, controlling each target multiplier-adder in each group to read, from its fixedly read register, the operand corresponding to that target multiplier-adder in the first cycle as a first operand; determining the matrix element of the matrix operand corresponding to each group in the first cycle as a second operand; and determining, for each target multiplier-adder, the product of the first operand and the second operand in the first cycle; and, for each data processing cycle other than the first, controlling the image data to be processed to move in the register array by a preset step size, in the preset data movement manner corresponding to that cycle; controlling each target multiplier-adder in each group to read, from its fixedly read register, its operand for that cycle as the first operand; determining the matrix element of the matrix operand corresponding to each group in that cycle as the second operand; and determining, for each target multiplier-adder, the product of the first operand and the second operand in that cycle.
In this way, the preset step size and the preset data movement manner cause the operands to shift through the register array in an orderly fashion as the data processing cycles advance. This ensures that each multiplier-adder in the array obtains valid data and that the results of the data processing tasks are valid.
In a possible implementation, completing the data processing task corresponding to each multiplier-adder group according to the data processing results of that group in each data processing cycle includes: for each target multiplier-adder in each multiplier-adder group, adding the products obtained by that target multiplier-adder in the individual data processing cycles to obtain a sum value; and completing the data processing task corresponding to each multiplier-adder group based on the sum values of the target multiplier-adders contained in that group.
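The read, shift, and accumulate cycle described above can be illustrated with a deliberately simplified one-dimensional sketch. It assumes a single row of registers, a left shift of one position per cycle as the preset data movement, and one kernel element per cycle as the second operand; the function name `shift_accumulate` and the sample values are hypothetical and are not taken from the disclosure, which operates on two-dimensional arrays.

```python
def shift_accumulate(data, kernel):
    """Multiplier-adder i always reads register i; the data shifts left each cycle."""
    regs = list(data)
    n_macs = len(data) - len(kernel) + 1
    acc = [0] * n_macs                  # one running sum per multiplier-adder
    for cycle, w in enumerate(kernel):  # w is the second operand of this cycle
        if cycle > 0:
            regs = regs[1:] + [0]       # assumed movement: shift left by one step
        for i in range(n_macs):
            acc[i] += regs[i] * w       # first operand * second operand, accumulated
    return acc

print(shift_accumulate([1, 2, 3, 4, 5], [1, 0, -1]))  # [-2, -2, -2]
```

Under these assumptions each multiplier-adder ends up holding one output of a step-size-1 sliding window, which is why, for larger operation step sizes, only some of the sums are needed for a single image.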
In a possible implementation, the data processing task includes a convolution processing task, and the convolution processing tasks of different multiplier-adder groups correspond to different images to be processed.
In this way, the multiplier-adder array can process multiple images at the same time, which improves the efficiency with which the array processes the images to be processed.
In a second aspect, an embodiment of the present disclosure further provides a data processing apparatus including a controller. The controller is configured to: group multiple multiplier-adders in a multiplier-adder array based on an operation step size to obtain multiple multiplier-adder groups; and use each of the multiple multiplier-adder groups to execute in parallel the data processing task corresponding to that group.
In a possible implementation, within any row of the multiplier-adder array, every two adjacent multiplier-adders of the same group are separated by the same, non-zero number of multiplier-adders belonging to other groups; and within any column of the multiplier-adder array, every two adjacent multiplier-adders of the same group are likewise separated by the same, non-zero number of multiplier-adders belonging to other groups.
In a possible implementation, when grouping the multiple multiplier-adders in the multiplier-adder array based on the operation step size, the controller is specifically configured to: determine the number of multiplier-adder groups based on the operation step size; and group the multiple multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups.
In a possible implementation, when grouping the multiple multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups, the controller is specifically configured to: determine the first target multiplier-adder in each multiplier-adder group from the multiplier-adder array; and determine, from the multiplier-adder array, the other target multiplier-adders in each multiplier-adder group besides the first target multiplier-adder, based on the position of the first target multiplier-adder in the multiplier-adder array, the operation step size, and the size of the multiplier-adder array.
In a possible implementation, when determining, from the multiplier-adder array, the other target multiplier-adders in each multiplier-adder group besides the first target multiplier-adder, based on the position of the first target multiplier-adder in the multiplier-adder array, the operation step size, and the size of the multiplier-adder array, the controller is specifically configured to: for each multiplier-adder in each multiplier-adder group other than the first target multiplier-adder, determine, based on the operation step size, a first positional relationship in the multiplier-adder array between this multiplier-adder and the preceding adjacent multiplier-adder in the same row; determine, based on the operation step size and the number of columns of the multiplier-adder array, a second positional relationship in the multiplier-adder array between this multiplier-adder and the preceding adjacent multiplier-adder in the same column; and determine the target position of this multiplier-adder in the multiplier-adder array based on the position of the first target multiplier-adder of the group in the array, the first positional relationship, and the second positional relationship.
In a possible implementation, when determining the first target multiplier-adder in each multiplier-adder group from the multiplier-adder array, the controller is specifically configured to: determine a target matrix based on the operation step size and the multiplier-adder array; and determine, according to the matrix element values of the target matrix, the position in the multiplier-adder array of the first target multiplier-adder in each multiplier-adder group.
In a possible implementation, when using each of the multiple multiplier-adder groups to execute in parallel the data processing task corresponding to that group, the controller is specifically configured to: store the image data to be processed corresponding to each multiplier-adder group into the register array corresponding to that group, according to the positions in the multiplier-adder array of the target multiplier-adders in that group; for each of multiple data processing cycles, read from the register array corresponding to each multiplier-adder group the image data to be processed corresponding to that group for the cycle, and process the read data in parallel to obtain the data processing result of each multiplier-adder group in that cycle; and complete the data processing task corresponding to each multiplier-adder group according to the data processing results of that group in each data processing cycle.
In a possible implementation, when storing the image data to be processed corresponding to a multiplier-adder group into the register array corresponding to that group, according to the positions in the multiplier-adder array of the target multiplier-adders in each group, the controller is specifically configured to: for each multiplier-adder group, determine the position, in the corresponding register array, of the register that each target multiplier-adder of the group fixedly reads; and, for each multiplier-adder group, store the image data to be processed corresponding to the group into the register array corresponding to the group according to the positions in the multiplier-adder array of the target multiplier-adders in the group, the positions of the registers fixedly read by those target multiplier-adders, and the order in which the operands contained in the image data are processed during data processing, such that, in each data processing cycle, the operand stored in the register fixedly read by each target multiplier-adder corresponds to the matrix element of the matrix operand for that cycle.
In a possible implementation, when, for each of the multiple data processing cycles, reading from the register array corresponding to each multiplier-adder group the image data to be processed for that cycle, and processing the read data in parallel to obtain each group's data processing result in that cycle, the controller is specifically configured to: for the first data processing cycle in which the image data to be processed is processed, control each target multiplier-adder in each group to read, from its fixedly read register, the operand corresponding to that target multiplier-adder in the first cycle as a first operand; determine the matrix element of the matrix operand corresponding to each group in the first cycle as a second operand; and determine, for each target multiplier-adder, the product of the first operand and the second operand in the first cycle; and, for each data processing cycle other than the first, control the image data to be processed to move in the register array by a preset step size, in the preset data movement manner corresponding to that cycle; control each target multiplier-adder in each group to read, from its fixedly read register, its operand for that cycle as the first operand; determine the matrix element of the matrix operand corresponding to each group in that cycle as the second operand; and determine, for each target multiplier-adder, the product of the first operand and the second operand in that cycle.
In a possible implementation, when completing the data processing task corresponding to each multiplier-adder group according to the data processing results of that group in each data processing cycle, the controller is specifically configured to: for each target multiplier-adder in each multiplier-adder group, add the products obtained by that target multiplier-adder in the individual data processing cycles to obtain a sum value; and complete the data processing task corresponding to each multiplier-adder group based on the sum values of the target multiplier-adders contained in that group.
In a possible implementation, the data processing task includes a convolution processing task, and the convolution processing tasks of different multiplier-adder groups correspond to different images to be processed.
In a third aspect, an optional implementation of the present disclosure further provides a computer device including a controller and a memory. The memory stores machine-readable instructions executable by the controller, and the controller is configured to execute the machine-readable instructions stored in the memory. When executed by the controller, the machine-readable instructions perform the steps of the first aspect or of any possible implementation of the first aspect.
In a fourth aspect, an optional implementation of the present disclosure further provides a computer-readable storage medium on which a computer program is stored. When run, the computer program performs the steps of the first aspect or of any possible implementation of the first aspect.
For the effects of the above data processing apparatus, computer device, and computer-readable storage medium, refer to the description of the above data processing method; they are not repeated here.
To make the above objects, features, and advantages of the present disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. These drawings illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings show only some embodiments of the present disclosure and therefore should not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from them without creative effort.
FIG. 1 shows a flowchart of a data processing method provided by an embodiment of the present disclosure;
FIG. 2 shows an example of a multiplier-adder array provided by an embodiment of the present disclosure;
FIG. 3A, FIG. 3B, and FIG. 3C show an example of movement based on an operation step size provided by an embodiment of the present disclosure;
FIG. 4 shows an example, provided by the present disclosure, of dividing a multiplier-adder array into four multiplier-adder groups;
FIG. 5 shows an example of a matrix, provided by an embodiment of the present disclosure, for determining the position in the multiplier-adder array of the first target multiplier-adder in each multiplier-adder group;
FIG. 6A and FIG. 6B show examples of a multiplier-adder array and the corresponding register array provided by an embodiment of the present disclosure;
FIG. 7 shows an example of register array a after the image data to be processed has been shifted left by one step as a whole in the register array, in an embodiment of the present disclosure;
FIG. 8 shows a schematic diagram of a data processing apparatus provided by an embodiment of the present disclosure;
FIG. 9 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments generally described and illustrated herein may be arranged and designed in a variety of different configurations. The following detailed description is therefore not intended to limit the scope of the disclosure as claimed, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
Research has found that convolutional neural networks rely mainly on multiplier-adder arrays to perform convolution. During convolution, the image data to be processed is stored in a register array connected to the multiplier-adder array, and the stored data moves through the register array from one data processing cycle to the next. In each data processing cycle, each multiplier-adder reads the operand for that cycle from the register it is connected to (which belongs to the register array) and performs a multiplication and/or addition. After multiple data processing cycles, the multiplier-adder array outputs the result of convolving the image data to be processed. When the operation step size is greater than 1, the results computed by some of the multiplier-adders in the array are not needed in the result of processing the image data. This way of processing data therefore suffers from low utilization of the multiplier-adder array and wasted computing resources.
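The scale of this waste can be estimated with a small count. The sketch below is illustrative only: it assumes that each multiplier-adder in a `rows` × `cols` array computes one step-size-1 output position, so that only every `s_x`-th column and every `s_y`-th row yields a result that is needed. The function name `useful_fraction` is hypothetical and does not appear in the disclosure.

```python
from math import ceil

def useful_fraction(rows, cols, s_x, s_y):
    """Fraction of multiplier-adders whose result is needed for one image."""
    needed = ceil(rows / s_y) * ceil(cols / s_x)
    return needed / (rows * cols)

print(useful_fraction(4, 4, 1, 1))  # 1.0
print(useful_fraction(4, 4, 2, 2))  # 0.25
```

Under these assumptions, a 4 × 4 array with an operation step size of 2 in both directions would use only a quarter of its multiplier-adders for a single image.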
Based on the above research, the present disclosure provides a data processing method and apparatus, a computer device, and a storage medium. The multiplier-adder array is grouped based on the operation step size to obtain multiple multiplier-adder groups, and the different groups process, in parallel, the data processing tasks corresponding to different image data to be processed. In other words, the same multiplier-adder array can process multiple sets of image data at the same time, with each group processing one of them. Multiplier-adders that would be unused while processing one set of image data are thus used to process other image data, which improves the utilization of the multiplier-adder array, reduces the waste of computing resources, and improves the efficiency with which the array processes image data.
The defects of the above solutions were identified by the inventors through practice and careful study. Therefore, both the process of discovering these problems and the solutions proposed below in the present disclosure should be regarded as contributions made by the inventors in the course of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
To facilitate understanding of this embodiment, a data processing method disclosed in an embodiment of the present disclosure is first described in detail. The method is generally executed by a computer device with a certain computing capability, for example a terminal device, a server, or another processing device. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. In some possible implementations, the data processing method may be implemented by a processor invoking computer-readable instructions stored in a memory.
The data processing method provided by the embodiments of the present disclosure is described below.
FIG. 1 is a flowchart of the data processing method provided by an embodiment of the present disclosure. The method includes steps S101 to S102:
S101: grouping multiple multiplier-adders in a multiplier-adder array based on an operation step size to obtain multiple multiplier-adder groups;
S102: using each of the multiple multiplier-adder groups to execute in parallel the data processing task corresponding to that group.
The present disclosure groups the multiplier-adder array based on the operation step size to obtain multiple multiplier-adder groups, and has these groups execute their respective data processing tasks in parallel. Because each group has a different data processing task, the multiplier-adder array can process multiple tasks at the same time, which improves its processing efficiency. In addition, multiplier-adders that are not used while processing one image are used to process the data of other images, which improves the utilization of the multiplier-adder array and reduces the waste of computing resources.
Steps S101 to S102 are described in detail below.
Regarding S101, the multiplier-adder array is a matrix array composed of multiple multiplier-adders. For example, FIG. 2 shows a multiplier-adder array provided by the present disclosure that contains 16 multiplier-adders arranged in 4 rows and 4 columns. The matrix operand includes, for example, the convolution kernel used when processing the image data to be processed, and the operation step size is, for example, the step size by which the convolution kernel moves. For example, the convolution kernel in FIG. 3A, FIG. 3B, and FIG. 3C moves with a step size of 2, that is, $S_x = 2$ and $S_y = 2$: the kernel moves from target position one shown in FIG. 3A to target position two shown in FIG. 3B, and then from target position two to target position three shown in FIG. 3C. It moves two pixels at a time horizontally and two pixels at a time vertically. Here, $S_x$ is the number of pixels moved in the horizontal direction and $S_y$ is the number of pixels moved in the vertical direction.
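The kernel movement described above can be enumerated in a few lines. This is a minimal sketch that assumes no padding; the 6 × 6 image and 2 × 2 kernel sizes are illustrative values, not dimensions taken from FIG. 3A to FIG. 3C, and the function name `window_origins` is hypothetical.

```python
def window_origins(height, width, k_h, k_w, s_x, s_y):
    """Top-left (row, col) of every kernel placement, assuming no padding."""
    return [(r, c)
            for r in range(0, height - k_h + 1, s_y)   # vertical steps of s_y pixels
            for c in range(0, width - k_w + 1, s_x)]   # horizontal steps of s_x pixels

origins = window_origins(6, 6, 2, 2, s_x=2, s_y=2)
print(origins[:3])   # [(0, 0), (0, 2), (0, 4)]
print(len(origins))  # 9
```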
When grouping the multiplier-adders in the multiplier-adder array, for example, the number of multiplier-adder groups can be determined based on the operation step size, and the multiplier-adders in the array can then be grouped based on that number.
In a specific implementation, the relationship between the operation step size and the number GN of multiplier-adder groups is:
$$GN = S_x \times S_y$$
For example, when the operation step size is 2, $S_x = 2$ and $S_y = 2$, so the number of multiplier-adder groups GN is 4.
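The relationship above can be sketched in a few lines of Python. The function name `group_count` is an illustrative assumption, not a name used in the disclosure.

```python
def group_count(s_x: int, s_y: int) -> int:
    """Number of multiplier-adder groups GN for horizontal/vertical step sizes."""
    if s_x < 1 or s_y < 1:
        raise ValueError("step sizes must be positive integers")
    return s_x * s_y


print(group_count(2, 2))  # 4
```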
In a specific implementation, the embodiments of the present disclosure provide a specific method for grouping the multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups to obtain multiple multiplier-adder groups. The method includes: determining, from the multiplier-adder array, the first target multiplier-adder of each multiplier-adder group; and determining, from the multiplier-adder array, the other target multiplier-adders of each group based on the position of the first target multiplier-adder in the array, the operation step size, and the size of the array.
In a specific implementation, the size of the multiplier-adder array is fixed, whereas the size of the image data to be processed varies with the actual situation. Therefore, even when the data processing method provided by the embodiments of the present disclosure is used to process multiple pieces of image data in parallel, the utilization of the multiplier-adder array may in many cases fall short of 100%. The embodiments of the present disclosure therefore first determine the size information of the actually used multiplier-adder array based on the operation step size and the size of the multiplier-adder array. The size of the multiplier-adder array comprises its number of rows and its number of columns, and the size information of the actually used array comprises the number of rows and the number of columns actually used. The relationship between them is:
$$A'_x = A_x - A_x \% S_x$$
$$A'_y = A_y - A_y \% S_y$$
Here $A_x$ is the number of columns of the multiplier-adder array and $A_y$ is its number of rows; $A'_x$ is the number of columns actually used and $A'_y$ is the number of rows actually used; $\%$ is the remainder operation. For example, when the operation step size is 2, $S_x = 2$ and $S_y = 2$, and the size of the multiplier-adder array is $A_x = 5$, $A_y = 5$. The number of columns actually used is then $A'_x = A_x - A_x \% S_x = 5 - 5 \% 2 = 4$, and the number of rows actually used is $A'_y = A_y - A_y \% S_y = 5 - 5 \% 2 = 4$.
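As a minimal sketch of the two formulas above, the following Python function trims the array to the largest size divisible by the step sizes. The name `usable_size` and the argument order are assumptions made for illustration.

```python
def usable_size(a_x: int, a_y: int, s_x: int, s_y: int) -> tuple[int, int]:
    """Return (A'_x, A'_y): the columns and rows of the array actually used."""
    return a_x - a_x % s_x, a_y - a_y % s_y


print(usable_size(5, 5, 2, 2))  # (4, 4), the example from the text
print(usable_size(7, 5, 3, 2))  # (6, 4)
```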
The first target multiplier-adder of each multiplier-adder group is then determined in the actually used multiplier-adder array.
In a specific implementation, the first target multiplier-adder of each group can be determined, for example, as follows: a target matrix is determined based on the operation step size and the multiplier-adder array; the position in the array of the first target multiplier-adder of each group is then determined from the matrix element values of the target matrix, where each matrix element value denotes the first target multiplier-adder of one multiplier-adder group.
For example, when the actually used multiplier-adder array has 4 rows and 4 columns and the operation step size is 2, the number of multiplier-adder groups is 4, so the target matrix contains 4 multiplier-adders in two rows and two columns. The first multiplier-adder in the multiplier-adder array is taken as the first multiplier-adder of the target matrix, that is, the first target multiplier-adder of the first multiplier-adder group. The other multiplier-adders of the target matrix, that is, the first target multiplier-adders of the other groups, are determined from it. For example, the positions of the actually used multiplier-adder array are numbered row by row from 0 to 15:
Figure PCTCN2021115799-appb-000001
The first target multiplier-adder of the first group, that is, the first multiplier-adder of the target matrix, is at position 0. A target matrix with two rows and two columns can then be determined from the multiplier-adder at position 0; the position numbers that the target matrix occupies in the actually used array are:
Figure PCTCN2021115799-appb-000002
The first target multiplier-adders of the other three groups are therefore at positions 1, 4, and 5 of the actually used array, as shown by the target matrix in FIG. 4.
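The example above can be sketched as follows, assuming row-major position numbering and assuming that the first targets occupy the top-left block of $S_y$ rows and $S_x$ columns. Both assumptions match the worked example (positions 0, 1, 4, 5), but the general formula and the name `first_targets` are illustrative and not taken from the disclosure.

```python
def first_targets(used_cols: int, s_x: int, s_y: int) -> list[list[int]]:
    """Positions of the first target multiplier-adder of each group.

    Assumes row-major numbering: position = row * used_cols + column.
    """
    return [[r * used_cols + c for c in range(s_x)] for r in range(s_y)]


print(first_targets(4, 2, 2))  # [[0, 1], [4, 5]]
```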
For example, the target position of the first target multiplier-adder of each group in the actually used array can also be determined from the formula corresponding to each matrix element of the matrix shown in FIG. 5. That is, each matrix element of the matrix shown in FIG. 5 represents the position of the first target multiplier-adder of one group in the actually used array. Here $A'_x$ is the number of columns of the actually used array and $A'_y$ is its number of rows:
$$A'_x = A_x - A_x \% S_x$$
$$A'_y = A_y - A_y \% S_y$$
$A_x$ is the number of columns of the multiplier-adder array and $A_y$ is its number of rows; $S_x$ is the horizontal component of the operation step size and $S_y$ is its vertical component.
After the first target multiplier-adder of each group has been determined in the actually used array, the other target multiplier-adders of each group can be determined, for example, by the method described in steps 1 to 3 below:
Step 1: For each multiplier-adder in a group other than the first target multiplier-adder, determine, based on the operation step size, the first positional relationship in the array between that multiplier-adder and the previous adjacent multiplier-adder of the same group in the same row. The first positional relationship between a multiplier-adder PO(i) and the previous adjacent same-row multiplier-adder PO(i-1) is, for example:
$$PO(i-1) + S_x = PO(i)$$
For example, the actually used multiplier-adder array has 4 rows and 4 columns, that is, $A'_y = 4$ and $A'_x = 4$, and its positions are numbered as:
Figure PCTCN2021115799-appb-000003
When the operation step size is 2, $S_x = 2$ and $S_y = 2$. As shown in FIG. 4, four different colors represent the four multiplier-adder groups: the first group in black, the second in white, the third in light gray, and the fourth in dark gray. Taking the first group as an example, its first target multiplier-adder is at position 0, so the position PO(A) of the next multiplier-adder A of the same group in that row is:
$$PO(A) = 0 + S_x = 0 + 2 = 2$$
The position PO(B) of the next multiplier-adder B of the same group after position 2 in that row would be:
$$PO(B) = PO(A) + S_x = 2 + 2 = 4$$
However, because the actually used array has only 4 columns, the largest position number in that row is 3, so position 4 lies outside the row. The only multiplier-adder in that row belonging to the same group as the one at position 0 is therefore the one at position 2.
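The row walk in this example can be sketched as below: starting from a known position, add $S_x$ repeatedly and stop once the result leaves the row. The function name `same_row_members` and the row-major numbering are assumptions for illustration.

```python
def same_row_members(first_pos: int, s_x: int, used_cols: int) -> list[int]:
    """Positions in the same row and group, applying PO(i) = PO(i-1) + S_x."""
    # Largest position number in the row that contains first_pos.
    row_end = (first_pos // used_cols) * used_cols + used_cols - 1
    members = []
    pos = first_pos
    while pos <= row_end:
        members.append(pos)
        pos += s_x
    return members


print(same_row_members(0, 2, 4))  # [0, 2]  (position 4 is past the row end, 3)
```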
Step 2: For each multiplier-adder in a group other than the first target multiplier-adder, determine, based on the operation step size and the number of columns of the array, the second positional relationship in the array between that multiplier-adder and the previous adjacent multiplier-adder of the same group in the same column. The second positional relationship between a multiplier-adder PO(j) and the previous adjacent same-column multiplier-adder PO(j-1) is, for example:
$$PO(j-1) + S_y \times A'_x = PO(j)$$
For example, the actually used multiplier-adder array has 4 rows and 4 columns, that is, $A'_y = 4$ and $A'_x = 4$, and its positions are numbered as:
Figure PCTCN2021115799-appb-000004
When the operation step size is 2, $S_x = 2$ and $S_y = 2$. As shown in FIG. 4, taking the first group as an example, its first target multiplier-adder is at position 0, so the position PO(C) of the next multiplier-adder C of the same group in that column is:
$$PO(C) = 0 + S_y \times A'_x = 0 + 2 \times 4 = 8$$
The position PO(D) of the next multiplier-adder D of the same group after position 8 in that column would be:
$$PO(D) = PO(C) + S_y \times A'_x = 8 + 2 \times 4 = 16$$
However, because the actually used array has 4 rows and 4 columns, the largest position number in that column is 12, so position 16 lies outside the array. The only multiplier-adder in that column belonging to the same group as the one at position 0 is therefore the one at position 8.
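The column walk can be sketched in the same way: add $S_y \times A'_x$ repeatedly and stop once the result exceeds the last position of the array. The name `same_column_members` and the row-major numbering are illustrative assumptions.

```python
def same_column_members(first_pos: int, s_y: int,
                        used_cols: int, used_rows: int) -> list[int]:
    """Positions in the same column and group: PO(j) = PO(j-1) + S_y * A'_x."""
    last_pos = used_cols * used_rows - 1  # largest position number in the array
    members = []
    pos = first_pos
    while pos <= last_pos:
        members.append(pos)
        pos += s_y * used_cols  # stepping by whole rows keeps the column fixed
    return members


print(same_column_members(0, 2, 4, 4))  # [0, 8]  (16 is outside the array)
```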
Step 3: Based on the position in the array of the first target multiplier-adder of the group, the first positional relationship, and the second positional relationship, determine the target positions in the array of the other target multiplier-adders of the group.
For example, the actually used multiplier-adder array has 4 rows and 4 columns, that is, $A'_y = 4$ and $A'_x = 4$, and its positions are numbered as:
Figure PCTCN2021115799-appb-000005
When the operation step size is 2, $S_x = 2$ and $S_y = 2$. Once the position of the first target multiplier-adder of each group in the actually used array has been calculated, the target positions of the other target multiplier-adders of the group can be calculated from the first and second positional relationships. Specifically, after the target positions of the group's first target multiplier-adder in each row and in each column have been obtained, the target positions of the remaining target multiplier-adders of the group can be calculated from either the first or the second positional relationship. As shown in FIG. 4, taking the first group as an example, its first target multiplier-adder is at position 0 and the next multiplier-adder A of the same group in the same row is at position PO(A) = 2, so the position PO(E) of the next multiplier-adder E in the same column as A is:
$$PO(E) = 2 + S_y \times A'_x = 2 + 2 \times 4 = 10$$
Alternatively, the next multiplier-adder C of the same group in the same column as position 0 is at position PO(C) = 8, so the position PO(E) of the next multiplier-adder E of the same group in the same row as C is:
$$PO(E) = 8 + S_x = 8 + 2 = 10$$
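Combining both positional relationships gives every member of a group. The sketch below is one way to enumerate them; the name `group_members` and the use of `divmod` to recover row and column from a row-major position are assumptions for illustration, not the disclosure's own procedure.

```python
def group_members(first_pos: int, s_x: int, s_y: int,
                  used_cols: int, used_rows: int) -> list[int]:
    """All positions of the group whose first target multiplier-adder is first_pos."""
    row0, col0 = divmod(first_pos, used_cols)
    return [r * used_cols + c
            for r in range(row0, used_rows, s_y)   # second relationship: +S_y * A'_x
            for c in range(col0, used_cols, s_x)]  # first relationship: +S_x


print(group_members(0, 2, 2, 4, 4))  # [0, 2, 8, 10]
print(group_members(5, 2, 2, 4, 4))  # [5, 7, 13, 15]
```

Position 10 appears whether it is reached through position 2 (same column) or through position 8 (same row), which mirrors the two calculations of PO(E) above.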
For example, FIG. 4 shows an example provided by the present disclosure in which the multiplier-adder array is divided into four multiplier-adder groups. Four different colors represent the four groups: the first group in black, the second in white, the third in light gray, and the fourth in dark gray. Within any row of the array, two adjacent multiplier-adders of the same group are separated by the same nonzero number of multiplier-adders from other groups; the same holds within any column of the array.
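The grouping pattern of FIG. 4 can be reproduced as a grid of group indices. The labelling formula below is an assumption derived from the positional relationships in the text (group 0 starts at position 0, group 1 at position 1, group 2 at position 4, group 3 at position 5); the names are illustrative.

```python
def group_grid(used_cols: int, used_rows: int, s_x: int, s_y: int) -> list[list[int]]:
    """Group index of every multiplier-adder in the actually used array."""
    return [[(r % s_y) * s_x + (c % s_x) for c in range(used_cols)]
            for r in range(used_rows)]


for row in group_grid(4, 4, 2, 2):
    print(row)
# [0, 1, 0, 1]
# [2, 3, 2, 3]
# [0, 1, 0, 1]
# [2, 3, 2, 3]
```

In each row, neighbours of the same group are $S_x$ positions apart, so $S_x - 1$ multiplier-adders of other groups sit between them; the column case is analogous with $S_y$.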
Regarding S102, the convolution processing tasks of different multiplier-adder groups correspond to different images to be processed; for example, each multiplier-adder group convolves a different data matrix.
When each of the multiple multiplier-adder groups is used to execute its corresponding data processing task in parallel, the image data to be processed corresponding to each group is stored in the register array corresponding to that group according to the positions in the multiplier-adder array of the group's target multiplier-adders.
Here, the image data to be processed includes, for example, at least one of the following: the original image to be processed; a sub-image corresponding to any color channel of the original image; a feature map obtained by performing feature extraction on the original image; a feature sub-map corresponding to at least one channel of such a feature map; image data obtained by applying data padding to a sub-image corresponding to at least one color channel of the original image; and image data obtained by applying data padding to a feature sub-map corresponding to at least one channel of the feature map.
Taking a feature map as the image data to be processed as an example, when the image data is stored in the register array, each of at least some of the registers stores the feature value of one feature point of the image data, also referred to as an operand required by a multiplier-adder.
For each multiplier-adder group, the position of the register that each target multiplier-adder of the group reads, which is fixed, is determined in the corresponding register array. As shown in FIG. 6A, the multiplier-adder array contains four multiplier-adder groups, which correspond to the four register arrays shown in FIG. 6B: the black group corresponds to the black register array a, the white group to the white register array b, the light gray group to the light gray register array c, and the dark gray group to the dark gray register array d. The target multiplier-adders read the feature values stored in the following registers: PE0 reads A0, PE1 reads B0, PE2 reads A2, PE3 reads B2, PE4 reads C0, PE5 reads D0, PE6 reads C2, PE7 reads D2, PE8 reads A8, PE9 reads B8, PE10 reads A10, PE11 reads B10, PE12 reads C8, PE13 reads D8, PE14 reads C10, and PE15 reads D10.
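The listed assignments follow a regular pattern: the register index equals the position of the multiplier-adder minus the position of its group's first target multiplier-adder. The sketch below encodes that observation; the rule is inferred from the listed values only, and the function name and letter naming of register arrays are assumptions for illustration.

```python
def register_for(pe: int, s_x: int, s_y: int, used_cols: int) -> str:
    """Name of the register that multiplier-adder `pe` reads (inferred pattern)."""
    row, col = divmod(pe, used_cols)
    group = (row % s_y) * s_x + (col % s_x)            # 0..GN-1
    first_pos = (row % s_y) * used_cols + (col % s_x)  # first target of that group
    return chr(ord("A") + group) + str(pe - first_pos)


print([register_for(pe, 2, 2, 4) for pe in range(8)])
# ['A0', 'B0', 'A2', 'B2', 'C0', 'D0', 'C2', 'D2']
```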
For each multiplier-adder group, the image data to be processed corresponding to the group is stored in the register array corresponding to the group according to the positions in the multiplier-adder array of the group's target multiplier-adders, the positions of the registers that those target multiplier-adders read, and the order in which the operands contained in the image data are processed. As a result, in each data processing cycle, the operand stored in the register that each target multiplier-adder reads corresponds to the matrix element of the matrix operand for that cycle.
The matrix operand includes, for example, the convolution kernel used in the convolution calculation, that is, a data matrix. For example,
$$\begin{bmatrix} W_0 & W_1 \\ W_3 & W_2 \end{bmatrix}$$
is a matrix operand with two rows and two columns provided by the present disclosure, containing the matrix elements $W_0$, $W_1$, $W_2$, and $W_3$. The image data to be processed corresponding to each multiplier-adder group should contain the same number of operands. The image data to be processed corresponding to the first multiplier-adder group shown in FIG. 6A is:
Figure PCTCN2021115799-appb-000006
This image data is stored in the black register array a corresponding to the first group as shown in the upper left part of FIG. 6B. The image data to be processed corresponding to the second multiplier-adder group is:
Figure PCTCN2021115799-appb-000007
This image data is stored in the white register array b corresponding to the second group as shown in the upper right part of FIG. 6B. The image data to be processed corresponding to the third multiplier-adder group is:
Figure PCTCN2021115799-appb-000008
This image data is stored in the light gray register array c corresponding to the third group as shown in the lower left part of FIG. 6B. The image data to be processed corresponding to the fourth multiplier-adder group is:
Figure PCTCN2021115799-appb-000009
This image data is stored in the dark gray register array d corresponding to the fourth group as shown in the lower right part of FIG. 6B.
After the image data to be processed corresponding to each multiplier-adder group has been stored in the register array corresponding to that group, in each of multiple data processing cycles the image data corresponding to each group for that cycle is read from the group's register array and processed, yielding the data processing result of each group in that cycle.
In the first data processing cycle, each target multiplier-adder of each group is controlled to read, from the register it reads, its operand for the first cycle as the first operand; the matrix element of the matrix operand corresponding to the first cycle is determined as the second operand; and the product of the first operand and the second operand is determined for each target multiplier-adder.
For example, the target multiplier-adder PE0 reads operand a0 from register A0, the register it reads in the corresponding register array, and the target multiplier-adder PE1 reads operand b0 from register B0; the operands read by the other target multiplier-adders follow by analogy and are not repeated here. Suppose the matrix operand is:
Figure PCTCN2021115799-appb-000010
Taking PE0 as an example, after reading operand a0 it takes a0 as the first operand. The matrix element corresponding to this data processing cycle is $W_0$, which is taken as the second operand. PE0 then calculates $W_0 \times a0$ and stores the result in a register.
In each data processing cycle other than the first, the image data to be processed is controlled to move by a preset step size within the register array according to the preset data movement mode corresponding to that cycle. Each target multiplier-adder of each group is then controlled to read, from the register it reads, its operand for that cycle as the first operand; the matrix element of the matrix operand corresponding to that cycle is determined as the second operand; and the product of the first operand and the second operand is determined for each target multiplier-adder.
For example, take the second data processing cycle of the multiplier-adder group shown in FIG. 6A, where the preset data movement mode is a move to the left and the preset step size is 1. FIG. 7 shows an example of register array a in an embodiment of the present disclosure after the image data to be processed has been shifted left as a whole by one step. PE0 reads operand a1 from register A0, PE2 reads operand a3 from register A2, and the operands read by the other multiplier-adders follow by analogy and are not repeated here. The matrix operand is:
Figure PCTCN2021115799-appb-000011
Taking PE0 as an example, after reading operand a1 it takes a1 as the first operand. The matrix element corresponding to this data processing cycle is $W_1$, which is taken as the second operand. PE0 then calculates $W_1 \times a1$ and stores the result in a register.
Similarly, in the third data processing cycle, the image data to be processed can be shifted up as a whole by one step from the position shown in FIG. 7. Register A0 then stores operand a5, the matrix element corresponding to this cycle is $W_2$, and PE0 calculates $W_2 \times a5$. In the fourth data processing cycle, the image data can be shifted right as a whole by one step from the position reached in the third cycle. Register A0 then stores operand a4, the matrix element corresponding to this cycle is $W_3$, and PE0 calculates $W_3 \times a4$. The same applies to the other PEs and is not repeated here.
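The four cycles described above can be simulated for one multiplier-adder group. The sketch below assumes a 4×4 piece of image data a0..a15 stored row by row in a 4×4 register array (an assumption consistent with a0, a1, a5, a4 forming one 2×2 window), and models the data movement as a cyclic shift; the wrap-around is an artifact of the sketch, and wrapped values are never read here. All names are illustrative.

```python
def shift(grid, d_row, d_col):
    """Move every value by (d_row, d_col); negative means up / left."""
    rows, cols = len(grid), len(grid[0])
    return [[grid[(r - d_row) % rows][(c - d_col) % cols] for c in range(cols)]
            for r in range(rows)]


def run_group(data, kernel, read_positions):
    """Accumulate, per multiplier-adder, the products of the four cycles."""
    (w0, w1), (w3, w2) = kernel  # layout [[W0, W1], [W3, W2]] as in the text
    # (row shift, column shift, matrix element) for cycles 1..4:
    # no move, left by one, up by one, right by one.
    schedule = [(0, 0, w0), (0, -1, w1), (-1, 0, w2), (0, 1, w3)]
    regs = [row[:] for row in data]
    sums = {pos: 0 for pos in read_positions}
    for d_row, d_col, weight in schedule:
        regs = shift(regs, d_row, d_col)
        for r, c in read_positions:  # each PE always reads the same register
            sums[(r, c)] += weight * regs[r][c]
    return sums


data = [[4 * r + c for c in range(4)] for r in range(4)]  # a_k holds the value k
kernel = [[1, 2], [4, 3]]                                 # W0=1, W1=2, W3=4, W2=3
reads = [(0, 0), (0, 2), (2, 0), (2, 2)]                  # registers A0, A2, A8, A10
print(run_group(data, kernel, reads))
# {(0, 0): 33, (0, 2): 53, (2, 0): 113, (2, 2): 133}
```

For PE0 the result is 1·a0 + 2·a1 + 3·a5 + 4·a4 = 0 + 2 + 15 + 16 = 33, which equals a direct 2×2 convolution at the top-left window with stride 2.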
It can be seen that in each data processing cycle, the PEs that process different image data all complete the calculation for that cycle; that is, the different multiplier-adder groups complete the calculations of each data processing cycle in parallel. After all data processing cycles, the different groups finish their final calculations at the same time, which saves system resources.
Here, the convolution kernels corresponding to different image data to be processed may be different or the same. For example, if two pieces of image data are different feature sub-maps of the same feature map, their convolution kernels are different. If two pieces of image data are image data at different positions of the same feature sub-map, their convolution kernels are the same.
After the data processing results of each multiplier-adder group over the multiple data processing cycles have been obtained, the data processing task corresponding to each group can be completed based on the group's data processing result in each cycle.
Specifically, for each target multiplier-adder of each group, the products obtained by that target multiplier-adder in the individual data processing cycles are added to obtain a sum; the data processing task corresponding to each group is then completed based on the sums of the target multiplier-adders contained in that group.
For example, taking the target multiplier-adder PE0 shown in FIG. 6A, the products computed by PE0 in the four data processing cycles are W0*a0, W1*a1, W2*a5 and W3*a4. Adding these four results:
W0*a0 + W1*a1 + W2*a5 + W3*a4
gives the result value of PE0. This result value belongs to the processing result matrix of the data processing task corresponding to the first multiplier-adder group. In that processing result matrix, the result values are arranged as:
Figure PCTCN2021115799-appb-000012
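The per-cycle accumulation described above for a single target multiplier-adder can be sketched as follows. This is a minimal illustration only: the function name `accumulate_pe` and the numeric values standing in for W0 to W3 and for a0, a1, a5, a4 are assumptions made for the example and do not come from the disclosure.

```python
def accumulate_pe(weights, operands):
    """Sum the per-cycle products of one multiplier-adder (PE).

    weights[i] is the kernel element used in data processing cycle i and
    operands[i] is the image operand the PE reads in that cycle.
    """
    if len(weights) != len(operands):
        raise ValueError("one weight and one operand are needed per cycle")
    total = 0
    for w, a in zip(weights, operands):
        total += w * a  # product of this cycle, added to the running sum
    return total


# Hypothetical values: W0..W3, and a0, a1, a5, a4 in the order PE0 reads them.
weights = [1, 2, 3, 4]
operands = [10, 20, 30, 40]
pe0_result = accumulate_pe(weights, operands)
```

The returned value plays the role of the PE0 entry in the processing result matrix of the first multiplier-adder group.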
Here, suppose the image data to be convolved is a feature map with 16 channels, and the feature sub-maps of 4 channels are processed at a time; that is, the feature sub-maps of the 16 channels are divided into 4 groups, and one group of feature sub-maps is processed at a time. Let the 4 groups of feature sub-maps be group a, group b, group c and group d. After the 4 feature sub-maps in group a have been processed, the 4 results for group a output by the multiplier-adders are first accumulated. After the 4 feature sub-maps in group b have been processed, the 4 results for group b are accumulated, and the accumulated result for group a is added to the accumulated result for group b. After the 4 feature sub-maps in group c have been processed, the 4 results for group c are accumulated and added to the accumulated results of groups a and b. After the 4 feature sub-maps in group d have been processed, the 4 results for group d are accumulated and added to the accumulated results of groups a, b and c. Finally, the accumulated sum of the convolution results of the 16 channels is obtained.
After the 4 feature sub-maps in group a have been processed, the 4 output results obtained for group a are a1, a2, a3 and a4. After the 4 feature sub-maps in group b have been processed, the 4 output results obtained for group b are b1, b2, b3 and b4. At this point, a1+b1=O1, a2+b2=O2, a3+b3=O3 and a4+b4=O4 are computed. After the 4 feature sub-maps in group c have been processed, the 4 output results obtained for group c are c1, c2, c3 and c4, and O1+c1, O2+c2, O3+c3 and O4+c4 are then computed. Proceeding in the same way finally yields a1+b1+c1+d1, a2+b2+c2+d2, a3+b3+c3+d3 and a4+b4+c4+d4; these four results are then added together to obtain the accumulated sum of the convolution results of the 16 channels.
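The group-by-group accumulation in this example can be sketched as below. The function name `accumulate_groups` and the sample numbers are illustrative assumptions; the sketch only mirrors the order of additions described in the text (element-wise running sums across groups, followed by a final sum of the four running values).

```python
def accumulate_groups(group_outputs):
    """Accumulate per-group outputs element-wise, then sum the running values.

    group_outputs is a list such as [[a1, a2, a3, a4], [b1, b2, b3, b4], ...],
    one inner list per channel group, in processing order.
    """
    running = [0] * len(group_outputs[0])
    for outputs in group_outputs:
        # O_k <- O_k + (k-th output of the current group)
        running = [acc + out for acc, out in zip(running, outputs)]
    return running, sum(running)


# Hypothetical outputs for groups a, b, c and d (4 channels each).
groups = [
    [1, 2, 3, 4],              # a1..a4
    [10, 20, 30, 40],          # b1..b4
    [100, 200, 300, 400],      # c1..c4
    [1000, 2000, 3000, 4000],  # d1..d4
]
per_position, total = accumulate_groups(groups)
```

Here `per_position` corresponds to the four values a1+b1+c1+d1 through a4+b4+c4+d4, and `total` to their final sum over all 16 channels.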
Those skilled in the art will understand that, in the above method of the specific embodiments, the order in which the steps are written does not imply a strict order of execution and does not limit the implementation in any way; the specific order of execution of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, the embodiments of the present disclosure further provide a data processing apparatus corresponding to the data processing method. Because the principle by which the apparatus solves the problem is similar to that of the data processing method described above, the implementation of the apparatus may refer to the implementation of the method, and repeated details are omitted.
FIG. 8 is a schematic diagram of a data processing apparatus provided by an embodiment of the present disclosure. The apparatus includes a controller 801. The controller 801 is configured to: group a plurality of multiplier-adders in the multiplier-adder array based on an operation step size to obtain a plurality of multiplier-adder groups; and, using each multiplier-adder group of the plurality of multiplier-adder groups, execute in parallel the data processing task corresponding to that multiplier-adder group.
In a possible implementation, in any row of the multiplier-adder array, every two adjacent multiplier-adders of the same group are separated by the same non-zero number of multiplier-adders of other groups, and in any column of the multiplier-adder array, every two adjacent multiplier-adders of the same group are separated by the same non-zero number of multiplier-adders of other groups.
In a possible implementation, when grouping the plurality of multiplier-adders in the multiplier-adder array based on the operation step size, the controller 801 is specifically configured to: determine the number of multiplier-adder groups based on the operation step size; and group the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups.
In a possible implementation, when grouping the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups, the controller 801 is specifically configured to: determine, from the multiplier-adder array, the first target multiplier-adder in each multiplier-adder group; and determine, from the multiplier-adder array, the other target multiplier-adders in each multiplier-adder group based on the position of the first target multiplier-adder in the multiplier-adder array, the operation step size and the size of the multiplier-adder array.
In a possible implementation, when determining, from the multiplier-adder array, the other target multiplier-adders in each multiplier-adder group based on the position of the first target multiplier-adder in the multiplier-adder array, the operation step size and the size of the multiplier-adder array, the controller 801 is specifically configured to, for each multiplier-adder in each multiplier-adder group other than the first target multiplier-adder: determine, based on the operation step size, a first positional relationship in the multiplier-adder array between that multiplier-adder and the adjacent preceding multiplier-adder in the same row; determine, based on the operation step size and the number of columns of the multiplier-adder array, a second positional relationship in the multiplier-adder array between that multiplier-adder and the adjacent preceding multiplier-adder in the same column; and determine the target position of that multiplier-adder in the multiplier-adder array based on the position of the first target multiplier-adder of the group, the first positional relationship and the second positional relationship.
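One way to picture this position computation is the sketch below. It rests on assumptions that this passage does not state: multiplier-adders are indexed in row-major order, the first positional relationship is an index offset equal to the operation step size, and the second positional relationship is an index offset equal to the step size times the number of columns. The function name `group_positions` is hypothetical.

```python
def group_positions(first_row, first_col, stride, rows, cols):
    """Return row-major indices of the multiplier-adders in one group.

    Assumption: same-group PEs are `stride` apart within a row and `stride`
    rows apart within a column, so the row-major index grows by `stride`
    for each step along a row and by `stride * cols` for each step down
    a column.
    """
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    first_index = first_row * cols + first_col
    n_rows = (rows - first_row + stride - 1) // stride  # group rows that fit
    n_cols = (cols - first_col + stride - 1) // stride  # group columns that fit
    positions = []
    for i in range(n_rows):
        for j in range(n_cols):
            positions.append(first_index + i * stride * cols + j * stride)
    return positions


# 4x4 array, step size 2: the group whose first PE is at row 0, column 0.
group0 = group_positions(0, 0, 2, 4, 4)
```

Under these assumptions, a step size of 2 leaves exactly one multiplier-adder of another group between adjacent members of the same group in every row and column, which is consistent with the spacing condition described earlier.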
In a possible implementation, when determining, from the multiplier-adder array, the first target multiplier-adder in each multiplier-adder group, the controller 801 is specifically configured to: determine a target matrix based on the operation step size and the multiplier-adder array; and determine, according to the matrix element values of the target matrix, the position in the multiplier-adder array of the first target multiplier-adder in each multiplier-adder group.
In a possible implementation, when using each multiplier-adder group of the plurality of multiplier-adder groups to execute in parallel the data processing task corresponding to that group, the controller 801 is specifically configured to: store, according to the positions in the multiplier-adder array of the target multiplier-adders in each multiplier-adder group, the image data to be processed corresponding to each multiplier-adder group into the register array corresponding to that group; for each data processing cycle of a plurality of data processing cycles, read, from the register array corresponding to each multiplier-adder group, the image data to be processed for that group in that cycle, and process the read image data in parallel to obtain the data processing result of each multiplier-adder group in that cycle; and complete the data processing task corresponding to each multiplier-adder group according to the data processing results of that group in each data processing cycle.
In a possible implementation, when storing, according to the positions in the multiplier-adder array of the target multiplier-adders in each multiplier-adder group, the image data to be processed corresponding to the multiplier-adder group into the register array corresponding to that group, the controller 801 is specifically configured to: for each multiplier-adder group, determine the position, in the corresponding register array, of the register that each target multiplier-adder of the group always reads; and, for each multiplier-adder group, store the image data to be processed corresponding to the group into the register array corresponding to the group according to the positions of the target multiplier-adders of the group in the multiplier-adder array, the positions of the registers that the target multiplier-adders of the group always read, and the order in which the operands contained in the image data to be processed are processed during data processing, such that in each data processing cycle the operand stored in the register that each target multiplier-adder always reads corresponds to the matrix element of the matrix operand for that data processing cycle.
In a possible implementation, when, for each data processing cycle of the plurality of data processing cycles, reading from the register array corresponding to each multiplier-adder group the image data to be processed for that group in that cycle and processing the read image data in parallel to obtain the data processing result of each multiplier-adder group in that cycle, the controller 801 is specifically configured to: for the first data processing cycle in which the image data to be processed is processed, control each target multiplier-adder in each multiplier-adder group to read, from the register it always reads, its operand for the first data processing cycle as a first operand; determine the matrix element of the matrix operand corresponding to each multiplier-adder group in the first data processing cycle as a second operand; and determine, for each target multiplier-adder, the product of the first operand and the second operand in the first data processing cycle; and, for each data processing cycle other than the first, control the image data to be processed to move by a preset step in the register array according to the preset data movement mode corresponding to that cycle; control each target multiplier-adder in each multiplier-adder group to read, from the register it always reads, its operand for that cycle as a first operand; determine the matrix element of the matrix operand corresponding to each multiplier-adder group in that cycle as a second operand; and determine, for each target multiplier-adder, the product of the first operand and the second operand in that cycle.
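The read, shift and multiply loop described above can be sketched as follows. Several details are assumptions made only so that the sketch is small and self-contained: the register array is modeled as a 4x4 grid with wrap-around shifts, each target multiplier-adder always reads the register at its own grid position, and the movement sequence (left, up, right) is chosen so that the register at row 0, column 0 yields a0, a1, a5, a4 in turn, matching the PE0 example given earlier. The names `shift` and `run_cycles` are hypothetical.

```python
def shift(regs, d_row, d_col):
    """Return a copy of the register array with data moved by (d_row, d_col).

    Negative d_col moves data left and negative d_row moves data up.
    Wrap-around at the edges is an assumption that keeps the sketch small.
    """
    rows, cols = len(regs), len(regs[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[(r + d_row) % rows][(c + d_col) % cols] = regs[r][c]
    return out


def run_cycles(regs, pe_positions, weights, moves):
    """Accumulate weight * operand for each PE over all data processing cycles."""
    sums = {pos: 0 for pos in pe_positions}
    reads = {pos: [] for pos in pe_positions}
    for cycle, w in enumerate(weights):
        if cycle > 0:  # non-first cycles: move the data before reading
            d_row, d_col = moves[cycle - 1]
            regs = shift(regs, d_row, d_col)
        for (r, c) in pe_positions:
            operand = regs[r][c]        # first operand: fixed register read
            reads[(r, c)].append(operand)
            sums[(r, c)] += w * operand  # second operand: kernel element w
    return sums, reads


# Register array holding a0..a15 in row-major order (a_k stored as the value k).
regs = [[4 * r + c for c in range(4)] for r in range(4)]
moves = [(0, -1), (-1, 0), (0, 1)]  # left, up, right (assumed sequence)
weights = [1, 10, 100, 1000]        # stand-ins for W0..W3
sums, reads = run_cycles(regs, [(0, 0), (0, 2)], weights, moves)
```

Because the multiplier-adders never change which register they read, only the data moves between cycles; the two positions in the example stand for two target multiplier-adders of one group that share the same kernel element in each cycle.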
In a possible implementation, when completing the data processing task corresponding to each multiplier-adder group according to the data processing results of that group in each data processing cycle, the controller 801 is specifically configured to: for each target multiplier-adder in each multiplier-adder group, add the products obtained by that target multiplier-adder in the respective data processing cycles to obtain a sum value; and complete the data processing task corresponding to each multiplier-adder group based on the sum values of the target multiplier-adders included in that group.
In a possible implementation, the data processing task includes a convolution processing task, and the images to be processed corresponding to the convolution processing tasks of different multiplier-adder groups are different.
For a description of the processing flow of each module in the apparatus and of the interaction flow between modules, reference may be made to the relevant description in the above method embodiments, which is not repeated here.
The image processing apparatus provided by the embodiments of the present disclosure may include a chip, an AI chip, and the like.
An embodiment of the present disclosure further provides a computer device. FIG. 9 is a schematic structural diagram of the computer device provided by the embodiment of the present disclosure, which includes a controller 910 and a memory 920. The memory 920 stores machine-readable instructions executable by the controller 910, and the controller 910 is configured to execute the machine-readable instructions stored in the memory 920. When the machine-readable instructions are executed by the controller 910, the controller 910 performs the following steps: grouping a plurality of multiplier-adders in the multiplier-adder array based on an operation step size to obtain a plurality of multiplier-adder groups; and, using each multiplier-adder group of the plurality of multiplier-adder groups, executing in parallel the data processing task corresponding to that multiplier-adder group.
The memory 920 includes an internal memory 921 and an external memory 922. The internal memory 921 is used to temporarily store operation data of the controller 910 and data exchanged with the external memory 922, such as a hard disk; the controller 910 exchanges data with the external memory 922 through the internal memory 921.
The computer device provided by the embodiments of the present disclosure may include a smart terminal such as a mobile phone, or may be another device that has a camera and can perform image processing, a server, or the like; no limitation is imposed here.
For the specific execution process of the above instructions, reference may be made to the steps of the data processing method described in the embodiments of the present disclosure, which are not repeated here.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored. When the computer program is run by a processor, the steps of the data processing method described in the above method embodiments are executed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides a computer program product carrying program code. The instructions included in the program code can be used to execute the steps of the data processing method described in the above method embodiments; for details, refer to the above method embodiments, which are not repeated here.
The above computer program product may be implemented by hardware, software or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium. In another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided by the present disclosure, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through communication interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (15)

  1. A data processing method, comprising:
    grouping a plurality of multiplier-adders in a multiplier-adder array based on an operation step size to obtain a plurality of multiplier-adder groups;
    using each multiplier-adder group of the plurality of multiplier-adder groups, executing in parallel the data processing task corresponding to that multiplier-adder group.
  2. The data processing method according to claim 1, wherein
    in any row of the multiplier-adder array, every two adjacent multiplier-adders of the same group are separated by the same non-zero number of multiplier-adders of other groups, and
    in any column of the multiplier-adder array, every two adjacent multiplier-adders of the same group are separated by the same non-zero number of multiplier-adders of other groups.
  3. The data processing method according to claim 1 or 2, wherein the grouping of the plurality of multiplier-adders in the multiplier-adder array based on the operation step size comprises:
    determining the number of multiplier-adder groups based on the operation step size;
    grouping the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups.
  4. The data processing method according to claim 3, wherein the grouping of the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups comprises:
    determining, from the multiplier-adder array, the first target multiplier-adder in each multiplier-adder group;
    determining, based on the position of the first target multiplier-adder in the multiplier-adder array, the operation step size and the size of the multiplier-adder array, the positions in the multiplier-adder array of the other target multiplier-adders in each multiplier-adder group besides the first target multiplier-adder.
  5. The data processing method according to claim 4, wherein the determining, based on the position of the first target multiplier-adder in the multiplier-adder array, the operation step size and the size of the multiplier-adder array, of the positions in the multiplier-adder array of the other target multiplier-adders in each multiplier-adder group besides the first target multiplier-adder comprises:
    for each multiplier-adder in each multiplier-adder group other than the first target multiplier-adder,
    determining, based on the operation step size, a first positional relationship in the multiplier-adder array between that multiplier-adder and the adjacent preceding multiplier-adder in the same row; and
    determining, based on the operation step size and the number of columns of the multiplier-adder array, a second positional relationship in the multiplier-adder array between that multiplier-adder and the adjacent preceding multiplier-adder in the same column;
    determining the target position of that multiplier-adder in the multiplier-adder array based on the position in the multiplier-adder array of the first target multiplier-adder of the group, the first positional relationship and the second positional relationship.
  6. The data processing method according to claim 4 or 5, wherein the determining, from the multiplier-adder array, of the first target multiplier-adder in each multiplier-adder group comprises:
    determining a target matrix based on the operation step size and the multiplier-adder array;
    determining, according to the matrix element values of the target matrix, the position in the multiplier-adder array of the first target multiplier-adder in each multiplier-adder group.
  7. The data processing method according to any one of claims 1 to 6, wherein the using of each multiplier-adder group of the plurality of multiplier-adder groups to execute in parallel the data processing task corresponding to that multiplier-adder group comprises:
    storing, according to the positions in the multiplier-adder array of the target multiplier-adders in each multiplier-adder group, the image data to be processed corresponding to each multiplier-adder group into the register array corresponding to that multiplier-adder group;
    for each data processing cycle of a plurality of data processing cycles,
    reading, from the register array corresponding to each multiplier-adder group, the image data to be processed for that multiplier-adder group in that data processing cycle; and
    processing the read image data to be processed to obtain the data processing result of each multiplier-adder group in that data processing cycle;
    completing the data processing task corresponding to each multiplier-adder group according to the data processing results of that multiplier-adder group in each data processing cycle.
  8. The data processing method according to claim 7, wherein the storing, according to the positions in the multiplier-adder array of the target multiplier-adders in each multiplier-adder group, of the image data to be processed corresponding to the multiplier-adder group into the register array corresponding to that multiplier-adder group comprises:
    for each multiplier-adder group, determining the position, in the corresponding register array, of the register that each target multiplier-adder of the group always reads;
    for each multiplier-adder group, storing the image data to be processed corresponding to the group into the register array corresponding to the group according to the positions of the target multiplier-adders of the group in the multiplier-adder array, the positions of the registers that the target multiplier-adders of the group always read, and the order in which the operands contained in the image data to be processed are processed during data processing, such that in each data processing cycle the operand stored in the register that each target multiplier-adder always reads corresponds to the matrix element of the matrix operand for that data processing cycle.
  9. The data processing method according to claim 7 or 8, wherein, for each of the plurality of data processing cycles, reading from the register array corresponding to each multiplier-adder group the to-be-processed image data corresponding to that group in the data processing cycle, and processing the read to-be-processed image data in parallel to obtain the data processing result of each multiplier-adder group in the data processing cycle, comprises:
    for the first data processing cycle in which the to-be-processed image data is processed:
    controlling each target multiplier-adder in each multiplier-adder group to read, from its fixedly read register, the operand corresponding to that target multiplier-adder in the first data processing cycle as a first operand;
    determining the matrix element of the matrix operand corresponding to each multiplier-adder group in the first data processing cycle as a second operand;
    determining the product of the first operand and the second operand of each target multiplier-adder in the first data processing cycle;
    for each data processing cycle other than the first in which the to-be-processed image data is processed:
    controlling the to-be-processed image data to move by a preset step size in the register array according to a preset data movement mode corresponding to the data processing cycle;
    controlling each target multiplier-adder in each multiplier-adder group to read, from its fixedly read register, the operand of that target multiplier-adder in the data processing cycle as a first operand;
    determining the matrix element of the matrix operand corresponding to each multiplier-adder group in the data processing cycle as a second operand;
    determining the product of the first operand and the second operand of each target multiplier-adder in the data processing cycle.
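The two kinds of cycle in claim 9 can be sketched as follows. This is an illustrative model only. The helper names `shift_registers` and `cycle_products`, the zero fill for registers that receive no data, and the particular movement direction are assumptions; the claims only require a preset movement mode and a preset step size.

```python
def shift_registers(regs, dy, dx, fill=0):
    """Move the data so that register (r, c) receives the old content of (r + dy, c + dx)."""
    rows, cols = len(regs), len(regs[0])
    return [
        [regs[r + dy][c + dx] if 0 <= r + dy < rows and 0 <= c + dx < cols else fill
         for c in range(cols)]
        for r in range(rows)
    ]


def cycle_products(regs, fixed_positions, kernel_element):
    """Each multiplier-adder reads its fixed register (first operand) and
    multiplies it by this cycle's matrix element (second operand)."""
    return {(r, c): regs[r][c] * kernel_element for (r, c) in fixed_positions}


regs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
fixed = [(0, 0), (0, 1)]                 # fixedly read registers of two multiplier-adders

first = cycle_products(regs, fixed, 2)   # first cycle: read without moving the data
regs = shift_registers(regs, 0, 1)       # later cycle: move the data by one step first
later = cycle_products(regs, fixed, 10)  # then read the same fixed registers again
```

The multiplier-adders never change which register they read. Only the data moves, which is what lets the read wiring stay fixed across cycles.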
  10. The data processing method according to any one of claims 7 to 9, wherein completing the data processing task corresponding to each multiplier-adder group according to the data processing result of each multiplier-adder group in each data processing cycle comprises:
    for each target multiplier-adder in each multiplier-adder group, adding the products obtained by that target multiplier-adder in the respective data processing cycles to obtain a sum value;
    completing the data processing task corresponding to each multiplier-adder group based on the sum values of the target multiplier-adders included in that group.
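The accumulation in claim 10 can be sketched for a single group as below. This is a simplified model under stated assumptions: one data processing cycle per kernel element, unit stride, and the moving data modeled by indexing `image[i + ky][j + kx]` rather than by physically shifting registers. The function name `convolve_by_cycles` is hypothetical.

```python
def convolve_by_cycles(image, kernel, out_rows, out_cols):
    """Sum, per multiplier-adder (i, j), the products obtained over all cycles."""
    sums = [[0] * out_cols for _ in range(out_rows)]
    for ky, kernel_row in enumerate(kernel):
        for kx, weight in enumerate(kernel_row):      # one cycle per matrix element
            for i in range(out_rows):
                for j in range(out_cols):             # all multiplier-adders act in parallel
                    # Operand visible in the fixed register of (i, j) during this cycle.
                    sums[i][j] += image[i + ky][j + kx] * weight
    return sums


result = convolve_by_cycles(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0], [0, 1]],
    2, 2,
)
```

Each entry of `result` is the sum value of one target multiplier-adder, and together the entries form the group's convolution output.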
  11. The data processing method according to any one of claims 1 to 10, wherein:
    the data processing tasks include convolution processing tasks;
    the convolution processing tasks of different multiplier-adder groups correspond to different to-be-processed images.
  12. A data processing apparatus, comprising a controller, wherein the controller is configured to:
    group a plurality of multiplier-adders in a multiplier-adder array based on an operation step size to obtain a plurality of multiplier-adder groups;
    use each of the plurality of multiplier-adder groups to execute, in parallel, the data processing task corresponding to that multiplier-adder group.
  13. The data processing apparatus according to claim 12, wherein, when grouping the plurality of multiplier-adders in the multiplier-adder array based on the operation step size, the controller is specifically configured to: determine the number of multiplier-adder groups based on the operation step size; and group the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups.
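The grouping in claims 12 and 13 can be sketched as follows. The claims state only that the number of groups is determined from the operation step size. The specific rule used here (stride × stride groups, with a multiplier-adder at (r, c) assigned to group (r mod stride, c mod stride)) is an assumption made for illustration, and `group_macs` is a hypothetical name.

```python
def group_macs(rows, cols, stride):
    """Partition a rows x cols multiplier-adder array into groups.

    Assumption (not stated in the claims): there are stride * stride groups,
    and position (r, c) belongs to group (r % stride, c % stride).
    """
    groups = {}
    for r in range(rows):
        for c in range(cols):
            groups.setdefault((r % stride, c % stride), []).append((r, c))
    return groups


groups = group_macs(4, 4, 2)
```

With a 4 × 4 array and a step size of 2, this yields four groups of four multiplier-adders each, so each group could process a different image in parallel. With a step size of 1 it yields a single group covering the whole array.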
  14. A computer device, comprising a controller and a memory, wherein the memory stores machine-readable instructions executable by the controller, the controller is configured to execute the machine-readable instructions stored in the memory, and when the machine-readable instructions are executed by the controller, the controller performs the steps of the data processing method according to any one of claims 1 to 11.
  15. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is run by a computer device, the computer device performs the steps of the data processing method according to any one of claims 1 to 11.
PCT/CN2021/115799 2021-01-31 2021-08-31 Data processing method and apparatus, computer device, and storage medium WO2022160706A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110132573.X 2021-01-31
CN202110132573.XA CN112927125B (en) 2021-01-31 2021-01-31 Data processing method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022160706A1 (en) 2022-08-04

Family

ID=76169016

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/115799 WO2022160706A1 (en) 2021-01-31 2021-08-31 Data processing method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN112927125B (en)
WO (1) WO2022160706A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927125B (en) * 2021-01-31 2023-06-23 成都商汤科技有限公司 Data processing method, device, computer equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105205191A (en) * 2014-06-12 2015-12-30 济南概伦电子科技有限公司 Multi-rate parallel circuit simulation
CN108197705A (en) * 2017-12-29 2018-06-22 国民技术股份有限公司 Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium
CN110020678A (en) * 2019-03-25 2019-07-16 联想(北京)有限公司 A kind of data processing method, electronic equipment and computer storage medium
CN110659446A (en) * 2018-06-29 2020-01-07 合一智芯科技(北京)有限公司 Convolution operation control method, device, medium and program product
CN110705687A (en) * 2019-09-05 2020-01-17 北京三快在线科技有限公司 Convolution neural network hardware computing device and method
US20200371798A1 (en) * 2013-07-15 2020-11-26 Texas Instruments Incorporated Method and Apparatus for Vector Based Matrix Multiplication
CN112927125A (en) * 2021-01-31 2021-06-08 成都商汤科技有限公司 Data processing method and device, computer equipment and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101827107A (en) * 2010-05-11 2010-09-08 南京大学 IEEE802.1AE protocol-based GCM high-speed encryption and decryption equipment
CN104915322B (en) * 2015-06-09 2018-05-01 中国人民解放军国防科学技术大学 A kind of hardware-accelerated method of convolutional neural networks
US10108397B2 (en) * 2015-08-25 2018-10-23 Samsung Electronics Co., Ltd. Fast close path solution for a three-path fused multiply-add design
CN107301455B (en) * 2017-05-05 2020-11-03 中国科学院计算技术研究所 Hybrid cube storage system for convolutional neural network and accelerated computing method
CN110796244B (en) * 2018-08-01 2022-11-08 上海天数智芯半导体有限公司 Core computing unit processor for artificial intelligence device and accelerated processing method
CN109284782B (en) * 2018-09-13 2020-10-02 北京地平线机器人技术研发有限公司 Method and apparatus for detecting features
US11544535B2 (en) * 2019-03-08 2023-01-03 Adobe Inc. Graph convolutional networks with motif-based attention
CN110058883B (en) * 2019-03-14 2023-06-16 梁磊 CNN acceleration method and system based on OPU
CN111581595B (en) * 2020-04-24 2024-02-13 科大讯飞股份有限公司 Matrix multiplication calculation method and calculation circuit

Also Published As

Publication number Publication date
CN112927125B (en) 2023-06-23
CN112927125A (en) 2021-06-08

Similar Documents

Publication Publication Date Title
JP7431913B2 (en) Efficient data layout for convolutional neural networks
US10943167B1 (en) Restructuring a multi-dimensional array
JP6821002B2 (en) Processing equipment and processing method
JP6857286B2 (en) Improved performance of neural network arrays
US11449576B2 (en) Convolution operation processing method and related product
Lai et al. Cmsis-nn: Efficient neural network kernels for arm cortex-m cpus
JP7358382B2 (en) Accelerators and systems for accelerating calculations
TW201824096A (en) Adaptive execution engine for convolution computing systems cross-reference to related applications
EP3093757B1 (en) Multi-dimensional sliding window operation for a vector processor
US10642622B2 (en) Arithmetic processing device and control method of the arithmetic processing device
WO2022160704A1 (en) Image processing method and apparatus, computer device and storage medium
CN110796236B (en) Vectorization implementation method for pooling of multi-sample multi-channel convolutional neural network
JP2023109847A (en) Image transformation for machine learning
CN114092336B (en) Image scaling method, device, equipment and medium based on bilinear interpolation algorithm
WO2022121474A1 (en) Method and system for optimizing convolutional residual structure of neural network, device, and medium
WO2022160706A1 (en) Data processing method and apparatus, computer device, and storage medium
JP7174831B2 (en) Video memory processing method, apparatus and recording medium based on convolutional neural network
CN109447239B (en) Embedded convolutional neural network acceleration method based on ARM
KR20220134035A (en) Processing-in-memory method for convolution operations
CN111931937B (en) Gradient updating method, device and system of image processing model
CN112668709B (en) Computing device and method for data reuse
JP2020191012A (en) Image processing apparatus, imaging apparatus, and image processing method
CN113657587B (en) Deformable convolution acceleration method and device based on FPGA
US11915338B2 (en) Loading apparatus and method for convolution with stride or dilation of 2
KR102372869B1 (en) Matrix operator and matrix operation method for artificial neural network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21922301

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21922301

Country of ref document: EP

Kind code of ref document: A1