CN112927125B - Data processing method, device, computer equipment and storage medium - Google Patents

Data processing method, device, computer equipment and storage medium

Info

Publication number
CN112927125B
Authority
CN
China
Prior art keywords
multiply
add
data processing
adder
multiplier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110132573.XA
Other languages
Chinese (zh)
Other versions
CN112927125A (en)
Inventor
周军
周亮
常亮
王文强
吴飞
徐宁仪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Chengdu Sensetime Technology Co Ltd
Original Assignee
University of Electronic Science and Technology of China
Chengdu Sensetime Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China, Chengdu Sensetime Technology Co Ltd filed Critical University of Electronic Science and Technology of China
Priority to CN202110132573.XA priority Critical patent/CN112927125B/en
Publication of CN112927125A publication Critical patent/CN112927125A/en
Priority to PCT/CN2021/115799 priority patent/WO2022160706A1/en
Application granted granted Critical
Publication of CN112927125B publication Critical patent/CN112927125B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/60: Memory management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure provides a data processing method, apparatus, computer device, and storage medium. The method includes: grouping a plurality of multiplier-adders in a multiplier-adder array based on the operation step size of a matrix operand to obtain at least one multiplier-adder group; and using each multiplier-adder group of the at least one multiplier-adder group to execute, in parallel, the data processing task corresponding to that group. This enables the multiplier-adder array to process a plurality of data processing tasks simultaneously and improves its processing efficiency. In addition, because the array is grouped based on the operation step size of the operand, a multiplier-adder whose result is not needed for one data processing task produces a needed result for another data processing task, which raises the utilization of the multiplier-adder array and reduces wasted computing resources.

Description

Data processing method, device, computer equipment and storage medium
Technical Field
The disclosure relates to the technical field of computers, and in particular relates to a data processing method, a data processing device, computer equipment and a storage medium.
Background
At present, convolutional neural networks mainly rely on a multiplier-adder array for convolution processing. The image data to be processed in a data processing task is stored in a register array corresponding to the multiplier-adder array, and this data moves within the register array across different data processing periods. However, current data processing methods suffer from low utilization of the multiplier-adder array and wasted computing resources.
Disclosure of Invention
The embodiment of the disclosure at least provides a data processing method, a data processing device, computer equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a data processing method, including:
grouping a plurality of multiplier-adders in a multiplier-adder array based on the operation step size of a matrix operand to obtain at least one multiplier-adder group; and using each multiplier-adder group of the at least one multiplier-adder group to execute, in parallel, the data processing task corresponding to that group.
Grouping the multiplier-adder array in this way allows it to process a plurality of data processing tasks at the same time, which improves its processing efficiency. In addition, because the grouping is based on the operation step size of the operand, a multiplier-adder whose result is not needed for one data processing task produces a needed result for another data processing task, which raises the utilization of the multiplier-adder array and reduces wasted computing resources.
In one possible implementation, within any row of the multiplier-adder array, every two adjacent multiplier-adders of the same group are separated by the same nonzero number of multiplier-adders that do not belong to that group; likewise, within any column of the multiplier-adder array, every two adjacent multiplier-adders of the same group are separated by the same nonzero number of multiplier-adders that do not belong to that group.
Therefore, based on the grouping condition of the multiplier-adder array, each multiplier-adder group can be guaranteed to process different data processing tasks, so that the multiplier-adder array can process a plurality of data processing tasks at the same time, and the processing efficiency of the multiplier-adder array on the data processing tasks is improved.
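The spacing condition above can be illustrated with a minimal sketch. The rule used here (taking the row and column index modulo the operation step size, called `stride` in the code) is an assumption chosen because it satisfies the stated condition; the disclosure does not confirm that this exact rule is used, and the names `group_id`, `group_map`, and `gaps` are hypothetical.

```python
def group_id(row, col, stride):
    """Hypothetical grouping rule (an assumption, not taken from the patent):
    a multiplier-adder's group is fixed by its row and column index modulo
    the operation step size."""
    return (row % stride) * stride + (col % stride)


def group_map(rows, cols, stride):
    """Group number of every multiplier-adder in a rows x cols array."""
    return [[group_id(r, c, stride) for c in range(cols)] for r in range(rows)]


def gaps(line, gid):
    """Set of counts of non-group multiplier-adders lying between adjacent
    members of group `gid` along one row or one column."""
    idx = [i for i, g in enumerate(line) if g == gid]
    return {b - a - 1 for a, b in zip(idx, idx[1:])}


layout = group_map(4, 4, 2)
for row in layout:
    print(row)

first_row = layout[0]
first_col = [row[0] for row in layout]
print(gaps(first_row, 0), gaps(first_col, 0))
```

Under this assumed rule, a step size of 2 yields four interleaved groups, and adjacent members of one group are always separated by exactly one multiplier-adder of another group, in both the row and the column direction.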
In one possible implementation, grouping the plurality of multiplier-adders in the multiplier-adder array based on the operation step size of the matrix operand includes: determining the number of multiplier-adder groups based on the operation step size of the matrix operand; and grouping the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups.
Therefore, the processing result of each multiply-add device group of the multiply-add device array on the data task of the multiply-add device group is ensured to be effective, so that the multiply-add device array can process a plurality of data processing tasks at the same time, and the processing efficiency of the multiply-add device array on the data processing tasks is improved.
In one possible implementation, grouping the plurality of multiplier-adders in the multiplier-adder array based on the number of multiplier-adder groups includes: determining the first target multiplier-adder of each multiplier-adder group from the multiplier-adder array; and determining, from the multiplier-adder array, the other target multiplier-adders of each group based on the position of the first target multiplier-adder in the array, the operation step size of the matrix operand, and the size of the multiplier-adder array.
In this way, once the position of the first target multiplier-adder of each group in the array has been determined, the positions of the other target multiplier-adders of that group can be derived from it, which improves the efficiency of grouping the multiplier-adder array.
In one possible implementation, determining the other target multiplier-adders of each group based on the position of the first target multiplier-adder in the array, the operation step size of the matrix operand, and the size of the multiplier-adder array includes, for each multiplier-adder group: determining, based on the position of the first target multiplier-adder of the group in the array and the operation step size of the matrix operand, a first positional relationship in the array between each multiplier-adder in a row of the group, other than the first multiplier-adder of that row, and the adjacent previous multiplier-adder of the group; determining, based on the position of the first target multiplier-adder of the group in the array, the operation step size of the matrix operand, and the number of columns of the array, a second positional relationship in the array between each multiplier-adder of the group and its adjacent previous multiplier-adder of the group; and determining the target positions in the array of the other target multiplier-adders of the group based on the first positional relationship and/or the second positional relationship.
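One way to read the two positional relationships is sketched below, under stated assumptions: multiplier-adders are numbered in row-major order, the first relationship is "previous index plus the step size" within a row, and the second is "previous row start plus step size times the number of columns" between rows of the same group. These formulas and the function name `group_positions` are illustrative assumptions, not the formulas of the disclosure.

```python
def group_positions(first_row, first_col, stride, rows, cols):
    """Row-major indices of all multiplier-adders in the group whose first
    target multiplier-adder sits at (first_row, first_col).

    Assumed relationships (not confirmed by the source text):
      within a row:  next_index = previous_index + stride
      between rows:  next_row_start = previous_row_start + stride * cols
    """
    positions = []
    row_start = first_row * cols + first_col
    r = first_row
    while r < rows:
        index, c = row_start, first_col
        while c < cols:
            positions.append(index)
            index += stride          # first positional relationship
            c += stride
        row_start += stride * cols   # second positional relationship
        r += stride
    return positions


print(group_positions(0, 0, 2, 4, 4))
print(group_positions(1, 1, 2, 4, 4))
```

The sketch only needs the first position, the step size, and the array size, which matches the inputs the paragraph above lists for this step.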
In one possible implementation, the determining the first target multiply-adder in each multiply-adder group from the multiply-adder array includes: determining a target matrix based on the matrix operand operation step size and the size of the multiply-add array; and determining the position of the first target multiply-adder in each multiply-adder group in the multiply-adder array according to the matrix element values of the target matrix.
In one possible implementation, using each multiplier-adder group of the at least one multiplier-adder group to execute, in parallel, the data processing task corresponding to that group includes: storing the image data to be processed corresponding to each group into the register array corresponding to that group, according to the position in the multiplier-adder array of each target multiplier-adder of the group; for each data processing period of a plurality of data processing periods, reading from the register array of each group the image data to be processed by that group in the period, and processing the read data to obtain, in parallel, the data processing results of the groups for the period; and completing the data processing task of each group according to the data processing results of that group in the respective data processing periods.
In this way, the multiplier-adder array can read the corresponding operand in different data processing periods, so that each multiplier-adder group can process the corresponding data processing task, and the effectiveness of the multiplier-adder array on the processing result of the data processing task is guaranteed.
In one possible implementation, storing the image data to be processed corresponding to each multiplier-adder group into the register array corresponding to that group, according to the position in the multiplier-adder array of each target multiplier-adder of the group, includes: determining the number of registers contained in the register array corresponding to each multiplier-adder group according to the size of the matrix operand; for each group, determining the position, in the corresponding register array, of the register that each target multiplier-adder of the group fixedly reads; and for each group, storing the image data to be processed corresponding to the group into the register array of the group according to the position of each target multiplier-adder of the group in the multiplier-adder array, the position of the register fixedly read by each target multiplier-adder of the group, and the order in which the operands contained in the image data are processed, so that in each data processing period the operand stored in the register fixedly read by each target multiplier-adder corresponds to a matrix element of the matrix operand for that period.
In one possible implementation, reading, for each data processing period of the plurality of data processing periods, the image data to be processed by each multiplier-adder group in that period from the register array of the group, and processing the read data to obtain, in parallel, the data processing results of the groups for the period, includes the following. For the first data processing period in which the image data is processed: controlling each target multiplier-adder of each group to read, from the register it fixedly reads, its operand for the first data processing period as a first operand; determining the matrix element of the matrix operand corresponding to each group in the first data processing period as a second operand; and determining, for each target multiplier-adder, the product of its first operand and second operand in the first data processing period. For each non-first data processing period in which the image data is processed: controlling the image data to move by a preset step size within the register array according to a preset data movement mode corresponding to the period; controlling each target multiplier-adder of each group to read, from the register it fixedly reads, its operand for the period as a first operand; determining the matrix element of the matrix operand corresponding to each group in the period as a second operand; and determining, for each target multiplier-adder, the product of its first operand and second operand in the period.
In this way, based on the preset step size and the preset data movement mode, the operands are shifted in sequence within the register array as the data processing periods advance, so that the corresponding multiplier-adders in the array obtain valid data and the processing result of the data processing task remains valid.
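The fixed-read behaviour can be sketched in one dimension as follows. This is a simplified illustration under stated assumptions: the register array is a flat list, the preset data movement mode is "shift left by one register" in every non-first period, vacated registers are filled with zeros, and the function name `read_operands_per_cycle` is hypothetical. The actual movement mode of the disclosure may differ.

```python
def read_operands_per_cycle(registers, read_positions, cycles, step=1):
    """For each data processing period, return the first operands seen by
    multiplier-adders that always read the same register positions.

    Assumption: in every non-first period the stored data moves left by
    `step` registers; zeros fill the vacated tail.
    """
    regs = list(registers)
    seen = []
    for cycle in range(cycles):
        if cycle > 0:
            regs = regs[step:] + [0] * step   # data moves, readers stay put
        seen.append([regs[p] for p in read_positions])
    return seen


# Two target multiplier-adders fixedly reading registers 0 and 2.
reads = read_operands_per_cycle([10, 11, 12, 13, 14, 15],
                                read_positions=[0, 2], cycles=3)
print(reads)  # [[10, 12], [11, 13], [12, 14]]
```

Because the readers never change position, only the data movement determines which operand each multiplier-adder sees in a given period.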
In one possible implementation manner, the completing the data processing tasks corresponding to the multiplier-adder groups according to the data processing results corresponding to the multiplier-adder groups in each data processing period includes: for each target multiply-adder in each multiply-adder group, adding products obtained by the target multiply-adder in each data processing period to obtain a sum; and completing the data processing tasks corresponding to the multiplier-adder groups respectively based on the sum values corresponding to the target multiplier-adders contained in the multiplier-adder groups.
In one possible implementation, the data processing task includes: a convolution processing task; the images to be processed corresponding to the convolution processing tasks of different multiplier-adder groups are different.
Therefore, the multiplier-adder array can process a plurality of images to be processed at the same time, and the processing efficiency of the multiplier-adder array on the images to be processed is improved.
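The accumulation of per-period products and the parallel handling of different inputs by different groups can be sketched together in one dimension. This is a minimal sketch, not the disclosed hardware: it assumes one register array per group, a left shift of one register per period, one shared kernel applied to every group, and equal-length inputs; `run_groups` and `strided_correlation` are hypothetical names, and the latter serves only as a reference result.

```python
def strided_correlation(signal, kernel, stride):
    """Reference: direct 1-D strided multiply-accumulate."""
    n_out = (len(signal) - len(kernel)) // stride + 1
    return [sum(signal[o * stride + t] * kernel[t] for t in range(len(kernel)))
            for o in range(n_out)]


def run_groups(signals, kernel, stride):
    """Simulate `stride` interleaved groups, one input per group.

    Assumptions: each group has its own register array; target
    multiplier-adder k of a group fixedly reads register k * stride;
    the data shifts left by one register in every non-first period.
    """
    assert len(signals) == stride
    n_out = (len(signals[0]) - len(kernel)) // stride + 1
    regs = [list(s) for s in signals]
    read_pos = [k * stride for k in range(n_out)]
    sums = [[0] * n_out for _ in range(stride)]
    for t, weight in enumerate(kernel):       # one kernel element per period
        if t > 0:
            regs = [r[1:] + [0] for r in regs]
        for g in range(stride):               # conceptually parallel groups
            for k, p in enumerate(read_pos):
                sums[g][k] += regs[g][p] * weight   # accumulate the product
    return sums


image_a = [1, 2, 3, 4, 5, 6]
image_b = [6, 5, 4, 3, 2, 1]
kernel = [1, 0, -1]
result = run_groups([image_a, image_b], kernel, stride=2)
print(result)
```

In this toy setting, each group's sums match the direct strided result for its own input, so positions that would be idle for one input are doing useful work for the other.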
In a second aspect, an embodiment of the present disclosure further provides a data processing apparatus, including: a controller; the controller is used for:
grouping a plurality of multiply-add devices in the multiply-add device array based on the matrix operand operation step length to obtain at least one multiply-add device group;
and executing data processing tasks corresponding to each multiplier-adder group in parallel by utilizing each multiplier-adder group in the at least one multiplier-adder group.
In one possible implementation, within any row of the multiplier-adder array, every two adjacent multiplier-adders of the same group are separated by the same nonzero number of multiplier-adders that do not belong to that group; likewise, within any column of the multiplier-adder array, every two adjacent multiplier-adders of the same group are separated by the same nonzero number of multiplier-adders that do not belong to that group.
In one possible implementation, the controller is specifically configured to determine the number of multiply-add groups based on a matrix operand operation step size when grouping a plurality of multiply-add devices in a multiply-add array based on the matrix operand operation step size; a plurality of multiply-add devices in the multiply-add device array are grouped based on the number of multiply-add device groups.
In one possible implementation, the controller is specifically configured to determine a first target multiply-add device in each of the multiply-add groups from the multiply-add array when grouping a plurality of multiply-add devices in the multiply-add array based on the number of multiply-add groups; and determining other target multiply-add devices in each multiply-add device group except the first target multiply-add device from the multiply-add device array based on the position of the first multiply-add device in the multiply-add device array, the matrix operand operation step size and the size of the multiply-add device array.
In one possible implementation, when determining the other target multiplier-adders of each group based on the position of the first target multiplier-adder in the array, the operation step size of the matrix operand, and the size of the multiplier-adder array, the controller is specifically configured to, for each multiplier-adder group: determine, based on the position of the first target multiplier-adder of the group in the array and the operation step size of the matrix operand, a first positional relationship in the array between each multiplier-adder in a row of the group, other than the first multiplier-adder of that row, and the adjacent previous multiplier-adder of the group; determine, based on the position of the first target multiplier-adder of the group in the array, the operation step size of the matrix operand, and the number of columns of the array, a second positional relationship in the array between each multiplier-adder of the group and its adjacent previous multiplier-adder of the group; and determine the target positions in the array of the other target multiplier-adders of the group based on the first positional relationship and/or the second positional relationship.
In one possible implementation, in determining a first target multiply-add in each of the multiply-add groups from the multiply-add array, the controller is configured to determine a target matrix based in particular on the matrix operand operation step size, the size of the multiply-add array; and determining the position of the first target multiply-adder in each multiply-adder group in the multiply-adder array according to the matrix element values of the target matrix.
In a possible implementation manner, when each of the at least one multiply-add device group is utilized to perform a data processing task corresponding to each of the multiply-add device groups, the controller is specifically configured to store image data to be processed corresponding to each of the multiply-add device groups into a register array corresponding to each of the multiply-add device groups according to a position of each target multiply-add device in each of the multiply-add device groups in the multiply-add device array; for each data processing period in a plurality of data processing periods, respectively reading image data to be processed corresponding to each multiply-add device group in the data processing period from a register array corresponding to each multiply-add device group; processing the read image data to be processed to obtain data processing results of the multiplier-adder groups in the data processing period in parallel; and according to the data processing results of each multiplier-adder group in each data processing period, completing the data processing tasks of each multiplier-adder group.
In one possible implementation, when storing the image data to be processed corresponding to each multiplier-adder group into the register array corresponding to that group according to the position in the multiplier-adder array of each target multiplier-adder of the group, the controller is specifically configured to: determine the number of registers contained in the register array corresponding to each multiplier-adder group according to the size of the matrix operand; for each group, determine the position, in the corresponding register array, of the register that each target multiplier-adder of the group fixedly reads; and for each group, store the image data to be processed corresponding to the group into the register array of the group according to the position of each target multiplier-adder of the group in the multiplier-adder array, the position of the register fixedly read by each target multiplier-adder of the group, and the order in which the operands contained in the image data are processed, so that in each data processing period the operand stored in the register fixedly read by each target multiplier-adder corresponds to a matrix element of the matrix operand for that period.
In one possible implementation, when reading, for each data processing period of the plurality of data processing periods, the image data to be processed by each multiplier-adder group in that period from the register array of the group, and processing the read data to obtain, in parallel, the data processing results of the groups for the period, the controller is specifically configured as follows. For the first data processing period in which the image data is processed: control each target multiplier-adder of each group to read, from the register it fixedly reads, its operand for the first data processing period as a first operand; determine the matrix element of the matrix operand corresponding to each group in the first data processing period as a second operand; and determine, for each target multiplier-adder, the product of its first operand and second operand in the first data processing period. For each non-first data processing period in which the image data is processed: control the image data to move by a preset step size within the register array according to a preset data movement mode corresponding to the period; control each target multiplier-adder of each group to read, from the register it fixedly reads, its operand for the period as a first operand; determine the matrix element of the matrix operand corresponding to each group in the period as a second operand; and determine, for each target multiplier-adder, the product of its first operand and second operand in the period.
In one possible implementation, when completing the data processing task of each multiplier-adder group according to the data processing results of that group in each data processing period, the controller is specifically configured to: for each target multiplier-adder in each group, add the products obtained by that target multiplier-adder in the respective data processing periods to obtain a sum; and complete the data processing task of each group based on the sums of the target multiplier-adders contained in the group.
In one possible implementation, the data processing task includes: a convolution processing task; the images to be processed corresponding to the convolution processing tasks of different multiplier-adder groups are different.
In a third aspect, an optional implementation of the present disclosure further provides a computer device including a controller and a memory, where the memory stores machine-readable instructions executable by the controller, and the controller is configured to execute the machine-readable instructions stored in the memory; when executed by the controller, the machine-readable instructions perform the steps of the first aspect or of any possible implementation of the first aspect.
In a fourth aspect, an alternative implementation of the present disclosure further provides a computer readable storage medium having stored thereon a computer program which when executed performs the steps of the first aspect, or any of the possible implementation manners of the first aspect.
The description of the effects of the data processing apparatus, the computer device, and the computer-readable storage medium refers to the description of the data processing method, and is not repeated here.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings required for the embodiments are briefly described below. These drawings are incorporated in and constitute a part of the specification; they show embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It is to be understood that the following drawings illustrate only certain embodiments of the present disclosure and are therefore not to be considered limiting of its scope; a person of ordinary skill in the art may derive other related drawings from them without inventive effort.
FIG. 1 illustrates a flow chart of a data processing method provided by an embodiment of the present disclosure;
FIG. 2 illustrates an exemplary diagram of a multiply-add array provided by embodiments of the present disclosure;
FIG. 3 illustrates an example diagram of movement based on the operation step size of a matrix operand provided by an embodiment of the present disclosure;
FIG. 4 illustrates an exemplary diagram of a division of a multiply-add array into four multiply-add groups provided by the present disclosure;
FIG. 5 illustrates an exemplary diagram of a matrix, provided by embodiments of the present disclosure, for determining the position in the multiplier-adder array of the first target multiplier-adder of each multiplier-adder group;
FIG. 6 illustrates an example diagram of a multiplier-adder array and corresponding register array provided by an embodiment of the disclosure;
FIG. 7 illustrates an example diagram of a register array a after the image data to be processed is shifted left one step in its entirety in an embodiment of the present disclosure;
FIG. 8 shows a schematic diagram of a data processing apparatus provided by an embodiment of the present disclosure;
FIG. 9 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the disclosed embodiments generally described and illustrated herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
Study has shown that convolutional neural networks rely primarily on multiplier-adder arrays for convolution processing. During convolution processing, the image data to be processed is stored in a register array connected to the multiplier-adder array, and the stored data is moved within the register array in different data processing periods. In each data processing period, each multiplier-adder of the array reads its operand for that period from the register (belonging to the register array) to which it is connected and performs a multiplication and/or addition operation. After a plurality of data processing periods, the multiplier-adder array outputs a partial result of the convolution of the image data to be processed. When the operation step size of the matrix operand is greater than 1, the results produced by some of the multiplier-adders in the array are not needed in the processing result of the image data; in this case the data processing approach therefore suffers from low utilization of the multiplier-adder array and wasted computing resources.
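The utilization problem can be made concrete with a rough estimate. The sketch below assumes, for illustration only, that each multiplier-adder in an ungrouped array computes the output for one window position at step size 1, so that with a larger step size only every `stride`-th row and column position yields a needed output; `useful_fraction` is a hypothetical helper, and the source gives no such formula.

```python
import math


def useful_fraction(rows, cols, stride):
    """Fraction of multiplier-adders whose outputs are needed, assuming one
    multiplier-adder per step-size-1 window position (an assumption)."""
    useful = math.ceil(rows / stride) * math.ceil(cols / stride)
    return useful / (rows * cols)


for s in (1, 2, 4):
    print(s, useful_fraction(8, 8, s))
```

Under this assumption, a step size of 2 leaves three quarters of the array producing results that are discarded, which is the waste the grouping scheme aims to recover.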
Based on the above study, the disclosure provides a data processing method, apparatus, computer device and storage medium, where at least one multiply-add device group is obtained by grouping multiply-add device arrays based on operation steps of matrix operands, so that different multiply-add device groups in the multiply-add device arrays can respectively process data processing tasks corresponding to different image data to be processed in parallel, that is, the same multiply-add device array can simultaneously process a plurality of image data to be processed, each multiply-add device group processes one image data to be processed, so that multiply-add devices which are not used in a process of processing one image data to be processed are used for processing other image data to be processed, thereby improving utilization rate of the multiply-add device arrays, reducing waste of computing resources, and improving processing efficiency of the multiply-add device arrays to process the image data to be processed.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
To facilitate understanding of the present embodiment, a data processing method disclosed in an embodiment of the present disclosure is first described in detail. The execution body of the data processing method provided in the embodiment of the present disclosure is generally a computer device having a certain computing capability, for example a terminal device, a server, or another processing device. The terminal device may be a User Equipment (UE), mobile device, user terminal, cellular telephone, cordless telephone, personal digital assistant (Personal Digital Assistant, PDA), handheld device, computing device, vehicle mounted device, wearable device, etc. In some possible implementations, the data processing method may be implemented by way of a processor invoking computer readable instructions stored in a memory.
The data processing method provided by the embodiment of the present disclosure is described below.
Referring to fig. 1, a flowchart of a data processing method according to an embodiment of the disclosure is shown, where the method includes steps S101 to S102, where:
S101: grouping a plurality of multiply-adders in the multiply-adder array based on the operation step size of the matrix operand to obtain at least one multiply-adder group;
S102: executing, in parallel, the data processing task corresponding to each multiply-adder group by using each of the at least one multiply-adder group.
Grouping multiplier-adder arrays based on operand operation step sizes to obtain at least one multiplier-adder group, and enabling each multiplier-adder group in the at least one multiplier-adder group to execute data processing tasks corresponding to the multiplier-adder group in parallel; the data processing tasks processed by each multiply-add device group are different, so that the multiply-add device array can process a plurality of data processing tasks at the same time, and the processing efficiency of the multiply-add device array on the data processing tasks is improved.
In addition, in the processing mode, the multiplier-adder which is not used in the processing process of one image data to be processed is used for processing other image data to be processed, so that the utilization rate of the multiplier-adder array is improved, and the waste of calculation resources is reduced.
The following describes the above-mentioned S101 to S102 in detail.
For S101, the multiply-adder array is a matrix array formed by at least one multiply-adder. An exemplary diagram of a multiply-adder array provided in the present disclosure is shown in fig. 2, where the multiply-adder array includes 16 multiply-adders in total, arranged in 4 rows and 4 columns. The matrix operand includes, for example, a convolution kernel when processing image data to be processed; the matrix operand operation step size is, for example, the moving step size of the convolution kernel. By way of example, FIG. 3 shows a convolution kernel moving with a step size of 2: S_x = 2, S_y = 2. The kernel moves, for example, from the first target position shown in a to the second target position shown in b, and then from the second target position shown in b to the third target position shown in c, i.e., two pixels at a time in the lateral direction and two pixels at a time in the longitudinal direction; here S_x denotes the number of pixels moved in the lateral direction and S_y denotes the number of pixels moved in the longitudinal direction.
In grouping the multiply-add devices in the multiply-add device array, the number of multiply-add device groups may be determined based on the matrix operand operation steps, for example, and the plurality of multiply-add devices in the multiply-add device array may be grouped based on the number of multiply-add device groups.
In a specific implementation, the relationship between the matrix operand operation step size and the number of multiply-adder groups is: number of multiply-adder groups = S_x * S_y. For example, when the operation step size of the matrix operand is 2, S_x = 2 and S_y = 2, so the number of multiply-adder groups = S_x * S_y = 2 * 2 = 4.
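The relationship above can be sketched in a few lines of Python. This is only an illustration; the function name `num_groups` and the argument names `s_x` and `s_y` are hypothetical and do not appear in the disclosure.

```python
def num_groups(s_x: int, s_y: int) -> int:
    """Number of multiply-adder groups for lateral stride s_x and longitudinal stride s_y."""
    if s_x < 1 or s_y < 1:
        raise ValueError("strides must be positive integers")
    # One group per (row offset, column offset) pair inside an s_y-by-s_x tile.
    return s_x * s_y


print(num_groups(2, 2))  # 4, matching the example with S_x = S_y = 2
```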
In a specific implementation, an embodiment of the present disclosure provides a specific method for grouping a plurality of multiply-add devices in a multiply-add device array based on a matrix operand operation step size to obtain at least one multiply-add device group, including:
determining a first target multiply-adder in each multiply-adder group from the multiply-adder array;
the other target multiply-adders in each multiply-adder group, except for the first target multiply-adder, are determined from the multiply-adder array based on the position of the first target multiply-adder in the multiply-adder array, the matrix operand operation step size, and the size of the multiply-adder array.
In some cases, the size of the multiply-adder array is fixed while the size of the image data to be processed varies with the actual image processing situation, so even when the data processing method provided by the embodiments of the present disclosure is used to process multiple pieces of image data in parallel, the utilization of the multiply-adder array often cannot reach one hundred percent. Therefore, in the embodiments of the present disclosure, the size information of the actually used multiply-adder array is first determined based on the matrix operand operation step size and the size information of the multiply-adder array. The size information of the multiply-adder array comprises the number of rows and the number of columns of the multiply-adder array, and the size information of the actually used multiply-adder array comprises the number of rows and the number of columns of the actually used multiply-adder array. The relationship between these quantities is: A'_x = A_x - A_x % S_x; A'_y = A_y - A_y % S_y, where A_x is the number of columns of the multiply-adder array, A_y is the number of rows of the multiply-adder array, A'_x is the number of columns of the actually used multiply-adder array, A'_y is the number of rows of the actually used multiply-adder array, and % is the remainder operation. For example, when the matrix operand operation step size is 2, S_x = 2 and S_y = 2, and the size information of the multiply-adder array is A_x = 5, A_y = 5; the number of columns of the actually used multiply-adder array is then A'_x = A_x - A_x % S_x = 5 - 5 % 2 = 4, and the number of rows of the actually used multiply-adder array is A'_y = A_y - A_y % S_y = 5 - 5 % 2 = 4.
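As a quick check of the size relationship, the following Python sketch reproduces the 5 by 5 example. The function name `used_array_size` is hypothetical; Python's `%` operator is the remainder operation used in the formula for positive integers.

```python
def used_array_size(a_x: int, a_y: int, s_x: int, s_y: int) -> tuple[int, int]:
    """Return (A'_x, A'_y): columns and rows of the actually used multiply-adder array."""
    used_cols = a_x - a_x % s_x  # drop trailing columns that do not fill a full stride
    used_rows = a_y - a_y % s_y  # drop trailing rows that do not fill a full stride
    return used_cols, used_rows


print(used_array_size(5, 5, 2, 2))  # (4, 4)
```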
The first target multiply-adder for each multiply-adder group is then determined in the actually used multiply-adder array.
In particular implementations, the first target multiply-adder of each multiply-adder group may be determined, for example, in the following manner:
determining a target matrix based on the matrix operand operation step size and the size of the multiply-adder array; determining the position of the first target multiply-adder of each multiply-adder group in the multiply-adder array according to the matrix element values of the target matrix; wherein each matrix element value of the target matrix indicates the position of a multiply-adder.
By way of example, the size information of the actually used multiply-adder array is 4 rows and 4 columns, and when the operation step size of the matrix operand is 2, S_x = 2 and S_y = 2, so the number of multiply-adder groups = S_x * S_y = 4. The position arrangement numbers of the actually used multiply-adder array are, for example:
\begin{bmatrix} 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \\ 8 & 9 & 10 & 11 \\ 12 & 13 & 14 & 15 \end{bmatrix}
The first target multiply-adder of the first multiply-adder group, i.e., the first multiply-adder of the target matrix, is at position 0. The position arrangement numbers of the two-row, two-column target matrix determined from the first target multiply-adder of the first group at position 0 in the actually used multiply-adder array are then:
\begin{bmatrix} 0 & 1 \\ 4 & 5 \end{bmatrix}
The first target multiply-adders of the other three multiply-adder groups therefore have position numbers 1, 4 and 5, respectively, in the actually used multiply-adder array, as in the target matrix shown in fig. 4.
For example, the target position of the first target multiply-adder of each multiply-adder group in the multiply-adder array may be determined with reference to the formula corresponding to each position of the matrix shown in fig. 5; that is, each matrix element in the matrix shown in fig. 5 represents the position of the first target multiply-adder of one multiply-adder group. Here A'_x is the number of columns of the actually used multiply-adder array and A'_y is the number of rows of the actually used multiply-adder array, with A'_x = A_x - A_x % S_x and A'_y = A_y - A_y % S_y; A_x is the number of columns of the multiply-adder array and A_y is the number of rows of the multiply-adder array; S_x is the lateral movement step of the matrix operand operation step size and S_y is the longitudinal movement step of the matrix operand operation step size.
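The formulas of fig. 5 are not reproduced in this text, so the following Python sketch is an assumption that is merely consistent with the worked example (positions 0, 1, 4 and 5): it numbers the multiply-adders row by row and takes the first targets from the top-left block of S_y rows and S_x columns. The function name `first_target_positions` is hypothetical.

```python
def first_target_positions(used_cols: int, s_x: int, s_y: int) -> list[list[int]]:
    """Target matrix (s_y rows, s_x columns) of first-target positions.

    Assumes row-major position numbering, so the multiply-adder in row r and
    column c of the actually used array has position r * used_cols + c.
    """
    return [[row * used_cols + col for col in range(s_x)] for row in range(s_y)]


print(first_target_positions(4, 2, 2))  # [[0, 1], [4, 5]]
```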
After determining the first target multiply-adder of each multiply-adder group in the actually used multiply-adder array, for example, the other target multiply-adders except the first target multiply-adder in each multiply-adder group can be determined based on the following method from step one to step three:
step one: for each multiplier-adder group, determining a first position relation between each multiplier-adder of each row except the first multiplier-adder of the row in the multiplier-adder group and the adjacent previous multiplier-adder of the multiplier-adder based on the position of the first multiplier-adder in the multiplier-adder array and the operation step length of a matrix operand;
wherein the multiplicationThe first positional relationship between each multiply-add device except the first multiply-add device in each row in the adder group and the adjacent previous multiply-add device in the multiply-add device array is as follows: position +S of adjacent previous multiply-add x The position of the multiply-add for each row except the first multiply-add.
The actual multiplier-adder array used is, for example, 4 rows and 4 columns, i.e., A' y =4、A′ x The position arrangement number of the actually used multiply-add array is =4
Figure BDA0002925913150000111
When the operation step length of the matrix operand is 2, S x =2、S y =2, four different colors represent four multiplier-adder groups as shown in fig. 4: a first multiplier-adder group in black, a second multiplier-adder group in white, a third multiplier-adder group in light gray, and a fourth multiplier-adder group in dark gray, taking the first multiplier-adder group as an example, the first multiplier-adder of the first multiplier-adder group is at position 0, then the position of another multiplier-adder a in the same row is: 0+S x The position of the next multiply-add B after the row at position 2 in the same group is =0+2=2: 2+S x =2+2=4, but since the size of the actually used multiply-add array is 4 columns, the row has a maximum position arrangement number of 3, so the same group of multiply-add at position 0 has only multiply-add at position 2.
Step two: determining a second positional relationship between each multiply-add device except the first multiply-add device in each column of the multiply-add device group and the adjacent previous multiply-add device of the multiply-add device in the multiply-add device array based on the position of the first multiply-add device in the multiply-add device group in the multiply-add device array, the operation step length of the matrix operand and the column number of the multiply-add device array;
Wherein the second positional relationship between each multiply-adder in each column of the multiply-adder group, other than the first multiply-adder of that column, and its adjacent previous multiply-adder in the multiply-adder array is: position of the adjacent previous multiply-adder in the multiply-adder array + S_y * A'_x = position of each multiply-adder in the column other than the first.
For example, the actually used multiply-adder array has 4 rows and 4 columns, i.e., A'_y = 4 and A'_x = 4, and the position arrangement numbers of the actually used multiply-adder array are
\begin{bmatrix} 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \\ 8 & 9 & 10 & 11 \\ 12 & 13 & 14 & 15 \end{bmatrix}
When the operation step size of the matrix operand is 2, S_x = 2 and S_y = 2. As shown in fig. 4, taking the first multiply-adder group as an example, its first multiply-adder is at position 0, so the position of another multiply-adder C of the group in the same column is 0 + S_y * A'_x = 0 + 2 × 4 = 8, and the position of the next multiply-adder D of the group after position 8 would be 8 + S_y * A'_x = 8 + 2 × 4 = 16. However, since the actually used multiply-adder array has 4 rows, the maximum position arrangement number in that column is 12, so in that column the only multiply-adder in the same group as the one at position 0 is the one at position 8.
Step three: and determining the target positions of other target multiply-add devices except the first target multiply-add device in the multiply-add device group in the multiply-add device array based on the first position relation and/or the second position relation.
For example, the actually used multiply-adder array has 4 rows and 4 columns, i.e., A'_y = 4 and A'_x = 4, and the position arrangement numbers of the actually used multiply-adder array are
\begin{bmatrix} 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \\ 8 & 9 & 10 & 11 \\ 12 & 13 & 14 & 15 \end{bmatrix}
When the operation step size of the matrix operand is 2, S_x = 2 and S_y = 2. After the positions of the other multiply-adders in the row or the column of the first target multiply-adder have been calculated, the target positions of the remaining target multiply-adders of the group in the multiply-adder array can be determined by applying the above formulas for adjacent multiply-adders in the same row or in the same column. As shown in fig. 4, taking the first multiply-adder group as an example, its first multiply-adder is at position 0 and multiply-adder A in the same row is at position 2, so the position of the next multiply-adder E in the same column as multiply-adder A is 2 + S_y * A'_x = 2 + 2 × 4 = 2 + 8 = 10. Alternatively, since multiply-adder C in the same column as the multiply-adder at position 0 is at position 8, the position of the next multiply-adder E in the same row as multiply-adder C is 8 + S_x = 8 + 2 = 10.
Illustratively, as shown in FIG. 4, the present disclosure provides an exemplary diagram of a multiply-add array divided into four multiply-add groups, four different colors representing the four multiply-add groups, a first multiply-add group that is black, a second multiply-add group that is white, a third multiply-add group that is light gray, and a fourth multiply-add group that is dark gray; in the same row of the multiply-add array, the interval between two adjacent same-group multiply-add devices is the same and is not zero, and in the same column of the multiply-add array, the interval between two adjacent same-group multiply-add devices is the same and is not zero.
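Steps one to three can be sketched together in Python as follows. This is a minimal illustration under the assumption of row-major position numbering; the function name `group_positions` and its parameters are hypothetical. Stepping a column index by S_x corresponds to the first positional relationship (+ S_x), and stepping a row index by S_y corresponds to the second positional relationship (+ S_y * A'_x).

```python
def group_positions(first: int, used_cols: int, used_rows: int,
                    s_x: int, s_y: int) -> list[int]:
    """All positions in the group whose first target multiply-adder is at `first`."""
    row0, col0 = divmod(first, used_cols)  # row and column of the first target
    return [
        row * used_cols + col
        for row in range(row0, used_rows, s_y)   # second relationship: + s_y * used_cols
        for col in range(col0, used_cols, s_x)   # first relationship: + s_x
    ]


for first in (0, 1, 4, 5):
    print(first, group_positions(first, 4, 4, 2, 2))
# 0 [0, 2, 8, 10]
# 1 [1, 3, 9, 11]
# 4 [4, 6, 12, 14]
# 5 [5, 7, 13, 15]
```

The bounds of the two `range` calls play the role of the checks in the examples above, where candidate positions 4 (past the last column) and 16 (past the last row) are rejected.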
For S102, the images to be processed corresponding to the convolution processing tasks of different multiplier-adder groups are different, for example, each multiplier-adder group convolves a different data matrix.
When the data processing task corresponding to each of the multiplier-adder groups is executed in parallel by using each of the at least one multiplier-adder groups, the image data to be processed corresponding to each of the multiplier-adder groups is stored in the register array corresponding to each of the multiplier-adder groups according to the positions of the respective target multiplier-adders in each of the multiplier-adder groups in the multiplier-adder array.
Here, the image data to be processed includes, for example, at least one of:
an original image to be processed;
subgraphs corresponding to any color channel in an original image to be processed;
extracting features of the original image to obtain a feature map;
performing feature extraction on the original image to obtain a feature subgraph corresponding to at least one channel in the feature graph;
the sub-image corresponding to at least one color channel in the original image is subjected to data filling processing to obtain the sub-image;
and performing data filling processing on the feature subgraphs corresponding to at least one channel of the feature map.
Taking the feature map as the image data to be processed, when the image data to be processed is stored in the register array, a feature value of a feature point in the image data to be processed, which is also called an operand required by the multiply adder, is stored in each of at least some of the registers.
For each multiply-adder group, the position, in the corresponding register array, of the register that each target multiply-adder of the group fixedly reads is determined. As shown in fig. 6, the four multiply-adder groups shown in n correspond respectively to the four register arrays shown in m: the black multiply-adder group corresponds to the black register array A, the white multiply-adder group to the white register array B, the light gray multiply-adder group to the light gray register array C, and the dark gray multiply-adder group to the dark gray register array D. Each target multiply-adder reads the characteristic value stored in its fixedly read register in the corresponding register array: PE0 reads register A0, PE1 reads B0, PE2 reads A2, PE3 reads B2, PE4 reads C0, PE5 reads D0, PE6 reads C2, PE7 reads D2, PE8 reads A8, PE9 reads B8, PE10 reads A10, PE11 reads B10, PE12 reads C8, PE13 reads D8, PE14 reads C10, and PE15 reads D10.
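The correspondence between multiply-adders and fixedly read registers listed above follows a regular pattern, which the Python sketch below reproduces. The index formula is inferred from the listed pairs (for example PE0 to A0, PE3 to B2, PE12 to C8) rather than stated in the disclosure, and the function name `fixed_register_map` is hypothetical.

```python
def fixed_register_map(used_cols: int = 4, used_rows: int = 4,
                       s_x: int = 2, s_y: int = 2) -> dict[int, str]:
    """Map each multiply-adder position to the name of its fixedly read register."""
    mapping = {}
    for pos in range(used_cols * used_rows):
        row, col = divmod(pos, used_cols)
        # Group letter: A, B, C, D for offsets (0,0), (0,1), (1,0), (1,1) when s_x = s_y = 2.
        group = chr(ord("A") + (row % s_y) * s_x + (col % s_x))
        # Register index: position of the multiply-adder relative to its group's first target.
        reg_index = (row - row % s_y) * used_cols + (col - col % s_x)
        mapping[pos] = f"{group}{reg_index}"
    return mapping


regs = fixed_register_map()
print(regs[0], regs[3], regs[12], regs[15])  # A0 B2 C8 D10
```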
For each multiply-adder group, the image data to be processed corresponding to the group is stored into the register array corresponding to the group according to the position of each target multiply-adder of the group in the multiply-adder array, the position of the register fixedly read by each target multiply-adder of the group, and the processing order, during data processing, of the operands contained in the image data to be processed, so that in each data processing period the register fixedly read by each target multiply-adder stores the operand corresponding to the matrix element of the matrix operand for that period.
Wherein the matrix operands comprise, for example, a convolution kernel, i.e., a data matrix, in a convolution calculation, exemplary,
Figure BDA0002925913150000131
is a matrix operand with two rows and two columns, comprising the matrix elements W_0, W_1, W_2 and W_3. The number of operands contained in the image data to be processed corresponding to each multiply-adder group should be the same. The image data to be processed corresponding to the first multiply-adder group, as shown in FIG. 6, is
Figure BDA0002925913150000141
The storage rule of the image data to be processed in the black register array corresponding to the first multiplier-adder group is shown as a in fig. 6, and the image data to be processed corresponding to the second multiplier-adder group is
Figure BDA0002925913150000142
The storage rule of the image data to be processed in the white register array corresponding to the second multiply-adder group is shown as b in fig. 6, and the image data to be processed corresponding to the third multiply-adder group is
Figure BDA0002925913150000143
The storage rule of the image data to be processed in the light gray register array corresponding to the third multiply-adder group is shown as c in fig. 6, and the image data to be processed corresponding to the fourth multiply-adder group is
Figure BDA0002925913150000144
The storage rule of the image data to be processed in the dark gray register array corresponding to the fourth multiply-add group is shown as d in fig. 6.
After storing the image data to be processed corresponding to each multiply-add device group into a register array corresponding to each multiply-add device group, respectively reading the image data to be processed corresponding to each multiply-add device group in each data processing period from a fixed register array corresponding to each multiply-add device group in each data processing period; and processing the read image data to be processed to obtain the data processing result of each multiplier-adder group in the data processing period in parallel.
Wherein, for the first data processing period for processing the image data to be processed, controlling each target multiply-add device in each multiply-add device group, and respectively reading the operand of each target multiply-add device corresponding to the first data processing period from the register fixedly read by each target multiply-add device as the first operand; and determining matrix elements in matrix operands corresponding to the first data processing period of each multiplier-adder group as second operands; respectively determining the product of a first operand and a second operand of each target multiply-add device in the first data processing period;
For example, the target multiply-adder PE0 reads the operand a0 from its fixedly read register A0 in the corresponding register array, the target multiply-adder PE1 reads the operand b0 from register B0, and so on for the other multiply-adders, which is not described further here; the matrix operand is assumed to be:
Figure BDA0002925913150000145
Taking PE0 as an example, after the operand a0 is read, a0 is taken as the first operand; the matrix element corresponding to this data processing period is W_0, which is taken as the second operand. PE0 then calculates W_0 * a0 and stores the result in a register.
Aiming at a non-first data processing period for processing the image data to be processed, controlling the image data to be processed to move a preset step length in the register array according to a preset data moving mode corresponding to the data processing period; controlling each target multiply-add device in each multiply-add device group, and respectively reading the operands of each target multiply-add device in the non-first data processing period from the registers fixedly read with each target multiply-add device as a first operand; and determining matrix elements in matrix operands corresponding to the data processing period of each multiplier and adder group as a second operand; the product of the first operand and the second operand of each target multiply-adder in the data processing period is determined separately.
Taking the second data processing period of the multiply-adders shown in fig. 6 as an example, the image data is shifted left with a step of 1. Fig. 7 is an example diagram of register array A after the image data to be processed has been shifted left by one step in the register array. In the embodiment of the disclosure, PE0 then reads a1 from A0, PE2 reads a3 from A2, and so on; the other multiply-adders read their operands in the same way, which is not described further here. The matrix operand is:
Figure BDA0002925913150000151
Taking PE0 as an example, after the operand a1 is read, a1 is taken as the first operand; the matrix element corresponding to this data processing period is W_1, which is taken as the second operand. PE0 then calculates W_1 * a1 and stores the result in its own register.
Similarly, in the third data processing period, the data to be processed may be moved up by one step as a whole from the position shown in fig. 7, at which point a5 is stored in A0 and PE0 can perform the calculation W_2 * a5. In the fourth data processing period, the data to be processed may be moved to the right by one step as a whole from the position reached in the third data processing period, at which point a4 is stored in A0 and PE0 can perform the calculation W_3 * a4. The other PEs operate in the same way, which is not repeated here.
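The four data processing periods can be simulated with a short Python sketch. Several things here are assumptions rather than details taken from the disclosure: the image data of the first group is taken to be a 4 by 4 block a0..a15 stored row by row (which is consistent with PE0 reading a0, a1, a5 and a4 in turn), the shifts wrap around at the edges for simplicity, and the names `shift`, `run_group` and `MOVES` are hypothetical. The products of each multiply-adder are summed over the four periods.

```python
def shift(grid, d_row, d_col):
    """Move every stored value by (d_row, d_col) registers, wrapping at the edges."""
    rows, cols = len(grid), len(grid[0])
    return [[grid[(r - d_row) % rows][(c - d_col) % cols] for c in range(cols)]
            for r in range(rows)]


# Data movement before each period: none, left by one, up by one, right by one.
MOVES = [(0, 0), (0, -1), (-1, 0), (0, 1)]


def run_group(data, kernel, read_registers):
    """Accumulate W_k * operand for every fixedly read register over four periods."""
    sums = {reg: 0 for reg in read_registers}
    grid = data
    for weight, (d_row, d_col) in zip(kernel, MOVES):
        grid = shift(grid, d_row, d_col)
        for (r, c) in read_registers:
            sums[(r, c)] += weight * grid[r][c]  # first operand: register value; second: W_k
    return sums


data = [[4 * r + c for c in range(4)] for r in range(4)]  # value i stands in for operand a_i
kernel = [1, 10, 100, 1000]                                # stand-ins for W_0..W_3
registers = [(0, 0), (0, 2), (2, 0), (2, 2)]               # A0, A2, A8, A10 as (row, col)
result = run_group(data, kernel, registers)
print(result[(0, 0)])  # 4510 = 1*a0 + 10*a1 + 100*a5 + 1000*a4
```

With these stand-in weights, each decimal digit group of a result shows which operand was read in which period, which makes the movement order easy to verify by eye.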
It can be seen that in each data processing cycle, the PEs storing different image data to be processed complete the computation of the corresponding data processing cycle, that is, different multiplier-adder groups complete the computation of the corresponding data processing cycle in parallel in each data processing cycle, and after all data processing cycles, the different multiplier-adder groups complete the final computation at the same time, thereby saving system resources.
Here, the corresponding convolution kernels may be different or the same for different image data to be processed. For example, if the two pieces of image data to be processed are different feature subgraphs of the same feature map, the convolution kernels corresponding to the two pieces of image data to be processed are different. If the two pieces of image data to be processed are the image data of different positions of the same characteristic subgraph, the convolution kernels corresponding to the two pieces of images to be processed are the same.
And according to the data processing results of each multiplier-adder group in each data processing period, completing the data processing tasks of each multiplier-adder group.
Wherein for each target multiply-adder in each multiply-adder group, the products obtained by the target multiply-adder in each data processing period are added to obtain a sum; and completing the data processing tasks corresponding to the multiplier-adder groups respectively based on the sum values corresponding to the target multiplier-adders contained in the multiplier-adder groups.
For example, taking PE0 shown in fig. 6 as an example, the calculations performed by PE0 in the four data processing periods are, respectively: W_0 * a0, W_1 * a1, W_2 * a5 and W_3 * a4. The four calculation results are added: W_0 * a0 + W_1 * a1 + W_2 * a5 + W_3 * a4. The obtained result is one result value in the processing result matrix of the data processing task of the image data to be processed corresponding to the first multiply-adder group, and the result values in that processing result matrix are arranged as follows
Figure BDA0002925913150000152
Here, if the image data to be processed, which is subjected to convolution, is a feature map, the feature map includes 16 channels, and feature subgraphs corresponding to the 4 channels are processed each time, that is, the feature subgraphs corresponding to the 16 channels are required to be divided into 4 groups, and processing of a group of feature subgraphs is performed each time. If the 4 groups of characteristic subgraphs are respectively: when the group a, the group b, the group c and the group d are used, after 4 characteristic subgraphs included in the group a are processed, 4 results corresponding to the group a output by the multiplier-adder are accumulated; after the 4 feature subgraphs included in the group b are processed, accumulating 4 results corresponding to the group b, and accumulating the accumulated results corresponding to the group a and the accumulated results corresponding to the group b; after the 4 feature subgraphs included in the group c are processed, accumulating 4 results corresponding to the group c, and accumulating accumulated results of the group a and the group b and accumulated results corresponding to the group c; after the 4 feature subgraphs included in the group d are processed, the 4 results corresponding to the group d are accumulated, the accumulated results of the group a, the group b and the group c and the accumulated results corresponding to the group d are accumulated, and finally, the accumulated sum of the convolution results corresponding to the 16 channels respectively is obtained.
After the 4 feature subgraphs included in the group a are processed, the 4 output results corresponding to the group a are respectively: a1, a2, a3 and a4. After the 4 feature subgraphs included in the group b are processed, the 4 output results corresponding to the group b are respectively: b1, b2, b3 and b4. At this time, a1+b1=O1, a2+b2=O2, a3+b3=O3 and a4+b4=O4 are computed. After the 4 feature subgraphs included in the group c are processed, the 4 output results corresponding to the group c are respectively: c1, c2, c3 and c4, and then O1+c1, O2+c2, O3+c3 and O4+c4 are computed. Similarly, a1+b1+c1+d1, a2+b2+c2+d2, a3+b3+c3+d3 and a4+b4+c4+d4 are finally obtained, and these four results are then accumulated together to obtain the accumulated sum of the convolution results corresponding to the 16 channels respectively.
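The running accumulation over the four groups of feature subgraphs can be sketched in Python as below. Scalars stand in for the per-subgraph outputs (in practice each output would be a result matrix), and the function name `accumulate_groups` is hypothetical.

```python
def accumulate_groups(group_outputs):
    """Accumulate outputs slot by slot across rounds, then sum the slots.

    group_outputs: one list per round (group a, b, c, d), each holding the
    outputs produced in that round, e.g. [a1, a2, a3, a4].
    """
    running = [0] * len(group_outputs[0])
    for outputs in group_outputs:
        # O_i is updated after each round: O_i = O_i + (output i of this round)
        running = [acc + out for acc, out in zip(running, outputs)]
    return running, sum(running)


rounds = [
    [1, 2, 3, 4],              # stand-ins for a1..a4
    [10, 20, 30, 40],          # stand-ins for b1..b4
    [100, 200, 300, 400],      # stand-ins for c1..c4
    [1000, 2000, 3000, 4000],  # stand-ins for d1..d4
]
per_slot, total = accumulate_groups(rounds)
print(per_slot)  # [1111, 2222, 3333, 4444]
print(total)     # 11110
```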
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Based on the same inventive concept, the embodiments of the present disclosure further provide a data processing device corresponding to the data processing method, and since the principle of solving the problem by the device in the embodiments of the present disclosure is similar to that of the data processing method in the embodiments of the present disclosure, the implementation of the device may refer to the implementation of the method, and the repetition is omitted.
Referring to fig. 8, a schematic diagram of a data processing apparatus according to an embodiment of the disclosure is provided, where the apparatus includes: a controller 801; the controller 801 is configured to:
grouping a plurality of multiply-add devices in the multiply-add device array based on a matrix operand operation step size to obtain at least one multiply-add device group;
and executing data processing tasks corresponding to each multiplier-adder group in parallel by utilizing each multiplier-adder group in the at least one multiplier-adder group.
In one possible implementation, within any row of the multiply-add array, every two adjacent multiply-add devices of the same group are separated by the same non-zero number of multiply-add devices belonging to other groups; likewise, within any column of the multiply-add array, every two adjacent multiply-add devices of the same group are separated by the same non-zero number of multiply-add devices belonging to other groups.
In one possible implementation, the controller 801 is specifically configured to determine the number of multiply-add groups based on a matrix operand operation step size when grouping a plurality of multiply-add devices in the multiply-add array based on the matrix operand operation step size; a plurality of multiply-add devices in the multiply-add device array are grouped based on the number of multiply-add device groups.
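As a rough illustration of stride-based grouping, the sketch below assigns each multiply-add device to a group by taking its row and column index modulo the operation step size. Two points are assumptions made for the example and are not stated in this passage: that the number of groups equals the square of the step size, and that groups are interleaved in this modulo pattern. The function name `group_multiply_adders` is likewise hypothetical.

```python
def group_multiply_adders(rows, cols, stride):
    """Map group id -> list of (row, col) positions in the array.

    Assumption for illustration: stride * stride groups, interleaved so that
    devices of one group are `stride` apart in every row and every column.
    """
    groups = {}
    for r in range(rows):
        for c in range(cols):
            group_id = (r % stride) * stride + (c % stride)
            groups.setdefault(group_id, []).append((r, c))
    return groups


groups = group_multiply_adders(rows=4, cols=4, stride=2)
for group_id, members in sorted(groups.items()):
    print(group_id, members)
```

Under these assumptions, adjacent members of a group in a row or column are separated by `stride - 1` devices of other groups, which is non-zero whenever the step size exceeds 1.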
In one possible implementation, the controller 801 is specifically configured to determine, from the multiply-add array, a first target multiply-add in each of the multiply-add groups when grouping a plurality of multiply-add devices in the multiply-add array based on the number of multiply-add groups; and determining other target multiply-add devices in each multiply-add device group except the first target multiply-add device from the multiply-add device array based on the position of the first multiply-add device in the multiply-add device array, the matrix operand operation step size and the size of the multiply-add device array.
In a possible implementation manner, when determining, from the multiply-add array, the other target multiply-add devices in each multiply-add device group except the first target multiply-add device, based on the position of the first target multiply-add device in the multiply-add array, the matrix operand operation step size, and the size of the multiply-add array, the controller 801 is specifically configured to: for each multiply-add device group, determine a first positional relationship between each multiply-add device in each row of the group, other than the first multiply-add device of that row, and its adjacent previous multiply-add device in the multiply-add array, based on the position of the first target multiply-add device of the group in the multiply-add array and the matrix operand operation step size; determine a second positional relationship between each multiply-add device in each column of the group, other than the first multiply-add device of that column, and its adjacent previous multiply-add device in the multiply-add array, based on the position of the first target multiply-add device of the group in the multiply-add array, the matrix operand operation step size, and the number of columns of the multiply-add array; and determine, based on the first positional relationship and/or the second positional relationship, the target positions in the multiply-add array of the other target multiply-add devices in the group except the first target multiply-add device.
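One plausible reading of the two positional relationships is sketched below using row-major linear indices. It is an assumption for illustration, not the disclosed formula: the next device in a row is taken to lie `stride` positions after the previous one, and the next device in a column `stride * cols` positions after the previous one. The function name `group_positions` and its parameters are hypothetical.

```python
def group_positions(first_index, stride, rows, cols):
    """List the row-major indices of one group, starting from its first device.

    Assumed relations (illustrative only):
      first relation  (same row):    index = previous + stride
      second relation (same column): index = previous + stride * cols
    """
    first_row, first_col = divmod(first_index, cols)
    positions = []
    row_start = first_index
    for _ in range(first_row, rows, stride):
        index = row_start
        for _ in range(first_col, cols, stride):
            positions.append(index)
            index += stride          # first positional relationship
        row_start += stride * cols   # second positional relationship
    return positions


print(group_positions(first_index=0, stride=2, rows=4, cols=4))
print(group_positions(first_index=5, stride=2, rows=4, cols=4))
```

The number of columns enters only through the column-direction relation, which is consistent with the text naming the column count as an input to the second relationship but not the first.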
In one possible implementation, in determining the first target multiply-adder in each multiply-adder group from the multiply-adder array, the controller 801 is specifically configured to determine a target matrix based on the matrix operand operation step size and the size of the multiply-adder array; and determining the position of the first target multiply-adder in each multiply-adder group in the multiply-adder array according to the matrix element values of the target matrix.
In a possible implementation manner, when executing a data processing task corresponding to each of the at least one multiply-add device groups by using each of the at least one multiply-add device groups, the controller 801 is specifically configured to store image data to be processed corresponding to each of the multiply-add device groups into a register array corresponding to each of the multiply-add device groups according to a position of each of the target multiply-add devices in the each of the multiply-add device groups in the multiply-add device array; for each data processing period in a plurality of data processing periods, respectively reading image data to be processed corresponding to each multiply-add device group in the data processing period from a register array corresponding to each multiply-add device group; processing the read image data to be processed to obtain data processing results of the multiplier-adder groups in the data processing period in parallel; and according to the data processing results of each multiplier-adder group in each data processing period, completing the data processing tasks of each multiplier-adder group.
In a possible implementation manner, when storing the image data to be processed corresponding to each multiply-add device group into the register array corresponding to that multiply-add device group according to the positions of the target multiply-add devices of the group in the multiply-add array, the controller 801 is specifically configured to: determine, according to the size of the matrix operand, the number of registers included in the register array corresponding to each multiply-add device group; for each multiply-add device group, determine the position, in the corresponding register array, of the register fixedly read by each target multiply-add device of the group; and, for each multiply-add device group, store the image data to be processed corresponding to the group into the register array corresponding to the group according to the position of each target multiply-add device of the group in the multiply-add array, the position of the register fixedly read by each target multiply-add device of the group, and the processing order, during data processing, of the operands contained in the image data to be processed, so that in each data processing period the operand stored at the position of the register fixedly read by each target multiply-add device corresponds to the matrix element of the matrix operand for that data processing period.
In one possible implementation, when, for each of a plurality of data processing periods, reading the image data to be processed corresponding to each multiply-add device group in that data processing period from the register array corresponding to each multiply-add device group, and processing the read image data to obtain, in parallel, the data processing results of the multiply-add device groups in that data processing period, the controller 801 is specifically configured to: for the first data processing period in which the image data to be processed is processed, control each target multiply-add device in each multiply-add device group to read, from the register fixedly read by that target multiply-add device, the operand corresponding to that target multiply-add device in the first data processing period as a first operand; determine the matrix element of the matrix operand corresponding to each multiply-add device group in the first data processing period as a second operand; and determine the product of the first operand and the second operand of each target multiply-add device in the first data processing period. For each non-first data processing period in which the image data to be processed is processed, the controller 801 is configured to: control the image data to be processed to move by a preset step length in the register array according to a preset data moving mode corresponding to that data processing period; control each target multiply-add device in each multiply-add device group to read, from the register fixedly read by that target multiply-add device, the operand corresponding to that target multiply-add device in the non-first data processing period as a first operand; determine the matrix element of the matrix operand corresponding to each multiply-add device group in that data processing period as a second operand; and determine the product of the first operand and the second operand of each target multiply-add device in that data processing period.
In a possible implementation manner, when completing the data processing task corresponding to each multiplier-adder group according to the data processing results of each multiplier-adder group in each data processing period, the controller 801 is specifically configured to: for each target multiplier-adder in each multiplier-adder group, add the products obtained by that target multiplier-adder in each data processing period to obtain a sum; and complete the data processing task of each multiplier-adder group based on the sums corresponding to the target multiplier-adders contained in that group.
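The read, shift, multiply and accumulate cycle described in the preceding implementations can be simulated for a single multiplier-adder group as follows. This is a minimal sketch under stated assumptions, none of which are confirmed by the disclosure: the register array is modeled as a 2D list the size of the image, the "preset data moving mode" is modeled as a circular shift that visits the kernel in raster order, the kernel is square, and the target multiplier-adder for output (i, j) fixedly reads register (i * stride, j * stride). The names `roll` and `convolve_by_shifting` are hypothetical.

```python
def roll(regs, dy, dx):
    """Circularly shift the register array: new[r][c] = old[r + dy][c + dx]."""
    h, w = len(regs), len(regs[0])
    return [[regs[(r + dy) % h][(c + dx) % w] for c in range(w)] for r in range(h)]


def convolve_by_shifting(image, kernel, stride=1):
    """Simulate one group: one kernel element (second operand) per period."""
    k = len(kernel)  # assumes a square k x k kernel
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    regs = [row[:] for row in image]            # load data into the register array
    sums = [[0] * out_w for _ in range(out_h)]  # one running sum per multiplier-adder
    for ky in range(k):
        for kx in range(k):
            if (ky, kx) != (0, 0):
                # Non-first period: move the data before reading (assumed raster mode).
                regs = roll(regs, 0, 1) if kx > 0 else roll(regs, 1, -(k - 1))
            weight = kernel[ky][kx]
            for i in range(out_h):
                for j in range(out_w):
                    # Fixed register read (first operand) times the kernel element.
                    sums[i][j] += regs[i * stride][j * stride] * weight
    return sums


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, 1]]
print(convolve_by_shifting(image, kernel, stride=2))  # [[7, 11], [23, 27]]
```

Because each multiplier-adder always reads the same register while the data moves past it, the per-period products sum to the same values a direct sliding-window convolution would give for that output position.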
In one possible implementation, the data processing task includes: a convolution processing task; the images to be processed corresponding to the convolution processing tasks of different multiplier-adder groups are different.
The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
The image processing apparatus provided by the embodiments of the present disclosure may include a chip, an AI chip, and the like.
The embodiment of the disclosure further provides a computer device, as shown in fig. 9, which is a schematic structural diagram of the computer device provided by the embodiment of the disclosure, including:
a controller 91 and a memory 92; the memory 92 stores machine readable instructions executable by the controller 91, and the controller 91 is configured to execute the machine readable instructions stored in the memory 92; when the machine readable instructions are executed by the controller 91, the controller 91 performs the following steps:
grouping a plurality of multiply-add devices in the multiply-add device array based on a matrix operand operation step size to obtain at least one multiply-add device group;
and executing data processing tasks corresponding to each multiplier-adder group in parallel by utilizing each multiplier-adder group in the at least one multiplier-adder group.
The memory 92 includes an internal memory 921 and an external memory 922; the internal memory 921 temporarily stores operation data of the controller 91 and data exchanged with the external memory 922, such as a hard disk, and the controller 91 exchanges data with the external memory 922 via the internal memory 921.
The computer device provided by the embodiment of the disclosure may include an intelligent terminal such as a mobile phone, or may also be other devices, servers, etc. that have a camera and may perform image processing, which is not limited herein.
The specific execution process of the above instructions may refer to the steps of the data processing method described in the embodiments of the present disclosure, which is not described herein.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the data processing method described in the method embodiments above. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.
Embodiments of the present disclosure further provide a computer program product, where the computer program product carries program code, where instructions included in the program code may be used to perform steps of a data processing method described in the foregoing method embodiments, and specifically reference may be made to the foregoing method embodiments, which are not described herein.
Wherein the above-mentioned computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in essence or a part contributing to the prior art or a part of the technical solution, or in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present disclosure. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that the foregoing examples are merely specific embodiments of the present disclosure, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the disclosure has been described in detail with reference to the foregoing examples, those skilled in the art will appreciate that, within the technical scope of the present disclosure, anyone skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure and are intended to be included within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (13)

1. A method of data processing, comprising:
grouping a plurality of multiply-add devices in the multiply-add device array based on the matrix operand operation step length to obtain at least one multiply-add device group;
executing data processing tasks corresponding to each multiplier-adder group in parallel by utilizing each multiplier-adder group in the at least one multiplier-adder group; wherein, for each data processing period, the images to be processed corresponding to each data processing task are different;
wherein the executing, in parallel, of the data processing task corresponding to each multiply-add device group by utilizing each multiply-add device group in the at least one multiply-add device group comprises:
storing the image data to be processed corresponding to each multiply-add device group into a register array corresponding to each multiply-add device group according to the position of each target multiply-add device in each multiply-add device group in the multiply-add device array;
for each data processing period in a plurality of data processing periods, respectively reading image data to be processed corresponding to each multiply-add device group in the data processing period from a register array corresponding to each multiply-add device group, and processing the read image data to be processed to obtain, in parallel, the data processing results of the multiplier-adder groups in the data processing period;
and according to the data processing results of each multiplier-adder group in each data processing period, completing the data processing tasks of each multiplier-adder group.
2. The method of claim 1, wherein, in the same row of the multiply-add array, every two adjacent multiply-add devices of the same group are separated by the same non-zero number of multiply-add devices of other groups, and wherein, in the same column of the multiply-add array, every two adjacent multiply-add devices of the same group are separated by the same non-zero number of multiply-add devices of other groups.
3. A data processing method according to claim 1 or 2, wherein said grouping a plurality of multiply-add devices in a multiply-add array based on a matrix operand operation step size comprises:
determining the number of multiply-add device groups based on the matrix operand operation step size;
a plurality of multiply-add devices in the multiply-add device array are grouped based on the number of multiply-add device groups.
4. A data processing method according to claim 3, wherein said grouping a plurality of multiply-add devices in said multiply-add device array based on the number of said multiply-add device groups comprises:
determining a first target multiply-adder in each multiply-adder group from the multiply-adder array;
determining other target multiply-add devices in each multiply-add device group, except the first target multiply-add device, from the multiply-add array based on the position of the first target multiply-add device in the multiply-add array, the matrix operand operation step size, and the size of the multiply-add array.
5. The method of claim 4, wherein said determining the other target multiply-add devices in each of said groups of multiply-add devices, except for said first target multiply-add device, from said array of multiply-add devices based on the position of said first target multiply-add device in said array of multiply-add devices, said matrix operand operation step size, and the size of said array of multiply-add devices, comprises:
for each multiplier-adder group, determining a first positional relationship between each multiplier-adder of each row of the multiplier-adder group, except for the first multiplier-adder of the row, and its adjacent previous multiplier-adder in the multiplier-adder array, based on the position of the first target multiplier-adder of the multiplier-adder group in the multiplier-adder array and the matrix operand operation step size;
determining a second positional relationship between each multiplier-adder of each column of the multiplier-adder group, except for the first multiplier-adder of the column, and its adjacent previous multiplier-adder in the multiplier-adder array, based on the position of the first target multiplier-adder of the multiplier-adder group in the multiplier-adder array, the matrix operand operation step size, and the number of columns of the multiplier-adder array;
and determining target positions of other target multiply-add devices except the first target multiply-add device in the multiply-add device group in the multiply-add device array based on the first position relation and/or the second position relation.
6. The method of claim 4 or 5, wherein said determining the first target multiply-adder in each of said groups of multiply-adders from said array of multiply-adders comprises:
determining a target matrix based on the matrix operand operation step size and the size of the multiply-add array;
and determining the position of the first target multiply-adder in each multiply-adder group in the multiply-adder array according to the matrix element values of the target matrix.
7. The method according to claim 1, wherein storing the image data to be processed corresponding to the multiplier-adder group in the register array corresponding to the multiplier-adder group according to the positions of the respective target multiplier-adders in the multiplier-adder array, comprises:
determining the number of registers contained in the register array corresponding to each multiply-add device group according to the size of the matrix operand;
for each multiply-add device group, determining the position, in the corresponding register array, of the register fixedly read by each target multiply-add device of the group;
for each multiply-add group, storing the image data to be processed corresponding to the multiply-add group into the register array corresponding to the multiply-add group according to the position of each target multiply-add in the multiply-add group in the multiply-add array, the position of the register fixedly read by each target multiply-add in the multiply-add group, and the processing sequence of the operands contained in the image data to be processed in the data processing process, so that each data processing period, the operands stored by the position of the register fixedly read by each target multiply-add group correspond to matrix elements in the matrix operands of the corresponding data processing period.
8. The method according to claim 1 or 7, wherein for each of the plurality of data processing periods, the image data to be processed corresponding to each of the plurality of multiply-add groups in the data processing period is read from the register array corresponding to each of the plurality of multiply-add groups, respectively; processing the read image data to be processed to obtain data processing results of the multiplier-adder groups in the data processing period in parallel, wherein the data processing results comprise:
controlling each target multiply-add device in each multiply-add device group aiming at the first data processing period for processing the image data to be processed, and respectively reading an operand corresponding to each target multiply-add device in the first data processing period from a register fixedly read by each target multiply-add device as a first operand; and determining matrix elements of each multiplier-adder group in a matrix operand corresponding to the first data processing period as a second operand; determining the product of the first operand and the second operand of each target multiply-add device in the first data processing period respectively;
aiming at a non-first data processing period for processing the image data to be processed, controlling the image data to be processed to move a preset step length in the register array according to a preset data moving mode corresponding to the data processing period; controlling each target multiply-add device in each multiply-add device group, and respectively reading the operands of each target multiply-add device in the non-first data processing period from the registers fixedly read with each target multiply-add device as a first operand; and determining matrix elements in matrix operands corresponding to the data processing period of each multiplier and adder group as a second operand; the product of the first operand and the second operand of each target multiply-adder in the data processing period is determined separately.
9. The method according to claim 1 or 7, wherein the step of completing the data processing tasks corresponding to the respective multiply-add devices according to the data processing results corresponding to the respective multiply-add devices in each data processing period comprises:
for each target multiply-adder in each multiply-adder group, adding products obtained by the target multiply-adder in each data processing period to obtain a sum;
and completing the data processing tasks corresponding to the multiplier-adder groups respectively based on the sum values corresponding to the target multiplier-adders contained in the multiplier-adder groups.
10. A data processing method according to claim 1 or 2, wherein the data processing task comprises: a convolution processing task;
the images to be processed corresponding to the convolution processing tasks of different multiplier-adder groups are different.
11. A data processing apparatus, comprising: a controller; the controller is used for:
grouping a plurality of multiply-add devices in the multiply-add device array based on the matrix operand operation step length to obtain at least one multiply-add device group;
executing data processing tasks corresponding to each multiplier-adder group in parallel by utilizing each multiplier-adder group in the at least one multiplier-adder group; wherein, for each data processing period, the images to be processed corresponding to each data processing task are different;
wherein the controller, when executing in parallel the data processing task corresponding to each multiply-add device group by utilizing each multiply-add device group in the at least one multiply-add device group, is configured to:
storing the image data to be processed corresponding to each multiply-add device group into a register array corresponding to each multiply-add device group according to the position of each target multiply-add device in each multiply-add device group in the multiply-add device array;
for each data processing period in a plurality of data processing periods, respectively reading image data to be processed corresponding to each multiply-add device group in the data processing period from a register array corresponding to each multiply-add device group, and processing the read image data to be processed to obtain, in parallel, the data processing results of the multiplier-adder groups in the data processing period;
and according to the data processing results of each multiplier-adder group in each data processing period, completing the data processing tasks of each multiplier-adder group.
12. A computer device, comprising: a controller and a memory, the memory storing machine readable instructions executable by the controller, the controller being configured to execute the machine readable instructions stored in the memory, wherein the machine readable instructions, when executed by the controller, cause the controller to perform the steps of the data processing method according to any one of claims 1 to 10.
13. A computer-readable storage medium, on which a computer program is stored which, when being run by a computer device, performs the steps of the data processing method according to any one of claims 1 to 10.
CN202110132573.XA 2021-01-31 2021-01-31 Data processing method, device, computer equipment and storage medium Active CN112927125B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110132573.XA CN112927125B (en) 2021-01-31 2021-01-31 Data processing method, device, computer equipment and storage medium
PCT/CN2021/115799 WO2022160706A1 (en) 2021-01-31 2021-08-31 Data processing method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110132573.XA CN112927125B (en) 2021-01-31 2021-01-31 Data processing method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112927125A CN112927125A (en) 2021-06-08
CN112927125B true CN112927125B (en) 2023-06-23

Family

ID=76169016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110132573.XA Active CN112927125B (en) 2021-01-31 2021-01-31 Data processing method, device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112927125B (en)
WO (1) WO2022160706A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927125B (en) * 2021-01-31 2023-06-23 成都商汤科技有限公司 Data processing method, device, computer equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915322A (en) * 2015-06-09 2015-09-16 中国人民解放军国防科学技术大学 Method for accelerating convolution neutral network hardware and AXI bus IP core thereof

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101827107A (en) * 2010-05-11 2010-09-08 南京大学 IEEE802.1AE protocol-based GCM high-speed encryption and decryption equipment
US9606803B2 (en) * 2013-07-15 2017-03-28 Texas Instruments Incorporated Highly integrated scalable, flexible DSP megamodule architecture
CN105205191B (en) * 2014-06-12 2018-10-12 济南概伦电子科技有限公司 Multi tate parallel circuit emulates
US10108397B2 (en) * 2015-08-25 2018-10-23 Samsung Electronics Co., Ltd. Fast close path solution for a three-path fused multiply-add design
CN107301455B (en) * 2017-05-05 2020-11-03 中国科学院计算技术研究所 Hybrid cube storage system for convolutional neural network and accelerated computing method
CN108197705A (en) * 2017-12-29 2018-06-22 国民技术股份有限公司 Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium
CN110659446B (en) * 2018-06-29 2022-09-23 合一智芯科技(北京)有限公司 Convolution operation control method, device and medium
CN110796244B (en) * 2018-08-01 2022-11-08 上海天数智芯半导体有限公司 Core computing unit processor for artificial intelligence device and accelerated processing method
CN109284782B (en) * 2018-09-13 2020-10-02 北京地平线机器人技术研发有限公司 Method and apparatus for detecting features
US11544535B2 (en) * 2019-03-08 2023-01-03 Adobe Inc. Graph convolutional networks with motif-based attention
CN110058883B (en) * 2019-03-14 2023-06-16 梁磊 CNN acceleration method and system based on OPU
CN110020678A (en) * 2019-03-25 2019-07-16 联想(北京)有限公司 A kind of data processing method, electronic equipment and computer storage medium
CN110705687B (en) * 2019-09-05 2020-11-03 北京三快在线科技有限公司 Convolution neural network hardware computing device and method
CN111581595B (en) * 2020-04-24 2024-02-13 科大讯飞股份有限公司 Matrix multiplication calculation method and calculation circuit
CN112927125B (en) * 2021-01-31 2023-06-23 成都商汤科技有限公司 Data processing method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112927125A (en) 2021-06-08
WO2022160706A1 (en) 2022-08-04

Similar Documents

Publication Publication Date Title
CN110050267B (en) System and method for data management
US11734006B2 (en) Deep vision processor
Lai et al. CMSIS-NN: Efficient neural network kernels for Arm Cortex-M CPUs
CN108765247B (en) Image processing method, device, storage medium and equipment
US20230305808A1 (en) Accelerated mathematical engine
US10943167B1 (en) Restructuring a multi-dimensional array
TWI639119B (en) Adaptive execution engine for convolution computing systems
CN108491359B (en) Submatrix operation device and method
CA2929403C (en) Multi-dimensional sliding window operation for a vector processor
WO2019135873A1 (en) Systems and methods for hardware-based pooling
KR20210074992A (en) Accelerating 2d convolutional layer mapping on a dot product architecture
CN110796236B (en) Vectorization implementation method for pooling of multi-sample multi-channel convolutional neural network
WO2022160704A1 (en) Image processing method and apparatus, computer device and storage medium
CN112927125B (en) Data processing method, device, computer equipment and storage medium
JP7174831B2 (en) Video memory processing method, apparatus and recording medium based on convolutional neural network
CN109447239B (en) Embedded convolutional neural network acceleration method based on ARM
CN112966729A (en) Data processing method and device, computer equipment and storage medium
CN115485656A (en) In-memory processing method for convolution operation
CN112668709B (en) Computing device and method for data reuse
Hwang et al. An efficient FPGA-based architecture for convolutional neural networks
CN113657587B (en) Deformable convolution acceleration method and device based on FPGA
CN110765413B (en) Matrix summation structure and neural network computing platform
CN113111891B (en) Image reconstruction method and device, terminal equipment and storage medium
US10956776B2 (en) 2D convolutional accelerator that generates 3D results
CN111178505B (en) Acceleration method of convolutional neural network and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40051173; Country of ref document: HK)

GR01 Patent grant