WO2023103551A1 - Image data processing method and apparatus, device, and storage medium - Google Patents

Image data processing method and apparatus, device, and storage medium

Info

Publication number
WO2023103551A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
image matrix
convolution kernel
image
target elements
Prior art date
Application number
PCT/CN2022/122544
Other languages
French (fr)
Chinese (zh)
Inventor
胡宇
姬彬斐
刘嘉超
刘兰个川
Original Assignee
广州小鹏自动驾驶科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州小鹏自动驾驶科技有限公司 filed Critical 广州小鹏自动驾驶科技有限公司
Publication of WO2023103551A1 publication Critical patent/WO2023103551A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the present application relates to the technical field of image processing, and in particular to an image data processing method, device, equipment and storage medium.
  • current mainstream machine translation is mainly neural machine translation.
  • This type of method uses an "encoder-decoder" architecture.
  • the encoder encodes the source-language sequence and extracts information, and the decoder then converts that information into the target language, completing the translation process.
  • the deep self-attention (Transformer) model, designed on the "encoder-decoder" architecture, has become the mainstream model in the field of machine translation due to its superior performance, and has had a huge impact in the field of deep learning.
  • the Transformer model is generally deployed on a chip that supports matrix multiplication.
  • the present application provides an image data processing method, device, equipment and storage medium, which can improve the flexibility of the Transformer module.
  • the first aspect of the present application provides an image data processing method, including: acquiring a first image matrix $A_{m \times p}$ and a second image matrix $B_{p \times n}$; determining a first feature map corresponding to the first image matrix $A_{m \times p}$; determining X convolution kernels corresponding to the second image matrix $B_{p \times n}$; and convolving the first feature map with the X convolution kernels to obtain a second feature map, where the second feature map corresponds to a third image matrix, which is the result of matrix multiplication of the first image matrix $A_{m \times p}$ and the second image matrix $B_{p \times n}$.
  • the size of each convolution kernel is $f \times f \times p$, where f is an integer greater than 1 and X is an integer greater than or equal to 1.
  • each convolution kernel includes one or more columns of elements of the second image matrix $B_{p \times n}$.
  • determining the X convolution kernels corresponding to the second image matrix $B_{p \times n}$ includes: determining the values of n groups of target elements in the X convolution kernels, where the values of the i-th group of target elements correspond to the values of the i-th column of the second image matrix, i is an integer from 1 to n, and each convolution kernel contains one or more of the n groups of target elements; and setting the values of all elements of the X convolution kernels other than the n groups of target elements to 0.
  • each convolution kernel contains a group of target elements Z 1 , Z 2 ,..., Z p in the n groups of target elements, and for each convolution kernel, the target elements contained in the convolution kernel Z 1 , Z 2 ,..., Z p are respectively located in row j, column k of the 1st, 2nd,..., p channels of the convolution kernel;
  • each convolution kernel includes u groups of target elements among the n groups of target elements, where u is an integer greater than 1 and less than or equal to f; for each convolution kernel, the p target elements of the v-th group contained in the kernel are respectively located in row $j_v$, column $k_v$ of the 1st, 2nd, ..., p-th channels of the convolution kernel, where v is an integer from 1 to u;
  • the corresponding positions of the target elements in the convolution kernel are related, the length of the zero data module is 1, the width is 1, and the depth is n, and the values of all elements in the zero data module are zero.
  • the second aspect of the present application provides an image data processing device, including: an acquisition module, used to acquire a first image matrix $A_{m \times p}$ and a second image matrix $B_{p \times n}$; a first determination module, used to determine a first feature map corresponding to the first image matrix $A_{m \times p}$; a second determination module, used to determine X convolution kernels corresponding to the second image matrix $B_{p \times n}$; and a convolution module, used to convolve the first feature map with the X convolution kernels to obtain a second feature map, where the second feature map corresponds to a third image matrix, and the third image matrix is the result of matrix multiplication of the first image matrix $A_{m \times p}$ and the second image matrix $B_{p \times n}$.
  • each convolution kernel contains one or more columns of elements of the second image matrix $B_{p \times n}$.
  • the second determination module includes: a first determination unit, configured to determine the values of n groups of target elements in the X convolution kernels, where the values of the i-th group of target elements correspond to the values of the i-th column of the second image matrix, i is an integer from 1 to n, and each convolution kernel contains one or more of the n groups of target elements; and a second determination unit, configured to set the values of all elements of the X convolution kernels other than the n groups of target elements to 0.
  • each convolution kernel contains a group of target elements Z 1 , Z 2 ,..., Z p in the n groups of target elements, and for each convolution kernel, the target elements contained in the convolution kernel Z 1 , Z 2 ,..., Z p are respectively located in row j, column k of the 1st, 2nd,..., p channels of the convolution kernel;
  • the first determination module includes: a third determination unit, configured to determine m data modules corresponding to the m row vectors of the first image matrix $A_{m \times p}$, where the length of each data module is 1, the width is 1, and the depth is p;
  • the first zero padding unit is used to insert zero data modules at the upper edge, lower edge, left edge and right edge of the third feature map respectively to obtain the first feature map, where the length of the zero data module is 1, the width is 1, and the depth is p, and the values of all elements in the zero data module are zero.
  • each convolution kernel includes u groups of target elements among the n groups of target elements, where u is an integer greater than 1 and less than or equal to f; for each convolution kernel, the p target elements of the v-th group contained in the kernel are respectively located in row $j_v$, column $k_v$ of the 1st, 2nd, ..., p-th channels of the convolution kernel, where v is an integer from 1 to u;
  • the first determination module includes: a fourth determination unit, configured to determine m data modules corresponding to the m row vectors of the first image matrix $A_{m \times p}$, where the length of each data module is 1, the width is 1, and the depth is p;
  • the second zero padding unit is used to insert a zero data module at the target position of the third feature map to obtain the first feature map, and the target position includes an internal position and an edge position,
  • the target position is related to the corresponding positions of the plurality of groups of target elements in the convolution kernel, the length of the zero data module is 1, the width is 1, and the depth is n, and all elements in the zero data module The value is zero.
  • the third aspect of the present application provides an electronic device, including: a processor; and a memory, on which executable code is stored, and when the executable code is executed by the processor, the processor is made to execute the above-mentioned Methods.
  • a fourth aspect of the present application provides a computer-readable storage medium, on which executable code is stored, and when the executable code is executed by a processor of an electronic device, the processor is caused to execute the above-mentioned method.
  • the technical solution of this application acquires the first image matrix $A_{m \times p}$ and the second image matrix $B_{p \times n}$, determines the first feature map corresponding to the first image matrix $A_{m \times p}$ and the X convolution kernels of size $f \times f \times p$ corresponding to the second image matrix $B_{p \times n}$, and then convolves the first feature map with these X convolution kernels to obtain the feature map corresponding to the third image matrix, where f is an integer greater than 1 and the third image matrix is the matrix product of the first image matrix and the second image matrix.
  • the convolution operation can thus replace the matrix multiplication operation to obtain the corresponding feature map, which solves the problem that the Transformer module cannot be deployed on a chip that only supports convolution calculation, and improves the flexibility of the Transformer module.
  • FIG. 1 is a schematic flow diagram of an image data processing method shown in an embodiment of the present application
  • FIG. 2 is a schematic diagram of a third feature map in the image data processing method shown in the embodiment of the present application.
  • Fig. 3 is a schematic diagram of the first feature map in the image data processing method shown in the embodiment of the present application.
  • Fig. 4 is another schematic diagram of the first feature map in the image data processing method shown in the embodiment of the present application.
  • Fig. 5 is another schematic diagram of the first feature map in the image data processing method shown in the embodiment of the present application.
  • FIG. 6 is a schematic diagram of X convolution kernels in the image data processing method shown in the embodiment of the present application.
  • FIG. 7 is another schematic diagram of X convolution kernels in the image data processing method shown in the embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an image data processing device shown in an embodiment of the present application.
  • FIG. 9 is another schematic structural diagram of an image data processing device shown in an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an electronic device shown in an embodiment of the present application.
  • first information may also be called second information, and similarly, second information may also be called first information.
  • a feature defined as “first” and “second” may explicitly or implicitly include one or more of these features.
  • “plurality” means two or more, unless otherwise specifically defined.
  • the embodiment of the present application provides an image data processing method, which can improve the flexibility of the Transformer module.
  • Image matrix: digital image data can be represented by a matrix, so matrix theory and matrix algorithms can be used to analyze and process digital images. Because a digital image can be expressed as a matrix, digital image processing programs usually store image data in a two-dimensional array.
  • Convolution kernel: in image processing, given an input image, each pixel of the output image is a weighted average of the pixels in a small region of the input image; the weights are defined by a function, and this function is called the convolution kernel.
  • Feature map: in the convolutional layers of a neural network, data exists in three-dimensional form, which can be regarded as many two-dimensional pictures stacked together, each of which is called a feature map. In the input layer, a grayscale image has only one feature map, while a color image generally has 3 feature maps (red, green, and blue). There are several convolution kernels between layers, and convolving the feature maps of the previous layer with each convolution kernel generates one feature map of the next layer.
  • Data module: in the image field, usually a three-dimensional array used to represent the pixel values of an image, where the length represents the height of the image, the width represents the width of the image, and the depth represents the number of color channels of the image.
  • image data processing method in this embodiment can be used for image processing of a neural network including a Transformer module, and can also be used for other neural networks that require matrix multiplication, which is not specifically limited in this embodiment.
  • the image data processing apparatus in this embodiment may include a Transformer module, or other modules that need to perform matrix multiplication, which is not specifically limited in this embodiment.
  • the method of the present application can be applied to a processor, which can be a general-purpose processor such as a CPU (Central Processing Unit), or an artificial intelligence processor for performing artificial intelligence computations.
  • Artificial intelligence computing may include machine learning computing, brain-like computing, and the like. Among them, machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like.
  • the artificial intelligence processor includes, for example, one or a combination of an artificial intelligence chip processor, a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processor), and an FPGA (Field-Programmable Gate Array) chip.
  • the artificial intelligence processor may be a processor applied in an artificial intelligence chip.
  • the artificial intelligence chip may be, for example, a neural network chip or another chip, and the neural network chip may be, for example, a convolutional neural network inference chip, an ASIC chip, or the like.
  • the present application does not limit the specific type of the processor.
  • the processor mentioned in this application may include a plurality of processing units, and each processing unit may independently run the tasks assigned to it, such as convolution tasks, pooling tasks or fully connected tasks.
  • the application does not limit the processing unit and the tasks run by the processing unit.
  • Multiple processing units in the processor can not only share part of the storage space, for example share part of the RAM storage space and the register file, but also have their own storage space at the same time.
  • FIG. 1 is a schematic flowchart of an image data processing method shown in an embodiment of the present application.
  • the method can be applied to a processor, and the processor can include a general-purpose processor, an artificial intelligence processor, etc., wherein the artificial intelligence processor can be an artificial intelligence chip processor or a GPU, etc.
  • the image data processing method in this embodiment includes:
  • An image data processing apparatus acquires a first image matrix and a second image matrix.
  • the image data processing device acquires the first image matrix $A_{m \times p}$ and the second image matrix $B_{p \times n}$ that need to be multiplied.
  • the first image matrix $A_{m \times p}$ or the second image matrix $B_{p \times n}$ may be an original image matrix, or an image matrix obtained after a matrix operation, which is not limited in this embodiment.
  • the image data processing apparatus determines a first feature map corresponding to the first image matrix.
  • after the image data processing device acquires the first image matrix $A_{m \times p}$, it generates a first feature map corresponding to the first image matrix $A_{m \times p}$ according to a first preset rule.
  • the image data processing device may generate the first feature map corresponding to the first image matrix $A_{m \times p}$ in the following manner:
  • for each row vector of the first image matrix $A_{m \times p}$, the image data processing device generates a data module corresponding to the row vector; the length of the data module is 1, the width is 1, and the depth is p, and the p elements of the data module correspond one-to-one to the p elements of the row vector.
  • the image data processing device may generate the data module corresponding to the row vector through a matrix transformation function, or may generate the data module corresponding to the row vector through other methods, which are not specifically limited in this embodiment.
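As a minimal sketch of this step, assuming NumPy arrays and a hypothetical helper name `rows_to_data_modules` (neither appears in the application), each row vector of the first image matrix can be reshaped into a 1 × 1 × p data module:

```python
import numpy as np

def rows_to_data_modules(A):
    """Turn each row of an (m, p) matrix into a 1 x 1 x p data module.

    Returns an array of shape (m, 1, 1, p); module i holds row i of A.
    """
    m, p = A.shape
    return A.reshape(m, 1, 1, p)

A = np.arange(12).reshape(4, 3)      # m = 4 row vectors, p = 3 (illustrative values)
modules = rows_to_data_modules(A)
```

A plain reshape is only one possible "matrix transformation function"; any mapping that keeps the one-to-one correspondence between row elements and module elements would serve.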
  • the splicing rules include the splicing order, which can be from top to bottom and from left to right: the data modules are spliced from top to bottom to form the first column of the third feature map, and then, again from top to bottom, the 2nd column, the 3rd column, ..., and the b-th column, yielding the third feature map.
  • the splicing sequence can also be from top to bottom, from right to left; or from bottom to top, from left to right; or from bottom to top, from right to left; or other splicing sequences, which are not limited in this embodiment.
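The splicing orders described above can be sketched as follows. The sketch assumes that the m row vectors fill a feature map of length a and width b exactly (m = a·b); the function name `splice_modules` and the order labels are hypothetical and chosen only for illustration.

```python
import numpy as np

def splice_modules(A, a, b, order="top-bottom-left-right"):
    """Arrange the m = a * b row vectors of A into an (a, b, p) feature map."""
    m, p = A.shape
    if m != a * b:
        raise ValueError("this sketch assumes m == a * b")
    # Fill column 0 from top to bottom, then column 1, and so on.
    fmap = A.reshape(b, a, p).transpose(1, 0, 2)
    if order == "top-bottom-left-right":
        return fmap
    if order == "top-bottom-right-left":
        # Same fill direction, but the first column filled is the rightmost.
        return fmap[:, ::-1, :]
    raise ValueError(f"unsupported order: {order}")

A = np.arange(12).reshape(6, 2)      # m = 6, p = 2
third = splice_modules(A, a=3, b=2)  # third[r, c] holds row c * 3 + r of A
```

Whatever order is chosen, the same order has to be used later to map the convolution result back to rows of the product matrix.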
  • the length of the zero data module is 1, the width is 1, and the depth is p (matching the depth of the third feature map), and the values of all elements in the zero data module are zero.
  • the zero padding rule is related to the number of column vectors of the second image matrix $B_{p \times n}$ contained in each convolution kernel and the positions of those column vectors in the convolution kernel.
  • if each convolution kernel corresponding to the second image matrix $B_{p \times n}$ contains one column vector of the second image matrix $B_{p \times n}$, the image data processing device can insert zero data modules as follows: it inserts zero data modules at the upper edge, lower edge, left edge and right edge of the third feature map respectively to obtain the first feature map.
  • if each convolution kernel contains one column vector, f is an odd number, and the column vector is located at the center of the convolution kernel, the image data processing device can insert $(f-1)/2$ rows of zero data modules at each of the upper and lower edges of the third feature map and $(f-1)/2$ columns of zero data modules at each of the left and right edges to obtain the first feature map; that is, in every channel of the first feature map, the first $(f-1)/2$ columns, the last $(f-1)/2$ columns, the first $(f-1)/2$ rows and the last $(f-1)/2$ rows are all zero.
  • for example, the third feature map obtained by the image data processing device is shown in FIG. 2; the device then inserts one row of zero data modules at each of the upper and lower edges of the third feature map and one column of zero data modules at each of the left and right edges to obtain the first feature map, as shown in FIG. 3.
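The edge padding for a centered column vector can be sketched with `numpy.pad`. The sketch assumes an odd kernel size f and a padding width of (f − 1)/2 per edge, which is the width a centered kernel needs for the convolution output to keep the a × b size of the third feature map; the helper name `pad_for_centered_kernel` is hypothetical.

```python
import numpy as np

def pad_for_centered_kernel(third, f):
    """Surround an (a, b, p) feature map with (f - 1) // 2 zero data modules per edge."""
    if f % 2 == 0:
        raise ValueError("the centered case assumes an odd kernel size f")
    h = (f - 1) // 2
    # Pad rows and columns only; the depth (channel) axis is left untouched.
    return np.pad(third, ((h, h), (h, h), (0, 0)), mode="constant")

third = np.ones((2, 3, 4))           # a = 2, b = 3, p = 4 (illustrative values)
first = pad_for_centered_kernel(third, f=3)
```

With f = 3 this inserts exactly one row above and below and one column left and right, as in the FIG. 3 example.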
  • the image data processing device can insert $L_1$ rows of zero data modules at the upper edge of the third feature map, $L_2$ rows of zero data modules at the lower edge, $L_3$ columns of zero data modules at the left edge and $L_4$ columns of zero data modules at the right edge to obtain the first feature map, where the values of $L_1$, $L_2$, $L_3$ and $L_4$ are related to the position of the column vector in the convolution kernel.
  • for example, if the column vector is located in the first row and the first column of the convolution kernel, the image data processing device inserts 1 row of zero data modules at the upper edge, 2 rows of zero data modules at the lower edge, 1 column of zero data modules at the left edge and 2 columns of zero data modules at the right edge, as shown in FIG. 4.
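The asymmetric padding of this example can be sketched in the same way. The helper name `pad_edges` and its parameter names are assumptions made for illustration; the counts (1 above, 2 below, 1 left, 2 right) are those of the example in the text.

```python
import numpy as np

def pad_edges(third, top, bottom, left, right):
    """Insert rows/columns of 1 x 1 x p zero data modules at the four edges."""
    return np.pad(third, ((top, bottom), (left, right), (0, 0)), mode="constant")

third = np.arange(1, 13).reshape(2, 3, 2)    # a = 2, b = 3, p = 2
# Column vector in row 1, column 1 of a 3 x 3 x p kernel.
first = pad_edges(third, top=1, bottom=2, left=1, right=2)
```

The original data ends up at rows 1 to a and columns 1 to b (0-based) of the padded map, surrounded by zeros.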
  • if the system sets that each convolution kernel corresponding to the second image matrix $B_{p \times n}$ contains multiple column vectors of the second image matrix $B_{p \times n}$, the image data processing device can insert zero data modules as follows: it inserts zero data modules at target positions of the third feature map to obtain the first feature map, where the target positions include internal positions and edge positions and are related to the positions of the multiple column vectors in the convolution kernel.
  • for example, each convolution kernel contains two column vectors $W_1$ and $W_2$, where $W_1$ is located in row 2, column 2 of the convolution kernel and $W_2$ is located in row 2, column 3 of the convolution kernel.
  • the image data processing device inserts a column of zero data modules between adjacent columns of the third feature map, a row of zero data modules at each of the upper and lower edges of the third feature map, and a column of zero data modules at each of the left and right edges to obtain the first feature map, as shown in FIG. 5.
  • the zero padding rule is related to the number of column vectors contained in the convolution kernel and the corresponding position of each column vector in the convolution kernel. Specifically, the user can set the number of column vectors and the corresponding position of the column vector in the convolution kernel. It can also be set in other ways, which is not limited in this embodiment.
  • the image data processing apparatus determines X convolution kernels corresponding to the second image matrix.
  • after the image data processing device acquires the second image matrix $B_{p \times n}$, it generates, according to a second preset rule, X convolution kernels of size $f \times f \times p$ corresponding to the second image matrix $B_{p \times n}$; each convolution kernel contains one or more column vectors (that is, one or more columns of elements) of the second image matrix $B_{p \times n}$, f is an integer greater than 1, and X is an integer greater than or equal to 1.
  • the image data processing device can determine the X convolution kernels corresponding to the second image matrix $B_{p \times n}$ in the following manner: it determines the values of n groups of target elements in the X convolution kernels and sets the values of all other elements of the X convolution kernels to 0, where each group of target elements corresponds to one column vector of the second image matrix $B_{p \times n}$, the values of the i-th group of target elements correspond one-to-one to the values of the i-th column of the second image matrix $B_{p \times n}$, i is an integer from 1 to n, and each convolution kernel contains one or more of the n groups of target elements.
  • the size of the convolution kernel is preset, that is, the value of f is a preset value.
  • the system can also set the number of column vectors contained in each convolution kernel, that is, how many columns of the second image matrix $B_{p \times n}$ each convolution kernel contains. Further, the system can also set the position of each contained column vector within the convolution kernel.
  • the target elements $Z_1, Z_2, \ldots, Z_p$ are respectively located in row j, column k of the 1st, 2nd, ..., p-th channels of the convolution kernel.
  • for example, the image data processing device determines that the second image matrix $B_{3 \times 3}$ corresponds to three groups of target elements, $b_{1,1}, b_{2,1}, b_{3,1}$; $b_{1,2}, b_{2,2}, b_{3,2}$; and $b_{1,3}, b_{2,3}, b_{3,3}$, and sets the values of all elements other than the target elements in each convolution kernel to 0, where each convolution kernel contains one group of target elements, located in the second row and second column of the first, second and third channels of the convolution kernel, as shown in FIG. 6.
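The construction of one kernel per column can be sketched as below. The function name `kernels_single_column`, the 0-based `row`/`col` arguments and the numeric entries of `B` are assumptions for illustration; only the placement rule (column i of the second image matrix spread across the p channels at one fixed kernel position, zeros elsewhere) comes from the text.

```python
import numpy as np

def kernels_single_column(B, f, row, col):
    """Build n kernels of shape (f, f, p); kernel i holds column i of B at (row, col).

    `row` and `col` are 0-based positions inside the kernel; all other entries are 0.
    """
    p, n = B.shape
    kernels = np.zeros((n, f, f, p), dtype=B.dtype)
    for i in range(n):
        kernels[i, row, col, :] = B[:, i]
    return kernels

B = np.arange(1, 10).reshape(3, 3)   # stands in for b_{1,1} ... b_{3,3}
kernels = kernels_single_column(B, f=3, row=1, col=1)   # 2nd row, 2nd column (1-based)
```

Each kernel here has exactly p non-zero candidates (one per channel), so the convolution at any position reduces to a dot product between one data module and one column of the second image matrix.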
  • the system presets that each convolution kernel includes u column vectors, and u is an integer greater than 1 and less than or equal to f.
  • each convolution kernel contains u groups of target elements.
  • for each convolution kernel, the p target elements of the v-th group contained in the kernel are respectively located in row $j_v$, column $k_v$ of the 1st, 2nd, ..., p-th channels of the convolution kernel, where v is an integer from 1 to u;
  • X is the smallest integer greater than n/u; the 1st to (X−1)-th of the X convolution kernels each contain u groups of target elements, and the X-th convolution kernel contains $n - u(X-1)$ groups of target elements.
  • in the 1st to (X−1)-th convolution kernels, the v-th of the u groups of target elements is located in row $j_v$, column $k_v$ of the 1st, 2nd, ..., p-th channels of the convolution kernel; in the X-th convolution kernel, the r-th of the $n - u(X-1)$ groups of target elements is located in row $j_r$, column $k_r$ of the 1st, 2nd, ..., p-th channels of the convolution kernel, where r is an integer from 1 to $n - u(X-1)$.
  • for example, each convolution kernel contains 2 column vectors $W_1$ and $W_2$, where $W_1$ is located in row 2, column 2 of the convolution kernel and $W_2$ is located in row 2, column 3 of the convolution kernel.
  • the image data processing device determines that the second image matrix $B_{3 \times 3}$ corresponds to three groups of target elements, $b_{1,1}, b_{2,1}, b_{3,1}$; $b_{1,2}, b_{2,2}, b_{3,2}$; and $b_{1,3}, b_{2,3}, b_{3,3}$, and sets the values of all other elements in each convolution kernel to 0, where the first convolution kernel contains the first-column elements $b_{1,1}, b_{2,1}, b_{3,1}$ and the second-column elements $b_{1,2}, b_{2,2}, b_{3,2}$ of the second image matrix $B_{3 \times 3}$, and the second convolution kernel contains the third-column elements $b_{1,3}, b_{2,3}, b_{3,3}$ of the second image matrix $B_{3 \times 3}$, as shown in FIG. 7.
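Packing several columns into each kernel can be sketched as follows. The function name `kernels_multi_column` and the `positions` argument are hypothetical. The sketch takes X = ceil(n/u), which coincides with "the smallest integer greater than n/u" whenever n is not a multiple of u; treating the divisible case the same way is an assumption of the sketch.

```python
import math
import numpy as np

def kernels_multi_column(B, f, positions):
    """Pack the n columns of B into X = ceil(n / u) kernels of shape (f, f, p).

    `positions` lists u 0-based (row, col) slots; group v of a kernel goes to
    positions[v]. The last kernel may hold fewer than u groups.
    """
    p, n = B.shape
    u = len(positions)
    X = math.ceil(n / u)
    kernels = np.zeros((X, f, f, p), dtype=B.dtype)
    for i in range(n):
        x, v = divmod(i, u)          # kernel index, slot inside that kernel
        row, col = positions[v]
        kernels[x, row, col, :] = B[:, i]
    return kernels

B = np.arange(1, 10).reshape(3, 3)
# W1 at row 2, column 2 and W2 at row 2, column 3 (1-based), as in the example.
kernels = kernels_multi_column(B, f=3, positions=[(1, 1), (1, 2)])
```

For the 3 × 3 example this yields X = 2 kernels: the first holds columns 1 and 2 of the second image matrix, and the second holds column 3 with its second slot left at zero.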
  • the image data processing device convolves the first feature map with X convolution kernels to obtain a second feature map.
  • the first feature map is convolved with the X convolution kernels to obtain the second feature map; the second feature map corresponds to the third image matrix $C_{m \times n}$, which is the result of matrix multiplication of the first image matrix $A_{m \times p}$ and the second image matrix $B_{p \times n}$.
  • if each convolution kernel contains one column vector of the second image matrix and the column vector is located at the center of the convolution kernel, the second feature map obtained by convolving the first feature map with the n convolution kernels has length a, width b and depth n.
  • Each $1 \times 1 \times n$ data module of the second feature map in the depth direction corresponds to a row vector of the third image matrix $C_{m \times n}$, and the order of these row vectors in the second feature map follows the splicing order of the third feature map; that is, the second feature map maps to $C_{m \times n}$ in the same way that the third feature map maps to $A_{m \times p}$.
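The centered case can be checked end to end with a small sketch. It assumes m = a·b, an odd kernel size f, top-to-bottom then left-to-right splicing, and a stride-1 convolution in the CNN sense (cross-correlation) with no implicit padding; the function names `conv_valid` and `matmul_by_conv` are hypothetical.

```python
import numpy as np

def conv_valid(fmap, kernels):
    """Stride-1 convolution (CNN-style cross-correlation), no implicit padding.

    fmap: (H, W, p); kernels: (X, f, f, p); result: (H - f + 1, W - f + 1, X).
    """
    X, f, _, _ = kernels.shape
    H, W, _ = fmap.shape
    out = np.zeros((H - f + 1, W - f + 1, X))
    for r in range(H - f + 1):
        for c in range(W - f + 1):
            window = fmap[r:r + f, c:c + f, :]
            # Multiply the window by every kernel and sum over f x f x p.
            out[r, c, :] = np.tensordot(kernels, window, axes=([1, 2, 3], [0, 1, 2]))
    return out

def matmul_by_conv(A, B, a, b, f=3):
    """Compute A @ B with a convolution; assumes m == a * b and odd f."""
    m, p = A.shape
    n = B.shape[1]
    h = (f - 1) // 2
    third = A.reshape(b, a, p).transpose(1, 0, 2)       # splice rows into (a, b, p)
    first = np.pad(third, ((h, h), (h, h), (0, 0)))     # zero data modules on all edges
    kernels = np.zeros((n, f, f, p))
    kernels[:, h, h, :] = B.T                           # column i of B at the kernel center
    second = conv_valid(first, kernels)                 # shape (a, b, n)
    # Undo the splicing: the module at (r, c) is row c * a + r of the product.
    return second.transpose(1, 0, 2).reshape(m, n)

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(6, 4)).astype(float)
B = rng.integers(-5, 5, size=(4, 3)).astype(float)
C = matmul_by_conv(A, B, a=3, b=2)                      # expected to equal A @ B
```

Because every kernel is zero except at its center, each output value is the dot product of one row of the first image matrix with one column of the second, which is exactly one entry of the product.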
  • if each convolution kernel contains multiple column vectors of the second image matrix, then after convolving the first feature map with the X convolution kernels to obtain the second feature map, the image data processing device rearranges the data modules in the second feature map according to a preset rearrangement rule to obtain a fourth feature map.
  • the length of the fourth feature map is a, the width is b, and the depth is n.
  • Each $1 \times 1 \times n$ data module of the fourth feature map in the depth direction corresponds to a row vector of the third image matrix $C_{m \times n}$, and the order of these row vectors in the fourth feature map follows the splicing order of the third feature map.
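The multi-column case, including the rearrangement, can be sketched as below for 3 × 3 × p kernels holding up to two columns each (row 2, columns 2 and 3). FIG. 5 is not available here, so the zero-insertion layout in the sketch is an assumption chosen so that every product lands at its own output position: two zero columns on the left, one zero column after each data column, and one zero row above and below. The function names and the rearrangement rule are likewise illustrative.

```python
import math
import numpy as np

def conv_valid(fmap, kernels):
    """Stride-1 CNN-style convolution without implicit padding."""
    X, f, _, _ = kernels.shape
    H, W, _ = fmap.shape
    out = np.zeros((H - f + 1, W - f + 1, X))
    for r in range(H - f + 1):
        for c in range(W - f + 1):
            window = fmap[r:r + f, c:c + f, :]
            out[r, c, :] = np.tensordot(kernels, window, axes=([1, 2, 3], [0, 1, 2]))
    return out

def matmul_by_conv_two_columns(A, B, a, b):
    """A @ B with 3 x 3 x p kernels that each hold up to two columns of B."""
    m, p = A.shape
    n = B.shape[1]
    X = math.ceil(n / 2)
    third = A.reshape(b, a, p).transpose(1, 0, 2)
    # Assumed layout: columns [0, 0, T0, 0, T1, 0, ...], zero rows above and below.
    first = np.zeros((a + 2, 2 * b + 2, p))
    first[1:-1, 2::2, :] = third
    kernels = np.zeros((X, 3, 3, p))
    for i in range(n):
        x, v = divmod(i, 2)
        kernels[x, 1, 1 + v, :] = B[:, i]   # slot 0: row 2, col 2; slot 1: row 2, col 3
    second = conv_valid(first, kernels)     # shape (a, 2 * b, X)
    # Rearrange: odd output columns carry slot-0 products, even columns slot-1 products.
    fourth = np.zeros((a, b, n))
    for i in range(n):
        x, v = divmod(i, 2)
        fourth[:, :, i] = second[:, (1 - v)::2, x]
    return fourth.transpose(1, 0, 2).reshape(m, n)

rng = np.random.default_rng(1)
A = rng.integers(-5, 5, size=(6, 4)).astype(float)
B = rng.integers(-5, 5, size=(4, 3)).astype(float)
C = matmul_by_conv_two_columns(A, B, a=3, b=2)          # expected to equal A @ B
```

The interleaved zero columns guarantee that at any output position only one of the two kernel slots overlaps real data, so the two products never mix; the price is a second feature map twice as wide, which the rearrangement folds back to a × b × n.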
  • if each convolution kernel contains one column vector of the second image matrix and the column vector is not located at the center of the convolution kernel, then after convolving the first feature map with the n convolution kernels to obtain the second feature map, the image data processing device deletes the zero data modules at preset positions to obtain a fifth feature map.
  • the length of the fifth feature map is a, the width is b, and the depth is n.
  • Each $1 \times 1 \times n$ data module of the fifth feature map in the depth direction corresponds to a row vector of the third image matrix $C_{m \times n}$, and the order of these row vectors in the fifth feature map follows the splicing order of the third feature map.
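For the off-center case, a sketch using the earlier example (column vector in row 1, column 1 of a 3 × 3 × p kernel; padding of 1 row above, 2 below, 1 column left, 2 right) shows which zero data modules are deleted. The function names are hypothetical, and the choice of "first row and first column" as the preset deletion positions is derived for this particular configuration rather than stated in the text.

```python
import numpy as np

def conv_valid(fmap, kernels):
    """Stride-1 CNN-style convolution without implicit padding."""
    X, f, _, _ = kernels.shape
    H, W, _ = fmap.shape
    out = np.zeros((H - f + 1, W - f + 1, X))
    for r in range(H - f + 1):
        for c in range(W - f + 1):
            window = fmap[r:r + f, c:c + f, :]
            out[r, c, :] = np.tensordot(kernels, window, axes=([1, 2, 3], [0, 1, 2]))
    return out

def matmul_by_conv_corner(A, B, a, b):
    """A @ B with each column of B in the top-left corner of a 3 x 3 x p kernel."""
    m, p = A.shape
    n = B.shape[1]
    third = A.reshape(b, a, p).transpose(1, 0, 2)
    first = np.pad(third, ((1, 2), (1, 2), (0, 0)))   # 1 top, 2 bottom, 1 left, 2 right
    kernels = np.zeros((n, 3, 3, p))
    kernels[:, 0, 0, :] = B.T                         # column i of B at row 1, column 1
    second = conv_valid(first, kernels)               # shape (a + 1, b + 1, n)
    fifth = second[1:, 1:, :]                         # delete the all-zero first row and column
    return fifth.transpose(1, 0, 2).reshape(m, n)

rng = np.random.default_rng(2)
A = rng.integers(-5, 5, size=(6, 4)).astype(float)
B = rng.integers(-5, 5, size=(4, 3)).astype(float)
C = matmul_by_conv_corner(A, B, a=3, b=2)             # expected to equal A @ B
```

With this padding the second feature map is one row and one column larger than a × b, and the surplus positions only ever see padded zeros, which is why they can be dropped without losing information.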
  • the technical solution of this application acquires the first image matrix $A_{m \times p}$ and the second image matrix $B_{p \times n}$, determines the first feature map corresponding to the first image matrix $A_{m \times p}$ and the X convolution kernels of size $f \times f \times p$ corresponding to the second image matrix $B_{p \times n}$, and then convolves the first feature map with these X convolution kernels to obtain the feature map corresponding to the third image matrix, where f is an integer greater than 1 and the third image matrix is the matrix product of the first image matrix and the second image matrix.
  • the convolution operation can be used to replace the matrix multiplication operation to obtain the corresponding feature map, which solves the problem that the Transformer module cannot be deployed on a chip that only supports convolution calculation, and improves the flexibility of the Transformer module.
  • the embodiment of the present application provides multiple ways of determining the first feature map and multiple ways of determining the convolution kernel, which improves the flexibility of the solution.
  • the present application also provides an image data processing device, electronic equipment, and corresponding embodiments.
  • FIG. 8 is a schematic structural diagram of an image data processing device shown in an embodiment of the present application.
  • the image data processing apparatus 800 in this embodiment includes: an acquisition module 801, a first determination module 802, a second determination module 803, and a convolution module 804.
  • the acquisition module 801 is configured to acquire a first image matrix A m×p and a second image matrix B p×n;
  • the first determination module 802 is configured to determine the first feature map corresponding to the first image matrix A m×p;
  • the second determination module 803 is configured to determine X convolution kernels corresponding to the second image matrix B p×n, where the size of each convolution kernel is f×f×p, f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel contains one or more columns of elements of the second image matrix B p×n;
  • the convolution module 804 is configured to convolve the first feature map with the X convolution kernels to obtain a second feature map, where the second feature map corresponds to the third image matrix, and the third image matrix is the result of matrix multiplication of the first image matrix A m×p and the second image matrix B p×n.
  • the acquisition module 801 can acquire the first image matrix A m×p and the second image matrix B p×n, the first determination module 802 can determine the first feature map corresponding to the first image matrix A m×p, the second determination module 803 can determine the X convolution kernels of size f×f×p corresponding to the second image matrix B p×n, and the convolution module 804 can then convolve the first feature map with the X convolution kernels to obtain the feature map corresponding to the third image matrix, where f is an integer greater than 1 and the third image matrix is obtained by matrix multiplication of the first image matrix and the second image matrix.
  • the convolution operation can be used to replace the matrix multiplication operation to obtain the corresponding feature map, which solves the problem that the Transformer module cannot be deployed on a chip that only supports convolution calculation, and improves the flexibility of the Transformer module.
  • For ease of understanding, the image data processing device in this application is described in more detail below. Referring to FIG. 9, the image data processing device includes an acquisition module 901, a first determination module 902, a second determination module 903, and a convolution module 904.
  • the acquisition module 901 is configured to acquire a first image matrix A m×p and a second image matrix B p×n;
  • the first determination module 902 is configured to determine the first feature map corresponding to the first image matrix A m×p;
  • the second determination module 903 is configured to determine X convolution kernels corresponding to the second image matrix B p×n, where the size of each convolution kernel is f×f×p, f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel contains one or more columns of elements of the second image matrix B p×n;
  • the convolution module 904 is configured to convolve the first feature map with the X convolution kernels to obtain a second feature map, where the second feature map corresponds to the third image matrix, and the third image matrix is the result of matrix multiplication of the first image matrix A m×p and the second image matrix B p×n.
  • the second determination module 903 includes: a first determination unit 9031 and a second determination unit 9032.
  • the first determination unit 9031 is configured to determine the values of n groups of target elements in the X convolution kernels, the value of the i-th group of target elements in the n groups of target elements corresponds to the value of the i-th column element in the second image matrix, i is an integer from 1 to n, and each convolution kernel contains one or more sets of target elements in the n sets of target elements;
  • the second determination unit 9032 is configured to determine that values of other elements in the X convolution kernels are 0 except for the n groups of target elements.
  • each convolution kernel contains a group of target elements Z 1 , Z 2 ,...,Z p in n groups of target elements, and for each convolution kernel, the convolution kernel contains target elements Z 1 , Z 2 ,..., Z p are respectively located in row j and column k of the 1st, 2nd,...,p channels of the convolution kernel.
  • the first determination module 902 includes: a third determination unit 9021, a first splicing unit 9022, and a first zero padding unit 9023.
  • the third determination unit 9021 is configured to determine m data modules corresponding to the m row vectors of the first image matrix A m×p, where each of these m data modules has length 1, width 1, and depth p;
  • the first zero padding unit 9023 is configured to insert zero data modules at the upper edge, lower edge, left edge, and right edge of the third feature map to obtain the first feature map, where each zero data module has length 1, width 1, and depth p, and all elements in the zero data module have the value zero.
  • each convolution kernel contains u groups of target elements among the n groups of target elements, where u is an integer greater than 1 and less than or equal to f; for each convolution kernel, the p target elements in the v-th group of target elements contained in that kernel are located at row j_v and column k_v of channels 1, 2, …, p of the kernel, where v is an integer from 1 to u.
  • the first determination module 902 includes: a fourth determination unit 9024, a second splicing unit 9025, and a second zero padding unit 9026.
  • the fourth determination unit 9024 is configured to determine m data modules corresponding to the m row vectors of the first image matrix A m×p, where each of these m data modules has length 1, width 1, and depth p;
  • the second zero padding unit 9026 is configured to insert zero data modules at target positions of the third feature map to obtain the first feature map.
  • the target positions include internal positions and edge positions and are related to the positions of the multiple groups of target elements in the convolution kernel; each zero data module has length 1, width 1, and depth n, and all elements in the zero data module have the value zero.
  • the acquisition module 901 can acquire the first image matrix A m×p and the second image matrix B p×n, the first determination module 902 can determine the first feature map corresponding to the first image matrix A m×p, the second determination module 903 can determine the X convolution kernels of size f×f×p corresponding to the second image matrix B p×n, and the convolution module 904 can then convolve the first feature map with the X convolution kernels to obtain the feature map corresponding to the third image matrix, where f is an integer greater than 1 and the third image matrix is obtained by matrix multiplication of the first image matrix and the second image matrix.
  • the convolution operation can be used to replace the matrix multiplication operation to obtain the corresponding feature map, which solves the problem that the Transformer module cannot be deployed on a chip that only supports convolution calculation, and improves the flexibility of the Transformer module.
  • the embodiment of the present application provides ways to determine the first feature map and the convolution kernel in multiple ways, which improves the flexibility of the solution.
  • FIG. 10 is a schematic structural diagram of an electronic device shown in an embodiment of the present application.
  • an electronic device 1000 includes a memory 1010 and a processor 1020 .
  • the processor 1020 can be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory 1010 may include various types of storage units such as system memory, read only memory (ROM), and persistent storage.
  • the ROM may store static data or instructions required by the processor 1020 or other modules of the computer.
  • the persistent storage device may be a readable and writable storage device.
  • Persistent storage may be a non-volatile storage device that does not lose stored instructions and data even if the computer is powered off.
  • the permanent storage device adopts a mass storage device (such as a magnetic or optical disk, flash memory) as the permanent storage device.
  • the permanent storage device may be a removable storage device (such as a floppy disk, an optical drive).
  • the system memory can be a readable and writable storage device or a volatile readable and writable storage device, such as dynamic random access memory.
  • System memory can store some or all of the instructions and data that the processor needs at runtime.
  • the memory 1010 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (such as DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), and magnetic disks and/or optical disks may also be used.
  • memory 1010 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, a super density disc, a flash memory card (such as an SD card, mini SD card, Micro-SD card, etc.), a magnetic floppy disk, etc.
  • Computer-readable storage media do not include carrier waves or transient electronic signals transmitted wirelessly or over wires.
  • Executable codes are stored in the memory 1010 , and when the executable codes are processed by the processor 1020 , the processor 1020 may execute part or all of the methods mentioned above.
  • the method according to the present application can also be implemented as a computer program or computer program product, the computer program or computer program product including computer program code instructions for executing some or all of the steps in the above method of the present application.
  • the present application may also be implemented as a computer-readable storage medium (or a non-transitory machine-readable storage medium or a machine-readable storage medium) on which executable code (or a computer program or computer instruction code) is stored; when the executable code is executed by a processor of an electronic device (or server, etc.), the processor is made to perform part or all of the steps of the above-mentioned method according to the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to an image data processing method and apparatus, a device, and a storage medium. The method comprises: acquiring a first image matrix Am×p and a second image matrix Bp×n; determining a first feature map corresponding to the first image matrix Am×p; determining X convolution kernels corresponding to the second image matrix Bp×n, wherein the size of each convolution kernel is f × f × p, f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel comprises one or more columns of elements in the second image matrix Bp×n; and performing convolution on the first feature map and the X convolution kernels to obtain a second feature map, wherein the second feature map corresponds to a third image matrix, and the third image matrix is a result obtained by performing matrix multiplication of the first image matrix Am×p and the second image matrix Bp×n. The solution provided by the present application can improve the flexibility of a transformer module.

Description

Image data processing method, apparatus, device, and storage medium
This application claims priority to Chinese patent application No. 2021114776272, entitled "Image Data Processing Method and Apparatus", filed with the China National Intellectual Property Administration on December 6, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the technical field of image processing, and in particular to an image data processing method, apparatus, device, and storage medium.
Background
Current mainstream machine translation is mainly neural machine translation. Such methods use an encoder-decoder architecture: the encoder encodes the source-language sequence and extracts information, and the decoder then converts that information into the target language, completing the translation. The deep self-attention Transformer model, designed on the encoder-decoder architecture, has become the mainstream model in machine translation thanks to its superior performance and has had a huge impact on the field of deep learning.
In neural networks that use the Transformer as the main module, two two-dimensional data tensors are multiplied as matrices. In some solutions, the Transformer model is therefore generally deployed on chips that support matrix multiplication.
However, some chips on the market support only convolution calculation, so Transformer modules that require matrix multiplication cannot be deployed on them. This restricts the use of the Transformer module and makes it less flexible.
Summary of the invention
To solve or partially solve the problems in the related art, the present application provides an image data processing method, apparatus, device, and storage medium that can improve the flexibility of the Transformer module.
A first aspect of the present application provides an image data processing method, including: acquiring a first image matrix A m×p and a second image matrix B p×n; determining a first feature map corresponding to the first image matrix A m×p; determining X convolution kernels corresponding to the second image matrix B p×n; and convolving the first feature map with the X convolution kernels to obtain a second feature map, where the second feature map corresponds to a third image matrix, and the third image matrix is the result of matrix multiplication of the first image matrix A m×p and the second image matrix B p×n.
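The four steps of the first aspect can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the application's implementation: the function name `matmul_via_conv`, the row-major splicing rule, the odd kernel size `f` with one column of B placed at the kernel center, and the stride-1 sliding-window convolution (cross-correlation, as is usual in deep learning) are all illustrative choices.

```python
def matmul_via_conv(A, B, a, b, f=3):
    """Compute A (m x p) times B (p x n) with a sliding-window convolution.

    Hypothetical helper: assumes a * b == m, odd f, row-major splicing,
    and one column of B at the center of each f x f x p kernel.
    """
    m, p, n = len(A), len(B), len(B[0])
    assert a * b == m and f > 1 and f % 2 == 1
    pad = f // 2

    # Steps 1-2: splice the m rows of A into an a x b x p map, then surround
    # it with zero data modules (1 x 1 x p) to get the first feature map.
    fmap = [[[0] * p for _ in range(b + 2 * pad)] for _ in range(a + 2 * pad)]
    for i in range(m):
        fmap[pad + i // b][pad + i % b] = list(A[i])

    # Step 3: kernel i carries column i of B at its center; all else is 0.
    kernels = []
    for i in range(n):
        ker = [[[0] * p for _ in range(f)] for _ in range(f)]
        ker[pad][pad] = [B[c][i] for c in range(p)]
        kernels.append(ker)

    # Step 4: stride-1 convolution; the output (second feature map) is a x b x n.
    second = [[[0] * n for _ in range(b)] for _ in range(a)]
    for r in range(a):
        for s in range(b):
            for i, ker in enumerate(kernels):
                second[r][s][i] = sum(
                    ker[dr][dc][c] * fmap[r + dr][s + dc][c]
                    for dr in range(f) for dc in range(f) for c in range(p)
                )

    # Read the third image matrix back: module (r, s) is row r * b + s of C.
    return [second[i // b][i % b] for i in range(m)]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1]]   # m = 4, p = 3
B = [[1, 0], [0, 1], [2, 2]]                       # p = 3, n = 2
C = matmul_via_conv(A, B, a=2, b=2)
```

Here the 4×3 matrix A is spliced into a 2×2×3 map, so the second feature map is 2×2×2 and each of its depth vectors is read back as one row of the product.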
In one embodiment, the size of each convolution kernel is f×f×p, where f is an integer greater than 1 and X is an integer greater than or equal to 1.
In one embodiment, each convolution kernel contains one or more columns of elements of the second image matrix B p×n.
In one embodiment, determining the X convolution kernels corresponding to the second image matrix B p×n includes: determining the values of n groups of target elements in the X convolution kernels, where the values of the i-th group of target elements correspond to the values of the i-th column of elements in the second image matrix, i is an integer from 1 to n, and each convolution kernel contains one or more of the n groups of target elements; and determining that the values of all elements in the X convolution kernels other than the n groups of target elements are 0.
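The kernel construction described above can be illustrated for the simplest case, one group of target elements per kernel (so X = n). The sketch below is an assumption-laden illustration: `build_kernels` is a hypothetical name, the position `(j, k)` is 0-based here, and kernels are stored as nested lists indexed `[row][column][channel]`.

```python
def build_kernels(B, f, j, k):
    """Return n kernels of size f x f x p, one per column of B.

    Hypothetical helper: kernel i holds column i of B across its p channels
    at spatial position (j, k), 0-based; every other element is 0.
    """
    p, n = len(B), len(B[0])
    assert f > 1 and 0 <= j < f and 0 <= k < f
    kernels = []
    for i in range(n):
        kernel = [[[0] * p for _ in range(f)] for _ in range(f)]
        for c in range(p):
            kernel[j][k][c] = B[c][i]   # target element Z_(c+1) of group i
        kernels.append(kernel)
    return kernels


B = [[1, 2], [3, 4], [5, 6]]            # p = 3, n = 2
kernels = build_kernels(B, f=3, j=1, k=1)
```

With `f=3` and `(j, k) = (1, 1)` the column sits at the kernel center, so `kernels[0][1][1]` is the first column of B and the other eight spatial positions of each kernel are all zero.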
In one embodiment, each convolution kernel contains one group of target elements Z 1, Z 2, …, Z p among the n groups of target elements, and for each convolution kernel, the target elements Z 1, Z 2, …, Z p contained in that kernel are located at row j, column k of channels 1, 2, …, p of the kernel, respectively;
determining the first feature map corresponding to the first image matrix A m×p includes: determining m data modules corresponding to the m row vectors of the first image matrix A m×p, each data module having length 1, width 1, and depth p; splicing the m data modules according to a preset splicing rule to obtain a third feature map of length a, width b, and depth p, where a×b=m; and inserting zero data modules at the upper, lower, left, and right edges of the third feature map to obtain the first feature map, each zero data module having length 1, width 1, and depth p, with all of its elements equal to zero.
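The construction of the third and first feature maps can be sketched as follows. The splicing rule is only described as "preset", so the row-major order used here is an assumption, as are the function name `build_first_feature_map`, the reading of length a as rows and width b as columns, and the single ring of zero modules (which matches a 3×3 kernel with its target elements at the center).

```python
def build_first_feature_map(A, a, b):
    """Splice rows of A into an a x b x p map, then zero-pad all four edges.

    Hypothetical helper: row-major splicing is an assumed rule; maps are
    nested lists indexed [row][column][channel].
    """
    m, p = len(A), len(A[0])
    assert a * b == m

    # Third feature map: row i of A becomes the 1 x 1 x p data module
    # at position (i // b, i % b).
    third = [[list(A[r * b + s]) for s in range(b)] for r in range(a)]

    def zero():
        return [0] * p                  # a 1 x 1 x p zero data module

    # First feature map: one zero module on every edge of the third map.
    first = [[zero() for _ in range(b + 2)]]
    for row in third:
        first.append([zero()] + [list(v) for v in row] + [zero()])
    first.append([zero() for _ in range(b + 2)])
    return third, first


A = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]]   # m = 6, p = 2
third, first = build_first_feature_map(A, a=2, b=3)
```

For this input the third feature map is 2×3×2 and the first feature map is 4×5×2, with the original data modules occupying the interior.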
In one embodiment, each convolution kernel contains u groups of target elements among the n groups of target elements, where u is an integer greater than 1 and less than or equal to f; for each convolution kernel, the p target elements in the v-th group of target elements contained in that kernel are located at row j_v and column k_v of channels 1, 2, …, p of the kernel, where v is an integer from 1 to u;
determining the first feature map corresponding to the first image matrix A m×p includes: determining m data modules corresponding to the m row vectors of the first image matrix A m×p, each data module having length 1, width 1, and depth p; splicing the m data modules according to a preset splicing rule to obtain a third feature map of length a, width b, and depth p, where a×b=m; and inserting zero data modules at target positions of the third feature map to obtain the first feature map, where the target positions include internal positions and edge positions and are related to the positions of the multiple groups of target elements in the convolution kernel, each zero data module has length 1, width 1, and depth n, and all of its elements are zero.
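One possible layout for the multi-group case, with f = 2 and u = 2, is sketched below. It is not taken from the application's figures: the kernel positions (0, 0) and (0, 1), the interleaving pattern of zero modules, the row-major splicing, and the final read-back step are all assumptions chosen so that each output position overlaps exactly one non-zero data module. The zero modules in this sketch have depth p so that they match the channels of the feature map they are inserted into.

```python
def conv_valid(fmap, kernels):
    """Stride-1, no-padding sliding-window convolution (cross-correlation)."""
    f, p = len(kernels[0]), len(fmap[0][0])
    H, W = len(fmap) - f + 1, len(fmap[0]) - f + 1
    return [[[sum(ker[dr][dc][c] * fmap[r + dr][t + dc][c]
                  for dr in range(f) for dc in range(f) for c in range(p))
              for ker in kernels]
             for t in range(W)]
            for r in range(H)]


def multi_column_matmul(A, B, a, b):
    """Hypothetical sketch: two columns of B per 2 x 2 x p kernel (u = f = 2)."""
    m, p, n = len(A), len(B), len(B[0])
    assert a * b == m and n % 2 == 0

    # First feature map: each row is [0, d0, 0, d1, ..., 0] (internal and
    # edge zero modules), plus one all-zero row at the bottom edge.
    fmap = []
    for r in range(a):
        row = [[0] * p]
        for s in range(b):
            row += [list(A[r * b + s]), [0] * p]
        fmap.append(row)
    fmap.append([[0] * p for _ in range(2 * b + 1)])

    # Kernel x holds column 2x of B at (0, 0) and column 2x + 1 at (0, 1).
    kernels = []
    for x in range(n // 2):
        ker = [[[0] * p for _ in range(2)] for _ in range(2)]
        ker[0][0] = [B[c][2 * x] for c in range(p)]
        ker[0][1] = [B[c][2 * x + 1] for c in range(p)]
        kernels.append(ker)

    second = conv_valid(fmap, kernels)          # a x 2b x (n / 2)

    # Rearrangement: odd output columns saw the (0, 0) group, even ones
    # saw the (0, 1) group.
    C = [[0] * n for _ in range(m)]
    for r in range(a):
        for s in range(b):
            for x in range(n // 2):
                C[r * b + s][2 * x] = second[r][2 * s + 1][x]
                C[r * b + s][2 * x + 1] = second[r][2 * s][x]
    return C


A = [[1, 2], [3, 4], [5, 6], [7, 8]]    # m = 4, p = 2
B = [[1, 0, 2, 1], [0, 1, 1, 3]]        # p = 2, n = 4
C = multi_column_matmul(A, B, a=2, b=2)
```

The trade-off in this layout is that only n/2 kernels are needed, at the cost of a feature map roughly twice as wide and an extra rearrangement step on the output.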
A second aspect of the present application provides an image data processing apparatus, including: an acquisition module, configured to acquire a first image matrix A m×p and a second image matrix B p×n; a first determination module, configured to determine a first feature map corresponding to the first image matrix A m×p; a second determination module, configured to determine X convolution kernels corresponding to the second image matrix B p×n; and a convolution module, configured to convolve the first feature map with the X convolution kernels to obtain a second feature map, where the second feature map corresponds to a third image matrix, and the third image matrix is the result of matrix multiplication of the first image matrix A m×p and the second image matrix B p×n.
In one embodiment, among the X convolution kernels determined by the second determination module, the size of each convolution kernel is f×f×p, where f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel contains one or more columns of elements of the second image matrix B p×n.
In one embodiment, the second determination module includes: a first determination unit, configured to determine the values of n groups of target elements in the X convolution kernels, where the values of the i-th group of target elements correspond to the values of the i-th column of elements in the second image matrix, i is an integer from 1 to n, and each convolution kernel contains one or more of the n groups of target elements; and a second determination unit, configured to determine that the values of all elements in the X convolution kernels other than the n groups of target elements are 0.
In one embodiment, each convolution kernel contains one group of target elements Z 1, Z 2, …, Z p among the n groups of target elements, and for each convolution kernel, the target elements Z 1, Z 2, …, Z p contained in that kernel are located at row j, column k of channels 1, 2, …, p of the kernel, respectively;
the first determination module includes: a third determination unit, configured to determine m data modules corresponding to the m row vectors of the first image matrix A m×p, each data module having length 1, width 1, and depth p; a first splicing unit, configured to splice the m data modules according to a preset splicing rule to obtain a third feature map of length a, width b, and depth p, where a×b=m; and a first zero padding unit, configured to insert zero data modules at the upper, lower, left, and right edges of the third feature map to obtain the first feature map, each zero data module having length 1, width 1, and depth p, with all of its elements equal to zero.
In one embodiment, each convolution kernel contains u groups of target elements among the n groups of target elements, where u is an integer greater than 1 and less than or equal to f; for each convolution kernel, the p target elements in the v-th group of target elements contained in that kernel are located at row j_v and column k_v of channels 1, 2, …, p of the kernel, where v is an integer from 1 to u;
the first determination module includes: a fourth determination unit, configured to determine m data modules corresponding to the m row vectors of the first image matrix A m×p, each data module having length 1, width 1, and depth p; a second splicing unit, configured to splice the m data modules according to a preset splicing rule to obtain a third feature map of length a, width b, and depth p, where a×b=m; and a second zero padding unit, configured to insert zero data modules at target positions of the third feature map to obtain the first feature map, where the target positions include internal positions and edge positions and are related to the positions of the multiple groups of target elements in the convolution kernel, each zero data module has length 1, width 1, and depth n, and all of its elements are zero.
A third aspect of the present application provides an electronic device, including: a processor; and a memory on which executable code is stored, where the executable code, when executed by the processor, causes the processor to perform the method described above.
A fourth aspect of the present application provides a computer-readable storage medium on which executable code is stored, where the executable code, when executed by a processor of an electronic device, causes the processor to perform the method described above.
The technical solution of the present application acquires the first image matrix A m×p and the second image matrix B p×n, determines the first feature map corresponding to the first image matrix A m×p and the X convolution kernels of size f×f×p corresponding to the second image matrix B p×n, and then convolves the first feature map with these X convolution kernels to obtain the feature map corresponding to the third image matrix, where f is an integer greater than 1 and the third image matrix is the result of matrix multiplication of the first image matrix and the second image matrix. That is, the embodiments of the present application can use a convolution operation in place of the matrix multiplication operation to obtain the corresponding feature map, which solves the problem that the Transformer module cannot be deployed on chips that support only convolution calculation and improves the flexibility of the Transformer module.
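Why the convolution reproduces the matrix product can be written out for the single-group case. The symbols F (first feature map), K^{(i)} (the i-th kernel), Y (second feature map), and t (the row of A stored at a given position) are notation introduced here for illustration; a stride-1 sliding-window convolution with 1-based indices is assumed.

```latex
% Kernel i carries column i of B at spatial position (j, k), zeros elsewhere.
\[
K^{(i)}_{d_r,\,d_c,\,c} =
\begin{cases}
B_{c,i}, & (d_r, d_c) = (j, k),\\
0, & \text{otherwise},
\end{cases}
\qquad i = 1,\dots,n,\quad c = 1,\dots,p .
\]
% Every term of the window sum vanishes except the one at (j, k).
\[
Y_{r,s,i}
= \sum_{d_r=1}^{f}\sum_{d_c=1}^{f}\sum_{c=1}^{p}
  K^{(i)}_{d_r,d_c,c}\, F_{r+d_r-1,\; s+d_c-1,\; c}
= \sum_{c=1}^{p} B_{c,i}\, F_{r+j-1,\; s+k-1,\; c}.
\]
% If that position of F holds row t of A, the output is an entry of C = AB.
\[
F_{r+j-1,\; s+k-1,\; c} = A_{t,c}
\;\Longrightarrow\;
Y_{r,s,i} = \sum_{c=1}^{p} A_{t,c}\, B_{c,i} = C_{t,i}.
\]
```

The zero data modules only ensure that the window can be placed so that position (j, k) lands on every data module; because all other kernel elements are zero, they never contribute to the sum.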
It should be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present application.
Description of drawings
The above and other objects, features, and advantages of the present application will become more apparent from the following more detailed description of exemplary embodiments of the present application with reference to the accompanying drawings, in which the same reference numerals generally denote the same components.
FIG. 1 is a schematic flowchart of an image data processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the third feature map in the image data processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the first feature map in the image data processing method according to an embodiment of the present application;
FIG. 4 is another schematic diagram of the first feature map in the image data processing method according to an embodiment of the present application;
FIG. 5 is another schematic diagram of the first feature map in the image data processing method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the X convolution kernels in the image data processing method according to an embodiment of the present application;
FIG. 7 is another schematic diagram of the X convolution kernels in the image data processing method according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an image data processing apparatus according to an embodiment of the present application;
FIG. 9 is another schematic structural diagram of an image data processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
具体实施方式Detailed ways
Embodiments of the present application are described in more detail below with reference to the accompanying drawings. Although the drawings show embodiments of the present application, it should be understood that the present application may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present application is thorough and complete and fully conveys its scope to those skilled in the art.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the present application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items. Although the terms "first", "second", "third", and so on may be used in the present application to describe various information, such information should not be limited by these terms, which serve only to distinguish information of the same type from one another. For example, without departing from the scope of the present application, first information may also be called second information, and similarly, second information may also be called first information. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the present application, "a plurality of" means two or more, unless otherwise expressly and specifically defined.
In the related art, the use of the Transformer module is restricted and its flexibility is poor. In view of this problem, embodiments of the present application provide an image data processing method that can improve the flexibility of the Transformer module.
For ease of understanding, some terms used in the embodiments of the present application are introduced below.
Image matrix: digital image data can be represented as a matrix, so matrix theory and matrix algorithms can be used to analyze and process digital images. Because a digital image can be expressed as a matrix, digital image processing programs usually store image data in a two-dimensional array.
Convolution kernel: in image processing, given an input image, each pixel of the output image is a weighted average of the pixels in a small region of the input image. The weights are defined by a function, and this function is called the convolution kernel.
Feature map: in the convolutional layers of a neural network, data exists in three-dimensional form and can be regarded as many two-dimensional images stacked together, each of which is called a feature map. At the input layer, a grayscale image has only one feature map, while a color image generally has three (red, green, and blue). There are several convolution kernels between layers; convolving the feature maps of the previous layer with each convolution kernel produces one feature map of the next layer.
Data module: in the image field, usually a three-dimensional array representing the pixel values of an image. Its length represents the height of the image, its width represents the width of the image, and its depth represents the number of color channels of the image.
It should be noted that the image data processing method in this embodiment can be used for image processing in a neural network that contains a Transformer module, and can also be used in other neural networks that require matrix multiplication; this embodiment imposes no specific limitation.
It should be noted that the image data processing apparatus in this embodiment may include a Transformer module or other modules that need to perform matrix multiplication; this embodiment imposes no specific limitation.
The method of the present application can be applied in a processor. The processor may be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor for performing artificial intelligence operations. Artificial intelligence operations may include machine learning operations, brain-like operations, and the like. Machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like. The artificial intelligence processor includes, for example, one or a combination of an artificial intelligence chip processor, a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processor), and a Field-Programmable Gate Array (FPGA) chip.
The artificial intelligence processor may be a processor used in an artificial intelligence chip. The artificial intelligence chip may be, for example, a neural network chip or another chip, and the neural network chip may be, for example, a convolutional neural network inference chip, an ASIC chip, or the like. The present application does not limit the specific type of the processor.
In a possible implementation, the processor mentioned in the present application may include multiple processing units, each of which can independently run the tasks assigned to it, such as convolution tasks, pooling tasks, or fully connected tasks. The present application does not limit the processing units or the tasks they run. The processing units in the processor may share part of the storage space, for example part of the RAM and the register file, and may at the same time each have their own storage space.
The technical solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of the image data processing method according to an embodiment of the present application. The method can be applied to a processor, which may include a general-purpose processor, an artificial intelligence processor, and so on; the artificial intelligence processor may be an artificial intelligence chip processor, a GPU, or the like.
Referring to Fig. 1, the image data processing method in this embodiment includes:
101. The image data processing apparatus acquires a first image matrix and a second image matrix.
The image data processing apparatus acquires the first image matrix $A_{m\times p}$ and the second image matrix $B_{p\times n}$ that are to be multiplied. The first image matrix $A_{m\times p}$ or the second image matrix $B_{p\times n}$ may be an original image matrix or an image matrix obtained after a matrix operation; this embodiment imposes no specific limitation.
102. The image data processing apparatus determines a first feature map corresponding to the first image matrix.
After acquiring the first image matrix $A_{m\times p}$, the image data processing apparatus generates the first feature map corresponding to $A_{m\times p}$ according to a first preset rule.
For example, the image data processing apparatus may generate the first feature map corresponding to $A_{m\times p}$ as follows:
S1. Determine m data modules corresponding to the m row vectors of the first image matrix $A_{m\times p}$.
For each row vector of $A_{m\times p}$, the image data processing apparatus generates a corresponding data module with a length of 1, a width of 1, and a depth of p, whose p elements correspond one-to-one to the p elements of the row vector. For example, the apparatus may generate the data module through a matrix transformation function or in another way; this embodiment imposes no specific limitation.
S2. Splice the m data modules according to a preset splicing rule to obtain a third feature map.
After generating the m data modules, the image data processing apparatus splices them according to the preset splicing rule to obtain the third feature map, which has a length of a, a width of b, and a depth of p, where $a \times b = m$.
For example, the splicing rule includes a splicing order. The order may be top to bottom, then left to right: a data modules are spliced from top to bottom to fill column 1 of the third feature map, and then column 2, column 3, ..., column b are filled in the same top-to-bottom order to obtain the third feature map. The splicing order may also be top to bottom, then right to left; bottom to top, then left to right; bottom to top, then right to left; or another order. This embodiment imposes no specific limitation.
Exemplarily, let the first image matrix be

$$A_{4\times 3}=\begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3}\\ a_{2,1} & a_{2,2} & a_{2,3}\\ a_{3,1} & a_{3,2} & a_{3,3}\\ a_{4,1} & a_{4,2} & a_{4,3} \end{bmatrix},$$

with a = 2, b = 2, and a splicing order of top to bottom, then left to right. The image data processing apparatus generates four data modules $M_1$, $M_2$, $M_3$, and $M_4$ corresponding to the four row vectors $[a_{1,1}\ a_{1,2}\ a_{1,3}]$, $[a_{2,1}\ a_{2,2}\ a_{2,3}]$, $[a_{3,1}\ a_{3,2}\ a_{3,3}]$, and $[a_{4,1}\ a_{4,2}\ a_{4,3}]$ of $A_{4\times 3}$. Following the top-to-bottom order, it splices $M_2$ below $M_1$ to obtain column 1, then splices $M_3$ to the right of $M_1$, and then splices $M_4$ below $M_3$ to obtain column 2, as shown in Fig. 2.
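Steps S1 and S2 can be sketched in a few lines. The following is only an illustration under stated assumptions: it uses NumPy, stores the feature map in (length, width, depth) order, uses the top-to-bottom, then left-to-right splicing order of the example, and the function name `rows_to_feature_map` is hypothetical rather than taken from the application.

```python
import numpy as np

def rows_to_feature_map(A, a, b):
    """Arrange the m = a*b row vectors of A (m x p) as an a x b x p feature map.

    Row k (0-based) becomes the 1 x 1 x p data module at grid position
    (k % a, k // a): each column is filled top to bottom, and columns are
    filled left to right.
    """
    m, p = A.shape
    if a * b != m:
        raise ValueError("a * b must equal the number of rows of A")
    # reshape groups the rows into b columns of a modules each; the transpose
    # moves the within-column index (height) in front of the column index.
    return A.reshape(b, a, p).transpose(1, 0, 2)

# Stand-in values for the 4 x 3 example matrix (assumed, for illustration only).
A = np.arange(1, 13).reshape(4, 3)
T = rows_to_feature_map(A, a=2, b=2)
```

With this layout, `T[1, 0]` holds the second row of `A` (module M2 below M1) and `T[0, 1]` holds the third row (module M3 at the top of column 2).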
S3. Insert zero data modules into the third feature map according to a preset zero-padding rule to obtain the first feature map.
It should be understood that, in this embodiment, a zero data module has a length of 1, a width of 1, and a depth of n, and all of its elements are zero. The zero-padding rule depends on how many column vectors of the second image matrix $B_{p\times n}$ each convolution kernel contains and on the positions of those column vectors in the kernel.
As an optional approach, each convolution kernel corresponding to the second image matrix $B_{p\times n}$ contains one column vector of $B_{p\times n}$, and the image data processing apparatus may then insert the zero data modules as follows: it inserts zero data modules at the upper edge, lower edge, left edge, and right edge of the third feature map to obtain the first feature map.
For example, if each convolution kernel contains one column vector, f is odd, and the column vector is located at the center of the kernel, the image data processing apparatus may insert $\frac{f-1}{2}$ rows of zero data modules at each of the upper and lower edges of the third feature map, and $\frac{f-1}{2}$ columns of zero data modules at each of the left and right edges, to obtain the first feature map. That is, in channels 1 to n of the first feature map, the first $\frac{f-1}{2}$ columns, the last $\frac{f-1}{2}$ columns, the first $\frac{f-1}{2}$ rows, and the last $\frac{f-1}{2}$ rows are all zero. Exemplarily, for f = 3 and a first image matrix whose third feature map is the one shown in Fig. 2, the apparatus inserts one row of zero data modules at each of the upper and lower edges of the third feature map and one column of zero data modules at each of the left and right edges to obtain the first feature map, as shown in Fig. 3.
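The centered zero-padding rule can be illustrated as follows. This is a minimal sketch assuming NumPy and a (length, width, depth) array layout; `pad_centered` is a hypothetical helper name, and the padding is applied across the full depth of the feature map being padded so that the zero modules can be concatenated with it.

```python
import numpy as np

def pad_centered(T, f):
    """Add (f - 1) / 2 rows/columns of zero modules on every edge of T."""
    if f <= 1 or f % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size f > 1")
    r = (f - 1) // 2
    # Pad the two spatial axes only; the depth axis is left unchanged.
    return np.pad(T, ((r, r), (r, r), (0, 0)))

T = np.ones((2, 2, 3))        # placeholder 2 x 2 x 3 third feature map
F = pad_centered(T, f=3)      # one zero row/column on each edge
```

For f = 3 the 2 x 2 grid grows to 4 x 4, which is what lets a stride-1 convolution with a 3 x 3 kernel return a 2 x 2 output again.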
If each convolution kernel contains one column vector, f is odd, and the column vector is not located at the center of the kernel, the image data processing apparatus may insert $L_1$ rows of zero data modules at the upper edge of the third feature map, $L_2$ rows at the lower edge, $L_3$ columns at the left edge, and $L_4$ columns at the right edge to obtain the first feature map, where the values of $L_1$, $L_2$, $L_3$, and $L_4$ depend on the position of the column vector in the kernel. Exemplarily, with f = 3 and the column vector located at row 1, column 1 of the kernel, the apparatus inserts 1 row of zero data modules at the upper edge, 2 rows at the lower edge, 1 column at the left edge, and 2 columns at the right edge, as shown in Fig. 4.
As another optional approach, the system specifies that each convolution kernel corresponding to the second image matrix $B_{p\times n}$ contains multiple column vectors of $B_{p\times n}$, and the image data processing apparatus may then insert the zero data modules as follows: it inserts zero data modules at target positions of the third feature map to obtain the first feature map. The target positions include interior positions and edge positions and depend on the positions of the multiple column vectors in the kernel.
Exemplarily, f = 3 and each convolution kernel contains two column vectors $W_1$ and $W_2$, with $W_1$ at row 2, column 2 of the kernel and $W_2$ at row 2, column 3. The image data processing apparatus inserts one column of zero data modules between adjacent columns of the third feature map, one row of zero data modules at each of the upper and lower edges, and one column of zero data modules at each of the left and right edges to obtain the first feature map, as shown in Fig. 5.
It should be understood that the zero-padding rule depends on the number of column vectors contained in each convolution kernel and on the position of each column vector in the kernel. It may be set by the user according to the configured number and positions of the column vectors, or set in another way; this embodiment imposes no limitation.
103. The image data processing apparatus determines X convolution kernels corresponding to the second image matrix.
After acquiring the second image matrix $B_{p\times n}$, the image data processing apparatus generates, according to a second preset rule, X convolution kernels of size $f\times f\times p$ corresponding to $B_{p\times n}$. Each convolution kernel contains one or more column vectors of $B_{p\times n}$, that is, one or more columns of elements of $B_{p\times n}$; f is an integer greater than 1, and X is an integer greater than or equal to 1.
For example, the image data processing apparatus may determine the X convolution kernels corresponding to $B_{p\times n}$ as follows: it determines the values of n groups of target elements in the X convolution kernels and sets all other elements of the X kernels to 0. Each group of target elements corresponds to one column vector of $B_{p\times n}$; the values of the i-th group of target elements correspond one-to-one to the values of the elements in column i of $B_{p\times n}$, where i is an integer from 1 to n; and each convolution kernel contains one or more of the n groups of target elements.
It should be noted that the size of the convolution kernel is preset, that is, f is a preset value. The system may also set the number of column vectors contained in each kernel, that is, how many columns of $B_{p\times n}$ each kernel contains. Further, the system may set the position within the kernel of each column vector that the kernel contains.
In some embodiments, the system specifies that each convolution kernel contains one column vector; then X = n, and each kernel contains one group of target elements. For example, for each convolution kernel, the target elements $Z_1, Z_2, \ldots, Z_p$ contained in that kernel are located at row j, column k of channels 1, 2, ..., p of the kernel, respectively. For example, if the system specifies that the column vector is located at the center of the kernel and f is odd, the target elements $Z_1, Z_2, \ldots, Z_p$ are located at row $\frac{f+1}{2}$, column $\frac{f+1}{2}$ of channels 1, 2, ..., p of the kernel, respectively. Exemplarily, let f = 3 and the second image matrix be

$$B_{3\times 3}=\begin{bmatrix} b_{1,1} & b_{1,2} & b_{1,3}\\ b_{2,1} & b_{2,2} & b_{2,3}\\ b_{3,1} & b_{3,2} & b_{3,3} \end{bmatrix}.$$

The image data processing apparatus determines the three groups of target elements in the three convolution kernels corresponding to $B_{3\times 3}$, namely $b_{1,1}, b_{2,1}, b_{3,1}$; $b_{1,2}, b_{2,2}, b_{3,2}$; and $b_{1,3}, b_{2,3}, b_{3,3}$, and sets all other elements of each kernel to 0. Each kernel contains one group of target elements, located at row 2, column 2 of channels 1, 2, and 3 of that kernel, as shown in Fig. 6.
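The one-column-per-kernel construction can be sketched as follows. This is an illustration under assumptions, not the application's implementation: it uses NumPy, stores the kernels as an array of shape (n, f, f, p), and `kernels_from_columns` is a hypothetical name.

```python
import numpy as np

def kernels_from_columns(B, f):
    """Build n kernels of shape (f, f, p); kernel i holds column i of B at its center."""
    p, n = B.shape
    if f <= 1 or f % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size f > 1")
    c = (f - 1) // 2                       # 0-based index of row/column (f + 1) / 2
    K = np.zeros((n, f, f, p), dtype=B.dtype)
    # Kernel i, center cell, channel z receives B[z, i]; every other entry stays 0.
    K[:, c, c, :] = B.T
    return K

B = np.arange(1, 10).reshape(3, 3)         # stand-in values for the 3 x 3 example
K = kernels_from_columns(B, f=3)
```

Here `K[1, 1, 1]` is the second column of `B`, laid out along the depth axis of the second kernel, and all other cells of that kernel are zero.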
In some embodiments, the system presets that each convolution kernel contains u column vectors, where u is an integer greater than 1 and less than or equal to f.
If n is divisible by u, then X = n/u and each convolution kernel contains u groups of target elements. For example, for each of the X convolution kernels, the v-th group of target elements contained in that kernel is located at row $j_v$, column $k_v$ of channels 1, 2, ..., p of the kernel, where v is an integer from 1 to u;
If n is not divisible by u, then X is the smallest integer greater than n/u. The 1st to (X-1)-th of the X convolution kernels each contain u groups of target elements, and the X-th kernel contains $n-u(X-1)$ groups of target elements. In each of the 1st to (X-1)-th kernels, the v-th of its u groups of target elements is located at row $j_v$, column $k_v$ of channels 1, 2, ..., p of the kernel; in the X-th kernel, the r-th of its $n-u(X-1)$ groups of target elements is located at row $j_r$, column $k_r$ of channels 1, 2, ..., p of the kernel, where r is an integer from 1 to $n-u(X-1)$.
Exemplarily, f = 3 and each convolution kernel contains two column vectors $W_1$ and $W_2$, with $W_1$ at row 2, column 2 of the kernel and $W_2$ at row 2, column 3. Let the second image matrix be

$$B_{3\times 3}=\begin{bmatrix} b_{1,1} & b_{1,2} & b_{1,3}\\ b_{2,1} & b_{2,2} & b_{2,3}\\ b_{3,1} & b_{3,2} & b_{3,3} \end{bmatrix}.$$

The image data processing apparatus determines the three groups of target elements in the two convolution kernels corresponding to $B_{3\times 3}$, namely $b_{1,1}, b_{2,1}, b_{3,1}$; $b_{1,2}, b_{2,2}, b_{3,2}$; and $b_{1,3}, b_{2,3}, b_{3,3}$, and sets all other elements of each kernel to 0. The first kernel contains the first column of $B_{3\times 3}$ ($b_{1,1}, b_{2,1}, b_{3,1}$) and the second column ($b_{1,2}, b_{2,2}, b_{3,2}$), and the second kernel contains the third column ($b_{1,3}, b_{2,3}, b_{3,3}$), as shown in Fig. 7.
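The grouping of columns into kernels can be sketched as below. The sketch assumes NumPy, assumes that consecutive columns of B are assigned to the same kernel (as in the example, where columns 1 and 2 share the first kernel), and takes the in-kernel cells as a caller-supplied list; `kernels_grouped` and `positions` are hypothetical names.

```python
import math
import numpy as np

def kernels_grouped(B, f, positions):
    """Pack up to u = len(positions) columns of B into each f x f x p kernel.

    positions[v] is the 0-based (row, column) cell that receives the v-th
    column vector of a kernel.
    """
    p, n = B.shape
    u = len(positions)
    X = math.ceil(n / u)                   # number of kernels
    K = np.zeros((X, f, f, p), dtype=B.dtype)
    for i in range(n):
        x, v = divmod(i, u)                # kernel index, slot within the kernel
        j, k = positions[v]
        K[x, j, k, :] = B[:, i]            # column i of B along the depth axis
    return K

B = np.arange(1, 10).reshape(3, 3)
# W1 at row 2, column 2 and W2 at row 2, column 3 (1-based) of each kernel.
K = kernels_grouped(B, f=3, positions=[(1, 1), (1, 2)])
```

With n = 3 and u = 2 this yields two kernels: the first carries columns 1 and 2 of `B`, and the second carries column 3 in the first slot and leaves the second slot at zero.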
104. The image data processing apparatus convolves the first feature map with the X convolution kernels to obtain a second feature map.
After generating the first feature map and the X convolution kernels, the image data processing apparatus convolves the first feature map with the X convolution kernels to obtain the second feature map. The second feature map corresponds to a third image matrix $C_{m\times n}$, which is the result of multiplying the first image matrix $A_{m\times p}$ by the second image matrix $B_{p\times n}$.
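Why the convolution reproduces the matrix product can be checked for the simplest configuration. The derivation below is a sketch that assumes one column vector per kernel placed at the kernel center, odd f, stride 1, and convolution computed without flipping the kernel; the symbols $F$ (first feature map), $T$ (third feature map), $S$ (second feature map), and $K^{(i)}$ (the i-th kernel) are notation introduced here for illustration. The value of channel i of the second feature map at spatial position (s, t) is

$$S_{s,t,i}=\sum_{j=1}^{f}\sum_{k=1}^{f}\sum_{z=1}^{p}F_{s+j-1,\;t+k-1,\;z}\,K^{(i)}_{j,k,z}.$$

The only nonzero entries of $K^{(i)}$ are $K^{(i)}_{c,c,z}=b_{z,i}$ with $c=\frac{f+1}{2}$, so the sum collapses to

$$S_{s,t,i}=\sum_{z=1}^{p}F_{s+c-1,\;t+c-1,\;z}\,b_{z,i}.$$

Because $F$ is $T$ padded by $c-1=\frac{f-1}{2}$ zero modules on every edge, $F_{s+c-1,\,t+c-1,\,z}=T_{s,t,z}=a_{q,z}$, where q is the index of the row vector of $A_{m\times p}$ that was spliced at position (s, t). Hence

$$S_{s,t,i}=\sum_{z=1}^{p}a_{q,z}\,b_{z,i}=c_{q,i},$$

which is the entry in row q, column i of $C_{m\times n}$.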
In some embodiments, f is odd and each convolution kernel contains one column vector of the second image matrix, located at the center of the kernel. The second feature map obtained by convolving the first feature map with the n convolution kernels then has a length of a, a width of b, and a depth of n. Each $1\times 1\times n$ data module along the depth direction of the second feature map corresponds to a row vector of the third image matrix $C_{m\times n}$, and the order in which these vectors are distributed in the second feature map matches the splicing order of the third feature map; that is, the second feature map maps to $C_{m\times n}$ in the same way that the third feature map maps to $A_{m\times p}$.
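The centered case can be checked end to end against an ordinary matrix product. The following sketch rests on several assumptions: it uses NumPy, a stride-1 convolution with no padding of its own and no kernel flip (cross-correlation, as is usual in neural network libraries), the top-to-bottom, then left-to-right splicing order, and hypothetical function names `conv2d_valid` and `matmul_by_conv`.

```python
import numpy as np

def conv2d_valid(F, K):
    """Stride-1 cross-correlation of F (H, W, p) with kernels K (X, f, f, p)."""
    H, W, _ = F.shape
    X, f, _, _ = K.shape
    out = np.zeros((H - f + 1, W - f + 1, X), dtype=np.result_type(F, K))
    for i in range(H - f + 1):
        for j in range(W - f + 1):
            window = F[i:i + f, j:j + f, :]
            # Contract the (f, f, p) axes of every kernel with the window.
            out[i, j, :] = np.tensordot(K, window, axes=([1, 2, 3], [0, 1, 2]))
    return out

def matmul_by_conv(A, B, a, b, f=3):
    """Compute A @ B through padding, kernel construction, and convolution."""
    m, p = A.shape
    n = B.shape[1]
    r = (f - 1) // 2
    T = A.reshape(b, a, p).transpose(1, 0, 2)       # third feature map, a x b x p
    F = np.pad(T, ((r, r), (r, r), (0, 0)))         # first feature map
    K = np.zeros((n, f, f, p), dtype=B.dtype)
    K[:, r, r, :] = B.T                             # one column of B per kernel center
    S = conv2d_valid(F, K)                          # second feature map, a x b x n
    # Undo the splicing order: module (row, col) is row col * a + row of C.
    return S.transpose(1, 0, 2).reshape(m, n)

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(4, 3))
B = rng.integers(-5, 5, size=(3, 3))
C = matmul_by_conv(A, B, a=2, b=2)
```

Comparing `C` with `A @ B` (for example via `np.array_equal`) confirms that reading the second feature map back in the splicing order recovers the product.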
In some embodiments, each convolution kernel contains multiple column vectors of the second image matrix. After convolving the first feature map with the X convolution kernels to obtain the second feature map, the image data processing apparatus rearranges the data modules of the second feature map according to a preset rearrangement rule to obtain a fourth feature map, which has a length of a, a width of b, and a depth of n. Each $1\times 1\times n$ data module along the depth direction of the fourth feature map corresponds to a row vector of the third image matrix $C_{m\times n}$, and the order in which these vectors are distributed in the fourth feature map matches the splicing order of the third feature map.
In some embodiments, each convolution kernel contains one column vector of the second image matrix, and the column vector is not located at the center of the kernel. After convolving the first feature map with the n convolution kernels to obtain the second feature map, the image data processing apparatus deletes the zero data modules at preset positions to obtain a fifth feature map, which has a length of a, a width of b, and a depth of n. Each $1\times 1\times n$ data module along the depth direction of the fifth feature map corresponds to a row vector of the third image matrix $C_{m\times n}$, and the order in which these vectors are distributed in the fifth feature map matches the splicing order of the third feature map.
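The off-center case with deletion of zero data modules can be sketched for the example of f = 3 with the column vector at row 1, column 1 of the kernel. The assumptions are significant here: NumPy, stride-1 cross-correlation without kernel flip, padding of 1 row above, 2 rows below, 1 column left, and 2 columns right, and the guess that the "preset positions" to delete are the first row and first column of the output, which is what these conventions produce. `conv2d_valid` is a hypothetical helper.

```python
import numpy as np

def conv2d_valid(F, K):
    """Stride-1 cross-correlation of F (H, W, p) with kernels K (X, f, f, p)."""
    H, W, _ = F.shape
    X, f, _, _ = K.shape
    out = np.zeros((H - f + 1, W - f + 1, X), dtype=np.result_type(F, K))
    for i in range(H - f + 1):
        for j in range(W - f + 1):
            window = F[i:i + f, j:j + f, :]
            out[i, j, :] = np.tensordot(K, window, axes=([1, 2, 3], [0, 1, 2]))
    return out

a, b, f = 2, 2, 3
rng = np.random.default_rng(1)
A = rng.integers(-5, 5, size=(a * b, 3))
B = rng.integers(-5, 5, size=(3, 4))
m, p = A.shape
n = B.shape[1]

T = A.reshape(b, a, p).transpose(1, 0, 2)        # third feature map
F = np.pad(T, ((1, 2), (1, 2), (0, 0)))          # top 1, bottom 2, left 1, right 2
K = np.zeros((n, f, f, p), dtype=B.dtype)
K[:, 0, 0, :] = B.T                              # column vector at row 1, column 1

S = conv2d_valid(F, K)                           # (a + 1) x (b + 1) x n
S5 = S[1:, 1:, :]                                # drop the all-zero first row/column
C = S5.transpose(1, 0, 2).reshape(m, n)
```

Under these conventions the output is one module larger than a x b in each direction, its first row and first column contain only zeros, and the remaining a x b block reads back as the product of `A` and `B`.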
The technical solution of the present application acquires the first image matrix $A_{m\times p}$ and the second image matrix $B_{p\times n}$, determines the first feature map corresponding to $A_{m\times p}$ and the X convolution kernels of size $f\times f\times p$ corresponding to $B_{p\times n}$, and then convolves the first feature map with the X convolution kernels to obtain the feature map corresponding to the third image matrix, where f is an integer greater than 1 and the third image matrix is the result of multiplying the first image matrix by the second image matrix. In other words, this embodiment can replace the matrix multiplication with a convolution operation to obtain the corresponding feature map, which solves the problem that the Transformer module cannot be deployed on chips that support only convolution and improves the flexibility of the Transformer module.
In addition, the embodiments of the present application provide multiple ways of determining the first feature map and multiple ways of determining the convolution kernels, which improves the flexibility of the solution.
Corresponding to the foregoing method embodiments, the present application further provides an image data processing apparatus, an electronic device, and corresponding embodiments.
Fig. 8 is a schematic structural diagram of the image data processing apparatus according to an embodiment of the present application.
Referring to Fig. 8, the image data processing apparatus 800 in this embodiment includes an acquisition module 801, a first determination module 802, a second determination module 803, and a convolution module 804.
The acquisition module 801 is configured to acquire the first image matrix $A_{m\times p}$ and the second image matrix $B_{p\times n}$;
The first determination module 802 is configured to determine the first feature map corresponding to the first image matrix $A_{m\times p}$;
The second determination module 803 is configured to determine X convolution kernels corresponding to the second image matrix $B_{p\times n}$, each of size $f\times f\times p$, where f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel contains one or more columns of elements of $B_{p\times n}$;
The convolution module 804 is configured to convolve the first feature map with the X convolution kernels to obtain a second feature map, where the second feature map corresponds to a third image matrix and the third image matrix is the result of multiplying the first image matrix $A_{m\times p}$ by the second image matrix $B_{p\times n}$.
In the technical solution of the present application, the acquisition module 801 obtains the first image matrix A_{m×p} and the second image matrix B_{p×n}, the first determination module 802 determines the first feature map corresponding to the first image matrix A_{m×p}, the second determination module 803 determines the X convolution kernels of size f×f×p corresponding to the second image matrix B_{p×n}, and the convolution module 804 then convolves the first feature map with the X convolution kernels to obtain the feature map corresponding to the third image matrix, where f is an integer greater than 1 and the third image matrix is the matrix product of the first image matrix and the second image matrix. In other words, this embodiment replaces the matrix multiplication with a convolution operation that yields the corresponding feature map, which solves the problem that a Transformer module cannot be deployed on a chip that supports only convolution computation, and improves the flexibility of the Transformer module.
For ease of understanding, the image data processing apparatus of the present application is described in detail below. Referring to FIG. 9, the image data processing apparatus 900 in this embodiment includes an acquisition module 901, a first determination module 902, a second determination module 903, and a convolution module 904.
The acquisition module 901 is configured to obtain a first image matrix A_{m×p} and a second image matrix B_{p×n};
The first determination module 902 is configured to determine a first feature map corresponding to the first image matrix A_{m×p};
The second determination module 903 is configured to determine X convolution kernels corresponding to the second image matrix B_{p×n}, each convolution kernel having a size of f×f×p, where f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel contains one or more columns of elements of the second image matrix B_{p×n};
The convolution module 904 is configured to convolve the first feature map with the X convolution kernels to obtain a second feature map, where the second feature map corresponds to a third image matrix, and the third image matrix is the result of the matrix multiplication of the first image matrix A_{m×p} and the second image matrix B_{p×n}.
The second determination module 903 includes a first determination unit 9031 and a second determination unit 9032.
The first determination unit 9031 is configured to determine the values of n groups of target elements in the X convolution kernels, where the values of the i-th group of target elements correspond to the values of the i-th column of the second image matrix, i is an integer from 1 to n, and each convolution kernel contains one or more of the n groups of target elements;
The second determination unit 9032 is configured to set the values of all elements of the X convolution kernels other than the n groups of target elements to 0.
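The kernel construction performed by the two determination units can be sketched in NumPy as follows. This is only an illustrative sketch for the case of one target group per kernel (so X = n); the function name `build_kernels`, the channels-last layout, and the default position (j, k) = (1, 1) in zero-based indexing are assumptions made for the example and are not taken from the application.

```python
import numpy as np

def build_kernels(B, f=3, j=1, k=1):
    """Return kernels of shape (n, f, f, p), one kernel per column of B.

    Hypothetical helper: the i-th kernel carries the i-th column of B at
    spatial position (j, k) across its p channels; every other element is 0.
    """
    p, n = B.shape
    kernels = np.zeros((n, f, f, p), dtype=B.dtype)  # non-target elements stay 0
    for i in range(n):
        kernels[i, j, k, :] = B[:, i]  # i-th target group <- i-th column of B
    return kernels

# Small example with p = 2 and n = 3.
B = np.arange(1, 7, dtype=np.float64).reshape(2, 3)
kernels = build_kernels(B)
```

Each kernel here holds exactly p non-zero candidates (the copied column), which mirrors the split between target elements and zero-valued elements described above.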
Optionally, each convolution kernel contains one group of target elements Z_1, Z_2, …, Z_p out of the n groups of target elements, and for each convolution kernel, the target elements Z_1, Z_2, …, Z_p it contains are located at row j, column k of the 1st, 2nd, …, p-th channels of that convolution kernel, respectively.
The first determination module 902 includes a third determination unit 9021, a first splicing unit 9022, and a first zero-padding unit 9023.
The third determination unit 9021 is configured to determine m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, each of the m data modules having a length of 1, a width of 1, and a depth of p;
The first splicing unit 9022 is configured to splice the m data modules according to a preset splicing rule to obtain a third feature map, the third feature map having a length of a, a width of b, and a depth of p, where a×b=m;
The first zero-padding unit 9023 is configured to insert zero data modules at the upper edge, lower edge, left edge, and right edge of the third feature map to obtain the first feature map, each zero data module having a length of 1, a width of 1, and a depth of p, with all of its elements equal to zero.
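The single-group variant above can be checked end to end with a small NumPy sketch. It assumes f = 3, the target position at the kernel center, a row-major splicing rule, and stride-1 convolution without further padding; none of these choices are fixed by the text, and the helper names `first_feature_map` and `conv_valid` are hypothetical.

```python
import numpy as np

def first_feature_map(A, a, b):
    """Splice the m rows of A into an a x b x p map, then pad the four edges."""
    m, p = A.shape
    assert a * b == m
    third = A.reshape(a, b, p)  # assumed splicing rule: row-major order
    return np.pad(third, ((1, 1), (1, 1), (0, 0)))  # one ring of zero data modules

def conv_valid(fmap, kernels):
    """Stride-1 'valid' convolution (cross-correlation); kernels: (X, f, f, p)."""
    X, f, _, _ = kernels.shape
    H, W, _ = fmap.shape
    out = np.zeros((H - f + 1, W - f + 1, X))
    for y in range(H - f + 1):
        for x in range(W - f + 1):
            window = fmap[y:y + f, x:x + f, :]
            out[y, x, :] = np.tensordot(kernels, window, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
m, p, n, a, b = 6, 4, 5, 2, 3
A = rng.standard_normal((m, p))
B = rng.standard_normal((p, n))

kernels = np.zeros((n, 3, 3, p))
kernels[:, 1, 1, :] = B.T  # column i of B at the center of kernel i

second = conv_valid(first_feature_map(A, a, b), kernels)  # shape (a, b, n)
matches = np.allclose(second.reshape(m, n), A @ B)
```

Under these assumptions the second feature map, read back in the same row-major order, coincides with the product of A and B, which is the correspondence with the third image matrix that the text describes.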
Optionally, each convolution kernel contains u groups of target elements out of the n groups of target elements, where u is an integer greater than 1 and less than or equal to f, and for each convolution kernel, the p target elements of the v-th group it contains are located at row j_v, column k_v of the 1st, 2nd, …, p-th channels of that convolution kernel, respectively, where v is an integer from 1 to u.
The first determination module 902 includes a fourth determination unit 9024, a second splicing unit 9025, and a second zero-padding unit 9026.
The fourth determination unit 9024 is configured to determine m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, each of the m data modules having a length of 1, a width of 1, and a depth of p;
The second splicing unit 9025 is configured to splice the m data modules according to a preset splicing rule to obtain a third feature map, the third feature map having a length of a, a width of b, and a depth of p, where a×b=m;
The second zero-padding unit 9026 is configured to insert zero data modules at target positions of the third feature map to obtain the first feature map, where the target positions include internal positions and edge positions and are related to the positions of the multiple groups of target elements within the convolution kernel, each zero data module having a length of 1, a width of 1, and a depth of n, with all of its elements equal to zero.
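One way to picture the multi-group variant is the toy layout below. It is a sketch under strong assumptions and not the layout of the application: f = u = 2, the two target groups of each kernel sit at positions (0, 0) and (0, 1), the third feature map is a single row (a = 1, b = m), and the zero data modules are given depth p so that they match the depth of the feature map. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, p, n = 3, 4, 4
f, u = 2, 2
A = rng.standard_normal((m, p))
B = rng.standard_normal((p, n))

# X = n / u kernels; kernel t holds column u*t at (0, 0) and column u*t+1 at (0, 1).
X = n // u
kernels = np.zeros((X, f, f, p))
for t in range(X):
    kernels[t, 0, 0, :] = B[:, u * t]
    kernels[t, 0, 1, :] = B[:, u * t + 1]

# First feature map: rows of A interleaved with zero modules (internal positions),
# plus a leading/trailing zero module and an all-zero second row (edge positions).
fmap = np.zeros((f, 2 * m + 1, p))
fmap[0, 1::2, :] = A  # layout of row 0: 0, A_0, 0, A_1, 0, A_2, 0

# Stride-1 convolution along the width; the output height is 1.
out = np.zeros((2 * m, X))
for x in range(2 * m):
    window = fmap[:, x:x + f, :]
    out[x, :] = np.tensordot(kernels, window, axes=([1, 2, 3], [0, 1, 2]))

# Each window covers exactly one row of A, so each output is a single dot product.
C = np.empty((m, n))
C[:, 0::u] = out[1::2, :]  # window (A_r, 0) pairs A_r with the column at (0, 0)
C[:, 1::u] = out[0::2, :]  # window (0, A_r) pairs A_r with the column at (0, 1)
matches = np.allclose(C, A @ B)
```

The point of the interleaved zeros is that a kernel carrying several columns of B never mixes two rows of A in one output value, which is why the zero positions depend on where the target groups sit inside the kernel.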
In the technical solution of the present application, the acquisition module 901 obtains the first image matrix A_{m×p} and the second image matrix B_{p×n}, the first determination module 902 determines the first feature map corresponding to the first image matrix A_{m×p}, the second determination module 903 determines the X convolution kernels of size f×f×p corresponding to the second image matrix B_{p×n}, and the convolution module 904 then convolves the first feature map with the X convolution kernels to obtain the feature map corresponding to the third image matrix, where f is an integer greater than 1 and the third image matrix is the matrix product of the first image matrix and the second image matrix. In other words, this embodiment replaces the matrix multiplication with a convolution operation that yields the corresponding feature map, which solves the problem that a Transformer module cannot be deployed on a chip that supports only convolution computation, and improves the flexibility of the Transformer module.
In addition, the embodiments of the present application provide multiple ways of determining the first feature map and the convolution kernels, which improves the flexibility of the solution.
Regarding the apparatus in the foregoing embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Referring to FIG. 10, the electronic device 1000 includes a memory 1010 and a processor 1020.
The processor 1020 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 1010 may include various types of storage units, such as system memory, read-only memory (ROM), and a persistent storage device. The ROM may store static data or instructions required by the processor 1020 or other modules of the computer. The persistent storage device may be a readable and writable storage device, and may be a non-volatile storage device that does not lose the stored instructions and data even after the computer is powered off. In some embodiments, a mass storage device (such as a magnetic disk, an optical disk, or flash memory) is used as the persistent storage device. In other embodiments, the persistent storage device may be a removable storage device (such as a floppy disk or an optical drive). The system memory may be a readable and writable storage device or a volatile readable and writable storage device, such as dynamic random access memory, and may store some or all of the instructions and data that the processor needs at runtime. In addition, the memory 1010 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (such as DRAM, SRAM, SDRAM, flash memory, and programmable read-only memory); magnetic disks and/or optical disks may also be used. In some embodiments, the memory 1010 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (such as a DVD-ROM or dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density disc, a flash memory card (such as an SD card, a mini SD card, or a Micro-SD card), a magnetic floppy disk, and the like. Computer-readable storage media do not include carrier waves or transient electronic signals transmitted wirelessly or over wires.
Executable code is stored in the memory 1010, and when the executable code is processed by the processor 1020, the processor 1020 is caused to perform part or all of the method described above.
In addition, the method according to the present application may also be implemented as a computer program or computer program product that includes computer program code instructions for performing some or all of the steps of the above method of the present application.
Alternatively, the present application may also be implemented as a computer-readable storage medium (or a non-transitory machine-readable storage medium, or a machine-readable storage medium) on which executable code (or a computer program, or computer instruction code) is stored; when the executable code (or computer program, or computer instruction code) is executed by a processor of an electronic device (or a server or the like), the processor is caused to perform some or all of the steps of the above method according to the present application.
The embodiments of the present application have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (13)

  1. An image data processing method, characterized by comprising:
    obtaining a first image matrix A_{m×p} and a second image matrix B_{p×n};
    determining a first feature map corresponding to the first image matrix A_{m×p};
    determining X convolution kernels corresponding to the second image matrix B_{p×n};
    convolving the first feature map with the X convolution kernels to obtain a second feature map, wherein the second feature map corresponds to a third image matrix, and the third image matrix is the result of the matrix multiplication of the first image matrix A_{m×p} and the second image matrix B_{p×n}.
  2. The image data processing method according to claim 1, characterized in that:
    each convolution kernel has a size of f×f×p, wherein f is an integer greater than 1 and X is an integer greater than or equal to 1.
  3. The image data processing method according to claim 2, characterized in that:
    each convolution kernel contains one or more columns of elements of the second image matrix B_{p×n}.
  4. The image data processing method according to claim 1, characterized in that determining the X convolution kernels corresponding to the second image matrix B_{p×n} comprises:
    determining the values of n groups of target elements in the X convolution kernels, wherein the values of the i-th group of target elements correspond to the values of the i-th column of the second image matrix, i is an integer from 1 to n, and each convolution kernel contains one or more of the n groups of target elements;
    setting the values of all elements of the X convolution kernels other than the n groups of target elements to 0.
  5. The image data processing method according to claim 4, characterized in that:
    each convolution kernel contains one group of target elements Z_1, Z_2, …, Z_p out of the n groups of target elements, and for each convolution kernel, the target elements Z_1, Z_2, …, Z_p it contains are located at row j, column k of the 1st, 2nd, …, p-th channels of that convolution kernel, respectively;
    determining the first feature map corresponding to the first image matrix A_{m×p} comprises:
    determining m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, each data module having a length of 1, a width of 1, and a depth of p;
    splicing the m data modules according to a preset splicing rule to obtain a third feature map, the third feature map having a length of a, a width of b, and a depth of p, wherein a×b=m;
    inserting zero data modules at the upper edge, lower edge, left edge, and right edge of the third feature map to obtain the first feature map, each zero data module having a length of 1, a width of 1, and a depth of p, with all of its elements equal to zero.
  6. The image data processing method according to claim 4, characterized in that:
    each convolution kernel contains u groups of target elements out of the n groups of target elements, wherein u is an integer greater than 1 and less than or equal to f, and for each convolution kernel, the p target elements of the v-th group it contains are located at row j_v, column k_v of the 1st, 2nd, …, p-th channels of that convolution kernel, respectively, wherein v is an integer from 1 to u;
    determining the first feature map corresponding to the first image matrix A_{m×p} comprises:
    determining m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, each data module having a length of 1, a width of 1, and a depth of p;
    splicing the m data modules according to a preset splicing rule to obtain a third feature map, the third feature map having a length of a, a width of b, and a depth of p, wherein a×b=m;
    inserting zero data modules at target positions of the third feature map to obtain the first feature map, wherein the target positions include internal positions and edge positions and are related to the positions of the multiple groups of target elements within the convolution kernel, each zero data module having a length of 1, a width of 1, and a depth of n, with all of its elements equal to zero.
  7. An image data processing apparatus, characterized by comprising:
    an acquisition module, configured to obtain a first image matrix A_{m×p} and a second image matrix B_{p×n};
    a first determination module, configured to determine a first feature map corresponding to the first image matrix A_{m×p};
    a second determination module, configured to determine X convolution kernels corresponding to the second image matrix B_{p×n};
    a convolution module, configured to convolve the first feature map with the X convolution kernels to obtain a second feature map, wherein the second feature map corresponds to a third image matrix, and the third image matrix is the result of the matrix multiplication of the first image matrix A_{m×p} and the second image matrix B_{p×n}.
  8. The image data processing apparatus according to claim 7, characterized in that:
    among the X convolution kernels determined by the second determination module, each convolution kernel has a size of f×f×p, wherein f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel contains one or more columns of elements of the second image matrix B_{p×n}.
  9. The image data processing apparatus according to claim 8, characterized in that the second determination module comprises:
    a first determination unit, configured to determine the values of n groups of target elements in the X convolution kernels, wherein the values of the i-th group of target elements correspond to the values of the i-th column of the second image matrix, i is an integer from 1 to n, and each convolution kernel contains one or more of the n groups of target elements;
    a second determination unit, configured to set the values of all elements of the X convolution kernels other than the n groups of target elements to 0.
  10. The image data processing apparatus according to claim 9, characterized in that:
    each convolution kernel contains one group of target elements Z_1, Z_2, …, Z_p out of the n groups of target elements, and for each convolution kernel, the target elements Z_1, Z_2, …, Z_p it contains are located at row j, column k of the 1st, 2nd, …, p-th channels of that convolution kernel, respectively;
    the first determination module comprises:
    a third determination unit, configured to determine m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, each data module having a length of 1, a width of 1, and a depth of p;
    a first splicing unit, configured to splice the m data modules according to a preset splicing rule to obtain a third feature map, the third feature map having a length of a, a width of b, and a depth of p, wherein a×b=m;
    a first zero-padding unit, configured to insert zero data modules at the upper edge, lower edge, left edge, and right edge of the third feature map to obtain the first feature map, each zero data module having a length of 1, a width of 1, and a depth of p, with all of its elements equal to zero.
  11. The image data processing apparatus according to claim 9, characterized in that:
    each convolution kernel contains u groups of target elements out of the n groups of target elements, wherein u is an integer greater than 1 and less than or equal to f, and for each convolution kernel, the p target elements of the v-th group it contains are located at row j_v, column k_v of the 1st, 2nd, …, p-th channels of that convolution kernel, respectively, wherein v is an integer from 1 to u;
    the first determination module comprises:
    a fourth determination unit, configured to determine m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, each data module having a length of 1, a width of 1, and a depth of p;
    a second splicing unit, configured to splice the m data modules according to a preset splicing rule to obtain a third feature map, the third feature map having a length of a, a width of b, and a depth of p, wherein a×b=m;
    a second zero-padding unit, configured to insert zero data modules at target positions of the third feature map to obtain the first feature map, wherein the target positions include internal positions and edge positions and are related to the positions of the multiple groups of target elements within the convolution kernel, each zero data module having a length of 1, a width of 1, and a depth of n, with all of its elements equal to zero.
  12. An electronic device, characterized by comprising:
    a processor; and
    a memory on which executable code is stored, wherein the executable code, when executed by the processor, causes the processor to perform the method according to any one of claims 1-6.
  13. A computer-readable storage medium on which executable code is stored, wherein the executable code, when executed by a processor of an electronic device, causes the processor to perform the method according to any one of claims 1-6.
PCT/CN2022/122544 2021-12-06 2022-09-29 Image data processing method and apparatus, device, and storage medium WO2023103551A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111477627.2A CN114283314A (en) 2021-12-06 2021-12-06 Image data processing method and device
CN202111477627.2 2021-12-06

Publications (1)

Publication Number Publication Date
WO2023103551A1 (en) 2023-06-15

Family

ID=80871122

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/122544 WO2023103551A1 (en) 2021-12-06 2022-09-29 Image data processing method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN114283314A (en)
WO (1) WO2023103551A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114283314A (en) * 2021-12-06 2022-04-05 广州小鹏自动驾驶科技有限公司 Image data processing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020678A (en) * 2019-03-25 2019-07-16 联想(北京)有限公司 A kind of data processing method, electronic equipment and computer storage medium
CN110580324A (en) * 2019-07-23 2019-12-17 珠海格力电器股份有限公司 Matrix operation method, device, computer equipment and storage medium
CN111932437A (en) * 2020-10-10 2020-11-13 深圳云天励飞技术股份有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN114283314A (en) * 2021-12-06 2022-04-05 广州小鹏自动驾驶科技有限公司 Image data processing method and device

Also Published As

Publication number Publication date
CN114283314A (en) 2022-04-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22902976

Country of ref document: EP

Kind code of ref document: A1