CN114283314A - Image data processing method and device - Google Patents

Image data processing method and device

Info

Publication number
CN114283314A
CN114283314A
Authority
CN
China
Prior art keywords
image matrix
feature map
convolution kernel
image
data module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111477627.2A
Other languages
Chinese (zh)
Inventor
胡宇
姬彬斐
刘嘉超
刘兰个川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xiaopeng Autopilot Technology Co Ltd
Original Assignee
Guangzhou Xiaopeng Autopilot Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Autopilot Technology Co Ltd filed Critical Guangzhou Xiaopeng Autopilot Technology Co Ltd
Priority to CN202111477627.2A priority Critical patent/CN114283314A/en
Publication of CN114283314A publication Critical patent/CN114283314A/en
Priority to PCT/CN2022/122544 priority patent/WO2023103551A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an image data processing method and device. The method comprises the following steps: obtaining a first image matrix A_{m×p} and a second image matrix B_{p×n}; determining a first feature map corresponding to the first image matrix A_{m×p}; determining X convolution kernels corresponding to the second image matrix B_{p×n}, each convolution kernel having a size of f×f×p, where f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel contains one or more columns of elements of the second image matrix B_{p×n}; and convolving the first feature map with the X convolution kernels to obtain a second feature map, where the second feature map corresponds to a third image matrix, and the third image matrix is the result of matrix-multiplying the first image matrix A_{m×p} by the second image matrix B_{p×n}. The scheme provided by the application can improve the flexibility of the Transformer module.

Description

Image data processing method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image data processing method and apparatus.
Background
Current mainstream machine translation is based mainly on neural network machine translation, which uses an encoder-decoder architecture: an encoder encodes the source-language sequence and extracts its information, and a decoder converts that information into the target language to complete the translation process. The deep self-attention (Transformer) model designed on the basis of the encoder-decoder architecture has become the mainstream model in the machine translation field owing to its superior performance, and has had a great influence on the deep learning field.
In a neural network with a Transformer as its main module, two-dimensional data tensors undergo matrix multiplication, and in some schemes the Transformer model is therefore deployed on a chip that supports matrix multiplication.
However, some chips on the market only support convolution calculation, so a Transformer module that needs to perform matrix multiplication cannot be deployed on such chips; this limits the use of the Transformer module and results in poor flexibility.
Disclosure of Invention
In order to solve or at least partially solve the problems in the related art, the present application provides an image data processing method that can improve the flexibility of the Transformer module.
A first aspect of the present application provides an image data processing method, comprising: obtaining a first image matrix A_{m×p} and a second image matrix B_{p×n};
determining a first feature map corresponding to the first image matrix A_{m×p};
determining X convolution kernels corresponding to the second image matrix B_{p×n}, each convolution kernel having a size of f×f×p, where f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel contains one or more columns of elements of the second image matrix B_{p×n};
convolving the first feature map with the X convolution kernels to obtain a second feature map, wherein the second feature map corresponds to a third image matrix, and the third image matrix is the result of matrix-multiplying the first image matrix A_{m×p} by the second image matrix B_{p×n}.
A second aspect of the present application provides an image data processing apparatus, comprising:
an acquisition module for acquiring a first image matrix A_{m×p} and a second image matrix B_{p×n};
a first determination module for determining a first feature map corresponding to the first image matrix A_{m×p};
a second determination module for determining X convolution kernels corresponding to the second image matrix B_{p×n}, each convolution kernel having a size of f×f×p, where f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel contains one or more columns of elements of the second image matrix B_{p×n};
a convolution module for convolving the first feature map with the X convolution kernels to obtain a second feature map, where the second feature map corresponds to a third image matrix, and the third image matrix is the result of matrix-multiplying the first image matrix A_{m×p} by the second image matrix B_{p×n}.
A third aspect of the present application provides an electronic device comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as described above.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon executable code, which, when executed by a processor of an electronic device, causes the processor to perform the method as described above.
According to the technical scheme of the application, a first image matrix A_{m×p} and a second image matrix B_{p×n} can be acquired, a first feature map corresponding to the first image matrix A_{m×p} and X convolution kernels of size f×f×p corresponding to the second image matrix B_{p×n} can be determined, and the first feature map can then be convolved with the X convolution kernels to obtain a feature map corresponding to a third image matrix, where f is an integer greater than 1 and the third image matrix is obtained by matrix-multiplying the first image matrix by the second image matrix. That is to say, in this embodiment, a convolution operation may be used in place of the matrix multiplication operation to obtain the corresponding feature map, so the problem that the Transformer module cannot be deployed on a chip that only supports convolution calculation is solved, and the flexibility of the Transformer module is improved.
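As an illustrative restatement of this idea (an editor's sketch, not language from the original filing): each element of the matrix product can be written as

    C_{i,j} = \sum_{k=1}^{p} A_{i,k} \, B_{k,j}

and a convolution kernel that is zero everywhere except for the column B_{·,j} placed at a single spatial position computes exactly this sum wherever the p-channel vector under that position holds a row A_{i,·} of the first image matrix. This is the sense in which the convolution described below reproduces the matrix multiplication.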
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The foregoing and other objects, features and advantages of the application will be apparent from the following more particular descriptions of exemplary embodiments of the application as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the application.
Fig. 1 is a schematic flowchart of an image data processing method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a third feature map in an image data processing method according to an embodiment of the present application;
fig. 3 is a schematic diagram of a first feature map in an image data processing method according to an embodiment of the present application;
fig. 4 is another schematic diagram of a first feature map in an image data processing method according to an embodiment of the present application;
fig. 5 is another schematic diagram of a first feature map in an image data processing method according to an embodiment of the present application;
fig. 6 is a schematic diagram of X convolution kernels in an image data processing method according to an embodiment of the present application;
fig. 7 is another schematic diagram of X convolution kernels in the image data processing method according to the embodiment of the present application;
fig. 8 is a schematic structural diagram of an image data processing apparatus shown in an embodiment of the present application;
fig. 9 is another schematic structural diagram of an image data processing apparatus shown in an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device shown in an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While embodiments of the present application are illustrated in the accompanying drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms "first," "second," "third," etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
In view of the foregoing problems, embodiments of the present application provide an image data processing method, which can improve the flexibility of the Transformer module.
For ease of understanding, some terms referred to in the embodiments of the present application are described below.
Image matrix: digital image data can be represented by a matrix, so that the digital image can be analyzed and processed by adopting matrix theory and matrix algorithm. Since digital images can be represented in the form of a matrix, two-dimensional arrays are commonly used to store image data in computer digital image processing programs.
Convolution kernel: when a convolution kernel is used for image processing, each pixel of the output image is formed by a weighted average of the pixels in a small area of the given input image, where the weights are defined by a function; this function is called the convolution kernel.
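Purely as an illustration of this definition (not part of the claimed method), the following Python/NumPy sketch computes one output pixel as the weighted average of a 3 × 3 neighborhood; the toy image, the averaging kernel and the variable names are the editor's assumptions.

```python
import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)    # a toy 5 x 5 gray-scale image
kernel = np.full((3, 3), 1.0 / 9.0)                 # 3 x 3 averaging kernel (the weight function)

# Output pixel at (2, 2): weighted average of the 3 x 3 neighborhood centered there.
patch = image[1:4, 1:4]
out_pixel = float(np.sum(patch * kernel))
print(out_pixel)                                    # 12.0, the mean of that neighborhood
```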
Feature map (feature map): in the convolutional layers of a neural network, data exists in three dimensions and can be viewed as a stack of two-dimensional images, each of which is called a feature map. In the input layer, a gray-scale image has only one feature map, while a color image typically has 3 feature maps (red, green and blue). Multiple convolution kernels are arranged between layers, and the feature maps of the next layer are generated by convolving each feature map of the previous layer with each convolution kernel.
Data module: in the image field, an image is usually represented as a three-dimensional array of pixel values, in which the length represents the height of the image, the width represents the width of the image, and the depth represents the number of color channels of the image. In this application, a data module is a 1 × 1 × depth block of such an array, i.e. the channel vector at a single spatial position.
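For illustration only (the shapes and names below are the editor's assumptions, not values from the patent), these terms can be pictured as arrays of different ranks:

```python
import numpy as np

image_matrix = np.zeros((480, 640))        # 2-D image matrix: height x width
feature_maps = np.zeros((480, 640, 3))     # 3-D stack: height x width x channels (e.g. R, G, B)
data_module = feature_maps[0, 0, :]        # one 1 x 1 x C data module: the channel vector at a pixel

print(image_matrix.shape, feature_maps.shape, data_module.shape)   # (480, 640) (480, 640, 3) (3,)
```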
The image data processing method in this embodiment may be used for image processing in a neural network that includes a Transformer module, or for other neural networks that need to perform matrix multiplication, which is not limited in this embodiment.
It should be noted that the image data processing apparatus in this embodiment may include a Transformer module or another module that needs to perform matrix multiplication, which is not limited in this embodiment.
The technical solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating an image data processing method according to an embodiment of the present application.
Referring to fig. 1, the image data processing method in the present embodiment includes:
101. the image data processing device acquires a first image matrix and a second image matrix;
The image data processing apparatus acquires a first image matrix A_{m×p} and a second image matrix B_{p×n} that are to be matrix-multiplied. Either the first image matrix A_{m×p} or the second image matrix B_{p×n} may be an original image matrix or an image matrix obtained through a matrix operation, which is not limited in this embodiment.
102. The image data processing device determines a first feature map corresponding to the first image matrix;
After acquiring the first image matrix A_{m×p}, the image data processing apparatus generates the first feature map corresponding to the first image matrix A_{m×p} according to a first preset rule.
Specifically, the image data processing apparatus may generate the first feature map corresponding to the first image matrix A_{m×p} in the following manner:
S1, determining the m data modules corresponding to the m row vectors of the first image matrix A_{m×p};
for each row vector of the first image matrix A_{m×p}, the image data processing apparatus generates a corresponding data module with a length of 1, a width of 1 and a depth of p, whose p elements correspond one-to-one to the p elements of the row vector. Specifically, the image data processing apparatus may generate the data module corresponding to a row vector through a matrix transformation function, or in other manners, which is not limited in this embodiment.
S2, splicing the m data modules according to a preset splicing rule to obtain a third feature map;
after the image data processing apparatus generates the m data modules, it splices them according to the preset splicing rule to obtain a third feature map, whose length is a, width is b and depth is p, where a × b = m.
Specifically, the splicing rule includes a splicing order, which may be from top to bottom and from left to right: that is, a data modules are spliced from top to bottom to form the 1st column of the third feature map, and the 2nd, 3rd, ..., b-th columns are then arranged in the same top-to-bottom order to obtain the third feature map. The splicing order may also be from top to bottom and from right to left; from bottom to top and from left to right; from bottom to top and from right to left; or another splicing order, which is not limited in this embodiment.
Illustratively, the first image matrix is A_{4×3} with elements a_{i,j} (i = 1, ..., 4; j = 1, 2, 3), a is 2, b is 2, and the splicing order is from top to bottom and from left to right. The image data processing apparatus generates the 4 data modules M_1, M_2, M_3 and M_4 corresponding to the 4 row vectors [a_{1,1} a_{1,2} a_{1,3}], [a_{2,1} a_{2,2} a_{2,3}], [a_{3,1} a_{3,2} a_{3,3}], [a_{4,1} a_{4,2} a_{4,3}] of A_{4×3}; the apparatus then splices M_2 below M_1 in top-to-bottom order to obtain the 1st column, and splices M_3 beside M_1 and M_4 below M_3 to obtain the 2nd column, as shown in Fig. 2.
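Steps S1 and S2 can be sketched as follows in Python/PyTorch. This is only an illustrative reading of the example above, assuming the top-to-bottom, left-to-right splicing order; the variable names and the use of reshape/permute are the editor's choices, not the patent's reference implementation.

```python
import torch

# Illustrative sizes: A is m x p, tiled into an a x b spatial grid (a * b == m).
m, p = 4, 3
a, b = 2, 2
A = torch.arange(m * p, dtype=torch.float32).reshape(m, p)   # stand-in for A_{4x3}

# S1: each of the m row vectors becomes a 1 x 1 x p data module.
# S2: tile the modules top-to-bottom, then left-to-right, into an a x b grid with p channels.
third_feature_map = A.reshape(b, a, p).permute(1, 0, 2)       # shape (a, b, p)

# Row j*a + i of A now sits at spatial position (i, j), spread across the p channels.
assert torch.equal(third_feature_map[1, 0], A[1])   # M_2 sits below M_1 (column 1)
assert torch.equal(third_feature_map[0, 1], A[2])   # M_3 starts column 2
```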
S3, inserting zero data modules into the third feature map according to a preset zero padding rule to obtain the first feature map.
It should be understood that, in this embodiment, the length of each zero data module is 1, the width is 1, the depth is p, and the values of all elements in the zero data module are zero. The zero padding rule is related to the number of column vectors of the second image matrix B_{p×n} contained in each convolution kernel and to the position of each such column vector within the convolution kernel.
As one alternative, when each convolution kernel contains one column vector of the second image matrix B_{p×n}, the image data processing apparatus may insert zero data modules in the following manner: the image data processing apparatus inserts zero data modules at the upper edge, the lower edge, the left edge and the right edge of the third feature map respectively to obtain the first feature map.
Specifically, if each convolution kernel contains one column vector, f is an odd number, and the column vector is located at the center of the convolution kernel, the image data processing apparatus may insert (f-1)/2 rows of zero data modules at each of the upper edge and the lower edge of the third feature map, and (f-1)/2 columns of zero data modules at each of the left edge and the right edge, to obtain the first feature map; that is, the values of the first (f-1)/2 columns, the last (f-1)/2 columns, the first (f-1)/2 rows and the last (f-1)/2 rows of the 1st to p-th channels of the first feature map are all zero. Exemplarily, f is 3 and the first image matrix is the matrix A_{4×3} above; the third feature map obtained by the image data processing apparatus is shown in Fig. 2, and the apparatus then inserts 1 row of zero data modules at each of the upper edge and the lower edge of the third feature map and 1 column of zero data modules at each of the left edge and the right edge to obtain the first feature map, as shown in Fig. 3.
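For the centered-column case with f = 3, step S3 therefore amounts to adding (f - 1)/2 = 1 row or column of zeros on every side of the third feature map. A minimal sketch, assuming Python/PyTorch with a channels-first layout (names and shapes are illustrative):

```python
import torch
import torch.nn.functional as F

a, b, p, f = 2, 2, 3, 3
third_feature_map = torch.randn(1, p, a, b)        # (batch, channels, height, width)

pad = (f - 1) // 2                                  # 1 when f == 3
first_feature_map = F.pad(third_feature_map, (pad, pad, pad, pad))   # left, right, top, bottom

print(first_feature_map.shape)                      # torch.Size([1, 3, 4, 4])
# For an off-center column vector (Fig. 4), the four pad amounts simply become unequal.
```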
If each convolution kernel contains one column vector, f is an odd number, and the column vector is not located at the center of the convolution kernel, the image data processing apparatus may insert L_1 rows of zero data modules at the upper edge of the third feature map, L_2 rows of zero data modules at the lower edge, L_3 columns of zero data modules at the left edge and L_4 columns of zero data modules at the right edge to obtain the first feature map, where L_1, L_2, L_3 and L_4 are related to the position of the column vector in the convolution kernel. For example, if f is 3 and the column vector is located at the 1st row and 1st column of the convolution kernel, the image data processing apparatus inserts 1 row of zero data modules at the upper edge, 2 rows at the lower edge, 1 column at the left edge and 2 columns at the right edge, as shown in Fig. 4.
As another alternative, when each convolution kernel contains a plurality of column vectors of the second image matrix B_{p×n}, the image data processing apparatus may insert zero data modules in the following manner: the image data processing apparatus inserts zero data modules at target positions of the third feature map to obtain the first feature map, where the target positions include internal positions and edge positions and are related to the corresponding positions of the plurality of column vectors in the convolution kernel.
Illustratively, each convolution kernel contains two column vectors W_1 and W_2, with W_1 located at the 2nd row and 2nd column of the convolution kernel and W_2 at the 2nd row and 3rd column. The image data processing apparatus inserts 1 column of zero data modules between adjacent columns of the third feature map, 1 row of zero data modules at each of the upper and lower edges, and 1 column of zero data modules at each of the left and right edges, to obtain the first feature map, as shown in Fig. 5.
It should be understood that the zero padding rule is related to the number of column vectors included in the convolution kernel and the corresponding position of each column vector in the convolution kernel, and may be specifically set by a user according to the set number of column vectors and the corresponding position of the column vectors in the convolution kernel, or may be set by other means, which is not limited in this embodiment.
103. The image data processing device determines X convolution kernels corresponding to the second image matrix;
the image data processing device acquires a second image matrix Bp×nThen, a second image matrix B is generated according to a second preset rulep×nCorresponding X convolution kernels of size f X p, each convolution kernel containing a second image matrix Bp×nI.e. each convolution kernel contains said second image matrix Bp×nWherein f is an integer greater than 1 and X is an integer greater than or equal to 1.
Specifically, the image data processing apparatus may determine the second image matrix B in the following mannerp×nThe corresponding X convolution kernels: the image data processing apparatus determines values of n groups of target elements in X convolution kernels, and determines values of 0 in the X convolution kernels and other elements except the n groups of target elements, wherein each group of target elements corresponds to the second image matrix Bp×nA column vector of (a), the values of the ith group of target elements and a second image matrix Bp×nI is an integer from 1 to n, and each convolution kernel contains one or more sets of target elements from the n sets of target elements.
It should be noted that the size of the convolution kernel is preset, that is, the value of f is a preset value. The system may also set the number of column vectors each convolution kernel contains, i.e. each convolution kernel contains the second image matrix Bp×nHow many column elements in. Further, the system may also set each convolution kernelThe corresponding position of the included column vector in the convolution kernel.
In some embodiments, the system sets each convolution kernel to contain one column vector, so X = n and each convolution kernel contains one group of target elements. Specifically, for each convolution kernel, the target elements Z_1, Z_2, ..., Z_p contained in the convolution kernel are respectively located at the j-th row and the k-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel. More specifically, if the system sets the column vector contained in the convolution kernel to be located at the center of the convolution kernel and f is odd, the target elements Z_1, Z_2, ..., Z_p are respectively located at the ((f+1)/2)-th row and the ((f+1)/2)-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel. Exemplarily, f is 3 and the second image matrix is B_{3×3} with elements b_{i,j}. The image data processing apparatus determines the 3 groups of target elements b_{1,1}, b_{2,1}, b_{3,1}; b_{1,2}, b_{2,2}, b_{3,2}; b_{1,3}, b_{2,3}, b_{3,3} in the 3 convolution kernels corresponding to B_{3×3}, and determines the values of all elements other than the target elements in each convolution kernel to be 0. Each convolution kernel contains 1 group of target elements, and the target elements contained in each convolution kernel are located at the 2nd row and the 2nd column of the 1st, 2nd and 3rd channels of that convolution kernel, as shown in Fig. 6.
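The kernel construction for this single-column case (X = n kernels, each all-zero except for one column of B at its center) can be sketched as follows, assuming Python/PyTorch; W and the (out_channels, in_channels, f, f) layout follow PyTorch's conv2d convention and are the editor's illustration, not the patent's notation for an implementation.

```python
import torch

p, n, f = 3, 3, 3
B = torch.arange(p * n, dtype=torch.float32).reshape(p, n)    # stand-in for B_{3x3}

# X = n kernels of size f x f x p, all zeros except one column of B at the center.
W = torch.zeros(n, p, f, f)                   # (out_channels, in_channels, f, f)
W[:, :, f // 2, f // 2] = B.t()               # kernel i holds column i of B at row 2, column 2

print(W[0, :, 1, 1])                          # equals B[:, 0], i.e. b_{1,1}, b_{2,1}, b_{3,1}
```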
In some embodiments, the system presets each convolution kernel to contain u column vectors, u being an integer greater than 1 and less than or equal to f.
If n is exactly divisible by u, then X = n/u and each convolution kernel contains u groups of target elements. Specifically, for each of the X convolution kernels, the v-th group of target elements contained in the convolution kernel is located at the j_v-th row and the k_v-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel, where v is an integer from 1 to u;
if n is not exactly divisible by u, X is the smallest integer greater than n/u, the 1st to (X-1)-th convolution kernels each contain u groups of target elements, and the X-th convolution kernel contains n-u(X-1) groups of target elements. For the 1st to (X-1)-th convolution kernels, the v-th of the u groups of target elements is located at the j_v-th row and the k_v-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel; for the X-th convolution kernel, the r-th of its n-u(X-1) groups of target elements is located at the j_r-th row and the k_r-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel, where r is an integer from 1 to n-u(X-1).
Illustratively, each convolution kernel contains 2 column vectors W_1 and W_2, with W_1 located at the 2nd row and 2nd column of the convolution kernel and W_2 at the 2nd row and 3rd column.
The second image matrix is B_{3×3} with elements b_{i,j}. The image data processing apparatus determines the values of the 3 groups of target elements b_{1,1}, b_{2,1}, b_{3,1}; b_{1,2}, b_{2,2}, b_{3,2}; b_{1,3}, b_{2,3}, b_{3,3} corresponding to B_{3×3}, and determines the values of all other elements in each convolution kernel to be 0. The 1st convolution kernel contains the first column elements b_{1,1}, b_{2,1}, b_{3,1} and the second column elements b_{1,2}, b_{2,2}, b_{3,2} of B_{3×3}, and the 2nd convolution kernel contains the third column elements b_{1,3}, b_{2,3}, b_{3,3} of B_{3×3}, as shown in Fig. 7.
104. The image data processing device convolves the first feature map with X convolution kernels to obtain a second feature map.
After generating the first feature map and the X convolution kernels, the image data processing apparatus convolves the first feature map with the X convolution kernels to obtain a second feature map. The second feature map corresponds to a third image matrix C_{m×n}, which is the result of matrix-multiplying the first image matrix A_{m×p} by the second image matrix B_{p×n}.
In some embodiments, f is an odd number, each convolution kernel contains one column vector of the second image matrix, and the column vector is located at the center of the convolution kernel; the image data processing apparatus then convolves the first feature map with the n convolution kernels to obtain a second feature map of length a, width b and depth n. Each 1 × 1 × n data module in the depth direction of the second feature map corresponds to one row vector of the third image matrix C_{m×n}, and the arrangement of these data modules in the second feature map follows the splicing order of the third feature map; that is, the second feature map maps to C_{m×n} in the same way that the third feature map maps to A_{m×p}.
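Putting the pieces together for this simple case, the following end-to-end sketch (an illustration in Python/PyTorch under the editor's assumptions, not the patent's implementation) checks numerically that the convolution reproduces the matrix product A·B.

```python
import torch
import torch.nn.functional as F

m, p, n = 4, 3, 3
a, b, f = 2, 2, 3                                    # a * b == m, f odd

A = torch.randn(m, p)
B = torch.randn(p, n)

# First feature map: rows of A tiled top-to-bottom, left-to-right into an a x b grid,
# laid out channels-first as (1, p, a, b); zero padding is applied by conv2d below.
feat = A.reshape(b, a, p).permute(2, 1, 0).unsqueeze(0)

# X = n kernels, zero everywhere except column i of B at the kernel center.
W = torch.zeros(n, p, f, f)
W[:, :, f // 2, f // 2] = B.t()

out = F.conv2d(feat, W, padding=(f - 1) // 2)        # second feature map, shape (1, n, a, b)

# Reading the a x b grid back in the same splicing order recovers A @ B.
C_conv = out[0].permute(2, 1, 0).reshape(m, n)
assert torch.allclose(C_conv, A @ B, atol=1e-5)
```

The essential design point is that the tiling order used to build the first feature map and the order used to read the output grid back into C_{m×n} must match; any consistent splicing order works, as described above.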
In some embodiments, each convolution kernel contains a plurality of column vectors of the second image matrix. After convolving the first feature map with the X convolution kernels to obtain the second feature map, the image data processing apparatus rearranges the data modules in the second feature map according to a preset rearrangement rule to obtain a fourth feature map of length a, width b and depth n. Each 1 × 1 × n data module in the depth direction of the fourth feature map corresponds to one row vector of the third image matrix C_{m×n}, and the arrangement of these data modules in the fourth feature map corresponds to the splicing order of the third feature map.
In some embodiments, each convolution kernel contains one column vector of the second image matrix and the column vector is not located at the center of the convolution kernel. After convolving the first feature map with the n convolution kernels to obtain the second feature map, the image data processing apparatus deletes the zero data modules at preset positions to obtain a fifth feature map of length a, width b and depth n. Each 1 × 1 × n data module in the depth direction of the fifth feature map corresponds to one row vector of the third image matrix C_{m×n}, and the arrangement of these data modules in the fifth feature map corresponds to the splicing order of the third feature map.
According to the technical scheme described above, a first image matrix A_{m×p} and a second image matrix B_{p×n} can be acquired, a first feature map corresponding to the first image matrix A_{m×p} and X convolution kernels of size f×f×p corresponding to the second image matrix B_{p×n} can be determined, and the first feature map can then be convolved with the X convolution kernels to obtain a feature map corresponding to a third image matrix, where f is an integer greater than 1 and the third image matrix is obtained by matrix-multiplying the first image matrix by the second image matrix. That is to say, in this embodiment, a convolution operation may be used in place of the matrix multiplication operation to obtain the corresponding feature map, so the problem that the Transformer module cannot be deployed on a chip that only supports convolution calculation is solved, and the flexibility of the Transformer module is improved.
In addition, the embodiments of the present application provide multiple ways of determining the first feature map and multiple ways of determining the convolution kernels, which further improves the flexibility of the scheme.
Corresponding to the above method embodiments, the present application also provides an image data processing apparatus, an electronic device, and corresponding embodiments.
Fig. 8 is a schematic structural diagram of an image data processing apparatus according to an embodiment of the present application.
Referring to fig. 8, an image data processing apparatus 800 in the present embodiment includes:
an acquisition module 801 for acquiring a first image matrix am×pAnd a second image matrix Bp×n
A first determining module 802 for determining a first image matrix Am×pA corresponding first characteristic diagram;
a second determining module 803 for determining a second image matrix Bp×nA corresponding number X of convolution kernels, each convolution kernel having a size of f X p, where f is an integer greater than 1 and X is an integer greater than or equal to 1, each convolution kernel containing a second image matrix Bp×nOne or more columns of elements;
a convolution module 804 for transforming the first featureConvolving the image with X convolution kernels to obtain a second characteristic image, wherein the second characteristic image corresponds to a third image matrix, and the third image matrix is a first image matrix Am×pAnd a second image matrix Bp×nAnd (5) carrying out matrix multiplication to obtain a result.
According to the technical scheme of the application, the obtaining module 801 can obtain the first image matrix Am×pAnd a second image matrix Bp×nThe first determination module 802 may determine the first image matrix Am×pThe corresponding first feature map, the second determining module 803 may determine the second image matrix Bp×nCorresponding X convolution kernels with the size of f × f × p, and then the convolution module 804 may perform convolution multiplication on the first feature map and the X convolution kernels to obtain a feature map corresponding to a third image matrix, where f is an integer greater than 1, and the third image matrix is obtained by performing matrix multiplication on the first image matrix and the second image matrix. That is to say, in this embodiment, convolution operation may be used to replace matrix multiplication operation to obtain a corresponding characteristic diagram, so that the problem that the transform module cannot be deployed on a chip that only supports convolution calculation is solved, and the flexibility of the transform module is improved.
For ease of understanding, the image data processing apparatus of the present application is described in detail below. Referring to fig. 9, the image data processing apparatus 900 in this embodiment includes:
an obtaining module 901 for obtaining a first image matrix am×pAnd a second image matrix Bp×n
A first determining module 902 for determining a first image matrix Am×pA corresponding first characteristic diagram;
a second determining module 903 for determining a second image matrix Bp×nA corresponding number X of convolution kernels, each convolution kernel having a size of f X p, where f is an integer greater than 1 and X is an integer greater than or equal to 1, each convolution kernel containing a second image matrix Bp×nOne or more columns of elements;
a convolution module 904, configured to convolve the first feature map with the X convolution kernels to obtain a second feature map, where the second feature map corresponds to a third image matrix, and the third feature map is a third feature mapThe image matrix is a first image matrix Am×pAnd a second image matrix Bp×nPerforming matrix multiplication to obtain a result;
wherein the second determining module 903 comprises:
a first determining unit 9031, configured to determine values of n groups of target elements in the X convolution kernels, where values of an ith group of target elements in the n groups of target elements correspond to values of an ith column of elements in the second image matrix, i is an integer from 1 to n, and each convolution kernel includes one or more groups of target elements in the n groups of target elements;
a second determining unit 9032, configured to determine that values of elements other than the n groups of target elements in the X convolution kernels are 0;
optionally, each convolution kernel contains a set of target elements Z of the n sets of target elements1,Z2,...,ZpFor each convolution kernel, the target element Z contained in the convolution kernel1,Z2,...,ZpThe jth row and the kth column of the 1 st, 2.,. th and p channels of the convolution kernel respectively;
the first determining module 902 includes:
a third determination unit 9021 for determining m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, wherein the length of each data module is 1, the width is 1, and the depth is p;
a first splicing unit 9022, configured to splice the m data modules according to a preset splicing rule to obtain a third feature map, where the length of the third feature map is a, the width is b, the depth is p, and a × b is equal to m;
a first zero padding unit 9023, configured to insert zero data modules at the upper edge, the lower edge, the left edge and the right edge of the third feature map respectively to obtain the first feature map, where the length of each zero data module is 1, the width is 1, the depth is p, and the values of all elements in each zero data module are zero.
Optionally, each convolution kernel contains u groups of target elements of the n groups of target elements, u being an integer greater than 1 and less than or equal to f, and for each convolution kernel, the p target elements of the v-th group of target elements contained in the convolution kernel are respectively located at the j_v-th row and the k_v-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel, where v is an integer from 1 to u;
the first determining module 902 includes:
a fourth determination unit 9024 for determining m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, wherein the length of each data module is 1, the width is 1, and the depth is p;
a second splicing unit 9025, configured to splice the m data modules according to a preset splicing rule to obtain a third feature map, where the length of the third feature map is a, the width is b, the depth is p, and a × b is equal to m;
a second zero padding unit 9026, configured to insert zero data modules at target positions of the third feature map to obtain the first feature map, where the target positions include internal positions and edge positions and are related to the positions of the multiple groups of target elements in the convolution kernel, the length of each zero data module is 1, the width is 1, the depth is n, and the values of all elements in each zero data module are zero.
According to the technical scheme of the present application, the obtaining module 901 can obtain a first image matrix A_{m×p} and a second image matrix B_{p×n}, the first determining module 902 can determine a first feature map corresponding to the first image matrix A_{m×p}, the second determining module 903 can determine X convolution kernels of size f×f×p corresponding to the second image matrix B_{p×n}, and the convolution module 904 can then convolve the first feature map with the X convolution kernels to obtain a feature map corresponding to a third image matrix, where f is an integer greater than 1 and the third image matrix is obtained by matrix-multiplying the first image matrix by the second image matrix. That is to say, in this embodiment, a convolution operation may be used in place of the matrix multiplication operation to obtain the corresponding feature map, so the problem that the Transformer module cannot be deployed on a chip that only supports convolution calculation is solved, and the flexibility of the Transformer module is improved.
In addition, the embodiments of the present application provide multiple ways of determining the first feature map and multiple ways of determining the convolution kernels, which further improves the flexibility of the scheme.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 10 is a schematic structural diagram of an electronic device shown in an embodiment of the present application.
Referring to fig. 10, the electronic device 1000 includes a memory 1010 and a processor 1020.
The Processor 1020 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 1010 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage. The ROM may store static data or instructions that are needed by the processor 1020 or other modules of the computer. The persistent storage device may be a read-write storage device, and may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, a mass storage device (e.g., a magnetic or optical disk, or flash memory) is employed as the persistent storage device. In other embodiments, the permanent storage may be a removable storage device (e.g., a floppy disk or an optical drive). The system memory may be a read-write memory device or a volatile read-write memory device, such as a dynamic random access memory. The system memory may store instructions and data that some or all of the processors require at runtime. Further, the memory 1010 may comprise any combination of computer-readable storage media, including various types of semiconductor memory chips (e.g., DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), magnetic disks and/or optical disks. In some embodiments, the memory 1010 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density disc, a flash memory card (e.g., an SD card, a mini SD card, a Micro-SD card, etc.), a magnetic floppy disk, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted by wireless or wired means.
The memory 1010 has stored thereon executable code that, when processed by the processor 1020, may cause the processor 1020 to perform some or all of the methods described above.
Furthermore, the method according to the present application may also be implemented as a computer program or computer program product comprising computer program code instructions for performing some or all of the steps of the above-described method of the present application.
Alternatively, the present application may also be embodied as a computer-readable storage medium (or non-transitory machine-readable storage medium or machine-readable storage medium) having executable code (or a computer program or computer instruction code) stored thereon, which, when executed by a processor of an electronic device (or server, etc.), causes the processor to perform part or all of the various steps of the above-described method according to the present application.
Having described embodiments of the present application, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. An image data processing method characterized by comprising:
obtaining a first image matrix A_{m×p} and a second image matrix B_{p×n};
determining a first feature map corresponding to the first image matrix A_{m×p};
determining X convolution kernels corresponding to the second image matrix B_{p×n}, each convolution kernel having a size of f×f×p, wherein f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel contains one or more columns of elements of the second image matrix B_{p×n};
convolving the first feature map with the X convolution kernels to obtain a second feature map, wherein the second feature map corresponds to a third image matrix, and the third image matrix is the result of matrix-multiplying the first image matrix A_{m×p} by the second image matrix B_{p×n}.
2. The method according to claim 1, wherein the determining the X convolution kernels corresponding to the second image matrix B_{p×n} comprises:
determining values of n groups of target elements in the X convolution kernels, wherein the value of the ith group of target elements in the n groups of target elements corresponds to the value of the ith column of elements in the second image matrix, i is an integer from 1 to n, and each convolution kernel comprises one or more groups of target elements in the n groups of target elements;
determining values of elements other than the n sets of target elements in the X convolution kernels to be 0.
3. The image data processing method of claim 2, wherein each convolution kernel contains one group of target elements Z_1, Z_2, ..., Z_p of the n groups of target elements, and for each convolution kernel, the target elements Z_1, Z_2, ..., Z_p contained in the convolution kernel are respectively located at the j-th row and the k-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel;
the determining the first feature map corresponding to the first image matrix A_{m×p} comprises:
determining m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, wherein the length of each data module is 1, the width is 1, and the depth is p;
splicing the m data modules according to a preset splicing rule to obtain a third feature map, wherein the length of the third feature map is a, the width is b, the depth is p, and a × b = m;
inserting zero data modules at the upper edge, the lower edge, the left edge and the right edge of the third feature map respectively to obtain the first feature map, wherein the length of each zero data module is 1, the width is 1, the depth is p, and the values of all elements in each zero data module are zero.
4. The image data processing method according to claim 2, wherein each convolution kernel contains u groups of target elements of the n groups of target elements, u being an integer greater than 1 and less than or equal to f, and for each convolution kernel, the p target elements of the v-th group of target elements contained in the convolution kernel are respectively located at the j_v-th row and the k_v-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel, where v is an integer from 1 to u;
the determining the first feature map corresponding to the first image matrix A_{m×p} comprises:
determining m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, wherein the length of each data module is 1, the width is 1, and the depth is p;
splicing the m data modules according to a preset splicing rule to obtain a third feature map, wherein the length of the third feature map is a, the width is b, the depth is p, and a × b = m;
inserting zero data modules at target positions of the third feature map to obtain the first feature map, wherein the target positions comprise internal positions and edge positions and are related to the corresponding positions of the plurality of groups of target elements in the convolution kernel, the length of each zero data module is 1, the width is 1, the depth is n, and the values of all elements in each zero data module are zero.
5. An image data processing apparatus, comprising:
an acquisition module for acquiring a first image matrix A_{m×p} and a second image matrix B_{p×n};
a first determination module for determining a first feature map corresponding to the first image matrix A_{m×p};
a second determination module for determining X convolution kernels corresponding to the second image matrix B_{p×n}, each convolution kernel having a size of f×f×p, where f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel contains one or more columns of elements of the second image matrix B_{p×n};
a convolution module for convolving the first feature map with the X convolution kernels to obtain a second feature map, where the second feature map corresponds to a third image matrix, and the third image matrix is the result of matrix-multiplying the first image matrix A_{m×p} by the second image matrix B_{p×n}.
6. The image data processing apparatus according to claim 5, wherein the second determination module includes:
a first determining unit, configured to determine values of n groups of target elements in the X convolution kernels, where a value of an ith group of target elements in the n groups of target elements corresponds to a value of an ith column of elements in a second image matrix, i is an integer from 1 to n, and each convolution kernel includes one or more groups of target elements in the n groups of target elements;
a second determining unit configured to determine that values of elements other than the n groups of target elements in the X convolution kernels are 0.
7. The image data processing apparatus of claim 6, wherein each convolution kernel contains one group of target elements Z_1, Z_2, ..., Z_p of the n groups of target elements, and for each convolution kernel, the target elements Z_1, Z_2, ..., Z_p contained in the convolution kernel are respectively located at the j-th row and the k-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel;
the first determination module includes:
a third determination unit for determining m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, wherein the length of each data module is 1, the width is 1, and the depth is p;
a first splicing unit for splicing the m data modules according to a preset splicing rule to obtain a third feature map, wherein the length of the third feature map is a, the width is b, the depth is p, and a × b = m;
a first zero padding unit for inserting zero data modules at the upper edge, the lower edge, the left edge and the right edge of the third feature map respectively to obtain the first feature map, wherein the length of each zero data module is 1, the width is 1, the depth is p, and the values of all elements in each zero data module are zero.
8. The image data processing apparatus according to claim 6, wherein each convolution kernel contains u groups of target elements of the n groups of target elements, u being an integer greater than 1 and less than or equal to f, and for each convolution kernel, the p target elements of the v-th group of target elements contained in the convolution kernel are respectively located at the j_v-th row and the k_v-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel, where v is an integer from 1 to u;
the first determination module includes:
a fourth determination unit for determining m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, wherein the length of each data module is 1, the width is 1, and the depth is p;
a second splicing unit for splicing the m data modules according to a preset splicing rule to obtain a third feature map, wherein the length of the third feature map is a, the width is b, the depth is p, and a × b = m;
a second zero padding unit for inserting zero data modules at target positions of the third feature map to obtain the first feature map, wherein the target positions include internal positions and edge positions and are related to the positions of the plurality of groups of target elements in the convolution kernel, the length of each zero data module is 1, the width is 1, the depth is n, and the values of all elements in each zero data module are zero.
9. An electronic device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any one of claims 1-4.
10. A computer-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any one of claims 1-4.
CN202111477627.2A 2021-12-06 2021-12-06 Image data processing method and device Pending CN114283314A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111477627.2A CN114283314A (en) 2021-12-06 2021-12-06 Image data processing method and device
PCT/CN2022/122544 WO2023103551A1 (en) 2021-12-06 2022-09-29 Image data processing method and apparatus, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111477627.2A CN114283314A (en) 2021-12-06 2021-12-06 Image data processing method and device

Publications (1)

Publication Number Publication Date
CN114283314A true CN114283314A (en) 2022-04-05

Family

ID=80871122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111477627.2A Pending CN114283314A (en) 2021-12-06 2021-12-06 Image data processing method and device

Country Status (2)

Country Link
CN (1) CN114283314A (en)
WO (1) WO2023103551A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023103551A1 (en) * 2021-12-06 2023-06-15 广州小鹏自动驾驶科技有限公司 Image data processing method and apparatus, device, and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020678A (en) * 2019-03-25 2019-07-16 联想(北京)有限公司 A kind of data processing method, electronic equipment and computer storage medium
CN110580324B (en) * 2019-07-23 2020-11-17 珠海格力电器股份有限公司 Image matrix operation method and device, computer equipment and storage medium
CN111932437B (en) * 2020-10-10 2021-03-05 深圳云天励飞技术股份有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN114283314A (en) * 2021-12-06 2022-04-05 广州小鹏自动驾驶科技有限公司 Image data processing method and device

Also Published As

Publication number Publication date
WO2023103551A1 (en) 2023-06-15

Similar Documents

Publication Publication Date Title
Nagy et al. Restoring images degraded by spatially variant blur
JP6431245B1 (en) Edge recognition bidirectional image processing
KR20190051697A (en) Method and apparatus for performing devonvolution operation in neural network
CN110021047A (en) Image processing method, image processing apparatus and storage medium
US10417749B2 (en) Method and system for edge denoising of a digital image
US20120281872A1 (en) Detecting an interest point in an image using edges
CN112613575B (en) Data set expansion method, training method and device of image classification model
US11645734B2 (en) Circuitry for image demosaicing and contrast enhancement and image-processing method
CN110223222A (en) Image split-joint method, image splicing device and computer readable storage medium
CN103390275B (en) The method of dynamical image joining
CN106504196A (en) A kind of panoramic video joining method and equipment based on space sphere
CN114283314A (en) Image data processing method and device
US20140140608A1 (en) Image processing apparatus and method for color-depth demosaicing
CN114170582A (en) Guideboard angular point identification method, device, equipment and storage medium
CN112001451A (en) Data redundancy processing method, system, medium and device
CN117252884B (en) Tea bud and leaf target segmentation method based on self-attention mechanism
Hallek et al. Real-time stereo matching on CUDA using Fourier descriptors and dynamic programming
CN114241446A (en) Method, device and equipment for marking corner points of guideboard and storage medium
US7499599B2 (en) Method of real-time correction of non-functioning pixels in digital radiography
US20110170774A1 (en) Image manipulating system and method
CN113393368A (en) Image processing method, medium, and electronic device based on neural network model
US20240185570A1 (en) Undecimated image processing method and device
WO2018171899A1 (en) Neural network data processing apparatus and method
US20210201132A1 (en) Neural network method and apparatus
CN117830596A (en) Image preprocessing method, device, electronic equipment and readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination