CN114283314A - Image data processing method and device - Google Patents

Image data processing method and device

Info

Publication number
CN114283314A
CN114283314A
Authority
CN
China
Prior art keywords
image matrix
feature map
convolution kernel
image
data module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111477627.2A
Other languages
Chinese (zh)
Inventor
胡宇
姬彬斐
刘嘉超
刘兰个川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xiaopeng Autopilot Technology Co Ltd
Original Assignee
Guangzhou Xiaopeng Autopilot Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Autopilot Technology Co Ltd filed Critical Guangzhou Xiaopeng Autopilot Technology Co Ltd
Priority to CN202111477627.2A priority Critical patent/CN114283314A/en
Publication of CN114283314A publication Critical patent/CN114283314A/en
Priority to PCT/CN2022/122544 priority patent/WO2023103551A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an image data processing method and device. The method comprises the following steps: obtaining a first image matrix A_{m×p} and a second image matrix B_{p×n}; determining a first feature map corresponding to the first image matrix A_{m×p}; determining X convolution kernels corresponding to the second image matrix B_{p×n}, each convolution kernel having a size of f×f×p, where f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel contains one or more columns of elements of the second image matrix B_{p×n}; and convolving the first feature map with the X convolution kernels to obtain a second feature map, where the second feature map corresponds to a third image matrix, and the third image matrix is the result of matrix-multiplying the first image matrix A_{m×p} by the second image matrix B_{p×n}. The scheme provided by the application can improve the flexibility of the Transformer module.

Description

Image data processing method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image data processing method and apparatus.
Background
Current mainstream machine translation is based mainly on neural network machine translation, which uses an encoder-decoder architecture: an encoder encodes the source-language sequence and extracts its information, and a decoder converts that information into the target language to complete the translation process. The deep self-attention (Transformer) model designed on the basis of the encoder-decoder architecture has become the mainstream model in the machine translation field owing to its superior performance, and has had a great influence on the deep learning field.
In a neural network with a Transformer as its main module, two-dimensional data tensors undergo matrix multiplication, and in some schemes the Transformer model is therefore deployed on a chip that supports matrix multiplication.
However, some chips on the market only support convolution calculation, so a Transformer module that needs to perform matrix multiplication cannot be deployed on such chips; this limits the use of the Transformer module and results in poor flexibility.
Disclosure of Invention
In order to solve or at least partially solve the problems in the related art, the present application provides an image data processing method that can improve the flexibility of the Transformer module.
A first aspect of the present application provides an image data processing method, comprising: obtaining a first image matrix A_{m×p} and a second image matrix B_{p×n};
determining a first feature map corresponding to the first image matrix A_{m×p};
determining X convolution kernels corresponding to the second image matrix B_{p×n}, each convolution kernel having a size of f×f×p, where f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel contains one or more columns of elements of the second image matrix B_{p×n};
convolving the first feature map with the X convolution kernels to obtain a second feature map, wherein the second feature map corresponds to a third image matrix, and the third image matrix is the result of matrix-multiplying the first image matrix A_{m×p} by the second image matrix B_{p×n}.
A second aspect of the present application provides an image data processing apparatus, comprising:
an acquisition module for acquiring a first image matrix A_{m×p} and a second image matrix B_{p×n};
a first determination module for determining a first feature map corresponding to the first image matrix A_{m×p};
a second determination module for determining X convolution kernels corresponding to the second image matrix B_{p×n}, each convolution kernel having a size of f×f×p, where f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel contains one or more columns of elements of the second image matrix B_{p×n};
a convolution module for convolving the first feature map with the X convolution kernels to obtain a second feature map, where the second feature map corresponds to a third image matrix, and the third image matrix is the result of matrix-multiplying the first image matrix A_{m×p} by the second image matrix B_{p×n}.
A third aspect of the present application provides an electronic device comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as described above.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon executable code, which, when executed by a processor of an electronic device, causes the processor to perform the method as described above.
According to the technical scheme of the application, a first image matrix A_{m×p} and a second image matrix B_{p×n} can be acquired, a first feature map corresponding to the first image matrix A_{m×p} and X convolution kernels of size f×f×p corresponding to the second image matrix B_{p×n} can be determined, and the first feature map can then be convolved with the X convolution kernels to obtain a feature map corresponding to a third image matrix, where f is an integer greater than 1 and the third image matrix is obtained by matrix-multiplying the first image matrix by the second image matrix. That is to say, in this embodiment, a convolution operation may be used in place of the matrix multiplication operation to obtain the corresponding feature map, so the problem that the Transformer module cannot be deployed on a chip that only supports convolution calculation is solved, and the flexibility of the Transformer module is improved.
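As an illustrative restatement of this idea (an editor's sketch, not language from the original filing): each element of the matrix product can be written as

    C_{i,j} = \sum_{k=1}^{p} A_{i,k} \, B_{k,j}

and a convolution kernel that is zero everywhere except for the column B_{·,j} placed at a single spatial position computes exactly this sum wherever the p-channel vector under that position holds a row A_{i,·} of the first image matrix. This is the sense in which the convolution described below reproduces the matrix multiplication.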
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The foregoing and other objects, features and advantages of the application will be apparent from the following more particular descriptions of exemplary embodiments of the application as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the application.
Fig. 1 is a schematic flowchart of an image data processing method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a third feature map in an image data processing method according to an embodiment of the present application;
fig. 3 is a schematic diagram of a first feature map in an image data processing method according to an embodiment of the present application;
fig. 4 is another schematic diagram of a first feature map in an image data processing method according to an embodiment of the present application;
fig. 5 is another schematic diagram of a first feature map in an image data processing method according to an embodiment of the present application;
fig. 6 is a schematic diagram of X convolution kernels in an image data processing method according to an embodiment of the present application;
fig. 7 is another schematic diagram of X convolution kernels in the image data processing method according to the embodiment of the present application;
fig. 8 is a schematic structural diagram of an image data processing apparatus shown in an embodiment of the present application;
fig. 9 is another schematic structural diagram of an image data processing apparatus shown in an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device shown in an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While embodiments of the present application are illustrated in the accompanying drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms "first," "second," "third," etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
In view of the foregoing problems, embodiments of the present application provide an image data processing method, which can improve the flexibility of the Transformer module.
For ease of understanding, some terms referred to in the embodiments of the present application are described below.
Image matrix: digital image data can be represented by a matrix, so that the digital image can be analyzed and processed by adopting matrix theory and matrix algorithm. Since digital images can be represented in the form of a matrix, two-dimensional arrays are commonly used to store image data in computer digital image processing programs.
Convolution kernel: when a convolution kernel is used for image processing, each pixel of the output image is formed by a weighted average of the pixels in a small area of the given input image, where the weights are defined by a function; this function is called the convolution kernel.
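Purely as an illustration of this definition (not part of the claimed method), the following Python/NumPy sketch computes one output pixel as the weighted average of a 3 × 3 neighborhood; the toy image, the averaging kernel and the variable names are the editor's assumptions.

```python
import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)    # a toy 5 x 5 gray-scale image
kernel = np.full((3, 3), 1.0 / 9.0)                 # 3 x 3 averaging kernel (the weight function)

# Output pixel at (2, 2): weighted average of the 3 x 3 neighborhood centered there.
patch = image[1:4, 1:4]
out_pixel = float(np.sum(patch * kernel))
print(out_pixel)                                    # 12.0, the mean of that neighborhood
```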
Feature map (feature map): in the convolutional layers of a neural network, data exists in three dimensions and can be viewed as a stack of two-dimensional images, each of which is called a feature map. In the input layer, a gray-scale image has only one feature map, while a color image typically has 3 feature maps (red, green and blue). Multiple convolution kernels are arranged between layers, and the feature maps of the next layer are generated by convolving each feature map of the previous layer with each convolution kernel.
Data module: in the image field, an image is usually represented as a three-dimensional array of pixel values, in which the length represents the height of the image, the width represents the width of the image, and the depth represents the number of color channels of the image. In this application, a data module is a 1 × 1 × depth block of such an array, i.e. the channel vector at a single spatial position.
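For illustration only (the shapes and names below are the editor's assumptions, not values from the patent), these terms can be pictured as arrays of different ranks:

```python
import numpy as np

image_matrix = np.zeros((480, 640))        # 2-D image matrix: height x width
feature_maps = np.zeros((480, 640, 3))     # 3-D stack: height x width x channels (e.g. R, G, B)
data_module = feature_maps[0, 0, :]        # one 1 x 1 x C data module: the channel vector at a pixel

print(image_matrix.shape, feature_maps.shape, data_module.shape)   # (480, 640) (480, 640, 3) (3,)
```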
The image data processing method in this embodiment may be used for image processing in a neural network that includes a Transformer module, or for other neural networks that need to perform matrix multiplication, which is not limited in this embodiment.
It should be noted that the image data processing apparatus in this embodiment may include a Transformer module or another module that needs to perform matrix multiplication, which is not limited in this embodiment.
The technical solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating an image data processing method according to an embodiment of the present application.
Referring to fig. 1, the image data processing method in the present embodiment includes:
101. the image data processing device acquires a first image matrix and a second image matrix;
The image data processing apparatus acquires a first image matrix A_{m×p} and a second image matrix B_{p×n} that are to be matrix-multiplied. Either the first image matrix A_{m×p} or the second image matrix B_{p×n} may be an original image matrix or an image matrix obtained through a matrix operation, which is not limited in this embodiment.
102. The image data processing device determines a first feature map corresponding to the first image matrix;
After acquiring the first image matrix A_{m×p}, the image data processing apparatus generates the first feature map corresponding to the first image matrix A_{m×p} according to a first preset rule.
Specifically, the image data processing apparatus may generate the first feature map corresponding to the first image matrix A_{m×p} in the following manner:
S1, determining the m data modules corresponding to the m row vectors of the first image matrix A_{m×p};
for each row vector of the first image matrix A_{m×p}, the image data processing apparatus generates a corresponding data module with a length of 1, a width of 1 and a depth of p, whose p elements correspond one-to-one to the p elements of the row vector. Specifically, the image data processing apparatus may generate the data module corresponding to a row vector through a matrix transformation function, or in other manners, which is not limited in this embodiment.
S2, splicing the m data modules according to a preset splicing rule to obtain a third feature map;
after the image data processing apparatus generates the m data modules, it splices them according to the preset splicing rule to obtain a third feature map, whose length is a, width is b and depth is p, where a × b = m.
Specifically, the splicing rule includes a splicing order, which may be from top to bottom and from left to right: that is, a data modules are spliced from top to bottom to form the 1st column of the third feature map, and the 2nd, 3rd, ..., b-th columns are then arranged in the same top-to-bottom order to obtain the third feature map. The splicing order may also be from top to bottom and from right to left; from bottom to top and from left to right; from bottom to top and from right to left; or another splicing order, which is not limited in this embodiment.
Illustratively, the first image matrix is A_{4×3} with elements a_{i,j} (i = 1, ..., 4; j = 1, 2, 3), a is 2, b is 2, and the splicing order is from top to bottom and from left to right. The image data processing apparatus generates the 4 data modules M_1, M_2, M_3 and M_4 corresponding to the 4 row vectors [a_{1,1} a_{1,2} a_{1,3}], [a_{2,1} a_{2,2} a_{2,3}], [a_{3,1} a_{3,2} a_{3,3}], [a_{4,1} a_{4,2} a_{4,3}] of A_{4×3}; the apparatus then splices M_2 below M_1 in top-to-bottom order to obtain the 1st column, and splices M_3 beside M_1 and M_4 below M_3 to obtain the 2nd column, as shown in Fig. 2.
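Steps S1 and S2 can be sketched as follows in Python/PyTorch. This is only an illustrative reading of the example above, assuming the top-to-bottom, left-to-right splicing order; the variable names and the use of reshape/permute are the editor's choices, not the patent's reference implementation.

```python
import torch

# Illustrative sizes: A is m x p, tiled into an a x b spatial grid (a * b == m).
m, p = 4, 3
a, b = 2, 2
A = torch.arange(m * p, dtype=torch.float32).reshape(m, p)   # stand-in for A_{4x3}

# S1: each of the m row vectors becomes a 1 x 1 x p data module.
# S2: tile the modules top-to-bottom, then left-to-right, into an a x b grid with p channels.
third_feature_map = A.reshape(b, a, p).permute(1, 0, 2)       # shape (a, b, p)

# Row j*a + i of A now sits at spatial position (i, j), spread across the p channels.
assert torch.equal(third_feature_map[1, 0], A[1])   # M_2 sits below M_1 (column 1)
assert torch.equal(third_feature_map[0, 1], A[2])   # M_3 starts column 2
```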
S3, inserting zero data modules into the third feature map according to a preset zero padding rule to obtain the first feature map.
It should be understood that, in this embodiment, the length of each zero data module is 1, the width is 1, the depth is p, and the values of all elements in the zero data module are zero. The zero padding rule is related to the number of column vectors of the second image matrix B_{p×n} contained in each convolution kernel and to the position of each such column vector within the convolution kernel.
As one alternative, when each convolution kernel contains one column vector of the second image matrix B_{p×n}, the image data processing apparatus may insert zero data modules in the following manner: the image data processing apparatus inserts zero data modules at the upper edge, the lower edge, the left edge and the right edge of the third feature map respectively to obtain the first feature map.
Specifically, if each convolution kernel contains one column vector, f is an odd number, and the column vector is located at the center of the convolution kernel, the image data processing apparatus may insert (f-1)/2 rows of zero data modules at each of the upper edge and the lower edge of the third feature map, and (f-1)/2 columns of zero data modules at each of the left edge and the right edge, to obtain the first feature map; that is, the values of the first (f-1)/2 columns, the last (f-1)/2 columns, the first (f-1)/2 rows and the last (f-1)/2 rows of the 1st to p-th channels of the first feature map are all zero. Exemplarily, f is 3 and the first image matrix is the matrix A_{4×3} above; the third feature map obtained by the image data processing apparatus is shown in Fig. 2, and the apparatus then inserts 1 row of zero data modules at each of the upper edge and the lower edge of the third feature map and 1 column of zero data modules at each of the left edge and the right edge to obtain the first feature map, as shown in Fig. 3.
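For the centered-column case with f = 3, step S3 therefore amounts to adding (f - 1)/2 = 1 row or column of zeros on every side of the third feature map. A minimal sketch, assuming Python/PyTorch with a channels-first layout (names and shapes are illustrative):

```python
import torch
import torch.nn.functional as F

a, b, p, f = 2, 2, 3, 3
third_feature_map = torch.randn(1, p, a, b)        # (batch, channels, height, width)

pad = (f - 1) // 2                                  # 1 when f == 3
first_feature_map = F.pad(third_feature_map, (pad, pad, pad, pad))   # left, right, top, bottom

print(first_feature_map.shape)                      # torch.Size([1, 3, 4, 4])
# For an off-center column vector (Fig. 4), the four pad amounts simply become unequal.
```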
If each convolution kernel contains one column vector, f is an odd number, and the column vector is not located at the center of the convolution kernel, the image data processing apparatus may insert L_1 rows of zero data modules at the upper edge of the third feature map, L_2 rows of zero data modules at the lower edge, L_3 columns of zero data modules at the left edge and L_4 columns of zero data modules at the right edge to obtain the first feature map, where L_1, L_2, L_3 and L_4 are related to the position of the column vector in the convolution kernel. For example, if f is 3 and the column vector is located at the 1st row and 1st column of the convolution kernel, the image data processing apparatus inserts 1 row of zero data modules at the upper edge, 2 rows at the lower edge, 1 column at the left edge and 2 columns at the right edge, as shown in Fig. 4.
As another alternative, when each convolution kernel contains a plurality of column vectors of the second image matrix B_{p×n}, the image data processing apparatus may insert zero data modules in the following manner: the image data processing apparatus inserts zero data modules at target positions of the third feature map to obtain the first feature map, where the target positions include internal positions and edge positions and are related to the corresponding positions of the plurality of column vectors in the convolution kernel.
Illustratively, each convolution kernel contains two column vectors W_1 and W_2, with W_1 located at the 2nd row and 2nd column of the convolution kernel and W_2 at the 2nd row and 3rd column. The image data processing apparatus inserts 1 column of zero data modules between adjacent columns of the third feature map, 1 row of zero data modules at each of the upper and lower edges, and 1 column of zero data modules at each of the left and right edges, to obtain the first feature map, as shown in Fig. 5.
It should be understood that the zero padding rule is related to the number of column vectors included in the convolution kernel and the corresponding position of each column vector in the convolution kernel, and may be specifically set by a user according to the set number of column vectors and the corresponding position of the column vectors in the convolution kernel, or may be set by other means, which is not limited in this embodiment.
103. The image data processing device determines X convolution kernels corresponding to the second image matrix;
the image data processing device acquires a second image matrix Bp×nThen, a second image matrix B is generated according to a second preset rulep×nCorresponding X convolution kernels of size f X p, each convolution kernel containing a second image matrix Bp×nI.e. each convolution kernel contains said second image matrix Bp×nWherein f is an integer greater than 1 and X is an integer greater than or equal to 1.
Specifically, the image data processing apparatus may determine the second image matrix B in the following mannerp×nThe corresponding X convolution kernels: the image data processing apparatus determines values of n groups of target elements in X convolution kernels, and determines values of 0 in the X convolution kernels and other elements except the n groups of target elements, wherein each group of target elements corresponds to the second image matrix Bp×nA column vector of (a), the values of the ith group of target elements and a second image matrix Bp×nI is an integer from 1 to n, and each convolution kernel contains one or more sets of target elements from the n sets of target elements.
It should be noted that the size of the convolution kernel is preset, that is, the value of f is a preset value. The system may also set the number of column vectors each convolution kernel contains, i.e. each convolution kernel contains the second image matrix Bp×nHow many column elements in. Further, the system may also set each convolution kernelThe corresponding position of the included column vector in the convolution kernel.
In some embodiments, the system sets each convolution kernel to contain one column vector, so X = n and each convolution kernel contains one group of target elements. Specifically, for each convolution kernel, the target elements Z_1, Z_2, ..., Z_p contained in the convolution kernel are respectively located at the j-th row and the k-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel. More specifically, if the system sets the column vector contained in the convolution kernel to be located at the center of the convolution kernel and f is odd, the target elements Z_1, Z_2, ..., Z_p are respectively located at the ((f+1)/2)-th row and the ((f+1)/2)-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel. Exemplarily, f is 3 and the second image matrix is B_{3×3} with elements b_{i,j}. The image data processing apparatus determines the 3 groups of target elements b_{1,1}, b_{2,1}, b_{3,1}; b_{1,2}, b_{2,2}, b_{3,2}; b_{1,3}, b_{2,3}, b_{3,3} in the 3 convolution kernels corresponding to B_{3×3}, and determines the values of all elements other than the target elements in each convolution kernel to be 0. Each convolution kernel contains 1 group of target elements, and the target elements contained in each convolution kernel are located at the 2nd row and the 2nd column of the 1st, 2nd and 3rd channels of that convolution kernel, as shown in Fig. 6.
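The kernel construction for this single-column case (X = n kernels, each all-zero except for one column of B at its center) can be sketched as follows, assuming Python/PyTorch; W and the (out_channels, in_channels, f, f) layout follow PyTorch's conv2d convention and are the editor's illustration, not the patent's notation for an implementation.

```python
import torch

p, n, f = 3, 3, 3
B = torch.arange(p * n, dtype=torch.float32).reshape(p, n)    # stand-in for B_{3x3}

# X = n kernels of size f x f x p, all zeros except one column of B at the center.
W = torch.zeros(n, p, f, f)                   # (out_channels, in_channels, f, f)
W[:, :, f // 2, f // 2] = B.t()               # kernel i holds column i of B at row 2, column 2

print(W[0, :, 1, 1])                          # equals B[:, 0], i.e. b_{1,1}, b_{2,1}, b_{3,1}
```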
In some embodiments, the system presets each convolution kernel to contain u column vectors, u being an integer greater than 1 and less than or equal to f.
If n is exactly divisible by u, then X = n/u and each convolution kernel contains u groups of target elements. Specifically, for each of the X convolution kernels, the v-th group of target elements contained in the convolution kernel is located at the j_v-th row and the k_v-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel, where v is an integer from 1 to u;
if n is not exactly divisible by u, X is the smallest integer greater than n/u, the 1st to (X-1)-th convolution kernels each contain u groups of target elements, and the X-th convolution kernel contains n-u(X-1) groups of target elements. For the 1st to (X-1)-th convolution kernels, the v-th of the u groups of target elements is located at the j_v-th row and the k_v-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel; for the X-th convolution kernel, the r-th of its n-u(X-1) groups of target elements is located at the j_r-th row and the k_r-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel, where r is an integer from 1 to n-u(X-1).
Illustratively, each convolution kernel contains 2 column vectors W_1 and W_2, with W_1 located at the 2nd row and 2nd column of the convolution kernel and W_2 at the 2nd row and 3rd column.
The second image matrix is B_{3×3} with elements b_{i,j}. The image data processing apparatus determines the values of the 3 groups of target elements b_{1,1}, b_{2,1}, b_{3,1}; b_{1,2}, b_{2,2}, b_{3,2}; b_{1,3}, b_{2,3}, b_{3,3} corresponding to B_{3×3}, and determines the values of all other elements in each convolution kernel to be 0. The 1st convolution kernel contains the first column elements b_{1,1}, b_{2,1}, b_{3,1} and the second column elements b_{1,2}, b_{2,2}, b_{3,2} of B_{3×3}, and the 2nd convolution kernel contains the third column elements b_{1,3}, b_{2,3}, b_{3,3} of B_{3×3}, as shown in Fig. 7.
104. The image data processing device convolves the first feature map with X convolution kernels to obtain a second feature map.
After generating the first feature map and the X convolution kernels, the image data processing apparatus convolves the first feature map with the X convolution kernels to obtain a second feature map. The second feature map corresponds to a third image matrix C_{m×n}, which is the result of matrix-multiplying the first image matrix A_{m×p} by the second image matrix B_{p×n}.
In some embodiments, f is an odd number, each convolution kernel contains one column vector of the second image matrix, and the column vector is located at the center of the convolution kernel; the image data processing apparatus then convolves the first feature map with the n convolution kernels to obtain a second feature map of length a, width b and depth n. Each 1 × 1 × n data module in the depth direction of the second feature map corresponds to one row vector of the third image matrix C_{m×n}, and the arrangement of these data modules in the second feature map follows the splicing order of the third feature map; that is, the second feature map maps to C_{m×n} in the same way that the third feature map maps to A_{m×p}.
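Putting the pieces together for this simple case, the following end-to-end sketch (an illustration in Python/PyTorch under the editor's assumptions, not the patent's implementation) checks numerically that the convolution reproduces the matrix product A·B.

```python
import torch
import torch.nn.functional as F

m, p, n = 4, 3, 3
a, b, f = 2, 2, 3                                    # a * b == m, f odd

A = torch.randn(m, p)
B = torch.randn(p, n)

# First feature map: rows of A tiled top-to-bottom, left-to-right into an a x b grid,
# laid out channels-first as (1, p, a, b); zero padding is applied by conv2d below.
feat = A.reshape(b, a, p).permute(2, 1, 0).unsqueeze(0)

# X = n kernels, zero everywhere except column i of B at the kernel center.
W = torch.zeros(n, p, f, f)
W[:, :, f // 2, f // 2] = B.t()

out = F.conv2d(feat, W, padding=(f - 1) // 2)        # second feature map, shape (1, n, a, b)

# Reading the a x b grid back in the same splicing order recovers A @ B.
C_conv = out[0].permute(2, 1, 0).reshape(m, n)
assert torch.allclose(C_conv, A @ B, atol=1e-5)
```

The essential design point is that the tiling order used to build the first feature map and the order used to read the output grid back into C_{m×n} must match; any consistent splicing order works, as described above.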
In some embodiments, each convolution kernel contains a plurality of column vectors of the second image matrix. After convolving the first feature map with the X convolution kernels to obtain the second feature map, the image data processing apparatus rearranges the data modules in the second feature map according to a preset rearrangement rule to obtain a fourth feature map of length a, width b and depth n. Each 1 × 1 × n data module in the depth direction of the fourth feature map corresponds to one row vector of the third image matrix C_{m×n}, and the arrangement of these data modules in the fourth feature map corresponds to the splicing order of the third feature map.
In some embodiments, each convolution kernel contains one column vector of the second image matrix and the column vector is not located at the center of the convolution kernel. After convolving the first feature map with the n convolution kernels to obtain the second feature map, the image data processing apparatus deletes the zero data modules at preset positions to obtain a fifth feature map of length a, width b and depth n. Each 1 × 1 × n data module in the depth direction of the fifth feature map corresponds to one row vector of the third image matrix C_{m×n}, and the arrangement of these data modules in the fifth feature map corresponds to the splicing order of the third feature map.
According to the technical scheme described above, a first image matrix A_{m×p} and a second image matrix B_{p×n} can be acquired, a first feature map corresponding to the first image matrix A_{m×p} and X convolution kernels of size f×f×p corresponding to the second image matrix B_{p×n} can be determined, and the first feature map can then be convolved with the X convolution kernels to obtain a feature map corresponding to a third image matrix, where f is an integer greater than 1 and the third image matrix is obtained by matrix-multiplying the first image matrix by the second image matrix. That is to say, in this embodiment, a convolution operation may be used in place of the matrix multiplication operation to obtain the corresponding feature map, so the problem that the Transformer module cannot be deployed on a chip that only supports convolution calculation is solved, and the flexibility of the Transformer module is improved.
In addition, the embodiments of the present application provide multiple ways of determining the first feature map and multiple ways of determining the convolution kernels, which further improves the flexibility of the scheme.
Corresponding to the above method embodiments, the present application also provides an image data processing apparatus, an electronic device, and corresponding embodiments.
Fig. 8 is a schematic structural diagram of an image data processing apparatus according to an embodiment of the present application.
Referring to fig. 8, an image data processing apparatus 800 in the present embodiment includes:
an acquisition module 801 for acquiring a first image matrix am×pAnd a second image matrix Bp×n
A first determining module 802 for determining a first image matrix Am×pA corresponding first characteristic diagram;
a second determining module 803 for determining a second image matrix Bp×nA corresponding number X of convolution kernels, each convolution kernel having a size of f X p, where f is an integer greater than 1 and X is an integer greater than or equal to 1, each convolution kernel containing a second image matrix Bp×nOne or more columns of elements;
a convolution module 804 for transforming the first featureConvolving the image with X convolution kernels to obtain a second characteristic image, wherein the second characteristic image corresponds to a third image matrix, and the third image matrix is a first image matrix Am×pAnd a second image matrix Bp×nAnd (5) carrying out matrix multiplication to obtain a result.
According to the technical scheme of the application, the obtaining module 801 can obtain the first image matrix Am×pAnd a second image matrix Bp×nThe first determination module 802 may determine the first image matrix Am×pThe corresponding first feature map, the second determining module 803 may determine the second image matrix Bp×nCorresponding X convolution kernels with the size of f × f × p, and then the convolution module 804 may perform convolution multiplication on the first feature map and the X convolution kernels to obtain a feature map corresponding to a third image matrix, where f is an integer greater than 1, and the third image matrix is obtained by performing matrix multiplication on the first image matrix and the second image matrix. That is to say, in this embodiment, convolution operation may be used to replace matrix multiplication operation to obtain a corresponding characteristic diagram, so that the problem that the transform module cannot be deployed on a chip that only supports convolution calculation is solved, and the flexibility of the transform module is improved.
For ease of understanding, the image data processing apparatus of the present application is described in detail below. Referring to fig. 9, the image data processing apparatus 900 in this embodiment includes:
an obtaining module 901 for obtaining a first image matrix am×pAnd a second image matrix Bp×n
A first determining module 902 for determining a first image matrix Am×pA corresponding first characteristic diagram;
a second determining module 903 for determining a second image matrix Bp×nA corresponding number X of convolution kernels, each convolution kernel having a size of f X p, where f is an integer greater than 1 and X is an integer greater than or equal to 1, each convolution kernel containing a second image matrix Bp×nOne or more columns of elements;
a convolution module 904, configured to convolve the first feature map with the X convolution kernels to obtain a second feature map, where the second feature map corresponds to a third image matrix, and the third feature map is a third feature mapThe image matrix is a first image matrix Am×pAnd a second image matrix Bp×nPerforming matrix multiplication to obtain a result;
wherein the second determining module 903 comprises:
a first determining unit 9031, configured to determine values of n groups of target elements in the X convolution kernels, where values of an ith group of target elements in the n groups of target elements correspond to values of an ith column of elements in the second image matrix, i is an integer from 1 to n, and each convolution kernel includes one or more groups of target elements in the n groups of target elements;
a second determining unit 9032, configured to determine that values of elements other than the n groups of target elements in the X convolution kernels are 0;
optionally, each convolution kernel contains a set of target elements Z of the n sets of target elements1,Z2,...,ZpFor each convolution kernel, the target element Z contained in the convolution kernel1,Z2,...,ZpThe jth row and the kth column of the 1 st, 2.,. th and p channels of the convolution kernel respectively;
the first determining module 902 includes:
a third determination unit 9021 for determining m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, wherein the length of each data module is 1, the width is 1, and the depth is p;
a first splicing unit 9022, configured to splice the m data modules according to a preset splicing rule to obtain a third feature map, where the length of the third feature map is a, the width is b, the depth is p, and a × b is equal to m;
a first zero padding unit 9023, configured to insert zero data modules at the upper edge, the lower edge, the left edge and the right edge of the third feature map respectively to obtain the first feature map, where the length of each zero data module is 1, the width is 1, the depth is p, and the values of all elements in each zero data module are zero.
Optionally, each convolution kernel contains u groups of target elements of the n groups of target elements, u being an integer greater than 1 and less than or equal to f, and for each convolution kernel, the p target elements of the v-th group of target elements contained in the convolution kernel are respectively located at the j_v-th row and the k_v-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel, where v is an integer from 1 to u;
the first determining module 902 includes:
a fourth determination unit 9024 for determining m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, wherein the length of each data module is 1, the width is 1, and the depth is p;
a second splicing unit 9025, configured to splice the m data modules according to a preset splicing rule to obtain a third feature map, where the length of the third feature map is a, the width is b, the depth is p, and a × b is equal to m;
a second zero padding unit 9026, configured to insert zero data modules at target positions of the third feature map to obtain the first feature map, where the target positions include internal positions and edge positions and are related to the positions of the multiple groups of target elements in the convolution kernel, the length of each zero data module is 1, the width is 1, the depth is n, and the values of all elements in each zero data module are zero.
According to the technical scheme of the present application, the obtaining module 901 can obtain a first image matrix A_{m×p} and a second image matrix B_{p×n}, the first determining module 902 can determine a first feature map corresponding to the first image matrix A_{m×p}, the second determining module 903 can determine X convolution kernels of size f×f×p corresponding to the second image matrix B_{p×n}, and the convolution module 904 can then convolve the first feature map with the X convolution kernels to obtain a feature map corresponding to a third image matrix, where f is an integer greater than 1 and the third image matrix is obtained by matrix-multiplying the first image matrix by the second image matrix. That is to say, in this embodiment, a convolution operation may be used in place of the matrix multiplication operation to obtain the corresponding feature map, so the problem that the Transformer module cannot be deployed on a chip that only supports convolution calculation is solved, and the flexibility of the Transformer module is improved.
In addition, the embodiments of the present application provide multiple ways of determining the first feature map and multiple ways of determining the convolution kernels, which further improves the flexibility of the scheme.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 10 is a schematic structural diagram of an electronic device shown in an embodiment of the present application.
Referring to fig. 10, the electronic device 1000 includes a memory 1010 and a processor 1020.
The Processor 1020 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 1010 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage. The ROM may store static data or instructions that are needed by the processor 1020 or other modules of the computer. The persistent storage device may be a read-write storage device, and may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, a mass storage device (e.g., a magnetic or optical disk, or flash memory) is employed as the persistent storage device. In other embodiments, the permanent storage may be a removable storage device (e.g., a floppy disk or an optical drive). The system memory may be a read-write memory device or a volatile read-write memory device, such as a dynamic random access memory. The system memory may store instructions and data that some or all of the processors require at runtime. Further, the memory 1010 may comprise any combination of computer-readable storage media, including various types of semiconductor memory chips (e.g., DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), magnetic disks and/or optical disks. In some embodiments, the memory 1010 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density disc, a flash memory card (e.g., an SD card, a mini SD card, a Micro-SD card, etc.), a magnetic floppy disk, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted by wireless or wired means.
The memory 1010 has stored thereon executable code that, when processed by the processor 1020, may cause the processor 1020 to perform some or all of the methods described above.
Furthermore, the method according to the present application may also be implemented as a computer program or computer program product comprising computer program code instructions for performing some or all of the steps of the above-described method of the present application.
Alternatively, the present application may also be embodied as a computer-readable storage medium (or non-transitory machine-readable storage medium or machine-readable storage medium) having executable code (or a computer program or computer instruction code) stored thereon, which, when executed by a processor of an electronic device (or server, etc.), causes the processor to perform part or all of the various steps of the above-described method according to the present application.
Having described embodiments of the present application, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. An image data processing method characterized by comprising:
obtaining a first image matrix A_{m×p} and a second image matrix B_{p×n};
determining a first feature map corresponding to the first image matrix A_{m×p};
determining X convolution kernels corresponding to the second image matrix B_{p×n}, each convolution kernel having a size of f×f×p, wherein f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel contains one or more columns of elements of the second image matrix B_{p×n};
convolving the first feature map with the X convolution kernels to obtain a second feature map, wherein the second feature map corresponds to a third image matrix, and the third image matrix is the result of matrix-multiplying the first image matrix A_{m×p} by the second image matrix B_{p×n}.
2. The method according to claim 1, wherein the determining the X convolution kernels corresponding to the second image matrix B_{p×n} comprises:
determining values of n groups of target elements in the X convolution kernels, wherein the value of the ith group of target elements in the n groups of target elements corresponds to the value of the ith column of elements in the second image matrix, i is an integer from 1 to n, and each convolution kernel comprises one or more groups of target elements in the n groups of target elements;
determining values of elements other than the n sets of target elements in the X convolution kernels to be 0.
3. The image data processing method of claim 2, wherein each convolution kernel contains one group of target elements Z_1, Z_2, ..., Z_p of the n groups of target elements, and for each convolution kernel, the target elements Z_1, Z_2, ..., Z_p contained in the convolution kernel are respectively located at the j-th row and the k-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel;
the determining the first feature map corresponding to the first image matrix A_{m×p} comprises:
determining m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, wherein the length of each data module is 1, the width is 1, and the depth is p;
splicing the m data modules according to a preset splicing rule to obtain a third feature map, wherein the length of the third feature map is a, the width is b, the depth is p, and a × b = m;
inserting zero data modules at the upper edge, the lower edge, the left edge and the right edge of the third feature map respectively to obtain the first feature map, wherein the length of each zero data module is 1, the width is 1, the depth is p, and the values of all elements in each zero data module are zero.
4. The image data processing method according to claim 2, wherein each convolution kernel contains u groups of target elements of the n groups of target elements, u being an integer greater than 1 and less than or equal to f, and for each convolution kernel, the p target elements of the v-th group of target elements contained in the convolution kernel are respectively located at the j_v-th row and the k_v-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel, where v is an integer from 1 to u;
the determining the first feature map corresponding to the first image matrix A_{m×p} comprises:
determining m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, wherein the length of each data module is 1, the width is 1, and the depth is p;
splicing the m data modules according to a preset splicing rule to obtain a third feature map, wherein the length of the third feature map is a, the width is b, the depth is p, and a × b = m;
inserting zero data modules at target positions of the third feature map to obtain the first feature map, wherein the target positions comprise internal positions and edge positions and are related to the corresponding positions of the plurality of groups of target elements in the convolution kernel, the length of each zero data module is 1, the width is 1, the depth is n, and the values of all elements in each zero data module are zero.
5. An image data processing apparatus, comprising:
an acquisition module for acquiring a first image matrix A_{m×p} and a second image matrix B_{p×n};
a first determination module for determining a first feature map corresponding to the first image matrix A_{m×p};
a second determination module for determining X convolution kernels corresponding to the second image matrix B_{p×n}, each convolution kernel having a size of f×f×p, where f is an integer greater than 1, X is an integer greater than or equal to 1, and each convolution kernel contains one or more columns of elements of the second image matrix B_{p×n};
a convolution module for convolving the first feature map with the X convolution kernels to obtain a second feature map, where the second feature map corresponds to a third image matrix, and the third image matrix is the result of matrix-multiplying the first image matrix A_{m×p} by the second image matrix B_{p×n}.
6. The image data processing apparatus according to claim 5, wherein the second determination module includes:
a first determining unit, configured to determine values of n groups of target elements in the X convolution kernels, where a value of an ith group of target elements in the n groups of target elements corresponds to a value of an ith column of elements in a second image matrix, i is an integer from 1 to n, and each convolution kernel includes one or more groups of target elements in the n groups of target elements;
a second determining unit configured to determine that values of elements other than the n groups of target elements in the X convolution kernels are 0.
7. The image data processing apparatus of claim 6, wherein each convolution kernel contains one group of target elements Z_1, Z_2, ..., Z_p of the n groups of target elements, and for each convolution kernel, the target elements Z_1, Z_2, ..., Z_p contained in the convolution kernel are respectively located at the j-th row and the k-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel;
the first determination module includes:
a third determination unit for determining m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, wherein the length of each data module is 1, the width is 1, and the depth is p;
a first splicing unit for splicing the m data modules according to a preset splicing rule to obtain a third feature map, wherein the length of the third feature map is a, the width is b, the depth is p, and a × b = m;
a first zero padding unit for inserting zero data modules at the upper edge, the lower edge, the left edge and the right edge of the third feature map respectively to obtain the first feature map, wherein the length of each zero data module is 1, the width is 1, the depth is p, and the values of all elements in each zero data module are zero.
8. The image data processing apparatus according to claim 6, wherein each convolution kernel contains u groups of target elements of the n groups of target elements, u being an integer greater than 1 and less than or equal to f, and for each convolution kernel, the p target elements of the v-th group of target elements contained in the convolution kernel are respectively located at the j_v-th row and the k_v-th column of the 1st, 2nd, ..., p-th channels of the convolution kernel, where v is an integer from 1 to u;
the first determination module includes:
a fourth determination unit for determining m data modules corresponding to the m row vectors of the first image matrix A_{m×p}, wherein the length of each data module is 1, the width is 1, and the depth is p;
a second splicing unit for splicing the m data modules according to a preset splicing rule to obtain a third feature map, wherein the length of the third feature map is a, the width is b, the depth is p, and a × b = m;
a second zero padding unit for inserting zero data modules at target positions of the third feature map to obtain the first feature map, wherein the target positions include internal positions and edge positions and are related to the positions of the plurality of groups of target elements in the convolution kernel, the length of each zero data module is 1, the width is 1, the depth is n, and the values of all elements in each zero data module are zero.
9. An electronic device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any one of claims 1-4.
10. A computer-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any one of claims 1-4.
CN202111477627.2A 2021-12-06 2021-12-06 Image data processing method and device Pending CN114283314A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111477627.2A CN114283314A (en) 2021-12-06 2021-12-06 Image data processing method and device
PCT/CN2022/122544 WO2023103551A1 (en) 2021-12-06 2022-09-29 Image data processing method and apparatus, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111477627.2A CN114283314A (en) 2021-12-06 2021-12-06 Image data processing method and device

Publications (1)

Publication Number Publication Date
CN114283314A true CN114283314A (en) 2022-04-05

Family

ID=80871122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111477627.2A Pending CN114283314A (en) 2021-12-06 2021-12-06 Image data processing method and device

Country Status (2)

Country Link
CN (1) CN114283314A (en)
WO (1) WO2023103551A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023103551A1 (en) * 2021-12-06 2023-06-15 广州小鹏自动驾驶科技有限公司 Image data processing method and apparatus, device, and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020678A (en) * 2019-03-25 2019-07-16 联想(北京)有限公司 A kind of data processing method, electronic equipment and computer storage medium
CN110580324B (en) * 2019-07-23 2020-11-17 珠海格力电器股份有限公司 Image matrix operation method and device, computer equipment and storage medium
CN111932437B (en) * 2020-10-10 2021-03-05 深圳云天励飞技术股份有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN114283314A (en) * 2021-12-06 2022-04-05 广州小鹏自动驾驶科技有限公司 Image data processing method and device

Also Published As

Publication number Publication date
WO2023103551A1 (en) 2023-06-15

Similar Documents

Publication Publication Date Title
Nagy et al. Restoring images degraded by spatially variant blur
JP6431245B1 (en) Edge recognition bidirectional image processing
KR20190051697A (en) Method and apparatus for performing devonvolution operation in neural network
CN110021047A (en) Image processing method, image processing apparatus and storage medium
US10417749B2 (en) Method and system for edge denoising of a digital image
US20120281872A1 (en) Detecting an interest point in an image using edges
CN112613575B (en) Data set expansion method, training method and device of image classification model
US11645734B2 (en) Circuitry for image demosaicing and contrast enhancement and image-processing method
CN110223222A (en) Image split-joint method, image splicing device and computer readable storage medium
CN103390275B (en) The method of dynamical image joining
CN106504196A (en) A kind of panoramic video joining method and equipment based on space sphere
CN114283314A (en) Image data processing method and device
US20140140608A1 (en) Image processing apparatus and method for color-depth demosaicing
CN114170582A (en) Guideboard angular point identification method, device, equipment and storage medium
CN112001451A (en) Data redundancy processing method, system, medium and device
CN117252884B (en) Tea bud and leaf target segmentation method based on self-attention mechanism
Hallek et al. Real-time stereo matching on CUDA using Fourier descriptors and dynamic programming
CN114241446A (en) Method, device and equipment for marking corner points of guideboard and storage medium
US7499599B2 (en) Method of real-time correction of non-functioning pixels in digital radiography
US20110170774A1 (en) Image manipulating system and method
CN113393368A (en) Image processing method, medium, and electronic device based on neural network model
US20240185570A1 (en) Undecimated image processing method and device
WO2018171899A1 (en) Neural network data processing apparatus and method
US20210201132A1 (en) Neural network method and apparatus
CN117830596A (en) Image preprocessing method, device, electronic equipment and readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination