CN112580675A - Image processing method and device, and computer readable storage medium


Info

Publication number
CN112580675A
Authority
CN
China
Prior art keywords: convolution, feature map, coordinates, deformable, image processing
Legal status: Pending
Application number: CN201910935478.6A
Original language: Chinese (zh)
Inventors: 陆维娜, 谭洪贺
Assignee (current and original): Beijing Horizon Robotics Technology Research and Development Co Ltd
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority application: CN201910935478.6A
Publication: CN112580675A

Classifications

    • G06F18/241 Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045 Neural networks; combinations of networks
    • G06T3/4007 Image scaling; interpolation-based scaling, e.g. bilinear interpolation
    • G06T3/4046 Image scaling using neural networks

Abstract

An image processing method and apparatus, and a computer-readable storage medium are disclosed. In an embodiment of the present disclosure, an image processing apparatus may include a warping transformation module, a convolution calculation module, a memory, and a controller. The memory may be configured to store the offset parameters and weight parameters of a deformable convolution, and the controller may be configured to control the warping transformation module to perform warping transformation processing on an input image using the offset parameters of the deformable convolution, and to control the convolution calculation module to perform a convolution operation on the output result of the warping transformation module using the weight parameters, so as to complete the deformable convolution operation on the input image. The above embodiment of the present disclosure can complete the deformable convolution operation efficiently and flexibly; it not only handles images with spatial deformation well, but also achieves high processing efficiency.

Description

Image processing method and device, and computer readable storage medium
Technical Field
The present disclosure relates to the field of neural network technologies, and in particular, to an image processing method and apparatus, and a computer-readable storage medium.
Background
Compared with traditional convolution, deformable convolution (DCN) effectively improves the modeling of geometric transformations, performs well on image recognition with spatial deformation, and is commonly used in scenarios such as face detection and automatic driving. However, image processing based on deformable convolution involves a complex neural network model with a large amount of computation, and its processing efficiency is very low when the deformable convolution is implemented on a general-purpose hardware architecture, which cannot meet the requirements of practical applications.
Disclosure of Invention
In order to solve the above technical problems, it is desirable to provide an image processing method and apparatus, and a computer-readable storage medium, which can complete the deformable convolution operation flexibly and efficiently, and which not only process images with spatial deformation well but also achieve high processing efficiency.
According to an aspect of the present disclosure, there is provided an image processing apparatus including:
a warp transform module;
a convolution calculation module;
a memory configured to store offset parameters and weight parameters of the deformable convolution;
a controller configured to control the warping transformation module to perform warping transformation processing on an input image using the offset parameters of the deformable convolution, and to control the convolution calculation module to perform a convolution operation on the output result of the warping transformation module using the weight parameters, so as to complete the deformable convolution operation on the input image.
According to an aspect of the present disclosure, there is provided an image processing method including: correcting the input image according to the offset parameter of the deformable convolution to obtain a first feature map; performing a convolution operation on the first feature map using the weight parameters of the deformable convolution to obtain a second feature map; and storing the second feature map.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the image processing method described above.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 is a basic framework diagram of a deformable convolution.
FIG. 2 is a schematic diagram of a standard convolution operation process.
Fig. 3 is a schematic diagram of a process of deformable convolution operation according to an exemplary embodiment of the present disclosure.
Fig. 4 is a schematic diagram of a deformable convolution operation process according to another exemplary embodiment of the present disclosure.
Fig. 5 is an exemplary block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure.
Fig. 6 is an exemplary block diagram of a computing engine in an image processing apparatus according to an exemplary embodiment of the present disclosure.
Fig. 7 is an exemplary block diagram of a warp transform module provided in an exemplary embodiment of the present disclosure.
Fig. 8 is an exemplary block diagram of a warp transform unit provided in an exemplary embodiment of the present disclosure.
Fig. 9 is an exemplary circuit configuration diagram of a warp transform unit according to an exemplary embodiment of the present disclosure.
Fig. 10 is a flowchart illustrating an image processing method according to an exemplary embodiment of the disclosure.
Fig. 11 is a flowchart illustrating an image processing method according to another exemplary embodiment of the present disclosure.
Fig. 12 is a schematic diagram of a process for performing a deformable convolution according to an exemplary embodiment of the present disclosure.
Fig. 13 is a schematic diagram of a process for performing a deformable convolution according to another exemplary embodiment of the present disclosure.
Fig. 14 is a schematic diagram of a warping transformation processing procedure in step S110 in the image processing method according to an exemplary embodiment of the present disclosure.
Fig. 15 is a schematic diagram of an inverse process of the warping conversion processing in step S110 in the image processing method according to an exemplary embodiment of the present disclosure.
Fig. 16 is a schematic diagram of the rearrangement of the weight parameters in step S120 in the image processing method according to an exemplary embodiment of the disclosure.
Fig. 17 is a schematic diagram of data rearrangement in the first feature map in step S120 in the image processing method according to an exemplary embodiment of the disclosure.
Fig. 18 is a schematic diagram of modulation parameter rearrangement in step S120 in the image processing method according to an exemplary embodiment of the present disclosure.
Fig. 19 is a schematic diagram of processing procedures from step S121 to step 122 in an image processing method provided in an exemplary embodiment of the present disclosure.
Fig. 20 is a schematic diagram of processing procedures from step S121 to step 122 in an image processing method according to another exemplary embodiment of the present disclosure.
Detailed Description
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely some, and not all, of the embodiments of the present disclosure, and that the present disclosure is not limited to the example embodiments described herein.
Summary of the application
As mentioned above, compared with traditional convolution, deformable convolution effectively improves the modeling of geometric transformations and performs well on image recognition tasks with spatial deformation. However, the two-dimensional (2D) offsets introduced by the deformable convolution make the region acted on by the convolution operation uncertain, and the input data coordinates become floating-point numbers whose values must be obtained through pre-calculation. As a result, the input data is difficult to reuse and the input feature map cannot be split, which greatly reduces the parallelism available to hardware when performing the convolution operation, leading to very low execution efficiency and high power consumption for the deformable convolution.
In view of the above technical problem, a basic idea of an embodiment of the present disclosure is to provide an image processing apparatus that may include a warping transformation module, a convolution calculation module, a memory, and a controller. The controller may be configured to control the warping transformation module to perform warping transformation processing on an input image using the offset parameters of the deformable convolution, and to control the convolution calculation module to perform a convolution operation on the output result of the warping transformation module using the weight parameters, so as to complete the deformable convolution operation on the input image. By implementing a general convolution module and a highly parallel warping transformation module in hardware, the embodiment of the disclosure not only improves computational performance and efficiency, but also supports reuse of input data, splitting of the input feature map, and parallelism in the channel dimension.
A basic idea of the disclosed embodiments further includes providing an image processing method that first corrects an input image using the offset parameters of a deformable convolution to obtain a first feature map, and then performs a convolution operation on the first feature map using the weight parameters of the deformable convolution to obtain a second feature map, which is the result of the deformable convolution operation. By splitting the deformable convolution into a warping transformation and an ordinary convolution, the input data can be reused during the convolution operation, the input feature map can be split, and the operation can be parallelized along the channel dimension, thereby improving computational flexibility.
The disclosed embodiments may be applied in any suitable application scenario. In at least some embodiments, they are applicable to scenes that require recognizing targets with some geometric deformation (e.g., faces, pedestrians, vehicles, text, animals). The embodiments are particularly suitable for situations where image distortion or object deformation is likely; for example, in an automatic driving application, the images captured by a vehicle's camera are prone to distortion and deformation while driving, and at the same time automatic driving places high accuracy requirements on the image processing results (e.g., obstacle detection and positioning).
Exemplary application scenarios
Figure 1 shows the basic framework of deformable convolution. In the example of fig. 1, the white-filled squares in the input feature map represent the sampling points of an ordinary convolution, and the gray-bordered squares represent the sampling points of the deformable convolution. The process of deformable convolution is divided into two paths: one path learns the offset vectors, producing offset parameters of size (H × N) × (W × N), in which each small square represents an offset vector, while the other path performs the ordinary convolution operation. Here N = |R| is the number of pixels in the receptive field R, and 2N reflects that each pixel in R is offset in two directions (width and height). With the offset vectors, the convolution window is no longer a regular sliding window (e.g., the area of the white-filled boxes in fig. 1) but a translated window (e.g., the area of the gray borders in fig. 1).
The deformable convolution can be learned directly from the target task without any additional supervision signal. It can conveniently replace the standard convolution units in any convolutional neural network for an existing visual recognition task and be trained end to end with standard back-propagation; it is a simple yet far-reaching structural innovation over the traditional convolutional network, with significant academic and practical value. It is suitable for any task in which the target to be recognized exhibits some geometric deformation (which covers almost all important visual recognition tasks: faces, pedestrians, vehicles, text, animals, and the like), and it can be extended directly from an existing network structure without retraining from scratch. It noticeably improves recognition accuracy, at the cost of increased model complexity and computation.
In terms of implementation, there are mainly two kinds of deformable convolution, referred to below as the first kind and the second kind.
The first kind of deformable convolution changes the fixed-position sampling pattern of the ordinary convolution operation into a non-fixed-position sampling pattern by adding learnable two-dimensional (2D) offset parameters to the sampling over the region acted on by the convolution operation.
The operation principle of the first kind of deformable convolution is shown in equation (1) below, where y(p_0) denotes the pixel value of the pixel in the output feature map corresponding to pixel p_0 in the input feature map, x denotes the input feature map, w(p_n) denotes a weight parameter of the convolution operation, R denotes the receptive field of pixel p_0 in the input feature map (which may also be viewed as the set of sampling points with p_0 as the starting sampling point), N = |R| denotes the number of pixels in R, p_0 + p_n denotes the n-th pixel in R, and Δp_n denotes an offset vector:

y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n)        (1)

Taking a 3 × 3 convolution as an example, if pixel p_0 is at (0, 0), the receptive field R can be expressed as R = {(-1, -1), (-1, 0), ..., (0, 1), (1, 1)}.
The second kind of deformable convolution adds a modulation parameter (mask) to the first kind in order to adjust the amplitude at each sampling location. Its operation principle is shown in equation (2) below, where y(p) denotes the pixel value of the pixel in the output feature map corresponding to pixel p in the input feature map, w_k denotes the k-th weight parameter, K denotes the number of pixels in the receptive field of pixel p in the input feature map, p + p_k denotes the k-th pixel in that receptive field, Δp_k denotes an offset vector, and Δm_k denotes the element of the modulation parameter corresponding to pixel p + p_k:

y(p) = Σ_{k=1}^{K} w_k · x(p + p_k + Δp_k) · Δm_k        (2)
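To make the two formulas concrete, the following sketch (an illustrative NumPy implementation, not part of the claimed apparatus; the function names and the centered kernel-tap grid are assumptions) computes one output value per equation (2); passing a mask of all ones reduces it to equation (1):

```python
import numpy as np

def bilinear_sample(x, ys, xs, default=0.0):
    # Bilinear interpolation of the 2-D map x at the float position (ys, xs);
    # reference pixels falling outside the image take a default value, as
    # described later in this document.
    hh, ww = x.shape
    y0, x0 = int(np.floor(ys)), int(np.floor(xs))
    def px(i, j):
        return x[i, j] if 0 <= i < hh and 0 <= j < ww else default
    ay, ax = ys - y0, xs - x0
    top = (1 - ax) * px(y0, x0) + ax * px(y0, x0 + 1)
    bot = (1 - ax) * px(y0 + 1, x0) + ax * px(y0 + 1, x0 + 1)
    return (1 - ay) * top + ay * bot

def deformable_conv_at(x, w, offsets, mask, p0):
    # One output value y(p0) per equation (2): sum over the K = k*k kernel
    # taps p_n of w(p_n) * x(p0 + p_n + dp_n) * dm_n.
    # offsets: (K, 2) float offset vectors (dy, dx); mask: (K,) values in [0, 1].
    k = w.shape[0]
    taps = [(i - k // 2, j - k // 2) for i in range(k) for j in range(k)]
    y = 0.0
    for n, (di, dj) in enumerate(taps):
        dy, dx = offsets[n]
        y += w.flat[n] * bilinear_sample(x, p0[0] + di + dy, p0[1] + dj + dx) * mask[n]
    return y

# With mask = np.ones(k * k), this computes equation (1) exactly.
```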
Fig. 2 shows the process of a standard convolution operation, fig. 3 the process of the first kind of deformable convolution operation, and fig. 4 the process of the second kind. Comparing fig. 2 with fig. 3 and fig. 4 shows that the standard convolution samples a regular lattice, whereas the deformable convolution samples scattered, irregular positions. Because of its regular lattice sampling, a neural network using only standard convolution has difficulty adapting to geometric deformation of the image and processes geometrically deformed images poorly.
In some examples of the disclosure, the parameters of the deformable convolution may include weight parameters and offset parameters. The weight parameters are used to perform the standard convolution operation, while the offset parameters determine the positions of the sampling points (i.e., the positions in the input image of the pixels participating in the convolution operation). Using the offset parameters, an offset vector can be added at each sampling position, so the convolution kernel can sample freely around its current position instead of being restricted to the regular lattice points of the standard convolution. A neural network containing deformable convolutions can therefore adapt better to geometric deformation of the image and process geometrically deformed images more effectively.
In other examples of the disclosure, the parameters of the deformable convolution may also include a modulation parameter (mask), which can be used to adjust the input feature amplitude at each sampling location. For unwanted sampling points, the corresponding element of the modulation parameter may be learned to be 0 or another low value, while for sampling points related to key features of the image the corresponding element may be learned to be a higher value; by adding the modulation parameter, the deformable convolution can thus adapt better to the various deformations of the image.
The input image and the output feature map of the deformable convolution may be represented by tensor data with four dimensions: number, channel, height, and width. Accordingly, the weight parameters of the deformable convolution may include a number of convolution kernels, each represented by tensor data with three dimensions: channel, height, and width. To realize the convolution operation, the number of convolution kernels in the weight parameters equals the number of channels of the output feature map, all convolution kernels have the same specification, the number of channels of a single convolution kernel equals the number of channels of the input image, and the width and height of a single convolution kernel can be preset. For the deformable convolution of the disclosed embodiments, the width and height of the individual convolution kernels in the weight parameters may be equal; in some examples, a single convolution kernel may have width 1 and height 1.
In addition, the weight parameters may further include a sliding step, which represents the distance the convolution window slides over the input image at each sampling of the convolution operation. The sliding step may be a two-dimensional vector with a component in the width direction and a component in the height direction. For the deformable convolution of the disclosed embodiments, both components of the sliding step may be taken as 1, or each may be taken equal to the size of the convolution kernel in the corresponding dimension. For example, when the convolution kernel has height 3 and width 3, the sliding step may be (1, 1) (both components equal 1) or (3, 3) (both components equal 3); the latter avoids data overlap and thereby ensures that the result of the deformable convolution operation obtained with the technical solution of the present disclosure (i.e., the second feature map below) is more accurate.
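As a quick check of the stride choice (a hedged arithmetic note using the standard unpadded-convolution output-size formula, not text from the patent), a stride equal to the kernel size tiles the input into non-overlapping windows:

```python
def conv_out_size(in_size: int, k: int, s: int) -> int:
    # Output length of an unpadded convolution along one axis.
    return (in_size - k) // s + 1

# With k = 3: stride (1, 1) slides overlapping windows, while stride (3, 3)
# tiles the input into non-overlapping 3 x 3 windows, avoiding data overlap.
assert conv_out_size(9, 3, 1) == 7
assert conv_out_size(9, 3, 3) == 3
```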
The offset parameters of the deformable convolution may be represented by a two-dimensional matrix with two dimensions, height and width: the height dimension equals the product of the height of the output feature map of the deformable convolution (e.g., the second feature map below) and the height of the convolution kernel, and the width dimension equals the product of the width of the output feature map and the width of the convolution kernel. Taking fig. 2 and fig. 3 as an example, if the output feature map has width w and height h, and the convolution kernel has height k and width k, the offset parameter may be a two-dimensional matrix of size (h × k) × (w × k). This matrix contains h × w groups of offset vectors, each group containing k × k offset vectors that correspond one-to-one to the k × k sampling points on a channel when a convolution operation is performed with a convolution kernel of size k × k × C. Each offset vector may include a height component used to determine the height coordinate of the sampling point and a width component used to determine its width coordinate; the value of each component is a real number, may be obtained through learning, and is generally a high-precision floating-point number. For example, if the coordinates of a sampling point on a channel are (x, y), where x is the width coordinate and y is the height coordinate, the corresponding offset vector may be written (dx, dy), with dx the width component and dy the height component.
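The following indexing helper is illustrative only: the patent fixes the overall (h × k) × (w × k) size but not a block layout, so the tiling and all names here are assumptions. It shows how the offset vector of output pixel (i, j) at kernel tap (u, v) could be located:

```python
def offset_vector(off_y, off_x, i, j, u, v, k):
    # off_y, off_x: hypothetical (h*k, w*k) matrices holding the height and
    # width components of all offset vectors, tiled so that the k x k group
    # for output pixel (i, j) occupies the block (i*k:(i+1)*k, j*k:(j+1)*k).
    # Returns (dy, dx) for output pixel (i, j) and kernel tap (u, v).
    return off_y[i * k + u, j * k + v], off_x[i * k + u, j * k + v]
```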
The modulation parameter (mask) of the deformable convolution can likewise be represented by a two-dimensional matrix; each element can be obtained through learning and takes a value in the range [0, 1]. For example, taking fig. 3 as an example, when the weight parameters include N convolution kernels each of size k × k × C (height k, width k, C channels), the two-dimensional matrix representing the modulation parameter has size (h × k) × (w × k), and each of its elements takes a value in the range [0, 1].
For example, if the convolution kernel size in the weight parameters is 3 × 3 (height 3, width 3), then each group of offset vectors in the offset parameter has size 3 × 3, the number of groups of offset vectors equals the number of pixel positions in a channel of the output feature map (i.e., the second feature map), and the size of the modulation parameter may be (h × 3) × (w × 3), where h is the height of the second feature map and w is its width.
Exemplary devices
Fig. 5 is a schematic block diagram of an exemplary apparatus 50 for image processing in embodiments of the present disclosure.
As shown in fig. 5, an exemplary apparatus 50 for image processing according to an embodiment of the present disclosure may include: a warp transform module 51, a convolution calculation module 52, a memory 53 and a controller 54. The warp transform module 51 is connected to the controller 54 and the memory 53, and the convolution calculation module 52 is connected to the controller 54 and the memory 53.
In at least some embodiments, the controller 54 may be configured to control the warping transformation module 51 to perform a warping transformation process on the input image using the offset parameter of the deformable convolution, and control the convolution calculation module 52 to perform a convolution operation on the output result of the warping transformation module 51 using the weight parameter to complete the deformable convolution operation of the input image.
In some examples of the disclosure, the controller 54 may include one or more processors or other forms of processing units having neural network computing capabilities and/or instruction execution capabilities, and may control the other components in the example apparatus 50 to perform desired functions. The processor may include, but is not limited to, a GPU, a Brain Processing Unit (BPU), a Tensor Processing Unit (TPU), and other processors supporting neural-network-related computation. In some examples, the controller 54 may control the warping transformation module 51 and the convolution calculation module 52 to perform the deformable convolution operation on the input image by executing a sequence of instructions (e.g., a sequence of instructions for performing the deformable convolution operation). In some examples, the controller 54 may be configured to decode instructions, schedule the various other components (e.g., the warping transformation module 51, the convolution calculation module 52, the memory 53, etc.), process interrupts, and the like.
In embodiments of the present disclosure, the memory 53 may be configured to store the parameters required to complete the deformable convolution operation. In some examples, the memory 53 may store at least the offset parameters of the deformable convolution and the weight parameters of the convolution layer; in other examples, it may also store the modulation parameter (mask) of the deformable convolution. In addition, the memory 53 may be configured to buffer the input and output data of the various components (e.g., the warping transformation module 51 and the convolution calculation module 52); for example, it may store the data of the input image of the warping transformation module 51, the output result of the warping transformation module 51 (e.g., the first feature map), and the operation result of the convolution calculation module 52 (e.g., the second feature map). The memory 53 may further be configured to store the first matrices obtained by recombining the offset parameters, the second matrices obtained by recombining the modulation parameter, the third feature map obtained by rearranging the first feature map, the several 1 × 1 convolution kernels obtained by recombining the weight parameters, and the like.
In the disclosed embodiments, the memory 53 may comprise one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the controller 54 to control the warping transformation module 51 and the convolution calculation module 52 to implement the image processing methods described below and/or other desired functions.
In at least some embodiments, the memory 53 may include one or more separate memories or processing units having data storage capabilities, which may be accessed by the warping transformation module 51 and the convolution calculation module 52 under the control of the controller 54. In some examples, the memory 53 may be a slower, large-capacity dynamic memory, such as a dynamic random access memory (DRAM). In one example, the memory 53 may be a double data rate (DDR) DRAM.
In some examples, the memory 53 may be configured to store various types of data in a one-dimensional arrangement (including, but not limited to, the data of the input image, the data of the various feature maps (e.g., the first and second feature maps below), the offset parameters of the deformable convolution, the weight parameters of the convolution layer, etc.). Accordingly, the memory 53 may employ a one-dimensional linear address structure whose addresses may be represented by hexadecimal numbers. In one example, data may be stored in the memory 53 by mapping each dimension of the data (e.g., number, height, width, channel) onto the single linear dimension, so that the address of a datum in the memory 53 is uniquely determined by its coordinates in each dimension together with the mapping relationship. Of course, the arrangement of data in the memory 53 is not limited to this; other arrangements are also applicable to the embodiments of the present application.
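One possible dimension-to-address mapping is sketched below (illustrative; the patent does not fix a particular dimension order, so the row-major N-H-W-C layout and the names here are assumptions):

```python
def linear_address(base, n, c, y, x, shape, elem_bytes=1):
    # Map (number, height, width, channel) coordinates of a tensor with
    # shape = (N, H, W, C) to a byte address in the one-dimensional address
    # space, assuming a row-major N-H-W-C layout.
    _, h, w, ch = shape
    index = ((n * h + y) * w + x) * ch + c
    return base + index * elem_bytes

addr = linear_address(0x1000, n=0, c=2, y=5, x=7, shape=(1, 32, 32, 16))
```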
In the embodiment of the present application, which convolution operation the convolution calculation module 52 performs depends on the type of deformable convolution to be implemented. In some examples, e.g., for the first kind of deformable convolution described above, the convolution calculation module 52 may be configured to perform a convolution operation on the first feature map output by the warping transformation module 51 using the weight parameters of the deformable convolution to obtain the second feature map. In other examples, e.g., for the second kind of deformable convolution described above, the convolution calculation module 52 may be configured to perform an element-level multiplication of the modulation parameter and the first feature map, and then perform a convolution operation on the result of the multiplication with the weight parameters to obtain the second feature map.
In the embodiment of the present application, the convolution calculating module 52 may be implemented by any hardware supporting convolution operation. In some examples, the convolution calculation module 52 may be a calculation engine for supporting convolution operations. Fig. 6 shows an exemplary structure of the calculation engine 60 and a connection relationship between the calculation engine and the memory 53 and the controller 54. As shown in fig. 6, the calculation engine 60 may include a buffer memory 61 and a calculation unit 62, the buffer memory 61 connecting the memory 53 and the calculation unit 62. In at least one embodiment, the buffer memory 61 may be configured to receive and buffer at least a portion of the tensor data (e.g., the first feature map, below) from the memory 53 for use by the computation unit 62, and may also be configured to receive and buffer an operation result (e.g., at least a portion of the second feature map, below) output by the computation unit 62 and output the operation result (e.g., the second feature map, below) into the memory 53.
In some examples, the buffer Memory 61 may be a fast and small Static Access Memory, such as a Static Random Access Memory (SRAM).
In some examples, the computation unit 62 includes, but is not limited to, an arithmetic circuit configurable to perform standard convolution operations. In at least one implementation, the arithmetic circuit may include a multiply-accumulate array formed by connecting a plurality of multipliers and a plurality of adders, configured to perform convolution operations on the input data (e.g., the data of the first feature map and the weight parameters of the convolution layer below). Through multiplexing, it can support multiple convolution kernel sizes, multiple strides of the standard convolution operation, and/or element-level multiplication and/or addition operations, with a high degree of parallelism in one or more of the channel, height, and width dimensions. It should be understood that the structure of the computation unit 62 is not limited to the above implementation; in specific applications it may take various forms, as long as the convolution operation is supported.
In some examples, the calculation unit 62 may include a control unit and the above-mentioned arithmetic circuit, and the control unit may control a plurality of multipliers and a plurality of adders in the above-mentioned arithmetic circuit to perform a standard convolution operation and/or a multiplication operation and/or an addition operation at an element level based on an instruction issued by the controller 54. For example, when a standard convolution operation is performed, the controller 54 may convert an instruction of the standard convolution operation into an instruction format that can be performed by the calculation unit 62 through instruction decoding and issue the instruction to the control unit of the calculation unit 62, and the control unit of the calculation unit 62 controls each multiplier and adder in the calculation circuit to perform a multiply-add operation according to the instruction, thereby completing the corresponding standard convolution operation.
In the embodiment of the present disclosure, the warping transformation module 51 may be configured to correct the input image according to the offset parameters of the deformable convolution to obtain the first feature map. In general, the warping transformation module 51 can realize translation, rotation, warping deformation, and the like of an image with respect to arbitrary positions, with a high degree of parallelism in the channel direction. In the embodiment of the present disclosure, the warping transformation module 51 processes the input image with the offset parameters of the deformable convolution as its transformation matrix and outputs the first feature map, which is equivalent to performing the translation, rotation, warping, and similar transformations on the input image in advance with respect to the positions defined by the offset parameters; the geometric deformation of the target object is thereby eliminated in the resulting first feature map.
In at least some embodiments, the warping transformation module 51 may comprise at least one warping transformation unit 510, each configurable to perform warping transformation processing on the data of one or more channels of the input image. In this way, the structure of the warping transformation module 51 can be defined freely according to the channel dimension of the input image and the required parallelism, so that the parallelism of the warping transformation module 51 meets the requirement.
The offset parameter of the deformable convolution is a two-dimensional parameter that is identical along the channel direction, so when it is taken as the transformation matrix of the warping transformation module 51, the transformation matrix is the same for every channel and is random only in the width and height directions. In some examples, therefore, a plurality of warping transformation units 510 may be provided in the warping transformation module 51, each configured to perform warping transformation processing on the data of one channel of the input image, so that warping transformations along multiple channel directions can proceed in parallel, improving image processing efficiency. Fig. 7 shows the structure of the warping transformation module 51 in this example and its connection to the memory 53.
Fig. 8 shows an exemplary structure of a single warping transformation unit 510 and its connection to the memory 53. As shown in fig. 8, in at least some embodiments, the warping transformation unit 510 may include a coordinate positioning unit 511, a data extraction unit 512, and a bilinear interpolation unit 513. The coordinate positioning unit 511 may be configured to determine the sampling point coordinates in the input image for a pixel in the first feature map according to the position coordinates of that pixel and the corresponding offset vector in the offset parameters; the data extraction unit 512 may be configured to obtain the pixel values of the corresponding reference pixels according to the sampling point coordinates; and the bilinear interpolation unit 513 may be configured to perform a bilinear interpolation operation on the pixel values of the reference pixels to obtain the pixel value of the pixel in the first feature map. The sampling point coordinates are generally floating-point numbers, but may also be fixed-point numbers.
In some examples, the reference pixels may be the four pixels nearest to the sampling point. If the sampling point lies at an edge of the input image (for example, its row is the first or last row of the input image, and/or its column is the first or last column), only two of its nearest neighbors may exist; that is, two of the four reference pixels exist while the other two do not. The pixel values of the nonexistent reference pixels may then take a default value (for example, the value of the boundary point, or a preset fixed value, such as 0 or some other number). Moreover, since those two reference pixels do not actually exist, the bilinear interpolation operation simply uses the default pixel values for them without reading from memory. If all four nearest reference pixels of the sampling point exist, their coordinates can be determined from the position of the sampling point, their pixel values can be read according to those coordinates, and the bilinear interpolation operation can then be performed.
In some examples of the present disclosure, the offset vectors in the offset parameters may be recombined in advance into a plurality of first matrices, described in detail below. The warping transformation performed by the warping transformation unit 510 may then proceed as follows: the coordinate positioning unit 511 traverses the first matrices in turn to find the offset vector corresponding to a pixel in the first feature map and calculates the floating-point coordinates of the sampling point in the input image from the coordinates of that pixel and the offset vector; the data extraction unit 512 then calculates the coordinates of the 4 reference pixels closest to the floating-point position and extracts their pixel values; finally, the bilinear interpolation unit 513 performs bilinear interpolation on the pixel values of the 4 reference pixels to obtain the pixel value of the pixel in the first feature map. After this warping transformation has been performed for every pixel in the first feature map, the pixel values of all its pixels are available and the first feature map is obtained, as sketched below. In the above example, multiple warping transformation units 510 may perform this processing in parallel on the pixels of the different channels of the first feature map.
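A per-channel sketch of this flow (illustrative NumPy; warp_channel and its offset-matrix arguments are hypothetical names), combining the three units described above:

```python
import numpy as np

def warp_channel(src, off_y, off_x, default=0.0):
    # One channel of the first feature map. off_y / off_x give the offset
    # vector of every output pixel (rearranged as described in the text).
    h1, w1 = off_y.shape
    h0, w0 = src.shape
    dst = np.empty((h1, w1), dtype=np.float32)
    for yd in range(h1):
        for xd in range(w1):
            # coordinate positioning unit: two additions per pixel
            ys, xs = yd + off_y[yd, xd], xd + off_x[yd, xd]
            # data extraction unit: the four nearest reference pixels,
            # with a default value where a reference pixel does not exist
            y0, x0 = int(np.floor(ys)), int(np.floor(xs))
            def px(i, j):
                return src[i, j] if 0 <= i < h0 and 0 <= j < w0 else default
            # bilinear interpolation unit, in the 3-multiply form of
            # expressions (4) to (6) below
            ay, ax = ys - y0, xs - x0
            tmp0 = px(y0, x0) + ax * (px(y0, x0 + 1) - px(y0, x0))
            tmp1 = px(y0 + 1, x0) + ax * (px(y0 + 1, x0 + 1) - px(y0 + 1, x0))
            dst[yd, xd] = tmp0 + ay * (tmp1 - tmp0)
    return dst
```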
In some examples of the present disclosure, the coordinate positioning unit 511 may include a first adder and a second adder. The first adder may be configured to sum the height coordinate of a pixel in the first feature map with the height component of the corresponding offset vector and output the height coordinate of the corresponding sampling point; the second adder may be configured to sum the width coordinate of the pixel with the width component of the corresponding offset vector and output the width coordinate of the corresponding sampling point. In this way, the warping transformation module 51 can locate the floating-point coordinates required by the deformable convolution operation by summing offset vectors and coordinates, realizing image correction that meets the requirements of the deformable convolution operation. Moreover, the coordinate positioning units 511 in the plurality of warping transformation units 510 can locate the floating-point coordinates of the sampling points on different channels in parallel, which is more efficient.
Fig. 9 shows an exemplary circuit configuration of the warping transformation unit 510. Taking fig. 9 as an example, the coordinate positioning unit 511 may read the corresponding offset vector (dx, dy) from the memory 53 and sum it with the coordinates of the corresponding pixel in the first feature map through 2 adders (adder 91 and adder 92), obtaining the floating-point coordinates of the sampling point in the input image, i.e., (x_src, y_src) = (x_dst + dx, y_dst + dy).
In some examples of the present disclosure, the data extraction unit 512 may include a data positioning unit, an address generation unit (for example, the address generation unit 916 in fig. 9), and a reading unit. The data positioning unit may be configured to determine the coordinates of the reference pixels from the sampling point coordinates, the address generation unit may be configured to determine the data storage addresses of the reference pixels from those coordinates, and the reading unit may be configured to obtain the pixel values of the reference pixels from the memory 53 according to the data storage addresses and send them to the bilinear interpolation unit 513. A data extraction unit 512 with this exemplary structure can locate and acquire the data of the four reference pixels nearest to the floating-point coordinates of the sampling point, so that the pixel value of the corresponding pixel in the first feature map is determined from the data of those four reference pixels, thereby realizing correction that meets the requirements of the deformable convolution operation.
Taking fig. 9 as an example, in some examples of the present disclosure, the data positioning unit may include: a first shifter 93, configured to right-shift the height coordinate of the sampling point by a predetermined number of bits and output a first height coordinate; a third adder 95, configured to add 1 to the first height coordinate and output a second height coordinate; a second shifter 94, configured to right-shift the width coordinate of the sampling point by the predetermined number of bits and output a first width coordinate; and a fourth adder 96, configured to add 1 to the first width coordinate and output a second width coordinate. The first and second height coordinates and the first and second width coordinates together form the coordinates of the reference pixels. With this specific structure, the four pixels near the floating-point coordinates can be located accurately, completing the image correction required by the deformable convolution operation.
In some implementations, the address generation units serving the warping transformation module 51 and the convolution calculation module 52 may reside in the controller 54, so that the process of determining a storage address from coordinates and reading the data is performed by the controller 54. In other implementations, the address generation unit may be disposed in the warping transformation module 51 or disposed independently. The embodiments of the present disclosure place no limitation on how the address generation unit is specifically configured.
Still taking fig. 9 as an example, the data extraction unit 512 determines, from an externally supplied number of fractional bits frac of the floating-point coordinates (frac is the predetermined number of bits of the right shift mentioned above) and the floating-point coordinates (x_src, y_src) of the sampling point obtained by the coordinate positioning unit 511, the integer coordinates (x0, y0), (x0, y1), (x1, y0), and (x1, y1) of the 4 pixels (i.e., the reference pixels) nearest to the sampling point in the input image, where x0 = x_src >> frac, x1 = x0 + 1, y0 = y_src >> frac, and y1 = y0 + 1. It then generates the data storage addresses from the coordinates of these 4 pixels, completes the reading of the corresponding data (i.e., the pixel values of the 4 pixels), and sends the data to the bilinear interpolation unit 513. The number of fractional bits frac may be predetermined and stored in the memory 53 or another memory.
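The integer/fraction split can be sketched as follows (illustrative; the function name is hypothetical, and the bit mask simply reads out the low frac bits used later in expressions (4) to (6)):

```python
def locate_reference_pixels(x_src, y_src, frac):
    # x_src, y_src are fixed-point sampling coordinates with `frac`
    # fractional bits; the right shift recovers the integer parts.
    x0, y0 = x_src >> frac, y_src >> frac
    x1, y1 = x0 + 1, y0 + 1
    fx = x_src & ((1 << frac) - 1)   # x_src[frac-1:0], used in eqs. (4)/(5)
    fy = y_src & ((1 << frac) - 1)   # y_src[frac-1:0], used in eq. (6)
    return (x0, y0), (x1, y1), fx, fy
```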
In the embodiment of the present disclosure, the bilinear interpolation unit 513 may perform a multiply-add operation on the pixel values of the four reference pixels received from the data extraction unit 512 by bilinear interpolation, output the pixel value of the pixel in the first feature map, and store it in the memory 53. If some reference pixels do not exist, the data extraction unit 512 may read the corresponding default values from memory according to preconfigured data storage addresses and send them to the bilinear interpolation unit 513 as the pixel values of those reference pixels.
In the embodiment of the present disclosure, the circuit structure of the bilinear interpolation unit 513 may take various forms. In some examples, the circuit structure of the bilinear interpolation unit 513 may be implemented directly according to the following bilinear interpolation formula (3).
dst[x_src, y_src] = (x1 - x_src) * (y1 - y_src) * src[x0, y0]
                  + (x_src - x0) * (y1 - y_src) * src[x1, y0]
                  + (x1 - x_src) * (y_src - y0) * src[x0, y1]
                  + (x_src - x0) * (y_src - y0) * src[x1, y1]        (3)
In some examples of the present disclosure, the bilinear interpolation calculation performed by the bilinear interpolation unit 513 can be equivalently transformed into the form of expressions (4) to (6) below. In this form the calculation requires only 3 multiply-add groups, so a circuit structure of the bilinear interpolation unit 513 that implements expressions (4) to (6) saves multiplier resources.
tmp0 = (x1 - x_src) * src00 + (x_src - x0) * src10 = src00 + x_src[frac-1:0] * (src10 - src00)        (4)

tmp1 = (x1 - x_src) * src01 + (x_src - x0) * src11 = src01 + x_src[frac-1:0] * (src11 - src01)        (5)

dst = (y1 - y_src) * tmp0 + (y_src - y0) * tmp1 = tmp0 + y_src[frac-1:0] * (tmp1 - tmp0)        (6)
Here tmp0 denotes a first intermediate value, tmp1 denotes a second intermediate value, and dst denotes the pixel value of the pixel in the first feature map; comparing with equation (3), src00 = src[x0, y0], src10 = src[x1, y0], src01 = src[x0, y1], and src11 = src[x1, y1]. x_src[frac-1:0] denotes the low frac bits of x_src in its binary representation. For example, with frac = 4, x_src[frac-1:0] is the low 4 bits of x_src.
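Expressions (4) to (6) map directly onto three multiply-add groups; a fixed-point sketch follows (illustrative; the truncating right shifts stand in for the shift quantization mentioned below):

```python
def bilinear_3mul(src00, src10, src01, src11, fx, fy, frac):
    # Equations (4)-(6) with three multipliers instead of eight: interpolate
    # along the width twice, then once along the height. fx and fy are the
    # low `frac` bits of the sampling coordinates.
    tmp0 = src00 + ((fx * (src10 - src00)) >> frac)   # eq. (4)
    tmp1 = src01 + ((fx * (src11 - src01)) >> frac)   # eq. (5)
    return tmp0 + ((fy * (tmp1 - tmp0)) >> frac)      # eq. (6)
```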
In the examples corresponding to equations (4) to (6), as shown in fig. 9, the bilinear interpolation unit 513 may include a first multiply-add subunit 5131, a second multiply-add subunit 5132, and a third multiply-add subunit 5133. The first multiply-add subunit 5131 may be configured to calculate the first intermediate value from the values of the two reference pixels sharing the height coordinate y0 (src00 and src10), the second multiply-add subunit 5132 may be configured to calculate the second intermediate value from the values of the two reference pixels sharing the height coordinate y1 (src01 and src11), and the third multiply-add subunit 5133 may be configured to calculate the pixel value of the pixel in the first feature map from the first and second intermediate values. Each of the three multiply-add subunits may include a subtractor, a multiplier, and an adder connected in sequence. In this example, the bilinear interpolation unit 513 performs the calculation in two stages, applying a shift quantization after each stage, and can be implemented with only three adders, three subtractors, and three multipliers; a direct implementation of equation (3) would require 8 multipliers, whereas this example uses only 3, saving multiplier resources and helping to reduce chip area.
Still taking fig. 9 as an example, the subtractor 97 of the first multiply-add subunit 5131 may be configured to subtract the pixel values of the two reference pixels sharing the height coordinate y0 to obtain a first difference (i.e., src10 - src00 in equation (4)); the multiplier 98 of the first multiply-add subunit 5131 may be configured to multiply the first difference by the low frac bits of the width coordinate of the sampling point, x_src[frac-1:0], to obtain a first product; and the adder 99 of the first multiply-add subunit 5131 may be configured to add the first product to the pixel value src00, obtaining the first intermediate value tmp0.
Still taking fig. 9 as an example, the subtractor 910 of the second multiply-add subunit 5132 may be configured to subtract the pixel values of the two reference pixels sharing the height coordinate y1 to obtain a second difference (i.e., src11 - src01 in equation (5)); the multiplier 911 of the second multiply-add subunit 5132 may be configured to multiply the second difference by x_src[frac-1:0] to obtain a second product; and the adder 912 of the second multiply-add subunit 5132 may be configured to add the second product to the pixel value src01, obtaining the second intermediate value tmp1.
Still taking fig. 9 as an example, the subtractor 913 of the third multiply-add subunit 5133 may be configured to subtract the first intermediate value tmp0 from the second intermediate value tmp1 to obtain a third difference; the multiplier 914 of the third multiply-add subunit 5133 may be configured to multiply the third difference by the low frac bits of the height coordinate of the sampling point, y_src[frac-1:0], to obtain a third product; and the adder 915 of the third multiply-add subunit 5133 may be configured to add the third product to the first intermediate value tmp0, obtaining the value dst of the pixel in the first feature map.
The above exemplary structure can save multiplier resources. It is understood that bilinear interpolation unit 513 may also adopt any other applicable circuit structure, and the disclosed embodiment is not limited to the specific circuit structure of bilinear interpolation unit 513.
In some examples of the present disclosure, the warp transform module 51, the convolution calculation module 52, the memory 53, and the controller 54 may be connected by a bus. Of course, the parts of the exemplary apparatus 50 for image processing may be connected by any other suitable connection.
For simplicity, only a portion of the components of the exemplary apparatus 50 for image processing are shown in fig. 5, and components such as a bus and the like are omitted. In addition, the exemplary apparatus 50 for image processing may include any other suitable components, depending on the particular application. For example, input components (e.g., keyboard, mouse, microphone, etc.), output components (e.g., printer, display, speakers, etc.), and so forth.
The exemplary apparatus 50 for image processing in the embodiments of the present disclosure corrects the input image through the warping transformation module and then performs an ordinary convolution operation directly on the corrected image with the convolution calculation module, thereby implementing image processing based on the deformable convolution.
Depending on the requirements of the specific application scenario, the image processing apparatus according to the embodiment of the present disclosure may use the warping transformation module 51 alone to correct an input image, use the convolution calculation module 52 alone to perform a standard convolution or another multiply-add operation, or combine the warping transformation module 51 and the convolution calculation module 52 to realize the operation of the deformable convolution.
Exemplary method
Fig. 10 is a flow diagram of an exemplary method 100 of image processing in an embodiment of the present disclosure. As shown in fig. 10, the exemplary method 100 may include: step S110, correcting the input image according to the offset parameters of the deformable convolution to obtain a first feature map; step S120, performing a convolution operation on the first feature map using the weight parameters of the deformable convolution to obtain a second feature map; and step S130, storing the second feature map. The exemplary method 100 implements the operation of the first kind of deformable convolution described above; by adding the modulation parameter in step S120, the operation of the second kind of deformable convolution can be implemented as well.
Fig. 11 is a flow diagram of an exemplary method 200 of image processing in an embodiment of the present disclosure. As shown in fig. 11, in the exemplary method 200, step S120 performs the convolution operation on the first feature map using both the weight parameters and the modulation parameter of the deformable convolution to obtain the second feature map. Step S120 of the exemplary method 200 may include: step S121, performing an element-level multiplication of the modulation parameter of the deformable convolution and the first feature map; and step S122, performing a convolution operation on the result of the element-level multiplication with the weight parameters to obtain the second feature map. The operation of the second kind of deformable convolution described above may be implemented by the exemplary method 200, as sketched below.
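Putting the steps together, the overall flow of exemplary method 200 can be sketched as follows (illustrative; warp and standard_conv are caller-supplied hypothetical helpers standing in for steps S110 and S122, e.g., warp_channel above applied per channel plus any ordinary convolution routine):

```python
def deformable_conv_split(image, offsets, weights, mask, warp, standard_conv):
    # Step S110: warp the whole input image with the offset parameters to
    # obtain the first feature map.
    first = warp(image, offsets)
    # Step S121 (method 200 only): element-level multiplication with the
    # modulation parameter; pass mask=None for the first kind (method 100).
    if mask is not None:
        first = first * mask
    # Step S122 / S120: an ordinary convolution with the weight parameters
    # yields the second feature map, which the caller stores (step S130).
    return standard_conv(first, weights)
```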
Both the exemplary method 100 and the exemplary method 200 in the embodiments of the present disclosure may be performed by the above exemplary apparatus 50 for image processing.
According to the above exemplary methods 100 and 200 of the embodiments of the present disclosure, the operation of the deformable convolution is divided into two parts: the application of the offset parameters and the convolution operation. That is, a warping transformation is first performed on the whole input image using the offset parameters to obtain the first feature map, and a convolution operation is then performed on the whole first feature map using the weight parameters (or the weight parameters together with the modulation parameters), for example the standard convolution operation shown in fig. 1 or a combination of an element-level multiplication and a standard convolution, to obtain the result of the deformable convolution operation (i.e., the second feature map). In this way, input data can be reused during the convolution operation, the feature map can be split, and the convolution operation and the processing of the offset parameters can proceed in parallel along the channel dimension. The exemplary methods of the embodiments of the present disclosure can therefore complete the computation of the deformable convolution quickly and flexibly, which facilitates leveraging the parallel processing capability of hardware (e.g., the exemplary apparatus 50 above) to increase the processing speed of the deformable convolution, thereby efficiently implementing the processing of images with spatial deformation at a lower hardware cost.
For example, assume that the size of the input image is h0 × w0 × C, the weight parameters include N convolution kernels each of size k × k × C, the size of the output feature map of the deformable convolution (i.e., the second feature map) is h × w × N, the offset parameters include h × w × (k × k) offset vectors, and the modulation parameters may include h × w × (k × k) elements. Fig. 12 shows a schematic diagram of performing the first type of deformable convolution on the input image by the exemplary method 100 of the present disclosure, and fig. 13 shows a schematic diagram of performing the second type of deformable convolution on the input image by the exemplary method 200 of the present disclosure. It should be noted that fig. 12 and 13 are only examples and are not intended to limit the scope of the embodiments of the present disclosure.
In some examples of the present disclosure, step S110 may include: step a1, recombining the offset vectors in the offset parameters according to the corresponding positions of the offset vectors in the convolution kernels of the weight parameters, so as to obtain k1 × k2 first matrices of size h × w; and step a2, performing the warping transformation processing on the data in the input image using each first matrix, so as to obtain the first feature map; where k2 and k1 are the width dimension and the height dimension of the convolution kernel in the weight parameters, h is the height dimension of the first feature map, and w is the width dimension of the first feature map. In this example, the position of an offset vector in the rearranged offset parameters matches the pixel point it corresponds to in the first feature map: the first matrix to which the offset vector belongs corresponds to the channel group of that pixel point, and the position (h- and w-direction coordinates) of the offset vector within the first matrix coincides with the coordinates of the pixel point within the corresponding channel. A warping transformation unit can therefore directly locate the corresponding first matrix and read the corresponding offset vector according to the position of a pixel point (i.e., the channel dimension to which it belongs and its coordinates within that channel dimension). This not only facilitates running multiple warping transformation units in parallel but also helps increase the processing speed of a single warping transformation unit, thereby effectively improving processing efficiency. The specific procedure of the warping transformation processing in step a2 may be as described above with reference to the exemplary apparatus and is not repeated here.
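A minimal sketch of the regrouping in step a1, assuming the offset parameters are stored as an (h, w, k1*k2, 2) array; the function name and layout are illustrative only.

```python
import numpy as np

def regroup_offsets(offsets, k1, k2):
    """offsets: (h, w, k1*k2, 2) array of (dy, dx) offset vectors.
    Returns k1*k2 'first matrices', each an (h, w) plane of offset vectors
    whose (y, x) position matches the corresponding first-feature-map pixel."""
    h, w = offsets.shape[:2]
    return offsets.reshape(h, w, k1 * k2, 2).transpose(2, 0, 1, 3)  # (k1*k2, h, w, 2)
```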
Still taking the example of an input image of size h0 × w0 × C and an output feature map (i.e., the second feature map) of size h × w × N, let k = 3. Fig. 14 is a schematic diagram illustrating the process performed in step S110 in this example. In fig. 14, MVi represents the i-th offset vector, i = 0, 1, 2, …, h × w × k × k − 1, and Offj denotes the j-th first matrix, j = 0, 1, 2, …, 8 (i.e., k × k − 1). In the example of fig. 14, since the convolution window size on each channel is 3 × 3, the convolution window on each channel covers 9 sampling points, and these 9 sampling points correspond to 9 pixel points on different channels in the first feature map; the offset parameters are therefore regrouped into 9 first matrices, each with width dimension w and height dimension h, that is, equal to the width and height dimensions of the first feature map. Fig. 15 shows the process performed in step S110 in reverse; the direction indicated by the arrow in fig. 15 is the reverse order of the calculation process of step S110, P in fig. 15 represents an element in the first feature map, and in fig. 15 the first feature map is divided into k × k channel groups, each of which corresponds to one first matrix. In fig. 14 and 15, the offset parameters of the deformable convolution are regarded as the transformation matrices of the warping transformation module 51 and are decomposed into k × k groups according to their positions in the convolution kernel, thereby obtaining k × k first matrices of size w × h; by sequentially inputting these k × k first matrices of size w × h into the warping transformation module 51, the first feature map of size w × h × (k × k × C) can be calculated.
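Continuing the sketch, step a2 can be illustrated as below, reusing the bilinear helper from the sketch following exemplary method 100; the sampling-position convention (adding j // k and j % k) is an assumption.

```python
import numpy as np

def warp_with_first_matrices(image, first_matrices, k):
    """image: (C, H, W); first_matrices: (k*k, h, w, 2) as returned by regroup_offsets.
    Each first matrix Off_j produces one group of C channels of the first feature map."""
    C, H, W = image.shape
    kk, h, w = first_matrices.shape[:3]
    first = np.zeros((kk, C, h, w))
    for j in range(kk):                      # one warping pass per first matrix Off_j
        for y in range(h):
            for x in range(w):
                dy, dx = first_matrices[j, y, x]
                sy = np.clip(y + j // k + dy, 0, H - 1)
                sx = np.clip(x + j % k + dx, 0, W - 1)
                first[j, :, y, x] = bilinear(image, sy, sx)  # bilinear() as sketched earlier
    return first                             # (k*k, C, h, w): k*k*C channels of size h x w
```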
In some examples of the present disclosure, step S120 may include: step b1, splitting the convolution kernel of the weight parameters in the deformable convolution into convolution kernels with the size of 1 × 1; and step b2, performing a convolution operation on the first feature map using the split convolution kernels. In this example, by splitting the convolution kernel into convolution kernels of size 1 × 1, data overlap during the convolution calculation can be avoided, ensuring that the calculation result of the convolution calculation is consistent with the calculation result of the deformable convolution.
Fig. 16 shows an example of the splitting performed in step b1. In the example of fig. 16, w_i^j denotes data in a convolution kernel, where i = 0, 1, 2, 3, …, N, j = 0, 1, …, k × k, and N represents the number of convolution kernels, N being equal to the number of channels of the output feature map (i.e., the second feature map). The process of performing the convolution operation in step b2 is similar to the standard convolution process shown in fig. 1 and is not repeated here.
In some examples of the disclosure, if the convolution kernel in the weight parameters has a width dimension of size k2, a height dimension of size k1, and c channels, in step S120 the data in the first feature map may be rearranged according to the position, within the convolution kernel, of the data corresponding to each datum in the first feature map, so as to obtain a third feature map whose width dimension is k2 × w, whose height dimension is k1 × h, and whose number of channels is c; a convolution operation is then performed using the third feature map and the above weight parameters. Reorganizing the first feature map in this way avoids data overlap between convolution windows, thereby ensuring that the result of step S120 coincides with the operation result of the deformable convolution. Moreover, since each pixel point in the first feature map corresponds to a data position in its convolution kernel, the computation can be processed in parallel in the channel direction and can also be split along the width and height dimensions, so that the parallel processing capability of hardware (for example, the multiplier-accumulator array of the convolution calculation module 52 in the above exemplary apparatus 50) can be fully utilized, the deformable convolution operation can be completed efficiently, and the efficiency of the hardware in executing the deformable convolution operation can be effectively improved.
In the above example where the convolution kernel size is k × k, fig. 17 shows an exemplary data rearrangement diagram of the first feature map, and P in fig. 17 indicates data in the first feature map.
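The rearrangement into the third feature map can be sketched as follows, again under the assumed (h, w, k1*k2, c) layout of the first feature map. An ordinary convolution with the original k1 × k2 kernel and stride (k1, k2) over the result then reproduces the deformable convolution output, since adjacent windows no longer overlap.

```python
import numpy as np

def to_third_feature_map(first, k1, k2):
    """first: (h, w, k1*k2, c). Returns a (k1*h, k2*w, c) third feature map in which
    the data gathered for kernel position (r, s) sits at (y*k1 + r, x*k2 + s)."""
    h, w, _, c = first.shape
    third = np.zeros((k1 * h, k2 * w, c), dtype=first.dtype)
    for y in range(h):
        for x in range(w):
            for j in range(k1 * k2):
                r, s = j // k2, j % k2  # position inside the k1 x k2 kernel
                third[y * k1 + r, x * k2 + s] = first[y, x, j]
    return third
```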
In some examples of the present disclosure, the modulation parameters may be rearranged before the element-level multiplication operation is performed in step S121, so as to fully utilize the parallel operation capability of the hardware and increase the operation speed. In step S121 of this example, the data in the modulation parameters may first be recombined, according to their corresponding positions in the convolution kernels of the weight parameters, into k1 × k2 second matrices of size h × w, where h is the height dimension of the first feature map, w is the width dimension of the first feature map, and k2 and k1 are the width dimension and the height dimension of the convolution kernel in the weight parameters; a point-by-point multiplication operation is then performed between each second matrix and the data in the corresponding channel dimension of the first feature map. In this example, the recombination process for the modulation parameters is the same as that for the offset parameters, and after rearrangement the position of each datum in a second matrix is the same as the position of its corresponding pixel point in the first feature map.
Still taking the above example where the convolution kernel is k × k and k = 3, fig. 18 shows an exemplary process of modulation parameter recombination, where m in fig. 18 represents an element in the modulation parameters. The recombination process is the same as that of the offset parameters above, which facilitates parallel computation of the second multiply-add arrays in the channel direction.
Fig. 19 shows an exemplary execution procedure of steps S121 to S122, where O in fig. 19 denotes data in the second feature map and N is the number of channels of the second feature map. In the example of fig. 19, the modulation parameters are recombined into k × k second matrices of size h × w, and the convolution kernels of size k × k in the weight parameters are split into 1 × 1 kernels. The element-level multiplication operation is completed using the second matrices, and the standard convolution operation is completed using the 1 × 1 convolution kernels; this is not only efficient and fast but also facilitates parallel processing across channels while ensuring an accurate final calculation result.
Fig. 20 shows another exemplary implementation of steps S121 to S122. In fig. 20, the modulation parameters are recombined into a two-dimensional matrix of size (k × h) × (k × w), in which each datum has the same position as its corresponding pixel point in the rearranged first feature map; the convolution kernel in the weight parameters remains of size k × k, and the sliding step is also k.
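A sketch of this fig. 20 variant's regrouping, assuming the modulation parameters are stored as an (h, w, k*k) array. The strided assignment places each modulation value at the same position as its pixel point in the rearranged (k × h) × (k × w) feature map, so an element-level multiplication followed by the original k × k kernel applied with stride k yields the second feature map.

```python
import numpy as np

def modulation_plane(modulation, k):
    """modulation: (h, w, k*k). Returns a (k*h, k*w) matrix whose entries sit at the
    same positions as their corresponding pixels in the rearranged feature map."""
    h, w = modulation.shape[:2]
    plane = np.zeros((k * h, k * w))
    for j in range(k * k):
        r, s = j // k, j % k
        plane[r::k, s::k] = modulation[:, :, j]
    return plane
```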
Exemplary computer program product and computer-readable storage medium
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the image processing method according to various embodiments of the present disclosure described in the "exemplary methods" section above of this specification.
The computer program product may write program code for carrying out operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in an image processing method according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, or configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to."
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (18)

1. An image processing apparatus comprising:
a warping transformation module;
a convolution calculation module;
a memory configured to store offset parameters and weight parameters of the deformable convolution;
a controller configured to control the warping transformation module to perform warping transformation processing on an input image using the offset parameters of the deformable convolution, and to control the convolution calculation module to perform a convolution operation on an output result of the warping transformation module using the weight parameters, so as to complete a deformable convolution operation on the input image.
2. The image processing apparatus according to claim 1,
the warping transformation module is configured to correct the input image according to the offset parameter of the deformable convolution so as to obtain a first feature map;
the convolution calculation module is configured to perform convolution operation on the first feature map by using the weight parameter of the deformable convolution so as to obtain a second feature map;
the memory is further configured to store the second profile.
3. The image processing apparatus of claim 2, wherein the warping transformation module comprises at least one warping transformation unit comprising:
a coordinate positioning unit configured to determine, according to the position coordinates of a pixel point in the first feature map and the corresponding offset vector in the offset parameters, the sampling point coordinates of that pixel point in the input image;
a data extraction unit configured to acquire the pixel values of the corresponding reference pixel points according to the sampling point coordinates; and
a bilinear interpolation unit configured to perform a bilinear interpolation operation on the pixel values of the reference pixel points to obtain the pixel value of the pixel point in the first feature map.
4. The image processing apparatus according to claim 3, wherein the coordinate positioning unit includes:
a first adder configured to sum the height dimension coordinate of the pixel point in the first feature map and the height component of the offset vector, and output the height dimension coordinate of the sampling point; and
a second adder configured to sum the width dimension coordinate of the pixel point in the first feature map and the width component of the offset vector, and output the width dimension coordinate of the sampling point.
5. The image processing apparatus according to claim 3, wherein the data extraction unit includes:
a data positioning unit configured to determine the coordinates of the reference pixel points according to the sampling point coordinates;
an address generating unit configured to determine the data storage addresses of the reference pixel points according to the coordinates of the reference pixel points; and
a reading unit configured to acquire the pixel values of the reference pixel points from the memory according to the data storage addresses of the reference pixel points and send them to the bilinear interpolation unit.
6. The image processing apparatus according to claim 5, wherein the data positioning unit includes:
a first shifter configured to shift the height dimension coordinate of the sampling point to the right by a predetermined number of bits and output a first height dimension coordinate;
a third adder configured to add 1 to the first height dimension coordinate and output a second height dimension coordinate;
a second shifter configured to shift the width dimension coordinate of the sampling point to the right by a predetermined number of bits and output a first width dimension coordinate;
a fourth adder configured to add 1 to the first width dimension coordinate and output a second width dimension coordinate;
and the first height dimension coordinate, the second height dimension coordinate, the first width dimension coordinate and the second width dimension coordinate are the coordinates of the reference pixel point.
7. The image processing apparatus according to claim 3, wherein the bilinear interpolation unit includes:
a first multiply-add subunit configured to calculate a first intermediate value using the pixel values of two reference pixel points having the same width dimension coordinate among the reference pixel points;
a second multiply-add subunit configured to calculate a second intermediate value using the pixel values of two reference pixel points having the same height dimension coordinate among the reference pixel points; and
a third multiply-add subunit configured to calculate the pixel value of the pixel point in the first feature map using the first intermediate value and the second intermediate value;
wherein each of the first multiply-add subunit, the second multiply-add subunit, and the third multiply-add subunit comprises a subtractor, a multiplier, and an adder which are sequentially connected.
8. The image processing apparatus according to claim 7,
the subtractor of the first multiply-add subunit is configured to perform a subtraction operation on the pixel values of the reference pixel points having the same width dimension coordinate to obtain a first difference value;
the multiplier of the first multiply-add subunit is configured to perform a multiplication operation on the first difference value and the bits of the width dimension coordinate of the sampling point that are below a bit-number threshold, to obtain a first product value; and
the adder of the first multiply-add subunit is configured to perform an addition operation on the first product value and the pixel value of the reference pixel point whose height dimension coordinate comes first among the reference pixel points having the same width dimension coordinate, to obtain the first intermediate value.
9. The image processing apparatus according to claim 7,
the subtractor of the second multiply-add subunit is configured to perform a subtraction operation on the pixel values of the reference pixel points having the same height dimension coordinate to obtain a second difference value;
the multiplier of the second multiply-add subunit is configured to perform a multiplication operation on the second difference value and the bits of the width dimension coordinate of the sampling point that are below the bit-number threshold, to obtain a second product value; and
the adder of the second multiply-add subunit is configured to perform an addition operation on the second product value and the pixel value of the reference pixel point whose width dimension coordinate comes first among the reference pixel points having the same height dimension coordinate, to obtain the second intermediate value.
10. The image processing apparatus according to claim 7,
the subtractor of the third multiply-add subunit is configured to perform a subtraction operation on the first intermediate value and the second intermediate value to obtain a third difference value;
the multiplier of the third multiply-add subunit is configured to perform a multiplication operation on the third difference value and the height dimension coordinate of the sampling point to obtain a third product value; and
the adder of the third multiply-add subunit is configured to perform an addition operation on the third product value and the first intermediate value to obtain the pixel value of the pixel point in the first feature map.
11. The image processing apparatus according to claim 1,
the memory is further configured to store modulation parameters of the deformable convolution;
the convolution calculation module is further configured to perform element-level multiplication operation on the modulation parameter and the first feature map, and perform convolution operation using the result of the multiplication operation and the weight parameter to obtain a second feature map.
12. An image processing method comprising:
correcting the input image according to the offset parameter of the deformable convolution to obtain a first feature map;
performing a convolution operation on the first feature map using the weight parameters of the deformable convolution to obtain a second feature map;
and storing the second feature map.
13. The method of claim 12, wherein correcting the input image according to the offset parameters of the deformable convolution comprises:
recombining the offset vectors in the offset parameters according to the corresponding positions of the offset vectors in the convolution kernels in the weight parameters to obtain k1 × k2 first matrices of size h × w; and
performing a warping transformation process on data in the input image using each of the first matrices to obtain the first feature map;
wherein k2 and k1 are the width dimension and the height dimension of the convolution kernel in the weight parameter, h is the height dimension of the first feature map, and w is the width dimension of the first feature map.
14. The method of claim 12, wherein the performing a convolution operation on the first feature map using weight parameters of a deformable convolution comprises:
splitting the convolution kernel of the weight parameters in the deformable convolution into convolution kernels with the size of 1 × 1; and
performing a convolution operation on the first feature map using the split convolution kernel.
15. The method of claim 12, wherein,
the weight parameters comprise at least one convolution kernel, the width dimension of the convolution kernel is k2, the height dimension of the convolution kernel is k1, and the number of channels is c;
the performing a convolution operation on the first feature map using the weight parameters of the deformable convolution includes:
rearranging the data in the first feature map according to the position, in the convolution kernel, of the data corresponding to each datum in the first feature map, so as to obtain a third feature map with a width dimension of k2 × w, a height dimension of k1 × h, and c channels; and
performing a convolution operation using the third feature map and the weight parameter.
16. The method of claim 12, wherein the performing a convolution operation on the first feature map using weight parameters of a deformable convolution comprises:
performing an element-level multiplication operation on the modulation parameters of the deformable convolution and the first feature map; and
performing a convolution operation using a result of the element-level multiplication operation and the weight parameter to obtain a second feature map.
17. The method of claim 16, wherein performing an element-level multiplication operation on the modulation parameters of the deformable convolution and the first profile comprises:
recombining the data in the modulation parameters according to the corresponding positions of the data in the convolution kernels in the weight parameters to obtain k1 × k2 second matrices of size h × w; and
performing a point-by-point multiplication operation on each second matrix and data on the corresponding channel dimension in the first feature map;
wherein k2 and k1 are the width dimension and the height dimension of the convolution kernel in the weight parameter, h is the height dimension of the first feature map, and w is the width dimension of the first feature map.
18. A computer-readable storage medium storing a computer program for executing the image processing method according to any one of claims 12 to 17.
CN201910935478.6A 2019-09-29 2019-09-29 Image processing method and device, and computer readable storage medium Pending CN112580675A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910935478.6A CN112580675A (en) 2019-09-29 2019-09-29 Image processing method and device, and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN112580675A (en) 2021-03-30

Family

ID=75110756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910935478.6A Pending CN112580675A (en) 2019-09-29 2019-09-29 Image processing method and device, and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112580675A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1149952A (en) * 1994-06-03 1997-05-14 德国Idt国际数字技术有限公司 Apparatus and method for decoding video images
CN107292319A (en) * 2017-08-04 2017-10-24 广东工业大学 The method and device that a kind of characteristic image based on deformable convolutional layer is extracted
CN108764164A (en) * 2018-05-30 2018-11-06 华中科技大学 A kind of method for detecting human face and system based on deformable convolutional network
CN109034373A (en) * 2018-07-02 2018-12-18 鼎视智慧(北京)科技有限公司 The parallel processor and processing method of convolutional neural networks
CN109862208A (en) * 2019-03-19 2019-06-07 深圳市商汤科技有限公司 Method for processing video frequency, device and computer storage medium
CN109886400A (en) * 2019-02-19 2019-06-14 合肥工业大学 The convolutional neural networks hardware accelerator system and its calculation method split based on convolution kernel
CN110210571A (en) * 2019-06-10 2019-09-06 腾讯科技(深圳)有限公司 Image-recognizing method, device, computer equipment and computer readable storage medium


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023279739A1 (en) * 2021-07-09 2023-01-12 上海商汤智能科技有限公司 Image processing method and apparatus, and electronic device and storage medium
CN113657587A (en) * 2021-08-17 2021-11-16 上海大学 FPGA-based deformable convolution acceleration method and device
CN113657587B (en) * 2021-08-17 2023-09-26 上海大学 Deformable convolution acceleration method and device based on FPGA
CN114463592A (en) * 2022-04-01 2022-05-10 深圳鲲云信息科技有限公司 Quantitative calculation method and device applied to depthwise convolution
CN114463592B (en) * 2022-04-01 2022-07-22 深圳鲲云信息科技有限公司 Quantitative calculation method and device applied to depthwise convolution


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination