WO2019128726A1 - Image processing method and apparatus - Google Patents

Image processing method and apparatus

Info

Publication number
WO2019128726A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
pixels
feature maps
height
width
Prior art date
Application number
PCT/CN2018/120830
Other languages
English (en)
French (fr)
Inventor
杨帆 (Yang Fan)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2019128726A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/01: Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0117: Conversion of standards involving conversion of the spatial resolution of the incoming video signal

Definitions

  • the present application relates to the field of image processing, and more particularly to an image processing method and apparatus in the field of image processing.
  • CNN: convolutional neural network
  • the present application provides an image processing method and apparatus capable of performing non-integer-multiple upsampling or non-integer-multiple downsampling of an image.
  • the present application provides an image processing method, the method comprising:
  • performing B-fold downsampling on an acquired first image by a first image processing layer of a convolutional neural network to obtain a second image, the convolutional neural network including a plurality of image processing layers, the plurality of image processing layers including the first image processing layer, where B is an integer greater than 1;
  • performing A-fold upsampling on the second image by a second image processing layer of the plurality of image processing layers to obtain a third image, where A is an integer greater than 1 and A is not equal to B.
  • the image processing method provided by the embodiment of the present application performs B-fold downsampling on the acquired first image by the first image processing layer of the convolutional neural network to obtain a second image, and then performs A-fold upsampling on the second image by the second image processing layer of the convolutional neural network to obtain a third image, which achieves non-integer-multiple upsampling or non-integer-multiple downsampling of the first image.
  • because the downsampling is performed first and the upsampling is performed afterwards, the amount of data processed by the convolutional neural network is reduced, thereby reducing the computational complexity of image processing and improving image processing efficiency.
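  • the downsample-by-B-then-upsample-by-A scheme above can be sketched as follows; this is an illustrative example by the editor, not part of the application, and the function name is hypothetical:

```python
# Illustrative sketch (editor's example, not from the application): achieving a
# rational resampling factor A/B by B-fold downsampling followed by A-fold
# upsampling. `resampled_size` is a hypothetical helper name.
def resampled_size(h, w, a, b):
    """Height/width after B-fold downsampling then A-fold upsampling."""
    assert h % b == 0 and w % b == 0, "H and W must be divisible by B"
    return (h // b * a, w // b * a)

# Downsampling first keeps the intermediate tensor small: with A=2, B=3 the
# upsampling stage processes only 1/9 of the original pixels.
print(resampled_size(720, 1080, 2, 3))  # (480, 720)
```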
  • the size of an image may include multiple dimensions. when the image is two-dimensional, its size includes height and width; when the image is three-dimensional, its size includes width, height, and depth.
  • a pixel is the most basic element that makes up an image and is a logical unit of measure.
  • the height of an image can be understood as the number of pixels the image includes in the height direction; the width of an image can be understood as the number of pixels the image includes in the width direction; the depth of an image can be understood as the number of channels of the image.
  • the depth of an image can also be understood as the number of feature maps included in the image, where the width and height of any one feature map of the image are the same as the width and height of the other feature maps of the image. that is to say, an image is a three-dimensional image that can be understood as being composed of a plurality of two-dimensional feature maps of the same size.
  • the first image includes M first feature maps, the height of each of the M first feature maps is H pixels, the width of each first feature map is W pixels, H and W are integers greater than 1, and M is an integer greater than 0;
  • the performing B-fold downsampling on the first image by the first image processing layer of the convolutional neural network to obtain the second image includes: dividing, by the first image processing layer, each first feature map into (H×W)/B² image blocks that do not overlap each other, where the height of each image block is B pixels and the width of each image block is B pixels; and obtaining B² second feature maps based on the (H×W)/B² image blocks, where the height of each of the B² second feature maps is H/B pixels, the width of each second feature map is W/B pixels, and each pixel in each second feature map is taken from a different image block.
  • the first image includes M first feature maps, wherein each of the first feature maps has a height of H pixels, and each of the first feature maps has a width of W pixels.
  • the first image is a three-dimensional image whose size is H×W×M, that is, the height of the first image is H pixels, the width is W pixels, and the depth is M feature maps; in other words, the three-dimensional first image includes M two-dimensional first feature maps of size H×W.
  • the first image may be an originally acquired image to be processed, or a preprocessed image, or an image obtained after processing by other image processing layers in the convolutional neural network, or an image obtained after processing by another image processing device, which is not limited in this embodiment of the present application.
  • the first image may be obtained in a plurality of different manners in the embodiment of the present application, which is not limited by the embodiment of the present application.
  • when the first image is the originally acquired image to be processed, the first image may be acquired from an image capturing device; when the first image is an image processed by other image processing layers in the convolutional neural network, the first image output by those image processing layers may be acquired; when the first image is an image processed by another image processing device, the first image output by that device may be acquired.
  • for each first feature map, B² second feature maps can be obtained; performing B-fold downsampling on all M first feature maps thus yields M×B² second feature maps, that is, the second image.
  • the second image includes M×B² second feature maps, the height of each second feature map in the M×B² second feature maps is H/B pixels, and the width of each second feature map is W/B pixels. it can be understood that the second image is a three-dimensional image of size (H/B)×(W/B)×(M×B²), that is, its height is H/B pixels, its width is W/B pixels, and its depth is M×B² feature maps; in other words, the three-dimensional second image includes M×B² two-dimensional second feature maps.
  • the position of each pixel in a second feature map is associated with the position, in the first feature map, of the image block to which that pixel belongs; that is, the relative position of each pixel in the second feature map is the same as the relative position, in the first feature map, of the image block to which the pixel belongs.
  • the image processing method provided by the embodiment of the present application obtains B² second feature maps by splitting each first feature map and rearranging its pixels, thereby achieving B-fold downsampling of the first image.
  • the B² second feature maps include all the pixels of the first feature map, that is, all the image information in each first feature map is retained in the B² second feature maps; and the relative position between the pixels of each second feature map is determined by the relative position, in the first feature map, of the image blocks to which the pixels belong, so each second feature map obtained from a first feature map is a thumbnail of that first feature map.
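  • the pixel rearrangement described above can be sketched as a space-to-depth operation; the following NumPy code is an illustrative sketch by the editor (the function name and the (M, H, W) array layout are assumptions, not part of the application):

```python
import numpy as np

# Illustrative sketch of the pixel rearrangement described above (often called
# space-to-depth): each H x W map is split into non-overlapping B x B blocks,
# and pixels at the same position within each block form one output map.
def space_to_depth(x, b):
    m, h, w = x.shape                       # M feature maps of size H x W
    assert h % b == 0 and w % b == 0
    x = x.reshape(m, h // b, b, w // b, b)  # expose the B x B block structure
    x = x.transpose(0, 2, 4, 1, 3)          # (M, B, B, H/B, W/B)
    return x.reshape(m * b * b, h // b, w // b)

x = np.arange(16, dtype=np.float32).reshape(1, 4, 4)
y = space_to_depth(x, 2)
print(y.shape)  # (4, 2, 2)
```

each of the B² = 4 output maps is a thumbnail of the input, and no pixel is lost.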
  • the first image includes M first feature maps, the height of each of the M first feature maps is H pixels, the width of each first feature map is W pixels, H and W are integers greater than 1, and M is an integer greater than 0;
  • the performing B-fold downsampling on the first image by the first image processing layer of the convolutional neural network to obtain the second image includes: performing a convolution operation on the M first feature maps by the first image processing layer to obtain the second image, where the convolution step of the convolution operation in both the width direction and the height direction is B, and the convolution operation uses N convolution kernels.
  • the height of each of the N convolution kernels is K pixels, the width of each convolution kernel is J pixels, and the depth of each convolution kernel is M feature maps; each first feature map is padded with a height boundary of P pixels and a width boundary of P pixels.
  • the second image includes N second feature maps, the height of each of the N second feature maps is ⌊(H−K+2P)/B⌋+1 pixels, and the width of each second feature map is ⌊(W−J+2P)/B⌋+1 pixels, where N is an integer greater than 0, P is an integer greater than or equal to 0, and J and K are greater than or equal to B.
  • a convolution kernel is a filter for extracting a feature map of an image.
  • the dimensions of the convolution kernel include width, height, and depth, where the depth of the convolution kernel is the same as the depth of the input image. the number of different convolution kernels applied to an input image determines the number of different feature maps that can be extracted.
  • the same image may be convolved multiple times with convolution kernels of different sizes, different weight values, or different convolution steps to extract different features of the image.
  • the convolution step is the distance the convolution kernel slides between two adjacent convolution operations, in the height direction and in the width direction, as it slides over the feature map of the input image to extract features.
  • the convolution step can determine the downsampling magnification of the input image.
  • setting the convolution step in the width (or height) direction to B downsamples the input feature map by a factor of B in the width (or height) direction.
  • the convolutional layer primarily serves to extract features.
  • the convolution operation is performed on the input image mainly according to the set convolution kernel.
  • the K×K image block covered by the convolution kernel as it slides over the image is dot-multiplied with the convolution kernel: the gray value of each point of the image block is multiplied by the weight value at the same position in the convolution kernel, giving K×K products in total; the products are summed and the offset is added to obtain a single result, which is one pixel of the output image.
  • the coordinate position of the pixel on the output image corresponds to the coordinate position of the center of the image block on the input image.
  • when the input image is three-dimensional, the convolution kernel also needs to be three-dimensional, and the third dimension (depth, i.e., the number of feature maps) of the convolution kernel is the same as the third dimension of the three-dimensional image.
  • the convolution of a three-dimensional image with a three-dimensional convolution kernel can be decomposed, along the depth dimension (the number of image channels or feature maps), into multiple two-dimensional convolutions between two-dimensional feature maps and two-dimensional kernel slices; the results are accumulated over the depth dimension, finally yielding one two-dimensional output image.
  • the output image of the convolutional layer usually includes a plurality of feature maps.
  • one three-dimensional convolution kernel processes the three-dimensional input image to obtain one two-dimensional output feature map, so obtaining a plurality of output feature maps requires a plurality of three-dimensional convolution kernels.
  • the dimension of the set of convolution kernels is therefore one greater than the dimension of the input image, and the value of the added dimension corresponds to the depth of the output image, that is, the number of feature maps included in the output image.
  • the convolution operation can be performed in a padding mode or a non-padding mode.
  • padding can be understood as an image preprocessing operation, and the padding mode includes a same padding mode and a valid padding mode.
  • the same padding mode refers to adding equal boundaries to the width and height of the input image and convolving the image after the boundaries are added, where a boundary refers to pixels added outside the edges of the input image.
  • for example, the size of the input image is 5×5×2. when the convolution operation is performed in the same padding mode with a height boundary of 1 pixel and a width boundary of 1 pixel, a 7×7×2 image is obtained, and the convolution operation is then performed on the 7×7×2 image.
  • in the same padding mode with a convolution step of 1, the width boundary filled on the input image is (width of the convolution kernel − 1)/2 and the height boundary filled is (height of the convolution kernel − 1)/2.
  • for example, when the size of the convolution kernel is 3×3, the height boundary and width boundary of the input image are both 1 pixel; when the size of the convolution kernel is 5×5, they are both 2 pixels; and when the size of the convolution kernel is 7×7, they are both 3 pixels, although this embodiment does not limit this.
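  • the same-padding rule above can be sketched in a few lines; this is an illustrative example by the editor (the function name is hypothetical):

```python
# Sketch of the same-padding rule quoted above for a stride-1 convolution:
# the boundary filled on each side is (kernel_size - 1) / 2 for odd kernels.
def same_padding(kernel_size):
    assert kernel_size % 2 == 1, "the rule assumes an odd kernel size"
    return (kernel_size - 1) // 2

print([same_padding(k) for k in (3, 5, 7)])  # [1, 2, 3]
```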
  • assuming that the width (or height) of the input feature map is W, the width (or height) of the convolution kernel is F, the convolution step is S, the convolution operation is performed in the same padding mode, and the width (or height) boundary filled on the input feature map is P, the resulting width (or height) of the output feature map can be expressed as ⌊(W−F+2P)/S⌋+1, where W, F, and S are integers greater than 0, P is an integer greater than or equal to 0, and ⌊·⌋ denotes rounding down.
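  • the output-size formula for a convolution can be checked with a short sketch; this is an illustrative example by the editor (the function name is hypothetical):

```python
# Sketch of the standard convolution output-size formula:
# output = floor((W - F + 2P) / S) + 1.
def conv_out_size(w, f, s, p):
    return (w - f + 2 * p) // s + 1  # floor division for non-negative ints

print(conv_out_size(5, 3, 1, 1))  # 5 (same padding, stride 1: size preserved)
print(conv_out_size(8, 3, 2, 1))  # 4 (stride 2 roughly halves the size)
```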
  • the first image is convolved by the first image processing layer of the convolutional neural network, where the convolution step of the convolution operation in the width direction and the height direction is B, the convolution operation uses N convolution kernels, each convolution kernel has a height of K pixels, a width of J pixels, and a depth of M feature maps, and the first image is padded with a height boundary of P pixels and a width boundary of P pixels; this achieves B-fold downsampling of the first image.
  • J and K are greater than or equal to B, enabling the convolution kernel to cover at least once each pixel of the first image during the convolution process, that is, to retain all the image information of the first image.
  • M, N, and B satisfy the following formula: N ≥ M×B/2.
  • when the convolution operation causes part of the image information of the first image to be lost, the loss can be better compensated by increasing the number of feature maps included in the output image; that is, increasing the depth of the second image subject to the condition N ≥ M×B/2 can compensate for the image information lost from the first image.
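  • downsampling by a strided convolution, as described above, can be sketched as follows; this is a naive illustrative example by the editor, not the application's implementation:

```python
import numpy as np

# Naive illustrative sketch (not an optimized implementation) of B-fold
# downsampling by a strided convolution: N kernels of size M x K x K, stride B,
# no padding (P = 0), with K >= B so every input pixel is covered.
def strided_conv(x, kernels, b):
    m, h, w = x.shape
    n, _, k, _ = kernels.shape              # kernels: (N, M, K, K)
    oh, ow = (h - k) // b + 1, (w - k) // b + 1
    out = np.zeros((n, oh, ow))
    for o in range(n):
        for i in range(oh):
            for j in range(ow):
                patch = x[:, i * b:i * b + k, j * b:j * b + k]
                out[o, i, j] = np.sum(patch * kernels[o])
    return out

x = np.ones((1, 4, 4))                      # M=1 feature map, 4 x 4
kernels = np.ones((2, 1, 2, 2))             # N=2 kernels, K=2, stride B=2
y = strided_conv(x, kernels, 2)             # N=2 satisfies N >= M*B/2 = 1
print(y.shape)  # (2, 2, 2)
```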
  • the first image includes M first feature maps, the height of each of the M first feature maps is H pixels, the width of each first feature map is W pixels, H and W are integers greater than 1, and M is an integer greater than 0;
  • the performing B-fold downsampling on the first image by the first image processing layer of the convolutional neural network to obtain the second image includes: performing a pooling operation on each of the M first feature maps by the first image processing layer to obtain the second image, where the pooling step in both the width direction and the height direction is B, the height of the pooling kernel of the pooling operation is B pixels, and the width of the pooling kernel is B pixels; the second image includes M second feature maps, the height of each second feature map is H/B pixels, and the width of each second feature map is W/B pixels, where H/B and W/B are integers.
  • two common pooling operations are average pooling and max pooling; both operate on the width and height of the feature map and do not affect the depth of the output feature map.
  • the average pooling operation computes the average value within each region covered by the pooling kernel as it slides.
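  • the pooling-based downsampling described above can be sketched as follows; this is an illustrative example by the editor (the function name and array layout are assumptions):

```python
import numpy as np

# Illustrative sketch of B-fold downsampling by pooling: a B x B pooling
# kernel with stride B; the depth (number of maps) is unchanged.
def pool(x, b, mode="max"):
    m, h, w = x.shape
    assert h % b == 0 and w % b == 0
    blocks = x.reshape(m, h // b, b, w // b, b)   # B x B regions
    if mode == "max":
        return blocks.max(axis=(2, 4))            # max pooling
    return blocks.mean(axis=(2, 4))               # average pooling

x = np.arange(16, dtype=np.float32).reshape(1, 4, 4)
print(pool(x, 2, "max").shape)      # (1, 2, 2)
print(pool(x, 2, "mean")[0, 0, 0])  # 2.5 (average of the top-left 2 x 2 block)
```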
  • the image processing method provided by the embodiment of the present application performs B-fold downsampling on the first image by the pooling layer, which reduces the data amount of the feature maps, thereby reducing the computational complexity of the convolutional neural network and its cache bandwidth requirement.
  • with reference to any one of the first to fourth possible implementation manners of the first aspect, in a fifth possible implementation manner of the first aspect, A and B are coprime.
  • compared with the case where A and B have a common factor, coprime A and B can better ensure the integrity of the image information of the first image and the continuity of the image texture information.
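  • the coprimality condition on A and B can be stated in one line; this is an illustrative note by the editor:

```python
import math

# A and B are coprime exactly when gcd(A, B) == 1; then the overall scaling
# factor A/B is already in lowest terms.
def is_coprime(a, b):
    return math.gcd(a, b) == 1

print(is_coprime(2, 3), is_coprime(4, 6))  # True False
```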
  • the performing, by the convolutional neural network, A-fold upsampling on the second image to obtain the third image includes: performing first processing on the second image by the convolutional neural network to obtain a fourth image, and performing A-fold upsampling on the fourth image to obtain the third image.
  • the first processing is an operation that is neither upsampling nor downsampling, such as a convolution operation with a convolution step of 1.
  • the present application provides an image processing method, the method comprising:
  • performing A-fold upsampling on an acquired first image by a first image processing layer of a convolutional neural network to obtain a second image, the convolutional neural network comprising a plurality of image processing layers, the plurality of image processing layers including the first image processing layer, where A is an integer greater than 1;
  • performing B-fold downsampling on the second image by a second image processing layer of the plurality of image processing layers to obtain a third image, where B is an integer greater than 1 and A is not equal to B.
  • the image processing method provided by the embodiment of the present application performs A-fold upsampling on the acquired first image by the first image processing layer of the convolutional neural network to obtain a second image, and then performs B-fold downsampling on the second image by the second image processing layer of the convolutional neural network to obtain a third image, which can achieve non-integer-multiple upsampling or non-integer-multiple downsampling of the first image.
  • the second image includes M second feature maps, the height of each of the M second feature maps is H pixels, the width of each second feature map is W pixels, H and W are integers greater than 1, and M is an integer greater than 0;
  • the performing B-fold downsampling on the second image by the second image processing layer of the convolutional neural network to obtain the third image includes: dividing, by the second image processing layer, each second feature map into (H×W)/B² image blocks that do not overlap each other, where the height of each image block is B pixels and the width of each image block is B pixels; and obtaining B² third feature maps based on the (H×W)/B² image blocks, where the height of each of the B² third feature maps is H/B pixels, the width of each third feature map is W/B pixels, and each pixel in each third feature map is taken from a different image block.
  • the second image includes M second feature maps, where the height of each second feature map is H pixels, the width of each second feature map is W pixels, H and W are integers greater than 1, and M is an integer greater than 0;
  • the performing B-fold downsampling on the second image by the second image processing layer of the convolutional neural network to obtain the third image includes: performing a convolution operation on the M second feature maps by the second image processing layer to obtain the third image, where the convolution step of the convolution operation in both the width direction and the height direction is B, the convolution operation uses N convolution kernels, the height of each of the N convolution kernels is K pixels, and the width of each convolution kernel is J pixels.
  • the third image includes N third feature maps, the height of each of the N third feature maps is ⌊(H−K+2P)/B⌋+1 pixels, and the width of each third feature map is ⌊(W−J+2P)/B⌋+1 pixels, where N is an integer greater than 0, P is an integer greater than or equal to 0, and J and K are greater than or equal to B.
  • M, N, and B satisfy the following formula: N ≥ M×B/2.
  • the second image includes M second feature maps, where the height of each second feature map is H pixels, the width of each second feature map is W pixels, H and W are integers greater than 1, and M is an integer greater than 0; the performing B-fold downsampling on the second image by the second image processing layer of the convolutional neural network to obtain the third image includes: performing a pooling operation on each of the M second feature maps by the second image processing layer to obtain the third image, where the pooling step in both the width direction and the height direction is B, the height of the pooling kernel of the pooling operation is B pixels, and the width of the pooling kernel is B pixels; the third image includes M third feature maps, the height of each third feature map is H/B pixels, and the width of each third feature map is W/B pixels, where (H×W)/B², H/B, and W/B are integers.
  • with reference to any one of the first to fourth possible implementation manners of the second aspect, in a fifth possible implementation manner of the second aspect, A and B are coprime.
  • the present application provides an image processing apparatus for performing the method of any of the above first aspect or any possible implementation of the first aspect.
  • the present application provides an image processing apparatus for performing the method of any of the above-described second aspect or any possible implementation of the second aspect.
  • the present application provides an image processing apparatus including: a memory, a processor, a communication interface, and a computer program stored on the memory and operable on the processor, wherein the processor
  • the method of any of the above-described first aspects or any of the possible implementations of the first aspect is performed when the computer program is executed.
  • the present application provides an image processing apparatus including: a memory, a processor, a communication interface, and a computer program stored on the memory and operable on the processor, wherein the processor
  • the method of any of the above-described second aspect or any of the possible implementations of the second aspect is performed when the computer program is executed.
  • the application provides a computer readable medium for storing a computer program, the computer program comprising instructions for performing the method of the first aspect or any of the possible implementations of the first aspect.
  • the present application provides a computer readable medium for storing a computer program, the computer program comprising instructions for performing the method of any of the second aspect or any of the possible implementations of the second aspect.
  • the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect or any possible implementation of the first aspect.
  • the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any of the above-described second aspect or any of the possible implementations of the second aspect.
  • the present application provides a chip, including: an input interface, an output interface, at least one processor, and a memory, where the input interface, the output interface, the processor, and the memory communicate with each other through an internal connection path; the processor is configured to execute code in the memory, and when the code is executed, the processor is configured to perform the method of the first aspect or any possible implementation of the first aspect.
  • the present application provides a chip, including: an input interface, an output interface, at least one processor, and a memory, where the input interface, the output interface, the processor, and the memory communicate with each other through an internal connection path; the processor is configured to execute code in the memory, and when the code is executed, the processor is configured to perform the method of the second aspect or any possible implementation of the second aspect.
  • Figure 1 is a schematic illustration of the height, width and depth of a three-dimensional image
  • FIG. 2 is a schematic diagram of a convolution layer implementing a convolution operation process
  • FIG. 3 is a schematic diagram of a pooling layer implementing a pooling operation process
  • FIG. 4 is a schematic diagram of a sub-pixel convolution layer implementing a sub-pixel convolution operation process
  • FIG. 5 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of an image processing method according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of a downsampling process provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a process of performing convolution operations using convolution kernels of different sizes according to an embodiment of the present application
  • FIG. 9 is a schematic block diagram of an image processing apparatus according to an embodiment of the present application.
  • FIG. 10 is a schematic block diagram of another image processing apparatus provided by an embodiment of the present application.
  • a pixel is the most basic element that makes up an image and is a logical unit of measure.
  • the size of the image may include multiple dimensions. When the dimension of the image is two-dimensional, the size of the image includes height and width; when the dimension of the image is three-dimensional, the size of the image includes width, height, and depth.
  • the height of an image can be understood as the number of pixels that the image includes in the height direction; the width of the image can be understood as the number of pixels that the image includes in the width direction; the depth of the image can be understood as the number of channels of the image. .
  • the depth of an image can be understood as the number of feature maps included in the image, wherein the width and height of any one of the feature maps of the image are the same as the width of other feature maps of the image. Same height.
  • one image is a three-dimensional image
  • the three-dimensional image is composed of a plurality of two-dimensional feature images, and the plurality of two-dimensional feature images have the same size.
  • one image includes M feature maps, and each of the M feature maps has a height of H pixels, and each feature map has a width of W pixels, which can be understood as the image is a three-dimensional image.
  • the size of the three-dimensional image is H ⁇ W ⁇ M, that is, the three-dimensional image includes M two-dimensional feature maps of H ⁇ W.
  • H and W are integers greater than 1
  • M is an integer greater than zero.
  • Figure 1 shows a 5×5×3 image, which includes 3 feature maps (such as a red (R) feature map, a green (G) feature map, and a blue (B) feature map).
  • the size of each feature map is 5 ⁇ 5.
  • the feature maps of different colors can be understood as different channels of the image, and different channels can be regarded as different feature maps in the convolutional neural network.
  • FIG. 1 is described by taking an image with a depth of 3 as an example only; the depth of the image may also be other values, for example, the depth of a grayscale image is 1, and the depth of an RGB-depth (RGB-D) image is 4, which is not limited in this embodiment of the present application.
  • the resolution of an image (or feature map) can be understood as the product of its width and height, that is, if the height of the image (or feature map) is H pixels and its width is W pixels, its resolution is H×W.
  • a convolution kernel is a filter that extracts a feature map of an image.
  • the dimensions of the convolution kernel include width, height, and depth, where the depth of the convolution kernel is the same as the depth of the input image. the number of different convolution kernels applied to an input image determines the number of different feature maps that can be extracted.
  • for example, performing a convolution operation on a 7×7×3 input image with one 5×5×3 convolution kernel yields one output feature map, and using a plurality of different 5×5×3 convolution kernels yields a plurality of different output feature maps.
  • the same image may be convolved multiple times with convolution kernels of different sizes, different weight values, or different convolution steps to extract different features of the image.
  • the convolution step is the distance the convolution kernel slides between two adjacent convolution operations, in the height direction and in the width direction, as it slides over the feature map of the input image.
  • The convolution step can determine the downsampling magnification of the input image; for example, setting the convolution step in the width (or height) direction to B downsamples the input feature map by a factor of B in the width (or height) direction, where B is an integer greater than 1.
  • the convolutional layer plays a major role in extracting features.
  • the convolution operation is performed on the input image mainly according to the set convolution kernel.
  • Specifically, the K × K image block covered by the convolution kernel as it slides on the image is dot-multiplied with the convolution kernel: the gray value of each point on the image block is multiplied by the weight value at the same position on the convolution kernel, yielding K × K results in total; these results are summed and the offset (bias) is added to obtain one value, which is output as a single pixel of the output image. The coordinate position of that pixel on the output image corresponds to the coordinate position of the center of the image block on the input image.
  • When the input image is three-dimensional, the convolution kernel also needs to be three-dimensional, and the third dimension (depth) of the convolution kernel is the same as the third dimension (depth, or number of feature maps) of the three-dimensional image. The convolution operation between a three-dimensional image and a three-dimensional convolution kernel can be decomposed, along the depth (the number of image channels or feature maps), into convolution operations between multiple two-dimensional feature maps and two-dimensional convolution kernels; the results are then accumulated along the image depth dimension, finally producing one two-dimensional output image.
  • The output image of the convolutional layer usually includes a plurality of feature maps. Since one three-dimensional convolution kernel processes the three-dimensional input image to obtain one two-dimensional output feature map, obtaining multiple output feature maps requires a plurality of three-dimensional convolution kernels; the dimension of the set of convolution kernels is therefore one larger than the dimension of the input image, and the value of the added dimension corresponds to the depth of the output image, that is, the number of feature maps included in the output image.
  • The convolution operation may be performed in a padding mode or a non-padding mode. The padding mode can be understood as an image preprocessing operation, and padding modes include a same padding mode and a valid padding mode.
  • padding manners in the embodiments of the present application all refer to the same padding manner, but the embodiment of the present application is not limited thereto.
  • The same padding mode refers to adding identical boundaries to the width and the height of the input image and convolving the image after the boundaries are added, where the boundary refers to pixels filled outside the outer edge of the input image.
  • For example, the size of the input image is 5 × 5 × 2; when the convolution operation is performed in the same padding mode with a filled height boundary of 1 pixel and a filled width boundary of 1 pixel, a 7 × 7 × 2 image is obtained, and the convolution operation is then performed on the 7 × 7 × 2 image.
  • Generally, the filled width boundary of the input image is (width of the convolution kernel − 1)/2 and the filled height boundary is (height of the convolution kernel − 1)/2, so that with a convolution step of 1 the output image has the same size as the input image. For example, when the size of the convolution kernel is 3 × 3, the filled height boundary and width boundary of the input image are both 1 pixel; when the size of the convolution kernel is 5 × 5, the filled height boundary and width boundary are both 2 pixels; and when the size of the convolution kernel is 7 × 7, the filled height boundary and width boundary are both 3 pixels, but this embodiment of the present application does not limit this.
  • Assuming that the width (or height) of the input feature map is W, the width (or height) of the convolution kernel is F, the convolution step is S, the convolution operation is performed in the same padding mode, and the filled width (or height) boundary of the input feature map is P, the resulting width (or height) of the output feature map can be expressed as ⌊(W − F + 2P)/S⌋ + 1, where W, F, and S are integers greater than 0, P is an integer greater than or equal to 0, and ⌊·⌋ denotes rounding down.
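The output-size formula above can be checked with a small helper (a minimal sketch; the function name is ours, not from the application):

```python
import math

def conv_output_size(w: int, f: int, s: int, p: int) -> int:
    """Width (or height) of the output feature map: floor((W - F + 2P) / S) + 1."""
    return math.floor((w - f + 2 * p) / s) + 1

# 5x5 input, 3x3 kernel, stride 1, padding 1: output stays 5x5 (same padding).
print(conv_output_size(5, 3, 1, 1))  # 5
# 7x7 input, 3x3 kernel, stride 2, padding 1: output is 4x4.
print(conv_output_size(7, 3, 2, 1))  # 4
```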
  • FIG. 2 shows a process in which a convolution layer performs a convolution operation on an input image.
  • For example, as shown in FIG. 2, the size of the three-dimensional input image is 5 × 5 × 3; after a height boundary and a width boundary of 1 pixel each are filled, a 7 × 7 × 3 image is obtained. The convolution operation has a convolution step of 2 in both the width direction and the height direction and is performed with the convolution kernel w0, whose size is 3 × 3 × 3. The three input feature maps included in the input image (input feature map 1, input feature map 2, and input feature map 3) are convolved with the three depth layers of the convolution kernel (convolution kernel w0-1, convolution kernel w0-2, and convolution kernel w0-3, respectively) to obtain output feature map 1. For example, the first depth layer of w0 (i.e., w0-1) is multiplied element-wise with the corresponding positions in the blue box of input feature map 1 and the products are summed, obtaining 0. The blue box then slides along the width direction and the height direction of each input feature map to continue the next convolution operation, where each sliding distance is 2 (i.e., the convolution step in both the width and height directions is 2), until the convolution operation on the input image is completed and the 3 × 3 × 1 output feature map 1 is obtained. The convolution operation also convolves the input image with another convolution kernel w1; based on a process similar to that of convolution kernel w0, output feature map 2 of size 3 × 3 × 1 can be obtained.
  • the output feature map 1 and the output feature map 2 are also activated by an activation function, and the activated output feature map 1 and the activated output feature map 2 are obtained.
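The strided, padded, depth-accumulating convolution walked through above can be sketched in NumPy (a naive illustration with hypothetical names, not the apparatus itself); it reproduces the 5 × 5 × 3 input, padding of 1, 3 × 3 × 3 kernel, stride-2 case, which yields one 3 × 3 output feature map:

```python
import numpy as np

def conv2d(x, w, b=0.0, stride=2, pad=1):
    """Naive multi-channel convolution: x is (H, W, C), w is (K, K, C).
    Each K x K x C patch is multiplied element-wise with the kernel and
    summed over all channels, matching the per-depth accumulation above."""
    x = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero-fill the boundary
    k = x.shape[0] - x.shape[0] + w.shape[0]         # kernel height/width
    h_out = (x.shape[0] - k) // stride + 1
    w_out = (x.shape[1] - k) // stride + 1
    out = np.zeros((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k, :]
            out[i, j] = np.sum(patch * w) + b
    return out

x = np.arange(5 * 5 * 3, dtype=float).reshape(5, 5, 3)
y = conv2d(x, np.ones((3, 3, 3)), stride=2, pad=1)
print(y.shape)  # (3, 3)
```

Using N such kernels and stacking the results gives the N output feature maps of the convolutional layer.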
  • The pooling layer makes the width and the height of the feature map smaller and reduces the computational complexity of the convolutional neural network by reducing the amount of data in the feature layer; it compresses the features to extract the main features. Two common pooling operations are average pooling and max pooling; both operate on the width and height of the feature map and do not affect the depth of the output feature map.
  • FIG. 3 shows a process in which the pooling layer performs a pooling operation on the input image.
  • For example, as shown in FIG. 3, the input image is a 4 × 4 × 1 image, and a max pooling operation is performed on it through a 2 × 2 pooling kernel with a pooling step of 2: in each area the pooling kernel slides over, the maximum value is taken as one pixel of the output image, and the position of each pixel in the output image corresponds to the position, in the input image, of the area from which that pixel is taken. In this way, the main features are extracted from the input image to obtain the output image. Similarly, the average pooling operation takes the average value in each area the pooling kernel slides over.
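The pooling of FIG. 3 can be sketched as follows (NumPy, hypothetical helper name); the same sliding loop gives either max pooling or average pooling:

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """2D pooling on a single-channel feature map x of shape (H, W)."""
    h_out = (x.shape[0] - size) // stride + 1
    w_out = (x.shape[1] - size) // stride + 1
    out = np.empty((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            win = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = win.max() if mode == "max" else win.mean()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [9, 2, 1, 0],
              [3, 4, 5, 6]], dtype=float)
print(pool2d(x, mode="max"))  # [[7. 8.] [9. 6.]]
```

A 4 × 4 map becomes 2 × 2; the depth of the input is unaffected because pooling is applied per feature map.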
  • the deconvolution layer is also called the transposed convolution layer.
  • By setting the deconvolution step, the upsampling magnification of the input image can be determined; for example, setting the deconvolution step in the width (or height) direction to A upsamples the input feature map by a factor of A in the width (or height) direction, where A is an integer greater than 1.
  • deconvolution operation can be understood as the inverse of the convolution operation as shown in FIG. 2.
  • Assuming that the width (or height) of the input feature map is W, the width (or height) of the convolution kernel is F, the deconvolution step is S, and the cropped width (or height) boundary of the output feature map is P, the width (or height) of the resulting output feature map can be expressed as S × (W − 1) + F − 2P, where W, F, and S are integers greater than 0, and P is an integer greater than or equal to 0.
  • For example, a 5 × 5 × 1 input image is deconvolved using a 3 × 3 × 3 deconvolution kernel: each pixel in the input feature map of the input image is multiplied by each weight of one depth layer of the 3 × 3 deconvolution kernel to obtain a 3 × 3 image block corresponding to that pixel, and the image block is placed on a 7 × 7 × 1 output feature map 1 such that the center position of the image block corresponds to the position of the pixel, the distance between the center positions of two adjacent image blocks being equal to the deconvolution step. The multiple values assigned to each pixel of the output feature map are then accumulated to obtain the final output feature map. Similarly, deconvolving the input feature map with the second depth layer and the third depth layer of the deconvolution kernel yields output feature map 2 and output feature map 3; one pixel is then cropped from the boundary of each of the three feature maps to obtain a 5 × 5 × 3 output image.
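The scatter-and-accumulate deconvolution described above, including the final boundary crop, can be sketched per depth layer (a naive single-channel illustration; names are ours):

```python
import numpy as np

def deconv2d(x, w, stride=1, pad=1):
    """Naive transposed convolution on a single-channel map x (H, W) with
    kernel w (F, F): every input pixel scatters a weighted copy of the
    kernel, overlapping blocks accumulate, then pad pixels are cropped
    per side, giving S*(W-1) + F - 2P output pixels per dimension."""
    h, wd = x.shape
    f = w.shape[0]
    out = np.zeros((stride * (h - 1) + f, stride * (wd - 1) + f))
    for i in range(h):
        for j in range(wd):
            out[i * stride:i * stride + f, j * stride:j * stride + f] += x[i, j] * w
    if pad:
        out = out[pad:-pad, pad:-pad]
    return out

x = np.ones((5, 5))
y = deconv2d(x, np.ones((3, 3)), stride=1, pad=1)
print(y.shape)  # (5, 5): 1*(5-1) + 3 - 2*1 = 5
```

With stride 2 and no crop, a 3 × 3 input becomes 2 × (3 − 1) + 3 = 7 pixels wide, i.e. the layer enlarges the map.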
  • The sub-pixel convolution layer achieves integer-ratio enlargement of the width and height of the input image by integrating multiple feature maps along the depth dimension of the input image. A sub-pixel convolution operation can be understood as a rearrangement and recombination of the data of the feature maps included in the input image. Specifically, the sub-pixel convolution layer rearranges the pixels at the same position in r² feature maps into one r × r image block of the output feature map, so that an input of H × W × r² feature maps is rearranged into an output feature map of rH × rW × 1. Although this transformation is referred to as a sub-pixel convolution, it is not actually a convolution operation; the process only rearranges and combines the pixels of the r² input feature maps of size H × W.
  • For example, as shown in FIG. 4, four 2 × 2 input feature maps are rearranged by the sub-pixel convolution operation of the sub-pixel convolution layer into one output feature map of size 4 × 4. It should be understood that, for clarity of description, the parenthesized number on each pixel in FIG. 4 indicates the number or identity of the pixel, rather than the pixel value of that pixel.
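The sub-pixel rearrangement can be sketched as follows (one consistent pixel ordering is assumed here; the exact ordering used in FIG. 4 may differ):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel rearrangement: (H, W, r*r) feature maps -> (r*H, r*W).
    The pixel at (i, j) of feature map k becomes entry (k // r, k % r)
    of the r x r block located at (r*i, r*j) in the output."""
    h, w, c = x.shape
    assert c == r * r
    out = np.zeros((r * h, r * w), dtype=x.dtype)
    for k in range(c):
        out[k // r::r, k % r::r] = x[:, :, k]
    return out

x = np.arange(2 * 2 * 4).reshape(2, 2, 4)  # four 2x2 feature maps
print(pixel_shuffle(x, 2).shape)  # (4, 4)
```

Every output pixel comes from exactly one input pixel, so the operation is a pure data rearrangement with no arithmetic, as the text notes.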
  • The technical solutions of the embodiments of the present application may be applied to a terminal device, which may be mobile or fixed. For example, the terminal device may be a mobile phone with an image processing function, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD), or the like, which is not limited in the embodiments of the present application.
  • FIG. 5 shows a schematic flowchart of an image processing method 600 provided by an embodiment of the present application, which may be performed by, for example, an image processing apparatus.
  • S620: Perform B-time downsampling processing on the first image by using a first image processing layer of a convolutional neural network to obtain a second image, where the convolutional neural network includes a plurality of image processing layers, the plurality of image processing layers include the first image processing layer, and B is an integer greater than 1.
  • S630: Perform A-time upsampling processing on the second image by using a second image processing layer of the convolutional neural network to obtain a third image, where A is an integer greater than 1, and A is not equal to B.
  • The image processing method provided by this embodiment of the present application performs B-time downsampling processing on the acquired first image by using the first image processing layer of the convolutional neural network to obtain the second image, and then performs A-time upsampling processing on the second image by using the second image processing layer of the convolutional neural network to obtain the third image, which can achieve non-integer multiple upsampling or non-integer multiple downsampling of the first image (an overall resampling factor of A/B).
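Chaining an integer downsampling by B with an integer upsampling by A yields the non-integer factor A/B. The sketch below substitutes fixed block averaging and nearest-neighbour copying for the trainable layers of the method, purely to illustrate the size arithmetic (all names are hypothetical):

```python
import numpy as np

def rescale_a_over_b(x, a, b):
    """Rescale a (H, W) map by the non-integer factor a/b: first
    downsample by integer factor b (block averaging), then upsample by
    integer factor a (nearest neighbour). H and W must be divisible by b."""
    h, w = x.shape
    down = x.reshape(h // b, b, w // b, b).mean(axis=(1, 3))  # B-time downsampling
    return down.repeat(a, axis=0).repeat(a, axis=1)           # A-time upsampling

x = np.ones((6, 6))
print(rescale_a_over_b(x, a=3, b=2).shape)  # (9, 9): a 1.5x enlargement
```

With a = 3 and b = 2, a 6 × 6 map becomes 9 × 9, a 1.5× enlargement that neither integer step alone can produce.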
  • Optionally, the first image includes M first feature maps, where each first feature map has a height of H pixels and a width of W pixels, H and W are integers greater than 1, and M is an integer greater than 0. That is, the first image can be understood as a three-dimensional image of size H × W × M: its height is H pixels, its width is W pixels, and its depth is M feature maps, i.e., the three-dimensional first image includes M two-dimensional first feature maps of size H × W.
  • The first image may be an originally acquired image to be processed, a preprocessed image, an image obtained after processing by other image processing layers in the convolutional neural network, or an image obtained after processing by another image processing apparatus, which is not limited in this embodiment of the present application. Accordingly, the first image may be obtained in a plurality of different manners. For example, if the first image is an originally acquired image, in S610 the first image may be acquired from an image capturing device; if the first image is an image processed by other image processing layers in the convolutional neural network, in S610 the first image output by those image processing layers may be obtained; and if the first image is an image obtained by another image processing apparatus, in S610 the first image output by that image processing apparatus may be obtained.
  • The convolutional neural network described in the embodiments of the present application may include a plurality of image processing layers, where the first image processing layer may include some of the plurality of image processing layers and the second image processing layer may include other ones of the plurality of image processing layers, which is not limited in this embodiment of the present application. The image processing apparatus may perform the B-time downsampling processing on the first image by using the first image processing layer in a plurality of different manners to obtain the second image, which is likewise not limited in this embodiment of the present application.
  • Optionally, the image processing apparatus divides, by using the first image processing layer, each first feature map into (H × W)/B² image blocks that do not overlap each other, where each of the (H × W)/B² image blocks has a height of B pixels and a width of B pixels; and obtains B² second feature maps according to the (H × W)/B² image blocks, where each of the B² second feature maps has a height of H/B pixels and a width of W/B pixels, each pixel in each second feature map is taken from a different image block in the (H × W)/B² image blocks, and the position of each pixel in each second feature map is associated with the position, in the first feature map, of the image block to which the pixel belongs. It should be understood that (H × W)/B², H/B, and W/B are integers.
  • In this way, B² second feature maps can be obtained from each first feature map, so performing the B-time downsampling processing on the M first feature maps yields M × B² second feature maps, that is, the second image. In other words, the second image includes M × B² second feature maps, where each second feature map has a height of H/B pixels and a width of W/B pixels. It can be understood that the second image is a three-dimensional image of size (H/B) × (W/B) × (M × B²): its height is H/B pixels, its width is W/B pixels, and its depth is M × B² feature maps, that is, the three-dimensional second image includes M × B² two-dimensional second feature maps of size (H/B) × (W/B).
  • The position of each pixel in each second feature map being associated with the position, in the first feature map, of the image block to which the pixel belongs can be understood as follows: the relative position of each pixel in the second feature map is the same as the relative position, in the first feature map, of the image block to which that pixel belongs.
  • FIG. 6 shows a schematic diagram of a 4 × 4 × 1 input image being processed by the convolutional neural network to obtain an output image. It should be understood that, for clarity of description, the parenthesized number on each pixel in FIG. 6 indicates the number or identity of the pixel, rather than the pixel value of that pixel.
  • Specifically, the input image includes one 4 × 4 input feature map, and the input feature map is divided into four 2 × 2 image blocks: image block 1 includes the pixels numbered 1, 2, 5, and 6; image block 2 includes the pixels numbered 3, 4, 7, and 8; image block 3 includes the pixels numbered 9, 10, 13, and 14; and image block 4 includes the pixels numbered 11, 12, 15, and 16. One pixel is extracted from the upper-left corner position of each image block to constitute output feature map 1, where the relative position of each pixel in output feature map 1 is the same as the relative position, in the input feature map, of the image block to which that pixel belongs; that is, the relative positions of the pixels numbered 1, 3, 9, and 11 in output feature map 1 are the same as the relative positions of image blocks 1, 2, 3, and 4. Similarly, one pixel is taken from the upper-right corner position of each image block to form output feature map 2, one pixel is taken from the lower-left corner position of each image block to form output feature map 3, and one pixel is taken from the lower-right corner position of each image block to form output feature map 4; that is, the size of the output image is 2 × 2 × 4.
  • The image processing method provided by this embodiment of the present application obtains B² second feature maps by splitting each first feature map and rearranging and recombining the pixels it includes, thereby achieving the B-time downsampling processing of the first image. The B² second feature maps include all the pixels in the first feature map, that is, all the image information in each first feature map is retained in the B² second feature maps; moreover, the relative position between the pixels included in each second feature map is determined according to the relative positions, in the first feature map, of the image blocks to which the pixels belong, that is, each second feature map obtained from a first feature map is a thumbnail of that first feature map.
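The block-splitting downsampling of FIG. 6 can be sketched with strided slicing (a hypothetical helper name); with the pixels numbered 1–16 as in the figure, the first output map is exactly the pixels 1, 3, 9, 11:

```python
import numpy as np

def space_to_depth(x, b):
    """B-time downsampling by rearrangement: an (H, W) feature map is
    split into non-overlapping b x b blocks; collecting the pixel at the
    same offset inside every block yields one of b*b output maps of
    shape (H/b, W/b). No pixel is discarded."""
    maps = [x[i::b, j::b] for i in range(b) for j in range(b)]
    return np.stack(maps, axis=-1)  # shape (H/b, W/b, b*b)

x = np.arange(1, 17).reshape(4, 4)  # pixels numbered 1..16 as in FIG. 6
y = space_to_depth(x, 2)
print(y[:, :, 0])  # [[ 1  3] [ 9 11]] -- matches output feature map 1
```

Because every input pixel lands in exactly one output map, the downsampling is lossless, which is the property the text emphasizes.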
  • Optionally, the image processing apparatus performs a convolution operation on the M first feature maps by using the first image processing layer to obtain the second image, where the convolution operation has a convolution step of B in both the width direction and the height direction and uses N convolution kernels, each convolution kernel has a height of K pixels, a width of J pixels, and a depth of M feature maps, and each first feature map is filled with a height boundary of P pixels and a width boundary of P pixels. The second image includes N second feature maps, where each of the N second feature maps has a height of ⌊(H − K + 2P)/B⌋ + 1 pixels and a width of ⌊(W − J + 2P)/B⌋ + 1 pixels, N is an integer greater than 0, P is an integer greater than or equal to 0, and J and K are greater than or equal to B.
  • the first image processing layer may be a convolution layer, but the embodiment of the present application does not limit this.
  • For example, FIG. 7 shows schematic diagrams of the convolution process when a 6 × 6 × 1 input image is convolved in the same padding mode, with a convolution step of 3 pixels in both the height direction and the width direction, and with convolution kernels of sizes 1 × 1 × 1, 3 × 3 × 1, and 5 × 5 × 1, respectively. When the width (or height) of the convolution kernel is equal to the convolution step in the width (or height) direction (for example, the convolution process in FIG. 7 in which the size of the convolution kernel is 3 × 3 × 1 and the convolution step in the width and height directions is 3 pixels), there is no overlap between the convolution regions covered by the convolution kernel in two adjacent convolution operations, but the convolution kernel covers all the pixels of the input image during the sliding process. When the width (or height) of the convolution kernel is larger than the convolution step in the width (or height) direction (for example, the convolution process in FIG. 7 in which the size of the convolution kernel is 5 × 5 × 1 and the convolution step in the width and height directions is 3 pixels), there is overlap between the convolution regions covered by the convolution kernel in two adjacent convolution operations, and the convolution kernel likewise covers all the pixels of the input image during the sliding process.
  • In these cases, the output image therefore does not lose a large amount of image information as a result of some pixels of the input image not being used in the calculation. In addition, M, N, and B may satisfy the following formula: N ≥ M × B/2, that is, the depth of the output image is increased according to a certain constraint, which achieves the effect of compensating for the image information lost from the input image.
  • In the image processing method provided by this embodiment of the present application, the first image is subjected to a convolution operation by the first image processing layer, where the convolution step of the convolution operation in both the width direction and the height direction is B, the convolution operation uses N convolution kernels, each convolution kernel has a height of K pixels, a width of J pixels, and a depth of M feature maps, and each first feature map is filled with a height boundary of P pixels and a width boundary of P pixels, which can achieve the B-time downsampling of the first image. Moreover, J and K are greater than or equal to B, enabling the convolution kernel to traverse at least each pixel in the first image during the convolution process, that is, to retain all the image information in the first image.
  • Optionally, the image processing apparatus performs a pooling operation on the M first feature maps by using the first image processing layer to obtain the second image, where the pooling operation has a pooling step of B in both the width direction and the height direction, the pooling kernel of the pooling operation has a height of B pixels and a width of B pixels, and the second image includes M second feature maps, where each of the M second feature maps has a height of H/B pixels and a width of W/B pixels. It should be understood that H/B and W/B are integers.
  • the first image processing layer may be a pooling layer, which is not limited in this embodiment of the present application.
  • Optionally, the first image processing layer may include P sub-image processing layers, where P is an integer greater than 1, and the image processing apparatus implements B_i-time downsampling processing of the first image by the i-th of the P sub-image processing layers, where B_i satisfies the following formula: B_1 × B_2 × ... × B_P = B, that is, the product of the downsampling magnifications of the P sub-image processing layers equals B.
  • Similarly, the image processing apparatus may perform the A-time upsampling processing on the second image by using the second image processing layer in a plurality of different manners to obtain the third image, which is not limited in this embodiment of the present application.
  • Optionally, the image processing apparatus may perform a deconvolution operation on the second image by using the second image processing layer to obtain the third image, where the deconvolution step of the deconvolution operation in both the width direction and the height direction is A.
  • the second image processing layer may be a deconvolution layer, but the embodiment of the present application does not limit this.
  • Optionally, the image processing apparatus may perform a sub-pixel convolution operation on the second image by using the second image processing layer to obtain the third image, where the width (or height) of the third image is A times the width (or height) of the second image, and, consistent with the sub-pixel rearrangement described above, the depth of the third image is 1/A² of the depth of the second image.
  • the second image processing layer may be a sub-pixel convolution layer, but the embodiment of the present application does not limit this.
  • Optionally, the second image processing layer may include Q sub-image processing layers, where Q is an integer greater than 1, and the image processing apparatus implements A_i-time upsampling processing of the second image by the i-th of the Q sub-image processing layers, where A_i satisfies the following formula: A_1 × A_2 × ... × A_Q = A, that is, the product of the upsampling magnifications of the Q sub-image processing layers equals A.
  • Optionally, S630 may include: the image processing apparatus performs first processing on the second image by using the second image processing layer to obtain a fourth image, and performs the A-time upsampling processing on the fourth image by using the second image processing layer to obtain the third image. The first processing may be non-sampling processing (that is, processing that is neither upsampling nor downsampling), for example, processing the second image by using a convolution operation with a convolution step of 1 in both the width direction and the height direction, or processing the second image by using an activation function, and the like.
  • It should be understood that the foregoing describes the image processing apparatus first performing integer-multiple downsampling processing on the first image to obtain the second image and then performing integer-multiple upsampling processing on the second image to obtain the third image. Alternatively, the image processing apparatus may first perform A-time upsampling processing on the first image to obtain a fifth image, and then perform B-time downsampling processing on the fifth image to obtain a sixth image, which can also implement non-integer multiple downsampling or non-integer multiple upsampling of the first image. Therefore, that solution shall also fall within the protection scope of the embodiments of the present application.
  • FIG. 9 is a schematic block diagram of an image processing apparatus 900 provided by an embodiment of the present application.
  • the device 900 includes:
  • the obtaining unit 910 is configured to acquire a first image.
  • The processing unit 920 is configured to: perform B-time downsampling processing on the first image acquired by the acquiring unit 910 by using a first image processing layer of a convolutional neural network to obtain a second image, where the convolutional neural network includes a plurality of image processing layers, the plurality of image processing layers include the first image processing layer, and B is an integer greater than 1; and perform A-time upsampling processing on the second image by using a second image processing layer of the plurality of image processing layers to obtain a third image, where A is an integer greater than 1, and A is not equal to B.
  • the first image includes M first feature maps, wherein each of the first feature maps has a height of H pixels, and each of the first feature maps has a width of W pixels.
  • H and W are integers greater than 1, and M is an integer greater than 0;
  • the processing unit is specifically configured to: divide each first feature map into (H ⁇ W)/B 2 image blocks that do not overlap each other, The height of each image block in the (H ⁇ W)/B 2 image blocks is B pixels, and the width of each image block is B pixels; according to the (H ⁇ W)/B 2 image blocks, Obtaining B 2 second feature maps, wherein the height of each second feature map in the B 2 second feature maps is H/B pixels, and the width of each second feature map is W/B pixels, Each pixel in each second feature map is taken from a different image block in the (H ⁇ W)/B 2 image blocks, and the position of each pixel in each of the second feature maps and each The position of the image block to which the pixel belongs is associated in the first feature map.
  • the first image includes M first feature maps, wherein each of the first feature maps has a height of H pixels, and each of the first feature maps has a width of W pixels.
  • H and W are integers greater than 1, and M is an integer greater than 0;
  • Optionally, the processing unit is specifically configured to perform a convolution operation on the M first feature maps by using the first image processing layer to obtain the second image, where the convolution operation has a convolution step of B in both the width direction and the height direction and uses N convolution kernels, each of the N convolution kernels has a height of K pixels, a width of J pixels, and a depth of M feature maps, each first feature map is filled with a height boundary of P pixels and a width boundary of P pixels, and the second image includes N second feature maps, where each of the N second feature maps has a height of ⌊(H − K + 2P)/B⌋ + 1 pixels and a width of ⌊(W − J + 2P)/B⌋ + 1 pixels, N is an integer greater than 0, P is an integer greater than or equal to 0, and J and K are greater than or equal to B.
  • Optionally, M, N, and B satisfy the following formula: N ≥ M × B/2.
  • the first image includes M first feature maps, wherein each of the first feature maps has a height of H pixels, and each of the first feature maps has a width of W pixels.
  • H and W are integers greater than 1
  • M is an integer greater than 0.
  • Optionally, the processing unit is specifically configured to perform a pooling operation on the M first feature maps by using the first image processing layer to obtain the second image, where the pooling operation has a pooling step of B in both the width direction and the height direction, the pooling kernel of the pooling operation has a height of B pixels and a width of B pixels, and the second image includes M second feature maps, where each of the M second feature maps has a height of H/B pixels and a width of W/B pixels, and H/B and W/B are integers.
  • Optionally, A and B are relatively prime.
  • the image processing apparatus 900 herein is embodied in the form of a functional unit.
  • The term "unit" as used herein may refer to an application-specific integrated circuit (ASIC), an electronic circuit, a processor (for example, a shared processor, a dedicated processor, or a group of processors) and memory for executing one or more software or firmware programs, combinational logic, and/or other suitable components that support the described functionality.
  • ASIC application specific integrated circuit
  • processor eg, a shared processor, a proprietary processor, or a group
  • memory merge logic, and/or other suitable components that support the described functionality.
  • the image processing device 900 can be specifically the image processing device in the foregoing method 600, and the image processing device 900 can be used to execute the image processing device in the foregoing method 600. Corresponding processes and/or steps are not repeated here to avoid repetition.
  • FIG. 10 is a schematic block diagram of an image processing apparatus 1000 provided by an embodiment of the present application.
  • the image processing apparatus 1000 may be the image processing apparatus described in FIG. 9, and may adopt the hardware architecture shown in FIG. 10.
  • the image processing apparatus may include a processor 1010, a communication interface 1020, and a memory 1030 that communicate with each other through an internal connection path.
  • the related functions implemented by the processing unit 920 in FIG. 9 may be implemented by the processor 1010, and the related functions implemented by the obtaining unit 910 may be implemented by the processor 1010 controlling the communication interface 1020.
  • the processor 1010 may include one or more processors, for example one or more central processing units (CPUs); when the processor is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
  • the communication interface 1020 is for transmitting and/or receiving data.
  • the communication interface may include a sending interface for sending data and a receiving interface for receiving data.
  • the memory 1030 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), and a compact disc read-only memory (CD-ROM).
  • the memory 1030 is used to store program code and data of the image processing apparatus, and may be a separate device or integrated in the processor 1010.
  • the processor 1010 is configured to control the communication interface to perform data transmission with other apparatuses, for example other image processing apparatuses.
  • FIG. 10 only shows a simplified design of the image processing apparatus.
  • the image processing apparatus may further include other necessary components, including but not limited to any number of communication interfaces, processors, controllers, memories, etc., and all image processing apparatuses that can implement the present application fall within the protection scope of the present application.
  • the image processing apparatus 1000 can be replaced with a chip apparatus, for example a chip usable in an image processing apparatus, for implementing the related functions of the processor 1010 in the image processing apparatus.
  • the chip apparatus can be a field programmable gate array, an application-specific integrated chip, a system chip, a central processing unit, a network processor, a digital signal processing circuit, or a microcontroller implementing the related functions, or may adopt a programmable controller or another integrated chip.
  • the chip may include one or more memories for storing program code that, when executed, causes the processor to perform the corresponding functions.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the apparatus embodiments described above are merely illustrative.
  • the division of the units is merely a logical function division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product.
  • the technical solution of the present application, essentially or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • the foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a mobile hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present application provides an image processing method and apparatus. The method includes: obtaining a first image; performing B-fold downsampling on the first image through a first image processing layer of a convolutional neural network to obtain a second image, where the convolutional neural network includes a plurality of image processing layers, the plurality of image processing layers include the first image processing layer, and B is an integer greater than 1; and performing A-fold upsampling on the second image through a second image processing layer of the plurality of image processing layers to obtain a third image, where A is an integer greater than 1 and A is not equal to B. With the image processing method and apparatus provided in the present application, non-integer-fold upsampling or non-integer-fold downsampling of an image can be achieved.

Description

Image processing method and apparatus

This application claims priority to Chinese Patent Application No. 201711471002.9, filed with the Chinese Patent Office on December 29, 2017 and entitled "Image Processing Method and Apparatus", which is incorporated herein by reference in its entirety.

Technical Field

The present application relates to the field of image processing, and more particularly, to an image processing method and apparatus in the field of image processing.

Background

With the continuous development of image processing technology and the rising demand for image display quality, deep-learning-based convolutional neural networks (CNN), with their special structure of local weight sharing, have developed rapidly in the field of image processing and have gradually become an important technical choice in the industry.

In practical applications, there are often scenarios in which the resolution of an image needs to be enlarged from 720 progressive (p) to 1080p, i.e., non-integer-fold upsampling of the image is required, or reduced from 1080p to 720p, i.e., non-integer-fold downsampling of the image is required. However, current convolutional neural network models composed of convolutional layers, such as the efficient sub-pixel convolutional neural network (ESPCN) model and the fast super-resolution convolutional neural network (FSRCNN) model, can only implement integer-fold (including a factor of 1) upsampling of an image, for example image super-resolution algorithms.

Therefore, an image processing method is needed to solve the problem of how to implement non-integer-fold upsampling or non-integer-fold downsampling of an image.
Summary

The present application provides an image processing method and apparatus capable of implementing non-integer-fold upsampling or non-integer-fold downsampling of an image.

In a first aspect, the present application provides an image processing method, including:

obtaining a first image;

performing B-fold downsampling on the first image through a first image processing layer of a convolutional neural network to obtain a second image, where the convolutional neural network includes a plurality of image processing layers, the plurality of image processing layers include the first image processing layer, and B is an integer greater than 1; and

performing A-fold upsampling on the second image through a second image processing layer of the plurality of image processing layers to obtain a third image, where A is an integer greater than 1 and A is not equal to B.

In the image processing method provided in the embodiments of the present application, B-fold downsampling is performed on the obtained first image through the first image processing layer of the convolutional neural network to obtain the second image, and A-fold upsampling is then performed on the second image through the second image processing layer of the convolutional neural network to obtain the third image, so that non-integer-fold upsampling or non-integer-fold downsampling of the first image can be achieved.

In addition, since the downsampling is performed before the upsampling, the amount of data processed by the convolutional neural network is reduced, which lowers the computational complexity of the image processing and improves image processing efficiency.

It should be understood that the size of an image may include multiple dimensions: when the image is two-dimensional, its size includes height and width; when the image is three-dimensional, its size includes width, height, and depth.

It should also be understood that a pixel is the most basic element of an image and is a logical unit of size.

It should also be understood that the height of an image may be understood as the number of pixels the image includes in the height direction; the width of an image may be understood as the number of pixels the image includes in the width direction; and the depth of an image may be understood as the number of channels of the image.

It should also be understood that, in a convolutional neural network model, the depth of an image may be understood as the number of feature maps the image includes, where the width and height of any feature map of the image are the same as those of the other feature maps of the image. In other words, a three-dimensional image may be understood as being composed of multiple two-dimensional feature maps of identical size.

It should also be understood that, in the embodiments of the present application, when the downsampling factor B is greater than the upsampling factor A, non-integer-fold downsampling of the first image can be achieved; when the downsampling factor B is less than the upsampling factor A, non-integer-fold upsampling of the first image can be achieved.
With reference to the first aspect, in a first possible implementation of the first aspect, the first image includes M first feature maps, each of the M first feature maps has a height of H pixels and a width of W pixels, H and W are integers greater than 1, and M is an integer greater than 0. The performing B-fold downsampling on the first image through the first image processing layer of the convolutional neural network to obtain the second image includes: dividing, through the first image processing layer, each first feature map into (H×W)/B² non-overlapping image blocks, each of which has a height of B pixels and a width of B pixels; and obtaining B² second feature maps from the (H×W)/B² image blocks, where each of the B² second feature maps has a height of H/B pixels and a width of W/B pixels, each pixel in each second feature map is taken from a different one of the (H×W)/B² image blocks, and the position of each pixel in the second feature map is associated with the position, in the first feature map, of the image block to which the pixel belongs. Here, (H×W)/B², H/B, and W/B are all integers.

It should be understood that the first image including M first feature maps, each with a height of H pixels and a width of W pixels, may be understood as the first image being a three-dimensional image of size H×W×M, i.e., the first image has a height of H pixels, a width of W pixels, and a depth of M first feature maps; in other words, the three-dimensional first image includes M two-dimensional first feature maps of size H×W.

Optionally, the first image may be an originally captured to-be-processed image, a preprocessed image, an image obtained after processing by another image processing layer in the convolutional neural network, or an image obtained after processing by another image processing apparatus, which is not limited in the embodiments of the present application.

Optionally, in the embodiments of the present application, the first image may be obtained in a variety of different ways, which is not limited in the embodiments of the present application.

For example, when the first image is an originally captured to-be-processed image, the first image may be obtained from an image capture apparatus; when the first image is an image obtained after processing by another image processing layer in the convolutional neural network, the first image output by that image processing layer may be obtained; when the first image is an image obtained after processing by another image processing apparatus, the first image output by that image processing apparatus may be obtained.

It should be understood that B-fold downsampling of one first feature map yields B² second feature maps, so B-fold downsampling of M first feature maps yields M×B² second feature maps, i.e., the second image.

It should also be understood that the second image including M×B² second feature maps, each with a height of H/B pixels and a width of W/B pixels, may be understood as the second image being a three-dimensional image of size (H/B)×(W/B)×(M×B²), i.e., the second image has a height of H/B pixels, a width of W/B pixels, and a depth of M×B² second feature maps; in other words, the three-dimensional second image includes M×B² two-dimensional second feature maps of size (H/B)×(W/B).

It should also be understood that the position of each pixel in each second feature map being associated with the position, in the first feature map, of the image block to which the pixel belongs may be understood as follows: the relative position of each pixel within the second feature map is the same as the relative position, within the first feature map, of the image block to which that pixel belongs.

In the image processing method provided in the embodiments of the present application, the pixels included in each first feature map are split and rearranged to obtain B² second feature maps, thereby achieving B-fold downsampling of the first image.

In addition, the B² second feature maps include all the pixels of the first feature map, i.e., all the image information of each first feature map is retained in the B² second feature maps; the relative positions of the pixels included in each second feature map are determined by the relative positions, in the first feature map, of the image blocks to which those pixels belong, i.e., each second feature map obtained from a first feature map is a thumbnail of that first feature map.
With reference to the first aspect, in a second possible implementation of the first aspect, the first image includes M first feature maps, each of the M first feature maps has a height of H pixels and a width of W pixels, H and W are integers greater than 1, and M is an integer greater than 0. The performing B-fold downsampling on the first image through the first image processing layer of the convolutional neural network to obtain the second image includes: performing, through the first image processing layer, a convolution operation on the M first feature maps to obtain the second image, where the convolution stride of the convolution operation in both the width direction and the height direction is B, the convolution operation uses N convolution kernels, each of the N convolution kernels has a height of K pixels, a width of J pixels, and a depth of M feature maps, each first feature map is padded with a height border of P pixels and a width border of P pixels, and the second image includes N second feature maps, each of which has a height of ⌊(H+2P−K)/B⌋+1 pixels and a width of ⌊(W+2P−J)/B⌋+1 pixels, where N is an integer greater than 0, P is an integer greater than or equal to 0, and J and K are greater than or equal to B.
It should be understood that a convolution kernel is a filter used to extract feature maps of an image. The size of a convolution kernel includes width, height, and depth, where the depth of the kernel is the same as the depth of the input image. As many different feature maps can be extracted from an input image as there are different convolution kernels applied to it.

Optionally, in a convolutional layer of a convolutional neural network, the same image may be convolved multiple times with kernels of different sizes, different weight values, or different convolution strides, so as to extract as many features of the image as possible.

It should also be understood that the convolution stride refers to the distance the convolution kernel slides, in the height direction and the width direction, between two successive convolution operations as it slides over the feature maps of the input image to extract features.

It should be understood that the convolution stride can determine the downsampling factor of the input image; for example, a convolution stride of B in the width (or height) direction achieves B-fold downsampling of the input feature map in the width (or height) direction.

It should also be understood that, in a convolutional neural network, the main role of a convolutional layer is feature extraction, which is performed by convolving the input image with the configured convolution kernels.

It should be understood that when a K×K convolution kernel is used to convolve a two-dimensional input image, the K×K image block covered by the kernel as it slides over the image is multiplied element-wise with the kernel, i.e., the gray value of each point of the image block is multiplied by the weight at the same position in the kernel, yielding K×K results; these are accumulated and a bias is added to obtain a single result, which is output as a single pixel of the output image, whose coordinate position on the output image corresponds to the coordinate position of the center of the image block on the input image.

It should also be understood that when a convolution kernel is used to convolve a three-dimensional input image, the kernel must also be three-dimensional, and the third dimension (depth) of the kernel must equal the third dimension (depth, or number of feature maps) of the image. The convolution of the three-dimensional image with the three-dimensional kernel can be decomposed, along the depth dimension (number of channels or feature maps), into two-dimensional convolutions between two-dimensional feature maps and kernel slices, whose results are then accumulated along the depth dimension to produce a single two-dimensional output image.

It should also be understood that, in a convolutional neural network, the output image of a convolutional layer usually also includes multiple feature maps. A three-dimensional kernel applied to a three-dimensional input image yields one two-dimensional output feature map, and obtaining multiple output feature maps requires multiple three-dimensional kernels; the kernel therefore has one dimension more than the input image, and the value of the added dimension corresponds to the depth of the output image, i.e., the number of feature maps it includes.

It should also be understood that convolution operations are divided into padding and non-padding modes. Padding may be understood as a preprocessing operation on the image; padding modes include same padding and valid padding.

It should also be understood that same padding refers to adding an identical border to both the width and the height of the input image and performing the convolution operation on the bordered image, where the border refers to the outer boundary of the input image. For example, when the size of the input image is 5×5×2 and the convolution is performed in same padding mode with a height border of 1 pixel and a width border of 1 pixel, a 7×7×2 image is obtained, on which the convolution operation is then performed.

It should be understood that when the padded width border of the input image = (kernel width − 1)/2, the padded height border = (kernel height − 1)/2, and the convolution stride is 1, the output image obtained by convolving the input image with the kernel has the same width and height as the input image.

Optionally, in general, when the kernel size is 3×3, both the height border and the width border of the input image are 1 pixel; when the kernel size is 5×5, both are 2 pixels; when the kernel size is 7×7, both are 3 pixels, though the embodiments of the present application are not limited thereto.

It should also be understood that when the same padding mode is used for convolution in the embodiments of the present application, border elements taking the value 0 is merely an example; border elements may also take other values, which is not limited in the embodiments of the present application.
Optionally, assuming that the width (or height) of the input feature map is W, the width (or height) of the convolution kernel is F, the convolution stride is S, the convolution is performed in same padding mode, and the padded width (or height) border of the input feature map is P, the width (or height) of the resulting output feature map can be expressed as ⌊(W+2P−F)/S⌋+1, where W, F, and S are integers greater than 0, P is an integer greater than or equal to 0, and ⌊·⌋ denotes rounding down.

It should be understood that if the convolution is performed without padding, P may be regarded as 0.
In the image processing method provided in the embodiments of the present application, a convolution operation is performed on the first image through the first image processing layer of the convolutional neural network, the convolution stride of the convolution operation in both the width direction and the height direction is B, the convolution operation uses N convolution kernels, each of which has a height of K pixels, a width of J pixels, and a depth of M feature maps, and the first image is padded with a height border of P pixels and a width border of P pixels, so that B-fold downsampling of the first image can be achieved.

In addition, J and K being greater than or equal to B ensures that the convolution kernel traverses at least every pixel of the first image during the convolution, i.e., all the image information of the first image is retained.

With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, M, N, and B satisfy the following formula: N ≥ M×B/2.

Since a small part of the image information carried by the input image is lost in the process of abstracting image features through convolution, the number of feature maps included in the output image can be increased to better retain this part of the image information.

In the embodiments of the present application, N ≥ M×B/2, i.e., the depth of the second image is increased under a certain constraint, which compensates for the image information lost from the first image.
With reference to the first aspect, in a fourth possible implementation of the first aspect, the first image includes M first feature maps, each of the M first feature maps has a height of H pixels and a width of W pixels, H and W are integers greater than 1, and M is an integer greater than 0. The performing B-fold downsampling on the first image through the first image processing layer of the convolutional neural network to obtain the second image includes: performing, through the first image processing layer, a pooling operation on each of the M first feature maps to obtain the second image, where the pooling stride of the pooling operation in the width direction and the height direction is B, the pooling kernel of the pooling operation has a height of B pixels and a width of B pixels, and the second image includes M second feature maps, each of which has a height of H/B pixels and a width of W/B pixels. Here, H/B and W/B are both integers.

Optionally, two common pooling operations are average pooling and max pooling. Both operate on the two dimensions of width and height of the feature maps and do not affect the depth of the output feature maps.

In addition, average pooling refers to taking the average value within each region the pooling kernel slides over.

In the image processing method provided in the embodiments of the present application, B-fold downsampling of the first image through a pooling layer reduces the amount of data in the feature layers, thereby lowering the computational complexity of the convolutional neural network and its cache bandwidth requirements.

With reference to the first aspect or any one of the first to fourth possible implementations of the first aspect, in a fifth possible implementation of the first aspect, A and B are coprime.

If A and B have a common divisor, more image information may be lost during downsampling and the continuity of the image texture information may be destroyed.

Therefore, in the image processing method provided in the embodiments of the present application, A and B being coprime better preserves the integrity of the image information and the continuity of the texture information of the first image than the case where A and B have a common divisor.

With reference to the first aspect or any one of the first to fifth possible implementations of the first aspect, in a sixth possible implementation of the first aspect, the performing A-fold upsampling on the second image through the convolutional neural network to obtain the third image includes: performing first processing on the second image through the convolutional neural network to obtain a fourth image, and performing A-fold upsampling on the fourth image to obtain the third image.

Optionally, the first processing is an operation that is neither upsampling nor downsampling, for example a convolution operation with a convolution stride of 1.
In a second aspect, the present application provides an image processing method, including:

obtaining a first image;

performing A-fold upsampling on the first image through a first image processing layer of a convolutional neural network to obtain a second image, where the convolutional neural network includes a plurality of image processing layers, the plurality of image processing layers include the first image processing layer, and A is an integer greater than 1; and

performing B-fold downsampling on the second image through a second image processing layer of the plurality of image processing layers to obtain a third image, where B is an integer greater than 1 and A is not equal to B.

In the image processing method provided in the embodiments of the present application, A-fold upsampling is performed on the obtained first image through the first image processing layer of the convolutional neural network to obtain the second image, and B-fold downsampling is then performed on the second image through the second image processing layer of the convolutional neural network to obtain the third image, so that non-integer-fold upsampling or non-integer-fold downsampling of the first image can be achieved.

With reference to the second aspect, in a first possible implementation of the second aspect, the second image includes M second feature maps, each of the M second feature maps has a height of H pixels and a width of W pixels, H and W are integers greater than 1, and M is an integer greater than 0. The performing B-fold downsampling on the second image through the second image processing layer of the convolutional neural network to obtain the third image includes: dividing, through the second image processing layer, each second feature map into (H×W)/B² non-overlapping image blocks, each of which has a height of B pixels and a width of B pixels; and obtaining B² third feature maps from the (H×W)/B² image blocks, where each of the B² third feature maps has a height of H/B pixels and a width of W/B pixels, each pixel in each third feature map is taken from a different one of the (H×W)/B² image blocks, and the position of each pixel in the third feature map is associated with the position, in the second feature map, of the image block to which the pixel belongs. Here, (H×W)/B², H/B, and W/B are all integers.
With reference to the second aspect, in a second possible implementation of the second aspect, the second image includes M second feature maps, each of the M second feature maps has a height of H pixels and a width of W pixels, H and W are integers greater than 1, and M is an integer greater than 0. The performing B-fold downsampling on the second image through the second image processing layer of the convolutional neural network to obtain the third image includes: performing, through the second image processing layer, a convolution operation on the M second feature maps to obtain the third image, where the convolution stride of the convolution operation in both the width direction and the height direction is B, the convolution operation uses N convolution kernels, each of the N convolution kernels has a height of K pixels, a width of J pixels, and a depth of M feature maps, each second feature map is padded with a height border of P pixels and a width border of P pixels, and the third image includes N third feature maps, each of which has a height of ⌊(H+2P−K)/B⌋+1 pixels and a width of ⌊(W+2P−J)/B⌋+1 pixels, where N is an integer greater than 0, P is an integer greater than or equal to 0, and J and K are greater than or equal to B.

With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, M, N, and B satisfy the following formula: N ≥ M×B/2.

With reference to the second aspect, in a fourth possible implementation of the second aspect, the second image includes M second feature maps, each of the M second feature maps has a height of H pixels and a width of W pixels, H and W are integers greater than 1, and M is an integer greater than 0. The performing B-fold downsampling on the second image through the second image processing layer of the convolutional neural network to obtain the third image includes: performing, through the second image processing layer, a pooling operation on each of the M second feature maps to obtain the third image, where the pooling stride of the pooling operation in the width direction and the height direction is B, the pooling kernel of the pooling operation has a height of B pixels and a width of B pixels, and the third image includes M third feature maps, each of which has a height of H/B pixels and a width of W/B pixels. Here, H/B and W/B are both integers.

With reference to the second aspect or any one of the first to fourth possible implementations of the second aspect, in a fifth possible implementation of the second aspect, A and B are coprime.
In a third aspect, the present application provides an image processing apparatus configured to perform the method in the first aspect or any possible implementation of the first aspect.

In a fourth aspect, the present application provides an image processing apparatus configured to perform the method in the second aspect or any possible implementation of the second aspect.

In a fifth aspect, the present application provides an image processing apparatus, including a memory, a processor, a communication interface, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, performs the method in the first aspect or any possible implementation of the first aspect.

In a sixth aspect, the present application provides an image processing apparatus, including a memory, a processor, a communication interface, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, performs the method in the second aspect or any possible implementation of the second aspect.

In a seventh aspect, the present application provides a computer-readable medium storing a computer program, where the computer program includes instructions for performing the method in the first aspect or any possible implementation of the first aspect.

In an eighth aspect, the present application provides a computer-readable medium storing a computer program, where the computer program includes instructions for performing the method in the second aspect or any possible implementation of the second aspect.

In a ninth aspect, the present application provides a computer program product including instructions that, when run on a computer, cause the computer to perform the method in the first aspect or any possible implementation of the first aspect.

In a tenth aspect, the present application provides a computer program product including instructions that, when run on a computer, cause the computer to perform the method in the second aspect or any possible implementation of the second aspect.

In an eleventh aspect, the present application provides a chip, including an input interface, an output interface, at least one processor, and a memory, where the input interface, the output interface, the processor, and the memory communicate with one another through an internal connection path, and the processor is configured to execute code in the memory; when the code is executed, the processor performs the method in the first aspect or any possible implementation of the first aspect.

In a twelfth aspect, the present application provides a chip, including an input interface, an output interface, at least one processor, and a memory, where the input interface, the output interface, the processor, and the memory communicate with one another through an internal connection path, and the processor is configured to execute code in the memory; when the code is executed, the processor performs the method in the second aspect or any possible implementation of the second aspect.
Brief Description of the Drawings

FIG. 1 is a schematic diagram of the height, width, and depth of a three-dimensional image;

FIG. 2 is a schematic diagram of a convolution operation performed by a convolutional layer;

FIG. 3 is a schematic diagram of a pooling operation performed by a pooling layer;

FIG. 4 is a schematic diagram of a sub-pixel convolution operation performed by a sub-pixel convolutional layer;

FIG. 5 is a schematic diagram of an application scenario according to an embodiment of the present application;

FIG. 6 is a schematic flowchart of an image processing method according to an embodiment of the present application;

FIG. 7 is a schematic diagram of downsampling according to an embodiment of the present application;

FIG. 8 is a schematic diagram of convolution operations using convolution kernels of different sizes according to an embodiment of the present application;

FIG. 9 is a schematic block diagram of an image processing apparatus according to an embodiment of the present application;

FIG. 10 is a schematic block diagram of another image processing apparatus according to an embodiment of the present application.
Detailed Description

For clarity, the terms used in the present application are first explained.

1. Pixel

A pixel is the most basic element of an image and is a logical unit of size.

2. Image size

The size of an image may include multiple dimensions: when the image is two-dimensional, its size includes height and width; when the image is three-dimensional, its size includes width, height, and depth.

It should be understood that the height of an image may be understood as the number of pixels the image includes in the height direction; the width of an image may be understood as the number of pixels the image includes in the width direction; and the depth of an image may be understood as the number of channels of the image.

In a convolutional neural network model, the depth of an image may be understood as the number of feature maps the image includes, where the width and height of any feature map of the image are the same as those of the other feature maps of the image.

In other words, a three-dimensional image may be understood as being composed of multiple two-dimensional feature maps of identical size.

It should be understood that an image including M feature maps, each with a height of H pixels and a width of W pixels, may be understood as a three-dimensional image of size H×W×M, i.e., the three-dimensional image includes M two-dimensional feature maps of size H×W, where H and W are integers greater than 1 and M is an integer greater than 0.

FIG. 1 shows a 5×5×3 image including 3 feature maps (for example, a red (R) feature map, a green (G) feature map, and a blue (B) feature map), each of size 5×5.

It should be understood that feature maps of different colors may be understood as different channels of the image; in a convolutional neural network, different channels may be regarded as different feature maps.

It should also be understood that FIG. 1 only takes an image of depth 3 as an example for description; the depth of an image may also take other values, for example, the depth of a grayscale image is 1 and the depth of an RGB-depth (D) image is 4, which is not limited in the embodiments of the present application.

It should also be understood that the resolution of an image (or feature map) may be understood as the product of its width and height, i.e., if the height of an image (or feature map) is H pixels and its width is W pixels, its resolution is H×W.
3. Convolution kernel

A convolution kernel is a filter used to extract feature maps of an image. The size of a convolution kernel includes width, height, and depth, where the depth of the kernel is the same as the depth of the input image. As many different feature maps can be extracted from an input image as there are different convolution kernels applied to it.

For example, convolving a 7×7×3 input image with one 5×5×3 convolution kernel yields one output feature map, and convolving it with multiple different 5×5×3 convolution kernels yields multiple different output feature maps.

Optionally, in a convolutional layer of a convolutional neural network, the same image may be convolved multiple times with kernels of different sizes, different weight values, or different convolution strides, so as to extract as many features of the image as possible.

4. Convolution stride

The convolution stride refers to the distance the convolution kernel slides, in the height direction and the width direction, between two successive convolution operations as it slides over the feature maps of the input image to extract features.

It should be understood that the convolution stride can determine the downsampling factor of the input image; for example, a convolution stride of B in the width (or height) direction achieves B-fold downsampling of the input feature map in the width (or height) direction, where B is an integer greater than 1.

5. Convolutional layer

In a convolutional neural network, the main role of a convolutional layer is feature extraction, which is performed by convolving the input image with the configured convolution kernels.

It should be understood that when a K×K convolution kernel is used to convolve a two-dimensional input image, the K×K image block covered by the kernel as it slides over the image is multiplied element-wise with the kernel, i.e., the gray value of each point of the image block is multiplied by the weight at the same position in the kernel, yielding K×K results; these are accumulated and a bias is added to obtain a single result, which is output as a single pixel of the output image, whose coordinate position on the output image corresponds to the coordinate position of the center of the image block on the input image.

It should also be understood that when a convolution kernel is used to convolve a three-dimensional input image, the kernel must also be three-dimensional, and the third dimension (depth) of the kernel must equal the third dimension (depth, or number of feature maps) of the image. The convolution of the three-dimensional image with the three-dimensional kernel can be decomposed, along the depth dimension (number of channels or feature maps), into two-dimensional convolutions between two-dimensional feature maps and kernel slices, whose results are then accumulated along the depth dimension to produce a single two-dimensional output image.

It should also be understood that, in a convolutional neural network, the output image of a convolutional layer usually also includes multiple feature maps. A three-dimensional kernel applied to a three-dimensional input image yields one two-dimensional output feature map, and obtaining multiple output feature maps requires multiple three-dimensional kernels; the kernel therefore has one dimension more than the input image, and the value of the added dimension corresponds to the depth of the output image, i.e., the number of feature maps it includes.

It should also be understood that convolution operations are divided into padding and non-padding modes. Padding may be understood as a preprocessing operation on the image; padding modes include same padding and valid padding.

It should be understood that the padding mode described in the embodiments of the present application refers to the same padding mode, though the embodiments of the present application are not limited thereto.

It should also be understood that same padding refers to adding an identical border to both the width and the height of the input image and performing the convolution operation on the bordered image, where the border refers to the outer boundary of the input image. For example, when the size of the input image is 5×5×2 and the convolution is performed in same padding mode with a height border of 1 pixel and a width border of 1 pixel, a 7×7×2 image is obtained, on which the convolution operation is then performed.

It should be understood that when the padded width border of the input image = (kernel width − 1)/2, the padded height border = (kernel height − 1)/2, and the convolution stride is 1, the output image obtained by convolving the input image with the kernel has the same width and height as the input image.

Optionally, in general, when the kernel size is 3×3, both the height border and the width border of the input image are 1 pixel; when the kernel size is 5×5, both are 2 pixels; when the kernel size is 7×7, both are 3 pixels, though the embodiments of the present application are not limited thereto.

It should also be understood that when the same padding mode is used for convolution in the embodiments of the present application, border elements taking the value 0 is merely an example; border elements may also take other values, which is not limited in the embodiments of the present application.
It should also be understood that, assuming the width (or height) of the input feature map is W, the width (or height) of the convolution kernel is F, the convolution stride is S, the convolution is performed in same padding mode, and the padded width (or height) border of the input feature map is P, the width (or height) of the resulting output feature map can be expressed as ⌊(W+2P−F)/S⌋+1, where W, F, and S are integers greater than 0, P is an integer greater than or equal to 0, and ⌊·⌋ denotes rounding down.
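The output-size formula above can be checked with a small helper. This is a sketch for illustration only; the function name `conv_out_size` is our own label, not from the patent.

```python
def conv_out_size(w, f, s, p):
    """Width (or height) of the output feature map for an input of size w,
    a kernel of size f, a stride of s, and a padding border of p per side,
    i.e. floor((w + 2p - f) / s) + 1."""
    return (w + 2 * p - f) // s + 1

# Stride 1 with p = (f - 1) // 2 preserves the input size (same padding):
print(conv_out_size(5, 3, 1, 1))  # 5
# A 5x5 input with a 1-pixel border, 3x3 kernel, stride 2:
print(conv_out_size(5, 3, 2, 1))  # 3
```

With a stride equal to the kernel size (e.g. `conv_out_size(6, 3, 3, 1)`), the formula reproduces the B-fold downsampling discussed in this application.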
FIG. 2 shows the process in which a convolutional layer performs a convolution operation on an input image. The three-dimensional input image has a size of 5×5×3 and is padded with a height border and a width border of 1 pixel each, yielding a 7×7×3 input image. The convolution stride in the width direction and the height direction is 2, and the convolution uses kernel w0 of size 3×3×3. The 3 input feature maps of the input image (input feature map 1, input feature map 2, and input feature map 3) are convolved with the three depth slices of the kernel (kernel w0-1, kernel w0-2, and kernel w0-3) respectively, yielding output feature map 1 of size 3×3×1.

Specifically, the first depth slice of w0 (i.e., w0-1) is multiplied element-wise with the elements at the corresponding positions in the blue box of input feature map 1 and summed to obtain 0; similarly, the other two depth slices of kernel w0 (i.e., w0-2 and w0-3) are convolved with input feature map 2 and input feature map 3 respectively to obtain 2 and 0, so the first element of output feature map 1 in FIG. 2 is 0+2+0=2. After the first convolution operation of kernel w0, the blue box slides along the width direction and the height direction of each input feature map in turn to continue the next convolution operation, where the sliding distance each time is 2 (i.e., the convolution stride in both the width and height directions is 2), until the convolution operation on the input image is completed, yielding output feature map 1 of size 3×3×1.

Similarly, if the convolution operation also uses another kernel w1 to convolve the input image, output feature map 2 of size 3×3×1 can be obtained through a process similar to that of kernel w0.

Optionally, output feature map 1 and output feature map 2 may also be activated through an activation function to obtain activated output feature map 1 and activated output feature map 2.
6. Pooling layer

On the one hand, a pooling layer reduces the width and height of the feature maps, lowering the computational complexity of the convolutional neural network by reducing the amount of feature-layer data; on the other hand, it performs feature compression to extract the main features.

Optionally, two common pooling operations are average pooling and max pooling. Both operate on the two dimensions of width and height of the feature maps and do not affect the depth of the output feature maps.

FIG. 3 shows the process in which a pooling layer performs a pooling operation on an input image. The input image is a 4×4×1 image, and a max pooling operation is performed on it with a 2×2 pooling kernel, i.e., the maximum value is taken within each region the pooling kernel slides over as one pixel of the output image, where the position of each pixel in the output image is the same as the position, in the input image, of the region to which that pixel belongs. The pooling stride is 2, and the main features are finally extracted from the input image to obtain the output image.

In addition, average pooling refers to taking the average value within each region the pooling kernel slides over.
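The pooling described above can be sketched in a few lines of plain Python; this is illustrative only, and the helper name `pool2d` is ours. Passing `max` gives max pooling and a mean lambda gives average pooling, with the stride fixed to the kernel size b as in the B-fold downsampling case.

```python
def pool2d(x, b, op=max):
    """B-fold downsampling by pooling: apply `op` to each non-overlapping
    b x b region of the 2-D feature map x (a list of rows); the pooling
    stride equals the kernel size b."""
    h, w = len(x), len(x[0])
    assert h % b == 0 and w % b == 0
    return [
        [op(x[p * b + i][q * b + j] for i in range(b) for j in range(b))
         for q in range(w // b)]
        for p in range(h // b)
    ]

x = [[1, 3, 2, 0],
     [4, 2, 1, 1],
     [0, 1, 5, 6],
     [2, 2, 7, 8]]
print(pool2d(x, 2))                             # max pooling: [[4, 2], [2, 8]]
print(pool2d(x, 2, op=lambda g: sum(g) / 4))    # average pooling: [[2.5, 1.0], [1.25, 6.5]]
```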
7. Deconvolution layer

A deconvolution layer, also called a transposed convolution layer, can determine the upsampling factor of the input image by setting the deconvolution stride; for example, a deconvolution stride of A in the width (or height) direction achieves A-fold upsampling of the input feature map in the width (or height) direction, where A is an integer greater than 1.

It should be understood that the deconvolution operation may be understood as the inverse of the convolution operation shown in FIG. 2.

It should also be understood that, assuming the width (or height) of the input feature map is W, the width (or height) of the convolution kernel is F, the deconvolution stride is S, and the cropped width (or height) border of the output feature map is P, the width (or height) of the resulting output feature map can be expressed as S×(W−1)+F−2P, where W, F, and S are integers greater than 0 and P is an integer greater than or equal to 0.

For example, when a 3×3×3 convolution kernel is used to perform a deconvolution operation on a 5×5×1 input image, each pixel of the input feature map is multiplied by each weight of the first depth slice of the 3×3 deconvolution kernel to obtain a 3×3 image block corresponding to that pixel; the image block is placed on a 7×7×1 output feature map 1 with its center at the position of that pixel, and the distance between the center positions of two adjacent image blocks equals the deconvolution stride. Then, the multiple values assigned to each pixel of the output feature map are accumulated to obtain one final output feature map. Similarly, output feature map 2 and output feature map 3 can be obtained by deconvolving the input feature map with the second and third depth slices of the deconvolution kernel, and a border of 1 pixel is then cropped from each of the 3 feature maps to obtain a 5×5×3 output image.
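The transposed-convolution size formula above can likewise be checked with a small helper; `deconv_out_size` is our own illustrative name.

```python
def deconv_out_size(w, f, s, p):
    """Width (or height) after transposed convolution with input size w,
    kernel size f, stride s, and a cropped border of p pixels per side,
    i.e. s * (w - 1) + f - 2p."""
    return s * (w - 1) + f - 2 * p

# The example above: 5x5 input, 3x3 kernel, stride 1, 1-pixel crop -> 5x5.
print(deconv_out_size(5, 3, 1, 1))  # 5
# Stride A enlarges roughly A-fold: stride 2 maps 5 -> 2*(5-1)+3-2 = 9.
print(deconv_out_size(5, 3, 2, 1))  # 9
```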
8. Sub-pixel convolutional layer

A sub-pixel convolutional layer enlarges the width and height of the input image by an integer ratio by integrating multiple feature maps along the depth dimension of the input image.

The sub-pixel convolution operation may be understood as a method of rearranging the data of the feature maps included in the input image.

For example, when the input feature layer of a sub-pixel convolutional layer has a size of H×W×r² (r is the enlargement factor of the image), the sub-pixel convolutional layer rearranges the pixels at the same position of the r² feature maps into an r×r image block, corresponding to an r×r image block in the output feature map whose center position is that pixel, so that the H×W×r² input feature maps are rearranged into an rH×rW×1 output feature map. Although this transformation is called sub-pixel convolution, no convolution operation is actually involved; it is a process of rearranging and combining the pixels of the r² H×W input feature maps.

For example, as shown in FIG. 4, 4 input feature maps of size 2×2 are processed by the sub-pixel convolution operation of the sub-pixel convolutional layer to obtain one output feature map of size 4×4 as shown in FIG. 4. It should be understood that, for clarity of description, the bracketed number on each pixel in FIG. 4 denotes the number or identifier of that pixel, not the pixel value on that pixel.
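The rearrangement performed by the sub-pixel convolutional layer can be sketched as follows. This is an illustrative implementation under our own naming (`pixel_shuffle`); the convention that map (i, j) supplies offset (i, j) of each r×r output block matches the block-splitting downsampling described elsewhere in this application, so the two operations are inverses.

```python
def pixel_shuffle(maps, r):
    """Sub-pixel rearrangement: combine r*r feature maps of size H x W
    (lists of rows) into one rH x rW map; map (i, j) supplies the pixel
    at offset (i, j) of each r x r output block."""
    n, h, w = len(maps), len(maps[0]), len(maps[0][0])
    assert n == r * r
    out = [[0] * (w * r) for _ in range(h * r)]
    for i in range(r):
        for j in range(r):
            m = maps[i * r + j]
            for p in range(h):
                for q in range(w):
                    out[p * r + i][q * r + j] = m[p][q]
    return out

# Four 2x2 maps (pixel identifiers, as in the FIG. 4 style of numbering):
maps = [[[1, 3], [9, 11]], [[2, 4], [10, 12]],
        [[5, 7], [13, 15]], [[6, 8], [14, 16]]]
print(pixel_shuffle(maps, 2))  # rows [1..4], [5..8], [9..12], [13..16]
```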
It should be understood that the technical solutions provided in the embodiments of the present application can be applied to various scenarios in which an input image needs to be processed to obtain a corresponding output image, which is not limited in the embodiments of the present application.

For example, as shown in FIG. 5, the technical solutions of the embodiments of the present invention can be applied to a terminal device, which may be mobile or fixed; for example, the terminal device may be a mobile phone with image processing functions, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD), or the like, which is not limited in the embodiments of the present invention.

FIG. 6 shows a schematic flowchart of an image processing method 600 provided by an embodiment of the present application; the method may be performed, for example, by an image processing apparatus.

S610: Obtain a first image.

S620: Perform B-fold downsampling on the first image through a first image processing layer of a convolutional neural network to obtain a second image, where the convolutional neural network includes a plurality of image processing layers, the plurality of image processing layers include the first image processing layer, and B is an integer greater than 1.

S630: Perform A-fold upsampling on the second image through a second image processing layer of the plurality of image processing layers to obtain a third image, where A is an integer greater than 1 and A is not equal to B.

In the image processing method provided in the embodiments of the present application, B-fold downsampling is performed on the obtained first image through the first image processing layer of the convolutional neural network to obtain the second image, and A-fold upsampling is then performed on the second image through the second image processing layer of the convolutional neural network to obtain the third image, so that non-integer-fold upsampling or non-integer-fold downsampling of the first image can be achieved.

It should also be understood that, in the embodiments of the present application, when the downsampling factor B is greater than the upsampling factor A, non-integer-fold downsampling of the first image can be achieved; when the downsampling factor B is less than the upsampling factor A, non-integer-fold upsampling of the first image can be achieved.
Assume that the first image includes M first feature maps, each of the M first feature maps has a height of H pixels and a width of W pixels, H and W are integers greater than 1, and M is an integer greater than 0.

It should be understood that the first image including M first feature maps, each with a height of H pixels and a width of W pixels, may be understood as the first image being a three-dimensional image of size H×W×M, i.e., the first image has a height of H pixels, a width of W pixels, and a depth of M first feature maps; in other words, the three-dimensional first image includes M two-dimensional first feature maps of size H×W.

Optionally, the first image may be an originally captured to-be-processed image, a preprocessed image, an image obtained after processing by another image processing layer in the convolutional neural network, or an image obtained after processing by another image processing apparatus, which is not limited in the embodiments of the present application.

Optionally, in the embodiments of the present application, the first image may be obtained in a variety of different ways, which is not limited in the embodiments of the present application.

For example, when the first image is an originally captured to-be-processed image, S610 may be obtaining the first image from an image capture apparatus; when the first image is an image obtained after processing by another image processing layer in the convolutional neural network, S610 may be obtaining the first image output by that image processing layer; when the first image is an image obtained after processing by another image processing apparatus, S610 may be obtaining the first image output by that image processing apparatus.

It should be understood that the convolutional neural network described in the embodiments of the present application may include a plurality of image processing layers, where the first image processing layer may include one part of the plurality of image processing layers and the second image processing layer may include another part of the plurality of image processing layers, which is not limited in the embodiments of the present application.

Optionally, in S620, the image processing apparatus may perform B-fold downsampling on the first image through the first image processing layer using a variety of different operations to obtain the second image, which is not limited in the embodiments of the present application.
In an optional embodiment, the image processing apparatus divides, through the first image processing layer, each first feature map into (H×W)/B² non-overlapping image blocks, each of which has a height of B pixels and a width of B pixels, and obtains B² second feature maps from the (H×W)/B² image blocks, where each of the B² second feature maps has a height of H/B pixels and a width of W/B pixels, each pixel in each second feature map is taken from a different one of the (H×W)/B² image blocks, and the position of each pixel in the second feature map is associated with the position, in the first feature map, of the image block to which the pixel belongs. Here, (H×W)/B², H/B, and W/B are all integers.

It should be understood that B-fold downsampling of one first feature map yields B² second feature maps, so B-fold downsampling of M first feature maps yields M×B² second feature maps, i.e., the second image.

It should also be understood that the second image including M×B² second feature maps, each with a height of H/B pixels and a width of W/B pixels, may be understood as the second image being a three-dimensional image of size (H/B)×(W/B)×(M×B²), i.e., the second image has a height of H/B pixels, a width of W/B pixels, and a depth of M×B² second feature maps; in other words, the three-dimensional second image includes M×B² two-dimensional second feature maps of size (H/B)×(W/B).

It should also be understood that the position of each pixel in each second feature map being associated with the position, in the first feature map, of the image block to which the pixel belongs may be understood as follows: the relative position of each pixel within the second feature map is the same as the relative position, within the first feature map, of the image block to which that pixel belongs.

For example, FIG. 7 shows a schematic diagram of a 4×4×1 input image processed by the convolutional neural network to obtain an output image. It should be understood that, for clarity of description, the bracketed number on each pixel in FIG. 7 is the number or identifier of that pixel, not the pixel value on that pixel.

As shown in FIG. 7, the input image includes one 4×4 input feature map, which is divided into 4 image blocks of 2×2: image block 1 includes the pixels numbered 1, 2, 5, 6; image block 2 includes the pixels numbered 3, 4, 7, 8; image block 3 includes the pixels numbered 9, 10, 13, 14; and image block 4 includes the pixels numbered 11, 12, 15, 16. One pixel is taken from the top-left position of each image block to form output feature map 1, where the relative position of each pixel in output feature map 1 is the same as the relative position, in the input feature map, of the image block to which the pixel belongs, i.e., the relative positions of the pixels numbered 1, 3, 9, 11 in output feature map 1 are the same as the relative positions of image blocks 1, 2, 3, 4.

In other words, no matter in which direction output feature map 1 is translated, the relative positions of the pixels it includes remain unchanged.

Similarly, one pixel is taken from the top-right position of each image block to form output feature map 2, one pixel from the bottom-left position to form output feature map 3, and one pixel from the bottom-right position to form output feature map 4, yielding an output image of size 2×2×4.

In the image processing method provided in the embodiments of the present application, the pixels included in each first feature map are split and rearranged to obtain B² second feature maps, thereby achieving B-fold downsampling of the first image.

In addition, the B² second feature maps include all the pixels of the first feature map, i.e., all the image information of each first feature map is retained in the B² second feature maps; the relative positions of the pixels included in each second feature map are determined by the relative positions, in the first feature map, of the image blocks to which those pixels belong, i.e., each second feature map obtained from a first feature map is a thumbnail of that first feature map.
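The block-splitting rearrangement above (each B×B block contributes one pixel to each of the B² sub-maps, preserving relative positions) can be sketched in plain Python. The function name `space_to_depth` is our own label for this operation, not terminology from the patent.

```python
def space_to_depth(x, b):
    """Split an H x W feature map (list of rows) into b*b sub-maps of size
    (H/b) x (W/b): sub-map (i, j) takes the pixel at offset (i, j) of every
    non-overlapping b x b block, keeping the blocks' relative layout."""
    h, w = len(x), len(x[0])
    assert h % b == 0 and w % b == 0
    return [
        [[x[p * b + i][q * b + j] for q in range(w // b)] for p in range(h // b)]
        for i in range(b) for j in range(b)
    ]

# The 4x4 example from the description: pixels numbered 1..16 row by row.
x = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]
maps = space_to_depth(x, 2)
print(maps[0])  # top-left pixel of each block: [[1, 3], [9, 11]]
```

Every input pixel appears in exactly one sub-map, matching the claim that no image information is lost.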
In another optional embodiment, the image processing apparatus performs, through the first image processing layer, a convolution operation on the M first feature maps to obtain the second image, where the convolution stride of the convolution operation in both the width direction and the height direction is B, the convolution operation uses N convolution kernels, each of the N convolution kernels has a height of K pixels, a width of J pixels, and a depth of M feature maps, each first feature map is padded with a height border of P pixels and a width border of P pixels, and the second image includes N second feature maps, each of which has a height of ⌊(H+2P−K)/B⌋+1 pixels and a width of ⌊(W+2P−J)/B⌋+1 pixels, where N is an integer greater than 0, P is an integer greater than or equal to 0, and J and K are greater than or equal to B.

Optionally, the first image processing layer may be, for example, a convolutional layer, though the embodiments of the present application are not limited thereto.
For example, FIG. 8 shows a schematic diagram of the convolution process for a 6×6×1 input image convolved in same padding mode with a convolution stride of 3 pixels in both the height direction and the width direction, using 3 convolution kernels of different sizes, for example 1×1×1, 3×3×1, and 5×5×1.

As can be seen from FIG. 8: (1) When the width (or height) of the convolution kernel is smaller than the convolution stride in the width (or height) direction (for example, the convolution process in FIG. 8 with a kernel of size 1×1×1 and a stride of 3 pixels in the width and height directions), the convolution regions corresponding to the kernel in two adjacent convolution operations do not overlap, and the kernel does not cover all the pixels of the input image as it slides; in this case, considered from the perspective of the signal source, the output image suffers a large loss of image information because not all the pixels of the input image are used in the computation.

(2) When the width (or height) of the convolution kernel equals the convolution stride in the width (or height) direction (for example, the convolution process in FIG. 8 with a kernel of size 3×3×1 and a stride of 3 pixels in the width and height directions), the convolution regions corresponding to the kernel in two adjacent convolution operations do not overlap, but the kernel covers all the pixels of the input image as it slides.

(3) When the width (or height) of the convolution kernel is greater than the convolution stride in the width (or height) direction (for example, the convolution process in FIG. 8 with a kernel of size 5×5×1 and a stride of 3 pixels in the width and height directions), the convolution regions corresponding to the kernel in two adjacent convolution operations overlap, and the kernel covers all the pixels of the input image as it slides.

Therefore, in cases (2) and (3), considered from the perspective of the signal source, the output image does not suffer a large loss of image information due to not all the pixels of the input image being used in the computation.

It should be understood that, since a small part of the image information carried by the input image is lost in the process of abstracting image features, the number of feature maps included in the output image can be increased to better retain this part of the image information.

Therefore, when the second manner is used to perform B-fold downsampling on the first image, M, N, and B may satisfy the following formula: N ≥ M×B/2, i.e., the depth of the output image is increased under a certain constraint to compensate for the image information lost from the input image.

In the image processing method provided in the embodiments of the present application, a convolution operation is performed on the first image through the first image processing layer, the convolution stride of the convolution operation in both the width direction and the height direction is B, the convolution operation uses N convolution kernels, each of which has a height of K pixels, a width of J pixels, and a depth of M feature maps, and the first image is padded with a height border of P pixels and a width border of P pixels, so that B-fold downsampling of the first image can be achieved.

In addition, J and K being greater than or equal to B ensures that the convolution kernel traverses at least every pixel of the first image during the convolution, i.e., all the image information of the first image is retained.
In yet another optional embodiment, the image processing apparatus performs, through the first image processing layer, a pooling operation on the M first feature maps to obtain the second image, where the pooling stride of the pooling operation in the width direction and the height direction is B, the pooling kernel of the pooling operation has a height of B pixels and a width of B pixels, and the second image includes M second feature maps, each of which has a height of H/B pixels and a width of W/B pixels. Here, H/B and W/B are both integers.

Optionally, the first image processing layer may be a pooling layer, which is not limited in the embodiments of the present application.

For the specific process, reference may be made to the explanation of the pooling layer in the terminology above and to FIG. 3; to avoid repetition, details are not described here again.

Optionally, the first image processing layer may include P image processing sublayers, where P is an integer greater than 1.

Correspondingly, the image processing apparatus implements B_i-fold downsampling of the first image through each of the P image processing sublayers, where B_i satisfies the following formula: ∏_{i=1}^{P} B_i = B, with B_i > 1.
Optionally, in S630, the image processing apparatus may perform A-fold upsampling on the second image through the second image processing layer using a variety of different operations to obtain the third image, which is not limited in the embodiments of the present application.

In an optional embodiment, the image processing apparatus may perform a deconvolution operation on the second image through the second image processing layer to obtain the third image, where the deconvolution stride of the deconvolution operation in the width and height directions is A.

Optionally, the second image processing layer may be a deconvolution layer, though the embodiments of the present application are not limited thereto.

For the specific upsampling process, reference may be made to the explanation of the deconvolution operation performed by the deconvolution layer in the terminology above; to avoid repetition, details are not described here again.

In another optional embodiment, the image processing apparatus may perform a sub-pixel convolution operation on the second image through the second image processing layer to obtain the third image, where the width (or height) of the third image is A times the width (or height) of the second image and the depth of the third image is 1/A² of the depth of the second image.

Optionally, the second image processing layer may be a sub-pixel convolutional layer, though the embodiments of the present application are not limited thereto.

For the specific upsampling process, reference may be made to the explanation of the sub-pixel convolution process performed by the sub-pixel convolutional layer in the terminology above and to FIG. 4; to avoid repetition, details are not described here again.

Optionally, the second image processing layer may include Q image processing sublayers, where Q is an integer greater than 1.

Correspondingly, the image processing apparatus implements A_i-fold upsampling of the second image through each of the Q image processing sublayers, where A_i satisfies the following formula: ∏_{i=1}^{Q} A_i = A, with A_i > 1.

Optionally, S630 may include: performing, by the image processing apparatus through the second image processing layer, first processing on the second image to obtain a fourth image, and performing A-fold upsampling on the fourth image through the second image processing layer to obtain the third image.

The first processing may be non-sampling processing (non-sampling processing is neither upsampling nor downsampling), for example processing the second image with a convolution operation whose stride in the width and height directions is 1, or processing the second image with an activation function, which is not limited in the embodiments of the present application.

It should be understood that, in the embodiments of the present application, the image processing apparatus may first perform integer-fold downsampling on the first image to obtain the second image and then perform integer-fold upsampling on the second image to obtain the third image, thereby implementing non-integer-fold downsampling or non-integer-fold upsampling of the first image, but the protection scope of the embodiments of the present application should not be limited thereto. Correspondingly, the image processing apparatus may also first perform A-fold upsampling on the first image to obtain a fifth image and then perform B-fold downsampling on the fifth image to obtain a sixth image, which can likewise implement non-integer-fold downsampling or non-integer-fold upsampling of the first image; therefore, that solution should also fall within the protection scope of the embodiments of the present application.
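The net effect of the down-then-up pipeline on image size can be sketched with a small helper; `resample_size` is our own illustrative name, and the choice B=3, A=2 (coprime, as the method prefers) corresponds to the 1080p-to-720p scenario from the background section.

```python
def resample_size(h, w, b, a):
    """Net height and width after B-fold downsampling followed by A-fold
    upsampling: an overall (possibly non-integer) scaling factor of A/B."""
    assert h % b == 0 and w % b == 0
    return (h // b) * a, (w // b) * a

# 1080p -> 720p: downsample by B=3, then upsample by A=2 (factor 2/3).
print(resample_size(1080, 1920, 3, 2))  # (720, 1280)
# 720p -> 1080p: downsample by B=2, then upsample by A=3 (factor 3/2).
print(resample_size(720, 1280, 2, 3))   # (1080, 1920)
```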
The image processing method provided in the embodiments of the present application has been described in detail above with reference to FIG. 1 to FIG. 8; the image processing apparatus provided in the embodiments of the present application is described below with reference to FIG. 9 and FIG. 10.

FIG. 9 shows a schematic block diagram of an image processing apparatus 900 provided by an embodiment of the present application. The apparatus 900 includes:

an obtaining unit 910, configured to obtain a first image; and

a processing unit 920, configured to perform B-fold downsampling, through a first image processing layer of a convolutional neural network, on the first image obtained by the obtaining unit 910 to obtain a second image, where the convolutional neural network includes a plurality of image processing layers, the plurality of image processing layers include the first image processing layer, and B is an integer greater than 1; and to perform A-fold upsampling on the second image through a second image processing layer of the plurality of image processing layers to obtain a third image, where A is an integer greater than 1 and A is not equal to B.
Optionally, the first image includes M first feature maps, each of the M first feature maps has a height of H pixels and a width of W pixels, H and W are integers greater than 1, and M is an integer greater than 0; the processing unit is specifically configured to: divide each first feature map into (H×W)/B² non-overlapping image blocks, each of which has a height of B pixels and a width of B pixels; and obtain B² second feature maps from the (H×W)/B² image blocks, where each of the B² second feature maps has a height of H/B pixels and a width of W/B pixels, each pixel in each second feature map is taken from a different one of the (H×W)/B² image blocks, and the position of each pixel in the second feature map is associated with the position, in the first feature map, of the image block to which the pixel belongs. Here, (H×W)/B², H/B, and W/B are all integers.

Optionally, the first image includes M first feature maps, each of the M first feature maps has a height of H pixels and a width of W pixels, H and W are integers greater than 1, and M is an integer greater than 0; the processing unit is specifically configured to perform a convolution operation on the M first feature maps through the first image processing layer to obtain the second image, where the convolution stride of the convolution operation in both the width direction and the height direction is B, the convolution operation uses N convolution kernels, each of the N convolution kernels has a height of K pixels, a width of J pixels, and a depth of M feature maps, each first feature map is padded with a height border of P pixels and a width border of P pixels, and the second image includes N second feature maps, each of which has a height of ⌊(H+2P−K)/B⌋+1 pixels and a width of ⌊(W+2P−J)/B⌋+1 pixels, where N is an integer greater than 0, P is an integer greater than or equal to 0, and J and K are greater than or equal to B.

Optionally, M, N, and B satisfy the following formula: N ≥ M×B/2.

Optionally, the first image includes M first feature maps, each of the M first feature maps has a height of H pixels and a width of W pixels, H and W are integers greater than 1, and M is an integer greater than 0; the processing unit is specifically configured to perform a pooling operation on the M first feature maps through the first image processing layer to obtain the second image, where the pooling stride of the pooling operation in the width direction and the height direction is B, the pooling kernel of the pooling operation has a height of B pixels and a width of B pixels, and the second image includes M second feature maps, each of which has a height of H/B pixels and a width of W/B pixels. Here, H/B and W/B are both integers.

Optionally, A and B are coprime.
It should be understood that the image processing apparatus 900 here is embodied in the form of functional units. The term "unit" here may refer to an application specific integrated circuit (ASIC), an electronic circuit, a processor (for example, a shared processor, a dedicated processor, or a group processor) and memory for executing one or more software or firmware programs, merged logic circuits, and/or other suitable components supporting the described functions. In an optional example, those skilled in the art can understand that the image processing apparatus 900 may specifically be the image processing apparatus in the embodiment of the method 600 above, and the image processing apparatus 900 may be configured to perform the procedures and/or steps corresponding to the image processing apparatus in the embodiment of the method 600 above; to avoid repetition, details are not described here again.
FIG. 10 shows a schematic block diagram of an image processing apparatus 1000 provided by an embodiment of the present application. The image processing apparatus 1000 may be the image processing apparatus described in FIG. 9 and may adopt the hardware architecture shown in FIG. 10. The image processing apparatus may include a processor 1010, a communication interface 1020, and a memory 1030, which communicate with one another through an internal connection path. The related functions implemented by the processing unit 920 in FIG. 9 may be implemented by the processor 1010, and the related functions implemented by the obtaining unit 910 may be implemented by the processor 1010 controlling the communication interface 1020.

The processor 1010 may include one or more processors, for example one or more central processing units (CPU); when the processor is one CPU, the CPU may be a single-core CPU or a multi-core CPU.

The communication interface 1020 is configured to send and/or receive data. The communication interface may include a sending interface for sending data and a receiving interface for receiving data.

The memory 1030 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), and a compact disc read-only memory (CD-ROM); the memory 1030 is configured to store related instructions and data.

The memory 1030 is configured to store program code and data of the image processing apparatus and may be a separate device or integrated in the processor 1010.

Specifically, the processor 1010 is configured to control the communication interface to perform data transmission with other apparatuses, for example other image processing apparatuses. For details, reference may be made to the description in the method embodiments; details are not described here again.

It can be understood that FIG. 10 shows only a simplified design of the image processing apparatus. In practical applications, the image processing apparatus may also include other necessary elements, including but not limited to any number of communication interfaces, processors, controllers, memories, and the like, and all image processing apparatuses that can implement the present application fall within the protection scope of the present application.

In a possible design, the image processing apparatus 1000 may be replaced with a chip apparatus, for example a chip usable in an image processing apparatus, configured to implement the related functions of the processor 1010 in the image processing apparatus. The chip apparatus may be a field programmable gate array, an application-specific integrated chip, a system chip, a central processing unit, a network processor, a digital signal processing circuit, or a microcontroller implementing the related functions, and may also adopt a programmable controller or another integrated chip. The chip may optionally include one or more memories configured to store program code that, when executed, causes the processor to implement the corresponding functions.
Those of ordinary skill in the art may realize that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present application.

Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not described here again.

In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, i.e., they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.

The above are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto; any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (15)

  1. An image processing method, comprising:
    obtaining a first image;
    performing B-fold downsampling on the first image through a first image processing layer of a convolutional neural network to obtain a second image, wherein the convolutional neural network comprises a plurality of image processing layers, the plurality of image processing layers comprise the first image processing layer, and B is an integer greater than 1; and
    performing A-fold upsampling on the second image through a second image processing layer of the plurality of image processing layers to obtain a third image, wherein A is an integer greater than 1 and A is not equal to B.
  2. The method according to claim 1, wherein the first image comprises M first feature maps, each of the M first feature maps has a height of H pixels and a width of W pixels, H and W are integers greater than 1, and M is an integer greater than 0;
    the performing B-fold downsampling on the first image through the first image processing layer of the convolutional neural network to obtain the second image comprises:
    dividing each first feature map into (H×W)/B² non-overlapping image blocks, each of the (H×W)/B² image blocks having a height of B pixels and a width of B pixels; and
    obtaining B² second feature maps from the (H×W)/B² image blocks, each of the B² second feature maps having a height of H/B pixels and a width of W/B pixels, wherein each pixel in each second feature map is taken from a different one of the (H×W)/B² image blocks, and the position of each pixel in the second feature map is associated with the position, in the first feature map, of the image block to which the pixel belongs.
  3. The method according to claim 1, wherein the first image comprises M first feature maps, each of the M first feature maps has a height of H pixels and a width of W pixels, H and W are integers greater than 1, and M is an integer greater than 0;
    the performing B-fold downsampling on the first image through the first image processing layer of the convolutional neural network to obtain the second image comprises:
    performing, through the first image processing layer, a convolution operation on the M first feature maps to obtain the second image, wherein a convolution stride of the convolution operation in both a width direction and a height direction is B, the convolution operation uses N convolution kernels, each of the N convolution kernels has a height of K pixels, a width of J pixels, and a depth of M feature maps, each first feature map is padded with a height border of P pixels and a width border of P pixels, and the second image comprises N second feature maps, each of the N second feature maps having a height of ⌊(H+2P−K)/B⌋+1 pixels and a width of ⌊(W+2P−J)/B⌋+1 pixels, wherein N is an integer greater than 0, P is an integer greater than or equal to 0, and J and K are greater than or equal to B.
  4. The method according to claim 3, wherein M, N, and B satisfy the following formula: N ≥ M×B/2.
  5. The method according to claim 1, wherein the first image comprises M first feature maps, each of the M first feature maps has a height of H pixels and a width of W pixels, H and W are integers greater than 1, and M is an integer greater than 0;
    the performing B-fold downsampling on the first image through the first image processing layer of the convolutional neural network to obtain the second image comprises:
    performing, through the first image processing layer, a pooling operation on each of the M first feature maps to obtain the second image, wherein a pooling stride of the pooling operation in a width direction and a height direction is B, a pooling kernel of the pooling operation has a height of B pixels and a width of B pixels, and the second image comprises M second feature maps, each of the M second feature maps having a height of H/B pixels and a width of W/B pixels.
  6. The method according to any one of claims 1 to 5, wherein A and B are coprime.
  7. An image processing apparatus, comprising:
    an obtaining unit, configured to obtain a first image; and
    a processing unit, configured to perform B-fold downsampling, through a first image processing layer of a convolutional neural network, on the first image obtained by the obtaining unit to obtain a second image, wherein the convolutional neural network comprises a plurality of image processing layers, the plurality of image processing layers comprise the first image processing layer, and B is an integer greater than 1; and to perform A-fold upsampling on the second image through a second image processing layer of the plurality of image processing layers to obtain a third image, wherein A is an integer greater than 1 and A is not equal to B.
  8. The apparatus according to claim 7, wherein the first image comprises M first feature maps, each of the M first feature maps has a height of H pixels and a width of W pixels, H and W are integers greater than 1, and M is an integer greater than 0;
    the processing unit is specifically configured to:
    divide each first feature map into (H×W)/B² non-overlapping image blocks, each of the (H×W)/B² image blocks having a height of B pixels and a width of B pixels; and
    obtain B² second feature maps from the (H×W)/B² image blocks, each of the B² second feature maps having a height of H/B pixels and a width of W/B pixels, wherein each pixel in each second feature map is taken from a different one of the (H×W)/B² image blocks, and the position of each pixel in the second feature map is associated with the position, in the first feature map, of the image block to which the pixel belongs.
  9. The apparatus according to claim 7, wherein the first image comprises M first feature maps, each of the M first feature maps having a height of H pixels and a width of W pixels, H and W being integers greater than 1, and M being an integer greater than 0; and
    the processing unit is specifically configured to perform, by the first image processing layer, a convolution operation on the M first feature maps to obtain the second image, wherein the convolution stride is B in both the width and height directions, the convolution operation uses N convolution kernels, each of the N convolution kernels having a height of K pixels, a width of J pixels, and a depth of M feature maps, each first feature map is padded with a height border of P pixels and a width border of P pixels, and the second image comprises N second feature maps, each of the N second feature maps having a height of ⌊(H + 2P - K)/B⌋ + 1 pixels and a width of ⌊(W + 2P - J)/B⌋ + 1 pixels, where N is an integer greater than 0, P is an integer greater than or equal to 0, and J and K are greater than or equal to B.
  10. The apparatus according to claim 9, wherein M, N and B satisfy: N ≥ M×B/2.
  11. The apparatus according to claim 7, wherein the first image comprises M first feature maps, each of the M first feature maps having a height of H pixels and a width of W pixels, H and W being integers greater than 1, and M being an integer greater than 0; and
    the processing unit is specifically configured to perform, by the first image processing layer, a pooling operation on each of the M first feature maps to obtain the second image, wherein the pooling stride is B in both the width and height directions, the pooling kernel has a height of B pixels and a width of B pixels, and the second image comprises M second feature maps, each of the M second feature maps having a height of H/B pixels and a width of W/B pixels.
  12. The apparatus according to any one of claims 7 to 11, wherein A and B are coprime.
  13. An image processing apparatus, comprising a memory, a processor, a communication interface, and instructions stored in the memory and executable on the processor, wherein the memory, the processor, and the communication interface communicate with one another through an internal connection path, and the processor executes the instructions to cause the apparatus to implement the method according to any one of claims 1 to 6.
  14. A computer-readable medium storing a computer program, wherein the computer program comprises instructions for implementing the method according to any one of claims 1 to 6.
  15. A computer program product comprising instructions which, when run on a computer, cause the computer to implement the method according to any one of claims 1 to 6.
PCT/CN2018/120830 2017-12-29 2018-12-13 Image processing method and apparatus WO2019128726A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711471002.9A CN109996023B (zh) 2017-12-29 2017-12-29 Image processing method and apparatus
CN201711471002.9 2017-12-29

Publications (1)

Publication Number Publication Date
WO2019128726A1 true WO2019128726A1 (zh) 2019-07-04

Family

ID=67066527

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/120830 WO2019128726A1 (zh) 2017-12-29 2018-12-13 Image processing method and apparatus

Country Status (2)

Country Link
CN (1) CN109996023B (zh)
WO (1) WO2019128726A1 (zh)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110333827B (zh) * 2019-07-11 2023-08-08 山东浪潮科学研究院有限公司 Data loading apparatus and data loading method
CN111181574A (zh) * 2019-12-30 2020-05-19 浪潮(北京)电子信息产业有限公司 Endpoint detection method, apparatus and device based on multi-layer feature fusion
CN111798385B (zh) * 2020-06-10 2023-09-15 Oppo广东移动通信有限公司 Image processing method and apparatus, computer-readable medium, and electronic device
CN113919405B (zh) * 2020-07-07 2024-01-19 华为技术有限公司 Data processing method and apparatus, and related device
CN112149694B (zh) * 2020-08-28 2024-04-05 特斯联科技集团有限公司 Image processing method, system, storage medium and terminal based on a convolutional neural network pooling module
CN112232361B (zh) * 2020-10-13 2021-09-21 国网电子商务有限公司 Image processing method and apparatus, electronic device, and computer-readable storage medium
CN112733685A (zh) * 2020-12-31 2021-04-30 北京澎思科技有限公司 Face tracking method, system, and computer-readable storage medium
CN113239898A (zh) * 2021-06-17 2021-08-10 阿波罗智联(北京)科技有限公司 Method for processing images, roadside device, and cloud control platform
CN113469910B (zh) * 2021-06-29 2023-03-24 展讯通信(上海)有限公司 Image processing method, apparatus, and device

Citations (7)

Publication number Priority date Publication date Assignee Title
CN101854491A (zh) * 2009-04-03 2010-10-06 深圳市融创天下科技发展有限公司 Image downsampling method combining spline upscaling with an adaptive template
US20100260433A1 (en) * 2007-09-19 2010-10-14 Dong-Qing Zhang System and method for scaling images
CN106067161A (zh) * 2016-05-24 2016-11-02 深圳市未来媒体技术研究院 Method for image super-resolution
CN106097355A (zh) * 2016-06-14 2016-11-09 山东大学 Microscopic hyperspectral image processing method for gastrointestinal tumors based on a convolutional neural network
CN107229918A (zh) * 2017-05-26 2017-10-03 西安电子科技大学 SAR image target detection method based on a fully convolutional neural network
CN107240102A (zh) * 2017-04-20 2017-10-10 合肥工业大学 Computer-aided early diagnosis method for malignant tumors based on a deep learning algorithm
CN107358576A (zh) * 2017-06-24 2017-11-17 天津大学 Depth map super-resolution reconstruction method based on a convolutional neural network

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
AU2003276640A1 (en) * 2002-12-19 2004-07-14 Koninklijke Philips Electronics N.V. Image scaling
AU2017281281B2 (en) * 2016-06-20 2022-03-10 Butterfly Network, Inc. Automated image acquisition for assisting a user to operate an ultrasound device
CN107358575A (zh) * 2017-06-08 2017-11-17 清华大学 Single-image super-resolution reconstruction method based on a deep residual network

Also Published As

Publication number Publication date
CN109996023A (zh) 2019-07-09
CN109996023B (zh) 2021-06-29

Similar Documents

Publication Publication Date Title
WO2019128726A1 (zh) Image processing method and apparatus
WO2020177651A1 (zh) Image segmentation method and image processing apparatus
CN110473137B (zh) Image processing method and apparatus
TWI756378B (zh) System and method for deep-learning image super-resolution
US9736451B1 (en) Efficient dense stereo computation
AU2016349518B2 (en) Edge-aware bilateral image processing
CN111683269B (zh) Video processing method and apparatus, computer device, and storage medium
CN109816615B (zh) Image inpainting method, apparatus, device, and storage medium
US9865063B2 (en) Method and system for image feature extraction
US8401316B2 (en) Method and apparatus for block-based compression of light-field images
CN110084309B (zh) Feature map enlargement method, apparatus and device, and computer-readable storage medium
TW202040986A (zh) Video image processing method and apparatus
US20170064280A1 (en) Image processing method and apparatus
WO2020207134A1 (zh) Image processing method, apparatus, device, and computer-readable medium
WO2023065665A1 (zh) Image processing method, apparatus, device, storage medium, and computer program product
KR20220120674A (ko) Three-dimensional reconstruction method, apparatus, device, and storage medium
CN112149793A (zh) Artificial neural network model and electronic device including the same
CN111353965B (zh) Image inpainting method, apparatus, terminal, and storage medium
WO2020232672A1 (zh) Image cropping method and apparatus, and photographing apparatus
CN116630152A (zh) Image resolution reconstruction method, apparatus, storage medium, and electronic device
WO2022033088A1 (zh) Image processing method and apparatus, electronic device, and computer-readable medium
EP4040397A1 (en) Method and computer program product for producing a 3d representation of an object
WO2017209213A1 (ja) Image processing device, image processing method, and computer-readable recording medium
CN115604528A (zh) Fisheye image compression, fisheye video stream compression, and panoramic video generation method
US20240013345A1 (en) Methods and apparatus for shared image processing among multiple devices

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18897812; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18897812; Country of ref document: EP; Kind code of ref document: A1)