WO2021056304A1 - Image processing method, apparatus, movable platform and machine-readable storage medium - Google Patents
- Publication number
- WO2021056304A1 (PCT/CN2019/108039)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- channel
- original
- images
- neural network
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/70—Circuitry for compensating brightness variation in the scene
- H04N23/741—Circuitry for compensating brightness variation in the scene by increasing the dynamic range of the image compared to the dynamic range of the electronic image sensors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/80—Camera processing pipelines; Components thereof
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/80—Camera processing pipelines; Components thereof
- H04N23/84—Camera processing pipelines; Components thereof for processing colour signals
- H04N23/88—Camera processing pipelines; Components thereof for processing colour signals for colour balance, e.g. white-balance circuits or colour temperature control
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10141—Special mode during image acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Definitions
- the present invention relates to the field of image processing, in particular to an image processing method, device, movable platform and machine-readable storage medium.
- the original image can be processed by demosaicing, denoising, white balance, color space conversion, image sharpening, and color enhancement to obtain an sRGB (standard Red Green Blue) color image.
- an image processing method including:
- the multi-channel image is processed through a preset neural network to generate an image with improved image quality.
- an image processing method including:
- the preset neural network is trained according to the multi-channel training image; wherein, the preset neural network includes a multi-scale extraction network, and the multi-scale extraction network is used to obtain multiple scale features of the image of each channel in the multi-channel training image.
- an image processing method including:
- the preset neural network includes a multi-scale extraction network, and the multi-scale extraction network is used to obtain multiple scale features of the image of each channel in the multi-channel image.
- an image processing device in a fifth aspect of the present invention includes a processor and a memory; the memory is used for storing computer instructions executable by the processor; the processor is used to read the computer instructions from the memory to implement the above-mentioned method steps.
- a movable platform including:
- a body; a power system provided in the body, the power system being used to provide power for the movable platform; a sensor system; and
- the image processing device as described above, used to process image frames captured by the sensor system.
- a camera including:
- a housing; a lens assembly installed inside the housing;
- a sensor assembly installed inside the housing for sensing light passing through the lens assembly and generating electrical signals; and the above-mentioned image processing device.
- a machine-readable storage medium is provided, and a number of computer instructions are stored on the machine-readable storage medium, and the computer instructions implement the above-mentioned method when executed.
- the original training image can be decorrelated to obtain a multi-channel training image, and the preset neural network can be trained according to the multi-channel training image and the target image.
- the image to be processed can be processed based on the preset neural network to generate an image with improved image quality, so that the image quality can be improved.
- the above method can train a high-performance preset neural network, and convert the collected low-quality original images into high-quality color images, with high-efficiency imaging performance, high-quality imaging results, and good user experience.
- FIG. 1A and 1B are schematic diagrams of processing the original image to obtain an sRGB color image
- FIG. 2 is a schematic diagram of an example of an image processing method in an implementation
- FIG. 3A is a schematic diagram of processing multiple original images in an embodiment
- FIG. 3B is a schematic diagram of color decorrelation processing in an embodiment
- FIG. 3C is a schematic diagram of multi-scale correlation processing in an embodiment
- FIG. 3D is a schematic diagram of multi-scale correlation processing in an embodiment
- FIG. 3E is a schematic structural diagram of a multi-scale extraction network in an embodiment
- FIG. 3F is a schematic diagram of the comparison effect in an embodiment
- FIG. 4 is a schematic diagram of an example of an image processing method in another embodiment
- FIG. 5 is a schematic diagram of an example of an image processing method in another embodiment
- FIG. 6 is a schematic diagram of an example of an image processing method in another embodiment
- FIG. 7 is a block diagram of an example of an image processing device in an embodiment
- FIG. 8 is a block diagram of an example of a movable platform in an implementation manner
- FIG. 9 is a block diagram of an example of a camera in an embodiment.
- although the terms first, second, third, etc. may be used in the present invention to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other.
- for example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information.
- the word "if" as used herein can be interpreted as "when", "while", or "in response to determining".
- in order to convert the original image into a high-quality sRGB color image, in a possible implementation, as shown in FIG. 1A, after the original image (such as a raw image) is collected, it can be subjected to demosaicing, denoising, color processing (such as white balance and color space conversion), image enhancement (such as image sharpening and color enhancement) and other processing to obtain an sRGB color image.
- the image quality of the sRGB color image obtained by the above method is still relatively low, and the image quality needs to be improved.
- the target camera can be used to collect a large-scale data set, the data set including original images (such as raw images) and target images, and the preset neural network is then trained on the data set. Based on the trained preset neural network, the collected original images can be processed to produce high-quality images.
- a high-performance preset neural network can be trained, and the collected low-quality original images can be converted into high-quality sRGB color images, which has high-efficiency imaging performance and obtains high-quality imaging results.
- the multi-exposure fusion method is used to generate high-quality target images, and a specific neural network structure is used for training to obtain high-quality imaging results.
- Embodiment 1 Refer to Fig. 2, which is a schematic flowchart of an image processing method, and the method includes:
- Step 201 Obtain an original training image and a target image corresponding to the original training image.
- multiple original images may be acquired, and the exposure of different original images may be the same, or the exposure of different original images may be different. Then, an original image is selected from the multiple original images as the original training image, and the multiple original images are subjected to multi-image fusion processing to obtain the target image.
- the multiple original images include multiple original images acquired in a bracketing exposure mode.
- the bracketing mode (Bracketing) is an advanced function of the camera. Based on the bracketing mode, when the shutter is pressed, instead of collecting one original image, the camera continuously collects multiple original images with different exposure combinations, to ensure that at least one original image meets the exposure intent.
- the bracketing exposure mode is implemented as follows: first collect an original image according to the metered exposure value, and then collect one or more original images by increasing and decreasing the exposure on that basis, so that the exposures of the different original images differ.
- the exposure compensation information of each original image may be acquired, and according to the exposure compensation information of each original image, one original image is selected from a plurality of original images as the original training image.
- the RGB image corresponding to each of the multiple original images may be acquired, and the multiple RGB images corresponding to the multiple original images may be subjected to multi-image fusion processing to obtain the target image.
- obtaining the RGB image corresponding to each original image in the multiple original images may include: for each original image in the multiple original images, performing at least one of demosaicing, denoising, automatic white balance, image sharpening, color enhancement, color space conversion, and pre-denoising on the original image to obtain the RGB (i.e., red, green, blue) image corresponding to the original image.
- performing multi-image fusion processing on the multiple RGB images corresponding to the multiple original images to obtain the target image may include, but is not limited to: performing registration processing and fusion processing on the multiple RGB images to generate the target image; or, performing registration processing and fusion processing on the multiple RGB images, and then performing contrast enhancement and/or color space conversion on the processed images to generate the target image.
- the above method is only an example, and there is no restriction on this.
- multiple RGB images can be fused into a target image.
- the target image can meet the quality requirements of low noise and high dynamic range; that is, it is a target image meeting high quality requirements.
- a preset neural network needs to be trained according to a data set.
- the data set includes the original images and the target images. Therefore, a data set needs to be constructed, and the data set should meet the high quality requirements of the images, such as low noise level, high dynamic range, accurate color rendering, effective contrast enhancement results, etc.
- the data set construction process can include:
- the automatic mode of the target camera can be used to collect the original images of multiple scenes (such as Raw images).
- there is no limit to the number of scenes.
- for example, original images of 500 scenes may be collected, such as original images of daytime scenes, original images of night scenes, original images of rainy scenes, etc.
- there is no restriction on the type of scenes.
- the bracketing exposure mode can be used to collect multiple original images with different exposures, and the number of original images in each scene is not limited.
- the number of original images in different scenes can be the same or different.
- the bracketing mode can be used to collect 7-11 original images with different exposures.
- 7-11 is only an example, and there is no limitation on this.
- for each scene, by adopting the bracketing mode to collect multiple original images with different exposures, the dynamic range of the entire scene can be covered. Moreover, these original images with different exposures facilitate subsequent image registration and fusion, so that noise can be eliminated in the fusion process.
- when using the bracketing exposure mode to collect multiple original images with different exposures, a small displacement of objects in the picture is allowed, but original images with large motion displacement are not taken.
- the target image is an image for training a preset neural network. It has a reference-level image quality, and it is necessary to provide a target image that can meet high-quality requirements. Based on this, in order to effectively achieve the quality requirements of low noise and high dynamic range, multiple image fusion processing can be performed on multiple original images to obtain the target image.
- editing software (such as Photoshop) can be used to process the original image by demosaicing, automatic white balance, denoising, image sharpening, color enhancement, color space conversion operations, etc., to obtain the RGB image corresponding to the original image, such as a 16-bit RGB image in a universal color space (such as an Adobe RGB image).
- the denoising software is used to pre-denoise each RGB image to obtain a denoised RGB image.
- HDR software (such as Aurora HDR, etc.) is used to perform multi-image fusion processing on the denoised multiple RGB images to obtain the target image.
- the multiple RGB images after denoising are processed by registration, fusion, contrast enhancement and color space conversion to obtain the target image.
- the color space conversion is used to convert the RGB image to the sRGB space; that is, the target image is an sRGB color image.
- the exposure compensation information of each original image can be acquired, and according to the exposure compensation information of each original image, one original image is selected as the original training image from the multiple original images.
- one original image from multiple original images of the scene can be selected as the original training image.
- if the exposure compensation information of an original image is larger, it means that the shooting environment of the original image is poor and exposure compensation is required.
- if the exposure compensation information of an original image is smaller, it means that the shooting environment of the original image is better, and no exposure compensation, or only relatively small exposure compensation, is needed. Therefore, according to the exposure compensation information of each original image, an original image with small exposure compensation information can be selected from the multiple original images as the original training image.
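The selection rule above can be sketched as follows. This is a minimal illustration assuming the exposure compensation information is available as an EV offset per bracketed frame; the function and parameter names are hypothetical, not from the patent:

```python
# Hypothetical sketch: pick the bracketed frame whose absolute exposure
# compensation (EV offset) is smallest as the original training image.
def select_training_image(originals, ev_offsets):
    """originals: list of raw frames; ev_offsets: EV compensation per frame."""
    best = min(range(len(originals)), key=lambda i: abs(ev_offsets[i]))
    return originals[best]
```

For example, with offsets of (-2, -1, 0, +1, +2) EV, the 0 EV frame (the metered exposure) would be selected.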
- in this way, the original training image and target image of each scene can be obtained; the processing of each scene is the same, and in the following, the original training image and target image of one scene are taken as an example.
- Step 202 Perform a decorrelation operation on the original training image to obtain a multi-channel training image.
- the R channel image, the G channel image, and the B channel image can be acquired according to the original training image; then, the correlation between the R channel image, the G channel image, and the B channel image is removed to obtain a multi-channel training image. The multi-channel training image may include a luminance component and a chrominance component, and the chrominance component includes a first chrominance component and a second chrominance component.
- obtaining the R channel image, G channel image, and B channel image according to the original training image may include, but is not limited to: performing an up-sampling operation on the original training image (such as a 2-times bilinear up-sampling operation) to obtain the R channel image, G channel image, and B channel image.
- removing the correlation between the R channel image, the G channel image, and the B channel image to obtain a multi-channel training image may include, but is not limited to: removing the correlation between the R channel image, the G channel image, and the B channel image according to a first conversion matrix to obtain the multi-channel training image; wherein, the first conversion matrix is used to remove the correlation between the images.
- the first conversion matrix can be obtained in the following manner: obtain an original input image, and obtain an R channel input image, a G channel input image, and a B channel input image according to the original input image; further, based on principal component analysis, obtain the first conversion matrix according to the R channel input image, the G channel input image, and the B channel input image.
- obtaining the first conversion matrix according to the R channel input image, the G channel input image, and the B channel input image may include, but is not limited to: collecting multiple pixel vectors from each of the R channel input image, the G channel input image, and the B channel input image, the collected pixel vectors forming a pixel vector matrix; performing principal component analysis on the pixel vector matrix to obtain three basis vectors; and obtaining the first conversion matrix according to the three basis vectors.
- a first conversion matrix can be obtained, and the first conversion matrix is used to resolve the correlation between images.
- the original training image is decorrelated through the first conversion matrix to obtain a multi-channel training image.
- the multi-channel training image includes a luminance component and a chrominance component, and the chrominance component includes a first chrominance component and a second chrominance component.
- obtain the original input image; for example, use multiple original images in the data set as the original input images.
- process each original input image to obtain a three-channel RGB image (that is, an RGB image in the camera color space).
- a bilinear up-sampling operation of 2 times may be performed on the original input image to obtain a three-channel RGB image.
- the three-channel RGB image includes an R channel input image, a G channel input image, and a B channel input image.
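As a rough illustration of how the three channel images can be obtained from a raw frame, the sketch below assumes an RGGB Bayer layout and uses nearest-neighbour 2× upsampling in place of the bilinear upsampling mentioned above; the layout and function name are assumptions, not details from the patent:

```python
import numpy as np

def bayer_to_rgb(raw):
    """Split an (assumed) RGGB mosaic into half-resolution colour planes,
    then upsample each plane 2x back to full resolution (nearest neighbour
    here; the text uses bilinear upsampling)."""
    r = raw[0::2, 0::2]
    g = (raw[0::2, 1::2] + raw[1::2, 0::2]) / 2.0  # average the two G sites
    b = raw[1::2, 1::2]
    up = lambda p: p.repeat(2, axis=0).repeat(2, axis=1)
    return up(r), up(g), up(b)
```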
- N pixel vectors are collected from the R channel input image
- N pixel vectors are collected from the G channel input image
- N pixel vectors are collected from the B channel input image.
- N is the number of pixel samples.
- the value of N is configured based on experience. For example, N can be greater than or equal to 1000, and there is no restriction on this.
- the PCA algorithm can be used to perform principal component analysis on the pixel vector matrix to obtain three basis vectors (also called coordinate basis column vectors).
- the three basis vectors can be X1, X2, and X3, where X1 is the coordinate basis along which the samples have the largest variance, i.e., the first principal component.
- the PCA (Principal Component Analysis) algorithm is a widely used data dimensionality reduction algorithm. Its main idea is to map n-dimensional features onto k dimensions; these k dimensions are brand-new orthogonal features, also called principal components, which are k-dimensional features reconstructed on the basis of the original n-dimensional features.
- the process of performing principal component analysis on the pixel vector matrix to obtain the three basis vectors can be implemented by using the PCA algorithm, which will not be repeated here, as long as the three basis vectors can be obtained.
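The sampling-plus-PCA step above can be sketched with NumPy's eigendecomposition of the 3×3 covariance. This is a sketch, not the patent's exact procedure: the centering step, sample count, and function name are assumptions:

```python
import numpy as np

def first_conversion_matrix(r, g, b, n_samples=1000, seed=0):
    """Sample N (R, G, B) pixel vectors, run PCA on their 3x3 covariance,
    and return the first conversion matrix [X1, X2, X3]^T whose rows are
    the basis vectors ordered by decreasing variance."""
    rng = np.random.default_rng(seed)
    pixels = np.stack([r.ravel(), g.ravel(), b.ravel()], axis=1)   # (P, 3)
    idx = rng.choice(pixels.shape[0], size=min(n_samples, pixels.shape[0]),
                     replace=False)
    sample = pixels[idx] - pixels[idx].mean(axis=0)                # center
    cov = sample.T @ sample / sample.shape[0]                      # 3x3 covariance
    _, vecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    basis = vecs[:, ::-1]              # columns X1, X2, X3 by variance
    return basis.T                     # first conversion matrix [X1, X2, X3]^T
```

Because the eigenvectors of a symmetric covariance matrix are orthonormal, the resulting matrix is orthogonal, which is what makes the later inversion (the second conversion matrix) a simple transpose.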
- the multi-channel training image may include a luminance component, a first chrominance component, and a second chrominance component.
- first, obtain the R channel image, G channel image, and B channel image according to the original training image. For example, an up-sampling operation (such as a bilinear up-sampling operation) is performed on the original training image to obtain a three-channel RGB image.
- the three-channel RGB image includes an R channel image, a G channel image, and a B channel image.
- then, the correlation between the R channel image, the G channel image, and the B channel image is removed to obtain a multi-channel training image; the multi-channel training image may include a luminance component and a chrominance component, and the chrominance component includes a first chrominance component and a second chrominance component.
- that is, according to the first conversion matrix [X1, X2, X3]^T, the correlation between the R channel image, the G channel image, and the B channel image is removed, per pixel [L, a, b]^T = [X1, X2, X3]^T [R, G, B]^T, to obtain the luminance component, the first chrominance component, and the second chrominance component.
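Applying the first conversion matrix per pixel can be sketched as follows, assuming the matrix form [L, a, b]^T = M [R, G, B]^T with M = [X1, X2, X3]^T as described above:

```python
import numpy as np

def decorrelate(r, g, b, M):
    """Per pixel, [L, a, b]^T = M @ [R, G, B]^T with M = [X1, X2, X3]^T,
    yielding the luminance component and the two chrominance components."""
    rgb = np.stack([r, g, b], axis=-1)   # (H, W, 3)
    lab = rgb @ M.T                      # apply M to every pixel vector
    return lab[..., 0], lab[..., 1], lab[..., 2]
```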
- Step 203 Training the preset neural network according to the multi-channel training image and the target image.
- the multi-channel training image can be used as the input of the preset neural network to generate an output image; the network parameters of the preset neural network are updated according to the output image and the target image.
- the multi-channel training image may include luminance components and chrominance components
- the preset neural network may include a multi-scale extraction network
- the multi-scale extraction network includes a first multi-scale extraction network and a second multi-scale extraction network.
- the first multi-scale extraction network is used to obtain multiple scale features of the luminance component
- the second multi-scale extraction network is used to obtain multiple scale features of the chrominance component.
- the chrominance component may include a first chrominance component and a second chrominance component
- the second multi-scale extraction network may include a multi-scale extraction network for acquiring multiple scale features of the first chrominance component, and a multi-scale extraction network for acquiring multiple scale features of the second chrominance component.
- the first multi-scale extraction network may include but is not limited to: a convolutional layer and/or a pooling layer; the second multi-scale extraction network may include, but is not limited to: a convolutional layer and/or a pooling layer.
- the preset neural network may also include multiple splicing layers connected in sequence; a splicing layer is used to perform a channel splicing operation on the luminance component and the chrominance component of the same scale, or on the luminance component and chrominance component of the same scale together with the output image of the previous splicing layer. Further, adjacent splicing layers are connected by a preset transposed convolutional layer, which performs a transposed convolution operation on the output image of the previous splicing layer and outputs the result to the next splicing layer.
- updating the network parameters of the preset neural network according to the output image and the target image may include, but is not limited to: converting the output image according to a second conversion matrix to obtain a camera RGB image; converting the camera RGB image into a standard RGB image using camera metadata; and updating the network parameters of the preset neural network according to the standard RGB image and the target image.
- the second conversion matrix may be determined according to the first conversion matrix; where the first conversion matrix is [X1, X2, X3]^T (T denotes transposition), the second conversion matrix is [X1, X2, X3].
- a multi-channel training image can be obtained.
- the multi-channel training image can include a luminance component and a chrominance component, and the chrominance component includes a first chrominance component and a second chrominance component.
- the target image can be obtained.
- the multi-channel training image and the target image are used as the input of the preset neural network, and the preset neural network is trained.
- the preset neural network may include a multi-scale extraction network, and the multi-scale extraction network includes a first multi-scale extraction network and a second multi-scale extraction network.
- the first multi-scale extraction network is used to obtain multiple scale features of the brightness component. Therefore, the first multi-scale extraction network may also be referred to as a brightness component multi-scale extraction network.
- the second multi-scale extraction network is used to obtain multiple scale features of chrominance components. Therefore, the second multi-scale extraction network may also be referred to as a chrominance component multi-scale extraction network.
- the second multi-scale extraction network may include a multi-scale extraction network for acquiring multiple scale features of the first chrominance component, and A multi-scale extraction network for acquiring multiple scale features of the second chrominance component.
- the luminance component of the multi-channel training image is used as the input of the first multi-scale extraction network, and the chrominance component of the multi-channel training image is used as the input of the second multi-scale extraction network. Based on the first multi-scale extraction network and the second multi-scale extraction network, the luminance component and the chrominance component are processed (e.g., convolution, pooling, channel splicing, transposed convolution, etc.), and finally the output image is obtained.
- the output image is converted according to the second conversion matrix to obtain the camera RGB image
- the camera metadata is used to convert the camera RGB image into a standard RGB image
- the network parameters of the preset neural network are updated according to the standard RGB image and the target image; there is no restriction on the parameter update process, as long as the network parameters are updated based on the standard RGB image and the target image.
- the format of the standard RGB image is sRGB
- the format of the target image is also sRGB.
- the network parameters of the preset neural network include convolutional layer parameters, pooling layer parameters, etc., which are not limited.
- the first conversion matrix may be [X1, X2, X3]^T
- the second conversion matrix M' can be obtained based on the first conversion matrix
- the output image can be converted according to the second conversion matrix M' to obtain an image in the camera RGB space, which is called the camera RGB image in this article.
- the camera metadata is used to calculate the color conversion matrix
- the camera RGB image is converted according to the color conversion matrix to obtain an image in the sRGB space, which is called a standard RGB image in this article.
- the network parameters are updated based on the standard RGB image and the target image.
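Since the PCA basis is orthonormal, the second conversion matrix M' = [X1, X2, X3] is simply the transpose (and hence the inverse) of the first, so converting the network's output back to camera RGB can be sketched as:

```python
import numpy as np

def to_camera_rgb(lab, M):
    """lab: (H, W, 3) output image (L, a, b channels); M: first conversion
    matrix [X1, X2, X3]^T. Per pixel, [R, G, B]^T = M' @ [L, a, b]^T with
    M' = M^T, which for an orthonormal basis inverts the decorrelation."""
    M_prime = M.T            # second conversion matrix [X1, X2, X3]
    return lab @ M_prime.T   # apply M' to every pixel vector
```

The remaining camera-RGB-to-sRGB step uses the colour conversion matrix computed from the camera metadata, which is device-specific and not sketched here.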
- FIG. 3E is a schematic diagram of the structure of the first multi-scale extraction network and the second multi-scale extraction network.
- the first multi-scale extraction network and the second multi-scale extraction network are used for multi-scale processing.
- the first multi-scale extraction network and the second multi-scale extraction network have a multi-resolution processing structure to process images of different scales. For example, denoising and sharpening operations can be performed on small-scale images, and local illumination adjustment operations can be performed on large-scale images.
- the original training image can be decorrelated by the first conversion matrix to obtain a multi-channel training image.
- the multi-channel training image can include a luminance component L, a first chrominance component a, and a second chrominance component b.
- the luminance component L is the projection of the image onto the largest principal component and serves as the luminance component of the image; the first chrominance component a and the second chrominance component b are the chrominance components of the image.
- the luminance component is output to the first multi-scale extraction network
- the first multi-scale extraction network may include a convolutional layer and/or a pooling layer.
- the first multi-scale extraction network includes multiple convolutional layers and multiple pooling layers.
- the convolutional layer 11 performs convolution processing on the luminance component.
- the pooling layer 12 performs pooling (e.g., 2× max pooling) on the convolved luminance component to obtain an image of W/2*H/2, half the size of the original image.
- for the luminance component of size W/2*H/2, the convolutional layer 13 performs convolution processing on it, and the pooling layer 14 performs pooling (e.g., 2× max pooling) on the convolved luminance component to obtain a W/4*H/4 image.
- for the luminance component of size W/4*H/4, the convolutional layer 15 performs convolution processing on it.
- the above takes 3 convolutional layers and 2 pooling layers as examples, and there is no restriction on this.
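The scale pyramid above (W*H to W/2*H/2 to W/4*H/4 via 2x max pooling) can be sketched as follows; the interleaved learned convolutions are omitted for brevity, so this only illustrates the pooling structure.

```python
import numpy as np

def max_pool2(x):
    """2x max pooling: halve both spatial dimensions of an H*W feature map."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Luminance pyramid: full scale -> half scale -> quarter scale
lum = np.arange(8 * 8, dtype=float).reshape(8, 8)
half = max_pool2(lum)       # stands in for pooling layer 12
quarter = max_pool2(half)   # stands in for pooling layer 14
```

The same pooling applies unchanged to the chrominance path of the second multi-scale extraction network.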
- the chrominance components are output to the second multi-scale extraction network, and the second multi-scale extraction network may include convolutional and/or pooling layers.
- the second multi-scale extraction network includes multiple convolutional layers and multiple pooling layers.
- the convolutional layer 21 performs convolution processing on the chrominance component.
- the pooling layer 22 performs pooling (for example, 2x max pooling) on the convolved chrominance components to obtain a W/2*H/2 image, half the size of the original.
- for the chrominance component of size W/2*H/2, the convolution layer 23 performs convolution processing, and the pooling layer 24 performs pooling (for example, 2x max pooling) on the convolved chrominance component to obtain a W/4*H/4 image.
- for the chrominance component of size W/4*H/4, the convolution layer 25 performs convolution processing on it.
- the above takes 3 convolutional layers and 2 pooling layers as examples, and there is no restriction on this.
- the first multi-scale extraction network performs convolution and pooling processing on the luminance component
- the second multi-scale extraction network performs convolution and pooling processing on the chrominance component, which are independent of each other.
- the preset neural network may also include multiple splicing layers (ie, channel splicing layers), multiple transposed convolutional layers, and multiple convolutional layers.
- the luminance component of W/4*H/4 and the chrominance component of W/4*H/4 can be output to the splicing layer 31, and the splicing layer 31 performs a channel splicing operation on the W/4*H/4 luminance component and the W/4*H/4 chrominance component (that is, luminance and chrominance components of the same scale). The image after the channel splicing operation is output to the convolutional layer 32, which performs convolution processing on it.
- the transposed convolution layer 33 performs a transposed convolution operation (such as a 2x transposed convolution operation) on the convolved image to obtain a W/2*H/2 image, which is output to the splicing layer 34. The splicing layer 34 performs a channel splicing operation on the W/2*H/2 image, the W/2*H/2 luminance component, and the W/2*H/2 chrominance component.
- the image after the channel splicing operation is output to the convolution layer 35, which performs convolution processing on it, and the transposed convolution layer 36 performs a transposed convolution operation on the convolved image to obtain a W*H image.
- the W*H image of the transposed convolution layer 36, the W*H luminance component of the convolution layer 11, and the W*H chrominance component of the convolution layer 21 are output to the splicing layer 37. The splicing layer 37 performs a channel splicing operation on the W*H image, the W*H luminance component, and the W*H chrominance component, and outputs the spliced image to the convolutional layer 38, which performs convolution processing on it and finally outputs a W*H output image.
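The decoder steps above (channel splicing of same-scale features, then 2x upsampling back toward W*H) can be sketched as follows. Nearest-neighbour repetition stands in for the learned 2x transposed convolutions, and the channel counts are hypothetical; only the shape bookkeeping is meant to match the text.

```python
import numpy as np

def upsample2(x):
    """Nearest-neighbour 2x upsampling over the spatial axes of a C*H*W tensor;
    a stand-in for the learned 2x transposed-convolution layers."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def splice(*tensors):
    """Channel splicing: concatenate same-scale feature maps along the channel axis."""
    return np.concatenate(tensors, axis=0)

# W/4-scale luminance and chrominance features (channels-first, hypothetical channel counts)
lum_q = np.zeros((8, 4, 4))
chr_q = np.zeros((16, 4, 4))

x = splice(lum_q, chr_q)      # like splicing layer 31: 24 channels at W/4 scale
x = upsample2(x)              # like transposed-conv layer 33: back to W/2 scale
```

Each further splicing layer (34, 37) concatenates the upsampled tensor with the encoder features of the matching scale in the same way.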
- the output image is converted according to the second conversion matrix to obtain the camera RGB image
- the camera metadata is used to convert the camera RGB image into a standard RGB image
- the network parameters of the preset neural network are updated according to the standard RGB image and the target image.
- the luminance component and the chrominance component are connected by channel splicing to form the up-sampled channels; that is, the luminance-component and chrominance-component feature maps are merged into the up-sampling convolution operations by channel splicing.
- each convolutional layer may be composed of multiple Conv layers (ie, convolutional layers) and multiple ReLU (rectified linear unit) layers, as shown in FIG. 3E.
- the preset neural network may be a convolutional neural network (CNN) of any architecture, and the structure of the preset neural network is not limited.
- the preset neural network can implement color decorrelation processing and multi-resolution processing.
- color decorrelation processing refer to step 202
- multi-resolution processing refer to step 203.
- the original training image can be decorrelated to obtain a multi-channel training image, and the preset neural network can be trained according to the multi-channel training image and the target image.
- the image to be processed can be processed based on the preset neural network to generate an image with improved image quality, so that the image quality can be improved.
- the above method can train a high-performance preset neural network, and convert the collected low-quality original images into high-quality sRGB color images, with efficient imaging performance, high-quality imaging results, and good user experience.
- the image quality enhancement image generated based on the preset neural network has less noise, higher dynamic range, higher local detail presentation, and better visual effects.
- FIG. 3F is a comparison effect diagram of the image of the present invention and the existing image.
- the image in the upper left corner and the image in the upper right corner are existing images.
- the image in the lower left corner is based on the image quality enhancement image generated by the preset neural network, and the image in the lower right corner is the target image. Obviously, the image quality of the improved image is very high, with less noise, higher dynamic range, and higher local detail presentation.
- the training of a preset neural network is introduced. Based on the trained preset neural network, the image to be processed can be processed to obtain an image with improved image quality, which is described below.
- Embodiment 2 Refer to FIG. 4, which is a flowchart of an image processing method, and the method may include:
- Step 401 Obtain an image to be processed, that is, an image whose quality needs to be improved.
- the image to be processed can be collected, and there is no restriction on this.
- Step 402 Perform a decorrelation operation on the image to be processed to obtain a multi-channel image.
- the R channel image, the G channel image, and the B channel image can be acquired according to the image to be processed, and then the correlation between the R channel image, the G channel image, and the B channel image can be removed to obtain a multi-channel image.
- the multi-channel image may include a luminance component and a chrominance component
- the chrominance component may include a first chrominance component and a second chrominance component.
- acquiring the R channel image, the G channel image, and the B channel image according to the image to be processed may include, but is not limited to: performing an upsampling operation (such as a 2x bilinear upsampling operation) on the image to be processed to obtain the R channel image, the G channel image, and the B channel image.
- removing the correlation between the R channel image, the G channel image, and the B channel image to obtain a multi-channel image may include: removing the correlation between the R channel image, the G channel image, and the B channel image according to the first conversion matrix to obtain the multi-channel image; the first conversion matrix is used to remove the correlation between the images.
- the first conversion matrix can be obtained in the following manner: the original input image is obtained, and the R channel input image, the G channel input image, and the B channel input image are obtained according to the original input image; further, the first conversion matrix is obtained based on principal component analysis according to the R channel input image, the G channel input image, and the B channel input image.
- obtaining the first conversion matrix according to the R channel input image, the G channel input image, and the B channel input image may include, but is not limited to: collecting multiple pixel vectors from each of the R channel input image, the G channel input image, and the B channel input image, and forming a pixel vector matrix from the collected pixel vectors; performing principal component analysis on the pixel vector matrix to obtain three basis vectors; and obtaining the first conversion matrix according to the three basis vectors.
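The PCA step above can be sketched as follows: sample pixel vectors into an N*3 matrix, then take the eigenvectors of its covariance as the three basis vectors. The random samples here are illustrative; in the text the pixels are drawn from the R, G, and B channel input images.

```python
import numpy as np

def pca_basis(pixels):
    """Given an N*3 matrix of sampled (R, G, B) pixel vectors, return the three
    PCA basis vectors as rows, ordered by decreasing variance (X1 first)."""
    centered = pixels - pixels.mean(axis=0)
    cov = centered.T @ centered / len(pixels)
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    return eigvecs[:, order].T               # rows X1, X2, X3

rng = np.random.default_rng(0)
samples = rng.random((1000, 3))              # N >= 1000 pixel samples
basis = pca_basis(samples)                   # rows of the first conversion matrix
```

The rows come out orthonormal, which is what makes the later second conversion matrix a simple transpose.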
- this embodiment can perform color decorrelation processing. Specifically, the first conversion matrix is obtained, and the first conversion matrix is used to remove the correlation between the images; a decorrelation operation is then performed on the image to be processed through the first conversion matrix to obtain a multi-channel image.
- the multi-channel image includes a luminance component and a chrominance component, and the chrominance component includes a first chrominance component and a second chrominance component.
- the original input image is obtained, for example, multiple original images in the data set are used as the original input image.
- the original input image is upsampled to obtain a three-channel RGB image, that is, an RGB image in the camera color space.
- a bilinear up-sampling operation of 2 times may be performed on the original input image to obtain a three-channel RGB image.
- the three-channel RGB image includes an R channel input image, a G channel input image, and a B channel input image.
- N pixel vectors are collected from the R channel input image
- N pixel vectors are collected from the G channel input image
- N pixel vectors are collected from the B channel input image.
- N is the number of pixel samples.
- the value of N is configured based on experience. For example, N can be greater than or equal to 1000, and there is no restriction on this.
- the PCA algorithm can be used to perform principal component analysis on the pixel vector matrix to obtain three basis vectors (also called coordinate basis column vectors).
- the three basis vectors can be X1, X2, and X3, where X1 is the principal component along which the sample variance is largest.
- step 402 decorrelate the image to be processed through the first conversion matrix to obtain a multi-channel image.
- the multi-channel image includes a luminance component, a first chrominance component, and a second chrominance component.
- the R channel image, the G channel image, and the B channel image are first acquired according to the image to be processed.
- an up-sampling operation (such as a bilinear up-sampling operation) is performed on the image to be processed to obtain a three-channel RGB image.
- the three-channel RGB image includes an R channel image, a G channel image, and a B channel image.
- the correlation between the R channel image, the G channel image, and the B channel image is removed to obtain a multi-channel image; the multi-channel image may include a luminance component and a chrominance component, and the chrominance component includes the first chrominance component and the second chrominance component.
- the above formula removes the correlation between the R channel image, the G channel image, and the B channel image according to the first conversion matrix to obtain the luminance component, the first chrominance component, and the second chrominance component.
- Step 403 Process the multi-channel image through a preset neural network to generate an image with improved image quality.
- the multi-channel image is used as the input of the preset neural network to generate an output image; the output image is processed according to the network parameters of the preset neural network to generate an image with improved image quality.
- the multi-channel image may include luminance components and chrominance components
- the preset neural network may include a multi-scale extraction network
- the multi-scale extraction network may include a first multi-scale extraction network and a second multi-scale extraction network.
- the first multi-scale extraction network is used to obtain multiple scale features of the luminance component
- a second multi-scale extraction network is used to obtain multiple scale features of the chrominance component.
- the chrominance component may include a first chrominance component and a second chrominance component; the second multi-scale extraction network may include a multi-scale extraction network used to obtain multiple scale features of the first chrominance component, and a multi-scale extraction network used to obtain multiple scale features of the second chrominance component.
- the first multi-scale extraction network may include but is not limited to: a convolutional layer and/or a pooling layer; the second multi-scale extraction network may include, but is not limited to: a convolutional layer and/or a pooling layer.
- processing the output image according to the network parameters of the preset neural network to generate the image-quality-enhanced image may include: converting the output image according to the second conversion matrix to obtain a camera RGB image; using camera metadata to convert the camera RGB image into a standard RGB image; and processing the standard RGB image according to the network parameters of the preset neural network to generate an image with improved image quality.
- the second conversion matrix may be determined according to the first conversion matrix; where the first conversion matrix is [X1, X2, X3]^T, T indicates transposition, and the second conversion matrix is [X1, X2, X3].
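The relationship above can be checked numerically: since the PCA basis vectors are orthonormal, the second conversion matrix [X1, X2, X3] inverts the first conversion matrix [X1, X2, X3]^T. The basis values below are illustrative, not the patent's.

```python
import numpy as np

# Orthonormal PCA basis column vectors X1, X2, X3 (illustrative values)
x1 = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
x2 = np.array([1.0, 0.0, -1.0]) / np.sqrt(2)
x3 = np.array([1.0, -2.0, 1.0]) / np.sqrt(6)

m1 = np.stack([x1, x2, x3])   # first conversion matrix:  [X1, X2, X3]^T (rows)
m2 = m1.T                     # second conversion matrix: [X1, X2, X3]   (columns)
```

Decorrelating with m1 and then applying m2 therefore returns the original pixel values, which is why the network output can be mapped back to camera RGB losslessly.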
- the preset neural network may also include multiple splicing layers connected in sequence; a splicing layer is used to perform a channel splicing operation on luminance and chrominance components of the same scale, or on luminance and chrominance components of the same scale together with the output image of the previous splicing layer. Further, adjacent splicing layers are connected by a preset transposed convolutional layer, which performs a transposed convolution operation on the output image of the previous splicing layer and outputs the result to the next splicing layer.
- a multi-channel image may be obtained.
- the multi-channel image may include a luminance component and a chrominance component, and the chrominance component may include a first chrominance component and a second chrominance component.
- the multi-channel image can be used as an input of a preset neural network to generate an output image.
- the preset neural network includes a first multi-scale extraction network and a second multi-scale extraction network.
- the first multi-scale extraction network is used to obtain multiple scale features of the luminance component
- the second multi-scale extraction network is used to obtain multiple scale features of the chrominance component; the luminance component of the multi-channel image is used as the input of the first multi-scale extraction network
- the chrominance component of the multi-channel image is used as the input of the second multi-scale extraction network.
- the first multi-scale extraction network and the second multi-scale extraction network process the luminance component and the chrominance component, respectively, through convolution, pooling, channel splicing, transposed convolution, etc., and finally the output image is obtained.
- the output image is converted according to the second conversion matrix to obtain a camera RGB image, and the camera metadata is used to convert the camera RGB image into a standard RGB image.
- the standard RGB image can be processed according to the network parameters of the preset neural network to generate an image with improved image quality.
- the output image can be converted according to the second conversion matrix M' to obtain an image in the camera RGB space, which is called a camera RGB image herein.
- the camera metadata is used to calculate the color conversion matrix
- the camera RGB image is converted according to the color conversion matrix to obtain an image in the sRGB space, which is called a standard RGB image herein.
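The two conversions above can be sketched as a pair of per-pixel matrix multiplies. Identity matrices are used as placeholders here: the real second conversion matrix comes from the PCA basis and the real color conversion matrix is computed from camera metadata, and the sRGB gamma curve is omitted for brevity.

```python
import numpy as np

def to_srgb(output_lab, m2, color_matrix):
    """Convert the network output back to a standard RGB image:
    second conversion matrix -> camera RGB, then a metadata-derived
    color conversion matrix -> sRGB (gamma omitted for brevity)."""
    h, w, _ = output_lab.shape
    cam_rgb = output_lab.reshape(-1, 3) @ m2.T      # camera RGB space
    srgb = cam_rgb @ color_matrix.T                 # sRGB space
    return srgb.reshape(h, w, 3)

m2 = np.eye(3)       # placeholder second conversion matrix
ccm = np.eye(3)      # placeholder color conversion matrix from camera metadata
out = np.random.rand(2, 2, 3)
img = to_srgb(out, m2, ccm)
```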
- the standard RGB image is processed according to the network parameters of the preset neural network to generate an image with improved image quality, and there is no restriction on the processing process of the preset neural network.
- FIG. 3E is a schematic diagram of the structure of the first multi-scale extraction network and the second multi-scale extraction network.
- the first multi-scale extraction network and the second multi-scale extraction network are used for multi-scale processing.
- the first multi-scale extraction network and the second multi-scale extraction network have multi-resolution processing structures to process images of different scales. For example, denoising and sharpening operations can be performed on small-scale images, and local illumination adjustment operations can be performed on large-scale images.
- the image to be processed can be decorrelated by the first conversion matrix to obtain a multi-channel image.
- the multi-channel image can include a luminance component L, a first chrominance component a, and a second chrominance component b.
- the luminance component L is the projection of the image on the largest principal component, that is, the luminance component of the image, and the first chrominance component a and the second chrominance component b are the chrominance components of the image.
- the luminance component is output to the first multi-scale extraction network
- the first multi-scale extraction network may include a convolutional layer and/or a pooling layer.
- the first multi-scale extraction network includes multiple convolutional layers and multiple pooling layers.
- the convolutional layer 11 performs convolution processing on the brightness component.
- the pooling layer 12 performs pooling (for example, 2x max pooling) on the convolved brightness component to obtain a W/2*H/2 image, half the size of the original.
- for the brightness component of size W/2*H/2, the convolution layer 13 performs convolution processing, and the pooling layer 14 performs pooling (for example, 2x max pooling) on the convolved brightness component to obtain a W/4*H/4 image.
- for the brightness component of size W/4*H/4, the convolution layer 15 performs convolution processing on it.
- the above takes 3 convolutional layers and 2 pooling layers as examples, and there is no restriction on this.
- the chrominance components are output to the second multi-scale extraction network, and the second multi-scale extraction network may include convolutional and/or pooling layers.
- the second multi-scale extraction network includes multiple convolutional layers and multiple pooling layers.
- the convolutional layer 21 performs convolution processing on the chrominance component.
- the pooling layer 22 performs pooling (for example, 2x max pooling) on the convolved chrominance components to obtain a W/2*H/2 image, half the size of the original.
- for the chrominance component of size W/2*H/2, the convolution layer 23 performs convolution processing, and the pooling layer 24 performs pooling (for example, 2x max pooling) on the convolved chrominance component to obtain a W/4*H/4 image.
- for the chrominance component of size W/4*H/4, the convolution layer 25 performs convolution processing on it.
- the above takes 3 convolutional layers and 2 pooling layers as examples, and there is no restriction on this.
- the first multi-scale extraction network performs convolution and pooling processing on the luminance component
- the second multi-scale extraction network performs convolution and pooling processing on the chrominance component, which are independent of each other.
- the preset neural network may also include multiple splicing layers (ie, channel splicing layers), multiple transposed convolutional layers, and multiple convolutional layers.
- the luminance component of W/4*H/4 and the chrominance component of W/4*H/4 can be output to the splicing layer 31, and the splicing layer 31 performs a channel splicing operation on the W/4*H/4 luminance component and the W/4*H/4 chrominance component (that is, luminance and chrominance components of the same scale). The image after the channel splicing operation is output to the convolutional layer 32, which performs convolution processing on it.
- the transposed convolution layer 33 performs a transposed convolution operation (such as a 2x transposed convolution operation) on the convolved image to obtain a W/2*H/2 image, which is output to the splicing layer 34. The splicing layer 34 performs a channel splicing operation on the W/2*H/2 image, the W/2*H/2 luminance component, and the W/2*H/2 chrominance component.
- the image after the channel splicing operation is output to the convolution layer 35, which performs convolution processing on it, and the transposed convolution layer 36 performs a transposed convolution operation on the convolved image to obtain a W*H image.
- the W*H image of the transposed convolution layer 36, the W*H luminance component of the convolution layer 11, and the W*H chrominance component of the convolution layer 21 are output to the splicing layer 37. The splicing layer 37 performs a channel splicing operation on the W*H image, the W*H luminance component, and the W*H chrominance component, and outputs the spliced image to the convolutional layer 38, which performs convolution processing on it and finally outputs a W*H output image.
- the output image is converted according to the second conversion matrix to obtain the camera RGB image
- the camera metadata is used to convert the camera RGB image into a standard RGB image
- the standard RGB image is processed according to the network parameters of the preset neural network to generate an image with improved image quality.
- the luminance component and the chrominance component are connected by channel splicing to form the up-sampled channels; that is, the luminance-component and chrominance-component feature maps are merged into the up-sampling convolution operations by channel splicing.
- each convolutional layer may be composed of multiple Conv layers (ie, convolutional layers) and multiple ReLU (Rectified Linear Unit) layers.
- the preset neural network may be a convolutional neural network of any architecture, and the structure of the preset neural network is not limited.
- a high-performance preset neural network can be trained, and the collected low-quality original images can be converted into high-quality sRGB color images, offering efficient imaging performance, high-quality imaging results, and a good user experience.
- Embodiment 3 In another training process of the preset neural network, the preset neural network can be trained according to the multi-channel training image. See FIG. 5, which is a flowchart of the image processing method.
- Step 501 Obtain a multi-channel training image according to the original training image.
- for the process of obtaining multi-channel training images based on the original training images, refer to Embodiment 1.
- Step 502 Train a preset neural network according to the multi-channel training image; wherein the preset neural network includes a multi-scale extraction network, and the multi-scale extraction network is used to obtain multiple scale features of the image of each channel in the multi-channel training image.
- For example, the multi-channel training image may include a luminance component and a chrominance component.
- the multi-scale extraction network may include a first multi-scale extraction network and a second multi-scale extraction network. The first multi-scale extraction network is used to obtain multiple scale features of the luminance component.
- the second multi-scale extraction network is used to obtain multiple-scale features of chrominance components.
- the original training image and the target image corresponding to the original training image can be obtained. Based on this, the preset neural network can be trained according to the multi-channel training image and the target image.
- obtaining the original training image and the target image corresponding to the original training image includes: obtaining multiple original images, where different original images have the same or different exposures; selecting one original image from the multiple original images as the original training image; and performing multi-image fusion processing on the multiple original images to obtain the target image.
- for the method of acquiring the original training image and the target image, refer to Embodiment 1, which will not be repeated here.
- training the preset neural network based on the multi-channel training image and the target image may include: using the multi-channel training image as the input of the preset neural network to generate an output image; and updating the network parameters of the preset neural network according to the output image and the target image.
- updating the network parameters of the preset neural network according to the output image and the target image may include: converting the output image according to the second conversion matrix to obtain a camera RGB image; using camera metadata to convert the camera RGB image to standard RGB Image; further, the network parameters of the preset neural network are updated according to the standard RGB image and the target image.
- a high-performance preset neural network can be trained, and the collected low-quality original images can be converted into high-quality sRGB color images, offering efficient imaging performance, high-quality imaging results, and a good user experience.
- the training of a preset neural network is introduced. Based on the trained preset neural network, the image to be processed can be processed to obtain an image with improved image quality, which is described below.
- Embodiment 4 Refer to FIG. 6, which is a flowchart of an image processing method, and the method may include:
- Step 601 Obtain an image to be processed and a multi-channel image corresponding to the image to be processed.
- the multi-channel image is processed by a preset neural network to generate an image-quality-enhanced image; wherein the preset neural network may include a multi-scale extraction network, and the multi-scale extraction network is used to obtain multiple scale features of the image of each channel in the multi-channel image.
- processing the multi-channel image by the preset neural network to generate the image-quality-improved image may include: taking the multi-channel image as the input of the preset neural network to generate an output image; and processing the output image according to the network parameters of the preset neural network to generate an image with improved image quality.
- processing the output image according to the network parameters of the preset neural network to generate the image-quality-improved image includes: converting the output image according to the second conversion matrix to obtain a camera RGB image; using the camera metadata to convert the camera RGB image into a standard RGB image; and processing the standard RGB image according to the network parameters of the preset neural network to generate an image with improved image quality.
- a high-performance preset neural network can be trained, and the collected low-quality original images can be converted into high-quality sRGB color images, offering efficient imaging performance, high-quality imaging results, and a good user experience.
- an embodiment of the present invention also provides an image processing device 70.
- the image processing device 70 includes a processor 71 and a memory 72; the memory 72 is configured to store computer instructions executable by the processor 71; the processor 71 is configured to read the computer instructions from the memory 72 to implement the above-mentioned image processing method.
- the processor 71 may read the computer instructions from the memory 72 to implement the following operations:
- when the processor 71 obtains the original training image and the target image corresponding to the original training image, it is specifically used to: obtain a plurality of original images, wherein the exposures of different original images are the same or different; and perform multi-image fusion processing on the multiple original images to obtain the target image.
- when the processor 71 selects an original image from the plurality of original images as the original training image, it is specifically used to: obtain the exposure compensation information of each original image, and select one original image from the plurality of original images as the original training image.
- when the processor 71 performs multi-image fusion processing on the multiple original images to obtain the target image, it is specifically used to: obtain the RGB image corresponding to each of the multiple original images, and perform multi-image fusion processing on the multiple RGB images corresponding to the multiple original images to obtain the target image.
- when the processor 71 obtains the RGB image corresponding to each of the plurality of original images, it is specifically configured to: for each of the plurality of original images, perform at least one of demosaicing, denoising, automatic white balance, image sharpening, color enhancement, color space conversion, and pre-denoising processing to obtain the RGB image corresponding to the original image.
- when the processor 71 performs multi-image fusion processing on the multiple RGB images corresponding to the multiple original images to obtain the target image, it is specifically used to: perform registration processing and fusion processing on the multiple RGB images corresponding to the multiple original images to generate the target image; or perform registration processing and fusion processing on the multiple RGB images corresponding to the multiple original images, and perform contrast enhancement and/or color space conversion on the processed images to generate the target image.
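The fusion step above can be sketched as follows. This is a toy illustration only: a per-image weighted average of already-registered RGB exposures stands in for the registration and fusion processing; a real pipeline would align the images and use a more elaborate blend.

```python
import numpy as np

def fuse_exposures(rgb_images, weights=None):
    """Toy multi-image fusion: a weighted average of registered RGB exposures,
    yielding a single H*W*3 target image."""
    stack = np.stack(rgb_images)                  # K*H*W*3 exposure stack
    if weights is None:
        weights = np.ones(len(rgb_images)) / len(rgb_images)
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()                      # normalize blend weights
    return np.tensordot(weights, stack, axes=1)   # H*W*3 fused target image

under = np.full((2, 2, 3), 0.2)   # under-exposed frame
over = np.full((2, 2, 3), 0.8)    # over-exposed frame
target = fuse_exposures([under, over])
```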
- when the processor 71 performs a decorrelation operation on the original training image to obtain a multi-channel training image, it is specifically used to: obtain an R channel image, a G channel image, and a B channel image according to the original training image; and remove the correlation between the R channel image, the G channel image, and the B channel image to obtain the multi-channel training image.
- When the processor 71 obtains the R channel image, the G channel image, and the B channel image according to the original training image, it is specifically configured to: perform an up-sampling operation on the original training image to obtain the R channel image, the G channel image, and the B channel image.
- When the processor 71 removes the correlation among the R channel image, the G channel image, and the B channel image to obtain the multi-channel training image, it is specifically configured to: remove the correlation among the R channel image, the G channel image, and the B channel image according to a first conversion matrix to obtain the multi-channel training image; the first conversion matrix is used to remove the correlation between images.
- The processor 71 is further configured to obtain the first conversion matrix in the following manner: obtain an original input image, and obtain an R channel input image, a G channel input image, and a B channel input image according to the original input image; and, based on principal component analysis, obtain the first conversion matrix according to the R channel input image, the G channel input image, and the B channel input image.
- When the processor 71 obtains the first conversion matrix based on principal component analysis, it is specifically configured to: collect multiple pixel vectors from each of the R channel input image, the G channel input image, and the B channel input image, and form a pixel vector matrix from the collected pixel vectors; perform principal component analysis on the pixel vector matrix to obtain three basis vectors; and obtain the first conversion matrix according to the three basis vectors.
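As a rough illustration of this PCA step (the function name, the sampling scheme, and the use of NumPy's eigendecomposition are assumptions for this sketch, not part of the patent):

```python
import numpy as np

def first_conversion_matrix(rgb_images, n_samples=1000, seed=0):
    """Sketch: sample pixel vectors from camera-RGB images, run PCA,
    and stack the three basis vectors as rows to form M = [X1, X2, X3]^T."""
    rng = np.random.default_rng(seed)
    samples = []
    for img in rgb_images:                 # img: (H, W, 3) camera-RGB array
        h, w, _ = img.shape
        ys = rng.integers(0, h, n_samples)
        xs = rng.integers(0, w, n_samples)
        samples.append(img[ys, xs, :])     # n_samples x 3 pixel vectors
    pixel_matrix = np.concatenate(samples, axis=0).astype(np.float64)
    centered = pixel_matrix - pixel_matrix.mean(axis=0)
    # Eigenvectors of the 3x3 covariance are the basis vectors X1..X3,
    # ordered by decreasing variance (X1 = largest principal component).
    cov = centered.T @ centered / (len(centered) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    basis = eigvecs[:, order]              # columns X1, X2, X3
    return basis.T                         # M = [X1, X2, X3]^T, shape (3, 3)
```

Because the basis vectors are orthonormal, the resulting M satisfies M·M^T = I, which is consistent with the second conversion matrix below being simply [X1, X2, X3] = M^T.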
- When the processor 71 trains the preset neural network according to the multi-channel training image and the target image, it is specifically configured to: use the multi-channel training image as the input of the preset neural network to generate an output image; and update the network parameters of the preset neural network according to the output image and the target image.
- The multi-channel training image includes a luminance component and a chrominance component.
- The preset neural network includes a multi-scale extraction network.
- The multi-scale extraction network includes a first multi-scale extraction network and a second multi-scale extraction network.
- The first multi-scale extraction network is used to obtain multiple scale features of the luminance component.
- The second multi-scale extraction network is used to obtain multiple scale features of the chrominance component.
- The chrominance component includes a first chrominance component and a second chrominance component.
- The first multi-scale extraction network includes a convolutional layer and/or a pooling layer.
- The second multi-scale extraction network includes a convolutional layer and/or a pooling layer.
- The preset neural network also includes a plurality of splicing layers connected in sequence. A splicing layer is used to perform a channel splicing operation on the luminance component and chrominance component of the same scale; or, a splicing layer is used to perform a channel splicing operation on the luminance component and chrominance component of the same scale together with the output image of the previous splicing layer.
- Adjacent splicing layers are connected by a preset transposed convolutional layer, and the transposed convolutional layer is used to perform a transposed convolution operation on the output image of the previous splicing layer and output the result to the next splicing layer.
- When the processor 71 updates the network parameters of the preset neural network according to the output image and the target image, it is specifically configured to: convert the output image according to a second conversion matrix to obtain a camera RGB image; convert the camera RGB image into a standard RGB image using camera metadata; and update the network parameters of the preset neural network according to the standard RGB image and the target image.
- The processor 71 is further configured to determine the second conversion matrix according to the first conversion matrix; the first conversion matrix is [X1, X2, X3]^T, where T denotes transposition, and the second conversion matrix is [X1, X2, X3].
- The processor 71 may read the computer instructions from the memory 72 to implement the following operations: obtain an image to be processed; perform a decorrelation operation on the image to be processed to obtain a multi-channel image; and process the multi-channel image through a preset neural network to generate an image with improved image quality.
- When the processor 71 performs a decorrelation operation on the image to be processed to obtain a multi-channel image, it is specifically configured to: obtain an R channel image, a G channel image, and a B channel image according to the image to be processed; and remove the correlation among the R channel image, the G channel image, and the B channel image to obtain the multi-channel image.
- When the processor 71 processes the multi-channel image through the preset neural network to generate an image with improved image quality, it is specifically configured to: use the multi-channel image as the input of the preset neural network to generate an output image; and process the output image according to the network parameters of the preset neural network to generate the image with improved image quality.
- The processor 71 may read the computer instructions from the memory 72 to implement the following operations: obtain a multi-channel training image according to an original training image; and train the preset neural network according to the multi-channel training image; wherein the preset neural network includes a multi-scale extraction network, and the multi-scale extraction network is used to obtain multiple scale features of the training image of each channel in the multi-channel training image.
- The processor 71 is further configured to obtain an original training image and a target image corresponding to the original training image; when the processor 71 trains the preset neural network according to the multi-channel training image, it is specifically configured to train the preset neural network according to the multi-channel training image and the target image.
- The processor 71 may read the computer instructions from the memory 72 to implement the following operations: obtain an image to be processed and a multi-channel image corresponding to the image to be processed; and process the multi-channel image through a preset neural network to generate an image with improved image quality; wherein the preset neural network includes a multi-scale extraction network, and the multi-scale extraction network is used to obtain multiple scale features of the image of each channel in the multi-channel image.
- The movable platform 80 includes: a body 81; a power system 82 installed inside the body 81 and used to provide power for the movable platform 80; a sensor system 83 used to capture image frames; and the aforementioned image processing device 70, used to process the image frames captured by the sensor system.
- the processor 71 of the image processing device 70 is further configured to control the power system 82 according to the processed image frame.
- The power system of the UAV may include an electronic speed controller (ESC), propellers, and motors corresponding to the propellers.
- A motor is connected between the ESC and its propeller, and the motor and propeller are arranged on the corresponding arm; the ESC is used to receive a drive signal generated by the control system and to provide a drive current to the motor according to the drive signal, so as to control the rotating speed of the motor.
- The motor is used to drive the propeller to rotate, thereby providing power for the flight of the UAV.
- An embodiment of the present invention also provides a camera.
- The camera 90 includes: a housing 91; a lens assembly 92 installed inside the housing 91; a sensor assembly 93 installed inside the housing 91 and used to sense light passing through the lens assembly 92 and generate an electrical signal; and the above-mentioned image processing device 70.
- An embodiment of the present invention also provides a machine-readable storage medium. The machine-readable storage medium stores a number of computer instructions, and when the computer instructions are executed, the above-mentioned image processing method is implemented, i.e., the image processing method of the foregoing embodiments.
- A typical implementation device is a computer. The specific form of the computer can be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email sending and receiving device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
- The embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
- These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, where the instruction device implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
- These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Processing (AREA)
Abstract
An image processing method, device, movable platform, and machine-readable storage medium. The method includes: obtaining an original training image and a target image corresponding to the original training image (201); performing a decorrelation operation on the original training image to obtain a multi-channel training image (202); and training a preset neural network according to the multi-channel training image and the target image (203). The method has efficient imaging performance, obtains high-quality imaging results, and provides a good user experience.
Description
The present invention relates to the field of image processing, and in particular to an image processing method, device, movable platform, and machine-readable storage medium.
In a traditional approach, after an original image is captured, it can be processed by demosaicing, denoising, white balance, color space conversion, image sharpening, color enhancement, and so on, to obtain an sRGB (standard Red Green Blue) color image. However, the image quality of the sRGB color image obtained in this way is still relatively low and can be improved.
Summary of the Invention
In a first aspect, the present invention provides an image processing method, the method including:
obtaining an original training image and a target image corresponding to the original training image;
performing a decorrelation operation on the original training image to obtain a multi-channel training image;
training a preset neural network according to the multi-channel training image and the target image.
In a second aspect, the present invention provides an image processing method, the method including:
obtaining an image to be processed;
performing a decorrelation operation on the image to be processed to obtain a multi-channel image;
processing the multi-channel image through a preset neural network to generate an image with improved image quality.
In a third aspect, the present invention provides an image processing method, the method including:
obtaining a multi-channel training image according to an original training image;
training a preset neural network according to the multi-channel training image; wherein the preset neural network includes a multi-scale extraction network, and the multi-scale extraction network is used to obtain multiple scale features of the training image of each channel in the multi-channel training image.
In a fourth aspect, the present invention provides an image processing method, the method including:
obtaining an image to be processed and a multi-channel image corresponding to the image to be processed;
processing the multi-channel image through a preset neural network to generate an image with improved image quality;
wherein the preset neural network includes a multi-scale extraction network, and the multi-scale extraction network is used to obtain multiple scale features of the image of each channel in the multi-channel image.
In a fifth aspect, the present invention provides an image processing device including a processor and a memory; the memory is used to store computer instructions executable by the processor; the processor is used to read the computer instructions from the memory to implement the above method steps.
In a sixth aspect, the present invention provides a movable platform, including:
a body;
a power system, arranged on the body and used to provide power for the movable platform;
a sensor system, used to capture image frames;
and the image processing device described above, used to process the image frames captured by the sensor system.
In a seventh aspect, the present invention provides a camera, including:
a housing;
a lens assembly, installed inside the housing;
a sensor assembly, installed inside the housing and used to sense light passing through the lens assembly and generate an electrical signal; and the image processing device described above.
In an eighth aspect, the present invention provides a machine-readable storage medium storing a number of computer instructions, which, when executed, implement the above method.
Based on the above technical solutions, in the embodiments of the present invention, a decorrelation operation can be performed on an original training image to obtain a multi-channel training image, and a preset neural network can be trained according to the multi-channel training image and the target image. In this way, an image to be processed can be processed based on the preset neural network to generate an image with improved image quality. This approach can train a high-performance preset neural network and convert captured low-quality original images into high-quality color images, offering efficient imaging performance, high-quality imaging results, and a good user experience.
Figures 1A and 1B are schematic diagrams of processing an original image to obtain an sRGB color image;
Figure 2 is a schematic diagram of an embodiment of an image processing method in one implementation;
Figure 3A is a schematic diagram of processing multiple original images in one implementation;
Figure 3B is a schematic diagram of color decorrelation processing in one implementation;
Figure 3C is a schematic diagram of multi-scale processing in one implementation;
Figure 3D is a schematic diagram of multi-scale processing in one implementation;
Figure 3E is a schematic structural diagram of a multi-scale extraction network in one implementation;
Figure 3F is a schematic diagram of a comparison effect in one implementation;
Figure 4 is a schematic diagram of an embodiment of an image processing method in another implementation;
Figure 5 is a schematic diagram of an embodiment of an image processing method in another implementation;
Figure 6 is a schematic diagram of an embodiment of an image processing method in another implementation;
Figure 7 is a block diagram of an embodiment of an image processing device in one implementation;
Figure 8 is a block diagram of an embodiment of a movable platform in one implementation;
Figure 9 is a block diagram of an embodiment of a camera in one implementation.
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention. In addition, the following embodiments and their features may be combined with each other where there is no conflict.
The terms used in the present invention are for the purpose of describing specific embodiments only and are not intended to limit the invention. The singular forms "a", "the", and "said" used in the present invention and the claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should be understood that the term "and/or" used herein refers to any or all possible combinations of one or more of the associated listed items.
Although the terms first, second, third, etc. may be used in the present invention to describe various kinds of information, the information should not be limited to these terms. These terms are used to distinguish information of the same type from one another. For example, without departing from the scope of the present invention, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "while", or "in response to determining".
To convert an original image into a high-quality sRGB color image, in one possible implementation, as shown in Figure 1A, after the original image (such as a raw image) is captured, it can be processed by demosaicing, denoising, color processing (such as white balance and color space conversion), image enhancement (such as image sharpening and color enhancement), and so on, to obtain an sRGB color image. However, the image quality of the sRGB color image obtained in this way is still relatively low and can be improved.
Different from the above approach, in the embodiments of the present invention, as shown in Figure 1B, a target camera can be used to collect a large-scale dataset including original images (such as raw images) and target images. The dataset is then used to train a preset neural network; based on the trained preset neural network, captured original images can be processed to produce high-quality images.
Based on this approach, a high-performance preset neural network can be trained, converting captured low-quality original images into high-quality sRGB color images, with efficient imaging performance and high-quality imaging results. In the dataset construction process, a multi-exposure fusion method is used to produce high-quality target images, and a specific neural network structure is used for training, yielding high-quality imaging results.
The technical solutions of the embodiments of the present invention are described below with reference to specific embodiments.
Embodiment 1: Referring to Figure 2, which is a schematic flowchart of an image processing method, the method includes:
Step 201: obtain an original training image and a target image corresponding to the original training image.
Specifically, multiple original images can be obtained; the exposures of different original images may be the same or different. Then, one original image is selected from the multiple original images as the original training image, and multi-image fusion processing is performed on the multiple original images to obtain the target image.
Exemplarily, the multiple original images include multiple original images captured in a bracketing exposure mode. Bracketing is an advanced camera function: when the shutter is pressed, instead of capturing one original image, multiple original images are captured consecutively with different exposure combinations, ensuring that at least one original image matches the intended exposure. Bracketing works as follows: first an original image is captured at the metered exposure, then one or more original images are captured with the exposure increased and decreased relative to it, so that different original images have different exposure amounts.
Exemplarily, the exposure compensation information of each original image can be obtained, and one original image can be selected from the multiple original images as the original training image according to the exposure compensation information of each original image.
Exemplarily, the RGB image corresponding to each of the multiple original images can be obtained, and multi-image fusion processing can be performed on the multiple RGB images corresponding to the multiple original images to obtain the target image.
In one example, obtaining the RGB image corresponding to each of the multiple original images may include: for each of the multiple original images, performing at least one of demosaicing, denoising, automatic white balance, image sharpening, color enhancement, color space conversion, and pre-denoising on the original image to obtain the RGB (red-green-blue) image corresponding to the original image.
In one example, performing multi-image fusion processing on the multiple RGB images corresponding to the multiple original images to obtain the target image may include, but is not limited to: performing registration processing and fusion processing on the multiple RGB images corresponding to the multiple original images to generate the target image; or performing registration processing and fusion processing on the multiple RGB images corresponding to the multiple original images, and then performing contrast enhancement and/or color space conversion on the processed image to generate the target image. Of course, the above approaches are only examples and are not limiting.
In summary, the multiple RGB images can be fused into a target image that meets the quality requirements of low noise and high dynamic range, i.e., a target image that meets high quality requirements.
As shown in Figure 1B, in the embodiments of the present invention, the preset neural network needs to be trained on a dataset including original images and target images. Therefore, a dataset needs to be constructed, and the dataset must meet high image-quality requirements, such as a low noise level, high dynamic range, accurate color rendering, and effective contrast enhancement results. To this end, the dataset construction process may include:
1. Capture multiple original images; the exposures of different original images may be the same or different.
Specifically, for the target camera, its automatic mode can be used to capture original images (such as raw images) of multiple scenes; the number of scenes is not limited. For example, if there are 500 scenes, original images of 500 scenes are captured, such as daytime scenes, nighttime scenes, and rainy scenes; the scene types are not limited.
For each scene, multiple original images with different exposures can be captured in bracketing mode; the number of original images per scene is not limited and may be the same or different across scenes. For example, for each scene, 7-11 original images with different exposures can be captured in bracketing mode; of course, 7-11 is only an example and is not limiting.
For example, for a daytime scene, a handheld or tripod shooting method can be used to capture at least 7 original images with different exposures in bracketing mode; for a nighttime scene, a tripod shooting method can be used to capture at least 11 original images with different exposures in bracketing mode.
For each scene, capturing multiple original images with different exposures in bracketing mode can cover the dynamic range of the whole scene. Moreover, these differently exposed original images facilitate later image registration and fusion, so that noise can be eliminated during the fusion process.
When capturing multiple original images with different exposures in bracketing mode, small displacements of objects in the picture are allowed, and original images with large motion displacements are not captured.
2. Perform multi-image fusion processing on the multiple original images to obtain the target image.
The target image is the image used to train the preset neural network and has reference-level image quality; a target image meeting high quality requirements needs to be provided. Based on this, to effectively achieve the quality requirements of low noise and high dynamic range, multi-image fusion processing can be performed on the multiple original images to obtain the target image.
As shown in Figure 3A, for the multiple original images of each scene, editing software (such as Photoshop) can be used to process the multiple original images to obtain the RGB image corresponding to each original image. For example, for each original image, demosaicing, automatic white balance, denoising, image sharpening, color enhancement, and color space conversion are performed to obtain the RGB image corresponding to that original image, such as a 16-bit RGB image in a general color space (such as an Adobe RGB image). Denoising software is used to pre-denoise each RGB image to obtain a denoised RGB image. Then, HDR software (such as Aurora HDR) is used to perform multi-image fusion on the multiple denoised RGB images to obtain the target image. For example, registration, fusion, contrast enhancement, and color space conversion are performed on the multiple denoised RGB images to obtain the target image. The color space conversion converts the RGB image to the sRGB space, i.e., the target image is an sRGB color image.
3. Select one original image from the multiple original images as the original training image.
Specifically, the exposure compensation information of each original image can be obtained, and one original image can be selected from the multiple original images as the original training image according to the exposure compensation information of each original image.
For example, for each scene, one original image can be selected from the multiple original images of that scene as the original training image. The larger the exposure compensation of an original image, the worse the shooting conditions for that image and the more exposure compensation it needs; the smaller the exposure compensation, the better the shooting conditions, requiring no or only a small exposure compensation. Therefore, according to the exposure compensation information of each original image, the original image with small exposure compensation can be selected from the multiple original images as the original training image.
In summary, the original training image and target image of each scene can be obtained, and the processing of each scene is the same; in the following, the original training image and target image of one scene are taken as an example.
Step 202: perform a decorrelation operation on the original training image to obtain a multi-channel training image.
Specifically, an R channel image, a G channel image, and a B channel image can be obtained according to the original training image; then the correlation among the R channel image, the G channel image, and the B channel image is removed to obtain the multi-channel training image. The multi-channel training image may include a luminance component and a chrominance component, and the chrominance component includes a first chrominance component and a second chrominance component.
Obtaining the R channel image, the G channel image, and the B channel image according to the original training image may include, but is not limited to: performing an up-sampling operation (such as 2x bilinear up-sampling) on the original training image to obtain the R channel image, the G channel image, and the B channel image.
Removing the correlation among the R channel image, the G channel image, and the B channel image to obtain the multi-channel training image may include, but is not limited to: removing the correlation among the R channel image, the G channel image, and the B channel image according to a first conversion matrix to obtain the multi-channel training image; the first conversion matrix is used to remove the correlation between images.
In one example, the first conversion matrix can be obtained as follows: obtain an original input image, and obtain an R channel input image, a G channel input image, and a B channel input image according to the original input image; further, based on principal component analysis, obtain the first conversion matrix according to the R channel input image, the G channel input image, and the B channel input image.
Exemplarily, obtaining the first conversion matrix based on principal component analysis may include, but is not limited to: collecting multiple pixel vectors from each of the R channel input image, the G channel input image, and the B channel input image, and forming a pixel vector matrix from the collected pixel vectors; performing principal component analysis on the pixel vector matrix to obtain three basis vectors; and obtaining the first conversion matrix according to the three basis vectors.
In one example, considering that the information expressed by a color image is redundant, the elements of the three-dimensional color vector of a pixel are linearly correlated. If such correlation is removed, a three-dimensional mapping problem is transformed into three independent one-dimensional mapping problems, which simplifies the problem and improves the performance of the algorithm. Based on this, color decorrelation processing can be performed in the embodiments of the present invention.
Figure 3B is a schematic diagram of color decorrelation processing. First, a first conversion matrix is obtained, which is used to remove the correlation between images. Then, a decorrelation operation is performed on the original training image through the first conversion matrix to obtain a multi-channel training image, which includes a luminance component and a chrominance component, the chrominance component including a first chrominance component and a second chrominance component.
1. Obtain the first conversion matrix, which is used to remove the correlation between images.
First, obtain original input images, for example, by using the multiple original images in the dataset as the original input images. For each original input image, a decorrelation operation is performed to obtain a three-channel RGB image (i.e., an RGB image in the camera color space). For example, a 2x bilinear up-sampling operation can be performed on the original input image to obtain a three-channel RGB image comprising the R channel input image, the G channel input image, and the B channel input image.
Then, for each RGB image, multiple pixel vectors are collected from its R channel input image, its G channel input image, and its B channel input image, and the collected pixel vectors form a pixel vector matrix. For example, N pixel vectors are collected from the R channel input image, N from the G channel input image, and N from the B channel input image; in this way, all collected pixel vectors can form an N*3 pixel vector matrix, where N is the number of pixel samples. The value of N is configured empirically, e.g., N may be greater than or equal to 1000, without limitation.
Then, principal component analysis is performed on the pixel vector matrix to obtain three basis vectors, and the first conversion matrix is obtained according to the three basis vectors. For example, the PCA algorithm can be used to perform principal component analysis on the pixel vector matrix to obtain three basis vectors (also called coordinate basis column vectors), denoted X1, X2, and X3, where X1 is the principal component with the largest sample variance on that coordinate. These three basis vectors form the first conversion matrix M, M = [X1, X2, X3]^T, where T denotes transposition.
PCA (Principal Component Analysis) is a widely used data dimensionality-reduction algorithm. Its main idea is to map n-dimensional features onto k dimensions, where the k dimensions are completely new orthogonal features, also called principal components, reconstructed from the original n-dimensional features. The process of performing principal component analysis on the pixel vector matrix to obtain three basis vectors can be implemented with the PCA algorithm and is not repeated here, as long as three basis vectors are obtained.
2. Perform a decorrelation operation on the original training image through the first conversion matrix to obtain a multi-channel training image, which may include a luminance component, a first chrominance component, and a second chrominance component.
First, obtain the R channel image, the G channel image, and the B channel image according to the original training image. For example, an up-sampling operation (such as bilinear up-sampling) is performed on the original training image to obtain a three-channel RGB image comprising the R channel image, the G channel image, and the B channel image.
Then, the correlation among the R channel image, the G channel image, and the B channel image is removed according to the first conversion matrix to obtain the multi-channel training image; the multi-channel training image may include a luminance component and a chrominance component, the chrominance component including a first chrominance component and a second chrominance component.
For example, after obtaining the R channel image, the G channel image, and the B channel image, the following formula can be used for conversion: I' = M·I, where I represents a pixel value in the R, G, and B channel images, I' represents the corresponding pixel value in the multi-channel training image, and M represents the first conversion matrix. Clearly, this formula removes the correlation among the R channel image, the G channel image, and the B channel image according to the first conversion matrix, yielding the luminance component, the first chrominance component, and the second chrominance component.
Step 203: train the preset neural network according to the multi-channel training image and the target image.
Specifically, the multi-channel training image can be used as the input of the preset neural network to generate an output image, and the network parameters of the preset neural network are updated according to the output image and the target image.
In one example, the multi-channel training image may include a luminance component and a chrominance component, and the preset neural network may include a multi-scale extraction network consisting of a first multi-scale extraction network and a second multi-scale extraction network; the first multi-scale extraction network is used to obtain multiple scale features of the luminance component, and the second multi-scale extraction network is used to obtain multiple scale features of the chrominance component.
Exemplarily, the chrominance component may include a first chrominance component and a second chrominance component, and the second multi-scale extraction network may include a multi-scale extraction network for obtaining multiple scale features of the first chrominance component and a multi-scale extraction network for obtaining multiple scale features of the second chrominance component.
Exemplarily, the first multi-scale extraction network may include, but is not limited to, convolutional layers and/or pooling layers; the second multi-scale extraction network may likewise include, but is not limited to, convolutional layers and/or pooling layers.
Exemplarily, the preset neural network may further include multiple splicing layers connected in sequence; a splicing layer is used to perform a channel splicing operation on the luminance component and chrominance component of the same scale; or, a splicing layer is used to perform a channel splicing operation on the luminance component and chrominance component of the same scale together with the output image of the previous splicing layer. Further, adjacent splicing layers are connected by a preset transposed convolutional layer, which performs a transposed convolution operation on the output image of the previous splicing layer and outputs the result to the next splicing layer.
In one example, updating the network parameters of the preset neural network according to the output image and the target image may include, but is not limited to: converting the output image according to a second conversion matrix to obtain a camera RGB image; converting the camera RGB image into a standard RGB image using camera metadata; and updating the network parameters of the preset neural network according to the standard RGB image and the target image.
Exemplarily, the second conversion matrix can be determined according to the first conversion matrix; the first conversion matrix is [X1, X2, X3]^T, where T denotes transposition, and the second conversion matrix is [X1, X2, X3].
As shown in Figure 3B, in step 202 the multi-channel training image is obtained, which may include a luminance component and a chrominance component, the chrominance component including a first chrominance component and a second chrominance component. In step 201, the target image is obtained. Based on this, in step 203, the multi-channel training image and the target image are used as the input of the preset neural network, and the preset neural network is trained.
As shown in Figures 3C and 3D (Figure 3D is an example of Figure 3C, showing the original training image (called the Raw image in Figure 3D), the luminance component, the chrominance component, the output image, etc.), the preset neural network may include a multi-scale extraction network consisting of a first multi-scale extraction network and a second multi-scale extraction network. The first multi-scale extraction network is used to obtain multiple scale features of the luminance component and may therefore also be called the luminance-component multi-scale extraction network. The second multi-scale extraction network is used to obtain multiple scale features of the chrominance component and may therefore also be called the chrominance-component multi-scale extraction network.
Since the chrominance component may include a first chrominance component and a second chrominance component, the second multi-scale extraction network may include a multi-scale extraction network for obtaining multiple scale features of the first chrominance component and a multi-scale extraction network for obtaining multiple scale features of the second chrominance component.
As shown in Figure 3C, the luminance component of the multi-channel training image is used as the input of the first multi-scale extraction network, and the chrominance component of the multi-channel training image is used as the input of the second multi-scale extraction network. The luminance and chrominance components are processed by the first and second multi-scale extraction networks, e.g., with convolution, pooling, channel splicing, and transposed convolution, finally producing the output image.
Then, the output image is converted according to the second conversion matrix to obtain a camera RGB image, the camera RGB image is converted into a standard RGB image using camera metadata, and the network parameters of the preset neural network are updated according to the standard RGB image and the target image. The parameter update process is not limited, as long as the network parameters are updated based on the standard RGB image and the target image.
For example, both the standard RGB image and the target image are in sRGB format. By continuously adjusting the network parameters of the preset neural network, the standard RGB image is brought closer and closer to the target image; this process is the network parameter update process, which finally yields the optimal network parameters. The network parameters of the preset neural network include convolutional layer parameters, pooling layer parameters, etc., without limitation.
Referring to the above embodiments, the process of obtaining the first conversion matrix has been described; the first conversion matrix may be [X1, X2, X3]^T, and the second conversion matrix can be obtained from the first conversion matrix. The second conversion matrix may be [X1, X2, X3]; for example, the second conversion matrix is M', and M' = [X1, X2, X3].
In summary, after the output image is obtained based on the first and second multi-scale extraction networks, it can be converted according to the second conversion matrix M' to obtain an image in the camera RGB space, referred to herein as the camera RGB image. Then, a color conversion matrix is calculated using the camera metadata, and the camera RGB image is converted according to the color conversion matrix to obtain an image in the sRGB space, referred to herein as the standard RGB image. The network parameters are updated based on the standard RGB image and the target image.
Figure 3E is a schematic structural diagram of the first and second multi-scale extraction networks, through which multi-scale processing is performed. Both networks have a multi-resolution processing structure so that processing can be performed on images at different scales. For example, denoising and sharpening operations can be performed on small-scale images, and local illumination adjustment operations can be performed on large-scale images.
First, referring to the above embodiments, a decorrelation operation can be performed on the original training image through the first conversion matrix to obtain a multi-channel training image, which may include a luminance component L, a first chrominance component a, and a second chrominance component b. The luminance component L is the projection of the image onto the largest principal component and is the luminance component of the image; the first chrominance component a and the second chrominance component b are the chrominance components of the image.
Then, the luminance component is output to the first multi-scale extraction network, which may include convolutional layers and/or pooling layers. As shown in Figure 3E, the first multi-scale extraction network includes multiple convolutional layers and multiple pooling layers. For the luminance component of size W*H, convolutional layer 11 performs convolution, and pooling layer 12 performs pooling (such as 2x max pooling) on the convolved luminance component to obtain a W/2*H/2 image, i.e., half the original size. For the luminance component of size W/2*H/2, convolutional layer 13 performs convolution and pooling layer 14 performs pooling (such as 2x max pooling) to obtain a W/4*H/4 image. For the luminance component of size W/4*H/4, convolutional layer 15 performs convolution. Of course, the above uses 3 convolutional layers and 2 pooling layers as an example, without limitation.
In addition, the chrominance components (such as the first chrominance component a and the second chrominance component b; "the chrominance component" is used as an example below) are output to the second multi-scale extraction network, which may include convolutional layers and/or pooling layers. As shown in Figure 3E, the second multi-scale extraction network includes multiple convolutional layers and multiple pooling layers. For the chrominance component of size W*H, convolutional layer 21 performs convolution, and pooling layer 22 performs pooling (such as 2x max pooling) to obtain a W/2*H/2 image. For the chrominance component of size W/2*H/2, convolutional layer 23 performs convolution and pooling layer 24 performs pooling (such as 2x max pooling) to obtain a W/4*H/4 image. For the chrominance component of size W/4*H/4, convolutional layer 25 performs convolution. Again, 3 convolutional layers and 2 pooling layers are only an example, without limitation.
The convolution and pooling of the luminance component by the first multi-scale extraction network and the convolution and pooling of the chrominance component by the second multi-scale extraction network are independent of each other.
As shown in Figure 3E, the preset neural network may further include multiple splicing layers (i.e., channel splicing layers), multiple transposed convolutional layers, and multiple convolutional layers. For example, the W/4*H/4 luminance component and the W/4*H/4 chrominance component are output to splicing layer 31, which performs a channel splicing operation on them (i.e., on the luminance and chrominance components of the same scale). The spliced image is output to convolutional layer 32, which performs convolution; transposed convolutional layer 33 then performs a transposed convolution operation (such as a 2x transposed convolution; its implementation is not limited) on the convolved image to obtain a W/2*H/2 image.
The W/2*H/2 image from transposed convolutional layer 33, the W/2*H/2 luminance component from convolutional layer 13, and the W/2*H/2 chrominance component from convolutional layer 23 are output to splicing layer 34, which performs a channel splicing operation on them. The spliced image is output to convolutional layer 35, which performs convolution; transposed convolutional layer 36 then performs a transposed convolution operation on the convolved image to obtain a W*H image.
Further, the W*H image from transposed convolutional layer 36, the W*H luminance component from convolutional layer 11, and the W*H chrominance component from convolutional layer 21 are output to splicing layer 37, which performs a channel splicing operation on them. The spliced image is output to convolutional layer 38, which performs convolution and finally outputs the W*H output image. Then, the output image is converted according to the second conversion matrix to obtain a camera RGB image, the camera RGB image is converted into a standard RGB image using camera metadata, and the network parameters of the preset neural network are updated according to the standard RGB image and the target image.
In the above embodiments, the luminance and chrominance components are connected together by channel splicing, forming a single up-sampling path. To preserve the texture details of the image, at each respective scale, the luminance-component and chrominance-component images are merged into the up-sampling convolution operations by channel splicing.
In the above embodiments, each convolutional layer may consist of multiple Conv layers (i.e., convolutional layers) and multiple ReLU (Rectified Linear Unit) layers, as shown in Figure 3E.
In the above embodiments, the preset neural network may be a convolutional neural network (CNN) of any architecture; the structure of the preset neural network is not limited. The preset neural network can implement color decorrelation processing and multi-resolution processing; see step 202 for color decorrelation processing and step 203 for multi-resolution processing.
Based on the above technical solution, in the embodiments of the present invention, a decorrelation operation can be performed on the original training image to obtain a multi-channel training image, and the preset neural network can be trained according to the multi-channel training image and the target image. In this way, the image to be processed can be processed based on the preset neural network to generate an image with improved image quality. This approach can train a high-performance preset neural network and convert captured low-quality original images into high-quality sRGB color images, with efficient imaging performance, high-quality imaging results, and a good user experience.
For example, an image-quality-improved image generated based on the preset neural network has less noise, higher dynamic range, and better local detail rendering, and looks better visually.
Figure 3F shows a comparison between images of the present invention and existing images. The upper-left and upper-right images are existing images. The lower-left image is an image-quality-improved image generated based on the preset neural network, and the lower-right image is the target image. Clearly, the image-quality-improved image has very high image quality, with less noise, higher dynamic range, and better local detail rendering.
The above embodiments describe the training of the preset neural network. Based on the trained preset neural network, an image to be processed can be processed to obtain an image-quality-improved image, as described below.
Embodiment 2: Referring to Figure 4, a flowchart of an image processing method, the method may include:
Step 401: obtain the image to be processed, i.e., the image whose quality needs to be improved.
Specifically, for the target camera, the image to be processed can be captured, without limitation.
Step 402: perform a decorrelation operation on the image to be processed to obtain a multi-channel image.
Specifically, an R channel image, a G channel image, and a B channel image can be obtained according to the image to be processed; then the correlation among the R channel image, the G channel image, and the B channel image is removed to obtain the multi-channel image. The multi-channel image may include a luminance component and a chrominance component, and the chrominance component may include a first chrominance component and a second chrominance component.
Obtaining the R channel image, the G channel image, and the B channel image according to the image to be processed may include, but is not limited to: performing an up-sampling operation (such as 2x bilinear up-sampling) on the image to be processed to obtain the R channel image, the G channel image, and the B channel image.
Removing the correlation among the R channel image, the G channel image, and the B channel image to obtain the multi-channel image includes: removing the correlation among the R channel image, the G channel image, and the B channel image according to the first conversion matrix to obtain the multi-channel image; the first conversion matrix is used to remove the correlation between images.
In one example, the first conversion matrix can be obtained as follows: obtain an original input image, and obtain an R channel input image, a G channel input image, and a B channel input image according to the original input image; further, based on principal component analysis, obtain the first conversion matrix according to the R channel input image, the G channel input image, and the B channel input image.
Exemplarily, obtaining the first conversion matrix based on principal component analysis may include, but is not limited to: collecting multiple pixel vectors from each of the R channel input image, the G channel input image, and the B channel input image, and forming a pixel vector matrix from the collected pixel vectors; performing principal component analysis on the pixel vector matrix to obtain three basis vectors; and obtaining the first conversion matrix according to the three basis vectors.
In one example, considering that the information expressed by a color image is redundant, the elements of the three-dimensional color vector of a pixel are linearly correlated. If such correlation is removed, a three-dimensional mapping problem is transformed into three independent one-dimensional mapping problems, which simplifies the problem and improves the performance of the algorithm. Based on this, color decorrelation processing can be performed in this embodiment. Specifically, the first conversion matrix is obtained, which is used to remove the correlation between images. Then, a decorrelation operation is performed on the image to be processed through the first conversion matrix to obtain a multi-channel image, which includes a luminance component and a chrominance component, the chrominance component including a first chrominance component and a second chrominance component.
1. Obtain the first conversion matrix, which is used to remove the correlation between images. Note that the first conversion matrix has already been obtained before step 401 is executed.
First, obtain original input images, for example, by using the multiple original images in the dataset as the original input images. For each original input image, a decorrelation operation is performed to obtain a three-channel RGB image (i.e., an RGB image in the camera color space). For example, a 2x bilinear up-sampling operation can be performed on the original input image to obtain a three-channel RGB image comprising the R channel input image, the G channel input image, and the B channel input image.
Then, for each RGB image, multiple pixel vectors are collected from its R channel input image, its G channel input image, and its B channel input image, and the collected pixel vectors form a pixel vector matrix. For example, N pixel vectors are collected from each of the R, G, and B channel input images, so that all collected pixel vectors can form an N*3 pixel vector matrix, where N is the number of pixel samples. The value of N is configured empirically, e.g., N may be greater than or equal to 1000, without limitation.
Then, principal component analysis is performed on the pixel vector matrix to obtain three basis vectors, and the first conversion matrix is obtained according to the three basis vectors. For example, the PCA algorithm can be used to perform principal component analysis on the pixel vector matrix to obtain three basis vectors (also called coordinate basis column vectors), denoted X1, X2, and X3, where X1 is the principal component with the largest sample variance on that coordinate. These three basis vectors form the first conversion matrix M, M = [X1, X2, X3]^T, where T denotes transposition.
2. In step 402, a decorrelation operation is performed on the image to be processed through the first conversion matrix to obtain a multi-channel image, which includes a luminance component, a first chrominance component, and a second chrominance component.
Specifically, first obtain the R channel image, the G channel image, and the B channel image according to the image to be processed. For example, an up-sampling operation (such as bilinear up-sampling) is performed on the image to be processed to obtain a three-channel RGB image comprising the R channel image, the G channel image, and the B channel image.
Then, the correlation among the R channel image, the G channel image, and the B channel image is removed according to the first conversion matrix to obtain the multi-channel image; the multi-channel image may include a luminance component and a chrominance component, the chrominance component including a first chrominance component and a second chrominance component.
For example, after obtaining the R channel image, the G channel image, and the B channel image, the following formula can be used for conversion: I' = M·I, where I represents a pixel value in the R, G, and B channel images, I' represents the corresponding pixel value in the multi-channel image, and M represents the first conversion matrix. Clearly, this formula removes the correlation among the R channel image, the G channel image, and the B channel image according to the first conversion matrix, yielding the luminance component, the first chrominance component, and the second chrominance component.
Step 403: process the multi-channel image through the preset neural network to generate an image with improved image quality. Specifically, the multi-channel image is used as the input of the preset neural network to generate an output image, and the output image is processed according to the network parameters of the preset neural network to generate the image-quality-improved image.
In one example, the multi-channel image may include a luminance component and a chrominance component, and the preset neural network may include a multi-scale extraction network consisting of a first multi-scale extraction network and a second multi-scale extraction network; the first multi-scale extraction network is used to obtain multiple scale features of the luminance component, and the second multi-scale extraction network is used to obtain multiple scale features of the chrominance component.
The chrominance component may include a first chrominance component and a second chrominance component; the second multi-scale extraction network may include a multi-scale extraction network for obtaining multiple scale features of the first chrominance component and a multi-scale extraction network for obtaining multiple scale features of the second chrominance component.
Exemplarily, the first multi-scale extraction network may include, but is not limited to, convolutional layers and/or pooling layers; the second multi-scale extraction network may likewise include, but is not limited to, convolutional layers and/or pooling layers.
In one example, processing the output image according to the network parameters of the preset neural network to generate the image-quality-improved image may include: converting the output image according to the second conversion matrix to obtain a camera RGB image; converting the camera RGB image into a standard RGB image using camera metadata; and processing the standard RGB image according to the network parameters of the preset neural network to generate the image-quality-improved image.
Exemplarily, the second conversion matrix can be determined according to the first conversion matrix; the first conversion matrix is [X1, X2, X3]^T, where T denotes transposition, and the second conversion matrix is [X1, X2, X3].
Exemplarily, the preset neural network may further include multiple splicing layers connected in sequence; a splicing layer is used to perform a channel splicing operation on the luminance and chrominance components of the same scale; or, a splicing layer is used to perform a channel splicing operation on the luminance and chrominance components of the same scale together with the output image of the previous splicing layer. Further, adjacent splicing layers are connected by a preset transposed convolutional layer, which performs a transposed convolution operation on the output image of the previous splicing layer and outputs the result to the next splicing layer.
In step 402, the multi-channel image is obtained, which may include a luminance component and a chrominance component, the chrominance component including a first chrominance component and a second chrominance component. In step 403, the multi-channel image can be used as the input of the preset neural network to generate an output image.
Specifically, the preset neural network includes the first and second multi-scale extraction networks: the first obtains multiple scale features of the luminance component, and the second obtains multiple scale features of the chrominance component. The luminance component of the multi-channel image is used as the input of the first multi-scale extraction network, and the chrominance component as the input of the second; based on the two networks, the luminance and chrominance components are processed, e.g., with convolution, pooling, channel splicing, and transposed convolution, finally yielding the output image.
Then, the output image is converted according to the second conversion matrix to obtain a camera RGB image, and the camera RGB image is converted into a standard RGB image using camera metadata. After the standard RGB image is obtained, since the network parameters of the preset neural network have been trained in the above embodiments, the standard RGB image can be processed according to the network parameters of the preset neural network to generate the image-quality-improved image.
In summary, after the output image is obtained based on the first and second multi-scale extraction networks, it can be converted according to the second conversion matrix M' to obtain an image in the camera RGB space, referred to herein as the camera RGB image. Then, a color conversion matrix is calculated using the camera metadata, and the camera RGB image is converted according to the color conversion matrix to obtain an image in the sRGB space, referred to herein as the standard RGB image. Then, the standard RGB image is processed according to the network parameters of the preset neural network to generate the image-quality-improved image; the processing of the preset neural network is not limited.
Figure 3E is a schematic structural diagram of the first and second multi-scale extraction networks, through which multi-scale processing is performed. Both networks have a multi-resolution processing structure so that processing can be performed on images at different scales. For example, denoising and sharpening operations can be performed on small-scale images, and local illumination adjustment operations can be performed on large-scale images.
First, referring to the above embodiments, a decorrelation operation can be performed on the image to be processed through the first conversion matrix to obtain a multi-channel image, which may include a luminance component L, a first chrominance component a, and a second chrominance component b. The luminance component L is the projection of the image onto the largest principal component and is the luminance component of the image; the first chrominance component a and the second chrominance component b are the chrominance components of the image.
Then, the luminance component is output to the first multi-scale extraction network, which may include convolutional layers and/or pooling layers. As shown in Figure 3E, the first multi-scale extraction network includes multiple convolutional layers and multiple pooling layers. For the luminance component of size W*H, convolutional layer 11 performs convolution and pooling layer 12 performs pooling (such as 2x max pooling) to obtain a W/2*H/2 image, i.e., half the original size. For the luminance component of size W/2*H/2, convolutional layer 13 performs convolution and pooling layer 14 performs pooling to obtain a W/4*H/4 image. For the luminance component of size W/4*H/4, convolutional layer 15 performs convolution. Again, 3 convolutional layers and 2 pooling layers are only an example, without limitation.
In addition, the chrominance components (such as the first chrominance component a and the second chrominance component b) are output to the second multi-scale extraction network, which may include convolutional layers and/or pooling layers. As shown in Figure 3E, the second multi-scale extraction network includes multiple convolutional layers and multiple pooling layers. For the chrominance component of size W*H, convolutional layer 21 performs convolution and pooling layer 22 performs pooling (such as 2x max pooling) to obtain a W/2*H/2 image. For the chrominance component of size W/2*H/2, convolutional layer 23 performs convolution and pooling layer 24 performs pooling to obtain a W/4*H/4 image. For the chrominance component of size W/4*H/4, convolutional layer 25 performs convolution. Again, 3 convolutional layers and 2 pooling layers are only an example, without limitation.
The convolution and pooling of the luminance component by the first multi-scale extraction network and the convolution and pooling of the chrominance component by the second multi-scale extraction network are independent of each other.
As shown in Figure 3E, the preset neural network may further include multiple splicing layers (i.e., channel splicing layers), multiple transposed convolutional layers, and multiple convolutional layers. For example, the W/4*H/4 luminance component and the W/4*H/4 chrominance component are output to splicing layer 31, which performs a channel splicing operation on them (i.e., on the luminance and chrominance components of the same scale). The spliced image is output to convolutional layer 32, which performs convolution; transposed convolutional layer 33 then performs a transposed convolution operation (such as a 2x transposed convolution; its implementation is not limited) to obtain a W/2*H/2 image.
The W/2*H/2 image from transposed convolutional layer 33, the W/2*H/2 luminance component from convolutional layer 13, and the W/2*H/2 chrominance component from convolutional layer 23 are output to splicing layer 34, which performs a channel splicing operation on them. The spliced image is output to convolutional layer 35, which performs convolution; transposed convolutional layer 36 then performs a transposed convolution operation to obtain a W*H image.
Further, the W*H image from transposed convolutional layer 36, the W*H luminance component from convolutional layer 11, and the W*H chrominance component from convolutional layer 21 are output to splicing layer 37, which performs a channel splicing operation on them. The spliced image is output to convolutional layer 38, which performs convolution and finally outputs the W*H output image. Then, the output image is converted according to the second conversion matrix to obtain a camera RGB image, the camera RGB image is converted into a standard RGB image using camera metadata, and the standard RGB image is processed according to the network parameters of the preset neural network to generate the image-quality-improved image.
In the above embodiments, the luminance and chrominance components are connected together by channel splicing, forming a single up-sampling path. To preserve the texture details of the image, at each respective scale, the luminance-component and chrominance-component images are merged into the up-sampling convolution operations by channel splicing.
In the above embodiments, each convolutional layer may consist of multiple Conv layers (i.e., convolutional layers) and multiple ReLU (Rectified Linear Unit) layers. The preset neural network may be a convolutional neural network of any architecture; its structure is not limited.
Based on the above technical solution, in the embodiments of the present invention, a high-performance preset neural network can be trained, converting captured low-quality original images into high-quality sRGB color images, with efficient imaging performance, high-quality imaging results, and a good user experience.
Embodiment 3: In another training process of the preset neural network, the preset neural network can be trained according to a multi-channel training image. Referring to Figure 5, a flowchart of this image processing method:
Step 501: obtain a multi-channel training image according to an original training image.
For the process of obtaining the multi-channel training image according to the original training image, see Embodiment 1.
Step 502: train the preset neural network according to the multi-channel training image; the preset neural network includes a multi-scale extraction network, which is used to obtain multiple scale features of the training image of each channel in the multi-channel training image. For example, the multi-channel training image may include a luminance component and a chrominance component, and the multi-scale extraction network may include a first multi-scale extraction network and a second multi-scale extraction network; the first obtains multiple scale features of the luminance component, and the second obtains multiple scale features of the chrominance component.
In one example, an original training image and a target image corresponding to the original training image can be obtained. Based on this, the preset neural network can be trained according to the multi-channel training image and the target image.
Obtaining the original training image and the corresponding target image includes: obtaining multiple original images, the exposures of different original images being the same or different; selecting one original image from the multiple original images as the original training image; and performing multi-image fusion processing on the multiple original images to obtain the target image. For how the original training image and target image are obtained, see Embodiment 1, which is not repeated here.
For the structure of the preset neural network, see Embodiment 1, which is not repeated here.
In one example, training the preset neural network according to the multi-channel training image and the target image may include: using the multi-channel training image as the input of the preset neural network to generate an output image; and updating the network parameters of the preset neural network according to the output image and the target image.
Updating the network parameters of the preset neural network according to the output image and the target image may include: converting the output image according to the second conversion matrix to obtain a camera RGB image; converting the camera RGB image into a standard RGB image using camera metadata; and updating the network parameters of the preset neural network according to the standard RGB image and the target image.
Based on the above technical solution, in the embodiments of the present invention, a high-performance preset neural network can be trained, converting captured low-quality original images into high-quality sRGB color images, with efficient imaging performance, high-quality imaging results, and a good user experience.
Embodiment 3 above describes the training of the preset neural network. Based on the trained preset neural network, an image to be processed can be processed to obtain an image-quality-improved image, as described below.
Embodiment 4: Referring to Figure 6, a flowchart of an image processing method, the method may include:
Step 601: obtain an image to be processed and a multi-channel image corresponding to the image to be processed.
Step 602: process the multi-channel image through a preset neural network to generate an image with improved image quality; the preset neural network may include a multi-scale extraction network, which is used to obtain multiple scale features of the image of each channel in the multi-channel image.
Processing the multi-channel image through the preset neural network to generate the image-quality-improved image may include: using the multi-channel image as the input of the preset neural network to generate an output image; and processing the output image according to the network parameters of the preset neural network to generate the image-quality-improved image.
For the structure of the preset neural network, see Embodiment 1, which is not repeated here.
Processing the output image according to the network parameters of the preset neural network to generate the image-quality-improved image includes: converting the output image according to the second conversion matrix to obtain a camera RGB image; converting the camera RGB image into a standard RGB image using camera metadata; and processing the standard RGB image according to the network parameters of the preset neural network to generate the image-quality-improved image.
Based on the above technical solution, in the embodiments of the present invention, a high-performance preset neural network can be trained, converting captured low-quality original images into high-quality sRGB color images, with efficient imaging performance, high-quality imaging results, and a good user experience.
Embodiment 5:
Based on the same concept as the above methods, an embodiment of the present invention further provides an image processing device 70. As shown in Figure 7, the image processing device 70 includes a processor 71 and a memory 72; the memory 72 is used to store computer instructions executable by the processor 71, and the processor 71 is used to read the computer instructions from the memory 72 to implement the above image processing methods. Specifically, the processor 71 may read the computer instructions from the memory 72 to implement the following operations:
obtain an original training image and a target image corresponding to the original training image;
perform a decorrelation operation on the original training image to obtain a multi-channel training image;
train a preset neural network according to the multi-channel training image and the target image.
When the processor 71 obtains the original training image and the corresponding target image, it is specifically configured to: obtain multiple original images, the exposures of different original images being the same or different;
select one original image from the multiple original images as the original training image;
perform multi-image fusion processing on the multiple original images to obtain the target image.
When the processor 71 selects one original image from the multiple original images as the original training image, it is specifically configured to: obtain the exposure compensation information of each original image, and select one original image from the multiple original images as the original training image according to the exposure compensation information of each original image.
When the processor 71 performs multi-image fusion processing on the multiple original images to obtain the target image, it is specifically configured to: obtain the RGB image corresponding to each of the multiple original images, and perform multi-image fusion processing on the multiple RGB images corresponding to the multiple original images to obtain the target image.
When the processor 71 obtains the RGB image corresponding to each of the multiple original images, it is specifically configured to: for each of the multiple original images, perform at least one of demosaicing, denoising, automatic white balance, image sharpening, color enhancement, color space conversion, and pre-denoising on the original image to obtain the RGB image corresponding to the original image.
When the processor 71 performs multi-image fusion processing on the multiple RGB images corresponding to the multiple original images to obtain the target image, it is specifically configured to: perform registration processing and fusion processing on the multiple RGB images corresponding to the multiple original images to generate the target image; or perform registration processing and fusion processing on the multiple RGB images corresponding to the multiple original images, and then perform contrast enhancement and/or color space conversion on the processed image to generate the target image.
When the processor 71 performs a decorrelation operation on the original training image to obtain a multi-channel training image, it is specifically configured to: obtain an R channel image, a G channel image, and a B channel image according to the original training image; and remove the correlation among the R channel image, the G channel image, and the B channel image to obtain the multi-channel training image. When the processor 71 obtains the R channel image, the G channel image, and the B channel image according to the original training image, it is specifically configured to: perform an up-sampling operation on the original training image to obtain the R channel image, the G channel image, and the B channel image.
When the processor 71 removes the correlation among the R channel image, the G channel image, and the B channel image to obtain the multi-channel training image, it is specifically configured to: remove the correlation among the R channel image, the G channel image, and the B channel image according to a first conversion matrix to obtain the multi-channel training image; the first conversion matrix is used to remove the correlation between images.
The processor 71 is further configured to obtain the first conversion matrix in the following manner:
obtain an original input image, and obtain an R channel input image, a G channel input image, and a B channel input image according to the original input image; based on principal component analysis, obtain the first conversion matrix according to the R channel input image, the G channel input image, and the B channel input image.
When the processor 71 obtains the first conversion matrix based on principal component analysis according to the R channel input image, the G channel input image, and the B channel input image, it is specifically configured to:
collect multiple pixel vectors from each of the R channel input image, the G channel input image, and the B channel input image, and form a pixel vector matrix from the collected pixel vectors;
perform principal component analysis on the pixel vector matrix to obtain three basis vectors;
obtain the first conversion matrix according to the three basis vectors.
When the processor 71 trains the preset neural network according to the multi-channel training image and the target image, it is specifically configured to: use the multi-channel training image as the input of the preset neural network to generate an output image; and update the network parameters of the preset neural network according to the output image and the target image.
The multi-channel training image includes a luminance component and a chrominance component; the preset neural network includes a multi-scale extraction network consisting of a first multi-scale extraction network and a second multi-scale extraction network; the first multi-scale extraction network is used to obtain multiple scale features of the luminance component, and the second multi-scale extraction network is used to obtain multiple scale features of the chrominance component.
The chrominance component includes a first chrominance component and a second chrominance component.
The first multi-scale extraction network includes convolutional layers and/or pooling layers;
the second multi-scale extraction network includes convolutional layers and/or pooling layers.
The preset neural network further includes multiple splicing layers connected in sequence; a splicing layer is used to perform a channel splicing operation on the luminance and chrominance components of the same scale; or, a splicing layer is used to perform a channel splicing operation on the luminance and chrominance components of the same scale together with the output image of the previous splicing layer.
Adjacent splicing layers are connected by a preset transposed convolutional layer, which performs a transposed convolution operation on the output image of the previous splicing layer and outputs the result to the next splicing layer.
When the processor 71 updates the network parameters of the preset neural network according to the output image and the target image, it is specifically configured to: convert the output image according to the second conversion matrix to obtain a camera RGB image; convert the camera RGB image into a standard RGB image using camera metadata; and update the network parameters of the preset neural network according to the standard RGB image and the target image.
The processor 71 is further configured to determine the second conversion matrix according to the first conversion matrix; the first conversion matrix is [X1, X2, X3]^T, where T denotes transposition, and the second conversion matrix is [X1, X2, X3].
The processor 71 may read the computer instructions from the memory 72 to implement the following operations: obtain an image to be processed; perform a decorrelation operation on the image to be processed to obtain a multi-channel image; and process the multi-channel image through a preset neural network to generate an image with improved image quality.
When the processor 71 performs a decorrelation operation on the image to be processed to obtain a multi-channel image, it is specifically configured to: obtain an R channel image, a G channel image, and a B channel image according to the image to be processed; and remove the correlation among the R channel image, the G channel image, and the B channel image to obtain the multi-channel image. When the processor 71 processes the multi-channel image through the preset neural network to generate the image-quality-improved image, it is specifically configured to: use the multi-channel image as the input of the preset neural network to generate an output image; and process the output image according to the network parameters of the preset neural network to generate the image-quality-improved image.
As another example, the processor 71 may read the computer instructions from the memory 72 to implement the following operations: obtain a multi-channel training image according to an original training image;
train the preset neural network according to the multi-channel training image; wherein the preset neural network includes a multi-scale extraction network, which is used to obtain multiple scale features of the training image of each channel in the multi-channel training image.
The processor 71 is further configured to obtain an original training image and a target image corresponding to the original training image; when the processor 71 trains the preset neural network according to the multi-channel training image, it is specifically configured to train the preset neural network according to the multi-channel training image and the target image.
As another example, the processor 71 may read the computer instructions from the memory 72 to implement the following operations: obtain an image to be processed and a multi-channel image corresponding to the image to be processed;
process the multi-channel image through a preset neural network to generate an image with improved image quality;
wherein the preset neural network includes a multi-scale extraction network, which is used to obtain multiple scale features of the image of each channel in the multi-channel image.
Based on the same concept as the above methods, an embodiment of the present invention further provides a movable platform. As shown in Figure 8, the movable platform 80 includes: a body 81; a power system 82 installed inside the body 81 and used to provide power for the movable platform 80; a sensor system 83 used to capture image frames; and the above-mentioned image processing device 70, used to process the image frames captured by the sensor system. Preferably, the processor 71 of the image processing device 70 is further configured to control the power system 82 according to the processed image frames.
Taking a UAV as an example, the power system of the UAV may include an electronic speed controller (ESC), propellers, and motors corresponding to the propellers. A motor is connected between the ESC and its propeller, and the motor and propeller are arranged on the corresponding arm; the ESC receives a drive signal generated by the control system and provides a drive current to the motor according to the drive signal, so as to control the rotating speed of the motor. The motor drives the propeller to rotate, thereby providing power for the flight of the UAV.
Based on the same concept as the above methods, an embodiment of the present invention further provides a camera. As shown in Figure 9, the camera 90 includes: a housing 91; a lens assembly 92 installed inside the housing 91; a sensor assembly 93 installed inside the housing 91 and used to sense light passing through the lens assembly 92 and generate an electrical signal; and the above-mentioned image processing device 70.
基于与上述方法同样的构思,本发明实施例还提供一种机器可读存储介质,所述机器可读存储介质上存储有若干计算机指令,在所述计算机指令被执行时,实现上述的图像处理方法,即上述各实施例的图像处理方法。
The systems, apparatuses, modules or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or any combination of these devices.
For convenience of description, the above apparatuses are described in terms of functionally divided units. Of course, when implementing the present invention, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, may be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Moreover, these computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are merely embodiments of the present invention and are not intended to limit the present invention. Various modifications and variations of the present invention are possible for those skilled in the art. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of the claims of the present invention.
Claims (60)
- An image processing method, characterized in that the method comprises: acquiring an original training image and a target image corresponding to the original training image; performing a decorrelation operation on the original training image to obtain a multi-channel training image; and training a preset neural network with the multi-channel training image and the target image.
- The method according to claim 1, characterized in that acquiring the original training image and the target image corresponding to the original training image comprises: acquiring a plurality of original images, wherein different original images have the same or different exposures; selecting one of the plurality of original images as the original training image; and performing multi-image fusion processing on the plurality of original images to obtain the target image.
- The method according to claim 2, characterized in that the plurality of original images comprise a plurality of original images captured in a bracketing exposure mode.
- The method according to claim 2, characterized in that selecting one of the plurality of original images as the original training image comprises: obtaining exposure compensation information of each original image, and selecting one of the plurality of original images as the original training image according to the exposure compensation information of each original image.
- The method according to claim 2, characterized in that performing multi-image fusion processing on the plurality of original images to obtain the target image comprises: obtaining an RGB image corresponding to each of the plurality of original images, and performing multi-image fusion processing on the plurality of RGB images corresponding to the plurality of original images to obtain the target image.
- The method according to claim 5, characterized in that obtaining the RGB image corresponding to each of the plurality of original images comprises: for each of the plurality of original images, applying at least one of demosaicing, denoising, automatic white balance, image sharpening, color enhancement, color space conversion and pre-denoising to the original image to obtain the RGB image corresponding to the original image.
- The method according to claim 5, characterized in that performing multi-image fusion processing on the plurality of RGB images corresponding to the plurality of original images to obtain the target image comprises: performing registration processing and fusion processing on the plurality of RGB images corresponding to the plurality of original images to generate the target image.
- The method according to claim 5, characterized in that performing multi-image fusion processing on the plurality of RGB images corresponding to the plurality of original images to obtain the target image comprises: performing registration processing and fusion processing on the plurality of RGB images corresponding to the plurality of original images, and applying contrast enhancement and/or color space conversion to the processed image to generate the target image.
- The method according to claim 1, characterized in that performing the decorrelation operation on the original training image to obtain the multi-channel training image comprises: obtaining an R-channel image, a G-channel image and a B-channel image from the original training image; and removing the correlation among the R-channel image, the G-channel image and the B-channel image to obtain the multi-channel training image.
- The method according to claim 9, characterized in that obtaining the R-channel image, the G-channel image and the B-channel image from the original training image comprises: performing an upsampling operation on the original training image to obtain the R-channel image, the G-channel image and the B-channel image.
- The method according to claim 9, characterized in that removing the correlation among the R-channel image, the G-channel image and the B-channel image to obtain the multi-channel training image comprises: removing the correlation among the R-channel image, the G-channel image and the B-channel image according to a first conversion matrix to obtain the multi-channel training image, wherein the first conversion matrix is used to remove the correlation among images.
- The method according to claim 11, characterized in that before removing the correlation among the R-channel image, the G-channel image and the B-channel image according to the first conversion matrix, the method further comprises obtaining the first conversion matrix in the following manner: acquiring an original input image, and obtaining an R-channel input image, a G-channel input image and a B-channel input image from the original input image; and obtaining the first conversion matrix from the R-channel input image, the G-channel input image and the B-channel input image based on principal component analysis.
- The method according to claim 12, characterized in that obtaining the first conversion matrix from the R-channel input image, the G-channel input image and the B-channel input image based on principal component analysis comprises: collecting a plurality of pixel vectors from each of the R-channel input image, the G-channel input image and the B-channel input image, and assembling the collected pixel vectors into a pixel-vector matrix; performing principal component analysis on the pixel-vector matrix to obtain three basis vectors; and obtaining the first conversion matrix from the three basis vectors.
- The method according to claim 1, characterized in that training the preset neural network with the multi-channel training image and the target image comprises: feeding the multi-channel training image into the preset neural network as input to generate an output image; and updating network parameters of the preset neural network according to the output image and the target image.
- The method according to claim 14, characterized in that the multi-channel training image comprises a luminance component and chrominance components, and the preset neural network comprises a multi-scale extraction network, the multi-scale extraction network comprising a first multi-scale extraction network and a second multi-scale extraction network, the first multi-scale extraction network being used to obtain multiple scale features of the luminance component, and the second multi-scale extraction network being used to obtain multiple scale features of the chrominance components.
- The method according to claim 15, characterized in that the chrominance components comprise a first chrominance component and a second chrominance component.
- The method according to claim 15, characterized in that the first multi-scale extraction network comprises a convolutional layer and/or a pooling layer; and the second multi-scale extraction network comprises a convolutional layer and/or a pooling layer.
- The method according to claim 15, characterized in that the preset neural network further comprises a plurality of sequentially connected concatenation layers; a concatenation layer is used to perform a channel concatenation operation on the luminance component and the chrominance components of the same scale; or, a concatenation layer is used to perform a channel concatenation operation on the luminance component and the chrominance components of the same scale together with the output image of the previous concatenation layer.
- The method according to claim 18, characterized in that adjacent concatenation layers are connected through a preset transposed convolution layer, the transposed convolution layer being used to apply a transposed convolution operation to the output image of the previous concatenation layer and output the result to the next concatenation layer.
- The method according to claim 14, characterized in that updating the network parameters of the preset neural network according to the output image and the target image comprises: converting the output image with a second conversion matrix to obtain a camera RGB image; converting the camera RGB image into a standard RGB image using camera metadata; and updating the network parameters of the preset neural network according to the standard RGB image and the target image.
- The method according to claim 20, characterized in that the method further comprises: determining the second conversion matrix from the first conversion matrix, wherein the first conversion matrix is [X1, X2, X3]^T, T denotes transposition, and the second conversion matrix is [X1, X2, X3].
- An image processing method, characterized in that the method comprises: acquiring an image to be processed; performing a decorrelation operation on the image to be processed to obtain a multi-channel image; and processing the multi-channel image through a preset neural network to generate a quality-enhanced image.
- The method according to claim 22, characterized in that performing the decorrelation operation on the image to be processed to obtain the multi-channel image comprises: obtaining an R-channel image, a G-channel image and a B-channel image from the image to be processed; and removing the correlation among the R-channel image, the G-channel image and the B-channel image to obtain the multi-channel image.
- The method according to claim 23, characterized in that obtaining the R-channel image, the G-channel image and the B-channel image from the image to be processed comprises: performing an upsampling operation on the image to be processed to obtain the R-channel image, the G-channel image and the B-channel image.
- The method according to claim 23, characterized in that removing the correlation among the R-channel image, the G-channel image and the B-channel image to obtain the multi-channel image comprises: removing the correlation among the R-channel image, the G-channel image and the B-channel image according to a first conversion matrix to obtain the multi-channel image, wherein the first conversion matrix is used to remove the correlation among images.
- The method according to claim 25, characterized in that before removing the correlation among the R-channel image, the G-channel image and the B-channel image according to the first conversion matrix, the method further comprises obtaining the first conversion matrix in the following manner: acquiring an original input image, and obtaining an R-channel input image, a G-channel input image and a B-channel input image from the original input image; and obtaining the first conversion matrix from the R-channel input image, the G-channel input image and the B-channel input image based on principal component analysis.
- The method according to claim 26, characterized in that obtaining the first conversion matrix from the R-channel input image, the G-channel input image and the B-channel input image based on principal component analysis comprises: collecting a plurality of pixel vectors from each of the R-channel input image, the G-channel input image and the B-channel input image, and assembling the collected pixel vectors into a pixel-vector matrix; performing principal component analysis on the pixel-vector matrix to obtain three basis vectors; and obtaining the first conversion matrix from the three basis vectors.
- The method according to claim 22, characterized in that processing the multi-channel image through the preset neural network to generate the quality-enhanced image comprises: feeding the multi-channel image into the preset neural network as input to generate an output image; and processing the output image according to network parameters of the preset neural network to generate the quality-enhanced image.
- The method according to claim 28, characterized in that the multi-channel image comprises a luminance component and chrominance components, and the preset neural network comprises a multi-scale extraction network, the multi-scale extraction network comprising a first multi-scale extraction network and a second multi-scale extraction network, the first multi-scale extraction network being used to obtain multiple scale features of the luminance component, and the second multi-scale extraction network being used to obtain multiple scale features of the chrominance components.
- The method according to claim 29, characterized in that the chrominance components comprise a first chrominance component and a second chrominance component; the first multi-scale extraction network comprises a convolutional layer and/or a pooling layer; and the second multi-scale extraction network comprises a convolutional layer and/or a pooling layer.
- The method according to claim 29, characterized in that the preset neural network further comprises a plurality of sequentially connected concatenation layers; a concatenation layer is used to perform a channel concatenation operation on the luminance component and the chrominance components of the same scale; or, a concatenation layer is used to perform a channel concatenation operation on the luminance component and the chrominance components of the same scale together with the output image of the previous concatenation layer.
- The method according to claim 31, characterized in that adjacent concatenation layers are connected through a preset transposed convolution layer, the transposed convolution layer being used to apply a transposed convolution operation to the output image of the previous concatenation layer and output the result to the next concatenation layer.
- The method according to claim 28, characterized in that processing the output image according to the network parameters of the preset neural network to generate the quality-enhanced image comprises: converting the output image with a second conversion matrix to obtain a camera RGB image; converting the camera RGB image into a standard RGB image using camera metadata; and processing the standard RGB image according to the network parameters of the preset neural network to generate the quality-enhanced image.
- An image processing method, characterized in that the method comprises: obtaining a multi-channel training image from an original training image; and training a preset neural network with the multi-channel training image, wherein the preset neural network comprises a multi-scale extraction network used to obtain multiple scale features of the training image of each channel in the multi-channel training image.
- The method according to claim 34, characterized in that the method further comprises: acquiring the original training image and a target image corresponding to the original training image; and training the preset neural network with the multi-channel training image comprises: training the preset neural network with the multi-channel training image and the target image.
- The method according to claim 35, characterized in that the method further comprises: acquiring a plurality of original images, wherein different original images have the same or different exposures; selecting one of the plurality of original images as the original training image; and performing multi-image fusion processing on the plurality of original images to obtain the target image.
- The method according to claim 36, characterized in that the plurality of original images comprise a plurality of original images captured in a bracketing exposure mode.
- The method according to claim 36, characterized in that selecting one of the plurality of original images as the original training image comprises: obtaining exposure compensation information of each original image, and selecting one of the plurality of original images as the original training image according to the exposure compensation information of each original image.
- The method according to claim 36, characterized in that performing multi-image fusion processing on the plurality of original images to obtain the target image comprises: obtaining an RGB image corresponding to each of the plurality of original images, and performing multi-image fusion processing on the plurality of RGB images corresponding to the plurality of original images to obtain the target image.
- The method according to claim 39, characterized in that obtaining the RGB image corresponding to each of the plurality of original images comprises: for each of the plurality of original images, applying at least one of demosaicing, denoising, automatic white balance, image sharpening, color enhancement, color space conversion and pre-denoising to the original image to obtain the RGB image corresponding to the original image.
- The method according to claim 39, characterized in that performing multi-image fusion processing on the plurality of RGB images corresponding to the plurality of original images to obtain the target image comprises: performing registration processing and fusion processing on the plurality of RGB images corresponding to the plurality of original images to generate the target image.
- The method according to claim 39, characterized in that performing multi-image fusion processing on the plurality of RGB images corresponding to the plurality of original images to obtain the target image comprises: performing registration processing and fusion processing on the plurality of RGB images corresponding to the plurality of original images, and applying contrast enhancement and/or color space conversion to the processed image to generate the target image.
- The method according to claim 34, characterized in that the multi-channel training image comprises a luminance component and chrominance components, and the preset neural network comprises a multi-scale extraction network, the multi-scale extraction network comprising a first multi-scale extraction network and a second multi-scale extraction network, the first multi-scale extraction network being used to obtain multiple scale features of the luminance component, and the second multi-scale extraction network being used to obtain multiple scale features of the chrominance components.
- The method according to claim 43, characterized in that the chrominance components comprise a first chrominance component and a second chrominance component.
- The method according to claim 43, characterized in that the first multi-scale extraction network comprises a convolutional layer and/or a pooling layer; and the second multi-scale extraction network comprises a convolutional layer and/or a pooling layer.
- The method according to claim 43, characterized in that the preset neural network further comprises a plurality of sequentially connected concatenation layers; a concatenation layer is used to perform a channel concatenation operation on the luminance component and the chrominance components of the same scale; or, a concatenation layer is used to perform a channel concatenation operation on the luminance component and the chrominance components of the same scale together with the output image of the previous concatenation layer.
- The method according to claim 46, characterized in that adjacent concatenation layers are connected through a preset transposed convolution layer, the transposed convolution layer being used to apply a transposed convolution operation to the output image of the previous concatenation layer and output the result to the next concatenation layer.
- The method according to claim 35, characterized in that training the preset neural network with the multi-channel training image and the target image comprises: feeding the multi-channel training image into the preset neural network as input to generate an output image; and updating network parameters of the preset neural network according to the output image and the target image.
- The method according to claim 48, characterized in that updating the network parameters of the preset neural network according to the output image and the target image comprises: converting the output image with a second conversion matrix to obtain a camera RGB image; converting the camera RGB image into a standard RGB image using camera metadata; and updating the network parameters of the preset neural network according to the standard RGB image and the target image.
- An image processing method, characterized in that the method comprises: acquiring an image to be processed and a multi-channel image corresponding to the image to be processed; and processing the multi-channel image through a preset neural network to generate a quality-enhanced image, wherein the preset neural network comprises a multi-scale extraction network used to obtain multiple scale features of the image of each channel in the multi-channel image.
- The method according to claim 50, characterized in that processing the multi-channel image through the preset neural network to generate the quality-enhanced image comprises: feeding the multi-channel image into the preset neural network as input to generate an output image; and processing the output image according to network parameters of the preset neural network to generate the quality-enhanced image.
- The method according to claim 51, characterized in that the multi-channel image comprises a luminance component and chrominance components, and the preset neural network comprises a multi-scale extraction network, the multi-scale extraction network comprising a first multi-scale extraction network and a second multi-scale extraction network, the first multi-scale extraction network being used to obtain multiple scale features of the luminance component, and the second multi-scale extraction network being used to obtain multiple scale features of the chrominance components.
- The method according to claim 52, characterized in that the chrominance components comprise a first chrominance component and a second chrominance component; the first multi-scale extraction network comprises a convolutional layer and/or a pooling layer; and the second multi-scale extraction network comprises a convolutional layer and/or a pooling layer.
- The method according to claim 52, characterized in that the preset neural network further comprises a plurality of sequentially connected concatenation layers; a concatenation layer is used to perform a channel concatenation operation on the luminance component and the chrominance components of the same scale; or, a concatenation layer is used to perform a channel concatenation operation on the luminance component and the chrominance components of the same scale together with the output image of the previous concatenation layer.
- The method according to claim 53, characterized in that adjacent concatenation layers are connected through a preset transposed convolution layer, the transposed convolution layer being used to apply a transposed convolution operation to the output image of the previous concatenation layer and output the result to the next concatenation layer.
- The method according to claim 51, characterized in that processing the output image according to the network parameters of the preset neural network to generate the quality-enhanced image comprises: converting the output image with a second conversion matrix to obtain a camera RGB image; converting the camera RGB image into a standard RGB image using camera metadata; and processing the standard RGB image according to the network parameters of the preset neural network to generate the quality-enhanced image.
- An image processing apparatus, characterized in that the apparatus comprises a processor and a memory; the memory is used to store computer instructions executable by the processor; and the processor is used to read the computer instructions from the memory to implement: the method according to any one of claims 1-21; or the method according to any one of claims 22-33; or the method according to any one of claims 34-49; or the method according to any one of claims 50-56.
- A movable platform, characterized by comprising: a body; a power system arranged on the body and used to supply power to the movable platform; a sensing system used to capture image frames; and the image processing apparatus according to claim 57, used to process the image frames captured by the sensing system.
- A camera, characterized by comprising: a housing; a lens assembly mounted inside the housing; a sensor assembly mounted inside the housing and used to sense light passing through the lens assembly and generate an electrical signal; and the image processing apparatus according to claim 57.
- A machine-readable storage medium, characterized in that the machine-readable storage medium stores a number of computer instructions which, when executed, implement: the method according to any one of claims 1-21; or the method according to any one of claims 22-33; or the method according to any one of claims 34-49; or the method according to any one of claims 50-56.
Priority Applications (2)
- PCT/CN2019/108039, filed 2019-09-26: WO2021056304A1 — Image processing method and apparatus, movable platform, and machine-readable storage medium
- CN201980033198.4A, filed 2019-09-26: CN112166455A — Image processing method and apparatus, movable platform, and machine-readable storage medium
Publications (1)
- WO2021056304A1, published 2021-04-01
Family ID: 73859374
Cited By (5)
- CN114037632A (published 2022-02-11, Hefei University of Technology): Multi-scale residual attention image dehazing method based on the Lab color space
- CN114862737A (published 2022-08-05, Dalian Maritime University): Underwater image enhancement method based on image reflection components and a deep learning model
- CN114913085A (published 2022-08-16, Fuzhou University): Dual-path convolutional low-light image enhancement method based on grayscale boosting
- CN115316982A (published 2022-11-11, Shenyang Institute of Automation, Chinese Academy of Sciences): Intelligent muscle deformation detection system and method based on multi-modal sensing
- WO2023246392A1 (published 2023-12-28, BOE Technology Group): Image acquisition method, apparatus, device, and non-transitory computer storage medium
Families Citing this family (3)
- CN113139914B (published 2023-02-17, Harbin Institute of Technology, Shenzhen): Image denoising method and apparatus, electronic device, and storage medium
- CN113781326A (published 2021-12-10, Beijing Megvii Technology): Demosaicing method and apparatus, electronic device, and storage medium
- CN113808043A (published 2021-12-17, Beijing Zhuohe Technology): Camera imaging method, apparatus, medium, and device
Patent Citations (5)
- WO2018233708A1 (published 2018-12-27, Huawei Technologies): Image salient object detection method and apparatus
- CN108513672A (published 2018-09-07, SZ DJI Technology): Method, device, and storage medium for enhancing image contrast
- CN108108768A (published 2018-06-01, Tsinghua University): Convolutional-neural-network-based photovoltaic glass defect classification method and apparatus
- CN110096981A (published 2019-08-06, Changsha Qianshitong Intelligent Technology): Deep-learning-based video big data traffic scene analysis method
- CN110163855A (published 2019-08-23, Wuhan University): Color image quality evaluation method based on multi-path deep convolutional neural networks
Also Published As
- CN112166455A, published 2021-01-01
Legal Events
- 121: The EPO has been informed by WIPO that EP was designated in this application (ref document: 19947384; country: EP; kind code: A1)
- NENP: Non-entry into the national phase (country: DE)
- 122: PCT application non-entry into European phase (ref document: 19947384; country: EP; kind code: A1)