WO2021017811A1 - Image processing method and apparatus, electronic device, and computer-readable storage medium - Google Patents
Image processing method and apparatus, electronic device, and computer-readable storage medium
- Publication number
- WO2021017811A1 (PCT/CN2020/101817)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- subject
- resolution
- processed
- target
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T3/4076—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution using the original low-resolution images to iteratively correct the high-resolution images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration using local operators
- G06T5/30—Erosion or dilatation, e.g. thinning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
Definitions
- This application relates to the field of imaging, in particular to an image processing method, device, electronic equipment, and computer-readable storage medium.
- the goal of super-resolution reconstruction technology is to reconstruct high-resolution images from low-resolution images to make the reconstructed images clearer. Through super-resolution reconstruction, some low-resolution images can be reconstructed to achieve the desired effect of users.
- the traditional super-resolution reconstruction technology generally performs a unified super-resolution reconstruction process for the entire image, and the reconstructed image has no difference in each area, and cannot take into account the details of the image.
- an image processing method, apparatus, electronic device, and computer-readable storage medium are provided.
- An image processing method, including:
- acquiring an image to be processed at a first resolution;
- identifying the target subject in the image to be processed, and obtaining a foreground image of the target subject and a background image;
- performing super-resolution reconstruction on the foreground image of the target subject and on the background image respectively; and
- fusing the reconstructed foreground image of the target subject with the reconstructed background image to obtain a target image, the resolution of the target image being greater than the first resolution.
- An image processing device including:
- An acquisition module for acquiring a to-be-processed image of the first resolution
- the recognition module is used to recognize the target subject in the image to be processed, and obtain a foreground image and a background image of the target subject;
- a reconstruction module for performing super-resolution reconstruction on the target subject foreground image and the background image respectively;
- the fusion module is used for fusing the reconstructed foreground image and background image of the target subject to obtain a target image, the resolution of the target image is greater than the first resolution.
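The acquire → identify → reconstruct → fuse flow performed by these modules can be sketched end to end. In this minimal numpy sketch the brightness-threshold mask and nearest-neighbour upscaling are hypothetical stand-ins for the subject detection model and the learned super-resolution reconstruction, which the patent does not reduce to code:

```python
import numpy as np

def identify_subject(img):
    # Hypothetical stand-in for the subject detection model: mark the
    # brighter-than-average region as the "target subject" (binary mask).
    return (img > img.mean()).astype(float)

def upscale_nearest(img, factor):
    # Placeholder super-resolution step: nearest-neighbour upscaling.
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def process(img, factor=2):
    mask = identify_subject(img)
    fg, bg = img * mask, img * (1.0 - mask)      # split foreground/background
    fg_hr = upscale_nearest(fg, factor)          # "reconstruct" foreground
    bg_hr = upscale_nearest(bg, factor)          # "reconstruct" background
    mask_hr = upscale_nearest(mask, factor)
    return fg_hr * mask_hr + bg_hr * (1.0 - mask_hr)  # fuse into target image

img = np.arange(16.0).reshape(4, 4) / 15.0
target = process(img)                            # target resolution > input
```

With these toy stand-ins the fused result simply equals the upscaled input, which is a sanity property of the sketch, not of the real method, where foreground and background receive different reconstruction.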
- An electronic device includes a memory and a processor, the memory storing a computer program.
- When the computer program is executed by the processor, the processor performs the operations of the image processing method, including the following step:
- the reconstructed foreground image of the target subject and the reconstructed background image are fused to obtain a target image, the resolution of the target image being greater than the first resolution.
- A computer-readable storage medium has a computer program stored thereon; when the computer program is executed by a processor, the following step is implemented:
- the reconstructed foreground image of the target subject and the reconstructed background image are fused to obtain a target image, the resolution of the target image being greater than the first resolution.
- with the above-mentioned image processing method and apparatus, electronic device, and computer-readable storage medium, the image to be processed at the first resolution is acquired, the target subject in the image to be processed is identified, and the foreground image of the target subject and the background image are obtained.
- super-resolution reconstruction is performed on the foreground image of the target subject and on the background image, and the reconstructed foreground image and background image are fused to obtain the target image.
- the resolution of the target image is greater than the first resolution, so the details of the image can be taken into account and the detail processing effect of image reconstruction is improved.
- Fig. 1 is a block diagram of the internal structure of an electronic device in an embodiment.
- FIG. 2 is a flowchart of an image processing method in an embodiment.
- Fig. 3 is an architecture diagram of an image reconstruction model in an embodiment.
- Figure 4 is a structural diagram of a cascade block in an embodiment.
- Fig. 5 is a structural diagram of a cascade block in another embodiment.
- Fig. 6 is a flowchart of super-resolution reconstruction of a background image in an embodiment.
- Fig. 7 is a flowchart of an image processing method applied to a video processing scene in an embodiment.
- Fig. 8 is a flowchart of identifying the target subject in the image to be processed in an embodiment.
- Fig. 9 is a flowchart of determining the target subject in the image to be processed according to the subject area confidence map in an embodiment.
- Fig. 10 is a schematic diagram of the effect of subject recognition on an image to be processed in an embodiment.
- Fig. 11 is a structural diagram of an image processing method in an embodiment.
- Fig. 12 is a structural block diagram of an image processing device in an embodiment.
- Fig. 13 is a schematic diagram of the internal structure of an electronic device in another embodiment.
- the image processing method in the embodiments of the present application can be applied to an electronic device.
- the electronic device may be a computer device with a camera, a personal digital assistant, a tablet computer, a smartphone, a wearable device, or the like.
- when the camera in the electronic device captures an image, it automatically focuses to ensure that the captured image is clear.
- the above electronic device may include an image processing circuit, which may be implemented by hardware and/or software components, and may include various processing units that define an ISP (Image Signal Processing, image signal processing) pipeline.
- Fig. 1 is a schematic diagram of an image processing circuit in an embodiment. As shown in FIG. 1, for ease of description, only various aspects of the image processing technology related to the embodiments of the present application are shown.
- the image processing circuit includes a first ISP processor 130, a second ISP processor 140, and a control logic 150.
- the first camera 110 includes one or more first lenses 112 and a first image sensor 114.
- the first image sensor 114 may include a color filter array (such as a Bayer filter).
- the first image sensor 114 may acquire the light intensity and wavelength information captured by each of its imaging pixels, and provide a set of raw image data that can be processed by the first ISP processor 130.
- the second camera 120 includes one or more second lenses 122 and a second image sensor 124.
- the second image sensor 124 may include a color filter array (such as a Bayer filter).
- the second image sensor 124 may acquire the light intensity and wavelength information captured by each of its imaging pixels, and provide a set of raw image data that can be processed by the second ISP processor 140.
- the first image collected by the first camera 110 is transmitted to the first ISP processor 130 for processing.
- the statistical data of the first image (such as image brightness, contrast, and color) are sent to the control logic 150, and the control logic 150 can determine the control parameters of the first camera 110 according to the statistical data, so that the first camera 110 can perform operations such as auto focus and auto exposure according to the control parameters.
- the first image may be stored in the image memory 160 after being processed by the first ISP processor 130, and the first ISP processor 130 may also read the image stored in the image memory 160 for processing.
- the first image can be directly sent to the display 170 for display after being processed by the first ISP processor 130, and the display 170 can also read the image in the image memory 160 for display.
- the first ISP processor 130 processes image data pixel by pixel in multiple formats.
- each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the first ISP processor 130 may perform one or more image processing operations on the image data and collect statistical information about the image data.
- the image processing operations can be performed with the same or different bit depth accuracy.
- the image memory 160 may be a part of a memory device, a storage device, or an independent dedicated memory in an electronic device, and may include DMA (Direct Memory Access) features.
- the first ISP processor 130 may perform one or more image processing operations, such as temporal filtering.
- the processed image data can be sent to the image memory 160 for additional processing before being displayed.
- the first ISP processor 130 receives the processed data from the image memory 160, and performs image data processing in the RGB and YCbCr color spaces on the processed data.
- the image data processed by the first ISP processor 130 may be output to the display 170 for viewing by the user and/or further processed by a graphics engine or a GPU (Graphics Processing Unit, graphics processor).
- the output of the first ISP processor 130 can also be sent to the image memory 160, and the display 170 can read image data from the image memory 160.
- the image memory 160 may be configured to implement one or more frame buffers.
- the statistical data determined by the first ISP processor 130 may be sent to the control logic 150.
- the statistical data may include statistical information of the first image sensor 114 such as auto exposure, auto white balance, auto focus, flicker detection, black level compensation, and shading correction of the first lens 112.
- the control logic 150 may include a processor and/or microcontroller that executes one or more routines (such as firmware), and the one or more routines can determine the control parameters of the first camera 110 and of the first ISP processor 130 based on the received statistical data.
- the control parameters of the first camera 110 may include gain, integration time of exposure control, anti-shake parameters, flash control parameters, first lens 112 control parameters (for example, focal length for focusing or zooming), or a combination of these parameters.
- the ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (for example, during RGB processing), and the first lens 112 shading correction parameters.
- the second image collected by the second camera 120 is transmitted to the second ISP processor 140 for processing.
- the statistical data of the second image (such as image brightness, contrast, and color) are sent to the control logic 150.
- the control logic 150 can determine the control parameters of the second camera 120 according to the statistical data, so that the second camera 120 can perform automatic focusing, automatic exposure and other operations according to the control parameters.
- the second image can be stored in the image memory 160 after being processed by the second ISP processor 140, and the second ISP processor 140 can also read the image stored in the image memory 160 for processing.
- the second image can be directly sent to the display 170 for display after being processed by the second ISP processor 140, and the display 170 can also read the image in the image memory 160 for display.
- the second camera 120 and the second ISP processor 140 may also implement the processing procedures described above for the first camera 110 and the first ISP processor 130.
- the first camera 110 may be a color camera
- the second camera 120 may be a TOF (Time Of Flight) camera or a structured light camera.
- TOF camera can obtain TOF depth map
- structured light camera can obtain structured light depth map.
- the first camera 110 and the second camera 120 may both be color cameras, in which case a binocular depth map is obtained through the two color cameras.
- the first ISP processor 130 and the second ISP processor 140 may be the same ISP processor.
- the first camera 110 and the second camera 120 collect the same scene to obtain the to-be-processed image and the depth map at the first resolution, respectively, and send the to-be-processed image and the depth map at the first resolution to the ISP processor.
- the ISP processor can register the image to be processed at the first resolution with the depth map according to the camera calibration parameters so that their fields of view are fully aligned, and then generate a center weight map corresponding to the image to be processed at the first resolution.
- the weight values represented by the center weight map gradually decrease from the center to the edges; the image to be processed at the first resolution and the center weight map are input into the trained subject detection model to obtain a subject area confidence map, and the target subject in the image to be processed is determined according to the subject area confidence map.
- alternatively, the image to be processed at the first resolution, the depth map, and the center weight map can all be input into the trained subject detection model to obtain the subject area confidence map; the target subject in the image to be processed is then determined according to the subject area confidence map, and the foreground image of the target subject and the background image are obtained.
- the electronic device performs super-resolution reconstruction on the foreground image of the target subject and on the background image, and fuses the reconstructed foreground image and background image to obtain a target image.
- the resolution of the target image is greater than the first resolution, which improves the detail processing effect for the target subject and for image reconstruction as a whole.
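The center weight map described above, with weights decreasing from center to edge, can be generated for example with a 2D Gaussian. The patent does not specify the weighting function, so the exact shape below is an illustrative assumption:

```python
import numpy as np

# A plausible center weight map: highest at the image centre, decaying
# toward the edges (2D Gaussian over normalized coordinates; the actual
# weighting function used by the subject detection model is not specified).
def center_weight_map(h, w, sigma=0.5):
    ys = np.linspace(-1.0, 1.0, h)[:, None]   # vertical coordinate in [-1, 1]
    xs = np.linspace(-1.0, 1.0, w)[None, :]   # horizontal coordinate in [-1, 1]
    return np.exp(-(ys ** 2 + xs ** 2) / (2.0 * sigma ** 2))

wmap = center_weight_map(9, 9)   # weight 1.0 at the centre pixel
```

Feeding such a map alongside the image biases the subject detection model toward subjects near the frame centre, which matches typical photographic composition.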
- FIG. 2 is a flowchart of an image processing method in an embodiment.
- the image processing method in this embodiment is described by taking the terminal or server in FIG. 1 as an example.
- the image processing method includes:
- the first resolution refers to the image resolution.
- image resolution describes the amount of information stored in an image, commonly expressed as the number of pixels per inch.
- the image to be processed can be obtained by shooting any scene with a camera, and it can be a color image or a black and white image.
- the image to be processed may be stored locally by the electronic device, may also be stored by other devices, may also be stored on the network, or may be captured by the electronic device in real time, but is not limited to this.
- the ISP processor or central processing unit of the electronic device can obtain the image to be processed at the first resolution from a local or other device or the network, or use a camera to shoot a scene at the first resolution to obtain the image to be processed.
- Operation 204 Identify the target subject in the image to be processed, and obtain a foreground image and a background image of the target subject.
- the subject refers to various objects, such as people, flowers, cats, dogs, cows, blue sky, white clouds, background, etc.
- the target subject refers to the desired subject, which can be selected according to needs.
- Salient object detection refers to automatically processing the regions of interest when facing a scene while selectively ignoring the regions of no interest.
- the area of interest is called the body area.
- the target subject foreground image refers to the image of the target subject area in the image to be processed, and the background image refers to the image of the remaining area except the target subject area in the image to be processed.
- the electronic device may input the image to be processed into the subject detection model, identify the target subject in the image to be processed through the subject detection model, and segment the image to be processed into a foreground image and a background image of the target subject. Further, the segmented binarized mask map can be output through the subject detection model.
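Given the binarized mask map output by the subject detection model (1 for the target subject area, 0 elsewhere), splitting the image to be processed into the foreground image of the target subject and the background image is elementwise masking; a small numpy sketch with a toy mask:

```python
import numpy as np

# Toy image and a hypothetical binarized mask marking the subject region.
img = np.arange(16.0).reshape(4, 4)
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0             # assumed 2x2 subject area in the centre

foreground = img * mask          # target-subject foreground image
background = img * (1.0 - mask)  # remaining area of the image to be processed
```

The two parts partition the image exactly: adding them back together recovers the original, which is what allows the fused result to be assembled seamlessly after each part is reconstructed separately.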
- super-resolution reconstruction refers to the reconstruction of low-resolution images or image sequences to obtain high-resolution images.
- the target subject foreground image may be input into the image reconstruction model.
- the super-resolution reconstruction of the foreground image of the target subject is performed through the image reconstruction model, and the reconstructed high-resolution foreground image of the target subject is obtained.
- the resolution of the reconstructed foreground image of the target subject is greater than the first resolution.
- the electronic device can perform super-resolution reconstruction on the background image at the first resolution through a fast super-resolution algorithm or an interpolation algorithm, etc., to obtain a reconstructed high-resolution background image.
- the resolution of the reconstructed background image is greater than the first resolution.
- the resolution of the foreground image and the resolution of the background image of the reconstructed target subject may be the same resolution or different resolutions.
- the reconstructed foreground image and background image of the target subject are merged to obtain a target image, the resolution of the target image is greater than the first resolution.
- the electronic device performs fusion and splicing processing on the reconstructed foreground image of the target subject and the background image; the fused and spliced image is the target image.
- the resolution of the target image obtained after reconstruction is greater than the first resolution of the image to be processed.
- with the image processing method of this embodiment, the image to be processed at the first resolution is acquired, the target subject in it is recognized, and the foreground image of the target subject and the background image are obtained. Super-resolution reconstruction is performed on the foreground image and the background image respectively, so different super-resolution processing can be applied to each.
- the reconstructed foreground image of the target subject and the background image are fused to obtain the target image; because the resolution of the target image is greater than the first resolution, the details of the image can be taken into account and the detail processing effect of image reconstruction is improved.
- performing super-resolution reconstruction on the foreground image of the target subject includes: extracting features of the foreground image through an image reconstruction model to obtain a feature map, where the image reconstruction model is trained in advance on subject foreground image sample pairs, each pair including a subject foreground image at the first resolution and the same subject foreground image at a second resolution; and performing super-resolution processing on the feature map through the image reconstruction model to obtain a foreground image of the target subject at the second resolution, the second resolution being greater than the first resolution.
- the feature map refers to the image obtained by feature extraction of the image to be processed.
- the electronic device may collect a large number of subject foreground image sample pairs in advance; each pair includes a subject foreground image at the first resolution and the same subject foreground image at the second resolution.
- the first-resolution subject foreground image is input into the untrained image reconstruction model for super-resolution reconstruction; the model's output is compared with the second-resolution subject foreground image, and the model is adjusted according to the difference. Training and adjustment are repeated until the difference between the reconstructed foreground image and the second-resolution foreground image is below a threshold, at which point training stops.
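The train, compare, adjust, repeat loop described above can be illustrated with a toy stand-in: a single learnable gain plays the role of the image reconstruction model, and training stops once the difference to the second-resolution target drops below a threshold. The real model is a CNN; only the loop structure carries over:

```python
import numpy as np

def upscale(lr_img, gain):
    # Toy "model": nearest-neighbour 2x upscale scaled by one learnable gain.
    return gain * lr_img.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
lr_img = rng.random((4, 4))                                   # first resolution
hr_img = 1.5 * lr_img.repeat(2, axis=0).repeat(2, axis=1)     # second resolution

gain, step, threshold = 0.0, 0.5, 1e-6
for _ in range(1000):
    diff = upscale(lr_img, gain) - hr_img       # compare output with target
    loss = np.mean(diff ** 2)
    if loss < threshold:                        # stop when difference is small
        break
    grad = 2.0 * np.mean(diff * upscale(lr_img, 1.0))  # d(loss)/d(gain)
    gain -= step * grad                         # adjust the model
```

The loop converges to the gain that reproduces the high-resolution target, mirroring how the image reconstruction model's weights are adjusted until its output matches the second-resolution sample.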
- the electronic device inputs the foreground image of the target subject into the trained image reconstruction model, and the image reconstruction model can perform feature extraction on the foreground image of the target subject through a convolutional layer to obtain a feature map corresponding to the foreground image of the target subject.
- the channel information of the feature map is converted into spatial information through the image reconstruction model to obtain the target subject foreground image of the second resolution, the second resolution being greater than the first resolution.
- the image processing method in this embodiment extracts features of the foreground image of the target subject using a trained image reconstruction model to obtain a feature map, and performs super-resolution processing on the feature map through the model to obtain a foreground image of the target subject at the second resolution, the second resolution being greater than the first resolution. Local super-resolution reconstruction can thus be applied to the foreground image of the target subject, handling its details better and ensuring the clarity of the target subject.
- FIG. 3 is an architecture diagram of an image reconstruction model in an embodiment.
- the image reconstruction model includes a convolutional layer, a nonlinear mapping layer and an up-sampling layer.
- the residual unit (Residual) in the nonlinear mapping layer and the first convolution layer are sequentially cascaded to obtain a cascading block (CascadingBlock).
- the nonlinear mapping layer includes a plurality of cascading blocks; the cascading blocks and the second convolutional layer are sequentially cascaded to form the nonlinear mapping layer. That is, the connections shown by the arrows in Figure 3 are called global cascade connections.
- the nonlinear mapping layer is connected with the up-sampling layer, and the up-sampling layer converts the channel information of the image into spatial information, and outputs a high-resolution image.
- the electronic device inputs the first-resolution target subject foreground image into the convolutional layer of the image reconstruction model to perform feature extraction to obtain a feature map.
- the feature map is input to the nonlinear mapping layer of the image reconstruction model, where the first cascading block processes it to produce an output; the feature map output by the convolutional layer is spliced with the output of the first cascading block, and the spliced result is input to the first first convolutional layer for dimensionality reduction.
- the dimensionality-reduced feature map is then input to the second cascading block for processing, and the feature map output by the convolutional layer, the output of the first cascading block, and the output of the second cascading block are spliced together and input to the second first convolutional layer for dimensionality reduction.
- similarly, after the output of the Nth cascading block is obtained, the outputs of all cascading blocks before it and the feature map output by the convolutional layer are spliced, and the spliced result is input to the Nth first convolutional layer for dimensionality reduction, until the output of the last first convolutional layer in the nonlinear mapping layer is obtained.
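The cascade connection pattern (concatenate the current output with all earlier outputs along the channel axis, then reduce channels back with a first convolutional layer, here a 1×1 convolution) can be sketched with random weights in numpy. The residual-unit body below is a placeholder, not the patent's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # x: (C_in, H, W), w: (C_out, C_in); a 1x1 convolution is a
    # per-pixel matrix multiply over the channel axis.
    return np.tensordot(w, x, axes=([1], [0]))

def residual_unit(x):
    # Placeholder for the residual unit's body (identity + small nonlinearity).
    return x + 0.1 * np.tanh(x)

C, H, W = 4, 6, 6
x = rng.random((C, H, W))        # feature map from the convolutional layer
feats = [x]
for n in range(3):               # three cascaded stages
    out = residual_unit(feats[-1])
    cat = np.concatenate(feats + [out], axis=0)   # cascade connection: splice
    w = rng.random((C, cat.shape[0]))             # 1x1 conv weights (random)
    feats.append(conv1x1(cat, w))                 # dimensionality reduction to C
```

Each stage sees the spliced history of all earlier outputs, yet the 1×1 reduction keeps the channel count constant, which is the point of the cascade-plus-reduction design.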
- the first convolution layer in this embodiment may be a 1×1 point convolution.
- the residual feature map output by the nonlinear mapping layer is input to the up-sampling layer, which converts the channel information of the residual feature map into spatial information. For example, when the super-resolution magnification is ×4, the feature map input to the up-sampling layer must have 16×3 channels.
- converting the channel information into spatial information means that the final output of the up-sampling layer is a three-channel color image at 4 times the size.
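Converting channel information into spatial information in this way is the standard depth-to-space ("pixel shuffle") rearrangement used by sub-pixel up-sampling layers: with ×4 magnification, 16×3 = 48 input channels become a 3-channel image at 4× the size, matching the numbers above. The patent does not name the operation, so this is an assumed implementation:

```python
import numpy as np

def pixel_shuffle(x, r):
    # x: (C * r * r, H, W) -> (C, H * r, W * r)
    c, h, w = x.shape
    out_c = c // (r * r)
    x = x.reshape(out_c, r, r, h, w)   # split channels into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)     # interleave: (C, H, r, W, r)
    return x.reshape(out_c, h * r, w * r)

# 48-channel feature map -> three-channel image at 4x the spatial size.
feat = np.arange(48 * 5 * 5, dtype=float).reshape(48, 5, 5)
img = pixel_shuffle(feat, 4)
```

Each group of r×r channels supplies the r×r sub-pixel block of one output pixel, so no learned weights are needed in the rearrangement itself.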
- each cascading block includes three residual units and three first convolutional layers; the residual units and the first convolutional layers are sequentially cascaded.
- the residual units are connected by local cascade connections, whose function is the same as that of the global cascade connections.
- the feature map output by the convolutional layer is used as the input of the cascading block; the first residual unit processes it to produce an output, the feature map output by the convolutional layer and the output of the first residual unit are spliced, and the spliced result is input to the first first convolutional layer for dimensionality reduction.
- after the output of the Nth residual unit is obtained, the outputs of all residual units before it and the feature map output by the convolutional layer are spliced, and the spliced result is input to the Nth first convolutional layer for dimensionality reduction, until the output of the last first convolutional layer in the cascading block is obtained.
- the first convolutional layer in this embodiment refers to the first convolutional layer in a cascading block, and it may be a 1×1 point convolution.
- the 1×1 point convolution corresponding to each residual unit in FIG. 4 can be replaced with a combination of group convolution and 1×1 point convolution to reduce the number of parameters and the processing time.
- it can be understood that the number of cascading blocks and first convolutional layers in the image reconstruction model is not limited, nor is the number of residual units and first convolutional layers in each cascading block; they can be adjusted according to different needs.
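The motivation for replacing a dense convolution with a group convolution plus a 1×1 pointwise convolution is the reduced weight count. The channel and group numbers below are illustrative, not taken from the patent:

```python
def conv_params(c_in, c_out, k, groups=1):
    # Weight count of a KxK convolution: each output channel convolves
    # only c_in/groups input channels.
    return (c_in // groups) * c_out * k * k

c_in = c_out = 64          # example channel counts (assumed)
k, groups = 3, 4           # example kernel size and group count (assumed)

dense = conv_params(c_in, c_out, k)            # one dense 3x3 convolution
grouped = conv_params(c_in, c_out, k, groups)  # grouped 3x3 convolution
pointwise = conv_params(c_out, c_out, 1)       # 1x1 pointwise mixes groups
```

With these numbers the grouped-plus-pointwise pair uses roughly a third of the dense convolution's parameters while restoring cross-channel mixing through the 1×1 stage.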
- performing super-resolution reconstruction on the background image includes:
- Operation 602 Perform super-resolution reconstruction on the background image by using the interpolation algorithm to obtain a background image of a third resolution, where the third resolution is greater than the first resolution.
- interpolation algorithms include but are not limited to nearest neighbor interpolation, bilinear interpolation, and bicubic interpolation.
- the electronic device may perform super-resolution reconstruction on the background image at the first resolution by using at least one of the nearest neighbor, bilinear, and bicubic interpolation algorithms, to obtain a reconstructed background image at the third resolution, the third resolution being greater than the first resolution.
- the electronic device may also perform super-resolution reconstruction on the background image of the first resolution by using the fast super-resolution algorithm to obtain the reconstructed background image of the third resolution.
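A minimal interpolation-based background upscale; nearest-neighbour is shown because it is the simplest of the algorithms listed above, while bilinear or bicubic interpolation would replace the pixel replication with weighted averages of neighbouring pixels:

```python
import numpy as np

def upscale_nearest(img, factor):
    # Nearest-neighbour interpolation: replicate each pixel factor x factor times.
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

bg = np.array([[0.0, 1.0],
               [2.0, 3.0]])        # background image at the first resolution
bg_hr = upscale_nearest(bg, 3)     # 2x2 -> 6x6, the "third resolution"
```

Interpolation is much cheaper than the learned model used for the foreground, which is why the method can afford a simpler reconstruction for the less detail-critical background.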
- the reconstructed foreground image and background image of the target subject are merged to obtain the target image, including:
- the target subject foreground image of the second resolution and the background image of the third resolution are adjusted to corresponding sizes.
- the electronic device can determine the size of the target subject foreground image of the second resolution, and adjust the size of the background image of the third resolution according to it, so that the reconstructed target subject foreground image and the background image have the same size.
- the electronic device may also adjust the size of the reconstructed target subject foreground image according to the size of the reconstructed background image, so that the reconstructed target subject foreground image and the background image have the same size.
- the electronic device can adjust both the size of the reconstructed foreground image of the target subject and the size of the background image, so that the size of the reconstructed foreground image of the target subject and the background image reach the same target size.
- the resized second-resolution target subject foreground image and the third-resolution background image are merged to obtain a target image.
- image fusion refers to the process of applying image processing and computer technology to image data of the same scene collected from multiple source channels, so as to extract the favorable information in each channel as much as possible and synthesize a high-quality image.
- the electronic device may merge the resized target subject foreground image of the second resolution and the background image of the third resolution.
- the electronic device can process the reconstructed foreground image and background image of the target subject through the Poisson fusion algorithm, etc., to obtain the target image.
- the above-mentioned image processing method uses the interpolation algorithm to perform super-resolution reconstruction on the background image to obtain a third-resolution background image, and adjusts the second-resolution target subject foreground image and the third-resolution background image to corresponding sizes, so that images of different resolutions and sizes can be adjusted to the same size.
- the resized second-resolution target subject foreground image and the third-resolution background image are merged to obtain a complete reconstructed image, thereby obtaining the target image.
- the electronic device may pre-train the image reconstruction model based on the background image samples.
- the background sample pair contains two background images of the same content: one is a labeled high-resolution background image and the other is an unlabeled low-resolution background image. The unlabeled low-resolution background image is input into the untrained image reconstruction model for reconstruction processing, the reconstructed background image is compared with the labeled high-resolution background image to continuously adjust the parameters of the image reconstruction model, and the training is stopped when the threshold is met.
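The training loop just described can be sketched in miniature; here the "model" is a single scale factor fitted to a labeled pair by gradient descent, and every name, value, and threshold is an illustrative assumption rather than the patent's actual reconstruction network:

```python
import numpy as np

rng = np.random.default_rng(0)
low = rng.random((8, 8))       # unlabeled low-resolution sample (toy)
high = 2.0 * low               # labeled high-resolution counterpart (toy)

scale, lr = 0.0, 0.1           # one-parameter "model" and learning rate
for _ in range(200):
    pred = scale * low                   # "reconstruction" pass
    loss = ((pred - high) ** 2).mean()   # compare with the labeled image
    if loss < 1e-6:                      # stop training when the threshold is met
        break
    grad = 2 * ((pred - high) * low).mean()
    scale -= lr * grad                   # continuously adjust the parameter
assert abs(scale - 2.0) < 0.01
```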
- the electronic device can input the background image of the image to be processed into the trained image reconstruction model, and perform super-resolution reconstruction on the background image through the trained image reconstruction model to obtain the reconstructed background image.
- the resolution of the reconstructed background image is greater than the first resolution.
- the image processing method is applied to video processing; the image to be processed at the first resolution is each frame of image to be processed in the video at the first resolution.
- the image processing method is applied to video processing, and a low-resolution video image can be reconstructed into a high-resolution image through the image processing method.
- the electronic device may use the resolution of the video to be processed as the first resolution, and the to-be-processed image of the first resolution is each frame of the to-be-processed image in the video.
- the obtaining of the to-be-processed image of the first resolution includes:
- Operation 702 Obtain each frame of image to be processed in the video of the first resolution.
- the electronic device may obtain the video of the first resolution from a local or other device or the network, or may record the video through the electronic device.
- the electronic device can obtain each frame of the image to be processed in the video of the first resolution.
- the target subject in each frame of the image to be processed in the video is identified, and the foreground image and background image of the target subject in each frame of the image to be processed are obtained.
- the electronic device can input each frame of the to-be-processed image into the subject detection model, identify the target subject in each frame of the to-be-processed image through the subject detection model, and segment each frame of the to-be-processed image into a foreground image and a background image of the target subject. Further, the binary mask map corresponding to the segmentation of each frame of the image to be processed can be output through the subject detection model.
- super-resolution reconstruction is performed on the foreground image and the background image of the target subject in each frame of the image to be processed.
- after the electronic device obtains the foreground image and the background image of the target subject in each frame of the image to be processed through the subject detection model, it can input the foreground image of the target subject in each frame of the image to be processed into the image reconstruction model.
- the super-resolution reconstruction of the target subject foreground image in each frame of the to-be-processed image is performed by the image reconstruction model, and a high-resolution target subject foreground image after the reconstruction of the target subject foreground image of each frame of the image to be processed is obtained.
- the resolution of the reconstructed foreground image of the target subject is greater than the first resolution.
- the electronic device can perform super-resolution reconstruction on the background image in each frame of the to-be-processed image through a fast super-resolution algorithm or interpolation algorithm, etc., to obtain a reconstructed high-resolution background image of each frame of the to-be-processed image.
- the resolution of the reconstructed background image is greater than the first resolution.
- the resolution of the foreground image and the resolution of the background image of the reconstructed target subject may be the same resolution or different resolutions.
- the resolution of the foreground image of the target subject in each frame after reconstruction is the same, and the resolution of the background image of each frame after reconstruction is the same.
- the resolutions of the reconstructed target subject foreground image and the background image of each frame are the same resolution.
- the reconstructed foreground image and background image of the target subject corresponding to each frame of the image to be processed are merged to obtain each frame of target image.
- the electronic device may establish a mapping relationship between the image to be processed, the reconstructed foreground image of the target subject, and the background image. Then, the electronic device performs fusion splicing processing on the reconstructed foreground image and background image of the target subject with a mapping relationship to obtain each frame of target image. Similarly, the resolution of each frame of the target image obtained after reconstruction is greater than the first resolution of the corresponding frame to be processed.
- a target video is generated according to each frame of the target image, and the resolution of the target video is greater than the first resolution.
- the electronic device may combine each frame of the target image in the order of the corresponding frames to be processed to obtain a high-resolution video, that is, the target video.
- the resolution of the target video is greater than the first resolution; that is, the resolution of each frame of the target image in the target video is greater than the first resolution.
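The per-frame pipeline above can be sketched as follows; `upscale2x` stands in for both reconstruction branches and the segmentation mask is supplied directly, so every name here is an illustrative assumption rather than the patent's implementation:

```python
import numpy as np

def upscale2x(img):
    # Stand-in for either super-resolution branch: nearest-neighbor 2x.
    return img.repeat(2, axis=0).repeat(2, axis=1)

def process_frame(frame, mask):
    # mask: boolean subject mask at the first resolution.
    fg = upscale2x(np.where(mask, frame, 0))   # reconstruct the subject layer
    bg = upscale2x(np.where(mask, 0, frame))   # reconstruct the background layer
    big_mask = upscale2x(mask)
    return np.where(big_mask, fg, bg)          # fuse into the target frame

frames = [np.full((4, 4), i, dtype=np.uint8) for i in range(3)]
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                          # toy subject region
target_video = [process_frame(f, mask) for f in frames]
assert all(t.shape == (8, 8) for t in target_video)
```

Processing the frames in order and collecting the results is what yields a target video whose every frame exceeds the first resolution.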
- the above image processing method is applied to video processing scenes.
- identify the target subject in each frame of the to-be-processed image in the video and obtain the foreground image and background image of the target subject in each frame of the to-be-processed image.
- the foreground image and background image of the target subject in the frame to be processed are reconstructed by super resolution.
- the reconstructed foreground image and background image of the target subject corresponding to each frame of image to be processed are merged to obtain the target image of each frame.
- a target video is generated according to each frame of the target image; the resolution of the target video is greater than the first resolution, so the low-resolution video can be reconstructed into a high-resolution video.
- the identifying the target subject in the image to be processed includes:
- a center weight map corresponding to the image to be processed is generated, wherein the weight value represented by the center weight map gradually decreases from the center to the edge.
- the central weight map refers to a map used to record the weight value of each pixel in the image to be processed.
- the weight value recorded in the center weight map gradually decreases from the center to the four sides, that is, the center weight is the largest, and the weight gradually decreases toward the four sides.
- the center weight map indicates that the weight value gradually decreases from the center pixel of the image to be processed to the edge pixels of the image.
- the ISP processor or the central processor can generate a corresponding central weight map according to the size of the image to be processed.
- the weight value represented by the center weight map gradually decreases from the center to the four sides.
- the center weight map can be generated using a Gaussian function, a first-order equation, or a second-order equation.
- the Gaussian function may be a two-dimensional Gaussian function.
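For instance, a two-dimensional Gaussian centered on the image produces exactly such a weight map, largest at the center and decaying toward the four sides; the normalized coordinates and sigma value below are illustrative assumptions:

```python
import numpy as np

def center_weight_map(h, w, sigma=0.5):
    # 2-D Gaussian over normalized coordinates in [-1, 1]: the weight
    # is largest at the image center and decays toward the edges.
    ys = np.linspace(-1.0, 1.0, h)[:, None]
    xs = np.linspace(-1.0, 1.0, w)[None, :]
    return np.exp(-(ys ** 2 + xs ** 2) / (2 * sigma ** 2))

wm = center_weight_map(5, 5)
assert wm[2, 2] == wm.max()   # the center carries the largest weight
assert wm[0, 0] == wm.min()   # the corners carry the smallest
```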
- Operation 804 Input the to-be-processed image and the center weight map into the subject detection model to obtain a confidence map of the subject area, where the subject detection model is a model obtained by training in advance based on the to-be-processed image, the center weight map, and the corresponding labeled subject mask map of the same scene.
- the subject detection model is obtained by pre-collecting a large amount of training data, and inputting the training data into the subject detection model containing the initial network weight for training.
- Each set of training data includes the image to be processed corresponding to the same scene, the center weight map and the labeled subject mask map.
- the image to be processed and the center weight map are used as the input of the trained subject detection model, and the labeled subject mask map is used as the ground truth that the trained subject detection model expects to output.
- the subject mask map is an image filter template used to identify the subject in the image, which can block other parts of the image and filter out the subject in the image.
- the subject detection model can be trained to recognize and detect various subjects, such as people, flowers, cats, dogs, backgrounds, etc.
- the ISP processor or the central processor can input the to-be-processed image and the center weight map into the subject detection model, and the subject area confidence map can be obtained by performing the detection.
- the subject area confidence map is used to record the probability that each pixel belongs to a recognizable subject. For example, the probability that a certain pixel belongs to a person is 0.8, to a flower is 0.1, and to the background is 0.1.
- Operation 806 Determine a target subject in the image to be processed according to the subject region confidence map.
- the ISP processor or the central processing unit can select the subject with the highest or second highest confidence as the subject in the image to be processed according to the subject area confidence map. If there is one subject, that subject is taken as the target subject; if there are multiple subjects, one or more of them can be selected as the target subject as needed.
- the image to be processed is obtained, and after the center weight map corresponding to the image to be processed is generated, the image to be processed and the center weight map are input into the corresponding subject detection model for detection to obtain the subject area confidence map; according to the confidence map of the subject area, the target subject in the image to be processed can be determined.
- the center weight map can make the object in the center of the image easier to be detected.
- the subject detection model trained using the image to be processed, the center weight map, and the subject mask map can more accurately identify the target subject in the image to be processed.
- the determining the target subject in the image to be processed according to the subject region confidence map includes:
- the subject region confidence map is processed to obtain a subject mask map.
- the subject region confidence map can be filtered by the ISP processor or the central processing unit to obtain the subject mask map.
- the filtering process may configure a confidence threshold and filter out the pixels in the subject area confidence map whose confidence values are lower than the confidence threshold.
- the confidence threshold may be an adaptive confidence threshold, a fixed threshold, or a corresponding threshold configured by region.
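A minimal sketch of this filtering step, with the map's mean standing in for an adaptive choice; both thresholds here are illustrative assumptions, not the patent's:

```python
import numpy as np

def binarize_confidence(conf, threshold=None):
    # Filter out pixels whose confidence is below the threshold; when no
    # fixed threshold is given, adapt it to the map's mean (an assumption).
    if threshold is None:
        threshold = conf.mean()
    return (conf >= threshold).astype(np.uint8)

conf = np.array([[0.9, 0.8, 0.1],
                 [0.7, 0.6, 0.1],
                 [0.1, 0.1, 0.1]])   # toy subject area confidence map
mask = binarize_confidence(conf, threshold=0.5)
assert mask.tolist() == [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
```

A per-region variant would simply apply `binarize_confidence` to each configured region with its own threshold.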
- the image to be processed is detected, and a highlight area in the image to be processed is determined.
- the highlight area refers to an area where the brightness value is greater than the brightness threshold.
- the ISP processor or the central processing unit performs highlight detection on the image to be processed, selects target pixels with a brightness value greater than the brightness threshold, and applies connected domain processing to the target pixels to obtain the highlight area.
- Operation 906 Determine a target subject with the highlight eliminated in the image to be processed according to the highlight area in the image to be processed and the subject mask map.
- the ISP processor or the central processing unit can perform a difference calculation or a logical AND calculation between the highlight area in the image to be processed and the subject mask map to obtain the target subject for eliminating the highlight in the image to be processed.
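The difference between the subject mask and the highlight area can be sketched on a toy single-channel image; the brightness threshold of 200 and all array values are illustrative assumptions:

```python
import numpy as np

luma = np.array([[ 50, 250,  60],
                 [ 40,  70,  80],
                 [ 30,  20,  10]], dtype=np.uint8)   # toy brightness values
subject_mask = np.array([[1, 1, 1],
                         [1, 1, 1],
                         [0, 0, 0]], dtype=bool)      # toy subject mask map
highlight = luma > 200                      # brightness threshold (assumed: 200)
target_subject = subject_mask & ~highlight  # difference: drop highlight pixels
assert target_subject.sum() == 5            # one over-bright pixel removed
```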
- the subject area confidence map is filtered to obtain the subject mask map, which improves the reliability of the subject area confidence map.
- the image to be processed is detected to obtain the highlight area, which is then processed together with the subject mask map to obtain the target subject with the highlight eliminated; the highlights and bright areas that affect the precision of subject recognition are processed separately with a filter, which improves the precision and accuracy of subject recognition.
- processing the subject region confidence map to obtain a subject mask map includes: performing adaptive confidence threshold filtering processing on the subject region confidence map to obtain a binarized mask map.
- the binarized mask image includes a main body area and a background area; the binarized mask image is subjected to morphological processing and guided filtering processing to obtain the main body mask image.
- the ISP processor or the central processing unit filters the confidence map of the subject area according to the adaptive confidence threshold, then represents the confidence values of the retained pixels with 1 and the confidence values of the removed pixels with 0 to obtain the binarized mask map.
- Morphological processing can include erosion and dilation. The erosion operation can be performed on the binarized mask map first, and then the dilation operation is performed to remove noise; guided filtering is then conducted on the morphologically processed binarized mask map to realize the edge filtering operation and obtain the edge-extracted main body mask map.
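A self-contained sketch of the erosion-then-dilation step (a morphological opening) on a binary mask; the 3×3 structuring element is an illustrative assumption:

```python
import numpy as np

def erode(mask):
    # 3x3 erosion: a pixel survives only if its whole neighborhood is set.
    p = np.pad(mask, 1)
    h, w = mask.shape
    out = np.ones_like(mask)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out &= p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return out

def dilate(mask):
    # 3x3 dilation: a pixel is set if any neighbor is set.
    p = np.pad(mask, 1)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return out

noisy = np.zeros((7, 7), dtype=bool)
noisy[2:5, 2:5] = True                      # the subject blob
noisy[0, 0] = True                          # an isolated speck of noise
cleaned = dilate(erode(noisy))              # opening: erosion removes the speck,
assert not cleaned[0, 0] and cleaned[3, 3]  # dilation restores the blob
```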
- the binarized mask image includes a subject area and a background area
- fusing the reconstructed target subject foreground image and the background image to obtain the target image includes: fusing the reconstructed target subject foreground image with the main body region in the binarized mask image, and fusing the reconstructed background image with the background region in the binarized mask image, to obtain a target image.
- the binarization mask image includes a main body area and a background area.
- the main body area may be white
- the background area may be black.
- the electronic device merges the reconstructed target subject foreground image with the main body area in the binarized mask image, that is, with the white part, and merges the reconstructed background image with the background area in the binarized mask image, that is, with the black part, to obtain the target image.
- the method further includes: acquiring a depth map corresponding to the image to be processed, where the depth map includes at least one of a TOF depth map, a binocular depth map, and a structured light depth map; and performing registration processing on the image to be processed and the depth map to obtain the registered image to be processed and depth map of the same scene.
- the depth map refers to a map containing depth information.
- the corresponding depth map is obtained by shooting the same scene with a depth camera or binocular camera.
- the depth camera may be a structured light camera or a TOF camera.
- the depth map may be at least one of a structured light depth map, a TOF depth map, and a binocular depth map.
- the electronic device can use the ISP processor or the central processing unit to shoot the same scene through the camera to obtain the image to be processed and the corresponding depth map, and then use the camera calibration parameters to register the image to be processed with the depth map to obtain the registered image to be processed and depth map.
- when the depth map cannot be obtained by shooting, a simulated depth map can be automatically generated.
- the depth value of each pixel in the simulated depth map can be a preset value.
- the depth value of each pixel in the simulated depth map may correspond to different preset values.
- the inputting of the image to be processed and the center weight map into the subject detection model to obtain the confidence map of the subject region includes: inputting the registered image to be processed, the depth map, and the center weight map into the subject detection model to obtain the confidence map of the subject region, where the subject detection model is a model trained in advance based on the image to be processed, the depth map, the center weight map, and the corresponding labeled subject mask map of the same scene.
- the subject detection model is obtained by pre-collecting a large amount of training data, and inputting the training data into the subject detection model containing the initial network weight for training.
- Each set of training data includes the image to be processed corresponding to the same scene, the depth map, the center weight map and the labeled subject mask map.
- the image to be processed and the center weight map are used as the input of the trained subject detection model, and the labeled subject mask map is used as the true value that the trained subject detection model expects to output.
- the subject mask map is an image filter template used to identify the subject in the image, which can block other parts of the image and filter out the subject in the image.
- the subject detection model can be trained to recognize and detect various subjects, such as people, flowers, cats, dogs, backgrounds, etc.
- the depth map and the center weight map are used as the input of the subject detection model.
- the depth information of the depth map can be used to make objects closer to the camera easier to be detected.
- the center weight map is used so that the weight at the center is large and the weights at the four sides are small.
- the central attention mechanism makes it easier to detect the object in the center of the image.
- the introduction of a depth map enhances the depth feature of the subject, and the introduction of a central weight map enhances the central attention feature of the subject; not only can the target subject be accurately identified in a simple scene, but the accuracy of subject recognition in complex scenes is also greatly improved.
- the introduction of depth maps can solve the problem of poor robustness of traditional target detection methods to the ever-changing targets of natural images.
- a simple scene refers to a scene with a single subject and low contrast in the background area.
- Fig. 10 is a schematic diagram of the effect of subject recognition on an image to be processed in an embodiment.
- the image to be processed is an RGB image 1002, and there is a butterfly in the RGB image 1002.
- after subject detection is performed on the RGB image 1002, the subject area confidence map 1004 is obtained; the subject area confidence map 1004 is then filtered and binarized to obtain a binarized mask map 1006, and morphological processing and guided filtering are performed on the binarized mask map 1006 to achieve edge enhancement and obtain the main body mask map 1008.
- an image processing method including:
- Operation (a1) is to obtain the image to be processed at the first resolution.
- Operation (a2) is to generate a center weight map corresponding to the image to be processed, wherein the weight value represented by the center weight map gradually decreases from the center to the edge.
- Operation (a3) Input the to-be-processed image and the center weight map into the subject detection model to obtain the confidence map of the subject area, where the subject detection model is a model obtained by training in advance based on the to-be-processed image, the center weight map, and the corresponding labeled subject mask map of the same scene.
- Operation (a4) Perform an adaptive confidence threshold filtering process on the confidence map of the main body area to obtain a binarized mask image, which includes the main body area and the background area.
- Operation (a5) is to perform morphological processing and guided filtering processing on the binarized mask image to obtain the main body mask image.
- Operation (a6) is to detect the image to be processed, and determine the highlight area in the image to be processed.
- Operation (a7) according to the highlight area in the image to be processed and the subject mask map, determine the target subject to eliminate the highlight in the image to be processed, and obtain the target subject foreground image and background image.
- Operation (a8) extract the features of the target subject foreground map through the image reconstruction model, and obtain the feature map.
- the image reconstruction model is a model obtained by training in advance on subject foreground map sample pairs, where a subject foreground map sample pair includes a subject foreground image of the first resolution and a subject foreground image of the second resolution.
- super-resolution processing is performed on the feature map through the image reconstruction model to obtain a target subject foreground image with a second resolution, the second resolution being greater than the first resolution.
- super-resolution reconstruction is performed on the background image through the interpolation algorithm to obtain a background image with a third resolution, the third resolution being greater than the first resolution.
- the target subject foreground image of the second resolution and the background image of the third resolution are adjusted to corresponding sizes.
- the resized second-resolution target subject foreground image is fused with the main body area in the binarized mask image, and the resized third-resolution background image is fused with the background area in the binarized mask image, to obtain the target image.
- subject recognition is performed on the image to be processed with the first resolution through the subject detection model, and the foreground image and background image of the target subject can be quickly and accurately obtained.
- the super-resolution reconstruction of the foreground image of the target subject through the image reconstruction model can better process the details of the foreground image of the target subject, and make the details of the reconstructed target subject foreground image clearer.
- the super-resolution reconstruction of the background image is carried out through the interpolation algorithm, and the speed of the super-resolution reconstruction is taken into account while ensuring the clarity of the foreground image of the target subject.
- the reconstructed foreground image and background image of the target subject with different resolutions are adjusted to the same size, and merged with the corresponding regions in the binarized mask image to obtain the target image.
- FIG. 11 is a schematic structural diagram of an image processing method in an embodiment.
- the electronic device inputs the to-be-processed image of the first resolution into the subject detection model to obtain a foreground image and a background image of the target subject.
- the image reconstruction model composed of the cascaded residual network is used to perform super-resolution reconstruction of the foreground image of the target subject, and the background image is super-resolution reconstruction through the interpolation algorithm.
- the reconstructed foreground image and background image of the target subject are fused to obtain a target image, and the resolution of the target image is greater than the first resolution.
- Fig. 12 is a structural block diagram of an image processing apparatus according to an embodiment. As shown in FIG. 12, it includes: an acquisition module 1202, an identification module 1204, a reconstruction module 1206, and a fusion module 1208.
- the obtaining module 1202 is used to obtain the image to be processed at the first resolution.
- the recognition module 1204 is used to recognize the target subject in the image to be processed to obtain the foreground image and background image of the target subject.
- the reconstruction module 1206 is configured to perform super-resolution reconstruction of the target subject foreground image and the background image respectively.
- the fusion module 1208 is used for fusing the reconstructed foreground image and background image of the target subject to obtain a target image, the resolution of the target image is greater than the first resolution.
- the above-mentioned image processing device obtains a to-be-processed image of the first resolution and recognizes the target subject in the to-be-processed image to obtain a foreground image and a background image of the target subject; performs super-resolution reconstruction on the foreground image and the background image of the target subject respectively, applying different super-resolution processing to each; and fuses the reconstructed foreground image and background image of the target subject to obtain the target image.
- the resolution of the target image is greater than the first resolution, so that the details of the image can be taken into account and the detail processing effect of image reconstruction is improved.
- the reconstruction module 1206 is further used to: extract the features of the foreground image of the target subject through an image reconstruction model to obtain a feature map.
- the image reconstruction model is a model obtained by training based on the subject foreground image sample pair in advance.
- the foreground image sample pair includes the subject foreground image of the first resolution and the subject foreground image of the second resolution; the feature map is super-resolution processed through the image reconstruction model to obtain the target subject foreground image of the second resolution, The second resolution is greater than the first resolution.
- the above-mentioned image processing device extracts the features of the foreground image of the target subject by using the trained image reconstruction model to obtain a feature map, and performs super-resolution processing on the feature map through the image reconstruction model to obtain the target subject foreground image of the second resolution, where the second resolution is greater than the first resolution. Local super-resolution reconstruction processing can thus be performed on the foreground image of the target subject, and the details of the foreground image can be better processed, thereby ensuring the clarity of the target subject.
- the reconstruction module 1206 is further configured to: perform super-resolution reconstruction on the background image through the interpolation algorithm to obtain a background image with a third resolution, where the third resolution is greater than the first resolution;
- the fusion module 1208 is also used to: adjust the target subject foreground image of the second resolution and the background image of the third resolution to corresponding sizes; and fuse the resized target subject foreground image of the second resolution with the background image of the third resolution to obtain the target image.
- the image processing device in this embodiment uses the interpolation algorithm to perform super-resolution reconstruction on the background image to obtain a third-resolution background image, and adjusts the second-resolution target subject foreground image and the third-resolution background image to corresponding sizes, so that images of different resolutions and sizes can be adjusted to the same size.
- the resized second-resolution target subject foreground image and the third-resolution background image are merged to obtain a complete reconstructed image, thereby obtaining the target image.
- the image processing method is applied to video processing; the image to be processed at the first resolution is each frame of image to be processed in the video at the first resolution;
- the obtaining module 1202 is further configured to obtain each frame of image to be processed in the video of the first resolution.
- the identification module 1204 is also used to identify the target subject in each frame of the image to be processed in the video, and obtain the foreground image and background image of the target subject in each frame of the image to be processed.
- the reconstruction module 1206 is also used to perform super-resolution reconstruction on the foreground image and background image of the target subject in each frame of the image to be processed.
- the fusion module 1208 is also used to: fuse the reconstructed target subject foreground image and background image corresponding to each frame of the image to be processed to obtain a target image of each frame; and generate a target video according to each frame of the target image, where the resolution of the target video is greater than the first resolution.
- the above-mentioned image processing device is applied to video processing scenes.
- the target subject in each frame of image to be processed in the video is identified, yielding the target subject foreground image and background image of each frame.
- the target subject foreground image and background image of each frame are reconstructed by super-resolution.
- the reconstructed target subject foreground image and background image corresponding to each frame are fused to obtain the target image of each frame.
- a target video is generated from the target images of all frames; the resolution of the target video is greater than the first resolution, so a low-resolution video can be reconstructed into a high-resolution video.
- the recognition module 1204 is further configured to: generate a center weight map corresponding to the image to be processed, in which the represented weight values decrease gradually from the center to the edges;
- input the image to be processed and the center weight map into a subject detection model to obtain a subject-region confidence map, the subject detection model being a model trained in advance on images to be processed, center weight maps, and corresponding labeled subject mask images of the same scene; and determine the target subject in the image to be processed according to the subject-region confidence map.
- the image processing device in this embodiment obtains the image to be processed, generates the corresponding center weight map, and inputs both into the corresponding subject detection model for detection to obtain the subject-region confidence map, from which the target subject in the image to be processed can be determined. Using the center weight map makes objects at the center of the image easier to detect, and using the subject detection model trained on images to be processed, center weight maps, and subject mask images makes it possible to identify the target subject in the image more accurately.
- the recognition module 1204 is further configured to: process the subject-region confidence map to obtain a subject mask image; detect the image to be processed to determine the highlight region in it; and determine, according to the highlight region in the image to be processed and the subject mask image, the target subject with highlights eliminated in the image to be processed.
- the subject-region confidence map is filtered to obtain the subject mask image, which improves the reliability of the confidence map.
- the image to be processed is detected to obtain the highlight region, which is then processed together with the subject mask image.
- the target subject with highlights eliminated is obtained; the highlight and high-brightness regions that affect subject-recognition precision are handled separately with a filter, which improves the precision and accuracy of subject recognition.
- the recognition module 1204 is further configured to: perform adaptive confidence threshold filtering on the subject-region confidence map to obtain a binarized mask image that includes a subject region and a background region; and perform morphological processing and guided filtering on the binarized mask image to obtain the subject mask image;
- the fusion module 1208 is further configured to: fuse the reconstructed target subject foreground image with the subject region in the binarized mask image, and fuse the reconstructed background image with the background region in the binarized mask image, to obtain the target image.
- the obtaining module 1202 is further configured to: obtain a depth map corresponding to the image to be processed, the depth map including at least one of a TOF depth map, a binocular depth map, and a structured light depth map; and perform registration processing on the image to be processed and the depth map to obtain the registered image to be processed and depth map of the same scene.
- the recognition module 1204 is further configured to: input the registered image to be processed, the depth map, and the center weight map into the subject detection model to obtain the subject-region confidence map; wherein the subject detection model is a model trained in advance on images to be processed, depth maps, center weight maps, and corresponding labeled subject mask images of the same scene.
- the depth map and the center weight map are used as inputs of the subject detection model.
- the depth information of the depth map can make objects closer to the camera easier to detect.
- the center weight map has a large weight at the center and small weights at the four edges.
- this central attention mechanism makes objects at the center of the image easier to detect.
- introducing a depth map to enhance the depth features of the subject and a center weight map to enhance the central attention features of the subject not only allows accurate identification of the target subject in simple scenes, but also greatly improves the accuracy of subject recognition in complex scenes.
- introducing depth maps can solve the problem that traditional object detection methods are not robust to the ever-changing objects of natural images.
- a simple scene refers to a scene with a single subject and low contrast in the background region.
- the image processing apparatus may be divided into different modules as required to complete all or part of the functions of the above-mentioned image processing apparatus.
- FIG. 13 is a schematic diagram of the internal structure of an electronic device in an embodiment.
- the electronic device includes a processor and a memory connected through a system bus.
- the processor is used to provide calculation and control capabilities to support the operation of the entire electronic device.
- the memory may include a non-volatile storage medium and internal memory.
- the non-volatile storage medium stores an operating system and a computer program.
- the computer program can be executed by a processor to implement an image processing method provided in the following embodiments.
- the internal memory provides a cached runtime environment for the operating system and computer programs in the non-volatile storage medium.
- the electronic device can be a mobile phone, a tablet computer, a personal digital assistant, or a wearable device.
- each module in the image processing apparatus provided in the embodiment of the present application may be in the form of a computer program.
- the computer program can be run on a terminal or server.
- the program module composed of the computer program can be stored in the memory of the terminal or server.
- the embodiment of the present application also provides a computer-readable storage medium.
- a computer program product containing instructions that, when run on a computer, cause the computer to execute an image processing method.
- Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory may include random access memory (RAM), which acts as external cache memory.
- RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
An image processing method, comprising: obtaining an image to be processed at a first resolution; identifying a target subject in the image to be processed to obtain a target subject foreground image and a background image; performing super-resolution reconstruction on the target subject foreground image and the background image separately; and fusing the reconstructed target subject foreground image and background image to obtain a target image, the resolution of the target image being greater than the first resolution.
Description
Cross-reference to related applications
This application claims priority to Chinese patent application No. 2019106834921, entitled "Image processing method and apparatus, electronic device, computer-readable storage medium", filed with the Chinese Patent Office on July 26, 2019, the entire contents of which are incorporated herein by reference.
This application relates to the field of imaging, and in particular to an image processing method, apparatus, electronic device, and computer-readable storage medium.
The goal of super-resolution reconstruction is to reconstruct a high-resolution image from a low-resolution image, so that the reconstructed image is clearer. Through super-resolution reconstruction, some low-resolution images can be reconstructed to achieve the effect desired by the user. Traditional super-resolution reconstruction generally applies uniform processing to an entire image; the reconstructed image shows no difference between regions, and the details of the image cannot be taken into account.
Summary of the invention
According to various embodiments of the present application, an image processing method and apparatus, an electronic device, and a computer-readable storage medium are provided.
An image processing method, comprising:
obtaining an image to be processed at a first resolution;
identifying a target subject in the image to be processed to obtain a target subject foreground image and a background image;
performing super-resolution reconstruction on the target subject foreground image and the background image separately; and
fusing the reconstructed target subject foreground image and background image to obtain a target image, the resolution of the target image being greater than the first resolution.
An image processing apparatus, comprising:
an obtaining module configured to obtain an image to be processed at a first resolution;
a recognition module configured to identify a target subject in the image to be processed to obtain a target subject foreground image and a background image;
a reconstruction module configured to perform super-resolution reconstruction on the target subject foreground image and the background image separately; and
a fusion module configured to fuse the reconstructed target subject foreground image and background image to obtain a target image, the resolution of the target image being greater than the first resolution.
An electronic device, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following steps:
obtaining an image to be processed at a first resolution;
identifying a target subject in the image to be processed to obtain a target subject foreground image and a background image;
performing super-resolution reconstruction on the target subject foreground image and the background image separately; and
fusing the reconstructed target subject foreground image and background image to obtain a target image, the resolution of the target image being greater than the first resolution.
A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the following steps:
obtaining an image to be processed at a first resolution;
identifying a target subject in the image to be processed to obtain a target subject foreground image and a background image;
performing super-resolution reconstruction on the target subject foreground image and the background image separately; and
fusing the reconstructed target subject foreground image and background image to obtain a target image, the resolution of the target image being greater than the first resolution.
With the above image processing method and apparatus, electronic device, and computer-readable storage medium, an image to be processed at a first resolution is obtained, the target subject in it is identified to obtain a target subject foreground image and a background image, super-resolution reconstruction is performed on the foreground and background images separately, and the reconstructed images are fused to obtain a target image whose resolution is greater than the first resolution. The details of the image can thus be taken into account, improving the detail-processing effect of image reconstruction.
The details of one or more embodiments of the present application are set forth in the following drawings and description. Other features, objects, and advantages of the present application will become apparent from the specification, the drawings, and the claims.
To describe the technical solutions in the embodiments of the present application or the prior art more clearly, the following briefly introduces the drawings required in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a block diagram of the internal structure of an electronic device in an embodiment.
FIG. 2 is a flowchart of an image processing method in an embodiment.
FIG. 3 is an architecture diagram of an image reconstruction model in an embodiment.
FIG. 4 is a structural diagram of a cascading block in an embodiment.
FIG. 5 is a structural diagram of a cascading block in another embodiment.
FIG. 6 is a flowchart of performing super-resolution reconstruction on a background image in an embodiment.
FIG. 7 is a flowchart of applying the image processing method to a video processing scene in an embodiment.
FIG. 8 is a flowchart of identifying the target subject in an image to be processed in an embodiment.
FIG. 9 is a flowchart of determining the target subject in an image to be processed according to a subject-region confidence map in an embodiment.
FIG. 10 is a schematic diagram of the effect of subject recognition on an image to be processed in an embodiment.
FIG. 11 is an architecture diagram of an image processing method in an embodiment.
FIG. 12 is a block diagram of the structure of an image processing apparatus in an embodiment.
FIG. 13 is a schematic diagram of the internal structure of an electronic device in another embodiment.
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
The image processing method in the embodiments of the present application can be applied to an electronic device. The electronic device may be a computer device with a camera, a personal digital assistant, a tablet computer, a smartphone, a wearable device, or the like. When capturing an image, the camera in the electronic device performs autofocus to ensure that the captured image is clear.
In an embodiment, the electronic device may include an image processing circuit, which may be implemented using hardware and/or software components and may include various processing units defining an ISP (Image Signal Processing) pipeline. FIG. 1 is a schematic diagram of an image processing circuit in an embodiment. As shown in FIG. 1, for ease of illustration, only aspects of the image processing technology related to the embodiments of the present application are shown.
As shown in FIG. 1, the image processing circuit includes a first ISP processor 130, a second ISP processor 140, and a control logic 150. The first camera 110 includes one or more first lenses 112 and a first image sensor 114. The first image sensor 114 may include a color filter array (such as a Bayer filter); it may obtain the light intensity and wavelength information captured by each imaging pixel of the first image sensor 114 and provide a set of image data that can be processed by the first ISP processor 130. The second camera 120 includes one or more second lenses 122 and a second image sensor 124. The second image sensor 124 may include a color filter array (such as a Bayer filter); it may obtain the light intensity and wavelength information captured by each imaging pixel of the second image sensor 124 and provide a set of image data that can be processed by the second ISP processor 140.
The first image captured by the first camera 110 is transmitted to the first ISP processor 130 for processing. After processing the first image, the first ISP processor 130 may send statistical data of the first image (such as image brightness, image contrast, image color, etc.) to the control logic 150. The control logic 150 may determine control parameters of the first camera 110 according to the statistical data, so that the first camera 110 can perform operations such as autofocus and auto-exposure according to the control parameters. After being processed by the first ISP processor 130, the first image may be stored in the image memory 160, and the first ISP processor 130 may also read the image stored in the image memory 160 for processing. In addition, after being processed by the ISP processor 130, the first image may be sent directly to the display 170 for display, and the display 170 may also read the image in the image memory 160 for display.
The first ISP processor 130 processes the image data pixel by pixel in a variety of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits; the first ISP processor 130 may perform one or more image processing operations on the image data and collect statistical information about the image data. The image processing operations may be performed with the same or different bit-depth precision.
The image memory 160 may be a part of a memory device, a storage device, or an independent dedicated memory within the electronic device, and may include a DMA (Direct Memory Access) feature.
When receiving data from the interface of the first image sensor 114, the first ISP processor 130 may perform one or more image processing operations, such as temporal filtering. The processed image data may be sent to the image memory 160 for additional processing before being displayed. The first ISP processor 130 receives the processed data from the image memory 160 and processes it in the RGB and YCbCr color spaces. The image data processed by the first ISP processor 130 may be output to the display 170 for viewing by the user and/or further processed by a graphics engine or GPU (Graphics Processing Unit). In addition, the output of the first ISP processor 130 may also be sent to the image memory 160, and the display 170 may read image data from the image memory 160. In an embodiment, the image memory 160 may be configured to implement one or more frame buffers.
The statistical data determined by the first ISP processor 130 may be sent to the control logic 150. For example, the statistical data may include first image sensor 114 statistics such as auto-exposure, auto white balance, autofocus, flicker detection, black level compensation, and first lens 112 shading correction. The control logic 150 may include a processor and/or microcontroller executing one or more routines (such as firmware), and the one or more routines may determine the control parameters of the first camera 110 and of the first ISP processor 130 according to the received statistical data. For example, the control parameters of the first camera 110 may include gain, integration time of exposure control, anti-shake parameters, flash control parameters, first lens 112 control parameters (such as focal length for focusing or zooming), or a combination of these parameters. The ISP control parameters may include gain levels and color correction matrices for auto white balance and color adjustment (for example, during RGB processing), as well as first lens 112 shading correction parameters.
Similarly, the second image captured by the second camera 120 is transmitted to the second ISP processor 140 for processing. After processing the second image, the second ISP processor 140 may send statistical data of the second image (such as image brightness, image contrast, image color, etc.) to the control logic 150, which may determine control parameters of the second camera 120 according to the statistical data, so that the second camera 120 can perform operations such as autofocus and auto-exposure. After being processed by the second ISP processor 140, the second image may be stored in the image memory 160, and the second ISP processor 140 may also read the stored image for processing. In addition, after being processed by the ISP processor 140, the second image may be sent directly to the display 170 for display, and the display 170 may also read the image in the image memory 160 for display. The second camera 120 and the second ISP processor 140 may also implement the processing described for the first camera 110 and the first ISP processor 130.
In an embodiment, the first camera 110 may be a color camera, and the second camera 120 may be a TOF (Time of Flight) camera or a structured light camera. The TOF camera can obtain a TOF depth map, and the structured light camera can obtain a structured light depth map. Alternatively, the first camera 110 and the second camera 120 may both be color cameras, and a binocular depth map is obtained through the two color cameras. The first ISP processor 130 and the second ISP processor 140 may be the same ISP processor.
The first camera 110 and the second camera 120 capture the same scene to obtain an image to be processed at a first resolution and a depth map, respectively, and send them to the ISP processor. The ISP processor may register the first-resolution image to be processed and the depth map according to the camera calibration parameters so that their fields of view are exactly aligned, and then generate a center weight map corresponding to the first-resolution image to be processed, in which the represented weight values decrease gradually from the center to the edges. The first-resolution image to be processed and the center weight map are input into a trained subject detection model to obtain a subject-region confidence map, and the target subject in the image is determined according to the confidence map. Alternatively, the first-resolution image to be processed, the depth map, and the center weight map may all be input into the trained subject detection model to obtain the subject-region confidence map, and the target subject is then determined according to the confidence map, yielding a target subject foreground image and a background image. Next, the electronic device performs super-resolution reconstruction on the target subject foreground image and the background image separately, and fuses the reconstructed foreground and background images to obtain a target image whose resolution is greater than the first resolution, which improves the detail-processing effect for the target subject and for image reconstruction as a whole.
FIG. 2 is a flowchart of an image processing method in an embodiment. The image processing method in this embodiment is described by taking its execution on the terminal or server in FIG. 1 as an example. As shown in FIG. 2, the image processing method includes:
Operation 202: obtain an image to be processed at a first resolution.
The first resolution refers to the image resolution; image resolution is the amount of information stored in the image, i.e., the number of pixels per inch. The image to be processed can be obtained by shooting any scene with a camera and may be a color image or a black-and-white image. It may be stored locally on the electronic device, stored by another device, stored on a network, or captured by the electronic device in real time; this is not limited here.
Specifically, the ISP processor or central processor of the electronic device may obtain the first-resolution image to be processed locally, from another device, or from a network, or capture a scene at the first resolution through the camera to obtain the image.
Operation 204: identify the target subject in the image to be processed to obtain a target subject foreground image and a background image.
A subject refers to any of various objects, such as a person, flower, cat, dog, cow, blue sky, white cloud, background, etc. The target subject is the desired subject, which can be selected as needed. Salient object detection means automatically processing the region of interest when facing a scene while selectively ignoring the regions of no interest; the region of interest is called the subject region. The target subject foreground image is the image of the target subject region in the image to be processed, and the background image is the image of the remaining regions other than the target subject region.
Specifically, the electronic device may input the image to be processed into a subject detection model, identify the target subject through the model, and segment the image into a target subject foreground image and a background image. Further, the model may output a segmented binarized mask image.
Operation 206: perform super-resolution reconstruction on the target subject foreground image and the background image separately.
Super-resolution reconstruction refers to reconstructing a high-resolution image from a low-resolution image or image sequence.
Specifically, after obtaining the first-resolution target subject foreground image and the first-resolution background image through the subject recognition model, the electronic device may input the foreground image into an image reconstruction model, which performs super-resolution reconstruction to obtain a reconstructed high-resolution target subject foreground image whose resolution is greater than the first resolution. The electronic device may then perform super-resolution reconstruction on the first-resolution background image using a fast super-resolution algorithm or an interpolation algorithm, obtaining a reconstructed high-resolution background image whose resolution is also greater than the first resolution.
In this embodiment, the resolutions of the reconstructed target subject foreground image and background image may be the same or different.
Operation 208: fuse the reconstructed target subject foreground image and background image to obtain a target image whose resolution is greater than the first resolution.
Specifically, the electronic device fuses and stitches the reconstructed target subject foreground image and background image; the fused and stitched image is the target image. Likewise, the resolution of the reconstructed target image is greater than the first resolution of the image to be processed.
In the image processing method of this embodiment, an image to be processed at a first resolution is obtained and the target subject in it is identified, yielding a target subject foreground image and a background image. Super-resolution reconstruction is performed on the foreground and background images separately, so different super-resolution processing can be applied to each. The reconstructed foreground and background images are fused to obtain a target image whose resolution is greater than the first resolution, so the details of the image can be taken into account and the detail-processing effect of image reconstruction is improved.
In an embodiment, performing super-resolution reconstruction on the target subject foreground image includes: extracting features of the target subject foreground image through an image reconstruction model to obtain a feature map, the image reconstruction model being a model trained in advance on subject foreground image sample pairs, each pair including a subject foreground image at the first resolution and the same subject foreground image at a second resolution; and performing super-resolution processing on the feature map through the image reconstruction model to obtain a target subject foreground image at the second resolution, the second resolution being greater than the first resolution.
A feature map is an image obtained by performing feature extraction on the image to be processed.
Specifically, the electronic device may collect a large number of subject foreground image sample pairs in advance, each pair including one subject foreground image at the first resolution and the same subject foreground image at the second resolution. The first-resolution subject foreground image is input into an untrained image reconstruction model for super-resolution reconstruction; the model's output is compared with the second-resolution subject foreground image, and the model is adjusted according to the difference. Training and adjustment are repeated until the difference between the reconstructed subject foreground image and the second-resolution one is less than a threshold, at which point training stops.
The electronic device inputs the target subject foreground image into the trained image reconstruction model, which may perform feature extraction through a convolutional layer to obtain the corresponding feature map. The model then converts the channel information of the feature map into spatial information, obtaining a target subject foreground image at the second resolution, which is greater than the first resolution.
In the image processing method of this embodiment, the features of the target subject foreground image are extracted with the trained image reconstruction model to obtain a feature map, and super-resolution processing is performed on the feature map through the model to obtain a second-resolution target subject foreground image, the second resolution being greater than the first. Local super-resolution reconstruction can thus be applied specifically to the target subject foreground image, handling its details better and guaranteeing the sharpness of the target subject.
As shown in FIG. 3, which is an architecture diagram of the image reconstruction model in an embodiment, the model includes a convolutional layer, a nonlinear mapping layer, and an upsampling layer. Residual units in the nonlinear mapping layer are cascaded in turn with first convolutional layers to form cascading blocks. The nonlinear mapping layer includes multiple cascading blocks, which are cascaded in turn with second convolutional layers to constitute the nonlinear mapping layer; the arrows in FIG. 3 are called global cascading connections. The nonlinear mapping layer is connected to the upsampling layer, which converts the channel information of the image into spatial information and outputs a high-resolution image.
The electronic device inputs the first-resolution target subject foreground image into the convolutional layer of the image reconstruction model for feature extraction to obtain a feature map. The feature map is input into the nonlinear mapping layer: the first cascading block produces an output, the feature map from the convolutional layer is concatenated with the output of the first cascading block, and the concatenation is fed into the first of the first convolutional layers for dimensionality reduction. The reduced feature map is then input into the second cascading block; the convolutional-layer feature map, the output of the first cascading block, and the output of the second cascading block are concatenated, and the concatenation is fed into the second of the first convolutional layers for dimensionality reduction. Similarly, after the output of the N-th cascading block is obtained, the outputs of all preceding cascading blocks and the convolutional-layer feature map are concatenated and fed into the N-th first convolutional layer for dimensionality reduction, until the output of the last first convolutional layer in the nonlinear mapping layer is obtained. The first convolutional layer in this embodiment may be a 1×1 pointwise convolution.
The residual feature map output by the nonlinear mapping layer is input into the upsampling layer, which converts its channel information into spatial information. For example, for a super-resolution factor of ×4, the feature map input to the upsampling layer must have 16×3 channels; after passing through the upsampling layer the channel information is converted into spatial information, i.e., the upsampling layer finally outputs a three-channel color image of 4 times the size.
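The channel-to-space conversion performed by the upsampling layer is the standard pixel-shuffle (sub-pixel) rearrangement. As an illustrative sketch only (not the patent's actual implementation; the function name and the (C·r², H, W) NumPy layout are assumptions), it can be expressed as:

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r),
    converting channel information into spatial information."""
    crr, h, w = x.shape
    c = crr // (r * r)
    assert c * r * r == crr, "channel count must be divisible by r*r"
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# for a x4 factor, a 16x3 = 48-channel feature map becomes a
# three-channel colour image four times the size per side
feat = np.random.rand(48, 8, 8)
out = pixel_shuffle(feat, 4)
print(out.shape)  # (3, 32, 32)
```

Deep-learning frameworks expose the same operation directly, e.g. `torch.nn.PixelShuffle` in PyTorch.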
In an embodiment, the structure of each cascading block is shown in FIG. 4: a cascading block includes three residual units and three first convolutional layers, with residual units and first convolutional layers cascaded in turn. The residual units are connected by local cascading connections, whose function is the same as that of the global cascading connections. The feature map output by the convolutional layer is taken as the input of the cascading block; the first residual unit produces an output, the convolutional-layer feature map is concatenated with the output of the first residual unit, and the concatenation is input into the first of the first convolutional layers for dimensionality reduction. Similarly, after the output of the N-th residual unit is obtained, the outputs of all preceding residual units and the convolutional-layer feature map are concatenated and input into the N-th first convolutional layer for dimensionality reduction, until the output of the last first convolutional layer in the cascading block is obtained. Note that the first convolutional layers in this embodiment are all first convolutional layers within one cascading block, and a first convolutional layer may be a 1×1 pointwise convolution.
In an embodiment, as shown in FIG. 5, the 1×1 pointwise convolution corresponding to each residual unit in FIG. 4 may be replaced with a combination of a group convolution plus a 1×1 pointwise convolution, to reduce the number of parameters in processing. It can be understood that the numbers of cascading blocks and first convolutional layers in the image reconstruction model are not limited, nor are the numbers of residual units and first convolutional layers in each cascading block; they can be adjusted according to different needs.
In an embodiment, as shown in FIG. 6, performing super-resolution reconstruction on the background image includes:
Operation 602: perform super-resolution reconstruction on the background image through an interpolation algorithm to obtain a background image at a third resolution, the third resolution being greater than the first resolution.
Interpolation algorithms include, but are not limited to, nearest-neighbor interpolation, bilinear interpolation, and bicubic interpolation.
Specifically, the electronic device may perform super-resolution reconstruction on the first-resolution background image through at least one of the nearest-neighbor, bilinear, and bicubic interpolation algorithms to obtain a reconstructed third-resolution background image, the third resolution being greater than the first resolution.
In this embodiment, the electronic device may also perform super-resolution reconstruction on the first-resolution background image through a fast super-resolution algorithm to obtain the reconstructed third-resolution background image.
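As a minimal, hedged illustration of interpolation-based background upscaling — nearest-neighbour here for brevity, whereas a real pipeline would more likely use the bilinear or bicubic interpolation named above (e.g. via `cv2.resize`):

```python
import numpy as np

def upscale_nearest(img: np.ndarray, scale: int) -> np.ndarray:
    """Nearest-neighbour super-resolution of an (H, W, C) image:
    every source pixel is replicated scale x scale times."""
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)

bg = np.random.rand(120, 160, 3)   # first-resolution background image
bg_sr = upscale_nearest(bg, 2)     # third-resolution background, 2x per side
print(bg_sr.shape)  # (240, 320, 3)
```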
Fusing the reconstructed target subject foreground image and background image to obtain the target image includes:
Operation 604: adjust the second-resolution target subject foreground image and the third-resolution background image to corresponding sizes.
Specifically, the electronic device may determine the size of the second-resolution target subject foreground image and adjust the size of the third-resolution background image accordingly, so that the reconstructed foreground and background images have the same size.
In this embodiment, the electronic device may also adjust the size of the reconstructed target subject foreground image according to the size of the reconstructed background image, so that the two have the same size.
In this embodiment, the electronic device may also adjust the sizes of both the reconstructed target subject foreground image and the background image, so that they reach the same target size.
Operation 606: fuse the resized second-resolution target subject foreground image and third-resolution background image to obtain the target image.
Image fusion refers to processing image data of the same image collected from multiple source channels, using image processing and computer technology, to extract as much useful information as possible from each channel and synthesize a high-quality image.
Specifically, the electronic device may fuse the resized second-resolution target subject foreground image and third-resolution background image, for example by processing the reconstructed foreground and background images with a Poisson fusion algorithm, to obtain the target image.
In the above image processing method, super-resolution reconstruction is performed on the background image through the interpolation algorithm to obtain a third-resolution background image, and the second-resolution target subject foreground image and the third-resolution background image are adjusted to corresponding sizes, so images of different resolutions and sizes can be adjusted to the same size. The resized second-resolution foreground image and third-resolution background image are fused to obtain a complete reconstructed image, i.e., the target image.
In an embodiment, the electronic device may also train the image reconstruction model in advance on background image sample pairs. Each background sample pair consists of two copies of the same background image: a labeled high-resolution background image, and an unlabeled low-resolution background image that is input into the untrained image reconstruction model for reconstruction. The reconstructed background image is compared with the labeled high-resolution one, and the model's parameters are adjusted continuously until a threshold is met and training stops. The electronic device may then input the background image of the image to be processed into the trained image reconstruction model for super-resolution reconstruction, obtaining a reconstructed background image whose resolution is greater than the first resolution.
In an embodiment, as shown in FIG. 7, the image processing method is applied to video processing; the image to be processed at the first resolution is each frame of image to be processed in a video at the first resolution.
Specifically, when the image processing method is applied to video processing, low-resolution video frames can be reconstructed into high-resolution images. In that case, the electronic device may take the resolution of the video to be processed as the first resolution, so the image to be processed at the first resolution is each frame of image to be processed in that video.
Obtaining the image to be processed at the first resolution includes:
Operation 702: obtain each frame of image to be processed in the video at the first resolution.
Specifically, the electronic device may obtain the first-resolution video locally, from another device, or from a network, or record the video itself, and then obtain each frame of the first-resolution video as an image to be processed.
Identifying the target subject in the image to be processed to obtain the target subject foreground image and background image includes:
Operation 704: identify the target subject in each frame of image to be processed in the video to obtain the target subject foreground image and background image of each frame.
Next, the electronic device may input each frame of image to be processed into the subject detection model, identify the target subject in each frame through the model, and segment each frame into a target subject foreground image and a background image. Further, the model may output the binarized mask image corresponding to each frame's segmentation.
Performing super-resolution reconstruction on the target subject foreground image and the background image separately includes:
Operation 706: perform super-resolution reconstruction on the target subject foreground image and background image of each frame separately.
Specifically, after obtaining the target subject foreground image and background image of each frame through the subject recognition model, the electronic device may input each frame's target subject foreground image into the image reconstruction model, which performs super-resolution reconstruction to obtain each frame's reconstructed high-resolution target subject foreground image; the resolutions of the reconstructed foreground images are all greater than the first resolution. The electronic device may then perform super-resolution reconstruction on each frame's background image through a fast super-resolution algorithm or an interpolation algorithm, obtaining each frame's reconstructed high-resolution background image; the resolutions of the reconstructed background images are also all greater than the first resolution.
In this embodiment, the resolutions of the reconstructed target subject foreground image and background image may be the same or different.
In this embodiment, the reconstructed target subject foreground images of all frames have the same resolution, and the reconstructed background images of all frames have the same resolution.
In this embodiment, the reconstructed target subject foreground images and background images of all frames may all have the same resolution.
Fusing the reconstructed subject foreground image and background image to obtain the target image, the resolution of which is greater than the first resolution, includes:
Operation 708: fuse the reconstructed target subject foreground image and background image corresponding to each frame to obtain a target image for each frame.
Specifically, the electronic device may establish a mapping relationship among the image to be processed, the reconstructed target subject foreground image, and the background image, and then fuse and stitch the reconstructed foreground and background images that share a mapping relationship to obtain each frame's target image. Likewise, the resolution of each reconstructed target image is greater than the first resolution of the corresponding frame.
Operation 710: generate a target video from the target images of all frames, the resolution of the target video being greater than the first resolution.
Specifically, the electronic device may combine the target images in the order of the original frames to obtain a high-resolution video, i.e., the target video. The resolution of the target video is greater than the first resolution, and the resolution of every target image in it is greater than the first resolution.
The above image processing method is applied to video processing scenes. By obtaining each frame of a first-resolution video, identifying the target subject in each frame to obtain its target subject foreground image and background image, performing super-resolution reconstruction on the foreground and background images of each frame separately, fusing the reconstructed foreground and background images of each frame into a target image, and generating a target video whose resolution is greater than the first resolution from the target images, a low-resolution video can be reconstructed into a high-resolution one. Applying different super-resolution reconstruction to the foreground and background images improves the processing of image details.
In an embodiment, as shown in FIG. 8, identifying the target subject in the image to be processed includes:
Operation 802: generate a center weight map corresponding to the image to be processed, in which the represented weight values decrease gradually from the center to the edges.
The center weight map is a map recording the weight value of each pixel of the image to be processed. The recorded weight values decrease gradually from the center toward the four edges: the center weight is the largest, decreasing toward the edges. The center weight map characterizes weight values that decrease gradually from the central pixels of the image to its edge pixels.
The ISP processor or central processor can generate the corresponding center weight map according to the size of the image to be processed; its weight values decrease gradually from the center to the four edges. The center weight map may be generated using a Gaussian function, a first-order equation, or a second-order equation; the Gaussian function may be a two-dimensional Gaussian function.
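A two-dimensional Gaussian is one of the generation options named above. A minimal sketch (the relative-sigma parameterization is an assumption, not taken from the patent):

```python
import numpy as np

def center_weight_map(h: int, w: int, sigma: float = 0.5) -> np.ndarray:
    """2D Gaussian weight map: maximal at the image centre and
    decaying gradually toward the four edges. sigma is relative
    to the half-size of the image."""
    ys = np.linspace(-1.0, 1.0, h)[:, None]
    xs = np.linspace(-1.0, 1.0, w)[None, :]
    return np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))

wm = center_weight_map(5, 5)
# weight is largest at the centre pixel and decreases toward the edges
print(wm[2, 2], wm[0, 0])
```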
Operation 804: input the image to be processed and the center weight map into a subject detection model to obtain a subject-region confidence map, the subject detection model being a model trained in advance on images to be processed, center weight maps, and corresponding labeled subject mask images of the same scene.
The subject detection model is obtained by collecting a large amount of training data in advance and inputting it into a subject detection model containing initial network weights for training. Each set of training data includes an image to be processed, a center weight map, and a labeled subject mask image corresponding to the same scene. The image to be processed and the center weight map serve as the input of the model being trained, and the labeled subject mask image serves as the ground truth the trained model is expected to output. The subject mask image is an image filter template used to identify the subject in an image; it can block the other parts of the image and filter out the subject. The subject detection model can be trained to recognize and detect various subjects, such as people, flowers, cats, dogs, backgrounds, etc.
Specifically, the ISP processor or central processor may input the image to be processed and the center weight map into the subject detection model; detection yields the subject-region confidence map. The subject-region confidence map records the probability that the subject belongs to each kind of recognizable subject; for example, a pixel's probability of belonging to a person may be 0.8, to a flower 0.1, and to the background 0.1.
Operation 806: determine the target subject in the image to be processed according to the subject-region confidence map.
Specifically, the ISP processor or central processor may select the subject with the highest or second-highest confidence as the subject of the image to be processed according to the subject-region confidence map. If there is one subject, it is taken as the target subject; if there are multiple subjects, one or more of them can be selected as the target subject as required.
In the image processing method of this embodiment, after the image to be processed is obtained and the corresponding center weight map is generated, both are input into the corresponding subject detection model for detection to obtain the subject-region confidence map, from which the target subject in the image can be determined. Using the center weight map makes objects at the center of the image easier to detect, and using the subject detection model trained on images to be processed, center weight maps, and subject mask images makes it possible to identify the target subject in the image more accurately.
In an embodiment, as shown in FIG. 9, determining the target subject in the image to be processed according to the subject-region confidence map includes:
Operation 902: process the subject-region confidence map to obtain a subject mask image.
Specifically, the subject-region confidence map contains some scattered points with low confidence; the ISP processor or central processor may filter the confidence map to obtain the subject mask image. The filtering may configure a confidence threshold and filter out the pixels of the confidence map whose confidence values are below it. The confidence threshold may be an adaptive confidence threshold, a fixed threshold, or thresholds configured per region.
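The adaptive rule itself is not fixed by the text. As an illustrative sketch, a threshold of mean + k·std over the confidence map (an assumed choice, not the patent's) retains only high-confidence pixels:

```python
import numpy as np

def binarize_confidence(conf: np.ndarray, k: float = 0.5) -> np.ndarray:
    """Filter a subject-region confidence map with an adaptive threshold
    (here: mean + k * std, an illustrative choice) and return a binary
    mask: 1 = subject region, 0 = background region."""
    thresh = conf.mean() + k * conf.std()
    return (conf >= thresh).astype(np.uint8)

conf = np.array([[0.1, 0.2, 0.1],
                 [0.2, 0.9, 0.8],
                 [0.1, 0.7, 0.1]])
mask = binarize_confidence(conf)
print(mask)  # only the three high-confidence pixels remain
```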
Operation 904: detect the image to be processed to determine the highlight region in it.
The highlight region is a region whose brightness values are greater than a brightness threshold.
Specifically, the ISP processor or central processor performs highlight detection on the image to be processed, filters out the target pixels whose brightness values are greater than the brightness threshold, and applies connected-component processing to the target pixels to obtain the highlight region.
Operation 906: determine the target subject with highlights eliminated in the image to be processed according to the highlight region in the image to be processed and the subject mask image.
Specifically, the ISP processor or central processor may perform a difference calculation or logical AND calculation on the highlight region of the image to be processed and the subject mask image to obtain the target subject with highlights eliminated.
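A minimal sketch of the elimination step, using a brightness threshold for the highlight region and a logical AND-NOT against the subject mask (the connected-component grouping of Operation 904 is omitted for brevity, and the threshold value is an assumption):

```python
import numpy as np

def remove_highlight(subject_mask: np.ndarray, gray: np.ndarray,
                     luma_thresh: float = 0.9) -> np.ndarray:
    """Combine the subject mask image with a detected highlight region.
    Pixels brighter than luma_thresh form the highlight region; a logical
    AND-NOT keeps only the non-highlight subject pixels."""
    highlight = gray > luma_thresh
    return np.logical_and(subject_mask.astype(bool), ~highlight).astype(np.uint8)

gray = np.array([[0.2, 0.95],
                 [0.3, 0.4]])
subj = np.array([[1, 1],
                 [0, 1]], dtype=np.uint8)
print(remove_highlight(subj, gray))  # the highlight pixel (0, 1) is dropped
```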
In this embodiment, filtering the subject-region confidence map to obtain the subject mask image improves the reliability of the confidence map. The image to be processed is detected to obtain the highlight region, which is then processed with the subject mask image to obtain a target subject with highlights eliminated. By handling the highlight and high-brightness regions that affect subject-recognition precision separately with a filter, the precision and accuracy of subject recognition are improved.
In an embodiment, processing the subject-region confidence map to obtain the subject mask image includes: performing adaptive confidence threshold filtering on the confidence map to obtain a binarized mask image that includes a subject region and a background region; and performing morphological processing and guided filtering on the binarized mask image to obtain the subject mask image.
Specifically, after filtering the subject-region confidence map with the adaptive confidence threshold, the ISP processor or central processor represents the confidence values of the retained pixels by 1 and those of the removed pixels by 0 to obtain the binarized mask image.
Morphological processing may include erosion and dilation. An erosion operation may be performed on the binarized mask image first, followed by a dilation operation, to remove noise; guided filtering is then applied to the morphologically processed binarized mask image to perform edge filtering, yielding an edge-extracted subject mask image.
Morphological processing and guided filtering ensure that the obtained subject mask image has little or no noise and softer edges.
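A pure-NumPy sketch of the opening step (3×3 erosion followed by dilation) on a binarized mask; in practice library routines such as `cv2.erode`/`cv2.dilate` would be used, and guided filtering would follow as described:

```python
import numpy as np

def _shifts(m: np.ndarray) -> np.ndarray:
    """Stack all nine 3x3-neighbourhood views of a zero-padded mask."""
    p = np.pad(m, 1)
    h, w = m.shape
    return np.stack([p[i:i + h, j:j + w] for i in range(3) for j in range(3)])

def erode(m):  return _shifts(m).min(axis=0)   # isolated noise pixels vanish
def dilate(m): return _shifts(m).max(axis=0)   # surviving regions regrow

mask = np.zeros((7, 7), dtype=np.uint8)
mask[2:5, 2:5] = 1      # a solid 3x3 subject blob
mask[0, 6] = 1          # an isolated noise pixel
opened = dilate(erode(mask))   # erosion first, then dilation
print(opened[0, 6], opened[3, 3])  # noise removed, blob centre kept
```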
In an embodiment, the binarized mask image includes a subject region and a background region, and fusing the reconstructed target subject foreground image and background image to obtain the target image includes: fusing the reconstructed target subject foreground image with the subject region in the binarized mask image, and fusing the reconstructed background image with the background region in the binarized mask image, to obtain the target image.
Specifically, the binarized mask image includes a subject region and a background region; the subject region may be white and the background region black. The electronic device fuses the reconstructed target subject foreground image with the subject region of the binarized mask image (i.e., the white part) and fuses the reconstructed background image with the background region (i.e., the black part), thereby obtaining the target image.
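A minimal sketch of this mask-guided compositing with a hard binary mask (the document also mentions Poisson fusion for seamless blending; that step is not shown here):

```python
import numpy as np

def fuse(fg: np.ndarray, bg: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Composite the reconstructed foreground and background using the
    (already upscaled) binarized mask: 1 = subject region, 0 = background."""
    m = mask[..., None].astype(bool)   # broadcast over the colour channels
    return np.where(m, fg, bg)

h, w = 4, 4
fg = np.ones((h, w, 3))        # reconstructed target subject foreground
bg = np.zeros((h, w, 3))       # reconstructed background
mask = np.zeros((h, w), dtype=np.uint8)
mask[1:3, 1:3] = 1             # subject region of the binarized mask
target = fuse(fg, bg, mask)
print(target[1, 1], target[0, 0])  # subject pixel vs background pixel
```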
In an embodiment, the method further includes: obtaining a depth map corresponding to the image to be processed, the depth map including at least one of a TOF depth map, a binocular depth map, and a structured light depth map; and performing registration processing on the image to be processed and the depth map to obtain the registered image to be processed and depth map of the same scene.
A depth map is a map containing depth information. A depth camera or binocular camera shoots the same scene to obtain the corresponding depth map. The depth camera may be a structured light camera or a TOF camera, and the depth map may be at least one of a structured light depth map, a TOF depth map, and a binocular depth map.
Specifically, the ISP processor or central processor of the electronic device may shoot the same scene with a camera to obtain the image to be processed and the corresponding depth map, and then register them using the camera calibration parameters to obtain the registered image to be processed and depth map.
In other embodiments, when a depth map cannot be captured, a simulated depth map can be generated automatically. The depth value of each pixel in the simulated depth map may be a preset value; moreover, the depth values of different pixels in the simulated depth map may correspond to different preset values.
In an embodiment, inputting the image to be processed and the center weight map into the subject detection model to obtain the subject-region confidence map includes: inputting the registered image to be processed, the depth map, and the center weight map into the subject detection model to obtain the subject-region confidence map, the subject detection model being a model trained in advance on images to be processed, depth maps, center weight maps, and corresponding labeled subject mask images of the same scene.
The subject detection model is obtained by collecting a large amount of training data in advance and inputting it into a subject detection model containing initial network weights for training. Each set of training data includes an image to be processed, a depth map, a center weight map, and a labeled subject mask image corresponding to the same scene. The image to be processed and the center weight map serve as the input of the model being trained, and the labeled subject mask image serves as the ground truth the trained model is expected to output. The subject mask image is an image filter template used to identify the subject in an image; it can block the other parts of the image and filter out the subject. The subject detection model can be trained to recognize and detect various subjects, such as people, flowers, cats, dogs, backgrounds, etc.
In this embodiment, the depth map and the center weight map are used as inputs of the subject detection model. The depth information of the depth map makes objects closer to the camera easier to detect, and the central attention mechanism of the center weight map (large weight at the center, small weights at the four edges) makes objects at the center of the image easier to detect. Introducing the depth map enhances the depth features of the subject, and introducing the center weight map enhances its central attention features, so the target subject can be accurately identified not only in simple scenes but with greatly improved accuracy in complex scenes. Introducing the depth map can also solve the problem that traditional object detection methods are not robust to the ever-changing objects of natural images. A simple scene is a scene with a single subject and low contrast in the background region.
FIG. 10 is a schematic diagram of the effect of subject recognition on an image to be processed in an embodiment. As shown in FIG. 10, the image to be processed is an RGB image 1002 containing a butterfly. Inputting the RGB image into the subject detection model yields a subject-region confidence map 1004; filtering and binarizing the confidence map 1004 yields a binarized mask image 1006; and applying morphological processing and guided filtering to the binarized mask image 1006 for edge enhancement yields the subject mask image 1008.
In an embodiment, an image processing method is provided, including:
Operation (a1): obtain an image to be processed at a first resolution.
Operation (a2): generate a center weight map corresponding to the image to be processed, in which the represented weight values decrease gradually from the center to the edges.
Operation (a3): input the image to be processed and the center weight map into a subject detection model to obtain a subject-region confidence map, the subject detection model being a model trained in advance on images to be processed, center weight maps, and corresponding labeled subject mask images of the same scene.
Operation (a4): perform adaptive confidence threshold filtering on the subject-region confidence map to obtain a binarized mask image that includes a subject region and a background region.
Operation (a5): perform morphological processing and guided filtering on the binarized mask image to obtain a subject mask image.
Operation (a6): detect the image to be processed to determine the highlight region in it.
Operation (a7): determine the target subject with highlights eliminated in the image to be processed according to the highlight region and the subject mask image, obtaining a target subject foreground image and a background image.
Operation (a8): extract features of the target subject foreground image through an image reconstruction model to obtain a feature map, the image reconstruction model being a model trained in advance on subject foreground image sample pairs, each pair including a subject foreground image at the first resolution and the same subject foreground image at a second resolution.
Operation (a9): perform super-resolution processing on the feature map through the image reconstruction model to obtain a target subject foreground image at the second resolution, the second resolution being greater than the first resolution.
Operation (a10): perform super-resolution reconstruction on the background image through an interpolation algorithm to obtain a background image at a third resolution, the third resolution being greater than the first resolution.
Operation (a11): adjust the second-resolution target subject foreground image and the third-resolution background image to corresponding sizes.
Operation (a12): fuse the resized second-resolution target subject foreground image with the subject region in the binarized mask image, and fuse the resized third-resolution background image with the background region in the binarized mask image, to obtain the target image.
In the above image processing method, subject recognition on the first-resolution image to be processed through the subject detection model quickly and accurately yields the target subject foreground image and background image. Super-resolution reconstruction of the target subject foreground image through the image reconstruction model handles its details better, making the reconstructed foreground clearer, while super-resolution reconstruction of the background image through the interpolation algorithm balances reconstruction speed against the sharpness of the foreground. The reconstructed foreground and background images of different resolutions are adjusted to the same size and fused with the corresponding regions in the binarized mask image to obtain the target image. This scheme solves the problem of traditional super-resolution reconstruction, in which every region of the picture is processed identically and the reconstruction cannot balance image detail against efficiency.
FIG. 11 is an architecture diagram of the image processing method in an embodiment. The electronic device inputs the first-resolution image to be processed into the subject detection model to obtain the target subject foreground image and background image. The target subject foreground image undergoes super-resolution reconstruction through the image reconstruction model built from a cascading residual network, and the background image through the interpolation algorithm. The reconstructed foreground and background images are fused to obtain the target image, whose resolution is greater than the first resolution.
It should be understood that although the operations in the flowcharts of FIGS. 2-9 are displayed in sequence as indicated by the arrows, these operations are not necessarily executed in that order. Unless explicitly stated herein, the execution of these operations is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the operations in FIGS. 2-9 may include multiple sub-operations or stages, which are not necessarily completed at the same time but may be executed at different times; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other operations or with at least part of the sub-operations or stages of other operations.
FIG. 12 is a block diagram of the structure of an image processing apparatus of an embodiment. As shown in FIG. 12, it includes an obtaining module 1202, a recognition module 1204, a reconstruction module 1206, and a fusion module 1208.
The obtaining module 1202 is configured to obtain an image to be processed at a first resolution.
The recognition module 1204 is configured to identify the target subject in the image to be processed to obtain a target subject foreground image and a background image.
The reconstruction module 1206 is configured to perform super-resolution reconstruction on the target subject foreground image and the background image separately.
The fusion module 1208 is configured to fuse the reconstructed target subject foreground image and background image to obtain a target image whose resolution is greater than the first resolution.
With the above image processing apparatus, an image to be processed at a first resolution is obtained and the target subject in it is identified, yielding a target subject foreground image and a background image. Super-resolution reconstruction is performed on the foreground and background images separately, so different super-resolution processing can be applied to each. The reconstructed foreground and background images are fused to obtain a target image whose resolution is greater than the first resolution, so the details of the image can be taken into account and the detail-processing effect of image reconstruction is improved.
In an embodiment, the reconstruction module 1206 is further configured to: extract features of the target subject foreground image through an image reconstruction model to obtain a feature map, the image reconstruction model being a model trained in advance on subject foreground image sample pairs, each pair including a subject foreground image at the first resolution and the same subject foreground image at a second resolution; and perform super-resolution processing on the feature map through the image reconstruction model to obtain a target subject foreground image at the second resolution, the second resolution being greater than the first resolution.
With the above image processing apparatus, the features of the target subject foreground image are extracted with the trained image reconstruction model to obtain a feature map, and super-resolution processing is performed on the feature map through the model to obtain a second-resolution target subject foreground image, the second resolution being greater than the first. Local super-resolution reconstruction can thus be applied specifically to the target subject foreground image, handling its details better and guaranteeing the sharpness of the target subject.
In an embodiment, the reconstruction module 1206 is further configured to perform super-resolution reconstruction on the background image through the interpolation algorithm to obtain a background image at a third resolution, the third resolution being greater than the first resolution.
The fusion module 1208 is further configured to: adjust the second-resolution target subject foreground image and the third-resolution background image to corresponding sizes; and fuse the resized second-resolution target subject foreground image and third-resolution background image to obtain the target image.
In the image processing apparatus of this embodiment, super-resolution reconstruction is performed on the background image through the interpolation algorithm to obtain a third-resolution background image, and the second-resolution target subject foreground image and the third-resolution background image are adjusted to corresponding sizes, so images of different resolutions and sizes can be adjusted to the same size. The resized images are fused to obtain a complete reconstructed image, i.e., the target image.
In an embodiment, the image processing method is applied to video processing; the image to be processed at the first resolution is each frame of image to be processed in a video at the first resolution.
The obtaining module 1202 is further configured to obtain each frame of image to be processed in the video at the first resolution.
The recognition module 1204 is further configured to identify the target subject in each frame of image to be processed in the video to obtain the target subject foreground image and background image of each frame.
The reconstruction module 1206 is further configured to perform super-resolution reconstruction on the target subject foreground image and background image of each frame separately.
The fusion module 1208 is further configured to: fuse the reconstructed target subject foreground image and background image corresponding to each frame to obtain a target image for each frame; and generate a target video from the target images of all frames, the resolution of the target video being greater than the first resolution.
The above image processing apparatus is applied to video processing scenes. By obtaining each frame of a first-resolution video, identifying the target subject in each frame to obtain its target subject foreground image and background image, performing super-resolution reconstruction on them separately, fusing the reconstructed foreground and background images of each frame into a target image, and generating a target video whose resolution is greater than the first resolution, a low-resolution video can be reconstructed into a high-resolution one. Applying different super-resolution reconstruction to the foreground and background images improves the processing of image details.
In an embodiment, the recognition module 1204 is further configured to: generate a center weight map corresponding to the image to be processed, in which the represented weight values decrease gradually from the center to the edges; input the image to be processed and the center weight map into a subject detection model to obtain a subject-region confidence map, the subject detection model being a model trained in advance on images to be processed, center weight maps, and corresponding labeled subject mask images of the same scene; and determine the target subject in the image to be processed according to the subject-region confidence map.
In the image processing apparatus of this embodiment, after the image to be processed is obtained and the corresponding center weight map is generated, both are input into the corresponding subject detection model for detection to obtain the subject-region confidence map, from which the target subject in the image can be determined. Using the center weight map makes objects at the center of the image easier to detect, and using the subject detection model trained on images to be processed, center weight maps, and subject mask images makes it possible to identify the target subject more accurately.
In an embodiment, the recognition module 1204 is further configured to: process the subject-region confidence map to obtain a subject mask image; detect the image to be processed to determine the highlight region in it; and determine, according to the highlight region in the image to be processed and the subject mask image, the target subject with highlights eliminated in the image to be processed.
In this embodiment, filtering the subject-region confidence map to obtain the subject mask image improves the reliability of the confidence map; detecting the image to be processed yields the highlight region, which is then processed with the subject mask image to obtain a target subject with highlights eliminated. By handling the highlight and high-brightness regions that affect subject-recognition precision separately with a filter, the precision and accuracy of subject recognition are improved.
In an embodiment, the recognition module 1204 is further configured to: perform adaptive confidence threshold filtering on the subject-region confidence map to obtain a binarized mask image that includes a subject region and a background region; and perform morphological processing and guided filtering on the binarized mask image to obtain the subject mask image.
The fusion module 1208 is further configured to: fuse the reconstructed target subject foreground image with the subject region in the binarized mask image, and fuse the reconstructed background image with the background region in the binarized mask image, to obtain the target image.
In an embodiment, the obtaining module 1202 is further configured to: obtain a depth map corresponding to the image to be processed, the depth map including at least one of a TOF depth map, a binocular depth map, and a structured light depth map; and perform registration processing on the image to be processed and the depth map to obtain the registered image to be processed and depth map of the same scene.
The recognition module 1204 is further configured to input the registered image to be processed, the depth map, and the center weight map into the subject detection model to obtain the subject-region confidence map, the subject detection model being a model trained in advance on images to be processed, depth maps, center weight maps, and corresponding labeled subject mask images of the same scene.
In this embodiment, the depth map and the center weight map are used as inputs of the subject detection model. The depth information of the depth map makes objects closer to the camera easier to detect, and the central attention mechanism of the center weight map (large weight at the center, small weights at the four edges) makes objects at the center of the image easier to detect. Introducing the depth map enhances the depth features of the subject, and introducing the center weight map enhances its central attention features, so the target subject can be accurately identified not only in simple scenes but with greatly improved accuracy in complex scenes. Introducing the depth map can also solve the problem that traditional object detection methods are not robust to the ever-changing objects of natural images. A simple scene is a scene with a single subject and low contrast in the background region.
The division of the modules in the above image processing apparatus is for illustration only; in other embodiments, the image processing apparatus may be divided into different modules as required to complete all or part of its functions.
FIG. 13 is a schematic diagram of the internal structure of an electronic device in an embodiment. As shown in FIG. 13, the electronic device includes a processor and a memory connected through a system bus. The processor provides the computing and control capabilities that support the operation of the entire electronic device. The memory may include a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system and a computer program, and the computer program can be executed by the processor to implement the image processing method provided in the embodiments. The internal memory provides a cached runtime environment for the operating system and computer programs in the non-volatile storage medium. The electronic device may be a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or the like.
Each module in the image processing apparatus provided in the embodiments of the present application may be implemented in the form of a computer program. The computer program can run on a terminal or server, and the program modules composed of the computer program can be stored in the memory of the terminal or server. When the computer program is executed by a processor, the operations of the methods described in the embodiments of the present application are implemented.
The embodiments of the present application also provide a computer-readable storage medium: one or more non-volatile computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the operations of the image processing method.
A computer program product containing instructions that, when run on a computer, cause the computer to execute the image processing method.
Any reference to memory, storage, a database, or other media used in the embodiments of the present application may include non-volatile and/or volatile memory. Suitable non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above embodiments only express several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be understood as limiting the patent scope of the present application. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.
Claims (20)
- An image processing method, characterized by comprising: obtaining an image to be processed at a first resolution; identifying a target subject in the image to be processed to obtain a target subject foreground image and a background image; performing super-resolution reconstruction on the target subject foreground image and the background image separately; and fusing the reconstructed target subject foreground image and background image to obtain a target image, the resolution of the target image being greater than the first resolution.
- The method according to claim 1, characterized in that performing super-resolution reconstruction on the target subject foreground image comprises: extracting features of the target subject foreground image through an image reconstruction model to obtain a feature map, the image reconstruction model being a model trained in advance on subject foreground image sample pairs, each sample pair comprising a subject foreground image at the first resolution and the subject foreground image at a second resolution; and performing super-resolution processing on the feature map through the image reconstruction model to obtain a target subject foreground image at the second resolution, the second resolution being greater than the first resolution.
- The method according to claim 2, characterized in that performing super-resolution reconstruction on the background image comprises: performing super-resolution reconstruction on the background image through an interpolation algorithm to obtain a background image at a third resolution, the third resolution being greater than the first resolution; and that fusing the reconstructed target subject foreground image and background image to obtain the target image comprises: adjusting the second-resolution target subject foreground image and the third-resolution background image to corresponding sizes; and fusing the resized second-resolution target subject foreground image and third-resolution background image to obtain the target image.
- The method according to claim 1, characterized in that the image processing method is applied to video processing, and the image to be processed at the first resolution is each frame of image to be processed in a video at the first resolution; obtaining the image to be processed at the first resolution comprises: obtaining each frame of image to be processed in the video at the first resolution; identifying the target subject in the image to be processed to obtain the target subject foreground image and background image comprises: identifying the target subject in each frame of image to be processed in the video to obtain the target subject foreground image and background image of each frame; performing super-resolution reconstruction on the target subject foreground image and the background image separately comprises: performing super-resolution reconstruction on the target subject foreground image and background image of each frame separately; and fusing the reconstructed subject foreground image and background image to obtain the target image, the resolution of which is greater than the first resolution, comprises: fusing the reconstructed target subject foreground image and background image corresponding to each frame to obtain a target image for each frame; and generating a target video from the target images of all frames, the resolution of the target video being greater than the first resolution.
- The method according to claim 1, characterized in that identifying the target subject in the image to be processed comprises: generating a center weight map corresponding to the image to be processed, in which the represented weight values decrease gradually from the center to the edges; inputting the image to be processed and the center weight map into a subject detection model to obtain a subject-region confidence map, the subject detection model being a model trained in advance on images to be processed, center weight maps, and corresponding labeled subject mask images of the same scene; and determining the target subject in the image to be processed according to the subject-region confidence map.
- The method according to claim 5, characterized in that determining the target subject in the image to be processed according to the subject-region confidence map comprises: processing the subject-region confidence map to obtain a subject mask image; detecting the image to be processed to determine the highlight region in it; and determining, according to the highlight region in the image to be processed and the subject mask image, the target subject with highlights eliminated in the image to be processed.
- The method according to claim 6, characterized in that processing the subject-region confidence map to obtain the subject mask image comprises: performing adaptive confidence threshold filtering on the subject-region confidence map to obtain a binarized mask image that includes a subject region and a background region; and performing morphological processing and guided filtering on the binarized mask image to obtain the subject mask image.
- The method according to claim 7, characterized in that fusing the reconstructed target subject foreground image and background image to obtain the target image comprises: fusing the reconstructed target subject foreground image with the subject region in the binarized mask image, and fusing the reconstructed background image with the background region in the binarized mask image, to obtain the target image.
- The method according to claim 5, characterized in that the method further comprises: obtaining a depth map corresponding to the image to be processed, the depth map including at least one of a TOF depth map, a binocular depth map, and a structured light depth map; and performing registration processing on the image to be processed and the depth map to obtain the registered image to be processed and depth map of the same scene; and that inputting the image to be processed and the center weight map into the subject detection model to obtain the subject-region confidence map comprises: inputting the registered image to be processed, the depth map, and the center weight map into the subject detection model to obtain the subject-region confidence map, the subject detection model being a model trained in advance on images to be processed, depth maps, center weight maps, and corresponding labeled subject mask images of the same scene.
- An image processing apparatus, characterized by comprising: an obtaining module configured to obtain an image to be processed at a first resolution; a recognition module configured to identify a target subject in the image to be processed to obtain a target subject foreground image and a background image; a reconstruction module configured to perform super-resolution reconstruction on the target subject foreground image and the background image separately; and a fusion module configured to fuse the reconstructed target subject foreground image and background image to obtain a target image, the resolution of the target image being greater than the first resolution.
- An electronic device, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following steps: obtaining an image to be processed at a first resolution; identifying a target subject in the image to be processed to obtain a target subject foreground image and a background image; performing super-resolution reconstruction on the target subject foreground image and the background image separately; and fusing the reconstructed target subject foreground image and background image to obtain a target image, the resolution of the target image being greater than the first resolution.
- The electronic device according to claim 11, characterized in that, when performing the step of super-resolution reconstruction on the target subject foreground image, the processor further performs the following steps: extracting features of the target subject foreground image through an image reconstruction model to obtain a feature map, the image reconstruction model being a model trained in advance on subject foreground image sample pairs, each sample pair comprising a subject foreground image at the first resolution and the subject foreground image at a second resolution; and performing super-resolution processing on the feature map through the image reconstruction model to obtain a target subject foreground image at the second resolution, the second resolution being greater than the first resolution.
- The electronic device according to claim 12, characterized in that, when performing the step of super-resolution reconstruction on the background image, the processor further performs the following step: performing super-resolution reconstruction on the background image through an interpolation algorithm to obtain a background image at a third resolution, the third resolution being greater than the first resolution; and, when performing the step of fusing the reconstructed target subject foreground image and background image to obtain the target image, the processor further performs the following steps: adjusting the second-resolution target subject foreground image and the third-resolution background image to corresponding sizes; and fusing the resized second-resolution target subject foreground image and third-resolution background image to obtain the target image.
- The electronic device according to claim 11, characterized in that the electronic device is applied to video processing, and the image to be processed at the first resolution is each frame of image to be processed in a video at the first resolution; when performing the step of obtaining the image to be processed at the first resolution, the processor further performs: obtaining each frame of image to be processed in the video at the first resolution; when performing the step of identifying the target subject in the image to be processed to obtain the target subject foreground image and background image, the processor further performs: identifying the target subject in each frame of image to be processed in the video to obtain the target subject foreground image and background image of each frame; when performing the step of performing super-resolution reconstruction on the target subject foreground image and the background image separately, the processor further performs: performing super-resolution reconstruction on the target subject foreground image and background image of each frame separately; and, when performing the step of fusing the reconstructed subject foreground image and background image to obtain the target image, the resolution of which is greater than the first resolution, the processor further performs: fusing the reconstructed target subject foreground image and background image corresponding to each frame to obtain a target image for each frame; and generating a target video from the target images of all frames, the resolution of the target video being greater than the first resolution.
- The electronic device according to claim 11, characterized in that, when performing the step of identifying the target subject in the image to be processed, the processor further performs the following steps: generating a center weight map corresponding to the image to be processed, in which the represented weight values decrease gradually from the center to the edges; inputting the image to be processed and the center weight map into a subject detection model to obtain a subject-region confidence map, the subject detection model being a model trained in advance on images to be processed, center weight maps, and corresponding labeled subject mask images of the same scene; and determining the target subject in the image to be processed according to the subject-region confidence map.
- The electronic device according to claim 15, characterized in that, when performing the step of determining the target subject in the image to be processed according to the subject-region confidence map, the processor further performs the following steps: processing the subject-region confidence map to obtain a subject mask image; detecting the image to be processed to determine the highlight region in it; and determining, according to the highlight region in the image to be processed and the subject mask image, the target subject with highlights eliminated in the image to be processed.
- The electronic device according to claim 16, characterized in that, when performing the step of processing the subject-region confidence map to obtain the subject mask image, the processor further performs the following steps: performing adaptive confidence threshold filtering on the subject-region confidence map to obtain a binarized mask image that includes a subject region and a background region; and performing morphological processing and guided filtering on the binarized mask image to obtain the subject mask image.
- The electronic device according to claim 17, characterized in that, when performing the step of fusing the reconstructed target subject foreground image and background image to obtain the target image, the processor further performs the following step: fusing the reconstructed target subject foreground image with the subject region in the binarized mask image, and fusing the reconstructed background image with the background region in the binarized mask image, to obtain the target image.
- The electronic device according to claim 15, characterized in that, when executed by the processor, the computer program further performs the following steps: obtaining a depth map corresponding to the image to be processed, the depth map including at least one of a TOF depth map, a binocular depth map, and a structured light depth map; and performing registration processing on the image to be processed and the depth map to obtain the registered image to be processed and depth map of the same scene; and that, when performing the step of inputting the image to be processed and the center weight map into the subject detection model to obtain the subject-region confidence map, the processor further performs the following step: inputting the registered image to be processed, the depth map, and the center weight map into the subject detection model to obtain the subject-region confidence map, the subject detection model being a model trained in advance on images to be processed, depth maps, center weight maps, and corresponding labeled subject mask images of the same scene.
- A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the image processing method according to any one of claims 1 to 9 are implemented.
Applications Claiming Priority (2)
- Application Number | Priority Date | Filing Date | Title
- ---|---|---|---
- CN201910683492.1 | 2019-07-26 | |
- CN201910683492.1A CN110428366B (zh) | 2019-07-26 | 2019-07-26 | Image processing method and apparatus, electronic device, computer-readable storage medium
Publications (1)
- Publication Number | Publication Date
- ---|---
- WO2021017811A1 (zh) | 2021-02-04
Family
- ID=68412750
Family Applications (1)
- Application Number | Title | Priority Date | Filing Date
- ---|---|---|---
- PCT/CN2020/101817 WO2021017811A1 (zh) | Image processing method and apparatus, electronic device, computer-readable storage medium | 2019-07-26 | 2020-07-14
Country Status (2)
- Country | Link
- ---|---
- CN (1) | CN110428366B (zh)
- WO | WO2021017811A1 (zh)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113362224A (zh) * | 2021-05-31 | 2021-09-07 | 维沃移动通信有限公司 | Image processing method and apparatus, electronic device, and readable storage medium |
CN114429664A (zh) * | 2022-01-29 | 2022-05-03 | 脸萌有限公司 | Video generation method and training method of a video generation model |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110428366B (zh) * | 2019-07-26 | 2023-10-13 | Oppo广东移动通信有限公司 | Image processing method and apparatus, electronic device, computer-readable storage medium |
CN111047526B (zh) * | 2019-11-22 | 2023-09-26 | 北京达佳互联信息技术有限公司 | Image processing method and apparatus, electronic device, and storage medium |
CN112313944A (zh) * | 2019-11-28 | 2021-02-02 | 深圳市大疆创新科技有限公司 | Image processing method, apparatus, device, and storage medium |
CN111091506A (zh) * | 2019-12-02 | 2020-05-01 | RealMe重庆移动通信有限公司 | Image processing method and apparatus, storage medium, electronic device |
CN111161369B (zh) * | 2019-12-20 | 2024-04-23 | 上海联影智能医疗科技有限公司 | Image reconstruction and storage method, apparatus, computer device, and storage medium |
CN111145202B (zh) * | 2019-12-31 | 2024-03-08 | 北京奇艺世纪科技有限公司 | Model generation method, image processing method, apparatus, device, and storage medium |
CN111163265A (zh) * | 2019-12-31 | 2020-05-15 | 成都旷视金智科技有限公司 | Image processing method, apparatus, mobile terminal, and computer storage medium |
JP2021170284A (ja) * | 2020-04-17 | 2021-10-28 | 富士フイルムビジネスイノベーション株式会社 | Information processing apparatus and program |
CN111598776B (zh) * | 2020-04-29 | 2023-06-30 | Oppo广东移动通信有限公司 | Image processing method, image processing apparatus, storage medium, and electronic device |
CN111553846B (zh) * | 2020-05-12 | 2023-05-26 | Oppo广东移动通信有限公司 | Super-resolution processing method and apparatus |
CN113763239A (zh) * | 2020-06-03 | 2021-12-07 | 广州虎牙科技有限公司 | Image reconstruction method, apparatus, server, and storage medium |
WO2022011657A1 (zh) * | 2020-07-16 | 2022-01-20 | Oppo广东移动通信有限公司 | Image processing method and apparatus, electronic device, and computer-readable storage medium |
CN112001940B (zh) * | 2020-08-21 | 2023-04-07 | Oppo(重庆)智能科技有限公司 | Image processing method and apparatus, terminal, and readable storage medium |
CN112085686A (zh) * | 2020-08-21 | 2020-12-15 | 北京迈格威科技有限公司 | Image processing method, apparatus, electronic device, and computer-readable storage medium |
CN111932594B (zh) * | 2020-09-18 | 2023-12-19 | 西安拙河安见信息科技有限公司 | Optical-flow-based gigapixel video alignment method, apparatus, and medium |
CN112184554B (zh) * | 2020-10-13 | 2022-08-23 | 重庆邮电大学 | Remote sensing image fusion method based on residual hybrid dilated convolution |
CN112381717A (zh) * | 2020-11-18 | 2021-02-19 | 北京字节跳动网络技术有限公司 | Image processing method, model training method, apparatus, medium, and device |
CN112418167A (zh) * | 2020-12-10 | 2021-02-26 | 深圳前海微众银行股份有限公司 | Image clustering method, apparatus, device, and storage medium |
CN113066005A (zh) * | 2021-04-25 | 2021-07-02 | 广州虎牙科技有限公司 | Image processing method, apparatus, electronic device, and readable storage medium |
CN113240687A (zh) * | 2021-05-17 | 2021-08-10 | Oppo广东移动通信有限公司 | Image processing method, apparatus, electronic device, and readable storage medium |
CN114049254B (zh) * | 2021-10-29 | 2022-11-29 | 华南农业大学 | Low-pixel cattle-head image reconstruction and recognition method, system, device, and storage medium |
CN114067122B (zh) * | 2022-01-18 | 2022-04-08 | 深圳市绿洲光生物技术有限公司 | Two-stage binarization image processing method |
CN114419120A (zh) * | 2022-01-26 | 2022-04-29 | Oppo广东移动通信有限公司 | Image processing method and apparatus, computer-readable storage medium, and electronic device |
CN114630129A (zh) * | 2022-02-07 | 2022-06-14 | 浙江智慧视频安防创新中心有限公司 | Video encoding and decoding method and apparatus based on an intelligent digital retina |
CN114972020A (zh) * | 2022-04-13 | 2022-08-30 | 北京字节跳动网络技术有限公司 | Image processing method, apparatus, storage medium, and electronic device |
CN117440104B (zh) * | 2023-12-21 | 2024-03-29 | 北京遥感设备研究所 | Data compression and reconstruction method based on target saliency features |
CN118590597A (zh) * | 2024-07-30 | 2024-09-03 | 深圳天健电子科技有限公司 | Surveillance image generation method and apparatus based on artificial intelligence technology |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102800085A (zh) * | 2012-06-21 | 2012-11-28 | 西南交通大学 | Method for detecting and extracting a subject target image from complex images |
CN102842119A (zh) * | 2012-08-18 | 2012-12-26 | 湖南大学 | Fast text-image super-resolution method based on matting and edge enhancement |
US20140105484A1 (en) * | 2012-10-16 | 2014-04-17 | Samsung Electronics Co., Ltd. | Apparatus and method for reconstructing super-resolution three-dimensional image from depth image |
CN105741252A (zh) * | 2015-11-17 | 2016-07-06 | 西安电子科技大学 | Hierarchical video image reconstruction method based on sparse representation and dictionary learning |
US20160328828A1 (en) * | 2014-02-25 | 2016-11-10 | Graduate School At Shenzhen, Tsinghua University | Depth map super-resolution processing method |
CN110428366A (zh) * | 2019-07-26 | 2019-11-08 | Oppo广东移动通信有限公司 | Image processing method and apparatus, electronic device, computer-readable storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6941011B2 (en) * | 2002-06-27 | 2005-09-06 | Hewlett-Packard Development Company, L.P. | Method and system for image processing including mixed resolution, multi-channel color compression, transmission and decompression |
US10692220B2 (en) * | 2017-10-18 | 2020-06-23 | International Business Machines Corporation | Object classification based on decoupling a background from a foreground of an image |
CN108764370B (zh) * | 2018-06-08 | 2021-03-12 | Oppo广东移动通信有限公司 | Image processing method, apparatus, computer-readable storage medium, and computer device |
- 2019
  - 2019-07-26: CN application CN201910683492.1A, granted as CN110428366B (status: Active)
- 2020
  - 2020-07-14: PCT application PCT/CN2020/101817, published as WO2021017811A1 (status: Application Filing)
Also Published As
Publication number | Publication date |
---|---|
CN110428366B (zh) | 2023-10-13 |
CN110428366A (zh) | 2019-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021017811A1 (zh) | Image processing method and apparatus, electronic device, computer-readable storage medium | |
WO2021022983A1 (zh) | Image processing method and apparatus, electronic device, computer-readable storage medium | |
US11457138B2 (en) | Method and device for image processing, method for training object detection model | |
US11704775B2 (en) | Bright spot removal using a neural network | |
WO2020259179A1 (zh) | Focusing method, electronic device, and computer-readable storage medium | |
US10645368B1 (en) | Method and apparatus for estimating depth of field information | |
EP3937481A1 (en) | Image display method and device | |
CN108012080B (zh) | Image processing method and apparatus, electronic device, and computer-readable storage medium | |
CN110248096B (zh) | Focusing method and apparatus, electronic device, computer-readable storage medium | |
WO2021057474A1 (zh) | Subject focusing method and apparatus, electronic device, and storage medium | |
WO2020152521A1 (en) | Systems and methods for transforming raw sensor data captured in low-light conditions to well-exposed images using neural network architectures | |
CN107862658B (zh) | Image processing method and apparatus, computer-readable storage medium, and electronic device | |
CN110276831B (zh) | Three-dimensional model construction method and apparatus, device, computer-readable storage medium | |
US20220222830A1 (en) | Subject detecting method and device, electronic device, and non-transitory computer-readable storage medium | |
CN110349163B (zh) | Image processing method and apparatus, electronic device, computer-readable storage medium | |
WO2019015477A1 (zh) | Image correction method, computer-readable storage medium, and computer device | |
WO2019105304A1 (zh) | Image white balance processing method, computer-readable storage medium, and electronic device | |
WO2022134718A1 (zh) | Image processing method, chip, and electronic apparatus | |
EP4139840A2 (en) | Joint objects image signal processing in temporal domain | |
CN107578372B (zh) | Image processing method and apparatus, computer-readable storage medium, and electronic device | |
CN110365897B (zh) | Image correction method and apparatus, electronic device, computer-readable storage medium | |
CN109118427B (zh) | Image light effect processing method and apparatus, electronic device, storage medium | |
CN107770446B (zh) | Image processing method and apparatus, computer-readable storage medium, and electronic device | |
CN110992284A (zh) | Image processing method, image processing apparatus, electronic device, and computer-readable storage medium | |
WO2022127491A1 (zh) | Image processing method and apparatus, storage medium, terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20847820; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 20847820; Country of ref document: EP; Kind code of ref document: A1 |