CN110428366B - Image processing method and device, electronic equipment and computer readable storage medium - Google Patents

Image processing method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN110428366B
CN110428366B
Authority
CN
China
Prior art keywords
image
resolution
main body
target
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910683492.1A
Other languages
Chinese (zh)
Other versions
CN110428366A (en)
Inventor
卓海杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910683492.1A
Publication of CN110428366A
Priority to PCT/CN2020/101817 (WO2021017811A1)
Application granted
Publication of CN110428366B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4076 Super-resolution scaling using the original low-resolution images to iteratively correct the high-resolution images
    • G06T5/00 Image enhancement or restoration
    • G06T5/20 Image enhancement or restoration using local operators
    • G06T5/30 Erosion or dilatation, e.g. thinning
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T5/70 Denoising; Smoothing
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation involving foreground-background segmentation
    • G06T7/50 Depth or shape recovery

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to an image processing method and apparatus, an electronic device, and a computer-readable storage medium. The image processing method comprises: acquiring an image to be processed with a first resolution; identifying a target subject in the image to be processed to obtain a target subject foreground map and a background map; performing super-resolution reconstruction on the target subject foreground map and the background map respectively; and fusing the reconstructed target subject foreground map and background map to obtain a target image, wherein the resolution of the target image is greater than the first resolution. The method can improve the detail processing effect of image reconstruction.

Description

Image processing method and device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of video technology, and in particular, to an image processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Super-resolution reconstruction aims to reconstruct a high-resolution image from a low-resolution image so that the reconstructed image is clearer. Some low-resolution images can achieve the effect desired by the user after super-resolution reconstruction. However, conventional super-resolution reconstruction generally applies a uniform reconstruction process to the whole image; every region of the reconstructed image is treated indiscriminately, and the details of the image cannot be properly taken into account.
Disclosure of Invention
The embodiments of the present application provide an image processing method and apparatus, an electronic device, and a computer-readable storage medium, which can improve the detail processing effect of image reconstruction.
An image processing method, comprising:
acquiring an image to be processed with a first resolution;
identifying a target subject in the image to be processed to obtain a target subject foreground map and a background map;
performing super-resolution reconstruction on the target subject foreground map and the background map respectively;
and fusing the reconstructed target subject foreground map and background map to obtain a target image, wherein the resolution of the target image is greater than the first resolution.
An image processing apparatus comprising:
an acquisition module, configured to acquire an image to be processed with a first resolution;
an identification module, configured to identify a target subject in the image to be processed to obtain a target subject foreground map and a background map;
a reconstruction module, configured to perform super-resolution reconstruction on the target subject foreground map and the background map respectively;
and a fusion module, configured to fuse the reconstructed target subject foreground map and background map to obtain a target image, wherein the resolution of the target image is greater than the first resolution.
According to the image processing method and apparatus, the electronic device, and the computer-readable storage medium, an image to be processed with a first resolution is acquired, and the target subject in it is identified to obtain a target subject foreground map and a background map. Super-resolution reconstruction is performed on the foreground map and the background map respectively, and the reconstructed foreground map and background map are fused to obtain a target image whose resolution is greater than the first resolution. In this way, the details of the image can be taken into account, and the detail processing effect of image reconstruction is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a block diagram of the internal architecture of an electronic device in one embodiment;
FIG. 2 is a flow chart of an image processing method in one embodiment;
FIG. 3 is a schematic diagram of an image reconstruction model in one embodiment;
FIG. 4 is a block diagram of a cascade block in one embodiment;
FIG. 5 is a block diagram of a cascade block in another embodiment;
FIG. 6 is a flow diagram of super-resolution reconstruction of a background map in one embodiment;
FIG. 7 is a flow chart of an image processing method applied to a video processing scene in one embodiment;
FIG. 8 is a flow chart of identifying a target subject in the image to be processed in one embodiment;
FIG. 9 is a flow diagram of determining a target subject in an image to be processed according to a subject region confidence map, in one embodiment;
FIG. 10 is a schematic diagram showing the effect of subject recognition on an image to be processed in one embodiment;
FIG. 11 is a block diagram of an image processing method in one embodiment;
FIG. 12 is a block diagram showing the structure of an image processing apparatus in one embodiment;
FIG. 13 is a schematic diagram of the internal structure of an electronic device in another embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The image processing method in the embodiment of the application can be applied to electronic equipment. The electronic device may be a computer device with a camera, a personal digital assistant, a tablet computer, a smart phone, a wearable device, etc. When the camera in the electronic equipment shoots an image, automatic focusing can be carried out so as to ensure that the shot image is clear.
In one embodiment, the electronic device may include an image processing circuit, which may be implemented using hardware and/or software components and may include various processing units defining an ISP (Image Signal Processing) pipeline. FIG. 1 is a schematic diagram of an image processing circuit in one embodiment. As shown in FIG. 1, for convenience of explanation, only the aspects of the image processing technology related to the embodiments of the present application are shown.
As shown in FIG. 1, the image processing circuit includes a first ISP processor 130, a second ISP processor 140, and control logic 150. The first camera 110 includes one or more first lenses 112 and a first image sensor 114. The first image sensor 114 may include a color filter array (e.g., a Bayer filter); it may obtain the light intensity and wavelength information captured by each of its imaging pixels and provide a set of image data that can be processed by the first ISP processor 130. The second camera 120 includes one or more second lenses 122 and a second image sensor 124. The second image sensor 124 may include a color filter array (e.g., a Bayer filter); it may obtain the light intensity and wavelength information captured by each of its imaging pixels and provide a set of image data that can be processed by the second ISP processor 140.
The first image collected by the first camera 110 is transmitted to the first ISP processor 130 for processing. After processing the first image, the first ISP processor 130 may send statistical data of the first image (such as the brightness, contrast value, and color of the image) to the control logic 150, and the control logic 150 may determine the control parameters of the first camera 110 according to the statistical data, so that the first camera 110 can perform operations such as auto-focus and auto-exposure according to the control parameters. The first image may be stored in the image memory 160 after being processed by the first ISP processor 130, and the first ISP processor 130 may also read the image stored in the image memory 160 for processing. In addition, the first image may be processed by the first ISP processor 130 and then sent directly to the display 170 for display, or the display 170 may read and display the image in the image memory 160.
The first ISP processor 130 processes the image data pixel by pixel in a plurality of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the first ISP processor 130 may perform one or more image processing operations on the image data and collect statistical information about the image data. The image processing operations may be performed with the same or different bit-depth precision.
Image memory 160 may be part of a memory device, a storage device, or a separate dedicated memory within an electronic device, and may include DMA (Direct Memory Access) features.
Upon receiving image data from the interface of the first image sensor 114, the first ISP processor 130 may perform one or more image processing operations, such as temporal filtering. The processed image data may be sent to the image memory 160 for additional processing before being displayed. The first ISP processor 130 receives the processing data from the image memory 160 and performs image data processing on it in the RGB and YCbCr color spaces. The image data processed by the first ISP processor 130 may be output to the display 170 for viewing by a user and/or further processing by a graphics engine or GPU (Graphics Processing Unit). In addition, the output of the first ISP processor 130 may also be sent to the image memory 160, and the display 170 may read image data from the image memory 160. In one embodiment, the image memory 160 may be configured to implement one or more frame buffers.
The statistics determined by the first ISP processor 130 may be sent to the control logic 150. For example, the statistics may include first image sensor 114 statistics such as auto-exposure, auto-white balance, auto-focus, flicker detection, black level compensation, first lens 112 shading correction, and the like. The control logic 150 may include a processor and/or microcontroller that executes one or more routines (e.g., firmware) that may determine control parameters of the first camera 110 and control parameters of the first ISP processor 130 based on the received statistics. For example, the control parameters of the first camera 110 may include gain, integration time of exposure control, anti-shake parameters, flash control parameters, first lens 112 control parameters (e.g., focal length for focusing or zooming), combinations of these parameters, or the like. The ISP control parameters may include gain levels and color correction matrices for automatic white balancing and color adjustment (e.g., during RGB processing), as well as first lens 112 shading correction parameters.
Similarly, the second image collected by the second camera 120 is transmitted to the second ISP processor 140 for processing. After processing the second image, the second ISP processor 140 may send statistical data of the second image (such as the brightness, contrast value, and color of the image) to the control logic 150, and the control logic 150 may determine the control parameters of the second camera 120 according to the statistical data, so that the second camera 120 can perform operations such as auto-focus and auto-exposure according to the control parameters. The second image may be stored in the image memory 160 after being processed by the second ISP processor 140, and the second ISP processor 140 may also read the image stored in the image memory 160 for processing. In addition, the second image may be processed by the second ISP processor 140 and then sent directly to the display 170 for display, or the display 170 may read and display the image in the image memory 160. The second camera 120 and the second ISP processor 140 may also implement the processing described for the first camera 110 and the first ISP processor 130.
In one embodiment, the first camera 110 may be a color camera, and the second camera 120 may be a TOF (Time Of Flight) camera or a structured light camera. A TOF depth map can be obtained by the TOF camera, and a structured light depth map can be obtained by the structured light camera. Alternatively, the first camera 110 and the second camera 120 may both be color cameras, and a binocular depth map can be obtained through the two color cameras. The first ISP processor 130 and the second ISP processor 140 may be the same ISP processor.
The first camera 110 and the second camera 120 capture the same scene to obtain an image to be processed with a first resolution and a depth map respectively, and both are sent to the ISP processor. The ISP processor may register the first-resolution image to be processed and the depth map according to camera calibration parameters, so that their fields of view are kept completely consistent. A center weight map corresponding to the first-resolution image to be processed is then generated, wherein the weight values represented by the center weight map gradually decrease from the center to the edges. The image to be processed and the center weight map are input into a trained subject detection model to obtain a subject region confidence map, and the target subject in the first-resolution image to be processed is determined according to the confidence map; alternatively, the image to be processed, the depth map, and the center weight map may all be input into the trained subject detection model to obtain the subject region confidence map, and the target subject is then determined according to it, yielding a target subject foreground map and a background map. The electronic device then performs super-resolution reconstruction on the target subject foreground map and the background map respectively, and fuses the reconstructed foreground map and background map to obtain a target image whose resolution is greater than the first resolution, so that the detail processing effect on the target subject is improved and the detail processing effect of image reconstruction is improved as well.
FIG. 2 is a flow chart of an image processing method in one embodiment. The image processing method in this embodiment is described by taking the electronic device in FIG. 1 as an example. As shown in FIG. 2, the image processing method includes:
Step 202, acquiring an image to be processed with a first resolution.
The first resolution refers to the image resolution, which indicates the amount of information stored in an image, i.e., the number of pixels per inch of the image. The image to be processed may be obtained by photographing any scene with a camera, and may be a color image or a black-and-white image. The image to be processed may be stored locally on the electronic device, stored on another device or on a network, or captured by the electronic device in real time, which is not limited herein.
Specifically, the ISP processor or central processor of the electronic device may obtain the image to be processed with the first resolution locally, from another device, or from a network, or obtain it by photographing a scene at the first resolution with the camera.
Step 204, identifying a target subject in the image to be processed to obtain a target subject foreground map and a background map.
A subject may be any of various objects, such as a person, a flower, a cat, a dog, a cow, the blue sky, clouds, or the background. The target subject is the desired subject, which can be selected as required. Subject detection (salient object detection) refers to automatically processing the regions of interest in a scene while selectively ignoring the regions of no interest; the region of interest is called the subject region. The target subject foreground map is the image of the target subject region in the image to be processed, and the background map is the image of the remaining regions outside the target subject region.
Specifically, the electronic device may input the image to be processed into a subject detection model, identify the target subject in the image to be processed through the subject detection model, and segment the image to be processed into a target subject foreground map and a background map. Further, the subject detection model may output the segmented binarized mask map.
Step 206, performing super-resolution reconstruction on the target subject foreground map and the background map respectively.
The super-resolution reconstruction refers to reconstructing a low-resolution image or an image sequence to obtain a high-resolution image.
Specifically, after the electronic device obtains the target subject foreground map and the background map, both at the first resolution, through the subject detection model, it can input the target subject foreground map into an image reconstruction model. Super-resolution reconstruction is performed on the target subject foreground map through the image reconstruction model to obtain a reconstructed high-resolution target subject foreground map, whose resolution is greater than the first resolution. The electronic device can then perform super-resolution reconstruction on the first-resolution background map through a fast super-resolution algorithm, an interpolation algorithm, or the like to obtain a reconstructed high-resolution background map, whose resolution is also greater than the first resolution.
In this embodiment, the reconstructed target subject foreground map and the reconstructed background map may have the same resolution or different resolutions.
Step 208, fusing the reconstructed target subject foreground map and background map to obtain a target image, wherein the resolution of the target image is greater than the first resolution.
Specifically, the electronic device fuses and splices the reconstructed target subject foreground map and background map; the fused and spliced image is the target image. Likewise, the resolution of the reconstructed target image is greater than the first resolution of the image to be processed.
According to the image processing method, the image to be processed with the first resolution is acquired, and the target subject in it is identified to obtain a target subject foreground map and a background map. Super-resolution reconstruction is performed on the foreground map and the background map respectively, so that different super-resolution processing can be applied to each. The reconstructed foreground map and background map are then fused to obtain a target image whose resolution is greater than the first resolution, so the details of the image can be taken into account and the detail processing effect of image reconstruction is improved.
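As an illustration only, the overall flow can be sketched in a few lines of Python. The `segment` and `reconstruct_fg` callables stand in for the subject detection model and the image reconstruction model described below, and the 4× factor is an assumption, not a value from the patent:

```python
import cv2
import numpy as np

def process_image(image_lr, segment, reconstruct_fg, scale=4):
    """Sketch of the method: segment, reconstruct the two parts separately,
    then fuse them at the higher resolution.

    segment:        callable returning a float mask in [0, 1] (subject = 1)
    reconstruct_fg: callable performing learned super-resolution at `scale`
    """
    image_lr = image_lr.astype(np.float32)
    mask = segment(image_lr).astype(np.float32)           # (H, W), subject = 1
    fg_hr = reconstruct_fg(image_lr * mask[..., None])    # learned model: detail
    bg_hr = cv2.resize(image_lr * (1.0 - mask[..., None]), None,
                       fx=scale, fy=scale,
                       interpolation=cv2.INTER_CUBIC)     # cheap interpolation
    mask_hr = cv2.resize(mask, None, fx=scale, fy=scale,
                         interpolation=cv2.INTER_LINEAR)[..., None]
    return fg_hr * mask_hr + bg_hr * (1.0 - mask_hr)
```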
In one embodiment, performing super-resolution reconstruction on the target subject foreground map includes: performing feature extraction on the target subject foreground map through an image reconstruction model to obtain a feature map, wherein the image reconstruction model is trained in advance on subject foreground map pairs, each pair including a subject foreground map at the first resolution and the same subject foreground map at a second resolution; and performing super-resolution processing on the feature map through the image reconstruction model to obtain a target subject foreground map at the second resolution, wherein the second resolution is greater than the first resolution.
The feature map refers to an image obtained by extracting features of an image to be processed.
Specifically, the electronic device may collect a plurality of subject foreground map pairs in advance, each pair including a subject foreground map at the first resolution and the same subject foreground map at the second resolution. The first-resolution subject foreground map is input into an untrained image reconstruction model for super-resolution reconstruction, the subject foreground map output by the model is compared with the second-resolution subject foreground map, and the model is adjusted according to the difference. Training and adjustment are repeated until the difference between the subject foreground map reconstructed by the model and the second-resolution subject foreground map is smaller than a threshold, at which point training stops.
The electronic device inputs the target subject foreground map into the trained image reconstruction model. The model performs feature extraction on the foreground map through a convolution layer to obtain the corresponding feature map, and then converts the channel information of the feature map into spatial information to obtain a target subject foreground map at the second resolution, where the second resolution is greater than the first resolution.
According to the image processing method, features of the target subject foreground map are extracted through the trained image reconstruction model to obtain a feature map, and super-resolution processing is performed on the feature map through the model to obtain a target subject foreground map at the second resolution, which is greater than the first resolution. Local super-resolution reconstruction can thus be applied to the target subject foreground map, so that its details are processed better and the clarity of the target subject is guaranteed.
FIG. 3 is a schematic diagram of an image reconstruction model in one embodiment. As shown in FIG. 3, the image reconstruction model includes a convolution layer, a nonlinear mapping layer, and an upsampling layer. In the nonlinear mapping layer, residual units are sequentially cascaded with first convolution layers to form a cascade block. The nonlinear mapping layer comprises a plurality of cascade blocks, which are sequentially cascaded with second convolution layers to form the nonlinear mapping layer; the arrows in FIG. 3 denote these global cascade connections. The nonlinear mapping layer is connected to the upsampling layer, which converts the channel information of the image into spatial information and outputs a high-resolution image.
The electronic device inputs the first-resolution target subject foreground map into the convolution layer of the image reconstruction model for feature extraction to obtain a feature map. The feature map is input into the nonlinear mapping layer and processed by the first cascade block to obtain an output; the feature map output by the convolution layer is concatenated with the output of the first cascade block, and the concatenated feature map is input into the first of the first convolution layers for dimension reduction. The reduced feature map is then input into the second cascade block for processing; the feature map output by the convolution layer, the output of the first cascade block, and the output of the second cascade block are concatenated and input into the second of the first convolution layers for dimension reduction. Similarly, after the output of the Nth cascade block is obtained, the outputs of all cascade blocks before it and the feature map output by the convolution layer are concatenated and input into the Nth of the first convolution layers for dimension reduction, until the output of the last first convolution layer in the nonlinear mapping layer is obtained. The first convolution layers in this embodiment may be 1 × 1 pointwise convolutions.
The residual feature map output by the nonlinear mapping layer is input into the upsampling layer, which converts the channel information of the residual feature map into spatial information. For example, for a 4× super-resolution magnification, the feature map input into the upsampling layer must have 16 × 3 channels; after passing through the upsampling layer, this channel information is converted into spatial information, i.e., the upsampling layer finally outputs a three-channel color image at 4 times the size.
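This channel-to-space conversion corresponds to what deep learning frameworks call pixel shuffle. A minimal PyTorch sketch, assuming the 4× magnification of the example above (the layer itself is standard; the surrounding shapes are illustrative):

```python
import torch
import torch.nn as nn

scale = 4
upsample = nn.PixelShuffle(scale)  # rearranges channel blocks into space

# A residual feature map with 16 * 3 = 48 channels, as in the example.
x = torch.randn(1, scale**2 * 3, 32, 32)
y = upsample(x)
print(y.shape)  # torch.Size([1, 3, 128, 128]) - a 3-channel map at 4x size
```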
In one embodiment, each cascade block has the structure shown in FIG. 4: one cascade block includes three residual units and three first convolution layers, with the residual units sequentially cascaded with the first convolution layers. The residual units are connected through local cascade connections, which serve the same function as the global cascade connections. The feature map output by the convolution layer is taken as the input of the cascade block and processed by the first residual unit to obtain an output; the feature map output by the convolution layer and the output of the first residual unit are concatenated, and the concatenated feature map is input into the first of the first convolution layers for dimension reduction. Similarly, after the output of the nth residual unit is obtained, the outputs of all residual units before it and the feature map output by the convolution layer are concatenated and input into the nth of the first convolution layers for dimension reduction, until the output of the last first convolution layer in the cascade block is obtained. Note that the first convolution layers in this embodiment all refer to the first convolution layers within one cascade block; they may be 1 × 1 pointwise convolutions.
In one embodiment, as shown in FIG. 5, the 1 × 1 pointwise convolution corresponding to each residual unit in FIG. 4 may be replaced with a combination of a group convolution and a 1 × 1 pointwise convolution to reduce the number of parameters. It can be understood that the numbers of cascade blocks and first convolution layers in the image reconstruction model are not limited, and the numbers of residual units and first convolution layers in each cascade block are not limited either; they can be adjusted according to different requirements.
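Putting the two preceding paragraphs together, a minimal PyTorch sketch of one cascade block, under the assumption that each residual unit is a plain two-convolution residual block (the class names and channel sizes are illustrative, not taken from the patent):

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """A plain residual unit: two 3x3 convolutions with a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return torch.relu(x + self.body(x))

class CascadeBlock(nn.Module):
    """Residual units with local cascade connections: the block input and
    every earlier unit's output are concatenated, then a 1x1 pointwise
    convolution reduces the channels back down."""
    def __init__(self, channels, n_units=3):
        super().__init__()
        self.units = nn.ModuleList(ResidualUnit(channels) for _ in range(n_units))
        self.fuse = nn.ModuleList(
            nn.Conv2d(channels * (i + 2), channels, kernel_size=1)
            for i in range(n_units))

    def forward(self, x):
        cascade = [x]
        out = x
        for unit, conv1x1 in zip(self.units, self.fuse):
            cascade.append(unit(out))
            out = conv1x1(torch.cat(cascade, dim=1))  # dimension reduction
        return out
```

Replacing each 1 × 1 fusion convolution with a group convolution followed by a 1 × 1 pointwise convolution, as FIG. 5 suggests, would reduce the parameter count further.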
In one embodiment, as shown in FIG. 6, performing super-resolution reconstruction on the background map includes:
Step 602, performing super-resolution reconstruction on the background map through an interpolation algorithm to obtain a background map at a third resolution, wherein the third resolution is greater than the first resolution.
Interpolation algorithms include, but are not limited to, nearest neighbor interpolation, bilinear interpolation, and bicubic interpolation.
Specifically, the electronic device may perform super-resolution reconstruction on the first-resolution background map through at least one of a nearest neighbor interpolation algorithm, a bilinear interpolation algorithm, and a bicubic interpolation algorithm to obtain a reconstructed background map at a third resolution, where the third resolution is greater than the first resolution.
In this embodiment, the electronic device may also perform super-resolution reconstruction on the first-resolution background map through a fast super-resolution algorithm to obtain the reconstructed background map at the third resolution.
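As a sketch, the interpolation-based reconstruction of the background map maps directly onto OpenCV's resize; the file name and the 4× factor are illustrative assumptions:

```python
import cv2

bg = cv2.imread("background_lr.png")  # first-resolution background map
scale = 4                             # illustrative magnification

# Any of the interpolation algorithms named above can be used:
bg_nearest = cv2.resize(bg, None, fx=scale, fy=scale,
                        interpolation=cv2.INTER_NEAREST)
bg_bilinear = cv2.resize(bg, None, fx=scale, fy=scale,
                         interpolation=cv2.INTER_LINEAR)
bg_bicubic = cv2.resize(bg, None, fx=scale, fy=scale,
                        interpolation=cv2.INTER_CUBIC)
```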
The fusing the reconstructed target subject foreground map and background map to obtain a target image includes:
Step 604, adjusting the second-resolution target subject foreground map and the third-resolution background map to corresponding sizes.
Specifically, the electronic device may determine the size of the second-resolution target subject foreground map and adjust the size of the third-resolution background map accordingly, so that the reconstructed target subject foreground map and background map have the same size.
In this embodiment, the electronic device may instead adjust the size of the reconstructed target subject foreground map according to the size of the reconstructed background map, so that the two sizes are the same.
In this embodiment, the electronic device may also adjust both the reconstructed target subject foreground map and the reconstructed background map so that both reach the same target size.
Step 606, fusing the resized second-resolution target subject foreground map and third-resolution background map to obtain the target image.
Image fusion refers to the process of applying image processing and computer techniques to image data of the same scene acquired through multi-source channels, extracting the beneficial information in each channel to the maximum extent, and synthesizing a high-quality image.
Specifically, the electronic device may fuse the resized second-resolution target subject foreground map with the third-resolution background map. For example, the electronic device can process the reconstructed target subject foreground map and background map through a Poisson fusion algorithm or the like to obtain the target image.
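For the Poisson fusion option, OpenCV's seamlessClone is one readily available implementation. A sketch, under the assumption that the resized foreground, background, and an 8-bit subject mask are at hand (file names are illustrative):

```python
import cv2

fg_hr = cv2.imread("foreground_hr.png")  # resized reconstructed foreground
bg_hr = cv2.imread("background_hr.png")  # resized reconstructed background
mask = cv2.imread("mask_hr.png", cv2.IMREAD_GRAYSCALE)  # 255 = subject

h, w = mask.shape
center = (w // 2, h // 2)  # keep the subject at its original position
target = cv2.seamlessClone(fg_hr, bg_hr, mask, center, cv2.NORMAL_CLONE)
cv2.imwrite("target.png", target)
```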
According to the image processing method, super-resolution reconstruction is performed on the background map through an interpolation algorithm to obtain a background map at the third resolution, and the second-resolution target subject foreground map and the third-resolution background map are adjusted to corresponding sizes, so that images with different resolutions and sizes can be brought to the same size. The resized second-resolution target subject foreground map and third-resolution background map are then fused to obtain a complete reconstructed image, i.e., the target image.
In one embodiment, the electronic device may train an image reconstruction model for the background in advance on background sample pairs. Each background sample pair consists of two versions of the same background map, one of which is a labeled high-resolution background map. The unlabeled low-resolution background map is input into the untrained image reconstruction model for reconstruction, and the reconstructed background map is compared with the labeled high-resolution background map so as to continuously adjust the parameters of the model, stopping training when the threshold is met. The electronic device can then input the background map of the image to be processed into the trained image reconstruction model and perform super-resolution reconstruction on it through the model, obtaining a reconstructed background map whose resolution is greater than the first resolution.
In one embodiment, as shown in FIG. 7, the image processing method is applied to video processing; the image to be processed with the first resolution is each frame of the first-resolution video.
Specifically, when the image processing method is applied to video processing, a low-resolution video image can be reconstructed into a high-resolution image. The electronic device can take the resolution of the video to be processed as the first resolution, and each frame of the video is then an image to be processed at the first resolution.
The acquiring the image to be processed with the first resolution includes:
Step 702, obtaining each frame of the image to be processed in a video with the first resolution.
Specifically, the electronic device may obtain the first-resolution video locally, from another device, or from a network, or record the video itself. The electronic device may take each frame of the first-resolution video as an image to be processed.
The identifying the target subject in the image to be processed to obtain a target subject foreground map and a background map includes:
Step 704, identifying the target subject in each frame of the image to be processed in the video to obtain the target subject foreground map and background map of each frame.
The electronic device may input each frame of the image to be processed into the subject detection model, identify the target subject in each frame through the model, and segment each frame into a target subject foreground map and a background map. Further, the subject detection model can output the binarized mask map corresponding to the segmentation of each frame.
The performing super-resolution reconstruction on the target subject foreground map and the background map respectively includes:
Step 706, performing super-resolution reconstruction on the target subject foreground map and the background map in each frame of the image to be processed respectively.
Specifically, after the electronic device obtains the target subject foreground map and background map of each frame through the subject detection model, it can input the target subject foreground map of each frame into the image reconstruction model. Super-resolution reconstruction is performed on the foreground map of each frame through the image reconstruction model to obtain a reconstructed high-resolution target subject foreground map for each frame, whose resolution is greater than the first resolution. The electronic device can then perform super-resolution reconstruction on the background map of each frame through a fast super-resolution algorithm, an interpolation algorithm, or the like to obtain a reconstructed high-resolution background map for each frame, whose resolution is also greater than the first resolution.
In this embodiment, the reconstructed target subject foreground map and the reconstructed background map may have the same resolution or different resolutions.
In this embodiment, the reconstructed target subject foreground maps of all frames have the same resolution, and the reconstructed background maps of all frames have the same resolution.
In this embodiment, the reconstructed target subject foreground map and background map of each frame may also have the same resolution.
The fusing the reconstructed target subject foreground map and background map to obtain a target image, wherein the resolution of the target image is greater than the first resolution, includes:
Step 708, fusing the reconstructed target subject foreground map and background map corresponding to each frame of the image to be processed to obtain each frame of the target image.
Specifically, the electronic device may establish a mapping relationship between each frame of the image to be processed and its reconstructed target subject foreground map and background map, and then fuse and splice the reconstructed foreground map and background map that share a mapping relationship to obtain each frame of the target image. Likewise, the resolution of each reconstructed target image frame is greater than the first resolution of the corresponding frame of the image to be processed.
Step 710, generating a target video from the target image frames, wherein the resolution of the target video is greater than the first resolution.
Specifically, the electronic device may splice the target image frames in the order of the corresponding frames of the image to be processed to obtain a high-resolution video, i.e., the target video. The resolution of the target video is greater than the first resolution, and the resolution of each target image frame in the target video is greater than the first resolution.
The above image processing method is applied to a video processing scene. Each frame of the image to be processed in a first-resolution video is obtained; the target subject in each frame is identified to obtain the target subject foreground map and background map of each frame; super-resolution reconstruction is performed on the foreground map and background map of each frame respectively; the reconstructed foreground map and background map corresponding to each frame are fused to obtain each frame of the target image; and the target video, whose resolution is greater than the first resolution, is generated from the target image frames. A low-resolution video can thus be reconstructed into a high-resolution video, and performing different super-resolution reconstruction on the target subject foreground map and the background map improves the processing of image details.
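A sketch of the video application with OpenCV's video I/O; the `process_frame` callable stands in for the per-frame pipeline (for example, the hypothetical process_image sketch above with its callables bound), and the file names and 4× factor are illustrative:

```python
import cv2

def super_resolve_video(src_path, dst_path, process_frame, scale=4):
    """Apply a per-frame reconstruction to every frame of a video.

    process_frame must return a frame that is `scale` times larger per side.
    """
    cap = cv2.VideoCapture(src_path)                 # first-resolution video
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) * scale
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) * scale
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"),
                          fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out.write(process_frame(frame))  # each frame is an image to be processed
    cap.release()
    out.release()
```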
In one embodiment, as shown in FIG. 8, the identifying the target subject in the image to be processed includes:
Step 802, generating a center weight map corresponding to the image to be processed, wherein the weight values represented by the center weight map gradually decrease from the center to the edges.
The center weight map records the weight value of every pixel in the image to be processed. The weight values recorded in it gradually decrease from the center toward the four sides: the center weight is the largest, and the weights decrease toward the edges. The center weight map thus represents weight values that gradually decrease from the center pixels of the image to be processed to its edge pixels.
The ISP processor or the central processor may generate a corresponding center weight map according to the size of the image to be processed. The weight values it represents gradually decrease from the center to the four sides. The center weight map may be generated using a Gaussian function, a first-order equation, or a second-order equation; the Gaussian function may be a two-dimensional Gaussian function.
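A minimal sketch of such a center weight map built from a two-dimensional Gaussian function; the sigma value is an illustrative assumption:

```python
import numpy as np

def center_weight_map(h, w, sigma=0.5):
    """2D-Gaussian weights: largest at the image center, decaying toward
    the edges. sigma is relative to the half-size and is illustrative."""
    ys = np.linspace(-1.0, 1.0, h)[:, None]
    xs = np.linspace(-1.0, 1.0, w)[None, :]
    return np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))

weights = center_weight_map(480, 640)
print(weights[240, 320], weights[0, 0])  # center ~1.0, corner much smaller
```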
Step 804, inputting the image to be processed and the center weight map into a subject detection model to obtain a subject region confidence map, wherein the subject detection model is trained in advance on images to be processed, center weight maps, and the corresponding labeled subject mask maps of the same scenes.
The subject detection model is obtained by collecting a large amount of training data in advance and inputting it into a subject detection model containing initial network weights for training. Each group of training data comprises an image to be processed, a center weight map, and a labeled subject mask map corresponding to the same scene. The image to be processed and the center weight map are used as inputs of the model during training, and the labeled subject mask map is used as the ground truth the model is expected to output. The subject mask map is an image filter template for identifying the subject in an image; it can block the other parts of the image and screen out the subject. The subject detection model can be trained to identify and detect various subjects, such as people, flowers, cats, dogs, the background, etc.
Specifically, the ISP processor or the central processor may input the image to be processed and the center weight map into the subject detection model and perform detection to obtain the subject region confidence map. The subject region confidence map records the probability that a subject belongs to each recognizable category; for example, a certain pixel may belong to a person with probability 0.8, to a flower with probability 0.1, and to the background with probability 0.1.
Step 806, determining the target subject in the image to be processed according to the subject region confidence map.
Specifically, the ISP processor or the central processor may select, according to the subject region confidence map, the subject with the highest or second-highest confidence as the subject in the image to be processed. If there is one subject, it is taken as the target subject; if there are multiple subjects, one or more of them may be selected as the target subject as required.
According to the image processing method, after the image to be processed is obtained and its corresponding center weight map is generated, the two are input into the subject detection model for detection to obtain the subject region confidence map, from which the target subject in the image to be processed can be determined. The center weight map makes an object at the center of the image easier to detect, and the subject detection model, trained on images to be processed, center weight maps, subject mask maps, and the like, can identify the target subject in the image to be processed more accurately.
In one embodiment, as shown in FIG. 9, the determining the target subject in the image to be processed according to the subject region confidence map includes:
Step 902, processing the subject region confidence map to obtain a subject mask map.
Specifically, the subject region confidence map contains some points with low confidence and scattered points. The ISP processor or the central processor may filter the subject region confidence map to obtain the subject mask map. The filtering may use a configured confidence threshold to filter out the pixels whose confidence values are below the threshold; the confidence threshold may be an adaptive confidence threshold, a fixed threshold, or thresholds configured per region.
Step 904, detecting the image to be processed and determining the highlight region in it.
The highlight region is a region whose brightness values are greater than a brightness threshold.
Specifically, the ISP processor or the central processor performs highlight detection on the image to be processed, screens out the target pixels whose brightness values are greater than the brightness threshold, and applies connected-domain processing to the target pixels to obtain the highlight region.
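A sketch of this brightness screening and connected-domain step with OpenCV; the brightness threshold and the minimum-area filter are illustrative assumptions:

```python
import cv2
import numpy as np

gray = cv2.cvtColor(cv2.imread("to_process.png"), cv2.COLOR_BGR2GRAY)
_, bright = cv2.threshold(gray, 220, 255, cv2.THRESH_BINARY)  # brightness threshold

# Group target pixels into connected domains; each label > 0 is a candidate.
n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(bright)
highlight = np.zeros_like(bright)
for i in range(1, n_labels):             # label 0 is the background
    if stats[i, cv2.CC_STAT_AREA] > 50:  # drop tiny scattered specks
        highlight[labels == i] = 255
```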
Step 906, determining the target subject with highlights eliminated in the image to be processed according to the highlight region in the image to be processed and the subject mask map.
Specifically, the ISP processor or the central processor may perform a difference calculation or a logical AND calculation on the highlight region in the image to be processed and the subject mask map to obtain the target subject with highlights eliminated.
In this embodiment, the subject region confidence map is filtered to obtain the subject mask map, which improves the reliability of the confidence map. The image to be processed is detected to obtain the highlight region, which is then processed together with the subject mask map, so that a target subject free of highlights is obtained. The highlight and high-brightness regions that affect the precision of subject recognition are handled with a separate filter, which improves the accuracy and precision of subject recognition.
In one embodiment, the processing the subject region confidence map to obtain a subject mask map includes: performing adaptive confidence threshold filtering on the subject region confidence map to obtain a binarized mask map, wherein the binarized mask map comprises a subject region and a background region; and performing morphological processing and guided filtering on the binarized mask map to obtain the subject mask map.
Specifically, after filtering the subject region confidence map with the adaptive confidence threshold, the ISP processor or the central processor sets the confidence values of the retained pixels to 1 and the confidence values of the removed pixels to 0, thereby obtaining the binarized mask map.
Morphological processing may include erosion and dilation. The binarized mask map may first be eroded and then dilated to remove noise, and guided filtering may then be applied to the morphologically processed mask to realize an edge filtering operation, obtaining a subject mask map with extracted edges.
The morphological processing and the guided filtering ensure that the resulting subject mask map has little or no noise and softer edges.
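A sketch of this clean-up, assuming OpenCV: erosion followed by dilation removes speckle noise, and guidedFilter (available in the opencv-contrib ximgproc module) softens the edges. The kernel size, radius, and eps values are illustrative:

```python
import cv2
import numpy as np

mask = cv2.imread("binary_mask.png", cv2.IMREAD_GRAYSCALE)  # 0/255 mask
guide = cv2.imread("to_process.png")                        # original image

# Erode then dilate (morphological opening) to remove noise.
kernel = np.ones((5, 5), np.uint8)
cleaned = cv2.dilate(cv2.erode(mask, kernel), kernel)

# Edge-preserving smoothing guided by the original image.
subject_mask = cv2.ximgproc.guidedFilter(
    guide, cleaned.astype(np.float32) / 255.0, radius=8, eps=1e-3)
```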
In one embodiment, the binarized mask map comprises a subject region and a background region, and the fusing the reconstructed target subject foreground map and background map to obtain a target image includes: fusing the reconstructed target subject foreground map with the subject region in the binarized mask map, and fusing the reconstructed background map with the background region in the binarized mask map, to obtain the target image.
Specifically, the binarized mask map comprises a subject region and a background region; the subject region may be white and the background region black. The electronic device fuses the reconstructed target subject foreground map with the subject region in the binarized mask map, i.e., the white part, and fuses the reconstructed background map with the background region in the binarized mask map, i.e., the black part, thereby obtaining the target image.
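A minimal sketch of this mask-guided composition, assuming the mask has already been resized to the reconstructed resolution and normalized so that the subject region is 1 and the background region is 0:

```python
import numpy as np

def fuse_with_mask(fg_hr, bg_hr, mask):
    """Composite: foreground where the mask is 1, background where it is 0."""
    m = mask.astype(np.float32)[..., None]
    return (fg_hr.astype(np.float32) * m
            + bg_hr.astype(np.float32) * (1.0 - m)).astype(fg_hr.dtype)
```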
In one embodiment, the method further comprises: acquiring a depth map corresponding to the image to be processed, wherein the depth map comprises at least one of a TOF depth map, a binocular depth map, and a structured light depth map; and performing registration processing on the image to be processed and the depth map to obtain the image to be processed and the depth map registered for the same scene.
The depth map is a map containing depth information. A corresponding depth map is obtained by photographing the same scene with a depth camera or a binocular camera. The depth camera may be a structured light camera or a TOF camera, and the depth map may be at least one of a structured light depth map, a TOF depth map, and a binocular depth map.
Specifically, the electronic device can photograph the same scene through the ISP processor or the central processor to obtain the image to be processed and the corresponding depth map, and then register the image to be processed and the depth map using the camera calibration parameters to obtain the registered image to be processed and depth map.
In other embodiments, when a depth map cannot be captured, a simulated depth map may be automatically generated. The depth value of each pixel in the simulated depth map may be a preset value; the depth values of different pixels may also correspond to different preset values.
In one embodiment, the inputting the image to be processed and the center weight map into a subject detection model to obtain a subject region confidence map includes: inputting the registered image to be processed, the depth map, and the center weight map into the subject detection model to obtain the subject region confidence map, wherein the subject detection model is trained in advance on images to be processed, depth maps, center weight maps, and the corresponding labeled subject mask maps of the same scenes.
The subject detection model is obtained by collecting a large amount of training data in advance and inputting it into a subject detection model containing initial network weights for training. Each group of training data comprises an image to be processed, a depth map, a center weight map, and a labeled subject mask map corresponding to the same scene. The image to be processed and the center weight map are used as inputs of the model during training, and the labeled subject mask map is used as the ground truth the model is expected to output. The subject mask map is an image filter template for identifying the subject in an image; it can block the other parts of the image and screen out the subject. The subject detection model can be trained to identify and detect various subjects, such as people, flowers, cats, dogs, the background, etc.
In this embodiment, the depth map and the center weight map are used as inputs of the subject detection model. With the depth information of the depth map, an object closer to the camera is easier to detect; with the center-attention mechanism of the center weight map, in which the center weight is large and the edge weights are small, an object at the center of the image is easier to detect. Introducing the depth map enhances the subject with depth features, and introducing the center weight map enhances the subject with center-attention features, so that the target subject in a simple scene can be identified accurately and the accuracy of subject identification in a complex scene is greatly improved. Introducing the depth map also addresses the poor robustness of traditional target detection methods on natural images. Here, a simple scene is a scene with a single subject and low contrast in the background region.
FIG. 10 is a schematic diagram showing the effect of subject identification on an image to be processed in one embodiment. As shown in FIG. 10, the image to be processed is an RGB image 1002 containing a butterfly. The RGB image is input into the subject detection model to obtain a subject region confidence map 1004; the subject region confidence map 1004 is filtered and binarized to obtain a binarized mask map 1006; and morphological processing and guided filtering are applied to the binarized mask map 1006 for edge enhancement, yielding the subject mask map 1008.
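The post-processing chain of FIG. 10 (binarize, then morphology, then guided filtering) can be sketched with OpenCV as below. The fixed threshold stands in for the adaptive confidence threshold of the disclosure, the kernel size and the radius/eps of the guided filter are illustrative values, and cv2.ximgproc requires the opencv-contrib-python package.

import cv2
import numpy as np

def confidence_to_subject_mask(conf: np.ndarray, rgb: np.ndarray,
                               thresh: float = 0.5) -> np.ndarray:
    """Turn a subject region confidence map (float32 in [0, 1]) into an
    edge-enhanced subject mask map, mirroring the FIG. 10 pipeline."""
    # Binarize the confidence map into subject/background regions.
    mask = (conf > thresh).astype(np.uint8) * 255
    # Morphological open + close to remove speckle and fill small holes.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    # Guided filtering against the RGB image sharpens the mask edges.
    return cv2.ximgproc.guidedFilter(guide=rgb, src=mask, radius=8, eps=100.0)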
In one embodiment, there is provided an image processing method including:
Step (a1): acquire an image to be processed with a first resolution.
Step (a2): generate a center weight map corresponding to the image to be processed, where the weight values represented by the center weight map decrease gradually from the center to the edges.
Step (a3): input the image to be processed and the center weight map into a subject detection model to obtain a subject region confidence map, where the subject detection model is a model trained in advance on images to be processed, center weight maps and correspondingly labeled subject mask maps of the same scenes.
Step (a4): perform adaptive confidence threshold filtering on the subject region confidence map to obtain a binarized mask map comprising a subject region and a background region.
Step (a5): perform morphological processing and guided filtering on the binarized mask map to obtain a subject mask map.
Step (a6): detect the image to be processed and determine the highlight regions in it.
Step (a7): determine a highlight-free target subject in the image to be processed according to the highlight regions and the subject mask map, obtaining a target subject foreground map and a background map.
Step (a8): extract features of the target subject foreground map through an image reconstruction model to obtain a feature map, where the image reconstruction model is a model trained in advance on subject foreground image pairs, each pair comprising a subject foreground map with the first resolution and the same subject foreground map with a second resolution.
Step (a9): perform super-resolution processing on the feature map through the image reconstruction model to obtain a target subject foreground map with the second resolution, the second resolution being greater than the first resolution.
Step (a10): perform super-resolution reconstruction on the background map through an interpolation algorithm to obtain a background map with a third resolution, the third resolution being greater than the first resolution.
Step (a11): adjust the second-resolution target subject foreground map and the third-resolution background map to corresponding sizes.
Step (a12): fuse the resized second-resolution target subject foreground map with the subject region in the binarized mask map, and fuse the resized third-resolution background map with the background region in the binarized mask map, to obtain a target image.
According to this image processing method, subject recognition is performed on the first-resolution image to be processed by the subject detection model, so the target subject foreground map and the background map can be obtained quickly and accurately. Super-resolution reconstruction of the target subject foreground map through the image reconstruction model handles the details of the foreground better, so the reconstructed foreground is clearer; super-resolution reconstruction of the background map through an interpolation algorithm preserves the clarity of the target subject while keeping the reconstruction fast. The reconstructed foreground and background maps, which have different resolutions, are adjusted to the same size and fused with the corresponding regions of the binarized mask map to obtain the target image (a runnable sketch of steps (a8) through (a12) is given below). This scheme avoids the problem of traditional super-resolution reconstruction, which treats every region of the picture identically and cannot balance image detail against efficiency.
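The condensed sketch of steps (a8) through (a12) follows, under stated assumptions: sr_model is a placeholder for the trained image reconstruction model, bicubic interpolation plays the role of the interpolation algorithm, and the scale factors are illustrative.

import cv2
import numpy as np

def reconstruct_and_fuse(fg, bg, bin_mask, sr_model,
                         fg_scale: int = 4, bg_scale: int = 2) -> np.ndarray:
    """Fuse a model-super-resolved foreground with an interpolation-
    super-resolved background, guided by the binarized mask map."""
    h, w = fg.shape[:2]
    target_size = (w * fg_scale, h * fg_scale)           # second resolution
    fg_sr = sr_model(fg)                                 # steps (a8)-(a9)
    bg_sr = cv2.resize(bg, None, fx=bg_scale, fy=bg_scale,
                       interpolation=cv2.INTER_CUBIC)    # step (a10), third resolution
    # Step (a11): bring both branches and the mask to a common size.
    fg_sr = cv2.resize(fg_sr, target_size, interpolation=cv2.INTER_CUBIC)
    bg_sr = cv2.resize(bg_sr, target_size, interpolation=cv2.INTER_CUBIC)
    mask = cv2.resize(bin_mask, target_size, interpolation=cv2.INTER_NEAREST)
    mask3 = (mask > 0)[..., None].astype(np.float32)
    # Step (a12): subject pixels come from the foreground branch, the
    # rest from the background branch.
    return (fg_sr * mask3 + bg_sr * (1.0 - mask3)).astype(np.uint8)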
FIG. 11 is a schematic diagram of the image processing method in one embodiment. The electronic device inputs the first-resolution image to be processed into the subject detection model to obtain the target subject foreground map and the background map. Super-resolution reconstruction is performed on the target subject foreground map through an image reconstruction model built from a cascaded residual network, and on the background map through an interpolation algorithm. The reconstructed foreground and background maps are fused to obtain the target image, whose resolution is greater than the first resolution.
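The cascaded residual network itself is not detailed in this description, so the toy PyTorch model below is only a generic member of that family: the block count, channel width and pixel-shuffle upsampling are assumptions, not the disclosed design.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)           # residual connection

class CascadedResidualSR(nn.Module):
    """Toy reconstruction model: cascaded residual blocks extract the
    feature map, then pixel-shuffle upsampling produces the
    second-resolution foreground map."""
    def __init__(self, blocks: int = 8, ch: int = 64, scale: int = 4):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)
        self.blocks = nn.Sequential(*[ResidualBlock(ch) for _ in range(blocks)])
        self.tail = nn.Sequential(
            nn.Conv2d(ch, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))

    def forward(self, x):
        feat = self.blocks(self.head(x))  # feature extraction
        return self.tail(feat)            # super-resolution output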
It should be understood that, although the steps in the flowcharts of FIGS. 2-9 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-9 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and their execution order is not necessarily sequential: they may be performed in turn or alternately with at least part of the other steps, or of the sub-steps or stages of other steps.
FIG. 12 is a block diagram of an image processing apparatus of an embodiment. As shown in FIG. 12, the apparatus includes: an acquisition module 1202, an identification module 1204, a reconstruction module 1206, and a fusion module 1208.
The acquisition module 1202 is configured to acquire an image to be processed with a first resolution.
The identification module 1204 is configured to identify a target subject in the image to be processed to obtain a target subject foreground map and a background map.
The reconstruction module 1206 is configured to perform super-resolution reconstruction on the target subject foreground map and the background map, respectively.
The fusion module 1208 is configured to fuse the reconstructed target subject foreground map and background map to obtain a target image, where the resolution of the target image is greater than the first resolution.
According to this image processing apparatus, the first-resolution image to be processed is acquired and the target subject in it is identified, obtaining the target subject foreground map and the background map. Super-resolution reconstruction is then performed on the foreground map and the background map separately, applying a different super-resolution treatment to each. The reconstructed foreground and background maps are fused to obtain a target image whose resolution is greater than the first resolution, so the details of the image are taken into account and the detail-processing effect of image reconstruction is improved.
In one embodiment, the reconstruction module 1206 is further configured to: extract features of the target subject foreground map through an image reconstruction model to obtain a feature map, where the image reconstruction model is a model trained in advance on subject foreground image pairs, each pair comprising a subject foreground map with the first resolution and the same subject foreground map with a second resolution; and perform super-resolution processing on the feature map through the image reconstruction model to obtain a target subject foreground map with the second resolution, the second resolution being greater than the first resolution.
In this image processing apparatus, the features of the target subject foreground map are extracted through the trained image reconstruction model to obtain a feature map, and super-resolution processing is performed on the feature map through the same model to obtain a target subject foreground map with the second resolution, which is greater than the first. Local super-resolution reconstruction of the target subject foreground map handles its details better, so the clarity of the target subject is guaranteed.
In one embodiment, the reconstruction module 1206 is further configured to: perform super-resolution reconstruction on the background map through an interpolation algorithm to obtain a background map with a third resolution, the third resolution being greater than the first resolution; adjust the second-resolution target subject foreground map and the third-resolution background map to corresponding sizes; and fuse the resized second-resolution target subject foreground map with the resized third-resolution background map to obtain a target image.
In the image processing apparatus of this embodiment, the interpolation algorithm reconstructs the background map in super-resolution to obtain the third-resolution background map, and the second-resolution target subject foreground map and the third-resolution background map are adjusted to corresponding sizes, so images of different resolutions and sizes can be brought to the same size. Fusing the resized foreground and background maps then yields a complete reconstructed image, that is, the target image.
In one embodiment, the image processing method is applied to video processing, and the first-resolution image to be processed is each frame of images to be processed in a first-resolution video.
The acquisition module 1202 is further configured to acquire each frame of image to be processed in the first-resolution video.
The identification module 1204 is further configured to identify the target subject in each frame of image to be processed in the video, obtaining a target subject foreground map and a background map for each frame.
The reconstruction module 1206 is further configured to perform super-resolution reconstruction on the target subject foreground map and the background map of each frame, respectively.
The fusion module 1208 is further configured to fuse the reconstructed target subject foreground map and background map corresponding to each frame to obtain each frame of the target image, and to generate a target video from the frames of the target image, the resolution of the target video being greater than the first resolution.
This image processing apparatus is applied to video processing scenes. Each frame of image to be processed in a first-resolution video is acquired; the target subject in each frame is identified to obtain a per-frame target subject foreground map and background map; super-resolution reconstruction is performed on the foreground and background maps of each frame separately; the reconstructed maps of each frame are fused to obtain each frame of the target image; and a target video is generated from these frames, its resolution greater than the first resolution. A low-resolution video is thereby reconstructed into a high-resolution video, and applying different super-resolution reconstruction to the target subject foreground map and the background map improves the treatment of image details.
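At the video level the per-frame pipeline reduces to a read-process-write loop; a sketch follows, where process_frame is assumed to wrap the detect, reconstruct and fuse steps described above and the mp4v codec is an illustrative choice.

import cv2

def super_resolve_video(src_path: str, dst_path: str, process_frame, scale: int = 4):
    """Read each first-resolution frame, reconstruct it, and write the
    fused target frames out as the target video."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) * scale
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) * scale
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(process_frame(frame))  # each target frame is scale x larger
    cap.release()
    writer.release()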
In one embodiment, the identification module 1204 is further configured to: generate a center weight map corresponding to the image to be processed, where the weight values represented by the center weight map decrease gradually from the center to the edges; input the image to be processed and the center weight map into a subject detection model to obtain a subject region confidence map, where the subject detection model is a model trained in advance on images to be processed, center weight maps and correspondingly labeled subject mask maps of the same scenes; and determine the target subject in the image to be processed according to the subject region confidence map.
The image processing apparatus in this embodiment acquires the image to be processed, generates the corresponding center weight map, and inputs both into the subject detection model for detection, obtaining the subject region confidence map from which the target subject is determined. The center weight map makes objects at the center of the image easier to detect, and the subject detection model, trained on images to be processed, center weight maps, subject mask maps and the like, identifies the target subject in the image to be processed more accurately.
In one embodiment, the identification module 1204 is further configured to: process the subject region confidence map to obtain a subject mask map; detect the image to be processed and determine the highlight regions in it; and determine a highlight-free target subject in the image to be processed according to the highlight regions and the subject mask map.
In this embodiment, the subject region confidence map is filtered to obtain the subject mask map, which improves its reliability; the image to be processed is detected to obtain the highlight regions, which are then processed together with the subject mask map to obtain a highlight-free target subject. Treating the highlights and highlight regions that degrade subject recognition separately with a dedicated filter improves the accuracy and precision of subject recognition.
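The filter used for highlight detection is not specified here; as a hedged stand-in, the sketch below treats near-saturated pixels in the HSV value channel as highlights and subtracts them from the subject mask, with the threshold an assumed parameter.

import cv2
import numpy as np

def remove_highlights_from_mask(image: np.ndarray, subject_mask: np.ndarray,
                                value_thresh: int = 240) -> np.ndarray:
    """Detect highlight regions as near-saturated pixels and remove them
    from the subject mask, leaving a highlight-free target subject."""
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    highlight = (hsv[..., 2] >= value_thresh).astype(np.uint8) * 255
    # Dilate slightly so halo pixels around each highlight are removed too.
    highlight = cv2.dilate(highlight, np.ones((5, 5), np.uint8))
    return cv2.bitwise_and(subject_mask, cv2.bitwise_not(highlight))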
In one embodiment, the identification module 1204 is further configured to: perform adaptive confidence threshold filtering on the subject region confidence map to obtain a binarized mask map comprising a subject region and a background region; and perform morphological processing and guided filtering on the binarized mask map to obtain a subject mask map.
The fusion module 1208 is further configured to fuse the reconstructed target subject foreground map with the subject region in the binarized mask map and to fuse the reconstructed background map with the background region in the binarized mask map, obtaining the target image.
In one embodiment, the acquisition module 1202 is further configured to: acquire a depth map corresponding to the image to be processed, the depth map comprising at least one of a TOF depth map, a binocular depth map and a structured light depth map; and register the image to be processed with the depth map to obtain an image to be processed and a depth map aligned to the same scene.
The identification module 1204 is further configured to input the registered image to be processed, the depth map and the center weight map into a subject detection model to obtain a subject region confidence map, where the subject detection model is a model trained in advance on images to be processed, depth maps, center weight maps and correspondingly labeled subject mask maps of the same scenes.
In this embodiment, as explained above, using the depth map and the center weight map as inputs to the subject detection model makes objects close to the camera and objects at the image center easier to detect, allows accurate subject identification in simple scenes, greatly improves identification accuracy in complex scenes, and addresses the poor robustness of traditional target detection methods to subjects in natural images.
The division of the image processing apparatus into the above modules is merely illustrative; in other embodiments, the apparatus may be divided into different modules as needed to accomplish all or part of its functions.
FIG. 13 is a schematic diagram of the internal structure of an electronic device in one embodiment. As shown in FIG. 13, the electronic device includes a processor and a memory connected by a system bus. The processor provides computing and control capabilities to support the operation of the entire device. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, which can be executed by the processor to implement the image processing method provided in the embodiments. The internal memory provides a cached runtime environment for the operating system and the computer program in the non-volatile storage medium. The electronic device may be a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or the like.
Each module in the image processing apparatus provided in the embodiments of the present application may be implemented in the form of a computer program. The computer program may run on a terminal or a server, and its program modules may be stored in the memory of the terminal or server. When the computer program is executed by a processor, the steps of the methods described in the embodiments of the present application are performed.
The embodiments of the present application also provide a computer-readable storage medium: one or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the steps of the image processing method.
A computer program product comprising instructions is also provided which, when run on a computer, causes the computer to perform the image processing method.
Any reference to memory, storage, a database or another medium used in the embodiments of the present application may include non-volatile and/or volatile memory. Suitable non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The foregoing examples present only a few embodiments of the present application. They are described in specific detail, but are not therefore to be understood as limiting the scope of the patent application. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within its scope of protection. Accordingly, the scope of protection of the present application shall be determined by the appended claims.

Claims (10)

1. An image processing method, comprising:
acquiring an image to be processed with a first resolution;
identifying a target subject in the image to be processed to obtain a target subject foreground map and a background map;
extracting features of the target subject foreground map through an image reconstruction model to obtain a feature map, wherein the image reconstruction model is a model trained in advance on subject foreground image pairs, each pair comprising a subject foreground map with the first resolution and the same subject foreground map with a second resolution;
performing super-resolution processing on the feature map through the image reconstruction model to obtain a target subject foreground map with the second resolution, wherein the second resolution is greater than the first resolution;
performing super-resolution reconstruction on the background map through an interpolation algorithm to obtain a background map with a third resolution, wherein the third resolution is greater than the first resolution;
and fusing the reconstructed target subject foreground map and background map to obtain a target image, wherein the resolution of the target image is greater than the first resolution.
2. The method according to claim 1, wherein the fusing the reconstructed target subject foreground map and background map to obtain the target image comprises:
adjusting the target subject foreground map with the second resolution and the background map with the third resolution to corresponding sizes;
and fusing the resized target subject foreground map with the second resolution and the resized background map with the third resolution to obtain the target image.
3. The method according to claim 1, wherein the image processing method is applied to video processing, and the image to be processed with the first resolution is each frame of images to be processed in a video with the first resolution;
the acquiring an image to be processed with a first resolution comprises:
acquiring each frame of image to be processed in the video with the first resolution;
the identifying a target subject in the image to be processed to obtain a target subject foreground map and a background map comprises:
identifying the target subject in each frame of image to be processed in the video to obtain a target subject foreground map and a background map for each frame of image to be processed;
the performing super-resolution processing on the feature map through the image reconstruction model to obtain a target subject foreground map with the second resolution comprises:
extracting features of the target subject foreground map of each frame of image to be processed through the image reconstruction model to obtain feature maps respectively corresponding to the target subject foreground maps;
performing super-resolution processing on the feature maps corresponding to the target subject foreground maps through the image reconstruction model to obtain a target subject foreground map with the second resolution corresponding to each frame of image to be processed;
the performing super-resolution reconstruction on the background map through an interpolation algorithm to obtain a background map with a third resolution comprises:
performing super-resolution reconstruction on the background map in each frame of image to be processed through the interpolation algorithm to obtain a background map with the third resolution corresponding to each frame of image to be processed, wherein the third resolution is greater than the first resolution;
the fusing the reconstructed target subject foreground map and background map to obtain a target image, wherein the resolution of the target image is greater than the first resolution, comprises:
fusing the reconstructed target subject foreground map and background map corresponding to each frame of image to be processed to obtain each frame of target image;
and generating a target video according to each frame of target image, wherein the resolution of the target video is greater than the first resolution.
4. The method of claim 1, wherein the identifying the target subject in the image to be processed comprises:
generating a center weight map corresponding to the image to be processed, wherein the weight values represented by the center weight map decrease gradually from the center to the edges;
inputting the image to be processed and the center weight map into a subject detection model to obtain a subject region confidence map, wherein the subject detection model is a model trained in advance on an image to be processed, a center weight map and a correspondingly labeled subject mask map of the same scene;
and determining the target subject in the image to be processed according to the subject region confidence map.
5. The method of claim 4, wherein the determining the target subject in the image to be processed according to the subject region confidence map comprises:
processing the subject region confidence map to obtain a subject mask map;
detecting the image to be processed and determining the highlight regions in the image to be processed;
and determining a highlight-free target subject in the image to be processed according to the highlight regions in the image to be processed and the subject mask map.
6. The method of claim 5, wherein the processing the subject region confidence map to obtain a subject mask map comprises:
performing adaptive confidence threshold filtering on the subject region confidence map to obtain a binarized mask map, wherein the binarized mask map comprises a subject region and a background region;
and performing morphological processing and guided filtering on the binarized mask map to obtain the subject mask map;
and wherein the fusing the reconstructed target subject foreground map and background map to obtain a target image comprises:
fusing the reconstructed target subject foreground map with the subject region in the binarized mask map, and fusing the reconstructed background map with the background region in the binarized mask map, to obtain the target image.
7. The method according to claim 4, wherein the method further comprises:
acquiring a depth map corresponding to the image to be processed, wherein the depth map comprises at least one of a TOF depth map, a binocular depth map and a structured light depth map;
and registering the image to be processed with the depth map to obtain an image to be processed and a depth map aligned to the same scene;
wherein the inputting the image to be processed and the center weight map into a subject detection model to obtain a subject region confidence map comprises:
inputting the registered image to be processed, the depth map and the center weight map into the subject detection model to obtain the subject region confidence map, wherein the subject detection model is a model trained in advance on an image to be processed, a depth map, a center weight map and a correspondingly labeled subject mask map of the same scene.
8. An image processing apparatus, comprising:
an acquisition module configured to acquire an image to be processed with a first resolution;
an identification module configured to identify a target subject in the image to be processed to obtain a target subject foreground map and a background map;
a reconstruction module configured to extract features of the target subject foreground map through an image reconstruction model to obtain a feature map, wherein the image reconstruction model is a model trained in advance on subject foreground image pairs, each pair comprising a subject foreground map with the first resolution and the same subject foreground map with a second resolution; to perform super-resolution processing on the feature map through the image reconstruction model to obtain a target subject foreground map with the second resolution, wherein the second resolution is greater than the first resolution; and to perform super-resolution reconstruction on the background map through an interpolation algorithm to obtain a background map with a third resolution, wherein the third resolution is greater than the first resolution;
and a fusion module configured to fuse the reconstructed target subject foreground map and background map to obtain a target image, wherein the resolution of the target image is greater than the first resolution.
9. An electronic device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to perform the steps of the image processing method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the image processing method according to any one of claims 1 to 7.
CN201910683492.1A 2019-07-26 2019-07-26 Image processing method and device, electronic equipment and computer readable storage medium Active CN110428366B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910683492.1A CN110428366B (en) 2019-07-26 2019-07-26 Image processing method and device, electronic equipment and computer readable storage medium
PCT/CN2020/101817 WO2021017811A1 (en) 2019-07-26 2020-07-14 Image processing method and apparatus, electronic device, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910683492.1A CN110428366B (en) 2019-07-26 2019-07-26 Image processing method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110428366A CN110428366A (en) 2019-11-08
CN110428366B true CN110428366B (en) 2023-10-13

Family

ID=68412750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910683492.1A Active CN110428366B (en) 2019-07-26 2019-07-26 Image processing method and device, electronic equipment and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN110428366B (en)
WO (1) WO2021017811A1 (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428366B (en) * 2019-07-26 2023-10-13 Oppo广东移动通信有限公司 Image processing method and device, electronic equipment and computer readable storage medium
CN111047526B (en) * 2019-11-22 2023-09-26 北京达佳互联信息技术有限公司 Image processing method and device, electronic equipment and storage medium
WO2021102857A1 (en) * 2019-11-28 2021-06-03 深圳市大疆创新科技有限公司 Image processing method, apparatus, and device, and storage medium
CN111091506A (en) * 2019-12-02 2020-05-01 RealMe重庆移动通信有限公司 Image processing method and device, storage medium and electronic equipment
CN111161369B (en) * 2019-12-20 2024-04-23 上海联影智能医疗科技有限公司 Image reconstruction storage method, device, computer equipment and storage medium
CN111145202B (en) * 2019-12-31 2024-03-08 北京奇艺世纪科技有限公司 Model generation method, image processing method, device, equipment and storage medium
CN111163265A (en) * 2019-12-31 2020-05-15 成都旷视金智科技有限公司 Image processing method, image processing device, mobile terminal and computer storage medium
JP2021170284A (en) * 2020-04-17 2021-10-28 富士フイルムビジネスイノベーション株式会社 Information processing device and program
CN111598776B (en) * 2020-04-29 2023-06-30 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic apparatus
CN111553846B (en) * 2020-05-12 2023-05-26 Oppo广东移动通信有限公司 Super-resolution processing method and device
WO2022011657A1 (en) * 2020-07-16 2022-01-20 Oppo广东移动通信有限公司 Image processing method and apparatus, electronic device, and computer-readable storage medium
CN112001940B (en) * 2020-08-21 2023-04-07 Oppo(重庆)智能科技有限公司 Image processing method and device, terminal and readable storage medium
CN111932594B (en) * 2020-09-18 2023-12-19 西安拙河安见信息科技有限公司 Billion pixel video alignment method and device based on optical flow and medium
CN112184554B (en) * 2020-10-13 2022-08-23 重庆邮电大学 Remote sensing image fusion method based on residual mixed expansion convolution
CN112381717A (en) * 2020-11-18 2021-02-19 北京字节跳动网络技术有限公司 Image processing method, model training method, device, medium, and apparatus
CN112418167A (en) * 2020-12-10 2021-02-26 深圳前海微众银行股份有限公司 Image clustering method, device, equipment and storage medium
CN113240687A (en) * 2021-05-17 2021-08-10 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and readable storage medium
CN113362224A (en) * 2021-05-31 2021-09-07 维沃移动通信有限公司 Image processing method and device, electronic equipment and readable storage medium
CN114049254B (en) * 2021-10-29 2022-11-29 华南农业大学 Low-pixel ox-head image reconstruction and identification method, system, equipment and storage medium
CN114067122B (en) * 2022-01-18 2022-04-08 深圳市绿洲光生物技术有限公司 Two-stage binarization image processing method
CN114630129A (en) * 2022-02-07 2022-06-14 浙江智慧视频安防创新中心有限公司 Video coding and decoding method and device based on intelligent digital retina
CN117440104B (en) * 2023-12-21 2024-03-29 北京遥感设备研究所 Data compression reconstruction method based on target significance characteristics

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102800085A (en) * 2012-06-21 2012-11-28 西南交通大学 Method for detecting and extracting main target image in complicated image
CN102842119A (en) * 2012-08-18 2012-12-26 湖南大学 Quick document image super-resolution method based on image matting and edge enhancement
CN105741252A (en) * 2015-11-17 2016-07-06 西安电子科技大学 Sparse representation and dictionary learning-based video image layered reconstruction method
CN108764370A (en) * 2018-06-08 2018-11-06 Oppo广东移动通信有限公司 Image processing method, device, computer readable storage medium and computer equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6941011B2 (en) * 2002-06-27 2005-09-06 Hewlett-Packard Development Company, Lp. Method and system for image processing including mixed resolution, multi-channel color compression, transmission and decompression
KR101874482B1 (en) * 2012-10-16 2018-07-05 삼성전자주식회사 Apparatus and method of reconstructing 3-dimension super-resolution image from depth image
CN103810685B (en) * 2014-02-25 2016-05-25 清华大学深圳研究生院 A kind of super-resolution processing method of depth map
US10692220B2 (en) * 2017-10-18 2020-06-23 International Business Machines Corporation Object classification based on decoupling a background from a foreground of an image
CN110428366B (en) * 2019-07-26 2023-10-13 Oppo广东移动通信有限公司 Image processing method and device, electronic equipment and computer readable storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102800085A (en) * 2012-06-21 2012-11-28 西南交通大学 Method for detecting and extracting main target image in complicated image
CN102842119A (en) * 2012-08-18 2012-12-26 湖南大学 Quick document image super-resolution method based on image matting and edge enhancement
CN105741252A (en) * 2015-11-17 2016-07-06 西安电子科技大学 Sparse representation and dictionary learning-based video image layered reconstruction method
CN108764370A (en) * 2018-06-08 2018-11-06 Oppo广东移动通信有限公司 Image processing method, device, computer readable storage medium and computer equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Unified multiframe super-resolution of matte, foreground, and background; Prabhu S M; Journal of the Optical Society of America A: Optics, Image Science & Vision; 2013-12-31; pp. 1524-1534 *
Image super-resolution reconstruction based on sparse representation and guided filtering; Zhang Wanxu et al.; Computer Engineering; 2018-09-15 (No. 09); full text *

Also Published As

Publication number Publication date
WO2021017811A1 (en) 2021-02-04
CN110428366A (en) 2019-11-08

Similar Documents

Publication Publication Date Title
CN110428366B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN110276767B (en) Image processing method and device, electronic equipment and computer readable storage medium
WO2021022983A1 (en) Image processing method and apparatus, electronic device and computer-readable storage medium
CN110248096B (en) Focusing method and device, electronic equipment and computer readable storage medium
US11132771B2 (en) Bright spot removal using a neural network
CN110149482B (en) Focusing method, focusing device, electronic equipment and computer readable storage medium
WO2019233393A1 (en) Image processing method and apparatus, storage medium, and electronic device
CN113766125B (en) Focusing method and device, electronic equipment and computer readable storage medium
WO2021057474A1 (en) Method and apparatus for focusing on subject, and electronic device, and storage medium
EP3480784B1 (en) Image processing method, and device
EP3793188A1 (en) Image processing method, electronic device, and computer readable storage medium
CN110349163B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN110276831B (en) Method and device for constructing three-dimensional model, equipment and computer-readable storage medium
CN110490196B (en) Subject detection method and apparatus, electronic device, and computer-readable storage medium
CN112651911B (en) High dynamic range imaging generation method based on polarized image
CN110365897B (en) Image correction method and device, electronic equipment and computer readable storage medium
CN110392211A (en) Image processing method and device, electronic equipment, computer readable storage medium
CN112581481B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN110688926B (en) Subject detection method and apparatus, electronic device, and computer-readable storage medium
CN110399823B (en) Subject tracking method and apparatus, electronic device, and computer-readable storage medium
CN113298829B (en) Image processing method, device, electronic equipment and computer readable storage medium
CN109118427B (en) Image light effect processing method and device, electronic equipment and storage medium
CN110475044B (en) Image transmission method and device, electronic equipment and computer readable storage medium
CN112866552B (en) Focusing method and device, electronic equipment and computer readable storage medium
US20240155248A1 (en) Method and apparatus for generating high-dynamic-range image, and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant