WO2023137914A1 - Image processing method and apparatus, electronic device, and storage medium - Google Patents


Info

Publication number
WO2023137914A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
foreground
sample
foreground image
matting
Prior art date
Application number
PCT/CN2022/090713
Other languages
French (fr)
Chinese (zh)
Inventor
郑喜民
翟尤
舒畅
陈又新
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2023137914A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Definitions

  • the present application relates to the technical fields of artificial intelligence and image processing, and in particular to an image processing method and apparatus, an electronic device, and a storage medium.
  • the performance of existing methods depends on the quality of the annotation, which often results in lower image quality after matting. Therefore, how to provide an image processing method that can improve image quality after matting has become an urgent technical problem to be solved.
  • the embodiment of the present application proposes an image processing method, the method comprising: acquiring an original image to be processed;
  • performing preliminary matting processing on the original image through the backbone network of a pre-trained matting model to obtain an initial foreground image;
  • performing local refinement processing on the edge region of the initial foreground image through the fine-tuning network of the matting model to obtain a target foreground image;
  • performing super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than that of the target foreground image;
  • performing image fusion on the standard foreground image and a preset background image to obtain a target image.
  • the embodiment of the present application proposes an image processing device, the device includes:
  • the original image acquisition module is used to acquire the original image to be processed
  • the preliminary image matting module is used to perform preliminary matting processing on the original image through the backbone network of the pre-trained matting model to obtain an initial foreground image;
  • a local refinement module configured to perform local refinement processing on the edge region of the initial foreground image through the fine-tuning network of the matting model to obtain a target foreground image
  • a super-resolution reconstruction module configured to perform super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than the resolution of the target foreground image;
  • the image fusion module is used to perform image fusion on the standard foreground image and the preset background image to obtain the target image.
  • an embodiment of the present application proposes an electronic device, the electronic device includes a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for realizing connection and communication between the processor and the memory, wherein when the program is executed by the processor, an image processing method is implemented.
  • the image processing method includes: obtaining an original image to be processed; performing preliminary matting processing on the original image through the backbone network of a pre-trained matting model to obtain an initial foreground image; performing local refinement processing on the edge region of the initial foreground image through the fine-tuning network of the matting model to obtain a target foreground image; performing super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than the resolution of the target foreground image; and performing image fusion on the standard foreground image and a preset background image to obtain the target image.
  • an embodiment of the present application proposes a storage medium, the storage medium is a computer-readable storage medium for computer-readable storage, the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement an image processing method, wherein the image processing method includes: obtaining an original image to be processed; performing preliminary matting processing on the original image through the backbone network of a pre-trained matting model to obtain an initial foreground image; performing local refinement processing on the edge region of the initial foreground image through the fine-tuning network of the matting model to obtain a target foreground image; performing super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than the resolution of the target foreground image; and performing image fusion on the standard foreground image and a preset background image to obtain the target image.
  • the image processing method, device, electronic device and storage medium proposed in the present application can obtain a foreground image with a better matting effect through a matting model. Furthermore, by performing super-resolution reconstruction on the target foreground image through the pre-trained image reconstruction model, a clearer standard foreground image can be obtained, and the matting effect is enhanced visually. Finally, image fusion is performed on the standard foreground image and the preset background image, so that the target image has a higher resolution, thereby improving the image quality.
  • Fig. 1 is a flow chart of the image processing method provided by the embodiment of the present application.
  • Fig. 2 is another flow chart of the image processing method provided by the embodiment of the present application.
  • Fig. 3 is the flowchart of step S102 in Fig. 1;
  • Fig. 4 is the flowchart of step S103 in Fig. 1;
  • Fig. 5 is another flow chart of the image processing method provided by the embodiment of the present application.
  • Fig. 6 is the flowchart of step S104 in Fig. 1;
  • Fig. 7 is the flowchart of step S105 in Fig. 1;
  • FIG. 8 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present application.
  • Image matting means that for a given picture, the network can automatically extract the foreground part and delete the background part. It is a common method in the field of image enhancement.
  • Image fusion refers to applying image processing and computer techniques to image data of the same target collected through multi-source channels, so as to extract the beneficial information in each channel to the greatest extent and finally synthesize a high-quality image. This improves the utilization of image information, the accuracy and reliability of computer interpretation, and the spatial and spectral resolution of the original image, which is conducive to monitoring.
  • Image fusion refers to the process of combining multiple images into one image according to certain fusion rules, after preprocessing such as denoising and registration. The fused image can describe the target more clearly and accurately, making it more suitable for subsequent image processing. Typical scenarios include multi-sensor image fusion (e.g., fusing a visible-light image with an infrared image) and single-sensor multi-focus image fusion.
  • the fused image should contain obvious salient information of all source images
  • According to the level of information extraction, from low to high, image fusion can be divided into three categories: pixel-level image fusion, feature-level image fusion and decision-level image fusion.
  • Pixel-level fusion directly fuses the pixel-based features of the source images according to certain fusion rules, and finally generates a fused image. It retains the most original information of the source images and achieves the highest fusion accuracy, but this type of method also processes the largest amount of information, places high demands on hardware and registration, has long computation times, and offers poor real-time performance.
  • Feature-level image fusion is the process of first performing simple preprocessing on the source images, then extracting feature information such as corners, edges, and shapes of the source images through a certain model, then selecting and fusing this feature information according to appropriate fusion rules, and finally generating a fused image.
  • the fusion object of this type of fusion method is the feature information of the source image, so the requirements for image registration are not as strict as those for pixel-level fusion.
  • this type of method extracts the feature information of the source images, compresses the detail information of the image, enhances real-time processing capability, and provides the feature information required for decision analysis as much as possible. Compared with pixel-level image fusion, the accuracy of feature-level image fusion is moderate.
  • decision-level image fusion is a process in which each source image independently completes its own decision-making tasks, such as classification and recognition, before fusion.
  • the fusion process then comprehensively analyzes the results of these independent decisions to generate a globally optimal decision, and a fused image is formed accordingly.
  • This fusion method has the advantages of high flexibility, small communication volume, best real-time performance, strong fault tolerance and strong anti-interference ability.
  • decision-level image fusion needs to make decisions and judgments on each image separately, resulting in too many processing tasks before the final fusion and high preprocessing costs in the early stage.
  • embodiments of the present application provide an image processing method, device, electronic device, and storage medium, aiming at improving the image quality of a matted target image.
  • the image processing method, device, electronic device, and storage medium provided in the embodiments of the present application are specifically described through the following embodiments. First, the image processing method in the embodiments of the present application is described.
  • AI (artificial intelligence) is the theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • the image processing method provided in the embodiment of the present application relates to the technical field of artificial intelligence.
  • the image processing method provided in the embodiment of the present application may be applied to a terminal, may also be applied to a server, and may also be software running on the terminal or the server.
  • the terminal can be a smart phone, tablet computer, notebook computer, desktop computer, etc.
  • the server can be configured as an independent physical server, or can be configured as a server cluster or distributed system composed of multiple physical servers, and can also be configured as a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms;
  • Fig. 1 is an optional flow chart of the image processing method provided by the embodiment of the present application.
  • the method in Fig. 1 may include but not limited to steps S101 to S105.
  • Step S101 acquiring the original image to be processed
  • Step S102 performing preliminary matting processing on the original image through the backbone network of the pre-trained matting model to obtain an initial foreground image
  • step S103 the edge area of the initial foreground image is locally refined through the fine-tuning network of the matting model to obtain the target foreground image;
  • Step S104 performing super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than the resolution of the target foreground image;
  • Step S105 performing image fusion on the standard foreground image and the preset background image to obtain the target image.
  • the pre-trained backbone network of the matting model is used to perform preliminary matting processing on the original image to obtain the initial foreground image; the fine-tuning network of the matting model is used to locally refine the edge area of the initial foreground image to obtain the target foreground image. In this way, the foreground image with better matting effect can be obtained through the matting model.
  • the pre-trained image reconstruction model is used to perform super-resolution reconstruction on the target foreground image to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than that of the target foreground image, and a clearer standard foreground image can be obtained, which strengthens the matting effect from the visual effect.
  • the standard foreground image and the preset background image are fused to obtain the target image, which makes the target image have a higher resolution, thereby improving the image quality.
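The five steps above (S101 to S105) can be sketched as a data-flow skeleton. This is a hypothetical illustration of the pipeline's structure only: all function names are placeholders invented for this sketch, and the model internals are stubbed out.

```python
# Hypothetical sketch of the five-step pipeline (S101-S105).
# Only the data flow is illustrated; the models themselves are stubbed.

def backbone_matting(original):          # S102: preliminary matting
    return {"pixels": original, "stage": "initial_foreground"}

def finetune_refine(initial_fg):         # S103: local refinement of edge region
    return {**initial_fg, "stage": "target_foreground"}

def super_resolve(target_fg):            # S104: super-resolution reconstruction
    return {**target_fg, "stage": "standard_foreground", "resolution": "high"}

def fuse(standard_fg, background):       # S105: image fusion
    return {"stage": "target_image",
            "foreground": standard_fg,
            "background": background}

def process(original, background):
    initial_fg = backbone_matting(original)    # S102
    target_fg = finetune_refine(initial_fg)    # S103
    standard_fg = super_resolve(target_fg)     # S104
    return fuse(standard_fg, background)       # S105

result = process("raw_image", "preset_background")
print(result["stage"])  # target_image
```

A real implementation would pass image tensors between stages rather than dictionaries; the sketch only fixes the order of operations.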
  • the original image to be processed may be a three-dimensional image; in some embodiments, the three-dimensional image may be obtained by computed tomography (Computed Tomography, CT), and in other embodiments, the three-dimensional image may also be obtained by magnetic resonance imaging (Magnetic Resonance Imaging, MRI).
  • the above-mentioned original image to be processed may be a medical image, and the type of object contained in the original image is a lesion, that is, a part of the body where a lesion occurs.
  • Medical imaging refers to images of internal tissues obtained in a non-invasive manner for medical treatment or medical research, such as CT (Computed Tomography) images, MRI (Magnetic Resonance Imaging) images, US (ultrasound) images, X-ray images, and images generated by medical instruments with optical photography.
  • the image processing method further includes pre-training the matting model, specifically including but not limited to steps S201 to S207:
  • Step S201 acquiring a sample image, wherein the resolution of the sample image is lower than that of a preset reference image
  • Step S202 inputting the sample image into the matting model
  • Step S203 performing convolution processing on the sample image through the backbone network to obtain a sample image matrix, and performing feature extraction on the sample image matrix to obtain a predicted foreground value of the sample;
  • Step S204 performing preliminary matting processing on the sample image through the backbone network and the predicted foreground value of the sample to obtain the sample foreground image;
  • Step S205 obtaining the sample edge prediction value of each sample pixel in the sample foreground image through the fine-tuning network;
  • Step S206 determining the number of sample edge pixel points according to the relationship between the sample edge prediction value and the preset edge prediction threshold;
  • Step S207 optimizing the loss function of the matting model according to the number of sample edge pixels, so as to update the matting model.
  • the sample image can be obtained by computed tomography (Computed Tomography, CT) or magnetic resonance imaging (Magnetic Resonance Imaging, MRI), wherein the resolution of the sample image is lower than the resolution of the preset reference image, that is, the sample image is a low-resolution image.
  • step S202 is executed to input the sample image into the matting model.
  • the matting model can include the open-source matting network Background Matting V2.
  • the matting model is mainly composed of two parts, namely a backbone network and a fine-tuning network.
  • the backbone network is a residual network that has been adjusted and modified.
  • the backbone network includes 3 convolutional layers (namely, the first convolutional layer, the second convolutional layer, and the third convolutional layer).
  • the convolution kernel size of each convolutional layer is set to 3×3, and the backbone network contains six input channels.
  • steps S203 and S204 are executed, and the sample image is convolved through the first convolutional layer of the backbone network to obtain a sample image matrix equal in size to the sample image; the matrix values of the sample image matrix include 0 and 1, wherein 0 represents the background and 1 represents the foreground.
  • the feature extraction of the sample image matrix is performed through the second convolutional layer, and all matrix values with a value of 1 are obtained, and these matrix values with a value of 1 are included in the same set, and the matrix values in this set are the predicted foreground values of the sample.
  • the pixel values with a predicted foreground value of 1 are extracted from the original image, and the image formed by these pixel values is the sample foreground image.
  • the sample edge prediction value contained in the sample edge prediction information can be obtained, and the degree to which the sample pixel point belongs to the edge can be identified through the sample edge prediction value.
  • By setting the edge prediction threshold in advance, the sample edge prediction value is compared with the edge prediction threshold, so as to filter the sample pixels in the edge area of the sample foreground image.
  • if the sample edge prediction value is less than or equal to the edge prediction threshold, it indicates that the sample pixel point belongs to the sample foreground image; if the sample edge prediction value is greater than the edge prediction threshold, it indicates that the sample pixel point does not belong to the sample foreground image, and the sample pixel point is taken as a sample edge pixel point, so that the number of sample edge pixel points can be determined statistically.
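The threshold comparison and counting described above can be sketched in a few lines. This is a minimal pure-Python illustration; the function name and the example threshold value of 0.5 are assumptions for the sketch, not taken from the patent.

```python
def count_sample_edge_pixels(edge_preds, threshold=0.5):
    """Count sample edge pixels.

    Per the rule above: a prediction <= threshold means the pixel belongs
    to the sample foreground image; a prediction > threshold marks it as
    a sample edge pixel. (Hypothetical helper, not from the patent.)
    """
    return sum(1 for p in edge_preds if p > threshold)

preds = [0.1, 0.7, 0.4, 0.9, 0.5]
print(count_sample_edge_pixels(preds))  # 2  (only 0.7 and 0.9 exceed 0.5)
```

The resulting count is what step S207 compares against the preset threshold number of sample edge pixels when computing the model loss.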
  • step S207 is performed to compare the number of sample edge pixels with the preset threshold number of sample edge pixels, calculate the model loss of the matting model, and backpropagate the model loss.
  • backpropagation can be performed according to the loss function to update the matting model by optimizing the loss function, mainly to update the internal parameters of the matting model (that is, the loss parameters). It can be understood that conventional backpropagation principles may be applied here, which is not limited in this embodiment of the present application.
  • step S102 may include, but is not limited to, steps S301 to S303:
  • Step S301 performing convolution processing on the original image to obtain the original image matrix
  • Step S302 performing feature extraction on the original image matrix to obtain predicted foreground values
  • Step S303 performing preliminary matting processing on the original image according to the predicted foreground value to obtain an initial foreground image.
  • in step S301, the original image is input into the matting model, and the original image is convolved through the first convolution layer of the backbone network of the matting model to obtain an original image matrix that is equal in size to the original image.
  • the matrix values of the original image matrix include 0 and 1, where 0 represents the background and 1 represents the foreground. It should be noted that the equal size here means that both the width and the height of the original image matrix are the same as those of the original image.
  • step S302 feature extraction is performed on the original image matrix through the second convolutional layer to obtain all matrix values with a value of 1, and these matrix values with a value of 1 are included in the same set, and the matrix values in this set are predicted foreground values.
  • step S303 the pixel values with predicted foreground values of 1 are extracted from the original image through the third convolutional layer of the backbone network, and the image formed by these pixel values is the initial foreground image, so as to realize the preliminary image matting process on the original image and obtain the initial foreground image.
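The mask-driven extraction described in steps S301 to S303 can be illustrated with a toy example. This is a pure-Python sketch over nested lists; the helper name is hypothetical, and a real implementation would operate on image tensors inside the convolutional layers rather than Python lists.

```python
def extract_foreground(image, mask, background_value=0):
    """Keep pixels where the mask value is 1 (foreground);
    replace background pixels (mask value 0) with background_value.
    Hypothetical helper illustrating the 0/1 image-matrix rule above."""
    return [
        [px if m == 1 else background_value
         for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]

image = [[10, 20, 30],
         [40, 50, 60]]
mask  = [[0, 1, 1],   # same size as the image: 0 = background, 1 = foreground
         [0, 0, 1]]
print(extract_foreground(image, mask))  # [[0, 20, 30], [0, 0, 60]]
```

The surviving pixels (those with predicted foreground value 1) form the initial foreground image described in step S303.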
  • step S103 may include, but is not limited to, steps S401 to S403:
  • Step S401 obtaining the edge prediction value of each pixel in the initial foreground image
  • Step S402 according to the size relationship between the edge prediction value and the preset edge prediction threshold, determine the edge pixel points of the initial foreground image
  • Step S403 filter the edge pixels of the initial foreground image to obtain the target foreground image.
  • step S401 is executed.
  • the edge prediction information of each pixel can be calculated. Therefore, in the process of local refinement of the initial foreground image, the edge prediction value contained in the edge prediction information can be obtained, and the extent to which the pixel belongs to the edge can be identified through the edge prediction value.
  • step S402 and step S403 are executed, by setting the edge prediction threshold in advance, comparing the edge prediction value with the edge prediction threshold, thereby filtering the pixels in the edge region.
  • the preset edge prediction threshold may be 0.5, 0.3 and so on.
  • if the edge prediction value is less than or equal to the edge prediction threshold, it indicates that the pixel belongs to the initial foreground image; if the edge prediction value is greater than the edge prediction threshold, it indicates that the pixel does not belong to the initial foreground image, and the pixel is regarded as an edge pixel. The edge pixels are removed to filter impurities from the pixels of the initial foreground image, and the image composed of the remaining pixels is taken as the target foreground image, thereby realizing the local refinement of the initial foreground image and improving the image quality of the target foreground image.
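The filtering rule for steps S401 to S403 can be sketched as follows. A flat pixel list is used for simplicity; the helper name and the threshold value are assumptions for this sketch, not from the patent.

```python
def remove_edge_pixels(pixels, edge_preds, threshold=0.5):
    """Drop pixels whose edge prediction value exceeds the threshold;
    the remaining pixels form the target foreground image.
    (Hypothetical helper illustrating the filtering rule above.)"""
    return [px for px, p in zip(pixels, edge_preds) if p <= threshold]

pixels = ["p0", "p1", "p2", "p3"]
edge_preds = [0.2, 0.8, 0.5, 0.6]
print(remove_edge_pixels(pixels, edge_preds))  # ['p0', 'p2']
```

Pixels p1 and p3 exceed the 0.5 threshold, so they are treated as edge pixels and discarded, matching the "filter and remove" behaviour described above.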
  • the image processing method further includes pre-training an image reconstruction model, specifically including but not limited to steps S501 to S506:
  • Step S501 acquiring a sample image, wherein the resolution of the sample image is lower than that of a preset reference image
  • Step S502 performing preliminary matting processing and local refinement processing on the sample image to obtain a sample foreground image
  • Step S503 inputting the sample foreground image into the initial model
  • Step S504 perform super-resolution reconstruction processing on the sample foreground image through the generation network of the initial model, and generate a sample intermediate foreground image corresponding to the sample foreground image, and the resolution of the sample intermediate foreground image is higher than that of the sample foreground image;
  • Step S505 calculate the similarity between the sample intermediate foreground image and the reference sample foreground image through the discriminant network of the initial model, and obtain the similarity probability value;
  • Step S506 optimizing the loss function of the initial model according to the similarity probability value to update the initial model to obtain an image reconstruction model.
  • the sample image can be obtained by computed tomography (Computed Tomography, CT) or magnetic resonance imaging (Magnetic Resonance Imaging, MRI), wherein the resolution of the sample image is lower than the resolution of the preset reference image, that is, the sample image is a low-resolution image.
  • step S502 is executed to perform preliminary matting processing and local refinement processing on the sample image through the backbone network and the fine-tuning network of the pre-trained matting model to obtain the sample foreground image.
  • the specific process is the same as the above-mentioned matting process of the original image, and will not be repeated here.
  • step S503 is executed to input the sample foreground image into the initial model.
  • the initial model is the SRGAN network, which is a generative adversarial network for super-resolution reconstruction.
  • the SRGAN network mainly includes two parts, the generator and the discriminator.
  • the generator is mainly used to convert the input image into a high-definition image
  • the discriminator is mainly used to judge whether the generated high-definition image is real or fake.
  • that is, a similarity calculation is performed between the generated high-definition image and the reference image.
  • step S504 is executed, and the low-resolution sample foreground image can be converted into a higher-resolution sample intermediate foreground image through the generating function of the generating network, wherein the generating function of the generating network can be expressed as shown in formula (1):
  • G(I^LR) = Ĩ  (1)
  • where G(·) represents the generating network, and its output Ĩ is the sample intermediate foreground image;
  • I^HR represents the high-resolution reference foreground image;
  • I^LR represents the low-resolution sample foreground image.
  • step S505 is executed to compare the sample intermediate foreground image with the reference foreground image through the discriminant network.
  • the sample intermediate foreground image can be continuously optimized by calculating the true similarity probability of the sample intermediate foreground image, so that the sample intermediate foreground image is as identical as possible to the reference foreground image.
  • the difference between the two can be judged by calculating the MSE (mean square error) between the reference foreground image and the sample intermediate foreground image, combined with the adversarial objective; the calculation formula is shown in formula (2):
  • min_G max_D E[log D(I^HR)] + E[log(1 - D(G(I^LR)))]  (2)
  • where min means that the model loss of the generative network is minimized, and max means that the model loss of the discriminant network is maximized;
  • D denotes the discriminative network, G denotes the generative network, and D(G(I^LR)) denotes that the discriminative network judges the authenticity of the sample intermediate foreground image generated by the generative network, obtaining the similarity probability value that the sample intermediate foreground image is real; the sample intermediate foreground image is continuously optimized according to this similarity probability value.
  • step S506 is performed to calculate the model loss of the initial model according to the similarity probability value, that is, the loss value, and then use the gradient descent method to backpropagate the loss value, feed the loss value back to the initial model, and modify the model parameters of the initial model. The above process is repeated until the loss value meets the preset iteration condition, wherein the preset iteration condition is that the number of iterations reaches a preset value, or that the variance of the change in the loss function is smaller than a preset threshold.
  • at that point, the backpropagation can be stopped, the current model parameters can be taken as the final model parameters, and the update of the initial model can be stopped, so as to obtain the image reconstruction model.
  • an image reconstruction model for performing image reconstruction on a low-resolution image to generate a high-resolution image can be obtained, and the image reconstruction model can achieve the purpose of improving image quality by means of super-resolution reconstruction.
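The MSE comparison between the reference foreground image and a generated image, mentioned in the training procedure above, can be computed as in this minimal sketch. The helper name is assumed, and flat pixel lists stand in for the image tensors a real system would use.

```python
def mse(reference, generated):
    """Mean squared error between two equally sized pixel sequences.
    (Hypothetical helper illustrating the MSE comparison described above.)"""
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of pixels")
    return sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)

ref = [1.0, 2.0, 3.0, 4.0]
gen = [1.0, 2.5, 2.0, 4.0]
print(mse(ref, gen))  # 0.3125
```

A lower MSE means the generated intermediate foreground image is closer to the reference foreground image, which is the quantity the training loop drives down.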
  • step S104 also includes, but is not limited to, steps S601 to S602:
  • Step S601 performing super-resolution reconstruction processing on the target foreground image through the generation network of the image reconstruction model to obtain an intermediate foreground image
  • step S602 the intermediate foreground image is optimized through the discrimination network of the image reconstruction model and the preset reference foreground image to obtain a standard foreground image.
  • in step S601, the low-resolution target foreground image can be converted into a higher-resolution intermediate foreground image through the generating function of the generating network, where the generating function of the generating network can be expressed as shown in formula (3):
  • G(I^LR) = Ĩ  (3)
  • where G(·) represents the generating network, and its output Ĩ is the intermediate foreground image;
  • I^HR represents the high-resolution reference foreground image;
  • I^LR represents the low-resolution target foreground image.
  • the intermediate foreground image is then compared with the reference foreground image through the discrimination network.
  • by calculating the probability that the intermediate foreground image is judged to be real, the intermediate foreground image can be continuously optimized so that it becomes as close to the reference foreground image as possible.
  • the difference between the two can be judged by calculating the MSE (mean squared error) between the reference foreground image and the intermediate foreground image; the adversarial calculation is shown in formula (4), a min-max objective over the generation network G and the discrimination network D, where:
  • min means that the model loss of the generation network is minimized, and max means that the model loss of the discrimination network is maximized;
  • D denotes the discrimination network, G denotes the generation network, and D(G(I_LR)) denotes the discrimination network judging whether the intermediate foreground image generated by the generation network is real, yielding the similarity probability value that the intermediate foreground image is true. The intermediate foreground image is continuously optimized according to this similarity probability value until it is greater than or equal to the preset similarity probability threshold, at which point the standard foreground image is output; the similarity between the output standard foreground image and the reference foreground image then meets the requirements.
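A minimal sketch of the optimization loop just described, under the simplifying assumption that the similarity probability is derived directly from the MSE (in the embodiment a trained discrimination network produces this score instead). All function names and the update rule are hypothetical:

```python
import math

def mse(a, b):
    """Mean squared error between two equal-sized images (flat lists)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def similarity_probability(intermediate, reference):
    """Toy stand-in for D(G(I_LR)): maps the MSE to a score in (0, 1],
    so identical images score 1. A real discrimination network would
    learn this score instead of computing it from the MSE."""
    return math.exp(-mse(intermediate, reference))

def refine(intermediate, reference, threshold=0.95, step=0.1, max_iters=1000):
    """Nudge the intermediate image toward the reference until the
    similarity probability reaches the preset threshold, standing in
    for the generator/discriminator optimization loop."""
    img = list(intermediate)
    for _ in range(max_iters):
        if similarity_probability(img, reference) >= threshold:
            break  # similarity requirement met: output standard foreground image
        img = [x + step * (r - x) for x, r in zip(img, reference)]
    return img

reference = [0.9, 0.1, 0.8, 0.2]
standard = refine([0.0, 0.0, 0.0, 0.0], reference)
```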
  • step S105 may also include, but is not limited to, steps S701 to S703:
  • Step S701 performing feature extraction on the standard foreground image to obtain the foreground feature value, and performing feature extraction on the background image to obtain the background feature value;
  • Step S702 performing XOR calculation on the preset channel bitmap according to the foreground feature value and the background feature value to obtain the target channel bitmap;
  • Step S703 performing image fusion on the standard foreground image and the background image according to the target channel bitmap to obtain the target image.
  • step S701 is first performed: feature extraction is carried out on the standard foreground image and the background image through the sigmoid function, which maps the foreground feature values of the standard foreground image and the background feature values of the background image into the range between 0 and 1.
  • the foreground feature value of a pixel on the standard foreground image is then represented as 1,
  • and the background feature value of a pixel on the background image is represented as 0.
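The sigmoid-based feature mapping in step S701 can be illustrated as follows, assuming positive raw feature scores for foreground pixels and negative scores for background pixels (an assumption of this sketch, not stated in the embodiment):

```python
import math

def sigmoid(x):
    """Squash a raw feature value into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))

def label_pixels(raw_features):
    """Map each pixel's raw feature value through the sigmoid and round,
    so foreground pixels come out as 1 and background pixels as 0."""
    return [round(sigmoid(v)) for v in raw_features]

labels = label_pixels([4.2, -3.1, 0.7, -0.2])
```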
  • step S702 is then executed: an alpha channel bitmap is constructed in advance with the same size as the original image, that is, with the same height, width and number of channels as the original image.
  • the XOR calculation writes values of 0 and 1 onto the alpha channel bitmap, that is, the position corresponding to each pixel on the alpha channel bitmap is marked with 0 or 1, indicating whether the pixel at that position is displayed; the target channel bitmap is obtained through this process.
  • step S703 is executed: when the standard foreground image and the background image are fused, the marks on the target channel bitmap conveniently determine whether a pixel of the standard foreground image or a pixel of the background image is displayed in the new image, finally yielding the target image.
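Steps S702–S703 can be illustrated with a minimal fusion routine: the target channel bitmap marks each position with 1 (show the standard foreground pixel) or 0 (show the background pixel). The function name and list-of-lists image representation are illustrative only:

```python
def fuse(foreground, background, alpha_bitmap):
    """Compose the target image pixel by pixel: where the target channel
    bitmap is 1 the standard foreground pixel is shown, where it is 0
    the background pixel is shown."""
    return [
        [fg if a == 1 else bg
         for fg, bg, a in zip(fg_row, bg_row, a_row)]
        for fg_row, bg_row, a_row in zip(foreground, background, alpha_bitmap)
    ]

fg = [[255, 255], [255, 255]]    # standard foreground image (all white)
bg = [[0, 0], [0, 0]]            # preset background image (all black)
mask = [[1, 0], [0, 1]]          # target channel bitmap
target = fuse(fg, bg, mask)
```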
  • in summary, the original image to be processed is acquired; the original image is preliminarily matted through the backbone network of the pre-trained matting model to obtain the initial foreground image; and the edge area of the initial foreground image is locally refined through the fine-tuning network of the matting model to obtain the target foreground image.
  • in this way, a foreground image with a better matting effect can be obtained through the matting model.
  • the pre-trained image reconstruction model is then used to perform super-resolution reconstruction on the target foreground image to obtain a standard foreground image whose resolution is higher than that of the target foreground image; the clearer standard foreground image visually strengthens the matting effect.
  • finally, the standard foreground image and the preset background image are fused to obtain the target image, giving the target image a higher resolution and thereby improving image quality.
  • the embodiment of the present application also provides an image processing device, which can implement the above image processing method, and the image processing device includes:
  • the original image acquisition module 801 is used to acquire the original image to be processed;
  • the preliminary image matting module 802 is used to perform initial image matting processing on the original image through the backbone network of the preset matting model to obtain an initial foreground image;
  • the local refinement module 803 is used to perform local refinement processing on the edge area of the initial foreground image through the fine-tuning network of the matting model to obtain the target foreground image;
  • a super-resolution reconstruction module 804 configured to perform super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than the resolution of the target foreground image;
  • the image fusion module 805 is configured to perform image fusion on the standard foreground image and the preset background image to obtain the target image.
  • the specific implementation manner of the image processing device is basically the same as the specific embodiment of the above image processing method, and will not be repeated here.
  • the embodiment of the present application also provides an electronic device.
  • the electronic device includes: a memory, a processor, a program stored in the memory and operable on the processor, and a data bus for realizing connection and communication between the processor and the memory.
  • when the program is executed by the processor, the above image processing method is implemented.
  • the electronic device may be any intelligent terminal including a tablet computer, a vehicle-mounted computer, and the like.
  • FIG. 9 illustrates a hardware structure of an electronic device in another embodiment.
  • the electronic device includes:
  • the processor 901 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute related programs to implement the technical solutions provided by the embodiments of the present application;
  • the memory 902 may be implemented in the form of a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 902 can store an operating system and other application programs.
  • the relevant program codes are stored in the memory 902 and are invoked by the processor 901 to execute an image processing method, wherein the image processing method includes: obtaining the original image to be processed; performing preliminary matting processing on the original image through the backbone network of the pre-trained matting model to obtain the initial foreground image; performing local refinement processing on the edge region of the initial foreground image through the fine-tuning network of the matting model to obtain the target foreground image; performing super-resolution reconstruction processing on the target foreground image through the pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than the resolution of the target foreground image; and performing image fusion on the standard foreground image and the preset background image to obtain the target image;
  • the input/output interface 903 is used to realize information input and output
  • the communication interface 904 is used to realize the communication interaction between the device and other devices, and the communication can be realized through a wired method (such as USB, network cable, etc.), or can be realized through a wireless method (such as a mobile network, WIFI, Bluetooth, etc.);
  • bus 905 for transferring information between various components of the device (such as processor 901, memory 902, input/output interface 903 and communication interface 904);
  • the processor 901, the memory 902, the input/output interface 903 and the communication interface 904 are connected to each other within the device through the bus 905.
  • An embodiment of the present application also provides a storage medium, which is a computer-readable storage medium for computer-readable storage.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement an image processing method, wherein the image processing method includes: obtaining an original image to be processed; performing preliminary matting processing on the original image through a backbone network of a pre-trained matting model to obtain an initial foreground image; performing local refinement processing on an edge region of the initial foreground image through a fine-tuning network of the matting model to obtain a target foreground image; performing super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than that of the target foreground image; and performing image fusion on the standard foreground image and the preset background image to obtain the target image.
  • memory can be used to store non-transitory software programs and non-transitory computer-executable programs.
  • the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage devices.
  • the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the flows shown in FIGS. 1-7 do not limit the embodiments of the present application, which may include more or fewer steps than illustrated, combine some steps, or include different steps.

Abstract

Embodiments of the present application relate to the technical field of image processing, and provide an image processing method and apparatus, an electronic device, and a storage medium. The method comprises: obtaining an original image to be processed; performing preliminary matting processing on the original image by means of a backbone network of a pre-trained matting model to obtain an initial foreground image; performing local refinement processing on an edge area of the initial foreground image by means of a fine-tuning network of the matting model to obtain a target foreground image; performing super-resolution reconstruction processing on the target foreground image by means of a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than that of the target foreground image; and performing image fusion on the standard foreground image and a preset background image to obtain a target image. The embodiments of the present application can improve the image quality of the matted target image.

Description

Image processing method, device, electronic device and storage medium
This application claims priority to the Chinese patent application with application number 202210057041.9, entitled "Image processing method, device, electronic device and storage medium", filed with the China Patent Office on January 18, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical fields of artificial intelligence and image processing, and in particular to an image processing method, device, electronic device, and storage medium.
Background
Currently, many methods rely on mask datasets to learn matting, for example context-aware matting, index matting, sampling-based matting, and matting based on opacity propagation.
Technical Problem
The inventors have recognized the following technical problem in the prior art: the performance of existing methods depends on the quality of the labeling, which often results in lower image quality after matting. How to provide an image processing method that can improve the image quality after matting has therefore become an urgent technical problem.
Technical Solution
In a first aspect, an embodiment of the present application proposes an image processing method, the method comprising:
acquiring an original image to be processed;
performing preliminary matting processing on the original image through a backbone network of a pre-trained matting model to obtain an initial foreground image;
performing local refinement processing on an edge area of the initial foreground image through a fine-tuning network of the matting model to obtain a target foreground image;
performing super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than the resolution of the target foreground image;
performing image fusion on the standard foreground image and a preset background image to obtain a target image.
In a second aspect, an embodiment of the present application proposes an image processing device, the device comprising:
an original image acquisition module, used to acquire an original image to be processed;
a preliminary matting module, used to perform initial matting processing on the original image through a backbone network of a preset matting model to obtain an initial foreground image;
a local refinement module, used to perform local refinement processing on an edge area of the initial foreground image through a fine-tuning network of the matting model to obtain a target foreground image;
a super-resolution reconstruction module, used to perform super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than the resolution of the target foreground image;
an image fusion module, used to perform image fusion on the standard foreground image and a preset background image to obtain a target image.
In a third aspect, an embodiment of the present application proposes an electronic device, the electronic device comprising a memory, a processor, a program stored in the memory and operable on the processor, and a data bus for realizing connection and communication between the processor and the memory. When the program is executed by the processor, an image processing method is implemented, wherein the image processing method includes: acquiring an original image to be processed; performing preliminary matting processing on the original image through a backbone network of a pre-trained matting model to obtain an initial foreground image; performing local refinement processing on an edge area of the initial foreground image through a fine-tuning network of the matting model to obtain a target foreground image; performing super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than the resolution of the target foreground image; and performing image fusion on the standard foreground image and a preset background image to obtain a target image.
In a fourth aspect, an embodiment of the present application proposes a storage medium, which is a computer-readable storage medium for computer-readable storage. The storage medium stores one or more programs that can be executed by one or more processors to implement an image processing method, wherein the image processing method includes: acquiring an original image to be processed; performing preliminary matting processing on the original image through a backbone network of a pre-trained matting model to obtain an initial foreground image; performing local refinement processing on an edge area of the initial foreground image through a fine-tuning network of the matting model to obtain a target foreground image; performing super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than the resolution of the target foreground image; and performing image fusion on the standard foreground image and a preset background image to obtain a target image.
Beneficial Effects
With the image processing method, device, electronic device and storage medium proposed in the present application, a foreground image with a better matting effect can be obtained through the matting model. Furthermore, by performing super-resolution reconstruction on the target foreground image through the pre-trained image reconstruction model, a clearer standard foreground image can be obtained, visually strengthening the matting effect. Finally, image fusion is performed on the standard foreground image and the preset background image, so that the target image has a higher resolution, thereby improving image quality.
Description of Drawings
The accompanying drawings are provided for a further understanding of the technical solution of the present application and constitute a part of the specification. Together with the embodiments, they serve to explain the technical solution of the present application and do not constitute a limitation on it.
FIG. 1 is a flow chart of the image processing method provided by an embodiment of the present application;
FIG. 2 is another flow chart of the image processing method provided by an embodiment of the present application;
FIG. 3 is a flow chart of step S102 in FIG. 1;
FIG. 4 is a flow chart of step S103 in FIG. 1;
FIG. 5 is another flow chart of the image processing method provided by an embodiment of the present application;
FIG. 6 is a flow chart of step S104 in FIG. 1;
FIG. 7 is a flow chart of step S105 in FIG. 1;
FIG. 8 is a schematic structural diagram of the image processing device provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of the hardware structure of the electronic device provided by an embodiment of the present application.
Embodiments of the Present Invention
In order to make the purpose, technical solutions and advantages of the present application clearer, the present application is described in further detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application, not to limit it.
It should be noted that although functional modules are divided in the device schematics and a logical order is shown in the flow charts, in some cases the steps shown or described may be performed with a different module division than in the device, or in a different order than in the flow charts. The terms "first", "second" and the like in the specification, the claims and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.
First, several terms involved in this application are explained:
Image matting: for a given picture, a network automatically extracts the foreground part and deletes the background part. It is a common method in the field of image enhancement.
Image fusion: image data about the same target collected from multi-source channels are processed with image processing and computer technology to extract the most beneficial information from each channel and synthesize a high-quality image. This improves the utilization of image information, the accuracy and reliability of computer interpretation, and the spatial and spectral resolution of the original image, which benefits monitoring. Image fusion thus refers to the process of combining multiple images into one image according to certain fusion rules, after preprocessing such as denoising and registration. The fused image describes the target more clearly and accurately and is better suited to subsequent processing. (Examples include multi-sensor image fusion, such as fusing visible-light and infrared images, and single-sensor multi-focus image fusion.)
Image fusion must follow three basic principles:
1) the fused image must contain the obvious salient information of all source images;
2) the fused image must not add any artificial information;
3) information of no interest in the source images, such as noise, must be suppressed as much as possible from appearing in the fused image.
According to the level of information extraction, from low to high, image fusion can be divided into three categories: pixel-level, feature-level and decision-level image fusion.
Pixel-level fusion directly fuses the pixel-based features of the source images according to certain fusion rules and finally generates a fused image. It retains the most original information of the source images and achieves the highest fusion accuracy, but such methods also involve the largest amount of information, place high demands on hardware and registration, take long to compute, and handle real-time processing poorly.
Feature-level image fusion first performs simple preprocessing on the source images, then extracts feature information such as corners, edges and shapes through a certain model, selects and fuses this feature information according to certain fusion rules, and finally generates a fused image. Such methods fuse the feature information of the source images, so the requirements on image registration are less strict than for pixel-level fusion. At the same time, these methods extract the feature information of the source images, compress the detail information of the images, enhance real-time processing capability, and provide as much of the feature information needed for decision analysis as possible. Compared with the pixel-level method, the accuracy of feature-level image fusion is moderate.
In decision-level fusion, each source image has independently completed its own decision tasks such as classification and recognition before fusion; the fusion process comprehensively analyzes each independent decision result to generate a globally optimal decision and form the fused image accordingly. This fusion method has the advantages of high flexibility, small communication volume, the best real-time performance, strong fault tolerance and strong anti-interference ability. However, decision-level image fusion first requires separate decisions on each image, resulting in many processing tasks before the final fusion and high preprocessing costs in the early stage.
Traditional, non-learning-based matting algorithms need to manually label a trimap and solve the alpha matte in its unknown region. Currently, many methods rely on mask datasets to learn matting, such as context-aware matting, index matting, sampling-based matting, and matting based on opacity propagation. The performance of these methods depends on the quality of the labeling, which often results in lower image quality after matting. How to provide an image processing method that can improve the image quality after matting has therefore become an urgent technical problem.
Based on this, the embodiments of the present application provide an image processing method, device, electronic device and storage medium, aiming to improve the image quality of the matted target image.
The image processing method, device, electronic device and storage medium provided in the embodiments of the present application are described through the following embodiments; the image processing method is described first.
The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly covers computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
The image processing method provided in the embodiments of the present application relates to the technical field of artificial intelligence. It may be applied in a terminal or on a server side, or run as software on a terminal or server. In some embodiments, the terminal may be a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like; the server side may be configured as an independent physical server, as a server cluster or distributed system composed of multiple physical servers, or as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the software may be an application that implements the image processing method, but is not limited to the above forms.
FIG. 1 is an optional flow chart of the image processing method provided by an embodiment of the present application. The method in FIG. 1 may include, but is not limited to, steps S101 to S105.
Step S101, acquiring an original image to be processed;
Step S102, performing preliminary matting processing on the original image through the backbone network of the pre-trained matting model to obtain an initial foreground image;
Step S103, performing local refinement processing on the edge area of the initial foreground image through the fine-tuning network of the matting model to obtain a target foreground image;
Step S104, performing super-resolution reconstruction processing on the target foreground image through the pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than the resolution of the target foreground image;
Step S105, performing image fusion on the standard foreground image and the preset background image to obtain a target image.
In steps S101 to S105 of this embodiment, preliminary matting and edge refinement through the matting model yield a foreground image with a better matting effect; super-resolution reconstruction through the pre-trained image reconstruction model yields a clearer standard foreground image whose resolution is higher than that of the target foreground image, visually strengthening the matting effect; finally, fusing the standard foreground image with the preset background image produces a target image with higher resolution, thereby improving image quality.
In steps S101 to S105 of the embodiments of the present application, preliminary matting is performed on the original image through the backbone network of the pre-trained matting model to obtain an initial foreground image, and local refinement is performed on the edge region of the initial foreground image through the fine-tuning network of the matting model to obtain a target foreground image; in this way, a foreground image with a good matting effect can be obtained through the matting model. Further, super-resolution reconstruction is performed on the target foreground image through the pre-trained image reconstruction model to obtain a standard foreground image whose resolution is higher than that of the target foreground image, so that a clearer standard foreground image is obtained and the matting effect is visually enhanced. Finally, the standard foreground image and the preset background image are fused to obtain a target image with a higher resolution, thereby improving image quality.
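The flow of steps S101 to S105 can be sketched as a small composition of the three models. The function names, the stand-in callables, and the non-zero-pixel mask used for the final composite below are illustrative assumptions, not part of the disclosed models:

```python
import numpy as np

def process_image(original, backbone, refine, generate, background):
    """Hypothetical end-to-end sketch of steps S101-S105.
    `backbone`, `refine`, and `generate` stand in for the matting
    model's two sub-networks and the image reconstruction model."""
    initial_fg = backbone(original)     # S102: coarse matting
    target_fg = refine(initial_fg)      # S103: edge refinement
    standard_fg = generate(target_fg)   # S104: super-resolution
    # S105: composite -- positions the matting kept (non-zero) come
    # from the foreground, the rest from the preset background
    mask = (standard_fg > 0).astype(np.float32)
    return mask * standard_fg + (1.0 - mask) * background
```

With identity functions in place of the networks, only the retained foreground pixels survive and every other position falls back to the background.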
In step S101 of some embodiments, the original image to be processed may be a three-dimensional image. In some embodiments the three-dimensional image may be obtained by computed tomography (CT); in other embodiments it may be obtained by magnetic resonance imaging (MRI).
In some medical application scenarios, the original image to be processed may be a medical image, and the object contained in the original image is a lesion, that is, a part of the body where pathological change has occurred. Medical images are images of internal tissue obtained in a non-invasive manner for medical treatment or medical research, for example CT, MRI, ultrasound (US), and X-ray images, as well as optical photography images generated by medical instruments.
Referring to Fig. 2, in some embodiments, before step S102, the image processing method further includes pre-training the matting model, which specifically includes, but is not limited to, steps S201 to S207:
Step S201, acquiring a sample image, where the resolution of the sample image is lower than that of a preset reference image;
Step S202, inputting the sample image into the matting model;
Step S203, performing convolution on the sample image through the backbone network to obtain a sample image matrix, and performing feature extraction on the sample image matrix to obtain sample predicted foreground values;
Step S204, performing preliminary matting on the sample image through the backbone network and the sample predicted foreground values to obtain a sample foreground image;
Step S205, obtaining, through the fine-tuning network, a sample edge prediction value for each sample pixel in the sample foreground image;
Step S206, determining the number of sample edge pixels according to the magnitude relationship between the sample edge prediction values and a preset edge prediction threshold;
Step S207, optimizing the loss function of the matting model according to the number of sample edge pixels, so as to update the matting model.
Specifically, in step S201, the sample image may be obtained by computed tomography (CT) or magnetic resonance imaging (MRI), where the resolution of the sample image is lower than that of the preset reference image; that is, the sample image is a low-resolution image.
Further, step S202 is executed to input the sample image into the matting model.
It should be noted that the matting model may be based on the open-source matting network Background Matting V2. The matting model mainly consists of two parts, namely a backbone network and a fine-tuning network, where the backbone network is an adapted residual network comprising three convolutional layers (a first, a second, and a third convolutional layer); the kernel size of each convolutional layer is set to 3×3, and the backbone network has six input channels.
Further, steps S203 and S204 are executed. Convolving the sample image through the first convolutional layer of the backbone network yields a sample image matrix of the same size as the sample image, whose matrix values are 0 and 1, where 0 denotes background and 1 denotes foreground. Feature extraction is performed on the sample image matrix through the second convolutional layer to collect all matrix values equal to 1 into one set; the values in this set are the sample predicted foreground values. Through the third convolutional layer of the backbone network, the pixels whose predicted foreground value is 1 are extracted from the original image, and the image formed by these pixels is the sample foreground image.
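The mask-then-extract behavior of these layers can be sketched with NumPy. The array shapes and the helper name are illustrative assumptions; the real layers are learned convolutions, stubbed out here by taking the 0/1 matrix as given:

```python
import numpy as np

def extract_foreground(image, mask):
    """Keep the pixels whose predicted foreground value is 1 and zero
    out the background, mirroring the backbone's third layer.
    `image` is an H x W x C array; `mask` is the H x W matrix of 0s
    and 1s produced by the first convolutional layer."""
    fg_values = mask[mask == 1]  # the set of sample predicted foreground values
    # broadcast the 0/1 mask over the channel axis to cut out the foreground
    foreground = image * mask[..., np.newaxis]
    return foreground, fg_values.size  # sample foreground image + count kept
```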
Further, steps S205 and S206 are executed. Since the sample edge prediction information of each sample pixel can be computed when the sample image is preliminarily matted through the backbone network, the sample edge prediction value contained in that information can be obtained, and the degree to which the sample pixel belongs to an edge can be identified from it. By setting an edge prediction threshold in advance and comparing each sample edge prediction value with the threshold, the sample pixels in the edge region of the sample foreground image are filtered. If the sample edge prediction value is less than or equal to the edge prediction threshold, the sample pixel belongs to the sample foreground image; if it is greater than the threshold, the sample pixel does not belong to the sample foreground image and is treated as a sample edge pixel, so that the number of sample edge pixels can be counted.
Finally, step S207 is executed. The number of sample edge pixels is compared with a preset threshold on the number of sample edge pixels, the model loss of the matting model is calculated, and the loss is backpropagated; for example, backpropagation may be performed according to the loss function so as to update the matting model by optimizing the loss function, mainly by updating the internal parameters of the matting model (i.e., the loss parameters). It can be understood that conventional backpropagation may be applied, which is not limited in the embodiments of the present application. The above process is repeated until the number of sample edge pixels is less than or equal to the threshold, or the number of iterations reaches a preset count, at which point optimization of the loss function of the matting model is complete and updating of the matting model stops.
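The stopping rule of step S207 — iterate until the edge-pixel count is at or below the threshold, or the iteration budget is exhausted — can be sketched as follows. `step_fn` is a hypothetical stand-in for one forward/backward pass of the matting model:

```python
def train_matting(step_fn, edge_count_threshold, max_iters):
    """Hypothetical training loop for steps S201-S207.
    `step_fn(i)` stands in for one forward/backward pass and returns
    the number of sample edge pixels after that update."""
    for i in range(max_iters):
        edge_count = step_fn(i)
        # stop once the edge-pixel count is within the threshold
        if edge_count <= edge_count_threshold:
            return i + 1, edge_count
    # iteration budget exhausted: keep the last observed count
    return max_iters, edge_count
```

For example, if each pass removes one edge pixel from an initial count of 10, a threshold of 7 is reached on the fourth pass.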
Referring to Fig. 3, in some embodiments, step S102 may include, but is not limited to, steps S301 to S303:
Step S301, performing convolution on the original image to obtain an original image matrix;
Step S302, performing feature extraction on the original image matrix to obtain predicted foreground values;
Step S303, performing preliminary matting on the original image according to the predicted foreground values to obtain an initial foreground image.
Specifically, in step S301, the original image is input into the matting model, and convolving it through the first convolutional layer of the backbone network of the matting model yields an original image matrix of the same size as the original image, whose matrix values are 0 and 1, where 0 denotes background and 1 denotes foreground. It should be noted that "the same size" here means that the width and height of the original image matrix are the same as those of the original image.
In step S302, feature extraction is performed on the original image matrix through the second convolutional layer to collect all matrix values equal to 1 into one set; the values in this set are the predicted foreground values.
In step S303, the pixels whose predicted foreground value is 1 are extracted from the original image through the third convolutional layer of the backbone network, and the image formed by these pixels is the initial foreground image, thereby completing the preliminary matting of the original image.
Referring to Fig. 4, in some embodiments, step S103 may include, but is not limited to, steps S401 to S403:
Step S401, obtaining an edge prediction value for each pixel in the initial foreground image;
Step S402, determining the edge pixels of the initial foreground image according to the magnitude relationship between the edge prediction values and a preset edge prediction threshold;
Step S403, filtering the edge pixels of the initial foreground image to obtain the target foreground image.
Specifically, to improve matting accuracy, the pixels in the edge region of the initial foreground image that are difficult to separate need to be further subdivided. First, step S401 is executed: since the edge prediction information of each pixel can be computed when the original image is preliminarily matted through the backbone network, the edge prediction value contained in that information can be obtained during the local refinement of the initial foreground image, and the degree to which a pixel belongs to an edge can be identified from it.
Further, steps S402 and S403 are executed. By setting an edge prediction threshold in advance and comparing each edge prediction value with the threshold, the pixels in the edge region are filtered. For example, the preset edge prediction threshold may be 0.5, 0.3, and so on. If the edge prediction value is less than or equal to the edge prediction threshold, the pixel belongs to the initial foreground image; if it is greater than the threshold, the pixel does not belong to the initial foreground image and is treated as an edge pixel and removed, so that the pixels of the initial foreground image are filtered and cleaned. The image formed by the remaining pixels is taken as the target foreground image, thereby achieving local refinement of the initial foreground image and improving the image quality of the target foreground image.
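A minimal NumPy sketch of the thresholding in steps S401–S403; the array shapes and the zeroing-out of removed pixels are illustrative assumptions:

```python
import numpy as np

def refine_edges(foreground, edge_pred, threshold=0.5):
    """Local refinement per steps S401-S403: pixels whose edge
    prediction value exceeds the threshold are treated as edge pixels
    and removed (zeroed) from the initial foreground image.
    `foreground` is H x W x C; `edge_pred` is the H x W map of
    per-pixel edge prediction values."""
    keep = edge_pred <= threshold  # still part of the foreground
    refined = foreground * keep[..., np.newaxis]
    n_edge = int((~keep).sum())    # number of filtered edge pixels
    return refined, n_edge
```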
Referring to Fig. 5, in some embodiments, before step S104, the image processing method further includes pre-training the image reconstruction model, which specifically includes, but is not limited to, steps S501 to S506:
Step S501, acquiring a sample image, where the resolution of the sample image is lower than that of a preset reference image;
Step S502, performing preliminary matting and local refinement on the sample image to obtain a sample foreground image;
Step S503, inputting the sample foreground image into an initial model;
Step S504, performing super-resolution reconstruction on the sample foreground image through the generation network of the initial model to generate a sample intermediate foreground image corresponding to the sample foreground image, the resolution of the sample intermediate foreground image being higher than that of the sample foreground image;
Step S505, computing, through the discrimination network of the initial model, the similarity between the sample intermediate foreground image and a reference sample foreground image to obtain a similarity probability value;
Step S506, optimizing the loss function of the initial model according to the similarity probability value, so as to update the initial model and obtain the image reconstruction model.
Specifically, in step S501, the sample image may be obtained by computed tomography (CT) or magnetic resonance imaging (MRI), where the resolution of the sample image is lower than that of the preset reference image; that is, the sample image is a low-resolution image.
Further, step S502 is executed: preliminary matting and local refinement are performed on the sample image through the backbone network and the fine-tuning network of the pre-trained matting model to obtain the sample foreground image. This process is the same as the matting process applied to the original image described above and is not repeated here.
Further, step S503 is executed to input the sample foreground image into the initial model.
It should be noted that the initial model may be an SRGAN network, a generative adversarial network for super-resolution reconstruction. The SRGAN network mainly includes two parts, a generator and a discriminator, where the generator is mainly used to convert the input image into a high-definition image, and the discriminator is mainly used to judge whether the generated high-definition image is real or fake, that is, to compute the similarity probability between the generated high-definition image and the reference image.
Further, step S504 is executed: the generation function of the generation network converts the low-resolution sample foreground image into a higher-resolution sample intermediate foreground image, where the generation function of the generation network may be expressed as formula (1):
\hat{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N} l^{SR}\left( G_{\theta_G}(I_n^{LR}),\; I_n^{HR} \right)    (1)
where G_{θ_G}(·) denotes the generation network that produces the sample intermediate foreground image, I_n^{HR} denotes the high-resolution reference foreground image, I_n^{LR} denotes the low-resolution sample foreground image, and l^{SR} denotes the super-resolution loss, which may include other losses such as perceptual loss; n = 1, 2, …, N indexes the images, and the per-image results are accumulated and then divided by the total number of images N.
Further, step S505 is executed: the sample intermediate foreground image is compared with the reference foreground image through the discrimination network. To make the difference as small as possible, the similarity probability that the sample intermediate foreground image is real can be computed, and the sample intermediate foreground image is continuously optimized so that it becomes as close as possible to the reference foreground image. During the comparison, the difference between the reference foreground image and the intermediate foreground image can be judged by computing their mean square error (MSE); the calculation formula is shown in formula (2):
\min_{\theta_G} \max_{\theta_D} \; \mathbb{E}_{I^{HR} \sim p_{train}(I^{HR})}\left[ \log D_{\theta_D}(I^{HR}) \right] + \mathbb{E}_{I^{LR} \sim p_G(I^{LR})}\left[ \log\left( 1 - D_{\theta_D}\big(G_{\theta_G}(I^{LR})\big) \right) \right]    (2)
where min indicates that the model loss of the generation network is minimized, and max indicates that the model loss of the discrimination network is maximized; D denotes the discrimination network, G denotes the generation network, and D(G(I^{LR})) denotes the discrimination network judging whether the sample intermediate foreground image generated by the generation network is real or fake, yielding the similarity probability value that the sample intermediate foreground image is real, according to which the sample intermediate foreground image is continuously optimized.
Finally, step S506 is executed: the model loss of the initial model, i.e., the loss value, is calculated according to the similarity probability value; the loss value is then backpropagated using gradient descent and fed back to the initial model to modify its model parameters. The above process is repeated until the loss value satisfies a preset iteration condition, where the preset iteration condition may be that the number of iterations reaches a preset value, or that the variance of the loss function falls below a preset threshold. When the loss value satisfies the preset iteration condition, backpropagation is stopped, the latest model parameters are taken as the final model parameters, updating of the initial model stops, and the image reconstruction model is obtained.
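The iteration condition of step S506 (stop when the iteration count reaches a preset value or the variance of recent loss values falls below a preset threshold) can be sketched generically. `loss_fn` and `update_fn` are hypothetical stand-ins for the SRGAN loss computation and the gradient-descent parameter update:

```python
import numpy as np

def train_until_converged(loss_fn, update_fn, params,
                          max_iters=1000, var_threshold=1e-6, window=10):
    """Hypothetical sketch of the stopping rule in step S506: iterate
    until the iteration count reaches a preset value or the variance
    of the last `window` loss values falls below a preset threshold."""
    history = []
    for _ in range(max_iters):
        loss = loss_fn(params)
        history.append(loss)
        # convergence test on the recent loss window
        if len(history) >= window and np.var(history[-window:]) < var_threshold:
            break
        params = update_fn(params, loss)  # gradient-descent step
    return params, history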
Through the above process, an image reconstruction model for reconstructing a low-resolution image into a high-resolution image can be obtained, achieving the purpose of improving image quality by means of super-resolution reconstruction.
Referring to Fig. 6, in some embodiments, step S104 further includes, but is not limited to, steps S601 to S602:
Step S601, performing super-resolution reconstruction on the target foreground image through the generation network of the image reconstruction model to obtain an intermediate foreground image;
Step S602, optimizing the intermediate foreground image through the discrimination network of the image reconstruction model and a preset reference foreground image to obtain the standard foreground image.
Specifically, in step S601, the generation function of the generation network converts the low-resolution target foreground image into a higher-resolution intermediate foreground image, where the generation function of the generation network may be expressed as formula (3):
\hat{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N} l^{SR}\left( G_{\theta_G}(I_n^{LR}),\; I_n^{HR} \right)    (3)
where G_{θ_G}(·) denotes the generation network that produces the intermediate foreground image, I_n^{HR} denotes the high-resolution reference foreground image, I_n^{LR} denotes the low-resolution target foreground image, and l^{SR} denotes the super-resolution loss, which may include other losses such as perceptual loss; n = 1, 2, …, N indexes the images, and the per-image results are accumulated and then divided by the total number of images N.
In step S602 of some embodiments, the intermediate foreground image is compared with the reference foreground image through the discrimination network. To make the difference as small as possible, the probability that the intermediate foreground image is real can be computed, and the intermediate foreground image is continuously optimized so that it becomes as close as possible to the reference foreground image. During the comparison, the difference between the reference foreground image and the intermediate foreground image can be judged by computing their mean square error (MSE); the calculation formula is shown in formula (4):
\min_{\theta_G} \max_{\theta_D} \; \mathbb{E}_{I^{HR} \sim p_{train}(I^{HR})}\left[ \log D_{\theta_D}(I^{HR}) \right] + \mathbb{E}_{I^{LR} \sim p_G(I^{LR})}\left[ \log\left( 1 - D_{\theta_D}\big(G_{\theta_G}(I^{LR})\big) \right) \right]    (4)
where min indicates that the model loss of the generation network is minimized, and max indicates that the model loss of the discrimination network is maximized; D denotes the discrimination network, G denotes the generation network, and D(G(I^{LR})) denotes the discrimination network judging whether the intermediate foreground image generated by the generation network is real or fake, yielding the similarity probability value that the intermediate foreground image is real. The intermediate foreground image is continuously optimized according to the similarity probability value until the similarity probability value is greater than or equal to a preset similarity probability threshold, and the standard foreground image is output; the similarity between the output standard foreground image and the reference foreground image then meets the requirement.
In the above manner, super-resolution reconstruction can be conveniently performed on the target foreground image, giving it a higher resolution and improving image quality.
Referring to Fig. 7, in some embodiments, step S105 may further include, but is not limited to, steps S701 to S703:
Step S701, performing feature extraction on the standard foreground image to obtain foreground feature values, and performing feature extraction on the background image to obtain background feature values;
Step S702, performing an XOR calculation on a preset channel bitmap according to the foreground feature values and the background feature values to obtain a target channel bitmap;
Step S703, performing image fusion on the standard foreground image and the background image according to the target channel bitmap to obtain the target image.
Specifically, step S701 is first executed: feature extraction is performed on the standard foreground image and the background image through a sigmoid function, transforming the foreground feature values of the standard foreground image and the background feature values of the background image into the range between 0 and 1. Consistent with the representation of the aforementioned predicted foreground values, the foreground feature value of a pixel on the standard foreground image is represented here as 1, and the background feature value of a pixel on the background image is represented as 0.
Further, step S702 is executed: an alpha channel bitmap is constructed in advance, with the same size as the original image, that is, the same height, width, and number of channels. According to the foreground feature values and background feature values, a 0/1 XOR calculation is performed on the alpha channel bitmap; that is, each pixel position on the alpha channel bitmap is marked with a value of 0 or 1, which indicates whether the pixel at that position is displayed. Through this process the target channel bitmap is obtained.
Finally, step S703 is executed: when performing image fusion on the standard foreground image and the background image, according to the marks on the target channel bitmap, it can be conveniently determined whether each pixel of the new image is taken from the standard foreground image or from the background image, finally obtaining the target image.
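Steps S701–S703 amount to building a 0/1 alpha bitmap and compositing with it. The following sketch identifies foreground positions as the non-zero pixels of the standard foreground image, which is an illustrative assumption standing in for the sigmoid-based feature extraction:

```python
import numpy as np

def fuse(foreground, background):
    """Sketch of steps S701-S703: build an alpha channel bitmap the
    same size as the image, mark each position 1 (show foreground) or
    0 (show background), and composite the two images accordingly."""
    # positions where the standard foreground image has content get 1
    alpha = (foreground.sum(axis=-1, keepdims=True) > 0).astype(np.uint8)
    # 0/1 marking: positions marked 1 take the foreground pixel,
    # positions marked 0 take the background pixel
    return alpha * foreground + (1 - alpha) * background
```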
本申请实施例通过获取待处理的原始图像;通过预先训练的抠图模型的骨干网络对原始图像进行初步抠图处理,得到初始前景图;通过抠图模型的微调网络对初始前景图的边缘区域进行局部细化处理,得到目标前景图,这样一来,通过抠图模型能够得到抠图效果较好的前景图像。进而,通过预先训练的图像重构模型对目标前景图进行超分辨率重构处理,得到标准前景图,其中,标准前景图的分辨率高于目标前景图的分辨率,能够获得更为清晰的标准前景图,从视觉效果上强化了抠图效果。最后对标准前景图和预设的背景图进行图像融合,得到目标图像,使得目标图像具有较高的分辨率,从而提高了图像质量。In the embodiment of the present application, the original image to be processed is acquired; the original image is preliminarily matted through the backbone network of the pre-trained matting model to obtain the initial foreground image; the edge area of the initial foreground image is locally refined through the fine-tuning network of the matting model to obtain the target foreground image. In this way, the foreground image with better matting effect can be obtained through the matting model. Furthermore, the pre-trained image reconstruction model is used to perform super-resolution reconstruction on the target foreground image to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than that of the target foreground image, and a clearer standard foreground image can be obtained, which strengthens the matting effect from the visual effect. Finally, the standard foreground image and the preset background image are fused to obtain the target image, which makes the target image have a higher resolution, thereby improving the image quality.
请参阅图8,本申请实施例还提供一种图像处理装置,可以实现上述图像处理方法,该图像处理装置包括:Please refer to FIG. 8, the embodiment of the present application also provides an image processing device, which can realize the above image processing method, and the image processing device includes:
原始图像获取模块801,用于获取待处理的原始图像;An original image acquisition module 801, configured to acquire an original image to be processed;
初步抠图模块802,用于通过预设的抠图模型的骨干网络对原始图像进行初始抠图处理,得到初始前景图;The preliminary image matting module 802 is used to perform initial image matting processing on the original image through the backbone network of the preset matting model to obtain an initial foreground image;
局部细化模块803,用于通过抠图模型的微调网络对初始前景图的边缘区域进行局部细化处理,得到目标前景图;The local refinement module 803 is used to perform local refinement processing on the edge area of the initial foreground image through the fine-tuning network of the matting model to obtain the target foreground image;
超分辨率重构模块804,用于通过预先训练的图像重构模型对目标前景图进行超分辨率重构处理,得到标准前景图,其中,标准前景图的分辨率高于目标前景图的分辨率;A super-resolution reconstruction module 804, configured to perform super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than the resolution of the target foreground image;
图像融合模块805,用于对标准前景图和预设的背景图进行图像融合,得到目标图像。The image fusion module 805 is configured to perform image fusion on the standard foreground image and the preset background image to obtain the target image.
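The module structure of the apparatus in FIG. 8 can be sketched as a Python class whose collaborators mirror modules 802–805 (the acquisition of module 801 corresponds to the inputs of `process`). The class and the trivial stand-in callables are hypothetical illustrations, not the disclosed implementation.

```python
class ImageProcessingDevice:
    """Illustrative sketch of the apparatus of FIG. 8 (names hypothetical)."""

    def __init__(self, backbone, refiner, reconstructor, fuser):
        self.backbone = backbone            # module 802: preliminary matting
        self.refiner = refiner              # module 803: local edge refinement
        self.reconstructor = reconstructor  # module 804: super-resolution
        self.fuser = fuser                  # module 805: image fusion

    def process(self, original_image, background_image):
        initial_fg = self.backbone(original_image)        # initial foreground
        target_fg = self.refiner(initial_fg)              # target foreground
        standard_fg = self.reconstructor(target_fg)       # standard foreground
        return self.fuser(standard_fg, background_image)  # target image

# Usage with trivial stand-in callables (no real networks involved):
device = ImageProcessingDevice(
    backbone=lambda img: img,
    refiner=lambda fg: fg,
    reconstructor=lambda fg: [row * 2 for row in fg],  # toy "upscaling"
    fuser=lambda fg, bg: (fg, bg),
)
result = device.process([[1, 2]], [[0, 0, 0, 0]])
```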
该图像处理装置的具体实施方式与上述图像处理方法的具体实施例基本相同,在此不再赘述。The specific implementation manner of the image processing device is basically the same as the specific embodiment of the above image processing method, and will not be repeated here.
本申请实施例还提供了一种电子设备,电子设备包括:存储器、处理器、存储在存储器上并可在处理器上运行的程序以及用于实现处理器和存储器之间的连接通信的数据总线,程序被处理器执行时实现上述图像处理方法。该电子设备可以为包括平板电脑、车载电脑等任意智能终端。The embodiment of the present application also provides an electronic device. The electronic device includes: a memory, a processor, a program stored in the memory and operable on the processor, and a data bus for realizing connection and communication between the processor and the memory. When the program is executed by the processor, the above image processing method is implemented. The electronic device may be any intelligent terminal including a tablet computer, a vehicle-mounted computer, and the like.
请参阅图9,图9示意了另一实施例的电子设备的硬件结构,电子设备包括:Please refer to FIG. 9. FIG. 9 illustrates a hardware structure of an electronic device in another embodiment. The electronic device includes:
处理器901,可以采用通用的CPU(Central Processing Unit,中央处理器)、微处理器、应用专用集成电路(Application Specific Integrated Circuit,ASIC)、或者一个或多个集成电路等方式实现,用于执行相关程序,以实现本申请实施例所提供的技术方案;The processor 901 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute related programs to implement the technical solutions provided by the embodiments of the present application;
存储器902,可以采用只读存储器(Read Only Memory,ROM)、静态存储设备、动态存储设备或者随机存取存储器(Random Access Memory,RAM)等形式实现。存储器902可以存储操作系统和其他应用程序,在通过软件或者固件来实现本说明书实施例所提供的技术方案时,相关的程序代码保存在存储器902中,并由处理器901来调用执行一种图像处理方法,其中,所述图像处理方法包括:获取待处理的原始图像;通过预先训练的抠图模型的骨干网络对原始图像进行初步抠图处理,得到初始前景图;通过抠图模型的微调网络对初始前景图的边缘区域进行局部细化处理,得到目标前景图;通过预先训练的图像重构模型对目标前景图进行超分辨率重构处理,得到标准前景图,其中,标准前景图的分辨率高于目标前景图的分辨率;对标准前景图和预设的背景图进行图像融合,得到目标图像;The memory 902 may be implemented in the form of a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 902 can store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 902 and invoked by the processor 901 to execute an image processing method, wherein the image processing method includes: acquiring an original image to be processed; performing preliminary matting on the original image through the backbone network of a pre-trained matting model to obtain an initial foreground image; performing local refinement on the edge region of the initial foreground image through the fine-tuning network of the matting model to obtain a target foreground image; performing super-resolution reconstruction on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than that of the target foreground image; and performing image fusion on the standard foreground image and a preset background image to obtain a target image;
输入/输出接口903,用于实现信息输入及输出;The input/output interface 903 is used to realize information input and output;
通信接口904,用于实现本设备与其他设备的通信交互,可以通过有线方式(例如USB、 网线等)实现通信,也可以通过无线方式(例如移动网络、WIFI、蓝牙等)实现通信;The communication interface 904 is used to realize the communication interaction between the device and other devices, and the communication can be realized through a wired method (such as USB, network cable, etc.), or can be realized through a wireless method (such as a mobile network, WIFI, Bluetooth, etc.);
总线905,在设备的各个组件(例如处理器901、存储器902、输入/输出接口903和通信接口904)之间传输信息;The bus 905 is used to transfer information between the various components of the device (such as the processor 901, the memory 902, the input/output interface 903, and the communication interface 904);
其中处理器901、存储器902、输入/输出接口903和通信接口904通过总线905实现彼此之间在设备内部的通信连接。The processor 901 , the memory 902 , the input/output interface 903 and the communication interface 904 are connected to each other within the device through the bus 905 .
本申请实施例还提供了一种存储介质,存储介质为计算机可读存储介质,用于计算机可读存储,计算机可读存储介质可以是非易失性,也可以是易失性。存储介质存储有一个或者多个程序,一个或者多个程序可被一个或者多个处理器执行,以实现一种图像处理方法,其中,所述图像处理方法包括:获取待处理的原始图像;通过预先训练的抠图模型的骨干网络对原始图像进行初步抠图处理,得到初始前景图;通过抠图模型的微调网络对初始前景图的边缘区域进行局部细化处理,得到目标前景图;通过预先训练的图像重构模型对目标前景图进行超分辨率重构处理,得到标准前景图,其中,标准前景图的分辨率高于目标前景图的分辨率;对标准前景图和预设的背景图进行图像融合,得到目标图像。An embodiment of the present application further provides a storage medium, which is a computer-readable storage medium for computer-readable storage; the computer-readable storage medium may be non-volatile or volatile. The storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement an image processing method, wherein the image processing method includes: acquiring an original image to be processed; performing preliminary matting on the original image through the backbone network of a pre-trained matting model to obtain an initial foreground image; performing local refinement on the edge region of the initial foreground image through the fine-tuning network of the matting model to obtain a target foreground image; performing super-resolution reconstruction on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than that of the target foreground image; and performing image fusion on the standard foreground image and a preset background image to obtain a target image.
存储器作为一种非暂态计算机可读存储介质,可用于存储非暂态软件程序以及非暂态性计算机可执行程序。此外,存储器可以包括高速随机存取存储器,还可以包括非暂态存储器,例如至少一个磁盘存储器件、闪存器件、或其他非暂态固态存储器件。在一些实施方式中,存储器可选包括相对于处理器远程设置的存储器,这些远程存储器可以通过网络连接至该处理器。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。As a non-transitory computer-readable storage medium, memory can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage devices. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
本申请实施例描述的实施例是为了更加清楚的说明本申请实施例的技术方案,并不构成对于本申请实施例提供的技术方案的限定,本领域技术人员可知,随着技术的演变和新应用场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。The embodiments described herein are intended to illustrate the technical solutions of the embodiments of the present application more clearly, and do not constitute a limitation on those technical solutions. Those skilled in the art will appreciate that, as technology evolves and new application scenarios emerge, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
本领域技术人员可以理解的是,图1-7中示出的技术方案并不构成对本申请实施例的限定,可以包括比图示更多或更少的步骤,或者组合某些步骤,或者不同的步骤。Those skilled in the art can understand that the technical solutions shown in FIGS. 1-7 do not limit the embodiments of the present application, and may include more or fewer steps than those shown in the illustrations, or combine some steps, or different steps.
以上参照附图说明了本申请实施例的优选实施例,并非因此局限本申请实施例的权利范围。本领域技术人员不脱离本申请实施例的范围和实质内所作的任何修改、等同替换和改进,均应在本申请实施例的权利范围之内。The preferred embodiments of the present application have been described above with reference to the accompanying drawings, which does not thereby limit the scope of the claims of the embodiments of the present application. Any modifications, equivalent replacements, and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (20)

  1. 一种图像处理方法,其中,所述方法包括:An image processing method, wherein the method includes:
    获取待处理的原始图像;Get the original image to be processed;
    通过预先训练的抠图模型的骨干网络对所述原始图像进行初步抠图处理,得到初始前景图;Preliminary matting processing is performed on the original image through the backbone network of the pre-trained matting model to obtain an initial foreground image;
    通过所述抠图模型的微调网络对所述初始前景图的边缘区域进行局部细化处理,得到目标前景图;performing local refinement processing on the edge area of the initial foreground image through the fine-tuning network of the cutout model to obtain the target foreground image;
    通过预先训练的图像重构模型对所述目标前景图进行超分辨率重构处理,得到标准前景图,其中,所述标准前景图的分辨率高于所述目标前景图的分辨率;performing super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than the resolution of the target foreground image;
    对所述标准前景图和预设的背景图进行图像融合,得到目标图像。Image fusion is performed on the standard foreground image and the preset background image to obtain a target image.
  2. 根据权利要求1所述的图像处理方法,其中,所述通过预设的抠图模型的骨干网络对所述原始图像进行初步抠图处理,得到初始前景图,包括:The image processing method according to claim 1, wherein the preliminary matting process is performed on the original image through the backbone network of the preset matting model to obtain an initial foreground image, including:
    对所述原始图像进行卷积处理,得到原始图像矩阵;Carrying out convolution processing on the original image to obtain an original image matrix;
    对所述原始图像矩阵进行特征提取,得到预测前景值;performing feature extraction on the original image matrix to obtain predicted foreground values;
    根据所述预测前景值对所述原始图像进行初步抠图处理,得到所述初始前景图。Preliminary image matting processing is performed on the original image according to the predicted foreground value to obtain the initial foreground image.
  3. 根据权利要求1所述的图像处理方法,其中,所述通过所述抠图模型的微调网络对所述初始前景图的边缘区域进行局部细化处理,得到目标前景图,包括:The image processing method according to claim 1, wherein the fine-tuning network of the matting model performs local refinement processing on the edge region of the initial foreground image to obtain the target foreground image, comprising:
    获取所述初始前景图中每一像素点的边缘预测值;Obtain the edge prediction value of each pixel in the initial foreground image;
    根据所述边缘预测值和预设的边缘预测阈值的大小关系,确定所述初始前景图的边缘像素点;Determine the edge pixel points of the initial foreground image according to the magnitude relationship between the edge prediction value and a preset edge prediction threshold;
    对所述初始前景图的边缘像素点进行过滤处理,得到所述目标前景图。The edge pixels of the initial foreground image are filtered to obtain the target foreground image.
  4. 根据权利要求1所述的图像处理方法,其中,所述通过预先训练的图像重构模型对所述目标前景图进行超分辨率重构处理,得到标准前景图,包括:The image processing method according to claim 1, wherein, performing super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, comprising:
    通过所述图像重构模型的生成网络对目标前景图进行超分辨率重构处理,得到中间前景图;Performing super-resolution reconstruction processing on the target foreground image through the generation network of the image reconstruction model to obtain an intermediate foreground image;
    通过所述图像重构模型的判别网络和预设的参考前景图对所述中间前景图进行优化处理,得到标准前景图。The intermediate foreground image is optimized through the discrimination network of the image reconstruction model and a preset reference foreground image to obtain a standard foreground image.
  5. 根据权利要求1至4任一项所述的图像处理方法,其中,所述对所述标准前景图和预设的背景图进行图像融合,得到目标图像,包括:The image processing method according to any one of claims 1 to 4, wherein the image fusion of the standard foreground image and the preset background image to obtain the target image includes:
    对所述标准前景图进行特征提取,得到前景特征值,并对所述背景图进行特征提取,得到背景特征值;performing feature extraction on the standard foreground image to obtain a foreground feature value, and performing feature extraction on the background image to obtain a background feature value;
    根据所述前景特征值和所述背景特征值对预设的通道位图进行异或计算,得到目标通道位图;performing XOR calculation on a preset channel bitmap according to the foreground feature value and the background feature value to obtain a target channel bitmap;
    根据所述目标通道位图对所述标准前景图和所述背景图进行图像融合,得到目标图像。performing image fusion on the standard foreground image and the background image according to the target channel bitmap to obtain a target image.
  6. 根据权利要求1至4任一项所述的图像处理方法,其中,在所述通过预设的抠图模型的骨干网络对所述原始图像进行初步抠图处理,得到初始前景图之前,所述方法还包括预先训练所述抠图模型,包括:The image processing method according to any one of claims 1 to 4, wherein, before performing preliminary matting processing on the original image through the backbone network of the preset matting model to obtain an initial foreground image, the method further includes pre-training the matting model, including:
    获取样本图像,其中,所述样本图像的分辨率低于预设的参考图像的分辨率;acquiring a sample image, wherein the resolution of the sample image is lower than that of a preset reference image;
    将所述样本图像输入至所述抠图模型中;inputting the sample image into the matting model;
    通过所述骨干网络对所述样本图像进行卷积处理,得到样本图像矩阵,并对所述样本图像矩阵进行特征提取,得到样本预测前景值;performing convolution processing on the sample image through the backbone network to obtain a sample image matrix, and performing feature extraction on the sample image matrix to obtain a sample predicted foreground value;
    通过所述骨干网络和所述样本预测前景值对所述样本图像进行初步抠图处理,得到样本前景图;performing preliminary matting processing on the sample image through the backbone network and the predicted foreground value of the sample to obtain a sample foreground image;
    通过所述微调网络获取所述样本前景图中每一样本像素点的样本边缘预测值;Obtaining the sample edge prediction value of each sample pixel in the sample foreground map through the fine-tuning network;
    根据所述样本边缘预测值和预设的边缘预测阈值的大小关系,确定样本边缘像素点的个数;Determine the number of sample edge pixels according to the size relationship between the sample edge prediction value and a preset edge prediction threshold;
    根据所述样本边缘像素点的个数对所述抠图模型的损失函数进行优化,以更新所述抠图模型。Optimizing the loss function of the image matting model according to the number of edge pixels of the sample, so as to update the image matting model.
  7. 根据权利要求1至4任一项所述的图像处理方法,其中,在所述通过预先训练的图像重构模型对所述目标前景图进行超分辨率重构处理,得到标准前景图之前,所述方法还包括预先训练所述图像重构模型,包括:The image processing method according to any one of claims 1 to 4, wherein, before performing super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, the method further includes pre-training the image reconstruction model, including:
    获取样本图像,其中,所述样本图像的分辨率低于预设的参考图像的分辨率;acquiring a sample image, wherein the resolution of the sample image is lower than that of a preset reference image;
    对所述样本图像进行初步抠图处理和局部细化处理,得到样本前景图;performing preliminary matting processing and local refinement processing on the sample image to obtain a sample foreground image;
    将所述样本前景图输入至初始模型中;inputting the sample foreground map into the initial model;
    通过所述初始模型的生成网络对所述样本前景图进行超分辨率重构处理,生成与所述样本前景图对应的样本中间前景图,所述样本中间前景图的分辨率高于所述样本前景图;performing super-resolution reconstruction processing on the sample foreground image through the generation network of the initial model, and generating a sample intermediate foreground image corresponding to the sample foreground image, and the resolution of the sample intermediate foreground image is higher than that of the sample foreground image;
    通过所述初始模型的判别网络对所述样本中间前景图和参考样本前景图进行相似度计算,得到相似概率值;Performing similarity calculation on the sample intermediate foreground map and the reference sample foreground map through the discriminant network of the initial model to obtain a similar probability value;
    根据所述相似概率值对所述初始模型的损失函数进行优化,以更新所述初始模型,得到所述图像重构模型。Optimizing the loss function of the initial model according to the similarity probability value to update the initial model to obtain the image reconstruction model.
  8. 一种图像处理装置,其中,所述装置包括:An image processing device, wherein the device includes:
    原始图像获取模块,用于获取待处理的原始图像;The original image acquisition module is used to acquire the original image to be processed;
    初步抠图模块,用于通过预设的抠图模型的骨干网络对所述原始图像进行初始抠图处理,得到初始前景图;The preliminary image matting module is used to perform initial image matting processing on the original image through the backbone network of the preset image matting model to obtain an initial foreground image;
    局部细化模块,用于通过所述抠图模型的微调网络对所述初始前景图的边缘区域进行局部细化处理,得到目标前景图;A local refinement module, configured to perform local refinement processing on the edge region of the initial foreground image through the fine-tuning network of the matting model to obtain a target foreground image;
    超分辨率重构模块,用于通过预先训练的图像重构模型对所述目标前景图进行超分辨率重构处理,得到标准前景图,其中,所述标准前景图的分辨率高于所述目标前景图的分辨率;A super-resolution reconstruction module, configured to perform super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than the resolution of the target foreground image;
    图像融合模块,用于对所述标准前景图和预设的背景图进行图像融合,得到目标图像。The image fusion module is used to perform image fusion on the standard foreground image and the preset background image to obtain the target image.
  9. 一种电子设备,其中,所述电子设备包括存储器、处理器、存储在所述存储器上并可在所述处理器上运行的程序以及用于实现所述处理器和所述存储器之间的连接通信的数据总线,所述程序被所述处理器执行时实现一种图像处理方法,其中,所述图像处理方法包括:An electronic device, wherein the electronic device includes a memory, a processor, a program stored in the memory and operable on the processor, and a data bus for realizing connection and communication between the processor and the memory, and when the program is executed by the processor, an image processing method is implemented, wherein the image processing method includes:
    获取待处理的原始图像;Get the original image to be processed;
    通过预先训练的抠图模型的骨干网络对所述原始图像进行初步抠图处理,得到初始前景图;Preliminary matting processing is performed on the original image through the backbone network of the pre-trained matting model to obtain an initial foreground image;
    通过所述抠图模型的微调网络对所述初始前景图的边缘区域进行局部细化处理,得到目标前景图;performing local refinement processing on the edge area of the initial foreground image through the fine-tuning network of the cutout model to obtain the target foreground image;
    通过预先训练的图像重构模型对所述目标前景图进行超分辨率重构处理,得到标准前景图,其中,所述标准前景图的分辨率高于所述目标前景图的分辨率;performing super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than the resolution of the target foreground image;
    对所述标准前景图和预设的背景图进行图像融合,得到目标图像。Image fusion is performed on the standard foreground image and the preset background image to obtain a target image.
  10. 根据权利要求9所述的电子设备,其中,所述通过预设的抠图模型的骨干网络对所述原始图像进行初步抠图处理,得到初始前景图,包括:The electronic device according to claim 9, wherein the preliminary matting process is performed on the original image through the backbone network of the preset matting model to obtain an initial foreground image, comprising:
    对所述原始图像进行卷积处理,得到原始图像矩阵;Carrying out convolution processing on the original image to obtain an original image matrix;
    对所述原始图像矩阵进行特征提取,得到预测前景值;performing feature extraction on the original image matrix to obtain predicted foreground values;
    根据所述预测前景值对所述原始图像进行初步抠图处理,得到所述初始前景图。Preliminary image matting processing is performed on the original image according to the predicted foreground value to obtain the initial foreground image.
  11. 根据权利要求9所述的电子设备,其中,所述通过所述抠图模型的微调网络对所述初始前景图的边缘区域进行局部细化处理,得到目标前景图,包括:The electronic device according to claim 9, wherein the fine-tuning network of the matting model performs local refinement processing on the edge area of the initial foreground image to obtain a target foreground image, comprising:
    获取所述初始前景图中每一像素点的边缘预测值;Obtain the edge prediction value of each pixel in the initial foreground image;
    根据所述边缘预测值和预设的边缘预测阈值的大小关系,确定所述初始前景图的边缘像素点;Determine the edge pixel points of the initial foreground image according to the magnitude relationship between the edge prediction value and a preset edge prediction threshold;
    对所述初始前景图的边缘像素点进行过滤处理,得到所述目标前景图。The edge pixels of the initial foreground image are filtered to obtain the target foreground image.
  12. 根据权利要求9所述的电子设备,其中,所述通过预先训练的图像重构模型对所述目标前景图进行超分辨率重构处理,得到标准前景图,包括:The electronic device according to claim 9, wherein, performing super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, comprising:
    通过所述图像重构模型的生成网络对目标前景图进行超分辨率重构处理,得到中间前景图;Performing super-resolution reconstruction processing on the target foreground image through the generation network of the image reconstruction model to obtain an intermediate foreground image;
    通过所述图像重构模型的判别网络和预设的参考前景图对所述中间前景图进行优化处理,得到标准前景图。The intermediate foreground image is optimized through the discrimination network of the image reconstruction model and a preset reference foreground image to obtain a standard foreground image.
  13. 根据权利要求9至12任一项所述的电子设备,其中,所述对所述标准前景图和预设的背景图进行图像融合,得到目标图像,包括:The electronic device according to any one of claims 9 to 12, wherein the image fusion of the standard foreground image and the preset background image to obtain the target image includes:
    对所述标准前景图进行特征提取,得到前景特征值,并对所述背景图进行特征提取,得到背景特征值;performing feature extraction on the standard foreground image to obtain a foreground feature value, and performing feature extraction on the background image to obtain a background feature value;
    根据所述前景特征值和所述背景特征值对预设的通道位图进行异或计算,得到目标通道位图;performing XOR calculation on a preset channel bitmap according to the foreground feature value and the background feature value to obtain a target channel bitmap;
    根据所述目标通道位图对所述标准前景图和所述背景图进行图像融合,得到目标图像。performing image fusion on the standard foreground image and the background image according to the target channel bitmap to obtain a target image.
  14. 根据权利要求9至12任一项所述的电子设备,其中,在所述通过预设的抠图模型的骨干网络对所述原始图像进行初步抠图处理,得到初始前景图之前,所述方法还包括预先训练所述抠图模型,包括:The electronic device according to any one of claims 9 to 12, wherein, before performing preliminary matting processing on the original image through the backbone network of the preset matting model to obtain an initial foreground image, the method further includes pre-training the matting model, including:
    获取样本图像,其中,所述样本图像的分辨率低于预设的参考图像的分辨率;acquiring a sample image, wherein the resolution of the sample image is lower than that of a preset reference image;
    将所述样本图像输入至所述抠图模型中;inputting the sample image into the matting model;
    通过所述骨干网络对所述样本图像进行卷积处理,得到样本图像矩阵,并对所述样本图像矩阵进行特征提取,得到样本预测前景值;performing convolution processing on the sample image through the backbone network to obtain a sample image matrix, and performing feature extraction on the sample image matrix to obtain a sample predicted foreground value;
    通过所述骨干网络和所述样本预测前景值对所述样本图像进行初步抠图处理,得到样本前景图;performing preliminary matting processing on the sample image through the backbone network and the predicted foreground value of the sample to obtain a sample foreground image;
    通过所述微调网络获取所述样本前景图中每一样本像素点的样本边缘预测值;Obtaining the sample edge prediction value of each sample pixel in the sample foreground map through the fine-tuning network;
    根据所述样本边缘预测值和预设的边缘预测阈值的大小关系,确定样本边缘像素点的个数;Determine the number of sample edge pixels according to the size relationship between the sample edge prediction value and a preset edge prediction threshold;
    根据所述样本边缘像素点的个数对所述抠图模型的损失函数进行优化,以更新所述抠图模型。Optimizing the loss function of the image matting model according to the number of edge pixels of the sample, so as to update the image matting model.
  15. 一种存储介质,所述存储介质为计算机可读存储介质,用于计算机可读存储,其中,所述存储介质存储有一个或者多个程序,所述一个或者多个程序可被一个或者多个处理器执行,以实现一种图像处理方法,其中,所述图像处理方法包括:A storage medium, the storage medium is a computer-readable storage medium for computer-readable storage, wherein the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement an image processing method, wherein the image processing method includes:
    获取待处理的原始图像;Get the original image to be processed;
    通过预先训练的抠图模型的骨干网络对所述原始图像进行初步抠图处理,得到初始前景图;Preliminary matting processing is performed on the original image through the backbone network of the pre-trained matting model to obtain an initial foreground image;
    通过所述抠图模型的微调网络对所述初始前景图的边缘区域进行局部细化处理,得到目标前景图;performing local refinement processing on the edge area of the initial foreground image through the fine-tuning network of the cutout model to obtain the target foreground image;
    通过预先训练的图像重构模型对所述目标前景图进行超分辨率重构处理,得到标准前景图,其中,所述标准前景图的分辨率高于所述目标前景图的分辨率;performing super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, wherein the resolution of the standard foreground image is higher than the resolution of the target foreground image;
    对所述标准前景图和预设的背景图进行图像融合,得到目标图像。Image fusion is performed on the standard foreground image and the preset background image to obtain a target image.
  16. 根据权利要求15所述的存储介质,其中,所述通过预设的抠图模型的骨干网络对所述原始图像进行初步抠图处理,得到初始前景图,包括:The storage medium according to claim 15, wherein the preliminary matting process is performed on the original image through the backbone network of the preset matting model to obtain an initial foreground image, comprising:
    对所述原始图像进行卷积处理,得到原始图像矩阵;Carrying out convolution processing on the original image to obtain an original image matrix;
    对所述原始图像矩阵进行特征提取,得到预测前景值;performing feature extraction on the original image matrix to obtain predicted foreground values;
    根据所述预测前景值对所述原始图像进行初步抠图处理,得到所述初始前景图。Preliminary image matting processing is performed on the original image according to the predicted foreground value to obtain the initial foreground image.
  17. 根据权利要求15所述的存储介质,其中,所述通过所述抠图模型的微调网络对所述初始前景图的边缘区域进行局部细化处理,得到目标前景图,包括:The storage medium according to claim 15, wherein the fine-tuning network of the matting model performs local refinement processing on the edge area of the initial foreground image to obtain the target foreground image, comprising:
    获取所述初始前景图中每一像素点的边缘预测值;Obtain the edge prediction value of each pixel in the initial foreground image;
    根据所述边缘预测值和预设的边缘预测阈值的大小关系,确定所述初始前景图的边缘像素点;Determine the edge pixel points of the initial foreground image according to the magnitude relationship between the edge prediction value and a preset edge prediction threshold;
    对所述初始前景图的边缘像素点进行过滤处理,得到所述目标前景图。The edge pixels of the initial foreground image are filtered to obtain the target foreground image.
  18. 根据权利要求15所述的存储介质,其中,所述通过预先训练的图像重构模型对所述目标前景图进行超分辨率重构处理,得到标准前景图,包括:The storage medium according to claim 15, wherein, performing super-resolution reconstruction processing on the target foreground image through a pre-trained image reconstruction model to obtain a standard foreground image, comprising:
    通过所述图像重构模型的生成网络对目标前景图进行超分辨率重构处理,得到中间前景图;Performing super-resolution reconstruction processing on the target foreground image through the generation network of the image reconstruction model to obtain an intermediate foreground image;
    通过所述图像重构模型的判别网络和预设的参考前景图对所述中间前景图进行优化处理,得到标准前景图。The intermediate foreground image is optimized through the discrimination network of the image reconstruction model and a preset reference foreground image to obtain a standard foreground image.
  19. 根据权利要求15至18任一项所述的存储介质,其中,所述对所述标准前景图和预设的背景图进行图像融合,得到目标图像,包括:The storage medium according to any one of claims 15 to 18, wherein the image fusion of the standard foreground image and the preset background image to obtain the target image includes:
    对所述标准前景图进行特征提取,得到前景特征值,并对所述背景图进行特征提取,得到背景特征值;performing feature extraction on the standard foreground image to obtain a foreground feature value, and performing feature extraction on the background image to obtain a background feature value;
    根据所述前景特征值和所述背景特征值对预设的通道位图进行异或计算,得到目标通道位图;performing XOR calculation on a preset channel bitmap according to the foreground feature value and the background feature value to obtain a target channel bitmap;
    根据所述目标通道位图对所述标准前景图和所述背景图进行图像融合,得到目标图像。performing image fusion on the standard foreground image and the background image according to the target channel bitmap to obtain a target image.
  20. 根据权利要求15至18任一项所述的存储介质,其中,在所述通过预设的抠图模型的骨干网络对所述原始图像进行初步抠图处理,得到初始前景图之前,所述方法还包括预先训练所述抠图模型,包括:The storage medium according to any one of claims 15 to 18, wherein, before performing preliminary matting processing on the original image through the backbone network of the preset matting model to obtain an initial foreground image, the method further includes pre-training the matting model, including:
    获取样本图像,其中,所述样本图像的分辨率低于预设的参考图像的分辨率;acquiring a sample image, wherein the resolution of the sample image is lower than that of a preset reference image;
    将所述样本图像输入至所述抠图模型中;inputting the sample image into the matting model;
    通过所述骨干网络对所述样本图像进行卷积处理,得到样本图像矩阵,并对所述样本图像矩阵进行特征提取,得到样本预测前景值;performing convolution processing on the sample image through the backbone network to obtain a sample image matrix, and performing feature extraction on the sample image matrix to obtain a sample predicted foreground value;
    通过所述骨干网络和所述样本预测前景值对所述样本图像进行初步抠图处理,得到样本前景图;performing preliminary matting processing on the sample image through the backbone network and the predicted foreground value of the sample to obtain a sample foreground image;
    通过所述微调网络获取所述样本前景图中每一样本像素点的样本边缘预测值;Obtaining the sample edge prediction value of each sample pixel in the sample foreground map through the fine-tuning network;
    根据所述样本边缘预测值和预设的边缘预测阈值的大小关系,确定样本边缘像素点的个数;Determine the number of sample edge pixels according to the size relationship between the sample edge prediction value and a preset edge prediction threshold;
    根据所述样本边缘像素点的个数对所述抠图模型的损失函数进行优化,以更新所述抠图模型。Optimizing the loss function of the image matting model according to the number of edge pixels of the sample, so as to update the image matting model.
PCT/CN2022/090713 2022-01-18 2022-04-29 Image processing method and apparatus, electronic device, and storage medium WO2023137914A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210057041.9A CN114399454A (en) 2022-01-18 2022-01-18 Image processing method, image processing device, electronic equipment and storage medium
CN202210057041.9 2022-01-18

Publications (1)

Publication Number Publication Date
WO2023137914A1 true WO2023137914A1 (en) 2023-07-27

Family

ID=81230568

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090713 WO2023137914A1 (en) 2022-01-18 2022-04-29 Image processing method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN114399454A (en)
WO (1) WO2023137914A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935140A (en) * 2023-08-04 2023-10-24 北京邮电大学 Luxury identification model training method based on ink, identification method and device
CN117522717A (en) * 2024-01-03 2024-02-06 支付宝(杭州)信息技术有限公司 Image synthesis method, device and equipment

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114399454A (en) * 2022-01-18 2022-04-26 平安科技(深圳)有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN114820686B (en) * 2022-05-16 2022-12-16 北京百度网讯科技有限公司 Matting method and device, electronic equipment and storage medium
CN115022668B (en) * 2022-07-21 2023-08-11 中国平安人寿保险股份有限公司 Live broadcast-based video generation method and device, equipment and medium
CN116167922B (en) * 2023-04-24 2023-07-18 广州趣丸网络科技有限公司 Matting method and device, storage medium and computer equipment
CN116684607B (en) * 2023-07-26 2023-11-14 腾讯科技(深圳)有限公司 Image compression and decompression method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040151371A1 (en) * 2003-01-30 2004-08-05 Eastman Kodak Company Method for face orientation determination in digital color images
CN110807385A (en) * 2019-10-24 2020-02-18 腾讯科技(深圳)有限公司 Target detection method and device, electronic equipment and storage medium
CN112598678A (en) * 2020-11-27 2021-04-02 努比亚技术有限公司 Image processing method, terminal and computer readable storage medium
CN114399454A (en) * 2022-01-18 2022-04-26 平安科技(深圳)有限公司 Image processing method, image processing device, electronic equipment and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Background Matting v2 for portrait matting", 18 June 2021 (2021-06-18), XP093079217, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/381917042?ivk_sa=1024320u> [retrieved on 20230906] *
LIN SHANCHUAN, RYABTSEV ANDREY, SENGUPTA SOUMYADIP, CURLESS BRIAN, SEITZ STEVE, KEMELMACHER-SHLIZERMAN IRA: "Real-Time High-Resolution Background Matting", ARXIV:2012.07810V1 [CS.CV], 14 December 2020 (2020-12-14), XP093079222 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935140A (en) * 2023-08-04 2023-10-24 北京邮电大学 Luxury identification model training method based on ink, identification method and device
CN116935140B (en) * 2023-08-04 2024-04-16 北京邮电大学 Luxury identification model training method based on ink, identification method and device
CN117522717A (en) * 2024-01-03 2024-02-06 支付宝(杭州)信息技术有限公司 Image synthesis method, device and equipment
CN117522717B (en) * 2024-01-03 2024-04-19 支付宝(杭州)信息技术有限公司 Image synthesis method, device and equipment

Also Published As

Publication number Publication date
CN114399454A (en) 2022-04-26

Similar Documents

Publication Publication Date Title
WO2023137914A1 (en) Image processing method and apparatus, electronic device, and storage medium
US11551333B2 (en) Image reconstruction method and device
US11798132B2 (en) Image inpainting method and apparatus, computer device, and storage medium
US11861829B2 (en) Deep learning based medical image detection method and related device
CN111524106B (en) Skull fracture detection and model training method, device, equipment and storage medium
CN110930416B (en) MRI image prostate segmentation method based on U-shaped network
US20180165810A1 (en) Method of automatically detecting microaneurysm based on multi-sieving convolutional neural network
CN112052831B (en) Method, device and computer storage medium for face detection
Varish et al. Image retrieval scheme using quantized bins of color image components and adaptive tetrolet transform
US20210057069A1 (en) Method and device for generating medical report
TW202014984A (en) Image processing method, electronic device, and storage medium
WO2020259453A1 (en) 3d image classification method and apparatus, device and storage medium
US20130070997A1 (en) Systems, methods, and media for on-line boosting of a classifier
US10929643B2 (en) 3D image detection method and apparatus, electronic device, and computer readable medium
WO2021159811A1 (en) Auxiliary diagnostic apparatus and method for glaucoma, and storage medium
CN112016592B (en) Domain adaptive semantic segmentation method and device based on cross domain category perception
CN116612357B (en) Method, system and storage medium for constructing unsupervised RGBD multi-mode data set
CN110910497B (en) Method and system for realizing augmented reality map
CN112465735A (en) Pedestrian detection method, device and computer-readable storage medium
CN114627136B (en) Tongue image segmentation and alignment method based on feature pyramid network
WO2022226744A1 (en) Texture completion
CN113989588A (en) Self-learning-based intelligent evaluation system and method for pentagonal drawing test
CN108154107B (en) Method for determining scene category to which remote sensing image belongs
Patibandla et al. CT Image Precise Denoising Model with Edge Based Segmentation with Labeled Pixel Extraction Using CNN Based Feature Extraction for Oral Cancer Detection
WO2023028866A1 (en) Image processing method and apparatus, and vehicle

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22921345

Country of ref document: EP

Kind code of ref document: A1