CN117437135A - Image background switching method, device and medium

Image background switching method, device and medium

Info

Publication number
CN117437135A
CN117437135A (application CN202311155515.4A)
Authority
CN
China
Prior art keywords
image
background
foreground
pixel block
saliency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311155515.4A
Other languages
Chinese (zh)
Inventor
徐子贤
刘娈琦
王迎勋
王香
王宇婷
孙鹏飞
胡艳羽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu Institute of Technology
Original Assignee
Qilu Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu Institute of Technology
Priority to CN202311155515.4A
Publication of CN117437135A
Legal status: Pending

Classifications

    • G06T 5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06N 3/0455 — Auto-encoder networks; encoder-decoder networks
    • G06N 3/0464 — Convolutional networks [CNN, ConvNet]
    • G06T 7/11 — Region-based segmentation
    • G06T 7/194 — Segmentation or edge detection involving foreground-background segmentation
    • G06V 10/46 — Descriptors for shape, contour or point-related features, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; salient regional features
    • G06V 10/54 — Extraction of image or video features relating to texture
    • G06V 10/56 — Extraction of image or video features relating to colour
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • G06T 2207/20221 — Image fusion; image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image background switching method, device and medium. The method comprises: segmenting a first input image and ranking the segments to obtain the background range of the image; passing the background range as a condition into a generative model to generate a background image; performing saliency detection on a second input image, extracting the salient region of the image foreground, and generating a saliency map; extracting a foreground image from the second input image according to the saliency map; and fusing the background image with the foreground image to generate a third image. The method improves the naturalness and accuracy of the background switching effect and increases the flexibility and efficiency of background switching, thereby meeting users' varied requirements for image background switching.

Description

Image background switching method, device and medium
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an image background switching method, device and medium based on image saliency detection and a generative model.
Background
Image background switching techniques help users quickly change the background of an image to achieve a variety of effects. For example, a user can place their own portrait in a beautiful landscape, apply an eye-catching background to beautify an image for social media, or use a new background to protect their privacy in online meetings.
In the prior art, deep learning segmentation networks such as U-Net and Mask R-CNN are generally used to separate the foreground and background of an image accurately, making background changes look natural; compared with manual editing or other traditional techniques, segmentation networks switch backgrounds faster and more accurately and open up more creative possibilities. By learning from large amounts of labeled data, segmentation networks can separate foreground and background well in many complex scenes. However, deep learning segmentation networks suffer from limited segmentation accuracy, high computational complexity, and large training-data requirements. Regarding segmentation accuracy, although these methods work well in many scenes, they may fail to separate foreground and background accurately in images with complex textures, complex color distributions, or large variations in illumination. Regarding computational complexity, existing deep learning methods typically demand substantial computing resources, including high-performance processors and large amounts of memory; in hardware-constrained or real-time scenarios this can lead to excessive processing times that degrade the user experience. Regarding training data, deep learning methods need large amounts of labeled data, and acquiring and labeling such data requires significant human and time investment, which both raises development costs and limits the versatility and usability of these methods. In addition, if the background picture supplied to the prior art contains foreground objects, the prior art cannot be applied correctly, and the switching result appears distorted and unnatural.
Disclosure of Invention
To address the defects of the prior art, the invention provides an image background switching method, device and medium.
To achieve the above object, one aspect of the invention provides an image background switching method, comprising:
segmenting a first input image and ranking the segments to obtain the background range of the image;
passing the background range as a condition into a generative model to generate a background image;
performing saliency detection on a second input image, extracting the salient region of the image foreground, and generating a saliency map;
extracting a foreground image from the second input image according to the saliency map;
and fusing the background image with the foreground image to generate a third image.
Optionally, segmenting the first input image and then ranking to obtain the background range of the image comprises:
segmenting the first input image to obtain a plurality of superpixel block nodes;
extracting the key features of each superpixel block node and storing them as feature vectors;
generating a foreground query node from the key features;
ranking all superpixel block nodes along a specific dimension of the feature vector, with the foreground query node as a condition;
and selecting pixels at the image edges as background query nodes to obtain the background range of the image.
Optionally, the first input image is segmented into a plurality of superpixel block nodes using the SLIC algorithm,
and the average color of each superpixel block node and the position of the node are extracted as the key features;
generating a foreground query node from the key features comprises:
determining the closeness between any two superpixel block nodes according to the key features,
where $t_i$ denotes the closeness between superpixel block node $i$ and every other superpixel block node $j$; $C = \{c_i \mid i = 1, 2, \ldots, n\}$ denotes the set of average colors of all superpixel block nodes; $P = \{p_i \mid i = 1, 2, \ldots, n\}$ denotes the set of node positions of all superpixel block nodes; and $\varepsilon$ denotes a very small positive number that keeps the denominator from being zero;
and selecting the superpixel block node with the highest closeness as the optimal foreground query node.
Optionally, the ranking relationship between any two superpixel block nodes is determined as

$$f^{*} = \arg\min_{f} \; \frac{1}{2}\sum_{i,j} w_{ij}\left\| \frac{f_i}{\sqrt{d_{ii}}} - \frac{f_j}{\sqrt{d_{jj}}} \right\|^{2} + \mu \sum_{i} \left\| f_i - y_i \right\|^{2},$$

where the weight between any two superpixel nodes is defined as $w_{ij} = \exp\!\left(-\frac{\|c_i - c_j\|^{2}}{2\sigma^{2}}\right)$ with $\sigma = \max\|c_i - c_j\| - \min\|c_i - c_j\|$, $\sigma$ measuring the spread of the distances between superpixel block nodes in the color space; $f_i, f_j$ denote the feature vectors of the $i$-th and $j$-th superpixel block nodes in the graph; $d_{ii}, d_{jj}$ denote the degrees (numbers of connected edges) of nodes $i$ and $j$; $\mu = 0.1$ controls the smoothness constraint; and $y_i$ indicates whether a superpixel block node is a seed point, with $y_i = 1$ if it is and $y_i = 0$ otherwise.
Optionally, a conditional generative adversarial network (CGAN) is employed as the generative model.
Optionally, image saliency detection is performed on the second input image using a deep-learning-based saliency detection model comprising an encoder and a decoder;
when the second input image contains a salient object, the first feature map obtained by the encoder is decoded by the decoder to predict and generate the saliency map;
when the second input image contains no salient object, a background pixel feature map of the same size as the second feature map obtained by the encoder is constructed, the BCE loss between the two is determined, and the relevant background channels in the encoder are closed; the loss is calculated as

$$L = -\frac{1}{n}\sum \left[ F_{mt}\log F_l + \left(1 - F_{mt}\right)\log\left(1 - F_l\right) \right],$$

where $n$ denotes the number of pictures in the training dataset, $F_{mt}$ denotes the background pixel feature map, and $F_l$ denotes the encoded feature map.
Optionally, the encoder comprises a triple remodelling structure for reconstructing the feature map and transmitting it to the next encoding layer.
Optionally, before the third image is generated, the image obtained by fusing the background image and the foreground image is post-processed to remove unnatural edges and adjust the color tone.
The invention also provides an image background switching device that adopts the above image background switching method and comprises:
a background extraction module for segmenting and ranking the first input image to obtain the background range of the image;
a background generation module for passing the background range as a condition into the generative model to generate a background image;
a saliency detection module for performing saliency detection on the second input image, extracting the salient region of the image foreground, and generating a saliency map;
a foreground generation module for extracting a foreground image from the second input image according to the saliency map;
and a fusion module for fusing the background image with the foreground image to generate a third image.
The invention also provides a readable storage medium storing a program or instructions which, when executed by a processor, implement the steps of the above image background switching method and achieve the same technical effects.
The advantages of the invention are as follows:
The image background switching method provided by the invention combines image saliency detection with a generative model. Specifically, it segments and ranks a first input image to obtain the background range of the image; passes the background range as a condition into a generative model to generate a background image; meanwhile, it performs saliency detection on a second input image, extracts the salient region of the image foreground, and generates a saliency map; it then extracts a foreground image from the second input image according to the saliency map; and finally it fuses the background image with the foreground image to generate a third image. The method achieves efficient, automatic image background switching. Moreover, the generative model can synthesize a background image that suits the characteristics of the salient foreground region, so the switched image looks more natural and has greater visual appeal and realism.
Drawings
FIG. 1 is an overall flow chart of the image background switching method provided by the invention;
FIG. 2 is an effect schematic of step S1;
FIG. 3 is an effect schematic of generating a background image through steps S1 and S2;
FIG. 4 is a schematic structural diagram of the saliency detection model;
FIG. 5 shows the three-layer remodelling results obtained by the encoder;
FIG. 6 compares the effect of the background extraction method provided in steps S1-S2 with prior-art methods;
FIG. 7 is an effect diagram of a foreground image generated by the saliency detection of the invention;
FIG. 8 shows the background-switched image obtained by fusing the background image of FIG. 6 with the foreground image of FIG. 7;
FIG. 9 is a block diagram of the image background switching device;
wherein:
300 - image background switching device;
301 - background extraction module;
302 - background generation module;
303 - saliency detection module;
304 - foreground generation module;
305 - fusion module;
S1-S5: method steps.
Detailed Description
To make the above features and effects of the invention easier to understand, specific examples are described below with reference to the accompanying drawings.
As described above, prior-art methods based on deep learning segmentation networks suffer from limited segmentation accuracy, high computational complexity, and large training-data requirements. In contrast, the invention provides an image background switching method based on image saliency detection and a generative model. To improve segmentation accuracy, image saliency detection identifies the main objects and the background of an image more reliably, and the refined segmentation results raise foreground-background segmentation accuracy in complex conditions. To reduce computational complexity, the invention designs a new algorithm that obtains more accurate feature distributions by clustering and analyzing foreground and background features. To reduce the training-data requirement, the detection framework can be trained without depending on large amounts of labeled training data. In addition, to overcome the limitation on background pictures, the generative model predicts and generates background pixels for any input image, so a picture that itself contains foreground objects can still serve as a new background; this broadens the applicable range of background switching and makes the method more convenient for users. The image background switching method based on image saliency detection and a generative model is described in detail below.
FIG. 1 shows the overall flow of the image background switching method.
An image background switching method at least comprises the following steps:
s1, dividing and sequencing a first input image to obtain a background range in the image.
In this embodiment, first, the SLIC algorithm is used to divide the first input image into a plurality of super-pixel block nodes, where the super-pixel block nodes are a set of pixels with similar colors and textures in the first input image.
In extracting the background, each data point has various features associated with it, such as color, texture, speed of movement, etc. Some of these features are more critical to the identification of the context, and these associated features can be better extracted by modifying the dynamic parameters. The present embodiment extracts the average color and sum of each super pixel block node by extracting the key feature of each super pixel block node and storing the key feature as a feature vectorThe position of the node is used as a key feature, and C= { C i I=1, 2, …, n } represents the average color set of all super pixel block nodes, p= { P i I=1, 2, …, n } represents the set of node positions for all super pixel block nodes.
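For illustration only, the segmentation and key-feature extraction of this step can be sketched in Python as follows (a minimal sketch, not part of the claimed method; the use of scikit-image and the SLIC parameter values are assumptions, since the patent specifies neither):

```python
# Minimal sketch of step S1's segmentation and key-feature extraction.
# n_segments and compactness are assumed values; the patent does not give them.
import numpy as np
from skimage import io
from skimage.segmentation import slic

image = io.imread("first_input.jpg")               # first input image, RGB
labels = slic(image, n_segments=200, compactness=10, start_label=0)

n = labels.max() + 1
C = np.zeros((n, 3))                               # c_i: average color per node
P = np.zeros((n, 2))                               # p_i: node position (centroid)
for i in range(n):
    mask = labels == i
    C[i] = image[mask].mean(axis=0)                # mean RGB over the superpixel
    rows, cols = np.nonzero(mask)
    P[i] = rows.mean(), cols.mean()                # centroid of the superpixel
features = np.hstack([C, P])                       # feature vector per node
```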
Then, once the feature vectors are determined, a manifold ranking algorithm is applied to them, i.e., all superpixel block nodes are ranked along a specific dimension of the feature vector. The existing manifold ranking algorithm uses the superpixel block nodes at the image boundary as background query nodes to screen out the background region, and the salient region is then obtained by inverse selection. This embodiment improves the existing manifold ranking algorithm by adding a foreground query node: specifically, the closeness between any two superpixel block nodes is determined from the key features, and the superpixel block node with the highest closeness is selected as the optimal foreground query node,
where $t_i$ denotes the closeness between superpixel block node $i$ and every other superpixel block node $j$, and $\varepsilon$ denotes a very small positive number that keeps the denominator from being zero; in practice $\varepsilon = 0.01$ works well.
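Since the closeness equation itself is not reproduced above, the exact form used below is an assumption: an inverse color-and-position distance summed over all other nodes, which is consistent with the stated role of $\varepsilon$. Continuing the sketch above:

```python
# Hypothetical closeness t_i (assumed form): larger when node i is near
# many nodes in both color and position; epsilon keeps the denominator nonzero.
eps = 0.01                                          # epsilon = 0.01, as in the text
color_d = np.linalg.norm(C[:, None] - C[None, :], axis=2)   # ||c_i - c_j||
pos_d = np.linalg.norm(P[:, None] - P[None, :], axis=2)     # ||p_i - p_j||
t = (1.0 / (color_d + pos_d + eps)).sum(axis=1)             # closeness per node
fg_query = int(t.argmax())                          # optimal foreground query node
```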
Further, after the foreground query node is determined, it serves as the condition for the final sparse-graph determination; specifically, dynamic parameters are used to compute the ranking relationship between any two superpixel block nodes, i.e., a more accurate background range is obtained by ranking from the optimal seed points.
The ranking relationship between any two superpixel block nodes is expressed as

$$f^{*} = \arg\min_{f} \; \frac{1}{2}\sum_{i,j} w_{ij}\left\| \frac{f_i}{\sqrt{d_{ii}}} - \frac{f_j}{\sqrt{d_{jj}}} \right\|^{2} + \mu \sum_{i} \left\| f_i - y_i \right\|^{2},$$

where the weight between any two superpixel nodes is defined as $w_{ij} = \exp\!\left(-\frac{\|c_i - c_j\|^{2}}{2\sigma^{2}}\right)$ with $\sigma = \max\|c_i - c_j\| - \min\|c_i - c_j\|$, $\sigma$ measuring the spread of the distances between superpixel block nodes in the color space; $f_i, f_j$ denote the feature vectors of the $i$-th and $j$-th superpixel block nodes in the graph; $d_{ii}, d_{jj}$ denote the degrees (numbers of connected edges) of nodes $i$ and $j$; $\mu = 0.1$ controls the smoothness constraint; and $y_i$ indicates whether a superpixel block node is a seed point, with $y_i = 1$ if it is and $y_i = 0$ otherwise.
Finally, pixels at the image edges are selected as background query nodes to obtain the background range of the image. FIG. 2 shows the effect of this step S1.
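A closed-form ranking step can be sketched as below; the Gaussian affinity and the solution $f = (D - \alpha W)^{-1} y$ follow the classic graph-based manifold ranking formulation and are assumptions about how the symbols defined above ($w_{ij}$, $\sigma$, $d_{ii}$, $\mu$, $y_i$) are combined. Continuing the sketch:

```python
# Manifold-ranking sketch (assumed closed form from the classic formulation).
sigma = color_d.max() - color_d.min()               # sigma as defined in the text
W = np.exp(-color_d**2 / (2.0 * sigma**2 + 1e-12))  # w_ij: affinity between nodes
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))                          # d_ii: degree of node i
mu = 0.1                                            # smoothness constraint weight

y = np.zeros(n)                                     # y_i: seed indicator
y[fg_query] = 1.0                                   # optimal foreground seed
f = np.linalg.solve(D - (1.0 / (1.0 + mu)) * W, y)  # ranking score per node

# Edge superpixels serve as background query nodes; nodes ranked far from
# the foreground seed delimit a coarse background range.
border_nodes = np.unique(np.concatenate(
    [labels[0], labels[-1], labels[:, 0], labels[:, -1]]))
background_mask = np.isin(labels, np.where(f < f.mean())[0])
```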
S2, passing the background range as a condition into the generative model to generate a background image.
In a specific implementation, after the background range is determined, its feature pixels are passed as the condition into the generative model, which predicts and generates a complete background image. Specifically, this embodiment adopts a conditional generative adversarial network (CGAN) as the generative model. FIG. 3 shows the effect of generating a background image through steps S1 and S2.
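The patent names only a CGAN as the generative model; the PyTorch generator below is therefore a minimal illustrative sketch, with an assumed four-channel condition (the background pixels plus a validity mask) and assumed layer sizes:

```python
# Minimal CGAN-generator sketch (architecture and layer sizes are assumptions).
import torch
import torch.nn as nn

class BackgroundGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # Condition: image with non-background pixels zeroed (3 ch) + mask (1 ch).
        self.net = nn.Sequential(
            nn.Conv2d(4, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, masked_background, mask):
        # Predict a complete background image from the conditioned input.
        return self.net(torch.cat([masked_background, mask], dim=1))
```

In training, a discriminator conditioned on the same mask would score real versus generated backgrounds, per the usual CGAN objective.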
S3, performing saliency detection on the second input image, extracting the salient region of the image foreground, and generating a saliency map.
For the foreground part, this embodiment uses a deep-learning-based saliency detection model to perform saliency detection on the second input image, extracting the salient region of the image foreground and generating a saliency map, from which the foreground image is then extracted.
The model framework consists of an encoder and a decoder. The encoder compresses the input image information into a low-dimensional vector and extracts the feature information of the image; it is usually implemented as a convolutional neural network. The decoder decodes the low-dimensional vector output by the encoder into a saliency map of the same size as the input image; it typically enlarges the representation step by step using deconvolution operations while fusing features with the corresponding convolutional layers of the encoder, restores the low-dimensional vector to the size of the original image, and outputs the saliency map.
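The encoder-decoder roles just described can be illustrated with the following minimal PyTorch sketch; the depth, channel counts, and the omission of encoder-decoder skip connections are assumptions made for brevity:

```python
# Minimal encoder-decoder saliency sketch (depth and widths are assumptions).
import torch.nn as nn

class SaliencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                 # compress to low-dim features
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(                 # deconvolve back to input size
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        feature_map = self.encoder(x)                 # F_x (or F_l) in the text
        return self.decoder(feature_map)              # saliency map M, input-sized
```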
In a specific implementation, existing saliency detection methods perform supervised learning with manually annotated pixel-wise labels after detecting the target: salient objects are labeled '1', background pixels are labeled '0', and the two are trained together. In this embodiment, the foreground and background are trained separately during saliency detection: the saliency detection module is divided into two sub-modules, one trained on the foreground and one on the background, as shown in FIG. 4, which illustrates the structure of the saliency detection model. Specifically:
When the second input image contains a salient object, the first feature map obtained by the encoder is decoded by the decoder to predict and generate the saliency map. A coarse saliency detection result is produced from the deeply encoded semantic information and the residual edge information, and this partial result is supervised by a subsequent reconstruction module and regularization coefficient. As shown in the upper part of FIG. 4, after the image $x$ containing a salient object passes through the encoder's triple remodelling structure, a first feature map $F_x$ is generated and then decoded directly by the decoder to produce the saliency map $M$.
When the second input image contains no salient object, a background pixel feature map of the same size as the second feature map obtained by the encoder is constructed, the BCE loss between the two is determined, and the relevant background channels in the encoder are closed. As shown in the lower part of FIG. 4, after the image $x_l$ without salient objects passes through the encoder's triple remodelling structure, a second feature map $F_l$ is generated; the loss between $F_l$ and a background pixel feature map $F_{mt}$ of the same size, which is an all-'0' feature map, is calculated as

$$L = -\frac{1}{n}\sum \left[ F_{mt}\log F_l + \left(1 - F_{mt}\right)\log\left(1 - F_l\right) \right],$$

where $n$ denotes the number of pictures in the training dataset, $F_{mt}$ denotes the background pixel feature map, and $F_l$ denotes the encoded feature map.
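A sketch of this background-branch loss is given below, assuming the encoder features are squashed to (0, 1) before the BCE against the all-zero map $F_{mt}$:

```python
# BCE of the encoder feature map F_l against an all-'0' map F_mt.
import torch
import torch.nn.functional as F

def background_loss(feature_map_l: torch.Tensor) -> torch.Tensor:
    probs = torch.sigmoid(feature_map_l)        # assumed squashing to (0, 1)
    F_mt = torch.zeros_like(probs)              # all-zero background pixel map
    return F.binary_cross_entropy(probs, F_mt)  # mean BCE over the batch
```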
In this embodiment, the encoder comprises a triple remodelling structure that reconstructs the feature map and passes it to the next encoding layer. Visualizing the remodelling results of the three layers yields the three feature maps E1, E2 and E3 shown in FIG. 5. Observing these feature maps shows that the saliency detection method provided by the invention is highly effective at closing background channels and successfully preserves the edge information of the image without affecting the deeply encoded semantic information.
S4, extracting a foreground image of the second input image according to the saliency map.
S5, fusing the background image and the foreground image to generate a third image.
In this embodiment, after the background image is obtained in steps S1-S2 and the foreground image in steps S3-S4, the two are fused, and the fused image is post-processed to remove unnatural edges and adjust the color tone so that the result looks more natural, yielding the final background-switched third image. The third image shares its background with the first input image and its foreground with the second input image; that is, it is a new image obtained by fusing the foreground of the second input image with the background of the first input image, thereby realizing background switching.
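Step S5's fusion can be sketched as a soft alpha blend driven by the saliency map; the Gaussian feathering below stands in for the edge/tone post-processing and is an assumption, as the patent does not specify the post-processing operators:

```python
# Fusion sketch for step S5 (feathering kernel size is an assumption).
import cv2
import numpy as np

def fuse(foreground: np.ndarray, background: np.ndarray,
         saliency: np.ndarray) -> np.ndarray:
    """foreground/background: HxWx3 uint8; saliency: HxW float in [0, 1]."""
    alpha = cv2.GaussianBlur(saliency.astype(np.float32), (5, 5), 0)
    alpha = alpha[..., None]                    # soften mask edges before blending
    third = alpha * foreground + (1.0 - alpha) * background
    return third.astype(np.uint8)               # the background-switched third image
```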
In summary, when extracting the background, this embodiment improves the existing manifold ranking algorithm and obtains a more accurate background range by ranking from the optimal seed points. FIG. 6 compares the background extraction method provided in steps S1-S2 of the invention with prior-art methods: the first column shows the input images, the second column shows background images extracted by another algorithm, and the third column shows background images extracted by the algorithm of the invention. As FIG. 6 makes clear, the improved manifold ranking algorithm of the invention handles image noise better and accurately identifies complex textures and patterns; it not only locates the boundary between background and foreground more precisely, but also preserves the continuity and consistency of the background, making the extracted background look more natural, and the improvement is evident.
In addition, the invention improves the existing saliency detection method when extracting the foreground by training the foreground and background of the image separately. This not only yields more precise boundary definitions, but also better identifies objects that resemble the background in color, texture or shape yet are in fact highly salient. FIG. 7 shows the foreground images generated by the saliency detection of the invention. Reading from left to right, the first column shows the original input images: unprocessed images that provide a reference for later comparison and analysis. These input images may contain various background elements and objects, some of which may be more visually prominent. The second column shows the results produced by the saliency detection model proposed by the invention; the images at this stage show how the model identifies and highlights the most prominent objects or regions of the original image, with salient regions rendered in different colors or intensities so the viewer can see exactly which parts the model considers salient. The third column shows images further processed from the second column's saliency results, displaying the extracted foreground objects; here the background elements are removed or blurred, leaving only the objects identified as salient. This extraction generally ensures that the foreground objects have well-defined boundaries and intact shapes, so they can be displayed clearly against any new background or in any application.
FIG. 8 shows the image obtained by fusing the background image extracted in the third column of FIG. 6 with the foreground image extracted in the third column of FIG. 7. The fused image is very natural, the transition between foreground and background is smooth, and there is no visible seam or dissonance; the background switching task is completed successfully, providing the user with a high-quality, highly realistic image.
Thus, by combining image saliency detection with a generative model, the image background switching method provided by the invention achieves efficient, automatic image background switching. Moreover, the generative model can synthesize a background image that suits the characteristics of the salient foreground region, so the switched image looks more natural and has greater visual appeal and realism.
In addition, the above embodiment of the invention may be applied to a terminal device with the image background switching function; the terminal device may include a personal terminal, an upper computer terminal, and the like, which is not limited by the embodiments of the invention. The terminal can run Windows, Android, iOS, Windows Phone and other operating systems.
Referring to FIG. 9, FIG. 9 shows an image background switching device 300, applicable to personal terminals and upper computer terminal devices, that can implement the image background switching method shown in FIG. 1.
The image background switching device 300, adopting the image background switching method provided in the above embodiment, at least comprises:
a background extraction module 301 for segmenting and ranking the first input image to obtain the background range of the image;
a background generation module 302 for passing the background range as a condition into the generative model to generate a background image;
a saliency detection module 303 for performing saliency detection on the second input image, extracting the salient region of the image foreground, and generating a saliency map;
a foreground generation module 304 for extracting a foreground image from the second input image according to the saliency map;
and a fusion module 305 for fusing the background image with the foreground image to generate a third image.
Further, it should be understood that the division of the image background switching device 300 into the functional modules described above is merely illustrative; in practical applications, the functions may be allocated to different functional modules as needed, i.e., the device 300 may be divided into functional modules different from those illustrated to perform all or part of the functions described above.
In addition, an embodiment of the application also provides an electronic device comprising a processor, a memory, and a program or instructions stored in the memory and runnable on the processor, the program or instructions, when executed by the processor, implementing the steps of the image background switching method and achieving the same technical effects.
The processor is the processor of the electronic device in the above embodiment. Readable storage media include computer-readable storage media such as read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and the like.
In addition, an embodiment of the application further provides a readable storage medium storing a program or instructions which, when executed by a processor, implement the steps of the image background switching method and achieve the same technical effects.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. Furthermore, the methods and apparatus in the embodiments of the application are not limited to performing functions in the order shown or discussed; functions may also be performed substantially simultaneously or in reverse order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is preferred.
The embodiments of the application have been described above with reference to the accompanying drawings, but the application is not limited to the above specific embodiments, which are merely illustrative and not restrictive. Those of ordinary skill in the art, under the teaching of the application and without departing from its spirit and the scope of the claims, may devise many further forms, all of which fall within the protection of the application.

Claims (10)

1. An image background switching method, comprising:
segmenting a first input image and ranking the segments to obtain the background range of the image;
passing the background range as a condition into a generative model to generate a background image;
performing saliency detection on a second input image, extracting the salient region of the image foreground, and generating a saliency map;
extracting a foreground image from the second input image according to the saliency map;
and fusing the background image with the foreground image to generate a third image.
2. The method of claim 1, wherein segmenting the first input image and then ranking to obtain the background range of the image comprises:
segmenting the first input image to obtain a plurality of superpixel block nodes;
extracting the key features of each superpixel block node and storing them as feature vectors;
generating a foreground query node from the key features;
ranking all superpixel block nodes along a specific dimension of the feature vector, with the foreground query node as a condition;
and selecting pixels at the image edges as background query nodes to obtain the background range of the image.
3. The method of claim 2, wherein:
the first input image is segmented into a plurality of superpixel block nodes using the SLIC algorithm,
and the average color of each superpixel block node and the position of the node are extracted as the key features;
generating a foreground query node from the key features comprises:
determining the closeness between any two superpixel block nodes according to the key features,
where $t_i$ denotes the closeness between superpixel block node $i$ and every other superpixel block node $j$; $C = \{c_i \mid i = 1, 2, \ldots, n\}$ denotes the set of average colors of all superpixel block nodes; $P = \{p_i \mid i = 1, 2, \ldots, n\}$ denotes the set of node positions of all superpixel block nodes; and $\varepsilon$ denotes a very small positive number;
and selecting the superpixel block node with the highest closeness as the optimal foreground query node.
4. The method of claim 3, wherein the ranking relationship between any two superpixel block nodes is determined as

$$f^{*} = \arg\min_{f} \; \frac{1}{2}\sum_{i,j} w_{ij}\left\| \frac{f_i}{\sqrt{d_{ii}}} - \frac{f_j}{\sqrt{d_{jj}}} \right\|^{2} + \mu \sum_{i} \left\| f_i - y_i \right\|^{2},$$

where the weight between any two superpixel nodes is defined as $w_{ij} = \exp\!\left(-\frac{\|c_i - c_j\|^{2}}{2\sigma^{2}}\right)$ with $\sigma = \max\|c_i - c_j\| - \min\|c_i - c_j\|$, $\sigma$ measuring the spread of the distances between superpixel block nodes in the color space; $f_i, f_j$ denote the feature vectors of the $i$-th and $j$-th superpixel block nodes in the graph; $d_{ii}, d_{jj}$ denote the degrees (numbers of connected edges) of nodes $i$ and $j$; $\mu = 0.1$ controls the smoothness constraint; and $y_i$ indicates whether a superpixel block node is a seed point, with $y_i = 1$ if it is and $y_i = 0$ otherwise.
5. The method of claim 1, wherein a conditional generative adversarial network (CGAN) is employed as the generative model.
6. The method of claim 1, wherein:
image saliency detection is performed on the second input image using a deep-learning-based saliency detection model comprising an encoder and a decoder;
when the second input image contains a salient object, the first feature map obtained by the encoder is decoded by the decoder to predict and generate the saliency map;
when the second input image contains no salient object, a background pixel feature map of the same size as the second feature map obtained by the encoder is constructed, the BCE loss between the two is determined, and the relevant background channels in the encoder are closed; the loss is calculated as

$$L = -\frac{1}{n}\sum \left[ F_{mt}\log F_l + \left(1 - F_{mt}\right)\log\left(1 - F_l\right) \right],$$

where $n$ denotes the number of pictures in the training dataset, $F_{mt}$ denotes the background pixel feature map, and $F_l$ denotes the encoded feature map.
7. The method of claim 6, wherein the encoder comprises a triple remodelling structure for reconstructing the feature map and transmitting it to the next encoding layer.
8. The method of claim 1, wherein, before the third image is generated, the image obtained by fusing the background image and the foreground image is post-processed to remove unnatural edges and adjust the color tone.
9. An image background switching device adopting the image background switching method of any one of claims 1-8, comprising:
a background extraction module for segmenting and ranking the first input image to obtain the background range of the image;
a background generation module for passing the background range as a condition into a generative model to generate a background image;
a saliency detection module for performing saliency detection on the second input image, extracting the salient region of the image foreground, and generating a saliency map;
a foreground generation module for extracting a foreground image from the second input image according to the saliency map;
and a fusion module for fusing the background image with the foreground image to generate a third image.
10. A readable storage medium having a program or instructions stored thereon which, when executed by a processor, implement the steps of the image background switching method of any one of claims 1-8.
CN202311155515.4A (filed 2023-09-08) — Image background switching method, device and medium — published as CN117437135A, pending

Priority Applications (1)

Application Number: CN202311155515.4A — Priority/Filing Date: 2023-09-08 — Publication: CN117437135A — Title: Image background switching method, device and medium

Publications (1)

Publication Number: CN117437135A — Publication Date: 2024-01-23

Family

ID=89554311



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination