CN111292337B - Image background replacement method, device, equipment and storage medium

Info

Publication number: CN111292337B
Application number: CN202010071578.1A
Authority: CN
Inventors: 何帅; 叶海佳; 王文斓
Original assignee: Guangzhou Huya Technology Co Ltd
Current assignee: Guangzhou Huya Technology Co Ltd
Priority date: 2020-01-21
Filing date: 2020-01-21
Publication date: 2024-03-01
Anticipated expiration: 2040-01-21
Also published as: CN111292337A

Abstract

The embodiment of the invention discloses an image background replacement method, device, equipment and storage medium. The method comprises: acquiring a current video frame, and selecting a target portrait area in the current video frame; acquiring an initial mask corresponding to the target portrait area of the current video frame; performing segmentation optimization processing and inter-frame smoothing processing on the initial mask to obtain a target mask; and replacing the background of the current video frame with a new background according to the target mask, and generating a synthesized frame corresponding to the current video frame. The technical scheme reduces the amount of computation needed for portrait segmentation of real-time video, improves portrait segmentation precision, and realizes portrait background replacement for real-time video.

Description

Image background replacement method, device, equipment and storage medium

Technical Field

The embodiment of the invention relates to the technical field of image processing, in particular to an image background replacement method, an image background replacement device, image background replacement equipment and a storage medium.

Background

Currently, to replace the portrait background in an image, the portrait instance must be accurately separated from the original background of the image. Existing separation methods fall mainly into two categories: portrait semantic segmentation and portrait foreground matting.

Portrait semantic segmentation understands the image at the semantic level: it separates the pixels whose semantic class is person from the pixels classified as background, obtains a mask of 0s and 1s, and uses the mask to separate the portrait region from the background. However, this method is based on graph-theory algorithms, so increasing the resolution causes the amount of computation to grow exponentially, and its optimization function naturally favors low-level features. As a result, portrait segmentation accuracy is low, and the method is not suitable for portrait background replacement on real-time video.

Portrait foreground matting can raise segmentation precision to the pixel level relative to portrait semantic segmentation, but its amount of computation is correspondingly larger than that of portrait semantic segmentation, so it is likewise not suitable for portrait background replacement on real-time video.

Disclosure of Invention

The embodiment of the invention provides an image background replacement method, an image background replacement device, image background replacement equipment and a storage medium, which are used for reducing the amount of computation of portrait segmentation on real-time video, improving portrait segmentation precision, and realizing portrait background replacement on real-time video.

In a first aspect, an embodiment of the present invention provides an image background replacing method, including:

acquiring a current video frame, and selecting a target portrait area in the current video frame;

acquiring an initial mask corresponding to a target portrait area of a current video frame;

performing segmentation optimization processing and inter-frame smoothing processing on the initial mask to obtain a target mask;

and replacing the background of the current video frame with the new background according to the target mask, and generating a synthesized frame corresponding to the current video frame.

In a second aspect, an embodiment of the present invention further provides an image background replacing apparatus, including:

the region selection module is used for acquiring a current video frame and selecting a target portrait region in the current video frame;

the mask acquisition module is used for acquiring an initial mask corresponding to a target portrait area of the current video frame through a portrait segmentation model;

the mask processing module is used for carrying out segmentation optimization processing and inter-frame smoothing processing on the initial mask to obtain a target mask;

and the background replacing module is used for replacing the background of the current video frame with a new background according to the target mask, and generating a synthesized frame corresponding to the current video frame.

In a third aspect, an embodiment of the present invention further provides an apparatus, including:

one or more processors;

storage means for storing one or more programs,

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image background replacement method provided by any embodiment of the present invention.

In a fourth aspect, embodiments of the present invention further provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image background replacement method provided by any of the embodiments of the present invention.

According to the technical scheme of the embodiment of the invention, a current video frame is acquired and a target portrait area is selected in it; an initial mask corresponding to the target portrait area of the current video frame is acquired; segmentation optimization processing and inter-frame smoothing processing are performed on the initial mask to obtain a target mask; and the background of the current video frame is replaced with a new background according to the target mask to generate a synthesized frame corresponding to the current video frame. This solves the problems in the prior art that portrait segmentation has low accuracy and a large amount of computation and is not suitable for portrait background replacement on real-time video. It reduces the amount of computation of portrait segmentation on real-time video, improves portrait segmentation precision, and realizes portrait background replacement on real-time video.

Drawings

FIG. 1a is a flow chart of an image background replacement method in accordance with a first embodiment of the present invention;

FIG. 1b is a schematic diagram of a convolutional neural network topology according to a first embodiment of the present invention;

FIG. 2 is a schematic structural diagram of an image background replacing device in a second embodiment of the present invention;

FIG. 3 is a schematic structural view of an apparatus according to a third embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.

Example 1

Fig. 1a is a flowchart of an image background replacement method according to a first embodiment of the present invention. This embodiment is applicable to performing portrait background replacement on a real-time video. The method may be performed by an image background replacement device, which may be implemented in hardware and/or software and may generally be integrated in equipment that provides an image background replacement service. As shown in fig. 1a, the method comprises:

Step 110, obtaining a current video frame, and selecting a target portrait area in the current video frame.

This embodiment is applied to a live video scene. The current video frame refers to the data frame corresponding to the current moment in the live video, and it contains the portrait of the anchor. To give the current video frame a new background, the anchor's portrait needs to be segmented from the current video frame and then fused with the new background. Considering that in a given 1080p video frame the person often occupies only a small region in the middle of the frame, and that the motion amplitude of the person between adjacent video frames is limited, a target portrait area containing the complete portrait can be selected in the current video frame to reduce the amount of computation of portrait segmentation. The portrait is then segmented by processing only the target portrait area, without processing the whole video frame.

Optionally, obtaining the current video frame and selecting the target portrait area in the current video frame may include: acquiring a historical target mask of a historical video frame adjacent to the current video frame, and determining vertex coordinates of the minimum circumscribed rectangle that completely covers the portrait according to the coordinate values of the pixel points whose pixel value is 1 in the historical target mask; determining the minimum circumscribed rectangle according to the vertex coordinates, and taking the minimum circumscribed rectangle as the portrait area of the historical video frame; and estimating the target portrait area of the current video frame according to the portrait area of the historical video frame.

In this embodiment, a tracking manner of a region of interest (Region of Interest, ROI) is adopted, and a target portrait region in a current video frame is estimated according to a portrait region of a historical video frame adjacent to the current video frame, so that a subsequent algorithm only performs image processing on the target portrait region.

Alternatively, the target portrait area may be acquired as follows. The target pixel points corresponding to the portrait in the historical video frame are determined according to the pixel values equal to 1 in the historical target mask of the historical video frame. From the coordinate values of these target pixel points, the maximum value x1 and minimum value x2 of the horizontal coordinate and the maximum value y1 and minimum value y2 of the vertical coordinate are selected, and the vertex coordinates are determined to be (x1, y1), (x2, y1), (x1, y2) and (x2, y2); the rectangle defined by these vertices is the portrait area of the historical video frame. Then, according to the estimated motion trend of the portrait, the portrait area of the historical video frame is either used directly as the target portrait area, or is suitably moved or adjusted and then used as the target portrait area.
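The bounding-box step described above can be sketched as follows. This is a minimal illustration in plain Python, assuming the historical target mask is a list of rows of 0/1 values. The function names `portrait_bounding_box` and `expand_box` and the `margin` parameter are hypothetical; the margin merely stands in for the unspecified "movement or adjustment" of the rectangle.

```python
def portrait_bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the pixels equal to 1, or None."""
    xs = [x for row in mask for x, value in enumerate(row) if value == 1]
    ys = [y for y, row in enumerate(mask) if any(value == 1 for value in row)]
    if not xs:
        return None  # no portrait pixels in the historical mask
    return min(xs), min(ys), max(xs), max(ys)


def expand_box(box, margin, width, height):
    """Pad the box by `margin` pixels (an assumed adjustment), clamped to the frame."""
    x_min, y_min, x_max, y_max = box
    return (max(0, x_min - margin), max(0, y_min - margin),
            min(width - 1, x_max + margin), min(height - 1, y_max + margin))


history_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
box = portrait_bounding_box(history_mask)            # portrait area of the historical frame
roi = expand_box(box, margin=1, width=6, height=5)   # estimated target portrait area
```

For the sample mask, `box` is (1, 1, 3, 3) and `roi` is (0, 0, 4, 4), so only that sub-rectangle of the current frame would be passed on to segmentation.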

Optionally, after estimating the target portrait area of the current video frame according to the portrait area of the historical video frame, the method may further include: and carrying out smoothing treatment on the vertex coordinates of the target portrait area according to a second-order exponential smoothing algorithm to obtain a smoothed target portrait area.

In this embodiment, to stabilize the box of the target portrait area, prevent abrupt changes and severe jitter from affecting the subsequent algorithm, and balance the smoothness and the sensitivity of portrait segmentation, a second-order exponential smoothing algorithm is used to smooth the vertex coordinates of the target portrait area. Let $x_t$ be a vertex coordinate of the target portrait area at time $t$, and let $T_t$ be the corresponding vertex coordinate after smoothing. The second-order exponential smoothing formula is: [formula omitted in the source text]

In this embodiment, the target portrait area obtained through ROI tracking reduces the image area on which portrait segmentation is performed, which speeds up segmentation and improves efficiency. In addition, smoothing the target portrait area with the second-order exponential smoothing algorithm makes the portrait segmentation more accurate.
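The smoothing formula itself is not reproduced in this text, so the sketch below uses Brown's double exponential smoothing, one common form of second-order exponential smoothing, as an assumption rather than as the patent's exact formula. The class name `DoubleExponentialSmoother` and the smoothing factor `alpha` are hypothetical.

```python
class DoubleExponentialSmoother:
    """Brown's double exponential smoothing for one scalar coordinate (assumed form)."""

    def __init__(self, alpha=0.5):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.s1 = None  # first-order smoothed value
        self.s2 = None  # second-order smoothed value

    def update(self, x):
        """Feed the raw coordinate for the current frame, return the smoothed one."""
        if self.s1 is None:
            # Initialize both stages with the first observation.
            self.s1 = self.s2 = float(x)
        else:
            a = self.alpha
            self.s1 = a * x + (1 - a) * self.s1
            self.s2 = a * self.s1 + (1 - a) * self.s2
        # Level estimate of Brown's method.
        return 2 * self.s1 - self.s2


smoother = DoubleExponentialSmoother(alpha=0.5)
smoothed = [smoother.update(x) for x in (100, 100, 120)]
```

In practice one smoother would be kept per vertex coordinate of the target portrait area. A smaller `alpha` gives a steadier box at the cost of slower reaction to real movement, which is the smoothness versus sensitivity trade-off mentioned above.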

Step 120, obtaining an initial mask corresponding to the target portrait area of the current video frame.

In this embodiment, the mask is a binary image composed of 0s and 1s, in which pixels with a value of 1 correspond to the portrait and pixels with a value of 0 correspond to the background; the portrait can be separated from the current video frame by multiplying the mask with the target portrait area.

Optionally, acquiring the initial mask corresponding to the target portrait area of the current video frame may include: performing separable convolution processing, channel mapping compression processing and interpolation convolution processing on the target portrait area of the current video frame through a convolutional neural network to obtain the initial mask corresponding to the target portrait area.

As shown in fig. 1b, this embodiment obtains a continuous-valued initial mask for portrait segmentation by inputting the target portrait area of the current video frame into a convolutional neural network. To balance the precision and the amount of computation of portrait segmentation, the conventional convolutional neural network is improved with separable convolution, channel mapping compression, interpolation convolution and similar methods, which avoids large differences between the portrait segmentation results of adjacent video frames and the resulting poor visual experience for the user.

In this embodiment, separable convolution is used to reduce the dimensionality of a multidimensional convolution. For example, for the three dimensions of length, width and number of color-space channels, the amount of computation can be reduced by convolving over length and width first and then over the color-space channels. Channel mapping compression compresses the channels to be convolved, convolves the compressed channels, and then decompresses the result, reducing the amount of computation by reducing the number of channels that are convolved. Interpolation convolution is used to improve portrait segmentation accuracy. As shown in fig. 1b, among the separable convolution layers of the convolutional neural network, the image at the layer-4 separable convolution has the lowest resolution, so only a blurred outline of the person can be obtained there and essentially no detail. The person details obtained by up-sampling the layer-2 separable convolution can therefore be fused into the up-sampled result of the layer-4 separable convolution to obtain a higher-precision portrait matte.
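The computational saving can be illustrated with a rough multiply-accumulate count per output pixel. The sketch below assumes that the separable (separation) convolution corresponds to the widely used depthwise separable convolution and that channel mapping compression corresponds to a 1x1 bottleneck (compress, convolve, decompress). Both are interpretations, and the kernel size, channel counts and reduction ratio are hypothetical, since the text does not give layer dimensions.

```python
def standard_conv_macs(kernel, c_in, c_out):
    """Multiply-accumulates per output pixel for a standard kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out


def separable_conv_macs(kernel, c_in, c_out):
    """Spatial (depthwise) pass over each input channel, then a 1x1 pass across channels."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def bottleneck_conv_macs(kernel, channels, reduction):
    """Compress channels with a 1x1 convolution, convolve, then decompress with a 1x1."""
    compressed = channels // reduction
    compress = channels * compressed
    convolve = kernel * kernel * compressed * compressed
    decompress = compressed * channels
    return compress + convolve + decompress


standard = standard_conv_macs(3, 32, 64)       # 3x3, 32 -> 64 channels
separable = separable_conv_macs(3, 32, 64)
bottleneck = bottleneck_conv_macs(3, 64, 4)    # 64 channels compressed to 16
plain_64 = standard_conv_macs(3, 64, 64)
```

With these assumed sizes the standard convolution needs 18432 multiply-accumulates per output pixel against 2336 for the separable form, and the bottleneck needs 4352 against 36864 for a plain 3x3 convolution on 64 channels.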

In this embodiment, the improved convolutional neural network requires less computation and fewer model parameters for portrait segmentation and consumes fewer resources, so it can be deployed on the PC host used for broadcasting and run in real time. In addition, the convolutional neural network is robust in portrait segmentation and can cope with uncertain interference caused by illumination changes, lens movement and other factors, yielding a more stable composite output. Finally, the adaptive matting of the convolutional neural network can replace green-screen compositing to a certain extent, which reduces the restrictions on the scenes in which portrait segmentation can be used.

Of course, this embodiment is not limited to using a convolutional neural network for portrait segmentation; other portrait segmentation models that meet the computation and accuracy requirements are also applicable to this embodiment.

Step 130, performing segmentation optimization processing and inter-frame smoothing processing on the initial mask to obtain a target mask.

In this embodiment, although the portrait segmentation accuracy of the convolutional neural network is already high, the following problems remain. First, although the overall accuracy of the segmentation result is high, there are still local flaws, such as isolated misclassified areas. Second, the segmentation result considers only the current video frame, so inter-frame jitter occurs. This embodiment solves these problems by performing segmentation optimization processing and inter-frame smoothing processing on the initial mask.

Optionally, performing the segmentation optimization processing on the initial mask may include: acquiring at least two connected domains according to the pixel values in the initial mask; determining the connected domain with the largest area among the at least two connected domains as the target connected domain, or determining the connected domains whose area is larger than a threshold as target connected domains; and keeping the pixel values corresponding to the target connected domain in the initial mask unchanged, and updating the pixel values corresponding to the non-target connected domains in the initial mask to a set target value.

This embodiment removes isolated dirty blocks through connected domain analysis. Specifically, pixel points with the same pixel value in the initial mask can be connected into at least two connected domains. Because the connected domain of the person has the largest area, only the connected domain with the largest area, or the connected domains whose area is larger than a threshold, need be retained, depending on the specific service scenario; the other connected domains are deleted, and the result is mapped back to the initial mask. That is, the pixel values corresponding to the retained connected domains are kept unchanged, and the pixel values corresponding to the deleted connected domains are updated to the set target value. The target value may be 0 or a positive number close to 0, for example 0.001.
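The connected domain cleanup can be sketched as below. This is a plain-Python illustration under stated assumptions: foreground pixels are taken to be those above a hypothetical `threshold` of 0.5, connectivity is 4-neighbour, and only the largest component is kept. The function name and parameters are not from the source.

```python
from collections import deque


def keep_largest_component(mask, threshold=0.5, target_value=0.0):
    """Return a copy of `mask` in which all but the largest foreground blob are reset."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or mask[y][x] <= threshold:
                continue
            # Breadth-first search collects one 4-connected foreground component.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and mask[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    cleaned = [row[:] for row in mask]
    if not components:
        return cleaned
    largest = max(components, key=len)
    for pixels in components:
        if pixels is not largest:
            for py, px in pixels:
                cleaned[py][px] = target_value  # e.g. 0 or a small value such as 0.001
    return cleaned


initial_mask = [
    [0.9, 0.8, 0.0, 0.0, 0.0],
    [0.9, 0.9, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
cleaned_mask = keep_largest_component(initial_mask)
```

Here the stray pixel at row 1, column 4 is reset to the target value while the four-pixel portrait blob keeps its original values. Keeping every component above an area threshold instead of only the largest would be a small change to the final loop.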

Optionally, performing the inter-frame smoothing processing on the initial mask may include: performing a difference operation between the current video frame and the adjacent historical video frame to determine a motion area in the current video frame; multiplying each pixel value corresponding to the motion area in the initial mask by a first value to obtain a first mask; multiplying each pixel value corresponding to the motion area in the historical target mask of the historical video frame by a second value, and accumulating each product onto the corresponding pixel value of the first mask; multiplying each pixel value corresponding to the non-motion area in the initial mask by the second value to obtain a second mask; and multiplying each pixel value corresponding to the non-motion area in the historical target mask of the historical video frame by the first value, and accumulating each product onto the corresponding pixel value of the second mask; wherein the first value is greater than the second value.

In this embodiment, because the output of the convolutional neural network for the current video frame is not stable, inter-frame jitter, especially edge jitter, may occur. Therefore, when obtaining the mask corresponding to the current video frame, the initial mask needs to be processed with reference to the historical video frames, so that the final target mask is smooth over the whole historical sequence.

Optionally, the fusion of the convolutional neural network outputs across multiple frames can be guided by the motion area obtained from a conventional motion detection algorithm, so as to obtain the target mask. It is assumed that only the person moves in the video frames while the background tends to stay constant. Therefore, a difference operation can be performed between the current video frame and the adjacent historical video frame; most of the pixels whose value is 1 in the resulting difference mask correspond to the region where the person is located, so the region of pixels with value 1 in that mask is used as the motion area.
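The difference operation can be sketched as a thresholded absolute difference between two grayscale frames. This is a minimal illustration that assumes frames are lists of rows of integer intensities; the function name `motion_region` and the `threshold` of 15 are hypothetical, as the text does not state how the difference is binarized.

```python
def motion_region(current, previous, threshold=15):
    """Return a 0/1 map that is 1 where the two frames differ by more than `threshold`."""
    return [
        [1 if abs(cur - prev) > threshold else 0 for cur, prev in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]


previous_frame = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
]
current_frame = [
    [12, 10, 10, 10],
    [10, 10, 200, 200],
]
motion = motion_region(current_frame, previous_frame)
```

The small change of 2 in the top-left pixel is treated as noise, while the pixels the bright object left and entered are marked as moving, giving `[[0, 0, 0, 0], [0, 1, 0, 1]]`.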

In this embodiment, the first value may be 0.9 and the second value may be 0.1. Because the person is moving, the target mask in the motion area depends more on the current video frame, and the target mask in the non-motion area depends more on the historical video frame. That is, in the motion area where the person is located, each pixel value of the initial mask is replaced with 0.9 times the pixel value of the initial mask plus 0.1 times the corresponding pixel value of the historical target mask; in the non-motion area, that is, the background area, each pixel value of the initial mask is replaced with 0.1 times the pixel value of the initial mask plus 0.9 times the corresponding pixel value of the historical target mask. The result is the target mask corresponding to the current video frame.
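The motion-guided blending with the example values 0.9 and 0.1 can be sketched as follows. Masks and the motion map are assumed to be lists of rows of equal size; the function and parameter names are hypothetical.

```python
def smooth_mask(initial, history, motion, first=0.9, second=0.1):
    """Blend the initial mask with the historical target mask, guided by the motion map."""
    if not first > second:
        raise ValueError("the first value must be greater than the second value")
    target = []
    for init_row, hist_row, motion_row in zip(initial, history, motion):
        row = []
        for current_value, history_value, moving in zip(init_row, hist_row, motion_row):
            if moving:
                # Motion area: rely mostly on the current frame's mask.
                row.append(first * current_value + second * history_value)
            else:
                # Non-motion area: rely mostly on the historical target mask.
                row.append(second * current_value + first * history_value)
        target.append(row)
    return target


initial_mask = [[1.0, 0.0]]
history_mask = [[0.0, 1.0]]
motion_map = [[1, 0]]
target_mask = smooth_mask(initial_mask, history_mask, motion_map)
```

In this toy case both pixels come out at about 0.9: the moving pixel follows the current mask, and the static pixel follows the historical mask.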

In fact, all historical video frames before the current video frame can be taken into account when performing inter-frame smoothing on the initial mask. In that case, the closer a historical video frame is to the current video frame, the larger the weight of its historical target mask in the target mask of the current video frame; the farther it is from the current video frame, the smaller that weight.

Step 140, replacing the background of the current video frame with the new background according to the target mask, and generating a synthesized frame corresponding to the current video frame.
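One way to realize weights that shrink with distance from the current frame is geometric decay, sketched below. The weighting scheme is an assumption chosen for illustration, since the text only requires that nearer frames count more; `blend_history` and `decay` are hypothetical names.

```python
def blend_history(masks, decay=0.5):
    """Weighted average of masks ordered oldest to newest; newer masks weigh more."""
    if not masks:
        raise ValueError("at least one mask is required")
    count = len(masks)
    # The newest mask has age 0 and weight 1; each step back multiplies the weight by `decay`.
    weights = [decay ** (count - 1 - index) for index in range(count)]
    total = sum(weights)
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [sum(w * m[y][x] for w, m in zip(weights, masks)) / total for x in range(width)]
        for y in range(height)
    ]


oldest = [[0.0]]
newest = [[1.0]]
blended = blend_history([oldest, newest], decay=0.5)
```

With two frames and a decay of 0.5 the newest mask receives twice the weight of the older one, so the blended pixel is about 0.667.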

Optionally, replacing the background in the current video frame with the new background according to the target mask, and generating the composite frame corresponding to the current video frame may include: replacing the background in the current video frame with the new background according to the formula $i = \alpha I + (1 - \alpha) B_{new}$ to generate the synthesized frame corresponding to the current video frame; wherein $\alpha$ is the target mask, $I$ is the current video frame, $B_{new}$ is a video frame containing the new background, and $i$ is the composite frame corresponding to the current video frame.

In this embodiment, the target mask obtained after segmentation optimization and inter-frame smoothing is a continuous-valued mask with pixel values from 0 to 1, in which the pixel value corresponding to the portrait is 1 and the pixel value corresponding to the background is 0; pixel values between 0 and 1 mostly occur in edge areas, such as the person's hair or other transparent material. When the background is replaced according to the formula $i = \alpha I + (1 - \alpha) B_{new}$, the term $\alpha I$ yields the portrait in the current video frame and the term $(1 - \alpha) B_{new}$ yields the new background from the other video frame; combining the two gives the composite frame with the replaced background.
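The compositing formula can be applied per pixel as in the sketch below. It assumes the frame and the new background are lists of rows of RGB tuples with the same size as the target mask; the function name `composite` is hypothetical.

```python
def composite(alpha, frame, new_background):
    """Per-pixel blend: alpha * frame + (1 - alpha) * new_background."""
    return [
        [
            tuple(a * f + (1 - a) * b for f, b in zip(frame_pixel, background_pixel))
            for a, frame_pixel, background_pixel in zip(alpha_row, frame_row, background_row)
        ]
        for alpha_row, frame_row, background_row in zip(alpha, frame, new_background)
    ]


target_mask = [[1.0, 0.0, 0.5]]
current_frame = [[(200, 100, 50), (200, 100, 50), (200, 100, 50)]]
new_background = [[(0, 0, 0), (10, 20, 30), (100, 100, 100)]]
composite_frame = composite(target_mask, current_frame, new_background)
```

A mask value of 1 keeps the original pixel, 0 takes the new background, and 0.5 (as might occur along hair edges) mixes the two evenly, giving (150.0, 100.0, 75.0) for the third pixel.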

Example 2

Fig. 2 is a schematic structural diagram of an image background replacing device in a second embodiment of the present invention, and the embodiment is applicable to a situation of performing portrait background replacement on a real-time video. As shown in fig. 2, the image background replacing apparatus includes:

the region selection module 210 is configured to obtain a current video frame, and select a target portrait region in the current video frame;

the mask obtaining module 220 is configured to obtain, through the portrait segmentation model, an initial mask corresponding to a target portrait area of the current video frame;

the mask processing module 230 is configured to perform segmentation optimization processing and inter-frame smoothing processing on the initial mask to obtain a target mask;

the background replacing module 240 is configured to replace the background of the current video frame with a new background according to the target mask, and generate a composite frame corresponding to the current video frame.
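The four modules listed above can be pictured as stages chained in order. The sketch below is only a structural illustration: the class `BackgroundReplacer` and its constructor arguments are hypothetical, and each stage is supplied by the caller as a function.

```python
class BackgroundReplacer:
    """Chains region selection, mask acquisition, mask processing and background replacement."""

    def __init__(self, select_region, acquire_mask, process_mask, replace_background):
        self.select_region = select_region            # role of module 210
        self.acquire_mask = acquire_mask              # role of module 220
        self.process_mask = process_mask              # role of module 230
        self.replace_background = replace_background  # role of module 240

    def process(self, frame, new_background):
        """Run one video frame through the four stages and return the composite frame."""
        region = self.select_region(frame)
        initial_mask = self.acquire_mask(frame, region)
        target_mask = self.process_mask(initial_mask)
        return self.replace_background(frame, new_background, target_mask)
```

Passing the stages in as functions keeps the structure independent of any particular segmentation model, in line with the statement that other portrait segmentation models may be used.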

Optionally, the area selection module 210 is specifically configured to: acquire a historical target mask of a historical video frame adjacent to the current video frame, and determine vertex coordinates of the minimum circumscribed rectangle that completely covers the portrait according to the coordinate values of the pixel points whose pixel value is 1 in the historical target mask; determine the minimum circumscribed rectangle according to the vertex coordinates, and take the minimum circumscribed rectangle as the portrait area of the historical video frame; and estimate the target portrait area of the current video frame according to the portrait area of the historical video frame.

Optionally, the area selection module 210 is further configured to: after estimating a target portrait area of the current video frame according to the portrait area of the historical video frame, carrying out smoothing treatment on vertex coordinates of the target portrait area according to a second-order exponential smoothing algorithm to obtain a smoothed target portrait area.

Optionally, the mask acquiring module 220 is specifically configured to: perform separable convolution processing, channel mapping compression processing and interpolation convolution processing on the target portrait area of the current video frame through a convolutional neural network to obtain the initial mask corresponding to the target portrait area.

Optionally, the mask processing module 230 includes: a first processing module, configured to acquire at least two connected domains according to the pixel values in the initial mask; determine the connected domain with the largest area among the at least two connected domains as the target connected domain, or determine the connected domains whose area is larger than a threshold as target connected domains; and keep the pixel values corresponding to the target connected domain in the initial mask unchanged, and update the pixel values corresponding to the non-target connected domains in the initial mask to a set target value.

Optionally, the mask processing module 230 includes: a second processing module, configured to perform a difference operation between the current video frame and the adjacent historical video frame to determine a motion area in the current video frame; multiply each pixel value corresponding to the motion area in the initial mask by a first value to obtain a first mask; multiply each pixel value corresponding to the motion area in the historical target mask of the historical video frame by a second value, and accumulate each product onto the corresponding pixel value of the first mask; multiply each pixel value corresponding to the non-motion area in the initial mask by the second value to obtain a second mask; and multiply each pixel value corresponding to the non-motion area in the historical target mask of the historical video frame by the first value, and accumulate each product onto the corresponding pixel value of the second mask; wherein the first value is greater than the second value.

Optionally, the background replacement module 240 is specifically configured to: replace the background in the current video frame with the new background according to the formula $i = \alpha I + (1 - \alpha) B_{new}$ to generate the synthesized frame corresponding to the current video frame; wherein $\alpha$ is the target mask, $I$ is the current video frame, $B_{new}$ is a video frame containing the new background, and $i$ is the composite frame corresponding to the current video frame.

The image background replacing device provided by the embodiment of the invention can execute the image background replacing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the executing method.

Example 3

Fig. 3 is a schematic structural view of an apparatus according to a third embodiment of the present invention. Fig. 3 illustrates a block diagram of an exemplary device 12 suitable for use in implementing embodiments of the present invention. The device 12 shown in fig. 3 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.

As shown in fig. 3, device 12 is in the form of a general purpose computing device. Components of device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 3, commonly referred to as a "hard disk drive"). Although not shown in fig. 3, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.

A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.

Device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with device 12, and/or any devices (e.g., network card, modem, etc.) that enable device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Also, device 12 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, via network adapter 20. As shown, network adapter 20 communicates with other modules of device 12 over bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, to implement an image background replacement method provided by an embodiment of the present invention.

Namely: an image background replacement method is realized, which comprises the following steps:

Example IV

The fourth embodiment of the present invention also discloses a computer storage medium having stored thereon a computer program which when executed by a processor implements an image background replacement method comprising:

The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims

1. An image background replacement method, comprising:

acquiring an initial mask corresponding to a target portrait area of the current video frame;

replacing the background of the current video frame with a new background according to the target mask, and generating a synthesized frame corresponding to the current video frame;

performing segmentation optimization processing on the initial mask, including:

acquiring at least two connected domains according to each pixel value in the initial mask;

determining the connected domain with the largest area in the at least two connected domains as a target connected domain, or determining the connected domain with the area larger than a threshold value as a target connected domain;

keeping the pixel values corresponding to the target connected domain in the initial mask unchanged, and updating the pixel values corresponding to the non-target connected domains in the initial mask to a set target value;

performing inter-frame smoothing on the initial mask, including:

performing a difference operation between the current video frame and the adjacent historical video frame, and determining a motion area in the current video frame;

multiplying each pixel value corresponding to the motion area in the initial mask by a first value to obtain a first mask; multiplying each pixel value corresponding to the motion area in a historical target mask of the historical video frame by a second value, and accumulating each product onto the corresponding pixel value of the first mask;

multiplying each pixel value corresponding to the non-motion area in the initial mask by the second value to obtain a second mask; multiplying each pixel value corresponding to the non-motion area in the historical target mask of the historical video frame by the first value, and accumulating each product onto the corresponding pixel value of the second mask;

wherein the first value is greater than the second value.
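As a reading aid only, the two mask-processing steps of claim 1 can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the patented implementation: the function names, 4-connectivity, grayscale frames stored as nested lists, the difference threshold of 10, and the weights 0.8 and 0.2 are all illustrative choices that the claim does not specify. The claim only requires that the first value be greater than the second.

```python
from collections import deque


def keep_largest_connected_domain(mask, target_value=0):
    """Keep the largest 4-connected foreground domain; reset the other domains."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    domains = []  # one list of (row, col) pixels per connected domain
    for r in range(height):
        for c in range(width):
            if mask[r][c] <= 0 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            pixels, queue = [], deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] > 0 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            domains.append(pixels)
    result = [row[:] for row in mask]
    if domains:
        largest = max(domains, key=len)
        for pixels in domains:
            if pixels is not largest:
                for y, x in pixels:
                    result[y][x] = target_value  # assumed "set target value"
    return result


def smooth_between_frames(frame, prev_frame, mask, prev_mask,
                          first=0.8, second=0.2, diff_threshold=10):
    """Blend the current mask with the historical target mask, pixel by pixel."""
    height, width = len(mask), len(mask[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Frame difference decides whether this pixel is in the motion area.
            moving = abs(frame[r][c] - prev_frame[r][c]) > diff_threshold
            # Motion area: weight the current mask more. Static area: weight history more.
            w_cur, w_hist = (first, second) if moving else (second, first)
            out[r][c] = w_cur * mask[r][c] + w_hist * prev_mask[r][c]
    return out
```

With these weights, a moving pixel follows the current segmentation result closely, while a static pixel changes slowly and so suppresses frame-to-frame flicker.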

2. The method of claim 1, wherein obtaining a current video frame and selecting a target portrait region in the current video frame comprises:

acquiring a historical target mask of a historical video frame adjacent to the current video frame, and determining vertex coordinates of a minimum circumscribed rectangle that completely covers the portrait according to coordinate values of the pixel points whose pixel value in the historical target mask is 1;

determining a minimum circumscribed rectangle as a portrait area of the historical video frame according to the vertex coordinates;

and estimating a target portrait area of the current video frame according to the portrait area of the historical video frame.
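A minimal sketch of the rectangle computation in claim 2 follows, assuming the mask is a nested list of 0/1 values. The claim does not say how the target portrait area is estimated from the historical rectangle, so the relative `margin` expansion in `estimate_target_area` is purely a hypothetical choice, as are both function names.

```python
def portrait_bounding_rectangle(history_mask):
    """Return (left, top, right, bottom) covering every pixel equal to 1, or None."""
    rows = [r for r, row in enumerate(history_mask) if 1 in row]
    if not rows:
        return None  # no portrait pixels in the historical mask
    cols = [c for row in history_mask for c, v in enumerate(row) if v == 1]
    return min(cols), rows[0], max(cols), rows[-1]


def estimate_target_area(rect, frame_width, frame_height, margin=0.1):
    """Expand the historical rectangle by a relative margin and clip it to the frame.

    The expansion is an assumption that allows for portrait movement between frames.
    """
    left, top, right, bottom = rect
    dx = int((right - left + 1) * margin)
    dy = int((bottom - top + 1) * margin)
    return (max(0, left - dx), max(0, top - dy),
            min(frame_width - 1, right + dx), min(frame_height - 1, bottom + dy))
```

Restricting segmentation to this area instead of the full frame is what reduces the amount of computation per frame.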

3. The method of claim 2, further comprising, after estimating a target portrait area for a current video frame from portrait areas for the historical video frames:

and carrying out smoothing treatment on the vertex coordinates of the target portrait area according to a second-order exponential smoothing algorithm to obtain a smoothed target portrait area.
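The claim names a second-order exponential smoothing algorithm without giving its form. The sketch below assumes Brown's double exponential smoothing, one common variant, applied independently to each vertex coordinate. The class name, the helper `smooth_rectangle`, and the smoothing factor 0.5 are illustrative assumptions.

```python
class DoubleExponentialSmoother:
    """Brown's double exponential smoothing for one scalar coordinate."""

    def __init__(self, alpha=0.5):
        if not 0.0 < alpha < 1.0:
            raise ValueError("alpha must lie strictly between 0 and 1")
        self.alpha = alpha
        self.s1 = None  # first-order smoothed value
        self.s2 = None  # second-order smoothed value

    def update(self, value):
        if self.s1 is None:
            # Initialise both stages with the first observation.
            self.s1 = self.s2 = float(value)
        else:
            self.s1 = self.alpha * value + (1.0 - self.alpha) * self.s1
            self.s2 = self.alpha * self.s1 + (1.0 - self.alpha) * self.s2
        # Level estimate of Brown's method for the current time step.
        return 2.0 * self.s1 - self.s2


def smooth_rectangle(smoothers, rect):
    """Smooth (left, top, right, bottom) with one smoother per coordinate."""
    return tuple(s.update(v) for s, v in zip(smoothers, rect))
```

A caller would keep four smoothers alive across frames, for example `smoothers = [DoubleExponentialSmoother() for _ in range(4)]`, and pass each new rectangle through `smooth_rectangle` so that the cropped area does not jitter.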

4. The method of claim 1, wherein obtaining an initial mask corresponding to a target portrait area of the current video frame comprises:

and performing separable convolution processing, channel mapping compression processing and interpolation convolution processing on the target portrait area of the current video frame through a convolutional neural network to obtain an initial mask corresponding to the target portrait area.

5. The method of any of claims 1-4, wherein replacing the background in the current video frame with a new background according to the target mask generates a composite frame corresponding to the current video frame, comprising:

replacing the background in the current video frame with a new background according to the formula $i = \alpha I + (1 - \alpha) B_{new}$ to generate a synthesized frame corresponding to the current video frame;

wherein $\alpha$ is the target mask, $I$ is the current video frame, $B_{new}$ is a video frame containing the new background, and $i$ is the composite frame corresponding to the current video frame.
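The blending in claim 5 is a per-pixel weighted sum: composite = α × current frame + (1 − α) × new background. A minimal sketch, assuming single-channel frames stored as nested lists and mask values in [0, 1]; the function name is hypothetical, and a colour frame would apply the same expression to each channel.

```python
def composite_frame(alpha, frame, new_background):
    """Blend the frame over the new background using the target mask as weights."""
    return [
        [a * f + (1.0 - a) * b for a, f, b in zip(alpha_row, frame_row, bg_row)]
        for alpha_row, frame_row, bg_row in zip(alpha, frame, new_background)
    ]
```

Where the mask is 1 the original portrait pixel is kept, where it is 0 the new background shows through, and fractional values along the portrait edge give a soft transition.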

6. An image background replacement apparatus, comprising:

the mask acquisition module is used for acquiring an initial mask corresponding to the target portrait area of the current video frame through a portrait segmentation model;

the background replacing module is used for replacing the background of the current video frame with a new background according to the target mask, and generating a synthesized frame corresponding to the current video frame;

a mask processing module, comprising: a first processing module, used for acquiring at least two connected domains according to each pixel value in the initial mask; determining the connected domain with the largest area among the at least two connected domains as a target connected domain, or determining a connected domain with an area larger than a threshold value as a target connected domain; keeping the pixel values corresponding to the target connected domain in the initial mask unchanged, and updating the pixel values corresponding to the non-target connected domains in the initial mask to a set target value;

the mask processing module further comprising: a second processing module, used for performing a difference operation between the current video frame and the adjacent historical video frame and determining a motion area in the current video frame; multiplying each pixel value corresponding to the motion area in the initial mask by a first value to obtain a first mask; multiplying each pixel value corresponding to the motion area in the historical target mask of the historical video frame by a second value, and accumulating each product onto the corresponding pixel value of the first mask; multiplying each pixel value corresponding to the non-motion area in the initial mask by the second value to obtain a second mask; multiplying each pixel value corresponding to the non-motion area in the historical target mask of the historical video frame by the first value, and accumulating each product onto the corresponding pixel value of the second mask; wherein the first value is greater than the second value.

7. An electronic device, the device comprising:

one or more processors;

storage means for storing one or more programs,

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image background replacement method of any of claims 1-5.

8. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the image background replacement method according to any of claims 1-5.