CN111292337B - Image background replacement method, device, equipment and storage medium - Google Patents
- Publication number
- CN111292337B (application CN202010071578.1A)
- Authority
- CN
- China
- Prior art keywords
- video frame
- mask
- target
- current video
- area
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Studio Circuits (AREA)
- Image Analysis (AREA)
Abstract
The embodiment of the invention discloses an image background replacement method, device, equipment and storage medium. The method comprises the following steps: acquiring a current video frame, and selecting a target portrait area in the current video frame; acquiring an initial mask corresponding to the target portrait area of the current video frame; performing segmentation optimization processing and inter-frame smoothing processing on the initial mask to obtain a target mask; and replacing the background of the current video frame with a new background according to the target mask, generating a composite frame corresponding to the current video frame. This technical scheme reduces the computation required for portrait segmentation of real-time video, improves the portrait segmentation precision, and realizes portrait background replacement for real-time video.
Description
Technical Field
The embodiment of the invention relates to the technical field of image processing, in particular to an image background replacement method, an image background replacement device, image background replacement equipment and a storage medium.
Background
Currently, in order to replace the portrait background in an image, the portrait instance must be accurately separated from the original background of the image. Existing separation methods fall into two main categories: portrait semantic segmentation and portrait foreground matting.
Portrait semantic segmentation understands the image at the semantic level: pixels classified as "person" are separated from pixels classified as background, yielding a 0/1 mask through which the portrait region is separated from the background. However, this method is based on graph-theoretic algorithms, so the computation grows exponentially as the resolution increases; moreover, the optimization function naturally favors low-level features, so the portrait segmentation accuracy is low and the method is unsuitable for replacing the portrait background in real-time video.
Portrait foreground matting can raise the segmentation precision to the pixel level relative to portrait semantic segmentation, but its computation is correspondingly larger than that of the semantic segmentation technique, so it is likewise unsuitable for replacing the portrait background in real-time video.
Disclosure of Invention
The embodiment of the invention provides an image background replacement method, an image background replacement device, image background replacement equipment and a storage medium, which are used for reducing the calculation amount of human image segmentation on a real-time video, improving the human image segmentation precision and realizing human image background replacement on the real-time video.
In a first aspect, an embodiment of the present invention provides an image background replacing method, including:
acquiring a current video frame, and selecting a target portrait area in the current video frame;
acquiring an initial mask corresponding to a target portrait area of a current video frame;
performing segmentation optimization processing and inter-frame smoothing processing on the initial mask to obtain a target mask;
and replacing the background of the current video frame with the new background according to the target mask, and generating a synthesized frame corresponding to the current video frame.
In a second aspect, an embodiment of the present invention further provides an image background replacing apparatus, including:
the region selection module is used for acquiring a current video frame and selecting a target portrait region in the current video frame;
the mask acquisition module is used for acquiring an initial mask corresponding to a target portrait area of the current video frame through a portrait segmentation model;
the mask processing module is used for carrying out segmentation optimization processing and inter-frame smoothing processing on the initial mask to obtain a target mask;
and the background replacing module is used for replacing the background of the current video frame with a new background according to the target mask, and generating a synthesized frame corresponding to the current video frame.
In a third aspect, an embodiment of the present invention further provides an apparatus, including:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image background replacement method provided by any embodiment of the present invention.
In a fourth aspect, embodiments of the present invention further provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image background replacement method provided by any of the embodiments of the present invention.
According to the technical scheme of the embodiment, a current video frame is obtained and a target portrait area is selected in it; an initial mask corresponding to the target portrait area of the current video frame is acquired; segmentation optimization processing and inter-frame smoothing processing are performed on the initial mask to obtain a target mask; and the background of the current video frame is replaced with a new background according to the target mask, generating a composite frame corresponding to the current video frame. This solves the prior-art problems of low portrait segmentation accuracy and large computation that made portrait background replacement unsuitable for real-time video: the computation required for portrait segmentation of real-time video is reduced, the segmentation precision is improved, and portrait background replacement for real-time video is realized.
Drawings
FIG. 1a is a flow chart of an image background replacement method in accordance with a first embodiment of the present invention;
FIG. 1b is a schematic diagram of a convolutional neural network topology according to a first embodiment of the present invention;
fig. 2 is a schematic structural diagram of an image background replacing device in a second embodiment of the present invention;
fig. 3 is a schematic structural view of an apparatus according to a third embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Example 1
Fig. 1a is a flowchart of an image background replacement method according to a first embodiment of the present invention, where the present embodiment is applicable to the case of performing a portrait background replacement on a real-time video, the method may be performed by an image background replacement device, which may be implemented by hardware and/or software, and may be generally integrated in a device that provides an image background replacement service. As shown in fig. 1a, the method comprises:
step 110, obtaining a current video frame, and selecting a target portrait area in the current video frame.
This embodiment is applied to a live-video scene. The current video frame is the data frame corresponding to the current moment in the live video, and it contains the portrait of the anchor. To replace the background of the current video frame, the anchor's portrait must be segmented out of the frame and then fused with the new background. Considering that in a 1080p video frame the person instance usually occupies only a small block in the middle of the frame, and that the person's motion between adjacent frames is limited, the computation of portrait segmentation can be reduced by selecting a target portrait area that contains the complete portrait and performing image processing only on that area, rather than on the whole video frame.
Optionally, obtaining the current video frame and selecting the target portrait area in it may include: acquiring the historical target mask of the historical video frame adjacent to the current video frame, and determining, according to the coordinate values of the pixel points whose mask value is 1, the vertex coordinates of the minimum circumscribed rectangle that completely covers the portrait; determining the minimum circumscribed rectangle according to the vertex coordinates, and taking it as the portrait area of the historical video frame; and estimating the target portrait area of the current video frame according to the portrait area of the historical video frame.
In this embodiment, a tracking manner of a region of interest (Region of Interest, ROI) is adopted, and a target portrait region in a current video frame is estimated according to a portrait region of a historical video frame adjacent to the current video frame, so that a subsequent algorithm only performs image processing on the target portrait region.
Alternatively, the target portrait area may be acquired as follows. Determine the target pixel points corresponding to the portrait in the historical video frame according to the pixel values equal to 1 in its historical target mask. From the coordinates of those pixel points, select the maximum abscissa x1, the minimum abscissa x2, the maximum ordinate y1 and the minimum ordinate y2; the vertex coordinates of the region are then (x1, y1), (x2, y1), (x1, y2) and (x2, y2), and the rectangle defined by these vertices is the portrait area of the historical video frame. Finally, according to the estimated motion trend of the portrait, either use this portrait area directly as the target portrait area, or shift or resize it appropriately to obtain the target portrait area.
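The bounding-rectangle step above can be sketched as follows. This is a minimal sketch, not the patent's implementation: the helper name estimate_roi and the fixed pixel margin (standing in for the "appropriate movement or adjustment" of the region) are assumptions for illustration.

```python
import numpy as np

def estimate_roi(prev_mask, margin=20):
    """Estimate the target portrait region of the current frame from the
    previous frame's target mask (values 0/1).  Returns (x_min, y_min,
    x_max, y_max) of the minimum circumscribed rectangle of the 1-pixels,
    expanded by a margin to absorb inter-frame motion."""
    ys, xs = np.nonzero(prev_mask)          # pixels where the mask is 1
    h, w = prev_mask.shape
    if xs.size == 0:                        # no portrait found: whole frame
        return 0, 0, w - 1, h - 1
    x_min = max(int(xs.min()) - margin, 0)
    x_max = min(int(xs.max()) + margin, w - 1)
    y_min = max(int(ys.min()) - margin, 0)
    y_max = min(int(ys.max()) + margin, h - 1)
    return x_min, y_min, x_max, y_max

mask = np.zeros((10, 10), dtype=np.uint8)
mask[3:7, 4:8] = 1                          # portrait occupies rows 3..6, cols 4..7
print(estimate_roi(mask, margin=1))         # (3, 2, 8, 7)
```

Subsequent stages then crop the current frame to this rectangle before running segmentation, so the network only ever sees the small portrait block rather than the full 1080p frame.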
Optionally, after estimating the target portrait area of the current video frame according to the portrait area of the historical video frame, the method may further include: and carrying out smoothing treatment on the vertex coordinates of the target portrait area according to a second-order exponential smoothing algorithm to obtain a smoothed target portrait area.
In this embodiment, in order to stabilize the frame of the target portrait area, prevent abrupt changes and severe jitter from affecting subsequent algorithms, and trade off between the smoothness and the sensitivity of portrait segmentation, the vertex coordinates of the target portrait area are smoothed with a second-order exponential smoothing algorithm. Let x_t be a vertex coordinate of the target portrait area at time t, and let T_t be that coordinate after smoothing. Second-order exponential smoothing applies single exponential smoothing twice,

S'_t = α·x_t + (1−α)·S'_{t−1},  S''_t = α·S'_t + (1−α)·S''_{t−1},

and the smoothed coordinate is T_t = 2·S'_t − S''_t, where α ∈ (0, 1) is the smoothing factor.
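The second-order exponential smoothing step can be sketched as follows. The function name brown_double_smooth is hypothetical, and the output form T_t = 2·S'_t − S''_t is the standard Brown double-smoothing formulation, which the patent text does not spell out; it is assumed here.

```python
def brown_double_smooth(xs, alpha=0.5):
    """Brown's second-order (double) exponential smoothing of a scalar
    coordinate sequence xs; returns the smoothed sequence T_t = 2*S1 - S2."""
    s1 = s2 = xs[0]                # initialize both smoothers at the first sample
    out = []
    for x in xs:
        s1 = alpha * x + (1 - alpha) * s1      # first smoothing pass
        s2 = alpha * s1 + (1 - alpha) * s2     # second smoothing pass
        out.append(2 * s1 - s2)                # trend-corrected smoothed value
    return out

# A constant coordinate is left unchanged; a trending one is tracked with lag.
print(brown_double_smooth([5, 5, 5]))          # [5.0, 5.0, 5.0]
```

Unlike single exponential smoothing, the double pass compensates for the lag behind a steadily moving vertex, which matters when the anchor walks across the frame.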
in this embodiment, the target image region obtained by the ROI tracking technology can reduce the image region for image segmentation, accelerate the image segmentation speed, and improve the segmentation efficiency, and simultaneously, the second-order exponential smoothing algorithm is adopted to perform secondary smoothing on the target image region, so that the image segmentation is more accurate.
Step 120, obtaining an initial mask corresponding to the target portrait area of the current video frame.
In this embodiment, the mask is a binary image composed of 0 and 1, wherein a pixel value with a value of 1 corresponds to the target portrait area, a pixel value with a value of 0 corresponds to the background area, and the portrait can be separated from the current video frame by multiplying the mask with the target portrait area.
Optionally, acquiring the initial mask corresponding to the target portrait area of the current video frame may include: and performing separation convolution processing, channel mapping compression processing and interpolation convolution processing on the target portrait region of the current video frame through a convolution neural network to obtain an initial mask corresponding to the target portrait region.
As shown in fig. 1b, the present embodiment obtains a continuous-valued initial mask for portrait segmentation by feeding the target portrait area of the current video frame into a convolutional neural network. In order to balance the precision and the computation of portrait segmentation, the conventional convolutional neural network is improved through methods such as separation convolution, channel mapping compression and interpolation convolution, avoiding large differences between the portrait segmentation results of adjacent video frames, which would look poor to the user.
In this embodiment, separation convolution is used to reduce the dimensionality of a multidimensional convolution. For example, for the three dimensions of length, width and channel number, the computation can be reduced by first convolving over length and width, and then convolving over the color-space channels. Channel mapping compression compresses the channels to be convolved, convolves the compressed channels, and then decompresses them, reducing the computation by reducing the number of channels to be convolved. Interpolation convolution is used to improve the segmentation accuracy. As shown in fig. 1b, among the four separation-convolution layers of the network, the fourth layer has the lowest image resolution: only a blurred outline of the person can be obtained and essentially no person detail. At this point, the person details obtained by upsampling the second separation-convolution layer can be fused into the upsampling result of the fourth layer, yielding a higher-precision matte.
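To illustrate why factoring the convolution reduces computation, the sketch below compares parameter counts of a standard k×k convolution with a depthwise-separable one (depthwise k×k followed by a pointwise 1×1). This factorization is one common realization of "separation convolution"; the patent does not give its exact factorization, so treat the formulas as illustrative, not as the claimed network.

```python
def conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution over c_in -> c_out channels."""
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Depthwise k x k (one filter per input channel) + pointwise 1 x 1 mixing."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 64, 64)        # 36864 parameters
sep = separable_params(3, 64, 64)   # 4672 parameters, roughly 8x fewer
print(std, sep)
```

The same arithmetic applies to multiply-accumulate counts per pixel, which is why the improved network fits the real-time budget of a live-streaming PC host.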
In this embodiment, the improved convolutional neural network requires less computation and fewer model parameters for portrait segmentation, consumes fewer resources, and can be deployed on a PC host for real-time live streaming. Further, the convolutional neural network is more robust for portrait segmentation: it can cope with uncertain interference caused by factors such as illumination changes and camera movement, giving a more stable composite output. Finally, the adaptive matting of the convolutional neural network can, to a certain extent, replace green-screen compositing, reducing the restrictions on the usage scenarios of portrait segmentation.
Of course, the present embodiment is not limited to the use of convolutional neural networks for image segmentation, and other image segmentation models with required calculation amounts and accuracy are also applicable to the present embodiment.
And 130, performing segmentation optimization processing and inter-frame smoothing processing on the initial mask to obtain a target mask.
In this embodiment, although the portrait segmentation accuracy of the convolutional neural network is already high, two problems remain: first, although the overall accuracy of the segmentation result is high, local flaws such as free-floating misclassified areas still occur; second, the segmentation result considers only the current video frame and therefore suffers from inter-frame jitter. The present embodiment solves both problems by performing segmentation optimization processing and inter-frame smoothing processing on the initial mask.
Optionally, performing the segmentation optimization processing on the initial mask may include: obtaining at least two connected domains according to the pixel values in the initial mask; determining the connected domain with the largest area among them as the target connected domain, or determining every connected domain whose area exceeds a threshold as a target connected domain; and keeping the pixel values corresponding to the target connected domains in the initial mask unchanged, while updating the pixel values corresponding to non-target connected domains to a set target value.
The present embodiment removes free-floating dirty blocks by connected-domain analysis. Specifically, pixel points whose mask values are equal can be connected into at least two connected domains. Since the person's connected domain has the largest area, depending on the specific business scenario either only the connected domain with the largest area, or only the connected domains larger than a threshold, are retained; the others are deleted, and the result is mapped back into the initial mask: the pixel values of the retained connected domains stay unchanged, while those of the deleted connected domains are updated to the set target value. The target value may be 0 or another small positive number close to 0, for example 0.001.
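The "keep the largest connected domain" variant can be sketched as below with a plain 4-connected flood fill; the function name largest_component_mask and the list-of-lists mask representation are illustrative assumptions, not the patent's implementation.

```python
from collections import deque

def largest_component_mask(mask, target_value=0.0):
    """Keep only the largest 4-connected region of 1-pixels in a binary
    mask (list of lists of 0/1); every other 1-pixel becomes target_value."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    sizes = {}
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] == 1 and labels[sy][sx] == 0:
                next_label += 1                       # start a new component
                q = deque([(sy, sx)])
                labels[sy][sx] = next_label
                count = 0
                while q:                              # BFS flood fill
                    y, x = q.popleft()
                    count += 1
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] == 1 and labels[ny][nx] == 0:
                            labels[ny][nx] = next_label
                            q.append((ny, nx))
                sizes[next_label] = count
    if not sizes:
        return mask
    keep = max(sizes, key=sizes.get)                  # label of the biggest blob
    return [[mask[y][x] if labels[y][x] == keep or mask[y][x] == 0
             else target_value
             for x in range(w)] for y in range(h)]
```

In production one would use an optimized connected-components routine (e.g. OpenCV's) over this BFS, but the retained/deleted-domain logic is the same.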
Optionally, performing the inter-frame smoothing on the initial mask may include: performing differential operation on the current video frame and the adjacent historical video frames, and determining a motion area in the current video frame; multiplying each pixel value corresponding to the motion area in the initial mask by a first numerical value to be used as a first mask; multiplying each pixel value corresponding to the motion area in the historical target mask of the historical video frame by a second value respectively, and correspondingly accumulating the second value to each pixel value of the first mask; multiplying each pixel value corresponding to the non-motion area in the initial mask by a second value to be used as a second mask; multiplying each pixel value corresponding to the non-motion area in the historical target mask of the historical video frame by a first numerical value respectively, and correspondingly accumulating the first numerical value to each pixel value of the second mask; wherein the first value is greater than the second value.
In this embodiment, since the output of the current video frame in the convolutional neural network is unstable, inter-frame jitter problem, especially edge jitter problem, may occur, and therefore when obtaining the mask corresponding to the current video frame, the initial mask needs to be processed according to the historical video frame, so that the finally obtained target mask is smooth over the whole historical sequence.
Optionally, the outputs of the convolutional neural network across multiple frames can be fused under the guidance of the motion area obtained by a conventional motion detection algorithm, yielding the target mask. Considering that only the person in the video frames is moving, the background tends to be constant across frames. A difference operation between the current video frame and the adjacent historical video frame therefore produces large difference values mainly in the region where the person is located; the pixels marked 1 after this operation delimit that region, which is taken as the motion area.
In this embodiment, the first value may be 0.9 and the second value 0.1. Since the person is moving, the target mask in the motion area should depend more on the current video frame, while the target mask in the non-motion area should depend more on the historical video frame. That is, in the motion area where the person is located, the target mask is 0.9 times the pixel values of the initial mask plus 0.1 times the pixel values of the historical target mask; in the non-motion area, i.e. the background area, the target mask is 0.1 times the pixel values of the initial mask plus 0.9 times the pixel values of the historical target mask. Combining the two yields the target mask corresponding to the current video frame.
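The motion-weighted fusion just described can be sketched as follows. The helper name smooth_mask and the frame-difference threshold are assumptions for illustration; the 0.9/0.1 weights are the values given in the text.

```python
import numpy as np

def smooth_mask(cur_mask, prev_mask, cur_frame, prev_frame,
                diff_thresh=15, w_motion=0.9):
    """Blend the current initial mask with the previous target mask.
    Pixels in the motion area (large inter-frame difference) trust the
    current mask with weight w_motion; static pixels trust the history."""
    motion = np.abs(cur_frame.astype(np.int32)
                    - prev_frame.astype(np.int32)) > diff_thresh
    w = np.where(motion, w_motion, 1.0 - w_motion)   # per-pixel weight on current
    return w * cur_mask + (1.0 - w) * prev_mask
```

Applying this per frame makes each pixel of the target mask an exponentially weighted average over the history, so the weight of older masks decays the further they are from the current frame.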
In fact, the inter-frame smoothing of the initial mask may take into account all the historical video frames before the current one. In that case, the closer a historical video frame is to the current frame, the greater the weight of its historical target mask in the current frame's target mask; the farther away it is, the smaller that weight.
And 140, replacing the background of the current video frame with the new background according to the target mask, and generating a synthesized frame corresponding to the current video frame.
Optionally, replacing the background in the current video frame with the new background according to the target mask, and generating the composite frame corresponding to the current video frame, may include: according to the formula I' = α·I + (1−α)·B_new, replacing the background in the current video frame with the new background to generate the composite frame corresponding to the current video frame; where α is the target mask, I is the current video frame, B_new is a video frame containing the newly placed background, and I' is the composite frame corresponding to the current video frame.
In this embodiment, the target mask after segmentation optimization and inter-frame smoothing is a continuous-valued mask with pixel values from 0 to 1: pixels of the portrait correspond to the value 1, pixels of the background correspond to the value 0, and values between 0 and 1 occur mostly in edge areas, such as a person's hair or other semi-transparent material. When the background is replaced according to the formula I' = α·I + (1−α)·B_new, the term α·I extracts the portrait from the current video frame, (1−α)·B_new extracts the new background from the other video frame, and their combination is the replaced composite frame.
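The compositing formula I' = α·I + (1−α)·B_new can be sketched per pixel as follows; the helper name composite is an illustrative assumption.

```python
import numpy as np

def composite(alpha, frame, new_bg):
    """Alpha-blend: I' = alpha * I + (1 - alpha) * B_new, per pixel.
    alpha has shape (H, W) with values in [0, 1]; frame and new_bg
    have shape (H, W, 3)."""
    a = alpha[..., None].astype(np.float32)      # broadcast over RGB channels
    out = a * frame.astype(np.float32) + (1.0 - a) * new_bg.astype(np.float32)
    return out.astype(np.uint8)
```

Because α is continuous-valued, soft edge pixels (hair, semi-transparent material) receive a proportional blend of foreground and new background rather than a hard cut, which is what hides the seam in the composite frame.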
According to the technical scheme of this embodiment, a current video frame is obtained and a target portrait area is selected in it; an initial mask corresponding to the target portrait area of the current video frame is acquired; segmentation optimization processing and inter-frame smoothing processing are performed on the initial mask to obtain a target mask; and the background of the current video frame is replaced with a new background according to the target mask, generating a composite frame corresponding to the current video frame. This solves the prior-art problems of low portrait segmentation accuracy and large computation that made portrait background replacement unsuitable for real-time video: the computation required for portrait segmentation of real-time video is reduced, the segmentation precision is improved, and portrait background replacement for real-time video is realized.
Example two
Fig. 2 is a schematic structural diagram of an image background replacing device in a second embodiment of the present invention, and the embodiment is applicable to a situation of performing portrait background replacement on a real-time video. As shown in fig. 2, the image background replacing apparatus includes:
the region selection module 210 is configured to obtain a current video frame, and select a target portrait region in the current video frame;
the mask obtaining module 220 is configured to obtain, through the portrait segmentation model, an initial mask corresponding to a target portrait area of the current video frame;
the mask processing module 230 is configured to perform segmentation optimization processing and inter-frame smoothing processing on the initial mask to obtain a target mask;
the background replacing module 240 is configured to replace the background of the current video frame with a new background according to the target mask, and generate a composite frame corresponding to the current video frame.
According to the technical scheme of this embodiment, a current video frame is obtained and a target portrait area is selected in it; an initial mask corresponding to the target portrait area of the current video frame is acquired; segmentation optimization processing and inter-frame smoothing processing are performed on the initial mask to obtain a target mask; and the background of the current video frame is replaced with a new background according to the target mask, generating a composite frame corresponding to the current video frame. This solves the prior-art problems of low portrait segmentation accuracy and large computation that made portrait background replacement unsuitable for real-time video: the computation required for portrait segmentation of real-time video is reduced, the segmentation precision is improved, and portrait background replacement for real-time video is realized.
Optionally, the area selection module 210 is specifically configured to: acquire the historical target mask of the historical video frame adjacent to the current video frame, and determine, according to the coordinate values of the pixel points whose mask value is 1, the vertex coordinates of the minimum circumscribed rectangle that completely covers the portrait; determine the minimum circumscribed rectangle according to the vertex coordinates and take it as the portrait area of the historical video frame; and estimate the target portrait area of the current video frame according to the portrait area of the historical video frame.
Optionally, the area selection module 210 is further configured to: after the target portrait area of the current video frame is estimated from the portrait area of the historical video frame, smooth the vertex coordinates of the target portrait area according to a second-order exponential smoothing algorithm to obtain a smoothed target portrait area.
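Second-order exponential smoothing of one vertex coordinate over successive frames can be sketched as below (Holt-style level-plus-trend form). The smoothing factors `alpha` and `beta` are illustrative defaults, not values disclosed by the patent.

```python
def double_exponential_smooth(values, alpha=0.5, beta=0.5):
    """Second-order (double) exponential smoothing of a 1-D series,
    e.g. one vertex coordinate of the portrait rectangle across frames."""
    level, trend = values[0], 0.0
    smoothed = [level]
    for v in values[1:]:
        prev_level = level
        # level tracks the observation, trend tracks its frame-to-frame drift
        level = alpha * v + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        smoothed.append(level)
    return smoothed

print(double_exponential_smooth([10, 12, 11, 13, 12]))
# [10, 11.0, 11.25, 12.3125, 12.515625]
```

The effect is that the rectangle's corners no longer jump with per-frame segmentation noise, which stabilizes the crop fed to the segmentation network.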
Optionally, the mask acquiring module 220 is specifically configured to: perform separation convolution processing, channel mapping compression processing and interpolation convolution processing on the target portrait area of the current video frame through a convolutional neural network to obtain an initial mask corresponding to the target portrait area.
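The patent names three operations but discloses no architecture or weights. As a toy 1-D sketch of the first two, the block below applies a per-channel (depthwise/"separation") filter followed by a pointwise step that maps the channels down to fewer outputs ("channel mapping compression"); all shapes, kernels, and weights are invented for illustration only.

```python
def depthwise_separable_conv(channels, dw_kernels, pw_weights):
    """Toy 1-D depthwise-separable convolution:
    1) depthwise: each input channel is filtered by its own kernel (valid mode);
    2) pointwise: a weighted sum across channels at each position maps
       the channel count down (channel compression)."""
    dw = []
    for ch, k in zip(channels, dw_kernels):
        n = len(k)
        dw.append([sum(ch[i + j] * k[j] for j in range(n))
                   for i in range(len(ch) - n + 1)])
    length = len(dw[0])
    return [[sum(w[c] * dw[c][i] for c in range(len(dw)))
             for i in range(length)] for w in pw_weights]

# two input channels -> one output channel
out = depthwise_separable_conv(
    channels=[[1, 2, 3], [0, 1, 0]],
    dw_kernels=[[1, 1], [1, -1]],
    pw_weights=[[1, 1]],
)
print(out)  # [[2, 6]]
```

The design motivation matches the patent's goal: depthwise-separable filtering plus channel compression cuts multiply-accumulate counts sharply versus dense convolution, which is what makes per-frame mask inference feasible in real time.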
Optionally, the mask processing module 230 includes a first processing module, configured to: acquire at least two connected domains according to each pixel value in the initial mask; determine the connected domain with the largest area among the at least two connected domains, or a connected domain whose area is larger than a threshold value, as the target connected domain; and keep the pixel values corresponding to the target connected domain in the initial mask unchanged while updating the pixel values corresponding to non-target connected domains in the initial mask to a set target value.
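A minimal sketch of this segmentation optimization, assuming a binary mask, 4-connectivity, and a set target value of 0 (the patent does not fix these choices):

```python
from collections import deque

def keep_largest_component(mask, target_value=0):
    """Keep the largest 4-connected region of 1-pixels (the target
    connected domain); set every other 1-pixel to target_value."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 1 and not seen[y][x]:
                comp, q = [], deque([(y, x)])
                seen[y][x] = True
                while q:  # breadth-first flood fill of one connected domain
                    cy, cx = q.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                components.append(comp)
    if not components:
        return mask
    keep = set(max(components, key=len))
    return [[1 if (y, x) in keep
             else (target_value if mask[y][x] == 1 else mask[y][x])
             for x in range(w)] for y in range(h)]

m = [[1, 1, 0, 0],
     [1, 1, 0, 1],   # the lone pixel at (1, 3) is a false positive
     [0, 0, 0, 0]]
print(keep_largest_component(m))  # stray pixel cleared
```

This removes small spurious blobs that the segmentation network mislabels as portrait, which is the usual source of flicker around the subject.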
Optionally, the mask processing module 230 also includes a second processing module, configured to: perform a difference operation on the current video frame and the adjacent historical video frame to determine a motion area in the current video frame; multiply each pixel value corresponding to the motion area in the initial mask by a first numerical value to obtain a first mask; multiply each pixel value corresponding to the motion area in the historical target mask of the historical video frame by a second numerical value, and accumulate the results onto the corresponding pixel values of the first mask; multiply each pixel value corresponding to the non-motion area in the initial mask by the second numerical value to obtain a second mask; multiply each pixel value corresponding to the non-motion area in the historical target mask of the historical video frame by the first numerical value, and accumulate the results onto the corresponding pixel values of the second mask; wherein the first numerical value is greater than the second numerical value.
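The weighted blend described above can be sketched as follows, assuming binary 2-D masks and a precomputed motion mask from frame differencing; the weights `w1 = 0.8` and `w2 = 0.2` are illustrative (the patent only requires the first value to exceed the second).

```python
def interframe_smooth(initial_mask, history_mask, motion_mask, w1=0.8, w2=0.2):
    """Blend the current frame's initial mask with the previous frame's
    target mask. In motion areas the current mask dominates (w1 > w2);
    in static areas the historical mask dominates, suppressing jitter."""
    h, w = len(initial_mask), len(initial_mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if motion_mask[y][x]:  # pixel lies in the motion area
                out[y][x] = w1 * initial_mask[y][x] + w2 * history_mask[y][x]
            else:                  # static area: trust the history more
                out[y][x] = w2 * initial_mask[y][x] + w1 * history_mask[y][x]
    return out

# one moving pixel, one static pixel
print(interframe_smooth([[1, 1]], [[0, 0]], [[1, 0]]))
```

Weighting the static regions toward the historical mask is what removes frame-to-frame boundary flicker without lagging behind genuinely moving parts of the subject.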
Optionally, the background replacement module 240 is specifically configured to: replace the background in the current video frame with the new background according to the formula Î = αI + (1 − α)B_new, generating a composite frame corresponding to the current video frame; where α is the target mask, I is the current video frame, B_new is a video frame containing the newly placed background, and Î is the composite frame corresponding to the current video frame.
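Applied per pixel, the compositing formula Î = αI + (1 − α)B_new is a standard alpha blend. A minimal single-channel sketch (grayscale values as nested lists; a real implementation would apply it per color channel):

```python
def replace_background(frame, new_bg, alpha):
    """Per-pixel alpha compositing: I_hat = alpha*I + (1 - alpha)*B_new.
    alpha is the target mask (1 = portrait, 0 = background)."""
    return [[alpha[y][x] * frame[y][x] + (1 - alpha[y][x]) * new_bg[y][x]
             for x in range(len(frame[0]))]
            for y in range(len(frame))]

# portrait pixel keeps the frame value, background pixel takes the new background
print(replace_background([[100, 50]], [[0, 200]], [[1.0, 0.0]]))
# [[100.0, 200.0]]
```

Because the inter-frame smoothing step leaves α fractional near the portrait boundary, the blend also softens the portrait edge instead of producing a hard cut-out.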
The image background replacing device provided by the embodiments of the invention can execute the image background replacement method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example III
Fig. 3 is a schematic structural view of an apparatus according to a third embodiment of the present invention. Fig. 3 illustrates a block diagram of an exemplary device 12 suitable for use in implementing embodiments of the present invention. The device 12 shown in fig. 3 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 3, device 12 is in the form of a general purpose computing device. Components of device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 3, commonly referred to as a "hard disk drive"). Although not shown in fig. 3, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
Device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with device 12, and/or any devices (e.g., network card, modem, etc.) that enable device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Also, device 12 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, via network adapter 20. As shown, network adapter 20 communicates with other modules of device 12 over bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, to implement an image background replacement method provided by an embodiment of the present invention.
Namely: an image background replacement method is realized, which comprises the following steps:
acquiring a current video frame, and selecting a target portrait area in the current video frame;
acquiring an initial mask corresponding to a target portrait area of a current video frame;
performing segmentation optimization processing and inter-frame smoothing processing on the initial mask to obtain a target mask;
and replacing the background of the current video frame with the new background according to the target mask, and generating a synthesized frame corresponding to the current video frame.
Example IV
The fourth embodiment of the present invention also discloses a computer storage medium having stored thereon a computer program which when executed by a processor implements an image background replacement method comprising:
acquiring a current video frame, and selecting a target portrait area in the current video frame;
acquiring an initial mask corresponding to a target portrait area of a current video frame;
performing segmentation optimization processing and inter-frame smoothing processing on the initial mask to obtain a target mask;
and replacing the background of the current video frame with the new background according to the target mask, and generating a synthesized frame corresponding to the current video frame.
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.
Claims (8)
1. An image background replacement method, comprising:
acquiring a current video frame, and selecting a target portrait area in the current video frame;
acquiring an initial mask corresponding to a target portrait area of the current video frame;
performing segmentation optimization processing and inter-frame smoothing processing on the initial mask to obtain a target mask;
replacing the background of the current video frame with a new background according to the target mask, and generating a synthesized frame corresponding to the current video frame;
performing segmentation optimization processing on the initial mask, including:
acquiring at least two connected domains according to each pixel value in the initial mask;
determining the connected domain with the largest area in the at least two connected domains as a target connected domain, or determining the connected domain with the area larger than a threshold value as a target connected domain;
maintaining the pixel values corresponding to the target connected domain in the initial mask unchanged, and updating the pixel values corresponding to non-target connected domains in the initial mask to a set target value;
performing inter-frame smoothing on the initial mask, including:
performing differential operation on the current video frame and the adjacent historical video frames, and determining a motion area in the current video frame;
multiplying each pixel value corresponding to the motion area in the initial mask by a first numerical value to obtain a first mask; multiplying each pixel value corresponding to the motion area in a historical target mask of the historical video frame by a second numerical value, and accumulating the results onto the corresponding pixel values of the first mask;
multiplying each pixel value corresponding to the non-motion area in the initial mask by the second numerical value to obtain a second mask; multiplying each pixel value corresponding to the non-motion area in the historical target mask of the historical video frame by the first numerical value, and accumulating the results onto the corresponding pixel values of the second mask;
wherein the first value is greater than the second value.
2. The method of claim 1, wherein obtaining a current video frame and selecting a target portrait region in the current video frame comprises:
acquiring a historical target mask of a historical video frame adjacent to a current video frame, and determining vertex coordinates of a minimum circumscribed rectangle of a completely covered portrait according to coordinate values of pixel points corresponding to pixel values with a value of 1 in the historical target mask;
determining a minimum circumscribed rectangle as a portrait area of the historical video frame according to the vertex coordinates;
and estimating a target portrait area of the current video frame according to the portrait area of the historical video frame.
3. The method of claim 2, further comprising, after estimating a target portrait area for a current video frame from portrait areas for the historical video frames:
and carrying out smoothing treatment on the vertex coordinates of the target portrait area according to a second-order exponential smoothing algorithm to obtain a smoothed target portrait area.
4. The method of claim 1, wherein obtaining an initial mask corresponding to a target portrait area of the current video frame comprises:
and performing separation convolution processing, channel mapping compression processing and interpolation convolution processing on the target portrait region of the current video frame through a convolution neural network to obtain an initial mask corresponding to the target portrait region.
5. The method of any of claims 1-4, wherein replacing the background in the current video frame with a new background according to the target mask generates a composite frame corresponding to the current video frame, comprising:
replacing the background in the current video frame with the new background according to the formula Î = αI + (1 − α)B_new, to generate a synthesized frame corresponding to the current video frame;
wherein α is the target mask, I is the current video frame, B_new is a video frame containing the newly placed background, and Î is the synthesized frame corresponding to the current video frame.
6. An image background replacement apparatus, comprising:
the region selection module is used for acquiring a current video frame and selecting a target portrait region in the current video frame;
the mask acquisition module is used for acquiring an initial mask corresponding to the target portrait area of the current video frame through a portrait segmentation model;
the mask processing module is used for carrying out segmentation optimization processing and inter-frame smoothing processing on the initial mask to obtain a target mask;
the background replacing module is used for replacing the background of the current video frame with a new background according to the target mask, and generating a synthesized frame corresponding to the current video frame;
the mask processing module comprises: a first processing module, configured to acquire at least two connected domains according to each pixel value in the initial mask; determine the connected domain with the largest area among the at least two connected domains, or a connected domain whose area is larger than a threshold value, as the target connected domain; and keep the pixel values corresponding to the target connected domain in the initial mask unchanged while updating the pixel values corresponding to non-target connected domains in the initial mask to a set target value;
the mask processing module further comprises: a second processing module, configured to perform a difference operation on the current video frame and the adjacent historical video frame to determine a motion area in the current video frame; multiply each pixel value corresponding to the motion area in the initial mask by a first numerical value to obtain a first mask; multiply each pixel value corresponding to the motion area in the historical target mask of the historical video frame by a second numerical value, and accumulate the results onto the corresponding pixel values of the first mask; multiply each pixel value corresponding to the non-motion area in the initial mask by the second numerical value to obtain a second mask; multiply each pixel value corresponding to the non-motion area in the historical target mask of the historical video frame by the first numerical value, and accumulate the results onto the corresponding pixel values of the second mask; wherein the first numerical value is greater than the second numerical value.
7. An electronic device, the device comprising:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image background replacement method of any of claims 1-5.
8. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the image background replacement method according to any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010071578.1A CN111292337B (en) | 2020-01-21 | 2020-01-21 | Image background replacement method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010071578.1A CN111292337B (en) | 2020-01-21 | 2020-01-21 | Image background replacement method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111292337A CN111292337A (en) | 2020-06-16 |
CN111292337B true CN111292337B (en) | 2024-03-01 |
Family
ID=71026731
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010071578.1A Active CN111292337B (en) | 2020-01-21 | 2020-01-21 | Image background replacement method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111292337B (en) |
Families Citing this family (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111768416B (en) * | 2020-06-19 | 2024-04-19 | Oppo广东移动通信有限公司 | Photo cropping method and device |
CN111754528B (en) * | 2020-06-24 | 2024-07-12 | Oppo广东移动通信有限公司 | Portrait segmentation method, device, electronic equipment and computer readable storage medium |
CN113473239B (en) * | 2020-07-15 | 2023-10-13 | 青岛海信电子产业控股股份有限公司 | Intelligent terminal, server and image processing method |
CN112132769B (en) * | 2020-08-04 | 2024-09-24 | 绍兴埃瓦科技有限公司 | Image fusion method and device and computer equipment |
CN112037227B (en) * | 2020-09-09 | 2024-02-20 | 脸萌有限公司 | Video shooting method, device, equipment and storage medium |
CN118264891A (en) * | 2020-09-15 | 2024-06-28 | 上海传英信息技术有限公司 | Image processing method, terminal and computer storage medium |
CN112351291A (en) * | 2020-09-30 | 2021-02-09 | 深圳点猫科技有限公司 | Teaching interaction method, device and equipment based on AI portrait segmentation |
CN112258436B (en) * | 2020-10-21 | 2024-09-13 | 华为技术有限公司 | Training method and device for image processing model, image processing method and model |
CN112330579B (en) * | 2020-10-30 | 2024-06-14 | 中国平安人寿保险股份有限公司 | Video background replacement method, device, computer equipment and computer readable medium |
CN112465843A (en) * | 2020-12-22 | 2021-03-09 | 深圳市慧鲤科技有限公司 | Image segmentation method and device, electronic equipment and storage medium |
CN112712525A (en) * | 2020-12-23 | 2021-04-27 | 北京华宇信息技术有限公司 | Multi-party image interaction system and method |
CN112686907A (en) * | 2020-12-25 | 2021-04-20 | 联想(北京)有限公司 | Image processing method, device and apparatus |
CN112837323A (en) * | 2021-01-12 | 2021-05-25 | 全时云商务服务股份有限公司 | Video processing method, system and storage medium based on portrait segmentation |
CN112911318B (en) * | 2021-01-15 | 2023-03-31 | 广州虎牙科技有限公司 | Live broadcast room background replacement method and device, electronic equipment and storage medium |
CN113763439A (en) * | 2021-02-07 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Image processing method and device |
CN113191184A (en) * | 2021-03-02 | 2021-07-30 | 深兰科技(上海)有限公司 | Real-time video processing method and device, electronic equipment and storage medium |
CN115082366B (en) * | 2021-03-12 | 2024-07-19 | 中国移动通信集团广东有限公司 | Image synthesis method and system |
CN113079383B (en) * | 2021-03-25 | 2023-06-20 | 北京市商汤科技开发有限公司 | Video processing method, device, electronic equipment and storage medium |
CN113112508A (en) * | 2021-03-30 | 2021-07-13 | 北京大米科技有限公司 | Video processing method and device |
CN114943909B (en) * | 2021-03-31 | 2023-04-18 | 华为技术有限公司 | Method, device, equipment and system for identifying motion area |
CN113132638B (en) * | 2021-04-22 | 2023-06-09 | Oppo广东移动通信有限公司 | Video processing method, video processing system, mobile terminal and readable storage medium |
CN113034648A (en) * | 2021-04-30 | 2021-06-25 | 北京字节跳动网络技术有限公司 | Image processing method, device, equipment and storage medium |
CN113301384B (en) * | 2021-05-21 | 2023-03-24 | 苏州翼鸥时代科技有限公司 | Background replacing method and device, electronic equipment and readable storage medium |
CN113436097B (en) * | 2021-06-24 | 2022-08-02 | 湖南快乐阳光互动娱乐传媒有限公司 | Video matting method, device, storage medium and equipment |
CN113538270A (en) * | 2021-07-09 | 2021-10-22 | 厦门亿联网络技术股份有限公司 | Portrait background blurring method and device |
CN113505737B (en) * | 2021-07-26 | 2024-07-02 | 浙江大华技术股份有限公司 | Method and device for determining foreground image, storage medium and electronic device |
CN113660495A (en) * | 2021-08-11 | 2021-11-16 | 易谷网络科技股份有限公司 | Real-time video stream compression method and device, electronic equipment and storage medium |
WO2023039865A1 (en) * | 2021-09-17 | 2023-03-23 | 深圳市大疆创新科技有限公司 | Image processing method, video processing method, training method, device, program product, and storage medium |
CN114155268A (en) * | 2021-11-24 | 2022-03-08 | 北京市商汤科技开发有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
CN114040129B (en) * | 2021-11-30 | 2023-12-05 | 北京字节跳动网络技术有限公司 | Video generation method, device, equipment and storage medium |
CN117336422A (en) * | 2022-06-21 | 2024-01-02 | 北京字跳网络技术有限公司 | Video processing method, device, equipment and medium |
CN115908120B (en) * | 2023-01-06 | 2023-07-07 | 荣耀终端有限公司 | Image processing method and electronic device |
CN116229337B (en) * | 2023-05-10 | 2023-09-26 | 瀚博半导体(上海)有限公司 | Method, apparatus, system, device and medium for video processing |
CN118524258B (en) * | 2024-07-25 | 2024-10-18 | 浙江嗨皮网络科技有限公司 | Offline video background processing method, system and readable storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106204426A (en) * | 2016-06-30 | 2016-12-07 | 广州华多网络科技有限公司 | A kind of method of video image processing and device |
CN108124109A (en) * | 2017-11-22 | 2018-06-05 | 上海掌门科技有限公司 | A kind of method for processing video frequency, equipment and computer readable storage medium |
CN108520223A (en) * | 2018-04-02 | 2018-09-11 | 广州华多网络科技有限公司 | Dividing method, segmenting device, storage medium and the terminal device of video image |
CN109151489A (en) * | 2018-08-14 | 2019-01-04 | 广州虎牙信息科技有限公司 | live video image processing method, device, storage medium and computer equipment |
CN109684920A (en) * | 2018-11-19 | 2019-04-26 | 腾讯科技(深圳)有限公司 | Localization method, image processing method, device and the storage medium of object key point |
CN109697689A (en) * | 2017-10-23 | 2019-04-30 | 北京京东尚科信息技术有限公司 | Storage medium, electronic equipment, image synthesizing method and device |
CN109816011A (en) * | 2019-01-21 | 2019-05-28 | 厦门美图之家科技有限公司 | Generate the method and video key frame extracting method of portrait parted pattern |
2020-01-21: CN application CN202010071578.1A granted as patent CN111292337B (status: active)
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106204426A (en) * | 2016-06-30 | 2016-12-07 | 广州华多网络科技有限公司 | A kind of method of video image processing and device |
CN109697689A (en) * | 2017-10-23 | 2019-04-30 | 北京京东尚科信息技术有限公司 | Storage medium, electronic equipment, image synthesizing method and device |
CN108124109A (en) * | 2017-11-22 | 2018-06-05 | 上海掌门科技有限公司 | A kind of method for processing video frequency, equipment and computer readable storage medium |
CN108520223A (en) * | 2018-04-02 | 2018-09-11 | 广州华多网络科技有限公司 | Dividing method, segmenting device, storage medium and the terminal device of video image |
CN109151489A (en) * | 2018-08-14 | 2019-01-04 | 广州虎牙信息科技有限公司 | live video image processing method, device, storage medium and computer equipment |
CN109684920A (en) * | 2018-11-19 | 2019-04-26 | 腾讯科技(深圳)有限公司 | Localization method, image processing method, device and the storage medium of object key point |
CN109816011A (en) * | 2019-01-21 | 2019-05-28 | 厦门美图之家科技有限公司 | Generate the method and video key frame extracting method of portrait parted pattern |
Also Published As
Publication number | Publication date |
---|---|
CN111292337A (en) | 2020-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111292337B (en) | Image background replacement method, device, equipment and storage medium | |
JP2022528294A (en) | Video background subtraction method using depth | |
US10964000B2 (en) | Techniques for reducing noise in video | |
US9947077B2 (en) | Video object tracking in traffic monitoring | |
CN110189336B (en) | Image generation method, system, server and storage medium | |
US20110211749A1 (en) | System And Method For Processing Video Using Depth Sensor Information | |
CN111914698B (en) | Human body segmentation method, segmentation system, electronic equipment and storage medium in image | |
US11748986B2 (en) | Method and apparatus for recognizing key identifier in video, device and storage medium | |
WO2015192115A1 (en) | Systems and methods for automated hierarchical image representation and haze removal | |
CN107273895B (en) | Method for recognizing and translating real-time text of video stream of head-mounted intelligent device | |
CN106991686B (en) | A kind of level set contour tracing method based on super-pixel optical flow field | |
CN112270745B (en) | Image generation method, device, equipment and storage medium | |
CN111382647B (en) | Picture processing method, device, equipment and storage medium | |
US20220103782A1 (en) | Method for video frame interpolation, and electronic device | |
JP2018507477A (en) | Method and apparatus for generating initial superpixel label map for image | |
JP2014044461A (en) | Image processing device and method, and program | |
KR20220153667A (en) | Feature extraction methods, devices, electronic devices, storage media and computer programs | |
CN112752158A (en) | Video display method and device, electronic equipment and storage medium | |
CN114449181A (en) | Image and video processing method, system thereof, data processing apparatus, and medium | |
CN113436251B (en) | Pose estimation system and method based on improved YOLO6D algorithm | |
Nunes et al. | Adaptive global decay process for event cameras | |
Kim | Image enhancement using patch-based principal energy analysis | |
CN117934688A (en) | Nerve representation modeling method based on Gaussian splatter sample | |
CN112598687A (en) | Image segmentation method and device, storage medium and electronic equipment | |
Zhao et al. | Real-time saliency-aware video abstraction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |