CN116029895B - AI virtual background implementation method, system and computer readable storage medium - Google Patents
AI virtual background implementation method, system and computer readable storage medium
- Publication number
- CN116029895B CN116029895B CN202310153669.3A CN202310153669A CN116029895B CN 116029895 B CN116029895 B CN 116029895B CN 202310153669 A CN202310153669 A CN 202310153669A CN 116029895 B CN116029895 B CN 116029895B
- Authority
- CN
- China
- Prior art keywords
- region
- background
- image
- face
- portrait
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses an AI virtual background implementation method, system, and medium. The method identifies the face region in an original picture through a deep convolutional neural network and locates the portrait according to the face region. The original picture is converted into a binary image, from which the coordinate range of the portrait position and the coordinate information of the face region are calculated. Two successive downward-extending region-of-interest operations are then applied to the calculated face-region coordinates to obtain a human body region and a bottom region respectively. Background segmentation is performed on the face region with a pixel-value clustering algorithm, and on the human body region and bottom region with an edge-finding algorithm. The face region, human body region, and bottom region are then combined by binary-image addition to generate an overall foreground portrait binary image separated from the background. Finally, the overall foreground portrait is fused with a new background image to generate the final video frame data that replaces the original picture, completing video background processing quickly.
Description
Technical Field
The present invention relates to the field of virtual background technology, and in particular to an AI virtual background implementation method and system, and a computer readable storage medium whose stored program, when executed by a processor, implements the AI virtual background implementation method.
Background
The virtual background, i.e., image background replacement, technology segments the foreground image and the background image in a video or picture and composites the segmented foreground image onto another background image. The technology is widely applied in video and picture editing, enhancing the display effect of videos and pictures through different background replacements.
Current common methods for segmenting foreground and background images include fully automatic image segmentation and user-interactive image segmentation. Fully automatic segmentation uses a clustering algorithm to maximize the difference between foreground and background, dividing n data objects into k clusters so that the sum of squared distances between the data in each cluster and the cluster center is minimized. User-interactive segmentation asks the user to provide foreground and background seeds, builds probability distribution models for the foreground and background images respectively, segments the image with these models, and finally composites the segmented foreground image.
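For concreteness, a minimal sketch of the clustering idea in Python/OpenCV might look like the following. It illustrates generic K-means pixel clustering, not this patent's exact algorithm; the choice of k = 2 and of which cluster counts as foreground are assumptions a real system would resolve with position or color priors.

```python
import cv2
import numpy as np

def kmeans_segment(img_bgr: np.ndarray, k: int = 2) -> np.ndarray:
    """Cluster pixel colors into k groups and return a per-pixel label map."""
    pixels = img_bgr.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    # 5 attempts with random initial centers; cv2.kmeans returns the
    # compactness, the per-pixel labels, and the k cluster centers.
    _, labels, _ = cv2.kmeans(pixels, k, None, criteria, 5,
                              cv2.KMEANS_RANDOM_CENTERS)
    return labels.reshape(img_bgr.shape[:2])
```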
These methods have the following technical problems. Fully automatic segmentation requires extracting accurate image data, and extraction is strongly affected by image quality: if the image is blurred because of dim lighting, the extracted data deviates under blur-noise interference and the segmented image carries heavy edge noise. User-interactive segmentation depends on the seeds provided by the user: if the seeds do not cover all distributions, segmentation accuracy suffers, the boundary between foreground and background becomes blurred, and the local edge features of the segmented foreground image are difficult to preserve completely.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an AI virtual background implementation method that yields a clearer foreground image while improving foreground integrity, together with a computer readable storage medium storing a computer program that implements the method when executed.
The AI virtual background implementation method comprises the following steps:
a portrait positioning step: recognizing the face region in an original picture through a trained deep convolutional neural network, positioning the portrait position according to the face region, and converting the original picture into a binary image;
a human body positioning step: calculating the coordinate range of the positioned portrait position and the coordinate information of the face region from the binary image, and applying a downward-extending region-of-interest operation of a preset extent to the calculated face-region coordinates to obtain a human body region;
a bottom positioning step: applying a downward-extending region-of-interest operation of a preset extent from the human body region to obtain a bottom region;
a background segmentation step: performing background segmentation on the face region through a pixel-value clustering algorithm, and performing background segmentation on the human body region and the bottom region respectively through an edge-finding algorithm, wherein the edge recognition strength of the edge-finding algorithm is greater for the human body region than for the bottom region;
a portrait merging step: adding and combining the binary images of the background-segmented face region, human body region and bottom region to generate a combined portrait binary image with the whole background segmented out;
and a background replacement step: fusing the original picture processed by the portrait merging step with a new background picture, taking the portrait binary-image area of the original picture as the portrait foreground on the new background picture.
Preferably, the method further comprises an optical flow adjustment step executed in the background segmentation step: the outer contour region of the portrait binary image of the current original picture is identified and the optical flow value generated by each pixel of the outer contour region is calculated; if the optical flow variation of a continuous contour segment exceeds a preset degree, the probability of judging that outer contour region to be a foreground region is increased.
Preferably, the outer contour region of the portrait binary image refers to the closed band formed by extending the outer contour inward and outward by a preset number of pixels respectively.
Preferably, if across continuous multi-frame images the optical flow variation of the foreground flow and the background flow at the joint between the human body and the bottom of the outer contour region exceeds a preset degree, the segmentation precision of the background segmentation step is increased for subsequent frames.
Preferably, a face adjustment step is further executed after the portrait merging step: multiple frames are extracted from the continuous frame interval before the current frame, and the foreground optical flow and background optical flow of the extracted frames at the joint between the face and the human body in the outer contour region are calculated respectively; if the dispersion of the foreground flow and/or the background flow across the frames exceeds a preset degree, the current portrait binary image predicted from the previous frame's optical flow image is fused with the current frame as the new portrait foreground.
Preferably, the method further comprises a smoothing step executed before the portrait merging step: the images of the face region, the human body region and the bottom region are each smoothed with a preset Gaussian filter, the amplitude and direction of the contour gradient at the joints of the three regions are computed by first-order partial-derivative finite differences, and non-maximum suppression is applied to the gradient amplitude, thereby smoothing the contour joints.
Preferably, the method further comprises a denoising step executed in the portrait merging step: within the closed outer contour region of the portrait binary image, closed contour lines and the regions they enclose are identified, and foreground filling is performed on regions whose closed contour length or enclosed contour area falls within a preset range.
Preferably, the method further comprises an antialiasing step executed in the portrait merging step: a cutter operator operation is performed on the combined portrait binary image to obtain the parting-line contour region.
There is also provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the AI virtual background implementation method described above.
There is also provided an AI virtual background implementation system comprising a processor, a video input module and a video output module each in communication with the processor, and the above computer readable storage medium pre-stored in the system, wherein the computer program on the computer readable storage medium is executable by the processor.
Beneficial effects: the AI virtual background implementation method first recognizes the face region in the original picture through a deep convolutional neural network to initially locate the portrait, then quickly separates the edges between background and portrait and extracts the image edges through portrait segmentation and edge detection respectively. Background segmentation of the face region uses a pixel-value clustering algorithm, achieving fine-grained contour extraction of the complex foreground region of the face. When edge contours of the human body region and the bottom region are extracted with the edge-finding algorithm, the edge recognition strength is greater for the body region than for the bottom region: in video communication applications such as real-time video conferencing or live streaming, the body region usually contains hand and other movements, while the bottom region is mostly occluded or semi-occluded and moves little, so distinguishing the contour-extraction granularity of the two preserves the clarity of the portrait edge contour while reducing the processing workload. The method locates the portrait position and the portrait background area through portrait positioning, and blurs or replaces the portrait background, thereby quickly completing video background processing.
Drawings
Fig. 1 is a flow chart of steps of an AI virtual background implementation method.
Fig. 2 is an original picture of an AI virtual background implementation method.
Fig. 3 is a schematic view of the face region, body region and bottom region ranges of the original picture in fig. 2.
Fig. 4 is the portrait binary image of the original picture in fig. 2.
Fig. 5 is a schematic diagram of face region, body region and bottom region segmentation of an AI virtual background implementation method.
Fig. 6 is a schematic diagram of an image processing flow of the AI virtual background implementation method.
Fig. 7 is a schematic diagram of optical flow processing of the AI virtual background implementation method.
Fig. 8 is a schematic diagram showing the effect of the denoising step in the AI virtual background implementation method.
Fig. 9 is an effect diagram of the antialiasing step of the AI virtual background implementation method.
Detailed Description
The following describes the embodiments of the present invention clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
The AI virtual background implementation system of this embodiment includes a processor, and a video input module and a video output module each communicating with the processor. The video input module, for example an intelligent device equipped with a camera, captures the user in real time and acquires video frames from the real-time communication as original pictures. The processor quickly separates the edges between background and portrait in the video frame and blurs or replaces the portrait background, quickly completing video background processing. The video output module then outputs the portrait foreground from the video input module together with the replaced AI virtual background as a video stream to the user and the other party of the video call.
The AI virtual background implementation system of this embodiment replaces the background by the AI virtual background implementation method shown in fig. 1; the specific implementation steps and processing flow (see fig. 6) are described in detail below.
Portrait positioning step: the face region Head in the original picture of fig. 2 is recognized through the trained deep convolutional neural network, the portrait position is located according to the face region, and the original picture is converted into a binary image.
The original picture in fig. 2 is input into a preset CNN for face recognition, and the Face area is located according to features such as the eyes and nose. The face recognition and positioning of the CNN is implemented with existing mature systems and is not described in detail here. This embodiment integrates the idea of semantic segmentation (the CNN locates the portrait first) into the conventional image-processing background segmentation problem, so whether the original picture contains a portrait can be determined quickly and accurately. Once the Face area of fig. 3 confirms that the original picture contains a portrait, a binary image of the same size as the original picture is generated (see fig. 4), pixel expansion is performed on the Face area to obtain the face region Head (see the leftmost image Head of fig. 5), and the face coordinate information is calculated through the CNN to accurately locate the face region. In the subsequent background segmentation step, background segmentation of the face region is performed with the pixel-value clustering algorithm K-means, splitting the original picture into a foreground image and a background image.
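The patent does not name a specific network, so as one hedged illustration, face detection could be done with OpenCV's DNN module and the widely distributed res10 SSD face detector; the model file names below are assumptions about what is available on disk, not something the patent specifies.

```python
import cv2
import numpy as np

# Assumed model files: OpenCV's res10 SSD face detector
# (deploy.prototxt + res10_300x300_ssd_iter_140000.caffemodel).
net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "res10_300x300_ssd_iter_140000.caffemodel")

def detect_face(img_bgr: np.ndarray, conf_thresh: float = 0.5):
    """Return the (x1, y1, x2, y2) box of the most confident face, or None."""
    h, w = img_bgr.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(img_bgr, (300, 300)), 1.0,
                                 (300, 300), (104.0, 177.0, 123.0))
    net.setInput(blob)
    det = net.forward()  # shape (1, 1, N, 7): [_, _, conf, x1, y1, x2, y2]
    best = det[0, 0, det[0, 0, :, 2].argmax()]
    if best[2] < conf_thresh:
        return None
    box = (best[3:7] * np.array([w, h, w, h])).astype(int)
    return tuple(box)
```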
Human body positioning step: the coordinate range of the located portrait position and the coordinate information of the face region are calculated from the binary image, and a downward-extending region-of-interest (ROI) operation of a preset extent (e.g., 35 cm) is applied to the calculated face-region coordinates to obtain the human body region. The human Body region of this embodiment represents the upper part of the upper torso below the head (see the middle image Body of fig. 5). Considering the background segmentation to follow, the body region is suitably enlarged during ROI processing so that it covers part of the face region above it, taking the portion from the chin to the neck into the body region as well.
Bottom positioning step: a downward-extending region-of-interest (ROI) operation of a preset extent (e.g., 15 cm) is applied from the human body region to obtain the bottom region (see the rightmost image Bottom of fig. 5). The bottom region of this embodiment represents the lower part of the upper torso below the head. Considering the background segmentation to follow, the region is suitably enlarged during ROI processing so that it covers part of the body region above it, taking, for example, the chest portion below the shoulders into the bottom region as well.
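A rough sketch of the two downward-extending ROI operations might look as follows. The extension and overlap fractions stand in for the patent's preset extents (the "35 cm" and "15 cm" examples), whose pixel equivalents are not specified; they are illustrative assumptions.

```python
def extend_roi_down(face_box, img_shape, body_frac=1.2, overlap_frac=0.15):
    """Derive Body and Bottom ROIs by extending the face box downward,
    each overlapping the region above it as the embodiment describes."""
    x1, y1, x2, y2 = face_box
    fh = y2 - y1                      # face height, used as the scale unit
    h, w = img_shape[:2]
    # Body ROI: starts slightly above the chin so it overlaps the face region.
    body_y1 = int(y2 - overlap_frac * fh)
    body_y2 = min(h, int(y2 + body_frac * fh))
    body = (max(0, x1 - fh // 2), body_y1, min(w, x2 + fh // 2), body_y2)
    # Bottom ROI: extends down from the body region, again overlapping upward.
    bot_y1 = int(body_y2 - overlap_frac * fh)
    bot_y2 = min(h, int(body_y2 + 0.6 * fh))
    bottom = (body[0], bot_y1, body[2], bot_y2)
    return body, bottom
```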
Background segmentation step: background segmentation of the face region is performed with the pixel-value clustering algorithm K-means. Background segmentation of the human body region and the bottom region is performed with an edge-finding algorithm such as the Canny edge detector. The Canny algorithm extracts multi-level strong and weak edge information well and runs very fast. It extracts strong and weak edges through a double threshold: the high threshold distinguishes the object whose contour is to be extracted from the background, determining the contrast between target and background much like a threshold-segmentation parameter; the low threshold is used to smooth the edge contour, since a high threshold set too large may leave the edge contour discontinuous or insufficiently smooth, and the low threshold smooths the contour or connects the discontinuous parts. The AI virtual background implementation method of this embodiment extracts strong and weak edges with the double threshold and strengthens the processing of regions with unclear edges to improve segmentation adaptability and thereby refine the image.
Moreover, the edge recognition strength of the edge-finding algorithm is greater for the human body region than for the bottom region. The double-threshold range is 170-200 when extracting the body edge contour; this range extracts the portrait edge-contour information well. Because the background at the bottom of the human body is complex and the contour information there is comparatively weak, the double threshold is set to 50-80 when extracting the bottom edge contour; this range extracts weak edge-contour information well, achieving better contour extraction especially when the portrait is partially occluded or differs little from the background.
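A minimal sketch of this dual-threshold edge extraction with OpenCV's Canny detector, using the two threshold ranges given above:

```python
import cv2

def segment_edges(gray_body, gray_bottom):
    """Edge maps with the embodiment's two double-threshold ranges:
    strong thresholds for the body region, weak ones for the bottom."""
    body_edges = cv2.Canny(gray_body, 170, 200)    # keep only strong edges
    bottom_edges = cv2.Canny(gray_bottom, 50, 80)  # keep weak contours too
    return body_edges, bottom_edges
```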
The AI virtual background implementation method of this embodiment uses a head-and-torso multi-scale processing method to accelerate the algorithm, achieving real-time background segmentation on high-resolution 1080p images. The face region is segmented at the pixel level while the other regions use coarse-grained contour segmentation, so the overall processing time is about one third of that of conventional methods.
Preferably, because the original picture input by the user may be a photo taken by the user or a frame captured from video, shaking during capture may cause the portrait in the original picture to move and therefore blur, preventing accurate recognition later.
By identifying whether the face region in the image is moving, it can be determined whether the portrait in the original picture is moving; if not, the portrait can be considered sufficiently clear and the segmentation can proceed. Specifically, a frame-difference method can be applied to the face region to analyze whether the head is moving, avoiding fine ripple jitter at the face edge of a continuous video stream.
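A hedged sketch of such a frame-difference check on the face region; the difference threshold and the "moving" pixel ratio are illustrative assumptions, not values from the patent.

```python
import cv2

def head_is_moving(prev_gray_face, cur_gray_face,
                   diff_thresh=25, moving_ratio=0.02):
    """Frame-difference motion test on two grayscale face-ROI crops."""
    diff = cv2.absdiff(prev_gray_face, cur_gray_face)
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    changed = cv2.countNonZero(mask) / mask.size  # fraction of changed pixels
    return changed > moving_ratio
```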
Smoothing step, executed before the portrait merging step: the images of the face region, the human body region and the bottom region are each smoothed with a preset Gaussian filter, the amplitude and direction of the contour gradient at the pairwise joints of the three regions are computed by first-order partial-derivative finite differences, and non-maximum suppression is applied to the gradient amplitude, thereby smoothing the contour joints.
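A minimal sketch of this smoothing-plus-suppression chain, assuming Sobel operators as the first-order finite-difference approximation and illustrative filter parameters:

```python
import cv2
import numpy as np

def smooth_and_nms(region_gray, ksize=5, sigma=1.4):
    """Gaussian smoothing, first-order finite-difference gradients, and a
    simple 4-direction non-maximum suppression of the gradient magnitude."""
    blur = cv2.GaussianBlur(region_gray, (ksize, ksize), sigma)
    gx = cv2.Sobel(blur, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(blur, cv2.CV_32F, 0, 1, ksize=3)
    mag, ang = cv2.cartToPolar(gx, gy, angleInDegrees=True)
    # Quantize the gradient direction to 0/45/90/135 degrees and keep a
    # pixel only if it is a local maximum along its gradient direction.
    out = np.zeros_like(mag)
    q = ((ang % 180) + 22.5) // 45 % 4
    offs = {0: (0, 1), 1: (1, 1), 2: (1, 0), 3: (1, -1)}  # (di, dj) per bin
    h, w = mag.shape
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            di, dj = offs[int(q[i, j])]
            if (mag[i, j] >= mag[i + di, j + dj]
                    and mag[i, j] >= mag[i - di, j - dj]):
                out[i, j] = mag[i, j]
    return out
```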
When processing communication video, this embodiment also uses optical flow to calculate the flow value generated by each pixel and, combined with the magnitude range of the flow values, performs probability screening of foreground and background at the portrait edge. The segmented portrait can thus be adjusted so that the portrait in the video stays stable, improving the display effect of video communication and meeting the network video needs of different users.
Specifically, in the optical flow adjustment step performed in the background segmentation step, as shown in fig. 7, the outer contour region of the portrait binary image of the current original picture refers to the closed band formed by extending the outer contour L inward to Li and outward to Le by a preset number of pixels (e.g., 15). The optical flow value generated by each pixel is calculated; if the flow variation of a continuous contour segment exceeds a preset degree while the ambient lighting changes little, the position of that contour segment has clearly changed, so the probability of judging the region to be foreground is increased.
If, across continuous multi-frame images, the flow variation of the foreground flow and the background flow at the joint S2 between the human body and the bottom of the outer contour region exceeds a preset degree: the flow values of the bottom region usually change little during a conference, so when the variation exceeds the preset degree the human body has shifted considerably, and the segmentation precision of the background segmentation step must be increased for subsequent frames to adapt to the motion of the body.
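As one possible reading, the per-pixel flow in the Li..Le band could be computed with dense Farneback optical flow. Farneback is an assumption; the patent does not name a specific flow method, and the band is built here by dilating and eroding the portrait mask.

```python
import cv2
import numpy as np

def contour_band_flow(prev_gray, cur_gray, portrait_mask, band_px=15):
    """Flow magnitudes sampled in the closed band around the contour L
    of fig. 7, ready for thresholding against a preset degree."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    kernel = np.ones((2 * band_px + 1, 2 * band_px + 1), np.uint8)
    outer = cv2.dilate(portrait_mask, kernel)   # contour pushed out to Le
    inner = cv2.erode(portrait_mask, kernel)    # contour pulled in to Li
    band = cv2.subtract(outer, inner)           # the closed Li..Le band
    return mag[band > 0]
```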
Portrait merging step: the binary images of the background-segmented face region, body region and bottom region are added and combined to generate a combined portrait binary image with the whole background segmented out.
A denoising step is also executed in the portrait merging step, as shown in fig. 8: within the closed outer contour region of the portrait binary image, foreground filling is performed on contour regions whose contour area or contour length falls within a preset range. Specifically, the contour area of the portrait contour in the combined binary image is found, the noise blobs inside the portrait contour are screened out, and each noise blob is filled with the color matching its surrounding region (for example, if the region inside the portrait contour is white, the noise blob is filled white). Local small-blob noise is thus removed neatly without affecting the edge feature information of the binary segmentation, and locating and removing blobs by searching for irregular polygonal contours in the region better preserves picture integrity.
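A hedged sketch of this hole-filling denoising using OpenCV contour hierarchies; the area bound standing in for the "preset range" is an illustrative assumption.

```python
import cv2

def fill_small_holes(portrait_mask, max_area=200.0):
    """Fill small noise blobs inside the portrait binary image with the
    foreground color (255), leaving the outer edge features untouched."""
    # RETR_CCOMP yields a two-level hierarchy: outer contours and the
    # hole contours nested inside them.
    contours, hierarchy = cv2.findContours(portrait_mask, cv2.RETR_CCOMP,
                                           cv2.CHAIN_APPROX_SIMPLE)
    if hierarchy is None:
        return portrait_mask
    for idx, cnt in enumerate(contours):
        is_hole = hierarchy[0][idx][3] != -1  # has a parent contour
        if is_hole and cv2.contourArea(cnt) <= max_area:
            cv2.drawContours(portrait_mask, [cnt], -1, 255, cv2.FILLED)
    return portrait_mask
```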
An antialiasing step is also executed in the portrait merging step, see fig. 9: a cutter operator operation is performed on the combined portrait binary image to obtain the parting-line contour region. Specifically, when merging the parting lines of the contour foreground and background in the binary image, this embodiment first blurs the combined binary image so that the contour region of the portrait contour forms a softer parting-line contour region, then re-binarizes the parting-line region so that the parting line of the contour region becomes softer and smoother. Because the image is actually segmented at reduced scale and then enlarged back to the original size after segmentation, edge jaggies are inevitable; this embodiment adds smoothing of the image edge for the jagged-edge problem of the binary image, so that the fusion of foreground and background looks more natural.
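A minimal sketch of the blur-then-rebinarize antialiasing described above; the kernel size and threshold are illustrative assumptions.

```python
import cv2

def soften_parting_line(portrait_mask, ksize=7, thresh=127):
    """Blur the merged binary mask, then re-binarize, yielding a softer
    and smoother parting line between foreground and background."""
    soft = cv2.GaussianBlur(portrait_mask, (ksize, ksize), 0)
    _, smooth_mask = cv2.threshold(soft, thresh, 255, cv2.THRESH_BINARY)
    return smooth_mask
```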
Face adjustment step, executed after the portrait merging step: multiple frames are extracted from the continuous frame interval before the current frame, and the foreground optical flow and background optical flow of the extracted frames at the joint between the face and the human body in the outer contour region are calculated respectively; if the dispersion of the foreground flow and/or the background flow across the frames exceeds a preset degree, the current portrait binary image predicted from the previous frame's optical flow image is fused with the current frame as the new portrait foreground.
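One plausible reading of "predicted by the previous frame's optical flow" is to warp the previous mask along the flow field, sketched below. This is an interpretation, not the patent's stated formula.

```python
import cv2
import numpy as np

def predict_mask_from_flow(prev_mask, flow):
    """Warp the previous frame's portrait mask forward along a dense
    flow field (shape HxWx2) to predict the current frame's mask."""
    h, w = prev_mask.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Backward mapping: sample the previous mask at (x, y) - flow(x, y).
    map_x = (grid_x - flow[..., 0]).astype(np.float32)
    map_y = (grid_y - flow[..., 1]).astype(np.float32)
    return cv2.remap(prev_mask, map_x, map_y, cv2.INTER_NEAREST)
```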
Background replacement step: the original picture processed by the portrait merging step is fused with a new background picture, the portrait binary-image area of the original picture being used as the portrait foreground on the new background picture; finally the upper-left image in fig. 2 has its background replaced to produce the lower-left image. Specifically, the contour areas of the portrait and the background are determined separately, the values of the foreground image are weighted according to the portrait contour area, the values of the replacement image are weighted according to the background contour area, and the two weighted values are fused to obtain the fused image.
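A hedged sketch of the final fusion as feathered alpha blending: the patent describes weighting the foreground and replacement images by their contour areas, and this common blending scheme is an assumed concrete form of that description.

```python
import cv2
import numpy as np

def replace_background(frame_bgr, new_bg_bgr, portrait_mask):
    """Blend the portrait foreground onto the new background using a
    feathered (blurred) version of the binary mask as alpha weights."""
    bg = cv2.resize(new_bg_bgr, (frame_bgr.shape[1], frame_bgr.shape[0]))
    alpha = cv2.GaussianBlur(portrait_mask, (7, 7), 0).astype(np.float32) / 255.0
    alpha = alpha[..., None]  # broadcast over the 3 color channels
    fused = (alpha * frame_bgr.astype(np.float32)
             + (1.0 - alpha) * bg.astype(np.float32))
    return fused.astype(np.uint8)
```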
In summary, the AI virtual background implementation method first recognizes the face region in the original picture through a deep convolutional neural network to initially locate the portrait, then quickly separates the edges between background and portrait and extracts the image edges through portrait segmentation and edge detection respectively. Background segmentation of the face region uses a pixel-value clustering algorithm for fine-grained contour extraction of the complex foreground region of the face, and when edge contours of the human body region and the bottom region are extracted with the edge-finding algorithm, the edge recognition strength is greater for the body region than for the bottom region. Since the body region usually contains hand and other movements in applications such as real-time video conferencing or live streaming, while the bottom region is mostly occluded or semi-occluded and moves little, distinguishing the contour-extraction granularity of the two preserves the clarity of the portrait edge contour while reducing the processing workload. The method locates the portrait position and the portrait background area through portrait positioning, and blurs or replaces the portrait background, thereby quickly completing video background processing.
The above embodiments are merely embodiments of the present invention and are not intended to limit the scope of patent protection. Insubstantial changes and substitutions made by those skilled in the art in light of the invention still fall within the scope of the claims.
Claims (6)
1. An AI virtual background implementation method, characterized by comprising the following steps:
a portrait positioning step: recognizing the face region in an original picture through a trained deep convolutional neural network, positioning the portrait position according to the face region, and converting the original picture into a binary image;
a human body positioning step: calculating the coordinate range of the positioned portrait position and the coordinate information of the face region from the binary image, and applying a downward-extending region-of-interest operation of a preset extent to the calculated face-region coordinates to obtain a human body region;
a bottom positioning step: applying a downward-extending region-of-interest operation of a preset extent from the human body region to obtain a bottom region;
a background segmentation step: performing background segmentation on the face region through a pixel-value clustering algorithm, and performing background segmentation on the human body region and the bottom region respectively through an edge-finding algorithm, wherein the edge recognition strength of the edge-finding algorithm is greater for the human body region than for the bottom region; pixel-level segmentation is adopted in the face region while the other regions use coarse-grained contour segmentation; a frame-difference method is applied to the face region to analyze whether the head is moving, thereby avoiding fine ripple jitter at the face edge of a continuous video stream;
a portrait merging step: adding and combining the binary images of the background-segmented face region, human body region and bottom region to generate a combined portrait binary image with the whole background segmented out;
a background replacement step: fusing the original picture processed by the portrait merging step with a new background picture, taking the portrait binary-image area of the original picture as the portrait foreground on the new background picture;
and a face adjustment step executed after the portrait merging step: multiple frames are extracted from the continuous frame interval before the current frame, and the foreground optical flow and background optical flow of the extracted frames at the joint between the face and the human body in the outer contour region are calculated respectively; if the dispersion of the foreground flow and/or the background flow across the frames exceeds a preset degree, the current portrait binary image predicted from the previous frame's optical flow image is fused with the current frame as the new portrait foreground for the background replacement step.
2. The AI virtual background implementation method according to claim 1, further comprising a smoothing step executed before the portrait merging step, wherein the images of the face region, the human body region and the bottom region are each smoothed with a preset Gaussian filter, the amplitude and direction of the contour gradient at the pairwise joints of the three regions are computed by first-order partial-derivative finite differences, and non-maximum suppression is applied to the gradient amplitude, thereby smoothing the contour joints.
3. The AI virtual background implementation method according to claim 1, further comprising a denoising step executed in the portrait merging step, wherein closed contour lines and the regions they enclose are identified within the closed outer contour region of the portrait binary image, and foreground filling is performed on regions whose closed contour length or enclosed contour area falls within a preset range.
4. The AI virtual background implementation method according to claim 1, further comprising an antialiasing step executed in the portrait merging step, wherein a cutter operator operation is performed on the combined portrait binary image to obtain the parting-line contour region.
5. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the AI virtual background implementation method of any one of claims 1-4.
6. An AI virtual background implementing system comprising a processor, and a video input module and a video output module in communication with the processor, respectively, further comprising the computer readable storage medium of claim 5, the computer program on the computer readable storage medium being executable by the processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310153669.3A CN116029895B (en) | 2023-02-23 | 2023-02-23 | AI virtual background implementation method, system and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116029895A CN116029895A (en) | 2023-04-28 |
CN116029895B (en) | 2023-08-04
Family
ID=86091428
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310153669.3A Active CN116029895B (en) | 2023-02-23 | 2023-02-23 | AI virtual background implementation method, system and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116029895B (en) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8306333B2 (en) * | 2009-12-17 | 2012-11-06 | National Tsing Hua University | Method and system for automatic figure segmentation |
2023
- 2023-02-23 CN CN202310153669.3A patent/CN116029895B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107563384A (en) * | 2017-08-31 | 2018-01-09 | 江苏大学 | The recognition methods end to end of adhesion pig based on broad sense Hough clusters |
CN110909756A (en) * | 2018-09-18 | 2020-03-24 | 苏宁 | Convolutional neural network model training method and device for medical image recognition |
CN111127303A (en) * | 2018-11-01 | 2020-05-08 | Tcl集团股份有限公司 | Background blurring method and device, terminal equipment and computer readable storage medium |
CN115690406A (en) * | 2021-07-28 | 2023-02-03 | 奇酷软件(深圳)有限公司 | Background replacing method, device, equipment and storage medium |
CN114359559A (en) * | 2021-12-22 | 2022-04-15 | 华南理工大学 | Weakly supervised semantic segmentation method based on attention mechanism image block metric learning |
Non-Patent Citations (1)
Title |
---|
Landmark line detection method combining adaptive threshold and dynamic ROI; Yin Zhenyu et al.; Journal of Chinese Computer Systems (小型微型计算机系统); pp. 1-8 *
Also Published As
Publication number | Publication date |
---|---|
CN116029895A (en) | 2023-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107578035B (en) | Human body contour extraction method based on super-pixel-multi-color space | |
KR101023733B1 (en) | Intra-mode region-of-interest video object segmentation | |
CN107909081B (en) | Method for quickly acquiring and quickly calibrating image data set in deep learning | |
US7995800B2 (en) | System and method for motion detection and the use thereof in video coding | |
CN106991686B (en) | A kind of level set contour tracing method based on super-pixel optical flow field | |
KR20050022306A (en) | Method and Apparatus for image-based photorealistic 3D face modeling | |
CN111243051B (en) | Portrait photo-based simple drawing generation method, system and storage medium | |
CN111583357A (en) | Object motion image capturing and synthesizing method based on MATLAB system | |
Pan et al. | Single-image dehazing via dark channel prior and adaptive threshold | |
CN111179281A (en) | Human body image extraction method and human body action video extraction method | |
CN114419006A (en) | Method and system for removing watermark of gray level video characters changing along with background | |
CN116612263B (en) | Method and device for sensing consistency dynamic fitting of latent vision synthesis | |
CN113516680A (en) | Moving target tracking and detecting method under moving background | |
CN116029895B (en) | AI virtual background implementation method, system and computer readable storage medium | |
WO2000018128A1 (en) | System and method for semantic video object segmentation | |
Zhang et al. | Dehazing with improved heterogeneous atmosphere light estimation and a nonlinear color attenuation prior model | |
CN109815786B (en) | Gait recognition method based on regional entropy characteristics | |
CN110458012B (en) | Multi-angle face recognition method and device, storage medium and terminal | |
KR101600617B1 (en) | Method for detecting human in image frame | |
CN115526811B (en) | Adaptive vision SLAM method suitable for variable illumination environment | |
van Beek et al. | Semantic segmentation of videophone image sequences | |
CN109145875B (en) | Method and device for removing black frame glasses in face image | |
CN110930358A (en) | Solar panel image processing method based on self-adaptive algorithm | |
Hu et al. | A low-illumination image enhancement algorithm based on morphological-Retinex (MR) operator | |
CN114820718A (en) | Visual dynamic positioning and tracking algorithm |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |