CN113676692A - Video processing method and device in video conference, electronic equipment and storage medium - Google Patents

Video processing method and device in video conference, electronic equipment and storage medium

Info

Publication number
CN113676692A
Authority
CN
China
Prior art keywords
video
image
background
human body
video frame
Prior art date
Legal status
Pending
Application number
CN202110809168.7A
Other languages
Chinese (zh)
Inventor
吕亚亚
李云鹏
谢文龙
王艳辉
Current Assignee
Visionvera Information Technology Co Ltd
Original Assignee
Visionvera Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Visionvera Information Technology Co Ltd
Priority to CN202110809168.7A
Publication of CN113676692A
Legal status: Pending


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00: Television systems
    • H04N 7/14: Systems for two-way working
    • H04N 7/15: Conference systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00: Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/40: Support for services or applications
    • H04L 65/403: Arrangements for multi-party communication, e.g. for conferences
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00: Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/60: Network streaming of media packets
    • H04L 65/75: Media network packet handling
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00: Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/80: Responding to QoS
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00: Details of television systems
    • H04N 5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/265: Mixing
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00: Details of television systems
    • H04N 5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/272: Means for inserting a foreground image in a background image, i.e. inlay, outlay

Abstract

The application provides a method and an apparatus for processing video in a video conference, an electronic device, and a storage medium, belonging to the technical field of video processing, and aims to process video in a video conference online and in real time so as to better meet user requirements. The method comprises the following steps: responding to a video calling request for a participant terminal in a video conference, obtaining the video stream collected by the participant terminal; replacing the background image in each original video frame of the video stream with the background image set at the current moment to obtain background-replaced video frames, the background image being the image area of an original video frame that does not belong to the human body image; and splicing the plurality of background-replaced video frames into a video stream to be sent, and sending it to the target terminal corresponding to the video calling request.

Description

Video processing method and device in video conference, electronic equipment and storage medium
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a method and an apparatus for processing a video in a video conference, an electronic device, and a storage medium.
Background
With the rapid development of network technologies, two-way video communications such as video conferencing and video teaching have become widespread in users' daily life, work and study.
In the prior art, during a video conference users can generally only watch the real-time video pictures captured by the participant terminals, and in an online video conference, meeting certain scene requirements of an on-site conference means that some props have to be prepared on site. For example, when a background board matching the conference theme is required, it can only be built at the shooting site, which obviously increases the conference cost, and the user has to spend a long time arranging the conference venue before the video conference is held. Holding a video conference with scene requirements is therefore costly.
Disclosure of Invention
In view of the foregoing, embodiments of the present invention provide a video processing method and apparatus in a video conference, an electronic device, and a storage medium, so as to overcome the foregoing problems or at least partially solve them.
In a first aspect of the embodiments of the present invention, a method for processing a video in a video conference is provided, where the method includes:
responding to a video calling request of a participant terminal in a video conference, and acquiring a video stream acquired by the participant terminal;
replacing the background image in each original video frame in the video stream with the set background image at the current moment to obtain a video frame after background replacement; the background image is an image area which does not belong to a human body image in an original video frame;
and splicing a plurality of video frames after background replacement into a video stream to be sent, and sending the video stream to a target terminal corresponding to the video calling request.
Optionally, replacing a background image in each original video frame in the video stream with a set background image at the current time to obtain a background-replaced video frame, including:
removing background images which do not belong to human body images from each original video frame in the video stream;
processing the original video frame without the background image based on the reserved human body image in each original video frame and the set background image at the current moment to obtain a video frame with the background replaced; wherein, the video frame after the background replacement comprises the set background image and the reserved human body image.
Optionally, removing a background image not belonging to a human body image from each original video frame in the video stream includes:
for each original video frame, the following steps are performed:
dividing the original video frame into a plurality of image blocks;
respectively sending the plurality of image blocks to the corresponding sub-threads; the different image blocks correspond to different sub-threads, and each sub-thread is used for removing background images which do not belong to human body images in the received image blocks;
obtaining a processed image block which is returned by the sub thread and is removed of the background image;
and splicing the plurality of processed image blocks to obtain a video frame with the background image removed and the human body image reserved.
Optionally, the removing the background image, which is not the human body image, in the received image block includes:
each sub-thread is configured to perform the following steps to remove background images in the received image blocks that do not belong to the human body image:
performing frame selection on the region where the human body part is located in the received image block to obtain a human body prediction frame where the human body part is located;
carrying out human body part identification on the image in the human body prediction frame to obtain an image area belonging to a human body part;
and removing image areas which do not belong to the human body part in the image blocks.
Optionally, the identifying the human body part of the image in the human body prediction frame to obtain an image region belonging to the human body part includes:
performing boundary delineation on the image in the human body prediction frame to obtain a plurality of image areas;
removing a target image area of which the area of the image area is smaller than a preset area in the multiple image areas;
and identifying the human body part of other image areas in the multiple image areas after the target image area is removed to obtain the image areas belonging to the human body part.
Optionally, replacing a background image in each original video frame in the video stream with a set background image at the current time to obtain a background-replaced video frame, including:
according to the splicing sequence of all original video frames in the video stream, respectively sending a preset number of original video frames to respective corresponding main threads each time; the main threads are respectively used for executing the step of replacing the background image in each original video frame in the video stream with the set background image at the current moment;
obtaining a background replaced video frame returned by each main thread;
and sending a plurality of background-replaced video frames as a video stream to be sent to a target terminal corresponding to the video calling request includes:
and according to the splicing sequence, splicing the video frames after the background replacement into a video stream to be sent, and sending the video stream to the target terminal.
Optionally, there are a plurality of participant terminals, and the method further includes:
aiming at each participant terminal, acquiring each original video frame in the video stream acquired by the participant terminal;
splicing the original video frames belonging to the same timestamp or the same receiving moment aiming at the multiple participating terminals to obtain spliced video frames;
adjusting the size of the spliced video frame to the size of the original video frame;
replacing the background image in each original video frame in the video stream with the set background image at the current moment to obtain the video frame after background replacement, including:
and replacing the background image in the spliced video frame after the size is adjusted with the set background image to obtain the video frame after the background replacement.
Optionally, after obtaining the background-replaced video frame, the method further includes:
carrying out human body gesture recognition on the video frame after the background replacement;
when the recognition result of human body posture recognition is detected to represent a preset type of human body posture, reading a prestored material image corresponding to the human body posture;
determining an image position corresponding to the preset type of human body posture in the video frame after the background replacement;
adding a layer at the image position in the background-replaced video frame to obtain a layer-superimposed video frame;
and sending a plurality of background-replaced video frames as a video stream to be sent to a target terminal corresponding to the video calling request includes:
and taking the plurality of layer-superimposed video frames as the video stream to be sent, and sending it to the target terminal corresponding to the video calling request.
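The layer-superposition step above could look roughly like the following minimal sketch, assuming a gesture recognizer has already returned a preset gesture type and an image position; the gesture name, file path, and the opaque overlay are illustrative assumptions, not part of the patent.

    import cv2

    # Hypothetical mapping from preset gesture types to pre-stored material images.
    MATERIAL_IMAGES = {"thumbs_up": "materials/thumbs_up.png"}

    def overlay_material(frame, gesture, position):
        """Superimpose the material image for `gesture` as a layer at `position` (x, y)."""
        path = MATERIAL_IMAGES.get(gesture)
        if path is None:
            return frame                       # not a preset gesture type: keep the frame unchanged
        material = cv2.imread(path)            # pre-stored material image (BGR)
        if material is None:
            return frame
        h, w = material.shape[:2]
        x, y = position
        roi = frame[y:y + h, x:x + w]
        if roi.shape[:2] != (h, w):            # material would fall outside the frame
            return frame
        frame[y:y + h, x:x + w] = material     # simple opaque layer; alpha blending is also possible
        return frame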
In a second aspect of the embodiments of the present invention, there is provided a device for processing a video in a video conference, where the device includes:
the response module is used for responding to a video calling request of a participant terminal in a video conference and obtaining a video stream collected by the participant terminal;
the background replacing module is used for replacing the background image in each original video frame in the video stream with the set background image at the current moment to obtain a video frame after background replacement; the background image is an image area which does not belong to a human body image in an original video frame;
and the sending module is used for splicing the video frames after the background replacement into a video stream to be sent and sending the video stream to a target terminal corresponding to the video calling request.
In a third aspect of the embodiments of the present invention, an electronic device is provided, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
the processor is configured to, when executing the program stored in the memory, implement the steps of the method for processing a video in a video conference according to the first aspect of the embodiment of the present invention.
In a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided, which stores a computer program for causing a processor to execute the method for processing a video in a video conference according to the first aspect of the embodiments of the present invention.
The embodiment of the invention has the following advantages:
in this embodiment, in response to a video calling request for a participant terminal in a video conference, the video stream collected by that participant terminal can be obtained; the background image in each original video frame of the video stream is replaced with the background image set at the current moment to obtain background-replaced video frames, the background image being the image area of an original video frame that does not belong to the human body image; and the plurality of background-replaced video frames are spliced into a video stream to be sent and sent to the target terminal corresponding to the video calling request. In this way, in the video conference, the called video stream undergoes background replacement online before being sent to the target terminal, so that the scene requirements of the video conference can be met. Therefore, before the video conference starts, there is no need to prepare props manually or to arrange the background of the conference venue in advance, which saves the cost and time of the video conference and improves the participation experience of the conference users.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments of the present application will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is an implementation environment diagram of a video processing method in a video conference according to an embodiment of the present invention;
fig. 2 is a diagram of another implementation environment of a video processing method in a video conference according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating steps of a method for processing video in a video conference according to an embodiment of the present invention;
FIG. 4 is a flowchart of the steps for performing background replacement using child threads in an embodiment of the present invention;
FIG. 5 is a flowchart of the steps for replacing context with a main thread in an embodiment of the present invention;
FIG. 6 is a schematic diagram of the connection between the video networking terminal and the background replacement device in the embodiment of the present invention;
fig. 7 is a flowchart illustrating a processing method of a background replacement device executing a video in a video conference according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a video processing apparatus in a video conference according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
The present application provides the following technical idea to solve the problem in the related art that a video conference with scene requirements is costly to hold: perform background replacement on the video stream of the ongoing video conference in real time, and then send the background-replaced video stream to the target terminal, so that the background of the video conference is replaced with the set background online without arranging the conference venue.
Referring to fig. 1 and fig. 2, two implementation environment diagrams of the video processing method in a video conference according to the embodiment of the present application are shown.
As shown in fig. 1, the system comprises a plurality of participant terminals and a server, and each participant terminal is communicatively connected with the server. The participant terminals include the terminal that executes the video processing method in the video conference, participant terminals that collect and send video streams, and participant terminals that receive the processed video stream. In this implementation environment, a participant terminal executes the method for processing a video in a video conference provided in the embodiment of the present application; the executing terminal may be a participant terminal specially set up in the video conference, or the participant terminal that collects and sends the video stream, that is, the participant terminal processes the video stream after collecting it and then sends the processed video stream to the target participant terminal through the server.
As shown in fig. 2, includes a plurality of participant terminals, a server and a background replacement device. In this implementation environment, the background replacement device executes the video processing method in the video conference provided in the embodiment of the present application. The conference participating terminal is in communication connection with the server and comprises a conference participating terminal for acquiring and sending out a video stream and a conference participating terminal for receiving the processed video stream; the background replacing equipment is in communication connection with the participant terminal and is used for acquiring video streams acquired by the participant terminal; the background replacing device is also in communication connection with the server and is used for sending the processed video stream to the target participating terminal through the server. Optionally, the background replacement device may also send the processed video stream to a participant terminal that collects the video stream, where the participant terminal that collects the video stream sends the processed video stream to the server, and then the server sends the processed video stream to the target participant terminal.
The video processing method in the video conference can be applied to video networking and can also be applied to the Internet. The participant terminal may be a terminal in a video network or a terminal in the internet, and specifically, the participant terminal may be a personal computer, a notebook computer, a smart phone, a set-top box, or the like. The server can be a video network server or an internet server.
The video network is an important milestone in network development. It is a real-time network that can realize real-time transmission of high-definition video and pushes many Internet applications towards high-definition, face-to-face video. The video network adopts real-time high-definition video switching technology and can integrate dozens of required services, such as video, voice, pictures, text, communication and data, on one system platform, including high-definition video conferencing, video surveillance, intelligent monitoring and analysis, emergency command, digital broadcast television, time-shifted television, network teaching, live broadcast, VOD on demand, television mail, Personal Video Recorder (PVR), intranet (self-office) channels, intelligent video broadcast control and information distribution, and realizes high-definition video playback through a television or a computer. Therefore, when the video processing method in a video conference provided by the embodiment of the application is applied to the video network, high-definition real-time transmission of the video streams in the video conference can be realized, achieving a smoother picture playing effect and ensuring the normal and orderly operation of the video conference.
Referring to fig. 3, a flowchart illustrating a method for processing a video in a video conference executed in the implementation environment illustrated in fig. 1 or fig. 2 is shown, and as illustrated in fig. 3, the method for processing a video in a video conference specifically may include the following steps:
step S110: and responding to a video calling request of a participant terminal in the video conference, and obtaining the video stream collected by the participant terminal.
When a video calling request for at least one participant terminal in a video conference is received, responding to the video calling request, and acquiring a video stream acquired by the at least one participant terminal corresponding to the video calling request.
The video calling request may be sent by a conference control terminal in a video conference, and the video calling request may include an identifier of a source participant terminal from which a video stream is called and an identifier of a target terminal from which the video stream is called, so as to send a real-time video stream collected by the source participant terminal to the target terminal.
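For illustration only, a video calling request of the kind described above might be represented as follows; the field names are assumptions introduced for this sketch, not part of the patent.

    from dataclasses import dataclass

    @dataclass
    class VideoCallRequest:
        source_terminal_id: str   # identifier of the source participant terminal whose stream is called
        target_terminal_id: str   # identifier of the target terminal that receives the processed stream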
When a participant terminal executes the method as the execution subject, the executing participant terminal may be a speaking terminal or a chairman terminal in the conference. The participant terminal may establish communication with the source participant terminal through the server or directly, so as to obtain the video stream collected by the source participant terminal.
When the background replacement device is used as an execution subject to execute the method, the background replacement device may also obtain the video stream acquired by the source participating terminal through the server, or directly establish communication with the source participating terminal to obtain the video stream acquired by the source participating terminal. Wherein the background replacement device may be connected to the participant terminal in a wired or wireless manner.
The participant terminal that collects the video stream is provided with a camera, which may be externally connected to the participant terminal or built into it. The video stream collected by the camera captures the environment where the participant terminal is located and the people around it.
Step S120: replacing the background image in each original video frame in the video stream with the set background image at the current moment to obtain a video frame after background replacement; the background image is an image area which does not belong to a human body image in an original video frame.
The human body image may be an image depicting a contour of a human body, or may be an image including only a partial region of the human body such as a human face and an upper half of the human body, that is, the human body image of the present application may include a part or all of the human body features.
The background image refers to an image which does not belong to the human body image, and the background image set at the current moment is the background image which needs to be replaced into the original video frame at the current moment. The setting background images can be uploaded by users, the background images at different moments in the same video conference can be different, and the background images of different video conferences at the same moment can also be different, so that the setting background images can be switched in the video conference process.
The process of uploading the set background image may be as follows: if a participant terminal executes the video processing method of the present application, the conference control terminal can directly issue the set background image to that participant terminal; if the background replacement device executes the video processing method of the present application, the conference control terminal may send the set background image to the participant terminal, and the participant terminal then sends the background image to the background replacement device through an HDMI (High-Definition Multimedia Interface). In this way, the conference control terminal can switch the set background image online and in real time during the video conference.
Optionally, the process of uploading the setting background image may further be: the set background image is stored in the conference participating terminal or the background replacing device executing the application in advance, and in practice, the set background image in the video conference can be switched by directly adjusting the set background image in the conference participating terminal or the background replacing device so as to meet the actual requirement.
In order to process the captured video stream, video frame extraction needs to be performed on the video stream first, so as to obtain a plurality of original video frames of the video stream.
Replacing the background image in each original video frame with the background image set at the current moment to obtain the background-replaced video frames may be done as follows: for each original video frame, the background image in the original video frame is replaced with the background image set at the current moment, thereby obtaining a video frame in which the human body image is retained and the background image has been replaced with the set background image. Finally, the plurality of background-replaced video frames are spliced in time order to obtain the background-replaced video stream.
Optionally, replacing the background image in each original video frame with the set background image at the current time, and obtaining the video frame after background replacement may further be: and removing the background image of each original video frame, splicing each video frame which only retains the human body image to obtain a video stream which only retains the human body image, and adding the set background image at the current moment to the video stream which only retains the human body image to obtain the video stream after background replacement.
Step S130: and splicing a plurality of video frames after background replacement into a video stream to be sent, and sending the video stream to a target terminal corresponding to the video calling request.
The splicing here may be performed in time order, that is, the plurality of background-replaced video frames are fused into a background-replaced video stream in time order. The target terminal is the terminal requesting to acquire the video stream, and the video stream to be sent is the background-replaced video stream. In this embodiment, the participant terminal or the background replacement device executing the video processing method of the present application may send the processed video stream to the target terminal through the server, or directly establish communication with the target terminal to send it. The target terminal can then decode and play the processed video stream, so that the video picture viewed at the target terminal has its background image replaced with the set background image.
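A minimal sketch of the overall per-frame pipeline (extract frames, replace the background, splice the results back in time order) is given below, assuming OpenCV and a file or RTSP URL standing in for the conference video stream; `replace_background` is a placeholder for the replacement step detailed later.

    import cv2

    def process_stream(input_url, output_path, replace_background, fps=25):
        cap = cv2.VideoCapture(input_url)       # stand-in for the participant terminal's video stream
        writer = None
        while True:
            ok, frame = cap.read()              # extract the next original video frame
            if not ok:
                break
            out = replace_background(frame)     # step S120: background replacement at the current moment
            if writer is None:
                h, w = out.shape[:2]
                writer = cv2.VideoWriter(output_path,
                                         cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
            writer.write(out)                   # step S130: splice frames back into a stream in time order
        cap.release()
        if writer is not None:
            writer.release()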
By adopting the technical scheme of the embodiment of the application, the video stream collected by a participant terminal can be obtained in response to a video calling request for that participant terminal in the video conference; the background image in each original video frame of the video stream is replaced with the background image set at the current moment to obtain background-replaced video frames, the background image being the image area of an original video frame that does not belong to the human body image; and the plurality of background-replaced video frames are spliced into a video stream to be sent and sent to the target terminal corresponding to the video calling request. In this way, the target terminal plays a video picture in which the background image has been replaced with the set background image. Therefore, before the video conference starts, there is no need to manually prepare props or arrange the conference venue background, which saves the cost and time of the video conference and improves the participation experience of the conference users.
Next, a method for processing a video in a video conference according to an embodiment of the present application is described in detail.
Optionally, as an embodiment, replacing a background image in each original video frame in the video stream with a set background image at the current time to obtain a background-replaced video frame, where the method includes:
step S210: removing background images not belonging to human body images from each original video frame in the video stream.
First, each original video frame is binarized to obtain a binary image, i.e., a black-and-white image; binarization means setting the gray value of each pixel in the image to 0 or 255, so that the whole image shows an obvious black-and-white effect. Then, an AI (Artificial Intelligence) human body part recognition technology is used to recognize the human body in each original video frame and frame-select the rectangular area where the human body is located; the framed rectangular area is then cut out of the video frame using OpenCV (open source computer vision library) matrix operations, giving the human body prediction frame of each original video frame. The remaining area is the background image, and in practice the pixel color of the remaining area can be set to a white board, thereby achieving the purpose of removing the background image.
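A minimal OpenCV sketch of the steps just described (binarization, frame-selecting the body region, blanking the rest to white); `detect_person` is an assumed stand-in for the AI human body part recognition step and is not an OpenCV function.

    import cv2
    import numpy as np

    def crop_body_prediction_box(frame, detect_person):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)   # gray values become 0 or 255
        box = detect_person(binary)                   # assumed detector returning (x, y, w, h) or None
        if box is None:
            return None, None
        x, y, w, h = box
        body_box = frame[y:y + h, x:x + w].copy()     # the human body prediction frame
        blanked = np.full_like(frame, 255)            # remaining area set to a white board
        blanked[y:y + h, x:x + w] = body_box
        return body_box, blanked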
And then, only the images in the human body prediction frame need to be processed, and compared with the method of directly processing the whole image in the related technology, the method can save the computing resource.
After the human body prediction frame is obtained, boundary delineation may be performed on the image inside it. Specifically, the human body prediction frame is a rectangular frame that may contain images of other objects in addition to the human body. Each image in the human body prediction frame can be boundary-delineated using Canny (an edge detection algorithm), and a plurality of image areas can be obtained using FindContours (the contour-finding function) in OpenCV. In this way, the various shapes in the human body prediction frame, such as the shapes of a human face and a clock, can be found, and a plurality of image areas are obtained.
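A short sketch of this boundary delineation inside the prediction frame, using the Canny and FindContours calls named above; the Canny threshold values are illustrative assumptions.

    import cv2

    def delineate_regions(body_box):
        gray = cv2.cvtColor(body_box, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 50, 150)         # edge detection; threshold values are illustrative
        # OpenCV 4.x return convention: (contours, hierarchy)
        contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        return contours                          # each contour bounds one candidate image area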
And then, carrying out human body recognition on the plurality of image areas to obtain a human body image with the image areas being human bodies. The human body identification can be performed by using a relevant technology, for example, the human body identification is performed by using a human body identification model, which is not described in detail in the present application.
After the human body image area in each human body prediction frame is identified, the background image which does not belong to the human body image in each human body prediction frame is removed, so that the human body prediction frame only retaining the human body image is obtained.
Step S220: processing the original video frame without the background image based on the reserved human body image in each original video frame and the set background image at the current moment to obtain a video frame with the background replaced; wherein, the video frame after the background replacement comprises the set background image and the reserved human body image.
After obtaining the human body image retained in each original video frame, fusing the set background image at the current moment with each original video frame retaining the human body image, thereby obtaining a background-replaced video frame with each human body image as a foreground and the set background image as a background, and finally splicing the background-replaced video frames according to a time sequence to obtain a background-replaced video stream.
Or, splicing each original video frame which retains the human body image according to a time sequence to obtain a video stream which only retains the human body image, and then adding a set background image at the current moment to the video stream which only retains the human body image to obtain a video stream after background replacement.
Specifically, fusing a set background image at the current moment and each original video frame which retains a human body image, including: aiming at each original video frame with a reserved human body image, acquiring a first mask image and a second mask image of the video frame; the second mask image is a mask image obtained by inverting the first mask image; obtaining a first video frame to be processed with a background image removed based on the first mask image and each video frame; obtaining a second video frame to be processed based on the second mask image and the set background image; and fusing the first video frame to be processed and the second video frame to be processed to obtain the video frame after the background is replaced.
Determining a first mask map and a second mask map, specifically: the color of the human body image area is set to be black, that is, the pixel value of the pixel point of the human body image area is set to be 0. Then, determining a first mask image and a second mask image of each video frame based on the human body image; the background area in the first mask image is invisible, and the foreground area is visible, that is, the pixel values of the pixels belonging to the area of the human body image in the first mask image are 0, and the pixel values of the pixels belonging to the other areas are 255. The second mask map is obtained by color negation of the first mask map by using a BitwiseNot (negation operation function for the image), that is, the pixel values of the pixels in the region belonging to the human body image in the second mask map are 255, and the pixel values of the pixels in the other regions are 0.
In this way, since the pixel value of the pixel point belonging to the region of the human body image in the first mask image is 0, and the pixel value of the pixel point belonging to the other region is 255, the bitwisean (a function of taking and operating on the image) can be used to perform an and operation on the first mask image and the video frame, so as to restore the color of the region of the human body image, and obtain an image colored with the human body image, that is, the first to-be-processed video frame from which the background image is removed.
When a second video frame to be processed is obtained based on the second mask image and the set background image, the pixel values of the pixel points of the region belonging to the human body image in the second mask image are 255, and the pixel values of the pixel points of the other regions are 0, when the second mask image and the set background image are subjected to and operation, the color of the image region belonging to the background in the video frame can be restored to the color of the set background image, so that the second video frame to be processed is obtained, and the second video frame to be processed is the set background image from which the human body image is removed.
The first video frame to be processed and the second video frame to be processed are fused to obtain a background-replaced video frame, and specifically, the first video frame to be processed and the second video frame to be processed are subjected to matrix addition to obtain the background-replaced video frame. When the method is adopted, the human body image in the obtained video frame after the background replacement is naturally and unobtrusively fused with the set background image.
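A minimal sketch of the mask-based fusion described above (BitwiseNot, BitwiseAnd, matrix addition), assuming the frame and the set background image have the same size; the mask polarity is chosen here so that the sketch runs as written, which may be numbered the other way round in the text.

    import cv2

    def composite(frame, person_mask, background):
        """person_mask: single-channel mask, 255 over the human body image, 0 elsewhere."""
        mask_fg = cv2.merge([person_mask] * 3)       # mask expanded to 3 channels
        mask_bg = cv2.bitwise_not(mask_fg)           # colour-inverted mask for the background side
        fg = cv2.bitwise_and(frame, mask_fg)         # first video frame to be processed: person only
        bg = cv2.bitwise_and(background, mask_bg)    # second video frame to be processed: background only
        return cv2.add(fg, bg)                       # matrix addition fuses the two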
Optionally, as an embodiment, referring to fig. 4, a flowchart illustrating a step of performing background replacement by using a child thread is shown, where removing a background image that does not belong to a human body image from each original video frame in the video stream includes: for each original video frame, the following steps are performed:
step S310: the original video frame is divided into a plurality of image blocks.
For each original video frame, in order to remove a background image that does not belong to a human body image, the original video frame may be divided into a plurality of image blocks, and then human body part recognition may be performed on each image block. The size of the area of the image block is a preset value which is convenient for human body part identification and can be adjusted; or each original video frame may be equally divided into a preset number of image blocks. And human body part identification is carried out on each image block, so that the accuracy of the result of human body identification can be improved.
For example, the original video frame is equally divided crosswise into four image blocks, or divided into nine image blocks in a three-by-three grid.
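A small sketch of this equal division into image blocks; rows=cols=2 gives the four-block cross division and rows=cols=3 the three-by-three grid (leftover pixels from non-divisible sizes are ignored in this sketch).

    def split_into_blocks(frame, rows=2, cols=2):
        h, w = frame.shape[:2]
        bh, bw = h // rows, w // cols
        # each entry carries its grid position so the blocks can later be spliced back in place
        return [(r, c, frame[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw])
                for r in range(rows) for c in range(cols)]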
Step S320: respectively sending the plurality of image blocks to the corresponding sub-threads; the different image blocks correspond to different sub-threads, and each sub-thread is used for removing background images which do not belong to human body images in the received image blocks.
In this embodiment, the participating terminal or the background replacing device may have a plurality of sub-threads, and the plurality of image blocks are processed in parallel by using the plurality of sub-threads. It is understood that a thread is a unit capable of performing operation scheduling by an operating system, and is included in a process, and each sub-thread in the embodiment may perform a task of removing a background image.
Specifically, after obtaining a plurality of image blocks, the plurality of image blocks may be sent to respective corresponding sub-threads, and each sub-thread removes a background image that does not belong to the human body image in the received image blocks. Wherein different image blocks may be sent to different sub-thread processes, or one or more image blocks may be sent to the same sub-thread process.
Each thread is used for removing the area belonging to the background image in the received image block so as to reserve the area of the human body image. The detailed method for removing the background image, which is not the human body image, from the received image block by the sub-thread will be described in detail later.
Step S330: and obtaining the processed image block which is returned by the sub thread and is removed with the background image.
After each sub-thread obtains the processed image block with the background image removed, the processed image block can be returned to the participating terminal or the background replacing device.
Step S340: and splicing the plurality of processed image blocks to obtain a video frame with the background image removed and the human body image reserved.
The splicing here refers to position-based splicing; specifically, the processed image blocks may be spliced according to their positions in the original video frame. That is, the processed image blocks of each original video frame are restored to their corresponding positions in the original video frame and merged into one video frame. The participant terminal or the background replacement device splices the plurality of processed image blocks, and since each processed image block retains only the human body part and has its background image removed, the spliced result is a video frame in which only the human body part is retained and the background image is removed.
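The per-frame block pipeline of steps S310 to S340 could look roughly as follows, with a thread pool standing in for the sub-threads and `remove_block_background` standing in for the per-block removal logic described later.

    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    def remove_background_parallel(frame, remove_block_background, rows=2, cols=2):
        h, w = frame.shape[:2]
        bh, bw = h // rows, w // cols
        result = np.full_like(frame, 255)            # removed background shown as a white board
        with ThreadPoolExecutor(max_workers=rows * cols) as pool:   # one sub-thread per image block
            futures = {}
            for r in range(rows):
                for c in range(cols):
                    block = frame[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
                    futures[(r, c)] = pool.submit(remove_block_background, block)
            for (r, c), fut in futures.items():      # restore each processed block to its position
                result[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw] = fut.result()
        return result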
By adopting the technical scheme of this embodiment, multiple image blocks are processed in parallel by multiple sub-threads, which effectively improves image processing efficiency: each sub-thread handles only part of the original video frame, reducing its computational load and speeding up background removal, so that the target terminal can receive the background-replaced video stream sooner and the user experience is improved. Moreover, because each sub-thread removes the background image from a single image block, more attention can be paid to the local details of the original video frame, the human body image area can be identified more accurately, and the background image can be removed more precisely, yielding a video frame in which only the human body image is retained.
In one embodiment, when removing the background image not belonging to the human body image in the received image block, each sub-thread may perform the following steps:
step S410: and performing frame selection on the region where the human body part is located in the received image block to obtain a human body prediction frame where the human body part is located.
Each sub-thread binarizes the received image block to obtain a binary image, i.e., a black-and-white image; binarization means setting the gray value of each pixel in the image to 0 or 255, so that the whole image shows an obvious black-and-white effect. Then, AI human body part recognition is used to recognize the human body in the image block and frame-select the rectangular region where the human body is located, and the framed rectangular region is cut out of the image block using OpenCV matrix operations to obtain the human body prediction frame in the image block.
It can be understood that when the AI human body part identification technology is used to perform human body identification on an image block and the image block is identified not to contain a human body part, the image block can be directly discarded to save computing resources.
Step S420: and identifying the human body part of the image in the human body prediction frame to obtain an image area belonging to the human body part.
After the human body prediction frame is obtained, boundary delineation is performed on the images in it: each image in the human body prediction frame is boundary-delineated using Canny, and a plurality of image areas are obtained using FindContours in OpenCV. In this way, every shape in the human body prediction frame can be found, and a plurality of image areas are obtained. Then, human body recognition is performed on the plurality of image areas to obtain the image areas belonging to the human body.
Step S430: and removing image areas which do not belong to the human body part in the image blocks.
After the human body image area in each human body prediction frame is identified, the background image which does not belong to the human body image in each human body prediction frame is removed, so that the human body prediction frame only retaining the human body image is obtained.
By adopting the embodiment, the image blocks which are sent by the plurality of sub-threads and are removed of the background image can be obtained, and then the image blocks only retaining the human body image are spliced into the blank image to obtain the video frame which is removed of the background image and retains the human body image.
By adopting the technical scheme of the embodiment of the application, the human body part of each image block can be identified, and the image blocks without the identified human body parts are directly abandoned, so that the computing resources are saved; and the human body part in the image block of the recognized human body part is subjected to frame selection, so that only the image in the frame-selected human body prediction frame is processed, the calculated amount is further reduced, and the calculation resource can be further saved.
Optionally, as an embodiment, when the image in the human body prediction frame is subjected to human body part recognition to obtain an image region of a human body part, the image in the human body prediction frame may be subjected to boundary delineation to obtain a plurality of image regions; removing a target image area of which the area of the image area is smaller than a preset area in the multiple image areas; and identifying the human body part of other image areas in the multiple image areas after the target image area is removed to obtain the image areas belonging to the human body part.
After the boundary of the image in the human body prediction frame is drawn to obtain various image areas, the target image area with the area smaller than the preset area can be removed from the image areas, so that noise points such as black points and white points can be removed, and the image can be refined.
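A one-function sketch of this small-area filtering, assuming the image regions are OpenCV contours and using an illustrative area threshold.

    import cv2

    def drop_small_regions(contours, min_area=100.0):
        # discard target image areas below the preset area (noise such as isolated black or white dots)
        return [c for c in contours if cv2.contourArea(c) >= min_area]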
Optionally, in an example, an image area that is an open boundary may be removed, where the open boundary refers to that two ends of the boundary are not connected, which is often seen in a scene in which an object is blocked by a human body, and in this case, the image area of the object blocked by the human body may be removed, so as to obtain a human body image with a relatively pure background.
Then, human body recognition may be performed on the remaining image regions, i.e., the other image regions left after the target image regions are removed from the plurality of image regions, to obtain the image regions belonging to the human body. The human body recognition may use a related technique, for example a human body recognition model, which is not described in detail herein.
Optionally, as an embodiment, when replacing the background image in the original video frame in the video stream, to further improve the replacement efficiency, the multiple original video frames may be replaced with the background image in parallel by using multiple main threads. Referring to fig. 5, a flowchart illustrating a step of performing background replacement by using a main thread is shown, where replacing a background image in each original video frame in the video stream with a set background image at a current time to obtain a video frame after background replacement, the method may specifically include the following steps:
step S510: according to the splicing sequence of all original video frames in the video stream, respectively sending a preset number of original video frames to respective corresponding main threads each time; the main thread is respectively used for executing the step of replacing the background image in each original video frame in the video stream with the set background image at the current moment.
It is understood that the main thread is a unit that the operating system can perform operation scheduling, as with the sub-thread described above, and is included in the process, and each main thread in this embodiment may execute a task of replacing the background image.
The splicing order of the original video frames may be the order of the size of the time stamp of each original video frame, or the order of the size of the receiving time. It will be appreciated that the order of extraction and splicing of the video frames is the same. The preset number may be the number of the main threads, or other preset numbers provided according to actual requirements, for example, when each main thread only processes one original video frame, the preset number may be the number of the main threads, and when one main thread can process a plurality of original video frames, the preset number may be the total number of the original video frames that can be processed by the main thread.
Each main thread may perform the background replacement operation on the received original video frames according to the background replacement process described above. In this way, a plurality of main threads can process a preset number of original video frames in parallel, so as to perform batch background replacement on the original video frames. For the specific method by which each main thread performs background replacement on an original video frame, reference may be made to the methods for background replacement in the other embodiments, which are not repeated here.
Illustratively, the number of main threads is 10, during a video conference, video streams sent by participating terminals are continuously received, a plurality of original video frames are extracted from the received video streams, each original video frame carries a timestamp, the time of the original video frame in the video streams is represented, and the smaller the timestamp is, the earlier the time is represented; dividing each 10 original video frames into a batch according to the time stamp from small to large, simultaneously sending the 10 original video frames of a batch to 10 main threads each time, and simultaneously performing background replacement on the 10 original video frames by the 10 main threads to obtain 10 video frames after background replacement.
For another example, the number of main threads is multiple; during the video conference, the video stream sent by a participant terminal is continuously received, and a plurality of original video frames are extracted from it. The original video frames are sent in turn, in the order they were received, to the plurality of main threads for background replacement; as soon as a main thread completes the background replacement of one original video frame, the next original video frame is sent to it, ensuring that all main threads stay busy, and the background-replaced video frames continuously output by each main thread are thus obtained.
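A minimal sketch of the batch scheme from the first example, assuming the frames are already sorted by timestamp; a thread pool stands in for the main threads and `replace_background` for the per-frame replacement step.

    from concurrent.futures import ThreadPoolExecutor

    def replace_backgrounds_in_batches(frames, replace_background, num_main_threads=10):
        processed = []
        with ThreadPoolExecutor(max_workers=num_main_threads) as pool:
            for i in range(0, len(frames), num_main_threads):
                batch = frames[i:i + num_main_threads]                  # one batch per round of main threads
                processed.extend(pool.map(replace_background, batch))   # map keeps the splicing order
        return processed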
Step S520: and obtaining the background replaced video frame returned by each main thread.
And each main thread performs background replacement on a preset number of original video frames to obtain a preset number of background-replaced video frames, and returns the background-replaced video frames to the participating terminal or the background replacement equipment serving as the execution subject.
Step S530: and sending a plurality of video frames after background replacement as a video stream to be sent to a target terminal corresponding to the video calling request, wherein the video stream comprises: and according to the splicing sequence, splicing the video frames after the background replacement into a video stream to be sent, and sending the video stream to the target terminal.
The splicing here refers to temporal splicing: the plurality of background-replaced video frames are fused into a background-replaced video stream in time order, and the background-replaced video frames are spliced into the video stream to be sent according to the splicing sequence and sent to the target terminal corresponding to the video calling request. The target terminal is the terminal requesting to acquire the video stream, and the video stream to be sent is the processed video stream. The participant terminal or the background replacement device acting as the execution subject may send the processed video stream to the target terminal through the server, or may establish communication with the target terminal directly. The target terminal can decode and play the processed video stream, so that the video picture viewed at the target terminal has its background image replaced with the set background image.
By adopting the technical scheme of the embodiment of the application, multiple main threads can work in parallel and perform background replacement on the original video frames in batches. Compared with a single thread, which can only start processing the next original video frame after finishing the background replacement of the previous one, using multiple main threads for background replacement saves time, so that the real-time requirement of the video is met and the user experience is improved.
Certainly, when the multiple sub-threads perform background removal on the multiple image blocks of the original video frame, multiple sub-threads may be further included under each main thread, where the multiple sub-threads under each main thread may send the image blocks with the removed background to the main thread, and then the main thread may splice the multiple image blocks with the removed background according to the position relationship, and then replace the set background image into the spliced video frame, so as to obtain the video frame with the replaced background.
Optionally, as an embodiment, the number of the participating terminals is multiple, and the method further includes:
Step S610: for each participant terminal, obtain each original video frame in the video stream collected by that participant terminal.
When there are multiple source participant terminals corresponding to the video calling request, the video stream collected by each of these participant terminals is obtained, and a plurality of original video frames are extracted from each video stream. To facilitate splicing of the original video frames, the frame rate at which original video frames are extracted may be the same for every video stream.
Step S620: for the plurality of participant terminals, splice the original video frames that have the same timestamp or the same receiving moment to obtain spliced video frames.
The splicing here is splicing based on picture size: a plurality of original video frames are arranged end to end. Either length-based splicing or width-based splicing may be used. In length-based splicing, the length of the spliced video frame is the sum of the lengths of the video frames used for splicing; in width-based splicing, the width of the spliced video frame is the sum of the widths of the video frames used for splicing.
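As an illustrative OpenCV sketch of the two splicing directions: hconcat sums up the frames' horizontal extent and vconcat their vertical extent. The frames below are synthetic stand-ins, and hconcat/vconcat assume matching heights and widths respectively.

```python
import cv2
import numpy as np

frame1 = np.zeros((360, 640, 3), dtype=np.uint8)  # stand-in for terminal 1
frame2 = np.zeros((360, 640, 3), dtype=np.uint8)  # stand-in for terminal 2

# Frames placed side by side: the horizontal extents add up (360 x 1280).
spliced_side_by_side = cv2.hconcat([frame1, frame2])

# Frames stacked end to end vertically: the vertical extents add up (720 x 640).
spliced_stacked = cv2.vconcat([frame1, frame2])
```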
After the original video frames corresponding to each participant terminal are obtained, the corresponding original video frames of the different participant terminals can be spliced. For example, original video frame 1 corresponding to participant terminal 1 and original video frame 2 corresponding to participant terminal 2 are spliced to obtain one spliced video frame.
Specifically, when video frames are spliced, the frames to be combined may be selected either by timestamp or by receiving time: video frames originating from different participant terminals but carrying the same timestamp are spliced into one video frame, or video frames originating from different participant terminals but received at the same moment are spliced into one video frame.
Illustratively, during a video conference, video streams sent by two different participant terminals are received continuously. From the two video streams received at the same time, 100 original video frames are extracted per participant terminal, 200 original video frames in total. Because the two streams are received at the same time, the 200 frames pair up so that every pair has the same receiving time. Splicing each pair of frames with the same receiving time yields 100 spliced video frames.
As yet another example, a timestamped video stream captured by each of two different participant terminals is received. 100 original video frames are extracted from each terminal's video stream, 200 original video frames in total, each carrying a timestamp. Original video frames whose timestamps are the same are spliced together. Because the timestamps of the frames within one video stream are necessarily different from one another, only video frames from different video streams, never from the same video stream, are spliced together.
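A minimal sketch of this timestamp-matched splicing, assuming each stream is represented as a list of (timestamp, frame) pairs whose frames share the same height, with matching timestamps across streams; the final resize corresponds to the size adjustment of step S630 described next, and the output size is an assumed value.

```python
import cv2

def splice_by_timestamp(stream_a, stream_b, out_size=(1280, 720)):
    """stream_a / stream_b: lists of (timestamp, frame) with shared timestamps."""
    frames_b = dict(stream_b)
    spliced = []
    for ts, frame_a in stream_a:
        frame_b = frames_b.get(ts)
        if frame_b is None:
            continue  # no frame with the same timestamp in the other stream
        joined = cv2.hconcat([frame_a, frame_b])   # picture-size splicing
        joined = cv2.resize(joined, out_size)      # back to the original size (S630)
        spliced.append((ts, joined))
    return spliced
```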
Step S630: adjust the size of the spliced video frame to the size of the original video frame.
Because the splicing of video frames arranges a plurality of video frames end to end, that is, splicing based on picture size, the spliced video frames can each be adjusted to a preset size after splicing, and the adjusted spliced video frames are then synthesized into the video stream to be sent.
Step S640: replacing the background image in each original video frame in the video stream with the set background image at the current moment to obtain the background-replaced video frame includes: replacing the background image in the size-adjusted spliced video frame with the set background image to obtain the background-replaced video frame.
After the size-adjusted spliced video frame is obtained, its background image is replaced with the set background image at the current moment to obtain the background-replaced video frame. For the method of performing background replacement on the spliced video frame, reference may be made to the method of performing background replacement on the original video frame in other embodiments, which is not repeated here.
With the technical solution of this embodiment of the application, background-replaced video frames covering a plurality of participant terminals can be obtained and synthesized into the video stream to be sent. Multiple video streams are thus spliced into one spliced video stream and sent to the target terminal, meeting different video conference scenario requirements, for example a group photo of people located in different physical spaces. At the same time, the data volume of the video stream to be sent is reduced, which increases the sending speed of the video stream.
Optionally, as an embodiment, after obtaining the background-replaced video frame, the method further includes:
step S710: and carrying out human body gesture recognition on the video frame after the background replacement.
The background-replaced video frame may be a video frame obtained by performing background replacement on a single original video frame, or a video frame obtained by performing background replacement on a spliced video frame. After the background-replaced video frames are obtained, human body posture recognition can be performed on each of them using an AI technique.
Step S720: when the recognition result of the human body posture recognition is detected to represent a preset type of human body posture, read a prestored material image corresponding to that human body posture.
Various types of human body postures are preset, and each preset type of human body posture is provided with a corresponding prestored material image. The prestored material images uploaded for different types of postures may differ between video conferences, and may also differ at different moments within the same video conference. The prestored material images may also be uploaded by a user; the method of uploading and switching the prestored material images may refer to the method of uploading and switching the set background images.
The human body posture in the human body image of the background-replaced video frame is matched against the preset types of human body posture. When a match is detected, the recognition result is determined to be the matched preset type of human body posture, and the prestored material image corresponding to that posture is read.
For example, a prize-drawing posture is preset, defined as both hands extended, and the prestored material image corresponding to the prize-drawing posture is a prize shape. When the human body posture is recognized as the prize-drawing posture, that is, the person is recognized as extending both hands, the prestored prize image is read.
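The embodiment leaves the recognition algorithm open (it only states that an AI technique is used). Purely as an illustrative sketch of the matching and material lookup, `detect_posture` below is a hypothetical helper returning a posture label, and the material file path is an assumption.

```python
import cv2

# Prestored material images per preset posture type (paths are illustrative).
MATERIAL_IMAGES = {
    "prize_drawing": cv2.imread("materials/prize.png"),
}

def detect_posture(frame):
    # Hypothetical AI posture detector; would return e.g. "prize_drawing"
    # when both hands are detected as extended, otherwise None.
    return None

def material_for_frame(frame):
    posture = detect_posture(frame)
    if posture in MATERIAL_IMAGES:
        # The recognition result represents a preset type of human body posture:
        # read the prestored material image corresponding to that posture.
        return posture, MATERIAL_IMAGES[posture]
    return None, None
```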
Step S730: determine, in the background-replaced video frame, the image position corresponding to the preset type of human body posture.
The display positions of the prestored material images corresponding to the different preset types of human body posture can be preset, and the position of the material image corresponding to the recognized posture is determined in each background-replaced video frame according to the preset display position.
Exemplarily, the human body posture is both hands extended, the matched preset type of posture is the prize-drawing posture, the prestored material image corresponding to the prize-drawing posture is a prize shape, and the display position of the corresponding prestored material image is on the hands.
Step S740: add a layer at the image position in the background-replaced video frame to obtain the layer-superimposed video frame.
After the image position corresponding to the preset type of human body posture is determined, a layer is added at that image position; the layer is the prestored material image corresponding to that preset type of posture. The layer-superimposed video frame is thus the video frame to which the corresponding prestored material image has been added at the image position corresponding to the recognized preset type of human body posture.
Exemplarily, the human body posture is both hands extended, the matched preset type of posture is the prize-drawing posture, the prestored material image corresponding to the prize-drawing posture is a prize shape, and the display position of the corresponding prestored material image is on the hands. A prize image is displayed on the hands in every video frame in which the prize-drawing posture is recognized. Therefore, even though the hand positions change from frame to frame, as long as both hands remain extended in the prize-drawing posture, the prize figure is displayed on the hands, and the finally synthesized video stream shows the person extending both hands to draw a prize.
Optionally, the added layer may be placed above the layer where the video frame is located, and respective display weights may be set for the added layer and the video frame layer, so that the prestored material image superimposed at the corresponding position is displayed more naturally.
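A minimal sketch of such a weighted layer superposition with OpenCV, assuming the material image fits inside the frame at the target position; the display weights and the placement coordinates are illustrative values, not prescribed by the embodiment.

```python
import cv2
import numpy as np

def overlay_material(frame, material, top_left, frame_weight=0.4, layer_weight=0.6):
    """Blend the material image over the frame at the given position."""
    x, y = top_left
    h, w = material.shape[:2]
    roi = frame[y:y + h, x:x + w]
    # Respective display weights of the video-frame layer and the added layer.
    blended = cv2.addWeighted(roi, frame_weight, material, layer_weight, 0)
    out = frame.copy()
    out[y:y + h, x:x + w] = blended
    return out

# Usage: place a prize image near the detected hand position.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
prize = np.full((100, 100, 3), 255, dtype=np.uint8)  # stand-in material image
result = overlay_material(frame, prize, top_left=(600, 300))
```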
Step S750: sending the plurality of background-replaced video frames as a video stream to be sent to the target terminal corresponding to the video calling request includes: taking the plurality of layer-superimposed video frames as the video stream to be sent and sending it to the target terminal corresponding to the video calling request.
The plurality of video frames, each with the corresponding prestored material image added at the image position corresponding to the recognized preset type of human body posture, are temporally spliced to obtain the video stream to be sent.
Because the material image is added at the image position corresponding to the human body posture in each individual video frame, its display position in the synthesized video stream to be sent is updated in real time. For example, a prize image is displayed on the hands in every video frame in which the prize-drawing posture is recognized; even though the hand positions change from frame to frame, the prize figure stays on the hands as long as both hands remain extended, so the finally synthesized video stream to be sent shows the person extending both hands to draw a prize.
It can be understood that the display position of the material image corresponding to the prize-drawing posture is described only roughly in this example. In practical applications, the display position and display method of the different prestored material images corresponding to different postures in different modes can be configured in detail, so that the person in the synthesized video stream looks more natural. For example, the conference control terminal may set a certain time period as a prize-awarding mode and configure the mode so that no prize appears when a person extends a hand for the first time, while a prize appears when the hand is extended for the second time; and so on.
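As a hedged sketch of such a mode rule (no prize on the first hand extension, a prize from the second one on), using a simple counter over the per-frame detection result; the class name and the way the detection result is obtained are assumptions.

```python
class PrizeAwardMode:
    """Award a prize only from the second time a hand is extended."""

    def __init__(self):
        self.extension_count = 0
        self.hand_was_extended = False

    def update(self, hand_extended):
        # Count rising edges: a new extension after the hand was withdrawn.
        if hand_extended and not self.hand_was_extended:
            self.extension_count += 1
        self.hand_was_extended = hand_extended
        # No prize on the first extension, prize from the second one on.
        return hand_extended and self.extension_count >= 2

mode = PrizeAwardMode()
assert mode.update(True) is False    # first extension: no prize
assert mode.update(False) is False   # hand withdrawn
assert mode.update(True) is True     # second extension: prize appears
```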
With the technical solution of this embodiment of the application, the background of an original video frame or a spliced video frame can be replaced online and material images can be added flexibly, meeting the setup requirements of various scenes in a video conference. For example, although participants A and B are in different spaces, the video conference can present participant A awarding a prize to participant B against the set award-ceremony background image. Prize awarding and similar activities can thus be realized in the video conference without arranging a venue in advance, which saves conference cost and time, increases interaction between people in the video conference, further improves the flexibility of the video conference, and optimizes the user experience.
Optionally, as an embodiment, referring to fig. 6, the video networking terminal and the background replacement device are connected through an HDMI capture card. The video networking terminal (a device such as an Aurora device) serves as a participant terminal and is connected to the background replacement device through the HDMI capture card. The conference control terminal switches the participant terminal whose virtual background needs to be replaced into the speaker, so that the video stream collected by that participant terminal can be transmitted to every participant terminal through the video network. The terminal connected to the background replacement device outputs the video stream to the background replacement device through the HDMI cable. After collecting the video stream, the background replacement device removes the background, adds the set virtual background, and adds other layers according to instructions; if multiple video channels are connected to the background replacement device, a multi-channel video merge is triggered before processing.
Optionally, as an embodiment, with reference to the flowchart of a method for processing video in a video conference using a background replacement device shown in fig. 7, the background replacement device is a virtual background device, the participant terminal is a video networking terminal, and the conference control terminal is the controller of the video conference; a video networking terminal can be switched to the speaker so that the speaker's video is broadcast. The method for processing the video in the video conference includes:
the video network terminals (devices such as an aurora starting device and the like) serving as the participant terminals are switched into speakers by the conference control terminal and can be connected with the virtual background devices through the HDMI acquisition card, meanwhile, the conference control terminal can switch the virtual background devices needing to be added into the speakers so that the speakers can serve as the participant terminals to send video streams after background replacement, and therefore the video streams after background replacement can be transmitted to all the conference place terminals through the video network.
The video networking terminal connected to the virtual background device outputs the video stream to the virtual background device through the HDMI cable.
After the virtual background device collects a video stream, it first extracts an original video frame of the video stream and performs picture binarization processing on it, such as gray-scale processing followed by binarization. It then identifies the coordinates of the human body region with an AI algorithm, for example using OpenCV matrix operations to cut the rectangular human body region out of the original video frame. Next it draws the boundary of the portrait region with Canny edge detection, finds each shape in the region with OpenCV's findContours, computes each shape's area, removes the regions whose area is too small, and sets the color inside the region to black. It then copies a three-channel blank image mask (mask image) of the same size as the original image and pastes the cut-out portrait into the blank image.
Then the image is colored with the mask, that is, the portrait is re-colored: for example, the mask and the original image are ANDed with bitwise_and to obtain the colored portrait image img1 (image 1). The mask is inverted with bitwise_not to obtain mask_inv, and mask_inv is then ANDed with the background image to obtain the colored background img2 (image 2). img1 and img2 are added together (element-wise array addition) to obtain a picture with the replaced background, and the composite video stream is then obtained.
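A minimal OpenCV sketch of this mask-based compositing, assuming `person_mask` is a single-channel mask in which the portrait region is white (obtained in practice from the segmentation, Canny boundary drawing, and findContours area filtering described above), and that the background image has already been resized to the frame size; the synthetic data below is for illustration only.

```python
import cv2
import numpy as np

def composite_background(frame, person_mask, background):
    """frame, background: HxWx3 BGR images; person_mask: HxW, 255 = person."""
    # AND the mask with the original frame: keeps only the portrait (img1).
    img1 = cv2.bitwise_and(frame, frame, mask=person_mask)
    # Invert the mask and AND it with the background image: keeps only the
    # regions outside the portrait (img2).
    mask_inv = cv2.bitwise_not(person_mask)
    img2 = cv2.bitwise_and(background, background, mask=mask_inv)
    # Element-wise addition merges the portrait and the new background.
    return cv2.add(img1, img2)

# Illustrative usage with synthetic data.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
background = np.full((720, 1280, 3), 60, dtype=np.uint8)
person_mask = np.zeros((720, 1280), dtype=np.uint8)
cv2.rectangle(person_mask, (500, 200), (780, 700), 255, -1)  # stand-in mask
result = composite_background(frame, person_mask, background)
```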
After the composite video stream is obtained, the composite video stream can be directly sent to a video network server in the video network so as to be forwarded to other meeting place terminals.
Layer superposition of the video stream may also be performed, as shown in fig. 7. Layer superposition of the video stream refers to adding pictures to the video stream images. One application scene is the prize-awarding stage at the end of a conference, where a certificate is rendered into the video picture. The specific process can be as follows: the conference control terminal sets an award mode and sends an award-mode instruction to the video networking terminal, which forwards it to the virtual background device. The virtual background device detects the human body posture with an AI technique; when a hand is detected as extended, the hand position is identified, the set certificate image is read, and the certificate is placed at the hand position with a copyTo operation and updated in real time, so that the certificate moves along with the hand. When the hand is withdrawn, the certificate disappears; when the prize taker extends a hand and the gesture is recognized, the certificate is likewise presented on the prize taker. The layer size used in the layer superposition follows a real-time switching strategy, and the set size is delivered to the terminal by the conference control terminal.
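A minimal sketch of the certificate following the hand from frame to frame; `hand_position` is a hypothetical helper standing in for the AI gesture detection, and the paste is done with a NumPy slice assignment (comparable in effect to OpenCV's copyTo for a full-rectangle copy).

```python
def hand_position(frame):
    # Hypothetical AI detector: returns (x, y) of the extended hand, or None.
    return None

def paste_certificate(frame, certificate):
    pos = hand_position(frame)
    if pos is None:
        return frame            # hand withdrawn: the certificate disappears
    x, y = pos
    h, w = certificate.shape[:2]
    fh, fw = frame.shape[:2]
    # Clamp so the certificate stays inside the frame.
    x = min(max(x, 0), fw - w)
    y = min(max(y, 0), fh - h)
    out = frame.copy()
    # Updated every frame, so the certificate moves along with the hand.
    out[y:y + h, x:x + w] = certificate
    return out
```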
The method can further perform a multi-terminal group photo: a plurality of participant terminals are connected to the virtual background device, and the virtual background device receives the video streams, merges them, and then processes the merged stream. The multi-terminal group photo supports remote background switching: the background is uploaded through the conference control terminal and issued by the conference control terminal to the terminal; the terminal receives the background and outputs it to the background replacement device through the HDMI cable, and the background replacement device receives the new background and resets its background data.
With the technical solution of this embodiment of the application, technical means such as background replacement, layer superposition, and multi-terminal group photo of the video stream can be combined freely according to different requirements, so as to meet various scene requirements of the video conference, reduce the time and resources spent in advance on tool preparation and meeting place arrangement, and effectively improve the user experience.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 8, there is shown an apparatus for processing video in a video conference, the apparatus comprising:
the response module is used for responding to a video calling request of a participant terminal in a video conference and obtaining video streams acquired by the participant terminal;
the background replacing module is used for replacing the background image in each original video frame in the video stream with the set background image at the current moment to obtain a video frame after background replacement; the background image is an image area which does not belong to a human body image in an original video frame;
and the sending module is used for splicing the video frames after the background replacement into a video stream to be sent and sending the video stream to a target terminal corresponding to the video calling request.
Optionally, as an embodiment, the background replacement module includes:
a removing submodule, configured to remove a background image that does not belong to a human body image from each original video frame in the video stream;
the processing submodule is used for processing the original video frame without the background image based on the reserved human body image in each original video frame and the set background image at the current moment to obtain a video frame after background replacement; wherein, the video frame after the background replacement comprises the set background image and the reserved human body image.
Optionally, as an embodiment, the removing sub-module includes:
an execution unit for executing the following steps for each original video frame: dividing the original video frame into a plurality of image blocks; respectively sending the plurality of image blocks to the corresponding sub-threads; the different image blocks correspond to different sub-threads, and each sub-thread is used for removing background images which do not belong to human body images in the received image blocks; obtaining a processed image block which is returned by the sub thread and is removed of the background image; and splicing the plurality of processed image blocks to obtain a video frame with the background image removed and the human body image reserved.
Optionally, as an embodiment, the execution unit includes:
an execution subunit, configured to enable each sub-thread to perform the following steps to remove a background image, which does not belong to a human body image, in the received image block: performing frame selection on the region where the human body part is located in the received image block to obtain a human body prediction frame where the human body part is located; carrying out human body part identification on the image in the human body prediction frame to obtain an image area belonging to a human body part; and removing image areas which do not belong to the human body part in the image blocks.
Optionally, as an embodiment, the execution subunit includes:
the boundary drawing subunit is used for drawing the boundary of the image in the human body prediction frame to obtain a plurality of image areas;
a small-area removing subunit, configured to remove a target image area, of which the area of the image area is smaller than a preset area, from the multiple image areas;
and the identification subunit is used for identifying the human body part of other image areas in the multiple image areas after the target image area is removed to obtain the image areas belonging to the human body part.
Optionally, as an embodiment, the background replacement module includes:
the main thread sub-module is used for respectively sending a preset number of original video frames to respective corresponding main threads each time according to the splicing sequence of the original video frames in the video stream; the main thread is respectively used for executing the step of replacing the background image in each original video frame in the video stream with the set background image at the current moment;
the return submodule is used for obtaining the video frame after the background returned by each main thread is replaced;
the sending module comprises: and the splicing sending submodule is used for splicing the video frames after the plurality of backgrounds are replaced into a video stream to be sent according to the splicing sequence and sending the video stream to the target terminal.
Optionally, as an embodiment, the number of the participating terminals is multiple, and the apparatus further includes:
the acquisition module is used for acquiring each original video frame in the video stream acquired by each participant terminal;
the splicing module is used for splicing the original video frames belonging to the same timestamp or the same receiving moment aiming at the multiple participating terminals to obtain spliced video frames;
the adjusting module is used for adjusting the size of the spliced video frame to the size of the original video frame;
the background replacement module includes: and the replacing submodule is used for replacing the background image in the spliced video frame after the size is adjusted with the set background image to obtain the video frame after the background is replaced.
Optionally, as an embodiment, after obtaining the background-replaced video frame, the apparatus further includes:
the gesture recognition module is used for carrying out human body gesture recognition on the video frame after the background replacement;
the reading module is used for reading a prestored material image corresponding to the human body posture when the recognition result of the human body posture recognition is detected to represent the human body posture of a preset type;
the position determining module is used for determining the image position corresponding to the preset type of human body posture in the video frame after the background replacement;
the layer adding module is used for adding layers at image positions in the video frame after the background replacement to obtain a video frame after the layers are overlapped;
the sending module comprises: and the superposition sending submodule is used for sending the video frame obtained by superposing the plurality of layers as a video stream to be sent to a target terminal corresponding to the video calling request.
With the technical solution of this embodiment of the application, the apparatus for processing video in a video conference can respond to the video calling request of a participant terminal in the video conference and obtain the video streams respectively collected by the participant terminals; replace the background image in each original video frame in the video stream with the set background image at the current moment to obtain background-replaced video frames, the background image being an image area of the original video frame that does not belong to the human body image; and splice the plurality of background-replaced video frames into a video stream to be sent and send it to the target terminal corresponding to the video calling request. In this way, the target terminal plays a video picture in which the background image has been replaced with the set background image. Therefore, manual tool preparation, arrangement of conference place backgrounds and the like are not needed before the video conference starts, which saves video conference cost and time and optimizes the conference experience of the participating users.
It should be noted that the device embodiments are similar to the method embodiments, so that the description is simple, and reference may be made to the method embodiments for relevant points.
The embodiment of the invention also provides electronic equipment which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
the processor is configured to implement the steps of the video processing method in the video conference according to any one of the embodiments when executing the program stored in the memory.
An embodiment of the present invention further provides a computer-readable storage medium, where a stored computer program causes a processor to execute the method for processing a video in a video conference according to any of the above embodiments.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The method, the apparatus, the electronic device and the storage medium for processing video in a video conference provided by the present invention are introduced in detail, and a specific example is applied in the present document to illustrate the principle and the implementation of the present invention, and the description of the above embodiment is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (11)

1. A method for processing video in a video conference, the method comprising:
responding to a video calling request of a participant terminal in a video conference, and acquiring a video stream acquired by the participant terminal;
replacing the background image in each original video frame in the video stream with the set background image at the current moment to obtain a video frame after background replacement; the background image is an image area which does not belong to a human body image in an original video frame;
and splicing a plurality of video frames after background replacement into a video stream to be sent, and sending the video stream to a target terminal corresponding to the video calling request.
2. The method of claim 1, wherein replacing the background image in each original video frame in the video stream with the set background image at the current time to obtain a background-replaced video frame comprises:
removing background images which do not belong to human body images from each original video frame in the video stream;
processing the original video frame without the background image based on the reserved human body image in each original video frame and the set background image at the current moment to obtain a video frame with the background replaced; wherein, the video frame after the background replacement comprises the set background image and the reserved human body image.
3. The method of claim 2, wherein removing background images not belonging to human body images from each original video frame in the video stream comprises:
for each original video frame, the following steps are performed:
dividing the original video frame into a plurality of image blocks;
respectively sending the plurality of image blocks to the corresponding sub-threads; the different image blocks correspond to different sub-threads, and each sub-thread is used for removing background images which do not belong to human body images in the received image blocks;
obtaining a processed image block which is returned by the sub thread and is removed of the background image;
and splicing the plurality of processed image blocks to obtain a video frame with the background image removed and the human body image reserved.
4. The method of claim 3, wherein the removing the background image not belonging to the human body image in the received image blocks comprises:
each sub-thread is configured to perform the following steps to remove background images in the received image blocks that do not belong to the human body image:
performing frame selection on the region where the human body part is located in the received image block to obtain a human body prediction frame where the human body part is located;
carrying out human body part identification on the image in the human body prediction frame to obtain an image area belonging to a human body part;
and removing image areas which do not belong to the human body part in the image blocks.
5. The method according to claim 4, wherein the identifying the human body part of the image in the human body prediction frame to obtain the image area belonging to the human body part comprises:
performing boundary delineation on the image in the human body prediction frame to obtain a plurality of image areas;
removing a target image area of which the area of the image area is smaller than a preset area in the multiple image areas;
and identifying the human body part of other image areas in the multiple image areas after the target image area is removed to obtain the image areas belonging to the human body part.
6. The method according to any one of claims 1 to 5, wherein replacing the background image in each original video frame in the video stream with the set background image at the current time to obtain a background-replaced video frame comprises:
according to the splicing sequence of all original video frames in the video stream, respectively sending a preset number of original video frames to respective corresponding main threads each time; the main thread is respectively used for executing the step of replacing the background image in each original video frame in the video stream with the set background image at the current moment;
obtaining a background replaced video frame returned by each main thread;
and sending a plurality of video frames after background replacement as a video stream to be sent to a target terminal corresponding to the video calling request, wherein the video stream comprises:
and according to the splicing sequence, splicing the video frames after the background replacement into a video stream to be sent, and sending the video stream to the target terminal.
7. The method according to any one of claims 1 to 5, wherein the number of the participating terminals is plural, the method further comprising:
aiming at each participant terminal, acquiring each original video frame in the video stream acquired by the participant terminal;
splicing the original video frames belonging to the same timestamp or the same receiving moment aiming at the multiple participating terminals to obtain spliced video frames;
adjusting the size of the spliced video frame to the size of the original video frame;
replacing the background image in each original video frame in the video stream with the set background image at the current moment to obtain the video frame after background replacement, including:
and replacing the background image in the spliced video frame after the size is adjusted with the set background image to obtain the video frame after the background replacement.
8. The method of any of claims 1-5, wherein after obtaining the background-replaced video frame, the method further comprises:
carrying out human body gesture recognition on the video frame after the background replacement;
when the recognition result of human body posture recognition is detected to represent a preset type of human body posture, reading a prestored material image corresponding to the human body posture;
determining an image position corresponding to the preset type of human body posture in the video frame after the background replacement;
adding a layer at the image position in the video frame after the background replacement to obtain a video frame after the layer is overlapped;
and sending a plurality of video frames after background replacement as a video stream to be sent to a target terminal corresponding to the video calling request, wherein the video stream comprises:
and taking the video frame obtained by superposing the multiple image layers as a video stream to be sent, and sending the video stream to a target terminal corresponding to the video calling request.
9. An apparatus for processing video in a video conference, the apparatus comprising:
the response module is used for responding to a video calling request of a participant terminal in a video conference and obtaining a video stream collected by the participant terminal;
the background replacing module is used for replacing the background image in each original video frame in the video stream with the set background image at the current moment to obtain a video frame after background replacement; the background image is an image area which does not belong to a human body image in an original video frame;
and the sending module is used for splicing the video frames after the background replacement into a video stream to be sent and sending the video stream to a target terminal corresponding to the video calling request.
10. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the method for processing video in a video conference according to any one of claims 1 to 8 when executing the program stored in the memory.
11. A computer-readable storage medium storing a computer program for causing a processor to execute the method for processing video in a video conference according to any one of claims 1 to 8.
CN202110809168.7A 2021-07-16 2021-07-16 Video processing method and device in video conference, electronic equipment and storage medium Pending CN113676692A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110809168.7A CN113676692A (en) 2021-07-16 2021-07-16 Video processing method and device in video conference, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110809168.7A CN113676692A (en) 2021-07-16 2021-07-16 Video processing method and device in video conference, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113676692A true CN113676692A (en) 2021-11-19

Family

ID=78539479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110809168.7A Pending CN113676692A (en) 2021-07-16 2021-07-16 Video processing method and device in video conference, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113676692A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115119054A (en) * 2022-06-27 2022-09-27 平安银行股份有限公司 Video virtual dressing and background processing method and device based on IOS (input/output system)
CN117422617A (en) * 2023-10-12 2024-01-19 华能澜沧江水电股份有限公司 Method and system for realizing image stitching of video conference system
CN117422617B (en) * 2023-10-12 2024-04-09 华能澜沧江水电股份有限公司 Method and system for realizing image stitching of video conference system

Similar Documents

Publication Publication Date Title
CN105654471B (en) Augmented reality AR system and method applied to internet video live streaming
CN108010037B (en) Image processing method, device and storage medium
CN106730815B (en) Somatosensory interaction method and system easy to realize
CN112199016B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
EP1703730A1 (en) Method and apparatus for composing images during video communications
CN106713942B (en) Video processing method and device
CN113329252B (en) Live broadcast-based face processing method, device, equipment and storage medium
CN113676692A (en) Video processing method and device in video conference, electronic equipment and storage medium
US11538138B2 (en) Methods and apparatus for applying motion blur to overcaptured content
CN112752116A (en) Display method, device, terminal and storage medium of live video picture
CN113064684B (en) Virtual reality equipment and VR scene screen capturing method
JP2020526125A (en) Method and system for fusing user-specific content into video production
CN112492231B (en) Remote interaction method, device, electronic equipment and computer readable storage medium
EP2575358B1 (en) Display apparatus and control method thereof
CN113706720A (en) Image display method and device
CN114449303A (en) Live broadcast picture generation method and device, storage medium and electronic device
CN113438550B (en) Video playing method, video conference method, live broadcasting method and related devices
CN113365130B (en) Live broadcast display method, live broadcast video acquisition method and related devices
CN112954221A (en) Method for real-time photo shooting
US9092874B2 (en) Method for determining the movements of an object from a stream of images
CN115175005A (en) Video processing method and device, electronic equipment and storage medium
CN113938617A (en) Multi-channel video display method and equipment, network camera and storage medium
CN108133517A (en) A kind of method and terminal that outdoor scene are shown under virtual scene
TWI774063B (en) Horizontal/vertical direction control device for three-dimensional broadcasting image
CN113284044A (en) Image generation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination