CN112188145A - Video conference method and system, and computer readable storage medium

Info

Publication number: CN112188145A
Application number: CN202010988005.5A
Authority: CN
Inventors: 李璐; 冯文澜
Original assignee: Suirui Technology Group Co Ltd
Current assignee: Suirui Technology Group Co Ltd
Priority date: 2020-09-18
Filing date: 2020-09-18
Publication date: 2021-01-05

Abstract

The invention discloses a video conference method and a system thereof, and a computer readable storage medium, wherein the video conference method comprises the following steps: monitoring the face state of a speaker in a video conference; if the face state is monitored to meet a first preset condition, replacing the real face of the speaker in the video picture with a prestored face model of the speaker; performing voice recognition on the speaker; and adding lip simulation actions in the face model of the speaker in the video picture according to the voice recognition condition. The video conference method and the video conference system can improve the video conference effect and improve the video conference efficiency.

Description

Video conference method and system, and computer readable storage medium

Technical Field

The present invention relates to the field of video communication technologies, and in particular, to a video conference method and system, and a computer-readable storage medium.

Background

With the development of internet technology, video conferences are more and more widely applied.

In the course of implementing the invention, the inventors found that a video conference may have a large number of participants, all of whom must set aside time to attend together. When a speaker appears in a poor state in the video picture, or frequently moves in and out of the picture, the experience of the other participants suffers and the conference efficiency is reduced.

The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.

Disclosure of Invention

The invention aims to provide a video conference method and a video conference system, which can improve the video conference effect and improve the video conference efficiency.

In order to achieve the above object, the present invention provides a video conference method, including: monitoring the face state of a speaker in a video conference; if the face state is monitored to meet a first preset condition, replacing the real face of the speaker in the video picture with a prestored face model of the speaker; performing voice recognition on the speaker; and adding lip simulation actions in the face model of the speaker in the video picture according to the voice recognition condition.

In an embodiment of the present invention, monitoring the face state of the speaker in the video conference includes: acquiring the face information of the speaker in the video stream in real time.

In an embodiment of the present invention, the first preset condition includes that the eye-closing time of the speaker exceeds a first preset threshold, the number of times that the face of the speaker enters or exits the video frame within a preset time exceeds a second preset threshold, or the duration of the state that the face of the speaker is not fully displayed in the video frame exceeds a third preset threshold.

In an embodiment of the present invention, the video conference method further includes: after replacing the real face of the speaker in the video picture with the pre-stored face model of the speaker, continuing to monitor the face state of the speaker; and if the face state of the speaker is monitored to meet a second preset condition, switching the face model of the speaker in the video picture back to the real face of the speaker.

In an embodiment of the present invention, the second preset condition is that the duration for which the face of the speaker is completely displayed in the video picture exceeds a fourth preset threshold and the eye-closing time of the speaker does not exceed the first preset threshold.

Based on the same inventive concept, the invention also provides a video conference method, which comprises the following steps: when a first face switching request is received, replacing the real face of the speaker in a video picture with a prestored face model of the speaker; performing voice recognition on the speaker; and adding lip simulation actions in the face model of the speaker in the video picture according to the voice recognition condition.

In an embodiment of the present invention, the video conference method further includes: when a second face switching request is received, switching the face model of the speaker in the video picture back to the real face of the speaker.

Based on the same inventive concept, the invention also provides a video conference system, which comprises: a face state monitoring module, a first face switching module, a voice recognition module and a lip simulation module. The face state monitoring module is used for monitoring the face state of a speaker in the video conference. The first face switching module is coupled with the face state monitoring module and is used for replacing the real face of the speaker in a video picture with a prestored face model of the speaker if the face state monitoring module monitors that the face state meets a first preset condition. The voice recognition module is coupled with the first face switching module and is used for performing voice recognition on the speaker after the first face switching module replaces the real face of the speaker in the video picture with the pre-stored face model of the speaker. The lip simulation module is coupled with the voice recognition module and is used for adding lip simulation actions to the face model of the speaker in the video picture according to the voice recognition condition.

In an embodiment of the present invention, the face state monitoring module monitors the face state of a speaker by acquiring face information of the speaker in a video stream in real time.

In an embodiment of the present invention, the video conference system further includes a second face switching module, which is coupled with the face state monitoring module and is used for switching the face model of the speaker in the video picture back to the real face of the speaker if the face state monitoring module monitors that the face state of the speaker meets a second preset condition. The second preset condition is that the duration for which the face of the speaker is completely displayed in the video picture exceeds a fourth preset threshold and the eye-closing time of the speaker does not exceed the first preset threshold.

Based on the same inventive concept, the invention also provides a video conference system, which comprises: a third face switching module, a voice recognition module and a lip simulation module. The third face switching module is used for replacing the real face of the speaker in the video picture with a prestored face model of the speaker when receiving a first face switching request. The voice recognition module is coupled with the third face switching module and is used for performing voice recognition on the speaker after the third face switching module replaces the real face of the speaker in the video picture with the pre-stored face model of the speaker. The lip simulation module is coupled with the voice recognition module and is used for adding lip simulation actions to the face model of the speaker in the video picture according to the voice recognition condition.

In an embodiment of the present invention, the video conference system further includes a fourth face switching module. The fourth face switching module is used for switching the face model of the speaker in the video picture back to the real face of the speaker when receiving a second face switching request.

Based on the same inventive concept, the present invention also provides a computer-readable storage medium for executing the video conference method according to any one of the above embodiments.

Compared with the prior art, according to the video conference method and system and the computer readable storage medium, the function of switching the real human face and the human face model is designed, when the speaker is in a poor state or needs to leave the camera temporarily due to a busy state, the real human face can be replaced by the human face model in an automatic or manual mode, and the lip action is restored through voice recognition, so that the sense of reality of the human face model in a video picture can be kept, the video conference effect and the video conference efficiency are improved, and the feeling of participants is improved.

Drawings

Fig. 1 is a block diagram of the steps of a video conferencing method according to an embodiment of the present invention.

Fig. 2 is a block diagram of a video conferencing system according to an embodiment of the present invention.

Detailed Description

The following detailed description of the present invention is provided in conjunction with the accompanying drawings, but it should be understood that the scope of the present invention is not limited to the specific embodiments.

Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.

In order to overcome the problems in the prior art, the following embodiments of the video conference method and system and the computer-readable storage medium design a function of switching between a real face and a face model, and when a speaker is in a poor state or needs to leave a camera temporarily due to a busy state, the real face can be replaced by the face model in an automatic or manual manner.

Fig. 1 shows a video conference method according to an embodiment of the present invention. With this embodiment, the state of the speaker can be identified automatically; when the speaker is in a poor state or is busy, the real face in the video picture can be switched to the face model automatically, which improves the experience of the participants and the conference efficiency.

The video conference method includes the following steps.

In step S1, the face state of the speaker in the video conference is monitored. Specifically, the face state can be obtained by acquiring the face information of the speaker in the video stream in real time.
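As an illustration of step S1, the following is a minimal sketch of how per-frame face information could be turned into the durations and counts that the preset conditions refer to. The source does not specify a detector or a data layout, so `FaceObservation`, `FaceStateTracker`, the field names and the 5 s window are all assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional


@dataclass
class FaceObservation:
    """One per-frame measurement (hypothetical output of a face detector)."""
    t: float                 # timestamp in seconds
    visible_fraction: float  # 0.0 = no face in picture, 1.0 = fully displayed
    eyes_closed: bool


@dataclass
class FaceStateTracker:
    """Accumulates the durations and counts that the preset conditions use."""
    window_s: float = 5.0  # assumed "preset time" for counting entries/exits
    transitions: Deque[float] = field(default_factory=deque)
    was_present: Optional[bool] = None
    eyes_closed_since: Optional[float] = None
    incomplete_since: Optional[float] = None
    complete_since: Optional[float] = None

    def update(self, obs: FaceObservation) -> Dict[str, float]:
        present = obs.visible_fraction > 0.0
        # Each change between "in picture" and "out of picture" is one event.
        if self.was_present is not None and present != self.was_present:
            self.transitions.append(obs.t)
        self.was_present = present
        # Forget events that fell out of the sliding window.
        while self.transitions and obs.t - self.transitions[0] > self.window_s:
            self.transitions.popleft()

        # The eye-closing timer runs only while a face with closed eyes is seen.
        if present and obs.eyes_closed:
            if self.eyes_closed_since is None:
                self.eyes_closed_since = obs.t
        else:
            self.eyes_closed_since = None

        # Exactly one of the two display timers runs at any moment.
        if obs.visible_fraction >= 1.0:
            if self.complete_since is None:
                self.complete_since = obs.t
            self.incomplete_since = None
        else:
            if self.incomplete_since is None:
                self.incomplete_since = obs.t
            self.complete_since = None

        def elapsed(since: Optional[float]) -> float:
            return 0.0 if since is None else obs.t - since

        return {
            "eyes_closed_s": elapsed(self.eyes_closed_since),
            "transitions": float(len(self.transitions)),
            "incomplete_s": elapsed(self.incomplete_since),
            "complete_s": elapsed(self.complete_since),
        }
```

Calling `update` once per frame yields a small dictionary of metrics; how those metrics are compared against the preset thresholds is a separate decision and is deliberately left out of the tracker.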

In step S2, the real face is replaced with a face model: if the face state is monitored to meet a first preset condition, the real face of the speaker in the video picture is replaced with a pre-stored face model of the speaker. The first preset condition includes that the eye-closing time of the speaker exceeds a first preset threshold, that the number of times the face of the speaker enters and exits the video picture within a preset time exceeds a second preset threshold, or that the duration of the state in which the face of the speaker is incompletely displayed in the video picture exceeds a third preset threshold. For example, if the eye-closing time of the speaker exceeds 5 s, the speaker enters and exits the video picture more than 3 times within 5 s, or only 80% of the face of the speaker is displayed or the head roll angle is 45%, for a duration of 10 s, the speaker can be determined to be in a poor or busy state, and the face switching can be performed at that moment.
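The first preset condition is a logical OR over three threshold comparisons. A minimal sketch follows, using the example values given in this embodiment (5 s, 3 times, 10 s) as defaults; the names `SwitchThresholds` and `meets_first_condition` are hypothetical, and "exceeds" is read here as a strict comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchThresholds:
    eye_closed_s: float = 5.0   # first preset threshold (example value)
    max_transitions: int = 3    # second preset threshold, counted within the preset time
    incomplete_s: float = 10.0  # third preset threshold (example value)


def meets_first_condition(
    eyes_closed_s: float,
    transitions_in_window: int,
    incomplete_s: float,
    th: SwitchThresholds = SwitchThresholds(),
) -> bool:
    """Return True if any one of the three sub-conditions is satisfied."""
    return (
        eyes_closed_s > th.eye_closed_s
        or transitions_in_window > th.max_transitions
        or incomplete_s > th.incomplete_s
    )
```

Keeping the thresholds in one immutable object makes it straightforward to tune them per deployment without touching the comparison logic.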

Speech recognition is performed on the speaker in step S3.

In step S4, a lip simulation action is added: lip simulation actions are added to the face model of the speaker in the video picture according to the voice recognition condition.
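The source does not describe how the voice recognition result drives the lips. One common approach, shown here purely as an assumption, is to map timed phonemes from the recognizer to a small set of mouth shapes and emit keyframes for the face model. The phoneme symbols, the `MOUTH_SHAPES` table and the `lip_keyframes` function are illustrative and are not taken from the patent.

```python
from typing import List, Tuple

# Hypothetical table from recognizer phoneme symbols to mouth shapes.
MOUTH_SHAPES = {
    "AA": "open",
    "IY": "wide",
    "UW": "round",
    "M": "closed",
    "B": "closed",
    "P": "closed",
}


def lip_keyframes(phonemes: List[Tuple[str, float, float]]) -> List[Tuple[float, str]]:
    """Convert (phoneme, start_s, end_s) triples into (time_s, mouth_shape) keyframes.

    Consecutive phonemes with the same mouth shape share one keyframe, and a
    final "rest" keyframe is added when the last phoneme ends.
    """
    frames: List[Tuple[float, str]] = []
    for symbol, start, _end in phonemes:
        shape = MOUTH_SHAPES.get(symbol, "neutral")  # unknown symbols fall back to neutral
        if not frames or frames[-1][1] != shape:
            frames.append((start, shape))
    if phonemes:
        frames.append((phonemes[-1][2], "rest"))
    return frames
```

A renderer for the face model would interpolate between these keyframes; that part depends on the model format and is outside this sketch.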

In order to switch back to the real face when the speaker is in a good state, in a preferred embodiment, the video conference method further includes: after replacing the real face of the speaker in the video picture with the pre-stored face model of the speaker, continuing to monitor the face state of the speaker; and if the face state of the speaker is monitored to meet a second preset condition, switching the face model of the speaker in the video picture back to the real face of the speaker. The second preset condition is that the duration for which the face of the speaker is completely displayed in the video picture exceeds a fourth preset threshold and the eye-closing time of the speaker does not exceed the first preset threshold. For example, the whole face of the speaker has been displayed for 10 seconds or more, and no eye-closed state exceeding 5 seconds has occurred.
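Switching to the model on the first preset condition and back on the second forms a two-state machine. A minimal sketch, assuming the example values of 10 s and 5 s as defaults, is given below; `Display` and `next_display` are hypothetical names, and "exceeds" is implemented as a strict comparison even though the example above says "10 seconds or more".

```python
from enum import Enum


class Display(Enum):
    REAL_FACE = "real_face"
    FACE_MODEL = "face_model"


def next_display(
    current: Display,
    *,
    first_condition_met: bool,
    complete_s: float,
    eyes_closed_s: float,
    fourth_threshold_s: float = 10.0,  # example value from this embodiment
    first_threshold_s: float = 5.0,    # example value from this embodiment
) -> Display:
    """Return what should be shown next, given the current display and face state."""
    if current is Display.REAL_FACE:
        # Only the first preset condition can trigger a switch to the model.
        return Display.FACE_MODEL if first_condition_met else Display.REAL_FACE
    # While the model is shown, only the second preset condition switches back.
    second_condition_met = (
        complete_s > fourth_threshold_s and eyes_closed_s <= first_threshold_s
    )
    return Display.REAL_FACE if second_condition_met else Display.FACE_MODEL
```

Because the two directions use different conditions, a face that has only just reappeared does not immediately flip the picture back, which avoids rapid toggling.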

Based on the same inventive concept, the invention also provides another video conference method. The video conference method of one embodiment comprises the following steps: when a first face switching request is received, replacing the real face of the speaker in a video picture with a prestored face model of the speaker; performing voice recognition on the speaker; and adding lip simulation actions to the face model of the speaker in the video picture according to the voice recognition condition. With this implementation, when a speaker is busy, the real face in the video picture can be manually switched to the face model, which improves the experience of the participants and the conference efficiency.

In order to be able to manually switch the face model back to the real face, in a preferred embodiment, the video conference method further comprises: when a second face switching request is received, switching the face model of the speaker in the video picture back to the real face of the speaker.
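The manual variant can be sketched as a small request handler. The source does not define how the first and second face switching requests are represented, so the `SwitchRequest` enum and the `apply_request` function below are assumptions made for illustration.

```python
from enum import Enum


class SwitchRequest(Enum):
    FIRST = 1   # request to replace the real face with the face model
    SECOND = 2  # request to switch the face model back to the real face


def apply_request(showing_model: bool, request: SwitchRequest) -> bool:
    """Return whether the face model should be shown after handling the request.

    Repeating the same request has no further effect (the handler is idempotent).
    """
    if request is SwitchRequest.FIRST:
        return True
    if request is SwitchRequest.SECOND:
        return False
    return showing_model  # defensive: unknown requests leave the display unchanged
```

Voice recognition and lip simulation would run only while the returned value is `True`, that is, while the face model is displayed.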

Based on the same inventive concept, the invention also provides a video conference system. As shown in fig. 2, the video conference system of an embodiment includes: a face state monitoring module 10, a first face switching module 11, a voice recognition module 12 and a lip simulation module 13.

The face state monitoring module 10 is used for monitoring the face state of a speaker in a video conference. Specifically, the face state monitoring module 10 may monitor the face state of the speaker by acquiring the face information of the speaker in the video stream in real time.

The first face switching module 11 is coupled to the face state monitoring module 10, and is configured to replace the real face of the speaker in the video picture with a pre-stored face model of the speaker if the face state monitoring module 10 monitors that the face state meets a first preset condition. The first preset condition comprises one or more of the following: the eye-closing time of the speaker exceeds a first preset threshold; the number of times the face of the speaker enters and exits the video picture within a preset time exceeds a second preset threshold; and the duration of the state in which the face of the speaker is incompletely displayed in the video picture exceeds a third preset threshold.

The voice recognition module 12 is coupled to the first face switching module 11, and configured to perform voice recognition on a speaker after the first face switching module 11 replaces a real face of the speaker in the video picture with a pre-stored face model of the speaker.

The lip simulation module 13 is coupled to the voice recognition module 12, and is configured to add a lip simulation action to the face model of the speaker in the video frame according to the voice recognition condition.

Preferably, the video conference system of this embodiment further includes a second face switching module 14, which is coupled to the face state monitoring module 10 and is configured to switch the face model of the speaker in the video picture back to the real face of the speaker if the face state monitoring module 10 monitors that the face state of the speaker meets a second preset condition. The second preset condition is that the duration for which the face of the speaker is completely displayed in the video picture exceeds a fourth preset threshold and the eye-closing time of the speaker does not exceed the first preset threshold.
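The coupling between modules 10 to 14 can be pictured as a small pipeline in which each module is a pluggable callable. This is a sketch under assumptions, not the patented implementation: the class name, the callable signatures and the string outputs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VideoConferenceSystem:
    monitor: Callable[[dict], dict]            # module 10: frame -> face state
    first_condition: Callable[[dict], bool]    # consulted by module 11
    second_condition: Callable[[dict], bool]   # consulted by module 14
    recognize: Callable[[bytes], List[str]]    # module 12: audio -> recognized units
    simulate_lips: Callable[[List[str]], str]  # module 13: units -> lip action
    showing_model: bool = False

    def process(self, frame: dict, audio: bytes) -> str:
        """Handle one frame and its audio; return a description of what is displayed."""
        state = self.monitor(frame)
        if not self.showing_model and self.first_condition(state):
            self.showing_model = True    # module 11: real face -> face model
        elif self.showing_model and self.second_condition(state):
            self.showing_model = False   # module 14: face model -> real face
        if not self.showing_model:
            return "real_face"
        # Modules 12 and 13 run only while the face model is displayed.
        return "face_model:" + self.simulate_lips(self.recognize(audio))
```

Passing the modules in as callables keeps the wiring separate from any particular face detector or speech recognizer, so each can be replaced independently.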

Based on the same inventive concept, the invention also provides another video conference system. The video conference system of an embodiment includes: a third face switching module, a voice recognition module and a lip simulation module.

The third face switching module is used for replacing the real face of the speaker in the video picture with a prestored face model of the speaker when receiving a first face switching request. The voice recognition module is coupled with the third face switching module and is used for performing voice recognition on the speaker after the third face switching module replaces the real face of the speaker in the video picture with the pre-stored face model of the speaker. The lip simulation module is coupled with the voice recognition module and is used for adding a lip simulation action to the face model of the speaker in the video picture according to the voice recognition condition.

Preferably, the video conference system of this embodiment further includes a fourth face switching module. The fourth face switching module is used for switching the face model of the speaker in the video picture back to the real face of the speaker when receiving a second face switching request.

Based on the same inventive concept, the present invention also provides a computer-readable storage medium for executing the video conference method of any one of the above embodiments.

In summary, according to the video conference method and system and the computer-readable storage medium of the embodiments, a function of switching between the real face and the face model is designed. When the speaker is in a poor state or needs to leave the camera temporarily because the speaker is busy, the real face can be replaced by the face model in an automatic or manual manner, and the lip motion is restored through voice recognition, so that the sense of reality of the face model in the video picture can be maintained, the video conference effect and the video conference efficiency are improved, and the experience of participants is improved. In addition, because only the speaker is managed, few resources are required and the solution is easy to deploy.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims

1. A video conferencing method, the video conferencing method comprising:

monitoring the face state of a speaker in a video conference;

if the face state is monitored to meet a first preset condition, replacing the real face of the speaker in the video picture with a prestored face model of the speaker;

performing voice recognition on the speaker; and

adding lip simulation actions in the face model of the speaker in the video picture according to the voice recognition condition.

2. The video conferencing method of claim 1, wherein the monitoring of the face state of the speaker in the video conference comprises:

acquiring the face information of the speaker in the video stream in real time.

3. The video conference method according to claim 1, wherein the first preset condition includes that the eye-closing time of the speaker exceeds a first preset threshold, the number of times the face of the speaker enters or exits the video screen within a preset time exceeds a second preset threshold, or the duration of the state in which the face of the speaker is not fully displayed in the video screen exceeds a third preset threshold.

4. The video conferencing method of claim 1, wherein the video conferencing method further comprises:

after replacing the real face of the speaker in the video picture with the pre-stored face model of the speaker, continuing to monitor the face state of the speaker; and

if the face state of the speaker is monitored to meet a second preset condition, switching the face model of the speaker in the video picture back to the real face of the speaker.

5. The video conference method according to claim 4, wherein the second preset condition is that a duration of a state in which the face of the speaker is fully displayed in the video screen exceeds a fourth preset threshold and an eye-closing time of the speaker does not exceed the first preset threshold.

6. A video conferencing method, the video conferencing method comprising:

when a first face switching request is received, replacing a real face of a speaker in a video picture with a prestored face model of the speaker;

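performing voice recognition on the speaker; and

adding lip simulation actions in the face model of the speaker in the video picture according to the voice recognition condition.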

7. The video conferencing method of claim 6, wherein the video conferencing method further comprises:

when a second face switching request is received, switching the face model of the speaker in the video picture back to the real face of the speaker.

8. A video conferencing system, the video conferencing system comprising:

a face state monitoring module, configured to monitor the face state of a speaker in the video conference;

a first face switching module, coupled with the face state monitoring module and configured to replace the real face of the speaker in a video picture with a prestored face model of the speaker if the face state monitoring module monitors that the face state meets a first preset condition;

a voice recognition module, coupled with the first face switching module and configured to perform voice recognition on the speaker after the first face switching module replaces the real face of the speaker in the video picture with the pre-stored face model of the speaker; and

a lip simulation module, coupled with the voice recognition module and configured to add a lip simulation action to the face model of the speaker in the video picture according to the voice recognition condition.

9. A video conferencing system, the video conferencing system comprising:

a third face switching module, configured to replace the real face of the speaker in the video picture with a prestored face model of the speaker when receiving a first face switching request;

a voice recognition module, coupled with the third face switching module and configured to perform voice recognition on the speaker after the third face switching module replaces the real face of the speaker in the video picture with the prestored face model of the speaker; and

a lip simulation module, coupled with the voice recognition module and configured to add a lip simulation action to the face model of the speaker in the video picture according to the voice recognition condition.

10. A computer-readable storage medium for performing the video conferencing method of any of claims 1-7.