CN111986703B - Video conference method and system, and computer readable storage medium

Info

Publication number: CN111986703B
Application number: CN202010845755.7A
Authority: CN
Inventors: 李璐; 冯文澜
Original assignee: Suirui Technology Group Co Ltd
Current assignee: Suirui Technology Group Co Ltd
Priority date: 2020-08-20
Filing date: 2020-08-20
Publication date: 2023-05-26
Anticipated expiration: 2040-08-20
Also published as: CN111986703A

Abstract

The invention discloses a video conference method and system and a computer-readable storage medium. The video conference method comprises: confirming the current speaker during a video conference; starting expression recognition for the current speaker in the video stream, and starting a microphone intelligent detection function when the expression of the current speaker is abnormal; and sending a reminder to the client of the current speaker when the speaking decibel level of the current speaker is detected to be higher than a first threshold and/or the speaking frequency is detected to be higher than a second threshold. The method and system can monitor the emotions of participants in a video conference and thereby assist in managing the conference.

Description

Video conference method and system, and computer readable storage medium

Technical Field

The present invention relates to the field of video communication technologies, and in particular, to a video conference method and system, and a computer readable storage medium.

Background

With the development of internet technology, video conferencing is increasingly widely used.

In the course of making the invention, the inventors found that participants in a video conference sometimes become emotionally agitated or lose control, so that the conference cannot proceed normally or is even interrupted. This seriously harms the efficiency and outcome of the conference, and no effective solution currently exists.

The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person of ordinary skill in the art.

Disclosure of Invention

The invention aims to provide a video conference method and system and a computer readable storage medium, which can monitor the emotion of a person in a video conference and realize auxiliary management of the conference.

To achieve the above object, the present invention provides a video conference method, comprising: confirming a current speaker in the video conference process; starting expression recognition for the current speaker in the video stream, and starting a microphone intelligent detection function when the expression of the current speaker is abnormal; and when the speaking decibel of the current speaker is detected to be higher than a first threshold value and/or the speaking frequency is detected to be higher than a second threshold value, a prompt is sent to a client of the current speaker.

In one embodiment of the present invention, confirming the current speaker in the video conference process includes: confirming the current speaker by a change in microphone pickup.

In an embodiment of the present invention, the video conference method further includes: when the speaking decibel level of the current speaker is detected to be higher than the first threshold, or the speaking frequency is detected to be higher than the second threshold, a plurality of times, sending an emotional over-agitation reminder to the client of the current speaker and providing an option to turn off the audio or the camera.

In an embodiment of the present invention, the video conference method further includes: presetting and storing a plurality of keywords before the video conference starts, and assigning a grade to each keyword in advance; performing voice recognition on the current speaker during the video conference; when the speech of the current speaker contains one of the keywords, popping up a warning of the corresponding grade on the client of the current speaker according to the grade of the keyword; and when the keyword in the speech of the current speaker is of the highest grade, muting the current speaker.

In an embodiment of the present invention, the video conference method further includes: when, within a period of time, the speaking decibel level of the current speaker is not detected to be higher than the first threshold, the speaking frequency is not detected to be higher than the second threshold, and no keyword is detected in the speech of the current speaker, turning off the microphone intelligent detection function.

In an embodiment of the present invention, the video conference method further includes: during the video conference, performing face recognition on each person in the video stream and recording the facial contour position of each person repeatedly at successive times; when the overlap between two successively recorded facial contour positions of a person is lower than a third threshold, recording one large movement; and when the number of large movements accumulated by a person within a period of time is detected to exceed a fourth threshold, sending a reminder to pay attention to the client of that person.

In an embodiment of the present invention, the video conference method further includes: during the video conference, performing face recognition on each person in the video stream, and when the face information of a person cannot be detected within a period of time, sending a reminder to pay attention to the client of that person.

In an embodiment of the present invention, the video conference method further includes: when the speaking decibel level of the current speaker is detected to be higher than the first threshold and/or the speaking frequency is detected to be higher than the second threshold, sending, in the form of voice or text, a reminder to pay attention to the key speech content to the clients of the participants in the video conference.

Based on the same inventive concept, the present invention also provides a video conference system comprising a current speaker confirmation module, an expression recognition module, a microphone intelligent detection module and a first reminding module. The current speaker confirmation module is used for confirming the current speaker during the video conference. The expression recognition module is coupled with the current speaker confirmation module and is used for starting expression recognition for the current speaker in the video stream. The microphone intelligent detection module is coupled with the expression recognition module and is used for starting microphone intelligent detection when the expression recognition module judges that the expression of the current speaker is abnormal. The first reminding module is coupled with the microphone intelligent detection module and the current speaker confirmation module and is used for sending a reminder to the client of the current speaker when the speaking decibel level of the current speaker is detected to be higher than a first threshold and/or the speaking frequency is detected to be higher than a second threshold.

In one embodiment, the video conference system further comprises a keyword module and a voice recognition module. The keyword module is used for presetting and storing a plurality of keywords before the video conference starts and for assigning a grade to each keyword in advance. The voice recognition module is used for performing voice recognition on the current speaker during the video conference. If the voice recognition module recognizes that the speech of the current speaker contains one of the keywords, a warning of the corresponding grade is popped up on the client of the current speaker according to the grade of the keyword; if the recognized keyword is of the highest grade, the current speaker is muted.

In one embodiment, the video conference system further comprises a closing module, which is used for turning off the microphone intelligent detection function when, within a period of time, the speaking decibel level of the current speaker is not detected to be higher than the first threshold, the speaking frequency is not detected to be higher than the second threshold, and no keyword is detected in the speech of the current speaker.

In one embodiment, the video conference system further comprises a face recognition module, a large-movement recording module, a second reminding module and a third reminding module. The face recognition module is used for performing face recognition on each person in the video stream during the video conference and for recording the facial contour position of each person repeatedly at successive times. The large-movement recording module is coupled with the face recognition module and is used for recording one large movement when the overlap between two successively recorded facial contour positions of a person is lower than a third threshold. The second reminding module is coupled with the large-movement recording module and the face recognition module and is used for sending a reminder to pay attention, in the form of voice or text, to the client of a person when the number of large movements accumulated by that person within a period of time is detected to exceed a fourth threshold. The third reminding module is coupled with the face recognition module and is used for sending a reminder to pay attention to the client of a person when the face recognition module cannot detect the face information of that person for a continuous period of time.

In one embodiment, the video conference system further comprises a fourth reminding module, which is coupled with the microphone intelligent detection module and is used for sending a reminder to pay attention to the key speech content to the clients of the participants in the video conference when the microphone intelligent detection module detects that the speaking decibel level of the current speaker is higher than the first threshold and/or the speaking frequency is higher than the second threshold.

Based on the same inventive concept, the present invention also provides a computer-readable storage medium for performing the video conference method as claimed in any one of claims 1 to 8.

Compared with the prior art, the video conference method and system first perform expression recognition on the speaker. Once the expression is preliminarily judged to be abnormal, intelligent microphone detection is triggered and the speaker's voice is then assessed, so that whether the speaker has lost emotional control is judged from both the expression and the voice. This allows the speaker's emotion to be judged accurately. When the speaker's expression and voice are both abnormal, a reminder is sent to guide the speaker to calm down, achieving the aim of assisting conference management.

Drawings

Fig. 1 shows a video conference method according to an embodiment of the present invention;

Fig. 2 shows a video conference system according to an embodiment of the present invention.

Detailed Description

Embodiments of the invention are described in detail below with reference to the accompanying drawings; it should be understood that the scope of the invention is not limited to these specific embodiments.

Throughout the specification and claims, unless explicitly stated otherwise, the term "comprise" or variations thereof such as "comprises" or "comprising", etc. will be understood to include the stated element or component without excluding other elements or components.

Fig. 1 shows a video conference method according to an embodiment of the present invention. The method includes steps S1 to S3.

The current speaker is confirmed in step S1. The current speaker during the videoconference can be confirmed by a change in microphone pickup.
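As an illustration of step S1, the following minimal Python sketch picks the participant whose microphone pickup level rose the most between two samples. The description does not say how the "change in microphone pickup" is measured, so the function name `confirm_current_speaker`, the per-participant decibel readings and the `min_rise_db` cut-off are all assumptions made for this example.

```python
from typing import Dict, Optional


def confirm_current_speaker(
    previous_db: Dict[str, float],
    current_db: Dict[str, float],
    min_rise_db: float = 10.0,  # assumed value; the description gives no figure
) -> Optional[str]:
    """Return the participant whose pickup level rose the most, if any.

    A rise smaller than min_rise_db is treated as background variation.
    """
    best_id: Optional[str] = None
    best_rise = min_rise_db
    for participant, level in current_db.items():
        # A participant with no earlier sample is treated as unchanged.
        rise = level - previous_db.get(participant, level)
        if rise >= best_rise:
            best_id, best_rise = participant, rise
    return best_id
```

With readings of 30 dB and 31 dB followed by 32 dB and 58 dB for two hypothetical participants, the second one would be confirmed as the current speaker; if nothing changes, no speaker is returned.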

In step S2, the microphone intelligent detection function is started once the expression is judged to be abnormal. Specifically, expression recognition is started for the current speaker in the video stream, and if the expression of the current speaker is abnormal (for example angry, violent or agitated), the microphone intelligent detection function is started.

In step S3, an emotion reminder is issued. When the speaking decibel level of the current speaker is detected to be higher than the first threshold and/or the speaking frequency is detected to be higher than the second threshold, a reminder is sent to the client of the current speaker in the form of voice or text. The system can record the decibel level and frequency at which the speaker usually speaks and use them as the basis for determining the first threshold and the second threshold. Specifically, in this embodiment, when the speaker's current speaking decibel level is detected to exceed the speaker's usual level by more than 10% a plurality of times (for example, exceeding 70 dB twice), or the speaking frequency is detected to exceed the speaker's usual frequency by more than 10% a plurality of times, the system sends an emotional over-agitation reminder to the client of the current speaker, asking whether the speaker would like to pause before continuing to take part in the conference. The system can also provide an option to turn off the audio and/or the camera, so that a speaker who has lost composure can turn them off on their own initiative.
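The threshold logic of step S3 can be sketched as follows. This is only one possible reading: it takes the 10% figure literally as a multiplicative margin on the speaker's usual readings, and it treats "a plurality of times" as a configurable count. The names `SpeakerBaseline` and `VoiceMonitor`, the returned strings and the unit of the speaking frequency are assumptions for illustration and are not defined in the description.

```python
from dataclasses import dataclass


@dataclass
class SpeakerBaseline:
    usual_db: float    # speaker's usual speaking level
    usual_freq: float  # speaker's usual speaking frequency (unit left open)


class VoiceMonitor:
    """Derive both thresholds from a baseline and count exceedances."""

    def __init__(self, baseline: SpeakerBaseline,
                 margin: float = 0.10, repeat_limit: int = 2) -> None:
        # First and second thresholds: usual value plus the margin.
        self.first_threshold = baseline.usual_db * (1 + margin)
        self.second_threshold = baseline.usual_freq * (1 + margin)
        self.repeat_limit = repeat_limit  # assumed meaning of "a plurality of times"
        self.exceed_count = 0

    def observe(self, db: float, freq: float) -> str:
        """Classify one sample and return the action to take."""
        if db > self.first_threshold or freq > self.second_threshold:
            self.exceed_count += 1
            if self.exceed_count >= self.repeat_limit:
                # Repeated exceedance: stronger reminder plus mute/camera option.
                return "agitation_reminder_with_off_option"
            return "reminder"
        return "none"
```

For a hypothetical baseline of 60 dB, the first threshold is about 66 dB, so a first sample at 70 dB yields an ordinary reminder and a second exceedance yields the stronger reminder with the option to turn off audio or camera.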

In this method, expression recognition is first performed on the speaker; once the expression is preliminarily judged to be abnormal, intelligent microphone detection is triggered and the speaker's voice is assessed. Whether the speaker has lost emotional control is thus judged from both the expression and the voice, so the speaker's emotion can be judged accurately. When both are abnormal, a reminder is sent to guide the speaker to calm down, achieving the aim of assisting conference management. In addition, the method monitors only the speaker, so it requires few resources.
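The two-stage judgement described here (expression first, voice second) can be sketched as a small gating function. The expression labels, the function name `process_sample` and the numeric thresholds in the usage note are hypothetical; the description does not specify how the expression recognizer reports its result.

```python
from typing import Tuple

# Assumed labels that an expression recognizer might emit for "abnormal".
ABNORMAL_EXPRESSIONS = {"angry", "violent", "agitated"}


def process_sample(expression: str, db: float, freq: float,
                   first_threshold: float, second_threshold: float,
                   mic_detection_on: bool) -> Tuple[bool, bool]:
    """Return (mic_detection_on, send_reminder) for one sample.

    Voice levels are only evaluated once an abnormal expression has
    switched microphone detection on.
    """
    if not mic_detection_on and expression in ABNORMAL_EXPRESSIONS:
        mic_detection_on = True
    send_reminder = mic_detection_on and (
        db > first_threshold or freq > second_threshold
    )
    return mic_detection_on, send_reminder
```

In this sketch a loud sample with a neutral expression does not trigger a reminder while detection is still off, which reflects the point that both conditions are required.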

Preferably, in order to improve the conference quality, the video conference method of an embodiment further includes: presetting and storing a plurality of keywords before the video conference starts, and assigning a grade to each keyword in advance; performing voice recognition on the current speaker during the video conference; if the speech of the current speaker contains one of the keywords, popping up a warning of the corresponding grade on the client of the current speaker according to the grade of the keyword; and if the keyword is of the highest grade, muting the current speaker. The keywords can be, for example, names of respected persons, special events, names that must not be mentioned, or uncivil expressions, and the keyword grades can be set as first grade, second grade, third grade and so on. For example, an uncivil expression is first grade: a warning pops up immediately and the speaker is muted. Content involving personal privacy is a second-grade warning: the microphone intelligent detection function is triggered and handled according to its result. Addressing someone directly by name is a third-grade warning: a prompt asks the speaker to be polite.
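The graded keyword handling can be sketched as a lookup from keyword to grade and from grade to action. The sketch assumes that voice recognition has already produced a text transcript and uses plain substring matching, which is a simplification; the names `Grade` and `check_speech`, the action strings and the sample keywords are hypothetical.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Grade(IntEnum):
    FIRST = 1   # highest grade in the example: warn and mute
    SECOND = 2  # warn and trigger microphone intelligent detection
    THIRD = 3   # warn and ask the speaker to be polite


ACTIONS = {
    Grade.FIRST: "warn_and_mute",
    Grade.SECOND: "warn_and_start_mic_detection",
    Grade.THIRD: "warn_politeness",
}


def check_speech(transcript: str,
                 keywords: Dict[str, Grade]) -> List[Tuple[str, Grade, str]]:
    """Return (keyword, grade, action) for each preset keyword found,
    most severe grade first."""
    lowered = transcript.lower()
    hits = [
        (word, grade, ACTIONS[grade])
        for word, grade in keywords.items()
        if word.lower() in lowered
    ]
    return sorted(hits, key=lambda hit: hit[1])
```

A transcript containing both a first-grade and a second-grade keyword returns the first-grade hit first, so a caller can act on the most severe match.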

Preferably, in order to save network resources, the video conference method further comprises: when, within a period of time, the speaking decibel level of the current speaker is not detected to be higher than the first threshold, the speaking frequency is not detected to be higher than the second threshold, and no keyword is detected in the speech of the current speaker, turning off the microphone intelligent detection function.
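A minimal sketch of this shut-off condition follows. The length of the "period of time" is not given in the description, so `quiet_period_s` is an assumed parameter, and the class name `DetectionSwitch` and its methods are hypothetical.

```python
class DetectionSwitch:
    """Turn microphone detection off after a quiet period with no events."""

    def __init__(self, quiet_period_s: float = 60.0) -> None:
        self.quiet_period_s = quiet_period_s  # assumed; not specified in the text
        self.active = False
        self.last_event_at = 0.0

    def start(self, now: float) -> None:
        """Called when an abnormal expression starts microphone detection."""
        self.active = True
        self.last_event_at = now

    def update(self, now: float, over_first: bool,
               over_second: bool, keyword_hit: bool) -> bool:
        """Feed one observation; return whether detection is still on."""
        if not self.active:
            return False
        if over_first or over_second or keyword_hit:
            # Any of the three events restarts the quiet period.
            self.last_event_at = now
        elif now - self.last_event_at >= self.quiet_period_s:
            self.active = False
        return self.active
```

Detection stays on as long as any one of the three conditions keeps occurring, and is switched off only when all three have been absent for the whole quiet period.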

Preferably, in order to further improve the effect of the conference, the video conference method further comprises: during the video conference, performing face recognition on each person in the video stream and recording the facial contour position of each person repeatedly at successive times; when the overlap between two successively recorded facial contour positions of a person is lower than a third threshold, recording one large movement; and when the number of large movements accumulated by a person within a period of time is detected to exceed a fourth threshold, sending a reminder to pay attention to the client of that person in the form of voice or text.
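The large-movement count can be sketched with a sliding time window. The description does not define how the overlap of two facial contour positions is computed, so this example assumes axis-aligned bounding boxes and measures the intersection as a fraction of the earlier box. The default values follow the examples given later for the system embodiment (65% overlap, more than 6 movements in 1 minute); the names `overlap_ratio` and `MovementTracker` are hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection area of a and b divided by the area of a."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    if width <= 0 or height <= 0 or area_a <= 0:
        return 0.0
    return (width * height) / area_a


class MovementTracker:
    def __init__(self, third_threshold: float = 0.65,
                 fourth_threshold: int = 6, window_s: float = 60.0) -> None:
        self.third_threshold = third_threshold
        self.fourth_threshold = fourth_threshold
        self.window_s = window_s
        self.previous: Optional[Box] = None
        self.moves: Deque[float] = deque()  # timestamps of large movements

    def record(self, now: float, box: Box) -> bool:
        """Record one contour sample; return True if a reminder is due."""
        if (self.previous is not None
                and overlap_ratio(self.previous, box) < self.third_threshold):
            self.moves.append(now)
        self.previous = box
        # Drop movements that fell out of the time window.
        while self.moves and now - self.moves[0] > self.window_s:
            self.moves.popleft()
        return len(self.moves) > self.fourth_threshold
```

Sampling once per second, a face that jumps between two positions with 20% overlap would accumulate a seventh large movement on the eighth sample, at which point the reminder becomes due.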

Preferably, in order to further improve the effect of the conference, the video conference method further comprises: during the video conference, performing face recognition on each person in the video stream, and when the face information of a person cannot be detected for a continuous period of time, sending a reminder to pay attention to the client of that person in the form of voice or text.
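A sketch of the face-absence check is shown below. The 10-second default matches the example given later for the system embodiment; reminding only once per absence is a design choice made for this example rather than something the description states, and the class name `AbsenceWatch` is hypothetical.

```python
from typing import Optional


class AbsenceWatch:
    """Signal a reminder when no face has been seen for limit_s seconds."""

    def __init__(self, limit_s: float = 10.0) -> None:
        self.limit_s = limit_s
        self.last_seen: Optional[float] = None
        self.reminded = False

    def update(self, now: float, face_detected: bool) -> bool:
        """Feed one detection result; return True when a reminder is due."""
        if self.last_seen is None or face_detected:
            # The first sample, or any detection, restarts the clock.
            self.last_seen = now
            self.reminded = False
            return False
        if not self.reminded and now - self.last_seen >= self.limit_s:
            self.reminded = True  # assumed: one reminder per absence
            return True
        return False
```

Once the face reappears the clock resets, so a later absence of the same length produces a new reminder.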

Preferably, the video conference method further comprises: when the speaking decibel level of the current speaker is detected to be higher than the first threshold and/or the speaking frequency is detected to be higher than the second threshold, sending, in the form of voice or text, a reminder to pay attention to the key speech content to the clients of the participants in the video conference. This embodiment can be applied to remote teaching: during a lecture, if the lecturer's voice becomes louder, the system sends a reminder to the student clients that this is a key point and that they should listen carefully. The reminder can take the form of conspicuous on-screen text, two or three screen flashes, or a short, distinctive beep, so that a lengthy system reminder does not drown out the lecturer's voice.

Based on the same inventive concept, this embodiment also provides a video conference system. As shown in Fig. 2, the system includes a current speaker confirmation module 10, an expression recognition module 11, a microphone intelligent detection module 12 and a first reminding module 13.

The current speaker confirmation module 10 is configured to confirm the current speaker during a video conference. Specifically, in the present embodiment the current speaker is confirmed by a change in microphone pickup.

The expression recognition module 11 is coupled to the current speaker confirmation module 10 and is used for starting expression recognition for the current speaker in the video stream.

The microphone intelligent detection module 12 is coupled to the expression recognition module 11 and is configured to start microphone intelligent detection when the expression recognition module 11 determines that the expression of the current speaker is abnormal.

The first reminding module 13 is coupled to the microphone intelligent detection module 12 and the current speaker confirmation module 10, and is configured to send a reminder to the client of the current speaker in the form of voice or text when the speaking decibel level of the current speaker is detected to be higher than the first threshold and/or the speaking frequency is detected to be higher than the second threshold. The first reminding module 13 is also used for sending an emotional over-agitation reminder to the client of the current speaker, and for providing an option to turn off the audio or the camera, when the speaking decibel level of the current speaker is detected to be higher than the first threshold a plurality of times or the speaking frequency is detected to be higher than the second threshold a plurality of times.

Preferably, in order to improve the conference quality, in an embodiment the video conference system further comprises a keyword module and a voice recognition module.

The keyword module is used for presetting and storing a plurality of keywords before the video conference starts and for assigning a grade to each keyword in advance.

The voice recognition module is used for performing voice recognition on the current speaker during the video conference. If the voice recognition module recognizes that the speech of the current speaker contains one of the keywords, a warning of the corresponding grade is popped up on the client of the current speaker according to the grade of the keyword; if the recognized keyword is of the highest grade, the current speaker is muted.

Preferably, in order to save network resources, in an embodiment the video conference system further comprises a closing module, which is used for turning off the microphone intelligent detection function when, within a period of time, the speaking decibel level of the current speaker is not detected to be higher than the first threshold, the speaking frequency is not detected to be higher than the second threshold, and no keyword is detected in the speech of the current speaker.

Preferably, in order to further improve the effect of the conference, in an embodiment the video conference system further comprises a face recognition module, a large-movement recording module, a second reminding module and a third reminding module.

The face recognition module is used for recognizing the face of each person in the video stream in the video conference process, and continuously recording the face contour position of each person for a plurality of times (for example, recording every 1 second).

The large-movement recording module is coupled with the face recognition module and is used for recording one large movement when the overlap between two successively recorded facial contour positions of a person is lower than a third threshold (for example, 65%).

The second reminding module is coupled with the large-movement recording module and the face recognition module and is used for sending a reminder to pay attention, in the form of voice or text, to the client of a person when the number of large movements accumulated by that person within a period of time is detected to exceed a fourth threshold (for example, more than 6 times in 1 minute).

The third reminding module is coupled with the face recognition module and is used for sending a reminder to pay attention, in the form of voice or text, to the client of a person when the face recognition module cannot detect the face information of that person for a continuous period of time (for example, 10 seconds).

In one embodiment, the video conference system further comprises a fourth reminding module, which is coupled with the microphone intelligent detection module 12 and is used for sending, in the form of voice or text, a reminder to pay attention to the key speech content to the clients of the participants in the video conference when the microphone intelligent detection module 12 detects that the speaking decibel level of the current speaker is higher than the first threshold and/or the speaking frequency is higher than the second threshold. This embodiment can be applied to remote teaching: during a lecture, if the lecturer's voice becomes louder, the system sends a reminder to the student clients that this is a key point and that they should listen carefully. The reminder can take the form of conspicuous on-screen text, two or three screen flashes, or a short, distinctive beep, so that a lengthy system reminder does not drown out the lecturer's voice.

Based on the same inventive concept, the present embodiment also provides a computer-readable storage medium for performing the video conference method of any one of the above embodiments.

In summary, according to the video conference method and system of the present embodiment, firstly, the expression recognition is performed on the speaker, after the occurrence of the abnormality of the expression is primarily determined, the microphone is triggered to perform intelligent detection, then the sound of the speaker is determined, and whether the emotion of the speaker is out of control is determined according to two conditions of the expression and the sound, so that the emotion of the speaker can be accurately determined.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The foregoing descriptions of specific exemplary embodiments of the present invention are presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain the specific principles of the invention and its practical application to thereby enable one skilled in the art to make and utilize the invention in various exemplary embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims

1. A method of video conferencing, comprising:

confirming a current speaker in the video conference process;

starting expression recognition for the current speaker in the video stream, and starting a microphone intelligent detection function when the expression of the current speaker is abnormal;

when the speaking decibel of the current speaker is detected to be higher than a first threshold value and/or the speaking frequency is detected to be higher than a second threshold value, a prompt is sent to a client of the current speaker;

the video conference method further comprises the following steps:

presetting and storing a plurality of keywords before starting a video conference, and pre-distributing the grades of the keywords;

performing voice recognition on the current speaker in the video conference process;

when the speech of the current speaker contains one of the keywords, popping up a warning of the corresponding grade on the client of the current speaker according to the grade of the keyword;

and when the keyword in the speech of the current speaker is of the highest grade, muting the current speaker.

2. The video conferencing method of claim 1, wherein confirming the current speaker in the video conference process comprises:

confirming the current speaker in the video conference process by a change in microphone pickup.

3. The video conferencing method of claim 1, wherein the video conferencing method further comprises:

when the speaking decibel level of the current speaker is detected to be higher than the first threshold, or the speaking frequency is detected to be higher than the second threshold, a plurality of times, sending an emotional over-agitation reminder to the client of the current speaker and providing an option to turn off the audio or the camera.

4. The video conferencing method of claim 1, wherein the video conferencing method further comprises:

when, within a period of time, the speaking decibel level of the current speaker is not detected to be higher than the first threshold, the speaking frequency is not detected to be higher than the second threshold, and no keyword is detected in the speech of the current speaker, turning off the microphone intelligent detection function.

5. The video conferencing method of claim 1, wherein the video conferencing method further comprises:

during the video conference, performing face recognition on each person in the video stream and recording the facial contour position of each person repeatedly at successive times;

when the overlap between two successively recorded facial contour positions of a person is lower than a third threshold, recording one large movement;

and when the number of large movements accumulated by a person within a period of time is detected to exceed a fourth threshold, sending a reminder to pay attention to the client of that person.

6. The video conferencing method of claim 1, wherein the video conferencing method further comprises:

during the video conference, performing face recognition on each person in the video stream, and when the face information of a person cannot be detected within a period of time, sending a reminder to pay attention to the client of that person.

7. The video conferencing method of claim 1, wherein the video conferencing method further comprises:

and when the speaking decibel of the current speaker is detected to be higher than the first threshold value and/or the speaking frequency is detected to be higher than the second threshold value, sending a reminder of asking to pay attention to the key speaking content to the clients of the participants in the video conference process.

8. A video conferencing system, comprising:

the current speaker confirming module is used for confirming the current speaker in the video conference process;

the expression recognition module is coupled with the current speaker confirmation module and is used for starting expression recognition for the current speaker in the video stream;

the microphone intelligent detection module is coupled with the expression recognition module and is used for starting microphone intelligent detection when the expression recognition module judges that the expression of the current speaker is abnormal;

the reminding module is coupled with the microphone intelligent detection module and the current speaker confirmation module and is used for sending a reminder to a client of the current speaker when detecting that the speaking decibel of the current speaker is higher than a first threshold value and/or the speaking frequency is higher than a second threshold value;

the keyword module is used for presetting and storing a plurality of keywords before the video conference starts and for assigning a grade to each keyword in advance;

the voice recognition module is used for performing voice recognition on the current speaker during the video conference; if the recognition result shows that the speech of the current speaker contains one of the keywords, a warning of the corresponding grade is popped up on the client of the current speaker according to the grade of the keyword; and if the recognized keyword is of the highest grade, the current speaker is muted.

9. A computer readable storage medium for performing the videoconferencing method of any of claims 1-7.