CN111986703A - Video conference method and system, and computer readable storage medium

Info

Publication number: CN111986703A
Application number: CN202010845755.7A
Authority: CN
Inventors: 李璐; 冯文澜
Original assignee: Suirui Technology Group Co Ltd
Current assignee: Suirui Technology Group Co Ltd
Priority date: 2020-08-20
Filing date: 2020-08-20
Publication date: 2020-11-24
Anticipated expiration: 2040-08-20
Also published as: CN111986703B

Abstract

The invention discloses a video conference method and system and a computer-readable storage medium. The video conference method comprises the following steps: confirming a current speaker during the video conference; starting expression recognition on the current speaker in the video stream, and starting an intelligent microphone detection function when the expression of the current speaker is abnormal; and sending a prompt to the client of the current speaker when it is detected that the speaking decibel level of the current speaker is higher than a first threshold and/or the speaking frequency is higher than a second threshold. The video conference method and system can monitor the emotions of participants in the video conference and provide auxiliary management of the conference.

Description

Video conference method and system, and computer readable storage medium

Technical Field

The present invention relates to the field of video communication technologies, and in particular, to a video conference method and system, and a computer-readable storage medium.

Background

With the development of internet technology, video conferences are being used more and more widely.

In the course of implementing the invention, the inventors found that participants in a video conference sometimes become overexcited or lose control of their emotions, so that the conference cannot proceed normally or is even interrupted. This seriously affects the efficiency and effect of the conference, and no effective solution is currently available.

The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.

Disclosure of Invention

The invention aims to provide a video conference method and system and a computer readable storage medium, which can monitor the emotion of personnel in a video conference and realize auxiliary management of the conference.

To achieve the above object, the present invention provides a video conference method, which includes: confirming a current speaker during the video conference; starting expression recognition on the current speaker in the video stream, and starting an intelligent microphone detection function when the expression of the current speaker is abnormal; and sending a prompt to the client of the current speaker when it is detected that the speaking decibel level of the current speaker is higher than a first threshold and/or the speaking frequency is higher than a second threshold.

In an embodiment of the present invention, confirming the current speaker during the video conference includes: confirming the current speaker through changes in microphone pickup.

In an embodiment of the present invention, the video conference method further includes: when it is detected multiple times that the speaking decibel level of the current speaker is higher than the first threshold, or multiple times that the speaking frequency is higher than the second threshold, sending an emotional overstimulation prompt to the client of the current speaker and providing an option to turn off the voice or the camera.

In an embodiment of the present invention, the video conference method further includes: presetting and storing a plurality of keywords before the video conference starts, and assigning a grade to each keyword in advance; performing voice recognition on the current speaker during the video conference; when the speech content of the current speaker includes one of the keywords, popping up a warning of the corresponding grade on the client of the current speaker according to the grade of the keyword; and when the keyword in the speech content of the current speaker is of the highest grade, forbidding the current speaker from speaking.

In an embodiment of the present invention, the video conference method further includes: closing the intelligent microphone detection function when, within a period of time, the speaking decibel level of the current speaker is not detected to be higher than the first threshold, the speaking frequency is not detected to be higher than the second threshold, and none of the keywords is detected in the speech content of the current speaker.

In an embodiment of the present invention, the video conference method further includes: during the video conference, performing face recognition on each person in the video stream and repeatedly recording the face contour position of each person; when the overlap between two consecutively recorded face contour positions of a person is lower than a third threshold, recording one large-amplitude movement; and when the number of large-amplitude movements accumulated by a person within a period of time is detected to exceed a fourth threshold, sending a prompt to that person's client asking the person to pay attention.

In an embodiment of the present invention, the video conference method further includes: during the video conference, performing face recognition on each person in the video stream, and when the face information of a person cannot be detected within a period of time, sending a prompt to that person's client asking the person to pay attention.

In an embodiment of the present invention, the video conference method further includes: when it is detected that the speaking decibel level of the current speaker is higher than the first threshold and/or the speaking frequency is higher than the second threshold, sending, in the form of voice or text, a prompt to the clients of the participants in the video conference asking them to pay attention to the key speech content.

Based on the same inventive concept, the present invention also provides a video conference system, which includes a current speaker confirmation module, an expression recognition module, a microphone intelligent detection module and a first reminding module. The current speaker confirmation module is used for confirming a current speaker during the video conference. The expression recognition module is coupled with the current speaker confirmation module and used for starting expression recognition on the current speaker in the video stream. The microphone intelligent detection module is coupled with the expression recognition module and used for starting microphone intelligent detection when the expression recognition module judges that the expression of the current speaker is abnormal. The first reminding module is coupled with the microphone intelligent detection module and the current speaker confirmation module and used for sending a reminder to the client of the current speaker when it is detected that the speaking decibel level of the current speaker is higher than a first threshold and/or the speaking frequency is higher than a second threshold.

In one embodiment, the video conferencing system further comprises a keyword module and a voice recognition module. The keyword module is used for presetting and storing a plurality of keywords before the video conference starts, and assigning a grade to each keyword in advance. The voice recognition module is used for performing voice recognition on the current speaker during the video conference. If it recognizes that the speech content of the current speaker includes one of the keywords, a warning of the corresponding grade is popped up on the client of the current speaker according to the grade of the keyword; and if the recognized keyword is of the highest grade, the current speaker is forbidden from speaking.

In one embodiment, the video conferencing system further comprises a closing module, used for closing the intelligent microphone detection function when, within a period of time, the speaking decibel level of the current speaker is not detected to be higher than the first threshold, the speaking frequency is not detected to be higher than the second threshold, and none of the keywords is detected in the speech content of the current speaker.

In one embodiment, the video conferencing system further comprises a face recognition module, a large-amplitude movement recording module, a second reminding module and a third reminding module. The face recognition module is used for performing face recognition on each person in the video stream during the video conference and repeatedly recording the face contour position of each person. The large-amplitude movement recording module is coupled with the face recognition module and used for recording one large-amplitude movement when the overlap between two consecutively recorded face contour positions of a person is lower than a third threshold. The second reminding module is coupled with the large-amplitude movement recording module and the face recognition module and used for sending, in the form of voice or text, a reminder to a person's client asking the person to pay attention when it is detected that the number of large-amplitude movements accumulated by that person within a period of time exceeds a fourth threshold. The third reminding module is coupled with the face recognition module and used for sending a reminder to a person's client asking the person to pay attention when the face recognition module cannot detect the face information of that person within a continuous period of time.

In one embodiment, the video conferencing system further comprises a fourth reminding module, coupled with the microphone intelligent detection module and used for sending a reminder to the clients of the participants in the video conference asking them to pay attention to the key speech content when the microphone intelligent detection module detects that the speaking decibel level of the current speaker is higher than the first threshold and/or the speaking frequency is higher than the second threshold.

Based on the same inventive concept, the present invention also provides a computer-readable storage medium for performing the video conference method according to any one of claims 1 to 8.

Compared with the prior art, the video conference method and system of the present invention first perform expression recognition on the speaker. After the expression is preliminarily judged to be abnormal, intelligent microphone detection is triggered and the voice of the speaker is then evaluated. Because whether the speaker's emotion is out of control is judged from both the expression and the voice, the emotion of the speaker can be judged more accurately. Furthermore, when both the expression and the voice of the speaker are abnormal, a prompt is issued to guide the speaker to calm down, which achieves auxiliary management of the conference and can improve video conference efficiency.

Drawings

FIG. 1 illustrates a video conference method according to an embodiment of the present invention;

FIG. 2 illustrates a video conference system according to an embodiment of the present invention.

Detailed Description

The following detailed description of the present invention is provided in conjunction with the accompanying drawings, but it should be understood that the scope of the present invention is not limited to the specific embodiments.

Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.

FIG. 1 illustrates a video conference method according to an embodiment of the present invention. The video conference method includes steps S1 to S3.

In step S1, the current speaker is confirmed. The current speaker during the video conference can be confirmed through changes in microphone pickup.
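Step S1 can be illustrated with a minimal sketch. The description only states that the speaker is confirmed from changes in microphone pickup, so everything below is an assumption made for illustration: the helper names `rms_level_db` and `confirm_current_speaker`, the use of an RMS level per audio frame, and the -40 dB floor that separates speech from background noise.

```python
import math
from typing import Dict, Optional, Sequence


def rms_level_db(samples: Sequence[float]) -> float:
    """RMS level of one audio frame in dB relative to full scale (samples in [-1, 1])."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def confirm_current_speaker(
    frames: Dict[str, Sequence[float]], floor_db: float = -40.0
) -> Optional[str]:
    """Return the participant whose microphone frame is loudest above the floor.

    The floor value is a hypothetical noise gate; None means nobody is speaking.
    """
    best_id, best_db = None, floor_db
    for participant_id, samples in frames.items():
        level = rms_level_db(samples)
        if level > best_db:
            best_id, best_db = participant_id, level
    return best_id


# One frame per participant: "alice" is talking, "bob" only picks up faint noise.
frames = {"alice": [0.5, -0.5, 0.5, -0.5], "bob": [0.001, -0.001, 0.001, -0.001]}
speaker = confirm_current_speaker(frames)
```

A real system would likely smooth the level over several frames so that a single cough does not change the confirmed speaker; that refinement is omitted here.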

In step S2, the intelligent microphone detection function is started after an abnormal expression is determined. Specifically, expression recognition is started for the current speaker in the video stream, and if the expression of the current speaker is abnormal, such as angry, irritable or excited, the intelligent microphone detection function is started.

In step S3, an emotion reminder is issued. When it is detected that the speaking decibel level of the current speaker is higher than a first threshold and/or the speaking frequency is higher than a second threshold, a reminder is sent to the client of the current speaker in the form of voice or text. Specifically, when it is detected multiple times that the speaking decibel level of the current speaker is more than 10% above the speaker's normal decibel level (for example, above 70 decibels on 2 occasions), or multiple times that the speaking frequency is more than 10% above the speaker's normal frequency, an emotional overstimulation prompt is sent to the client of the current speaker, asking whether the speaker needs to pause before continuing to participate in the conference. An option to turn off the voice and/or the camera may also be provided, so that a speaker who has lost composure can turn off the voice and/or the camera on their own.
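The threshold logic of step S3 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the class name `VoiceMonitor`, the return labels, and the idea that the speaker's normal level and normal frequency have been measured beforehand are all hypothetical. The 10% margin and the count of 2 come from the example in the description; the unit of "speaking frequency" is not specified there, so it is treated as an opaque number.

```python
from dataclasses import dataclass


@dataclass
class VoiceMonitor:
    normal_db: float        # speaker's usual level (assumed to be known in advance)
    normal_freq: float      # speaker's usual "speaking frequency" (unit unspecified)
    margin: float = 0.10    # 10% above normal, as in the example
    repeat_limit: int = 2   # "multiple times"; 2 is the example count
    db_hits: int = 0
    freq_hits: int = 0

    def observe(self, db: float, freq: float) -> str:
        """Classify one measurement as 'none', 'remind' or 'overstimulation'."""
        loud = db > self.normal_db * (1.0 + self.margin)      # first threshold
        fast = freq > self.normal_freq * (1.0 + self.margin)  # second threshold
        if loud:
            self.db_hits += 1
        if fast:
            self.freq_hits += 1
        repeated = (loud and self.db_hits >= self.repeat_limit) or (
            fast and self.freq_hits >= self.repeat_limit
        )
        if repeated:
            return "overstimulation"  # prompt plus option to turn off voice/camera
        if loud or fast:
            return "remind"           # ordinary voice or text reminder
        return "none"


monitor = VoiceMonitor(normal_db=60.0, normal_freq=4.0)
results = [monitor.observe(db, 4.0) for db in (62.0, 70.0, 71.0, 60.0)]
```

With a normal level of 60, the first threshold in this sketch is 66, so the second and third measurements exceed it and the third one escalates to the overstimulation prompt.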

In this embodiment, expression recognition is first performed on the speaker. After the expression is preliminarily judged to be abnormal, intelligent microphone detection is triggered and the voice of the speaker is then evaluated. Because whether the speaker's emotion is out of control is judged from both the expression and the voice, the emotion of the speaker can be judged more accurately. Furthermore, when both the expression and the voice of the speaker are abnormal, a prompt is issued to guide the speaker to calm down, which achieves auxiliary management of the conference and can improve conference efficiency. In addition, because the method monitors only the speaker, it requires few resources.

Preferably, in order to improve the conference quality, the video conference method of an embodiment further includes: presetting and storing a plurality of keywords before the video conference starts, and assigning a grade to each keyword in advance; performing voice recognition on the current speaker during the video conference; if the speech content of the current speaker includes one of the keywords, popping up a warning of the corresponding grade on the client of the current speaker according to the grade of the keyword; and if the keyword in the speech content of the current speaker is of the highest grade, forbidding the current speaker from speaking. The keywords may be honorific titles, special events, forbidden names, uncivil terms and the like, and the keyword grades may be set as first grade, second grade, third grade and so on. For example, if uncivil terms are first grade, a warning pops up directly and the speaker is forbidden from speaking; if matters of personal privacy are second grade, a warning is given, the intelligent microphone detection function is triggered and corresponding processing is carried out according to its result; and if forms of address are third grade, a prompt is given asking the speaker to mind their manners.
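The graded keyword handling described above can be sketched as a lookup table plus a dispatcher. The keyword entries, the action labels and the function name `grade_transcript` are hypothetical; the text only fixes the idea that each keyword has a preset grade and that the most severe grade leads to the speaker being forbidden from speaking. Plain substring matching on a transcript is also an assumption, since the description does not say how recognized speech is compared against the keywords.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword table; grade 1 is the most severe, as in the example above.
KEYWORD_GRADES: Dict[str, int] = {
    "idiot": 1,         # uncivil term
    "home address": 2,  # personal privacy
    "hey you": 3,       # form of address
}

# Hypothetical action labels for each grade.
ACTIONS: Dict[int, str] = {
    1: "warn_and_forbid_speaking",
    2: "warn_and_start_mic_detection",
    3: "courtesy_reminder",
}


def grade_transcript(
    transcript: str, table: Dict[str, int] = KEYWORD_GRADES
) -> Optional[Tuple[int, str, List[str]]]:
    """Return (grade, action, matched keywords) for the most severe hit, or None."""
    text = transcript.lower()
    hits = [kw for kw in table if kw in text]
    if not hits:
        return None
    grade = min(table[kw] for kw in hits)  # smallest number = most severe grade
    matched = sorted(kw for kw in hits if table[kw] == grade)
    return grade, ACTIONS[grade], matched


result = grade_transcript("Hey you, what is your home address?")
```

When several keywords appear in the same utterance, this sketch acts only on the most severe one; whether lower-grade warnings should also be shown is a design choice the description leaves open.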

Preferably, in order to save network resources, the video conference method further comprises: closing the intelligent microphone detection function when, within a period of time, the speaking decibel level of the current speaker is not detected to be higher than the first threshold, the speaking frequency is not detected to be higher than the second threshold, and none of the keywords is detected in the speech content of the current speaker.
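One way to picture this closing condition is a small gate object that is opened by an abnormal expression and closes itself after a quiet period. The class name `MicDetectionGate`, its method names and the 60-second default are assumptions for illustration; the description only says "a period of time".

```python
class MicDetectionGate:
    """Tracks whether intelligent microphone detection is currently enabled."""

    def __init__(self, quiet_period_s: float = 60.0):
        self.quiet_period_s = quiet_period_s  # hypothetical length of the quiet period
        self.enabled = False
        self._last_event_s = 0.0

    def on_abnormal_expression(self, now_s: float) -> None:
        """Expression recognition reported an abnormal expression: start detection."""
        self.enabled = True
        self._last_event_s = now_s

    def on_sample(self, now_s: float, loud: bool, fast: bool, keyword_hit: bool) -> bool:
        """Feed one detection result; return whether detection remains enabled."""
        if not self.enabled:
            return False
        if loud or fast or keyword_hit:
            self._last_event_s = now_s  # any trigger restarts the quiet period
        elif now_s - self._last_event_s >= self.quiet_period_s:
            self.enabled = False        # nothing detected for the whole period
        return self.enabled


gate = MicDetectionGate(quiet_period_s=30.0)
gate.on_abnormal_expression(now_s=10.0)
still_on = gate.on_sample(now_s=20.0, loud=True, fast=False, keyword_hit=False)
```

Here `loud`, `fast` and `keyword_hit` stand for the first-threshold, second-threshold and keyword checks respectively; all three must stay false for the whole quiet period before detection is closed.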

Preferably, in order to further improve the effect of the conference, the video conference method further includes: during the video conference, performing face recognition on each person in the video stream and repeatedly recording the face contour position of each person; when the overlap between two consecutively recorded face contour positions of a person is lower than a third threshold, recording one large-amplitude movement; and when the number of large-amplitude movements accumulated by a person within a period of time is detected to exceed a fourth threshold, sending, in the form of voice or text, a prompt to that person's client asking the person to pay attention.

Preferably, in order to further improve the effect of the conference, the video conference method further includes: during the video conference, performing face recognition on each person in the video stream, and when the face information of a person cannot be detected within a continuous period of time, sending, in the form of voice or text, a prompt to that person's client asking the person to pay attention.

Preferably, the video conference method further comprises: when it is detected that the speaking decibel level of the current speaker is higher than the first threshold and/or the speaking frequency is higher than the second threshold, sending, in the form of voice or text, a prompt to the clients of the participants in the video conference asking them to pay attention to the key speech content. This embodiment can be used in the field of remote teaching: during a lecture, if the lecturer's voice becomes louder, the system sends a reminder to the students' clients, telling the students that this part is a key point and asking them to listen carefully. The reminder may take the form of conspicuous on-screen text, the screen flashing 2 to 3 times, a short distinctive beep issued by the system, and the like, so as to avoid an overly long system reminder that would drown out the teacher's voice.

Based on the same inventive concept, the present embodiment further provides a video conference system. As shown in FIG. 2, the system includes a current speaker confirmation module 10, an expression recognition module 11, a microphone intelligent detection module 12 and a first reminding module 13.

The current speaker confirmation module 10 is configured to confirm a current speaker during a video conference, and in particular, the present embodiment confirms the current speaker during the video conference through changes of microphone sound pickup.

The expression recognition module 11 is coupled to the current speaker confirmation module 10 and configured to initiate expression recognition for the current speaker in the video stream.

The microphone intelligent detection module 12 is coupled with the expression recognition module 11, and is configured to start microphone intelligent detection when the expression recognition module 11 determines that the expression of the current speaker is abnormal.

The first reminding module 13 is coupled to the microphone intelligent detection module 12 and the current speaker confirmation module 10, and configured to send a reminder to the client of the current speaker in the form of voice or text when it is detected that the speaking decibel level of the current speaker is higher than the first threshold and/or the speaking frequency is higher than the second threshold. The first reminding module 13 is also configured to send an emotional overstimulation reminder to the client of the current speaker and provide an option to turn off the voice or the camera when it is detected multiple times that the speaking decibel level of the current speaker is higher than the first threshold, or multiple times that the speaking frequency is higher than the second threshold.

Preferably, in order to improve the conference quality, in an embodiment, the video conference system further includes: a keyword module and a voice recognition module.

The keyword module is used for presetting and storing a plurality of keywords before the video conference is started, and pre-distributing the grade of each keyword.

The voice recognition module is used for performing voice recognition on the current speaker during the video conference. If it recognizes that the speech content of the current speaker includes one of the keywords, a warning of the corresponding grade is popped up on the client of the current speaker according to the grade of the keyword; and if the recognized keyword is of the highest grade, the current speaker is forbidden from speaking.

Preferably, in order to save network resources, in an embodiment, the video conference system further includes a closing module, used for closing the intelligent microphone detection function when, within a period of time, the speaking decibel level of the current speaker is not detected to be higher than the first threshold, the speaking frequency is not detected to be higher than the second threshold, and none of the keywords is detected in the speech content of the current speaker.

Preferably, in order to further improve the effect of the conference, in an embodiment, the video conference system further includes a face recognition module, a large-amplitude movement recording module, a second reminding module and a third reminding module.

The face recognition module is used for performing face recognition on each person in the video stream during the video conference, and repeatedly recording the face contour position of each person (for example, once every 1 second).

The large-amplitude movement recording module is coupled with the face recognition module and is used for recording one large-amplitude movement when the overlap between two consecutively recorded face contour positions of a person is lower than a third threshold (such as 65%).

The second reminding module is coupled with the large-amplitude movement recording module and the face recognition module and is used for sending, in the form of voice or text, a reminder to a person's client asking the person to pay attention when the number of large-amplitude movements of that person within a certain period of time exceeds a fourth threshold (for example, more than 6 large-amplitude movements within 1 minute).
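The interplay of the face recognition, large-amplitude movement recording and second reminding modules can be sketched as follows. The sketch approximates each face contour by an axis-aligned bounding box and measures overlap as the fraction of the previous box still covered by the current one; both choices are assumptions, since the description does not define how the overlap is computed. The names `overlap_ratio` and `MovementCounter` are hypothetical, while the 65%, 6 movements and 1 minute values are the examples given above.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def overlap_ratio(prev: Box, curr: Box) -> float:
    """Fraction of the previous box's area that the current box still covers."""
    width = min(prev[2], curr[2]) - max(prev[0], curr[0])
    height = min(prev[3], curr[3]) - max(prev[1], curr[1])
    prev_area = (prev[2] - prev[0]) * (prev[3] - prev[1])
    if width <= 0 or height <= 0 or prev_area <= 0:
        return 0.0
    return (width * height) / prev_area


class MovementCounter:
    """Counts large-amplitude movements of one person in a sliding time window."""

    def __init__(self, overlap_threshold: float = 0.65, max_moves: int = 6,
                 window_s: float = 60.0):
        self.overlap_threshold = overlap_threshold  # third threshold
        self.max_moves = max_moves                  # fourth threshold
        self.window_s = window_s
        self._prev: Optional[Box] = None
        self._moves: Deque[float] = deque()

    def record(self, now_s: float, box: Box) -> bool:
        """Record one face position; return True when a reminder should be sent."""
        if self._prev is not None and overlap_ratio(self._prev, box) < self.overlap_threshold:
            self._moves.append(now_s)
        self._prev = box
        # Drop movements that have fallen out of the time window.
        while self._moves and now_s - self._moves[0] > self.window_s:
            self._moves.popleft()
        return len(self._moves) > self.max_moves


# A person who jumps between two positions once per second.
left_box, right_box = (0.0, 0.0, 10.0, 10.0), (20.0, 0.0, 30.0, 10.0)
counter = MovementCounter()
flags = [counter.record(float(t), left_box if t % 2 == 0 else right_box) for t in range(8)]
```

In this run the seventh movement, recorded at the eighth sample, is the first to exceed the fourth threshold, so only the last flag is set.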

The third reminding module is coupled with the face recognition module and is used for sending, in the form of voice or text, a reminder to a person's client asking the person to pay attention when the face recognition module cannot detect the face information of that person within a continuous period of time (such as 10 seconds).
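The behavior of the third reminding module can be sketched with a per-person timer. The class name `FaceAbsenceMonitor` and the choice to send only one reminder per absence are assumptions; the 10-second timeout is the example value from the description.

```python
from typing import Dict, Set


class FaceAbsenceMonitor:
    """Signals when a person's face has been missing for a continuous period."""

    def __init__(self, timeout_s: float = 10.0):
        self.timeout_s = timeout_s
        self._last_seen_s: Dict[str, float] = {}
        self._reminded: Set[str] = set()

    def update(self, now_s: float, person_id: str, face_detected: bool) -> bool:
        """Feed one face recognition result; return True when a reminder is due."""
        if face_detected:
            self._last_seen_s[person_id] = now_s
            self._reminded.discard(person_id)  # a new absence may be reported later
            return False
        # If the person has never been seen, start timing from the first sample.
        last_seen = self._last_seen_s.setdefault(person_id, now_s)
        if now_s - last_seen >= self.timeout_s and person_id not in self._reminded:
            self._reminded.add(person_id)      # one reminder per absence (assumption)
            return True
        return False


absence = FaceAbsenceMonitor()
events = [
    absence.update(0.0, "p1", True),
    absence.update(5.0, "p1", False),
    absence.update(10.0, "p1", False),
    absence.update(11.0, "p1", False),
]
```

Suppressing repeat reminders during the same absence keeps the client from being flooded; a deployment could instead repeat the reminder at intervals.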

In one embodiment, the video conferencing system further comprises a fourth reminding module, coupled to the microphone intelligent detection module 12 and configured to send, in the form of voice or text, a reminder to the clients of the participants in the video conference asking them to pay attention to the key speech content when the microphone intelligent detection module 12 detects that the speaking decibel level of the current speaker is higher than the first threshold and/or the speaking frequency is higher than the second threshold. This embodiment can be used in the field of remote teaching: during a lecture, if the lecturer's voice becomes louder, the system sends a reminder to the students' clients, telling the students that this part is a key point and asking them to listen carefully. The reminder may take the form of conspicuous on-screen text, the screen flashing 2 to 3 times, a short distinctive beep issued by the system, and the like, so as to avoid an overly long system reminder that would drown out the teacher's voice.

Based on the same inventive concept, the present embodiment also provides a computer-readable storage medium for executing the video conference method according to any one of the above embodiments.

In summary, the video conference method and system of the embodiment first perform expression recognition on the speaker. After the expression is preliminarily judged to be abnormal, intelligent microphone detection is triggered and the voice of the speaker is then evaluated. Because whether the speaker's emotion is out of control is judged from both the expression and the voice, the emotion of the speaker can be judged more accurately. Furthermore, when both the expression and the voice of the speaker are abnormal, a prompt is issued to guide the speaker to calm down, which achieves auxiliary management of the conference and can improve video conference efficiency.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims

1. A video conferencing method, comprising:

confirming a current speaker in the video conference process;

starting expression recognition on the current speaker in the video stream, and starting an intelligent microphone detection function when the expression of the current speaker is abnormal;

and sending a prompt to the client of the current speaker when it is detected that the speaking decibel level of the current speaker is higher than a first threshold and/or the speaking frequency is higher than a second threshold.

2. The video conferencing method of claim 1, wherein the confirming the current speaker during the video conference comprises:

confirming the current speaker during the video conference through changes in microphone pickup.

3. The video conferencing method of claim 1, wherein the video conferencing method further comprises:

when it is detected multiple times that the speaking decibel level of the current speaker is higher than the first threshold, or multiple times that the speaking frequency is higher than the second threshold, sending an emotional overstimulation prompt to the client of the current speaker and providing an option to turn off the voice or the camera.

4. The video conferencing method of claim 1, wherein the video conferencing method further comprises:

presetting and storing a plurality of keywords before a video conference is started, and allocating the grade of each keyword in advance;

performing voice recognition on the current speaker in the video conference process;

when the speech content of the current speaker includes one of the keywords, popping up a warning of the corresponding grade on the client of the current speaker according to the grade of the keyword;

and when the keyword in the speech content of the current speaker is of the highest grade, forbidding the current speaker from speaking.

5. The video conferencing method of claim 4, wherein the video conferencing method further comprises:

closing the intelligent microphone detection function when, within a period of time, the speaking decibel level of the current speaker is not detected to be higher than the first threshold, the speaking frequency is not detected to be higher than the second threshold, and none of the keywords is detected in the speech content of the current speaker.

6. The video conferencing method of claim 1, wherein the video conferencing method further comprises:

during the video conference, performing face recognition on each person in the video stream and repeatedly recording the face contour position of each person;

when the overlap between two consecutively recorded face contour positions of a person is lower than a third threshold, recording one large-amplitude movement;

and when the number of large-amplitude movements accumulated by a person within a period of time is detected to exceed a fourth threshold, sending a prompt to that person's client asking the person to pay attention.

7. The video conferencing method of claim 1, wherein the video conferencing method further comprises:

during the video conference, performing face recognition on each person in the video stream, and when the face information of a person cannot be detected within a period of time, sending a prompt to that person's client asking the person to pay attention.

8. The video conferencing method of claim 1, wherein the video conferencing method further comprises:

when it is detected that the speaking decibel level of the current speaker is higher than the first threshold and/or the speaking frequency is higher than the second threshold, sending a prompt to the clients of the participants in the video conference asking them to pay attention to the key speech content.

9. A video conferencing system, comprising:

a current speaker confirmation module, used for confirming a current speaker during the video conference;

an expression recognition module, coupled with the current speaker confirmation module and used for starting expression recognition on the current speaker in the video stream;

a microphone intelligent detection module, coupled with the expression recognition module and used for starting microphone intelligent detection when the expression recognition module judges that the expression of the current speaker is abnormal;

and a reminding module, coupled with the microphone intelligent detection module and the current speaker confirmation module and used for sending a reminder to the client of the current speaker when it is detected that the speaking decibel level of the current speaker is higher than a first threshold and/or the speaking frequency is higher than a second threshold.

10. A computer-readable storage medium for performing the video conferencing method of any of claims 1-8.