CN111986703B - Video conference method and system, and computer readable storage medium - Google Patents

Publication number
CN111986703B
Authority
CN
China
Prior art keywords
current speaker
speaking
video conference
threshold value
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010845755.7A
Other languages
Chinese (zh)
Other versions
CN111986703A (en
Inventor
李璐
冯文澜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suirui Technology Group Co Ltd
Original Assignee
Suirui Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suirui Technology Group Co Ltd
Priority to CN202010845755.7A
Publication of CN111986703A
Application granted
Publication of CN111986703B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01H MEASUREMENT OF MECHANICAL VIBRATIONS OR ULTRASONIC, SONIC OR INFRASONIC WAVES
    • G01H17/00 Measuring mechanical vibrations or ultrasonic, sonic or infrasonic waves, not provided for in the preceding groups
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/169 Holistic features and representations, i.e. based on the facial image taken as a whole
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information

Abstract

The invention discloses a video conference method and system, and a computer-readable storage medium. The video conference method comprises the following steps: confirming the current speaker during the video conference; starting expression recognition for the current speaker in the video stream, and starting an intelligent microphone detection function when the expression of the current speaker is abnormal; and sending a reminder to the client of the current speaker when the speaking decibel level of the current speaker is detected to be higher than a first threshold and/or the speaking frequency is detected to be higher than a second threshold. The video conference method and system can monitor the emotions of participants in a video conference and thereby assist in managing the conference.

Description

Video conference method and system, and computer readable storage medium
Technical Field
The present invention relates to the field of video communication technologies, and in particular, to a video conference method and system, and a computer readable storage medium.
Background
With the development of internet technology, video conferencing is increasingly widely used.
In the course of making the invention, the inventors found that participants in a video conference sometimes become emotionally agitated or lose control, so that the conference cannot proceed normally or is even interrupted. This seriously harms the efficiency and outcome of the conference, and no effective solution currently exists.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
The invention aims to provide a video conference method and system, and a computer-readable storage medium, that can monitor the emotions of participants in a video conference and thereby assist in managing the conference.
To achieve the above object, the present invention provides a video conference method, comprising: confirming the current speaker during the video conference; starting expression recognition for the current speaker in the video stream, and starting an intelligent microphone detection function when the expression of the current speaker is abnormal; and sending a reminder to the client of the current speaker when the speaking decibel level of the current speaker is detected to be higher than a first threshold and/or the speaking frequency is detected to be higher than a second threshold.
In one embodiment of the present invention, confirming the current speaker during the video conference includes: confirming the current speaker by a change in microphone pickup.
In an embodiment of the present invention, the video conference method further includes: when the speaking decibel level of the current speaker is detected to be higher than the first threshold, or the speaking frequency higher than the second threshold, a plurality of times, sending an over-agitation reminder to the client of the current speaker and providing an option to turn off the audio or the camera.
In an embodiment of the present invention, the video conference method further includes: presetting and storing a plurality of keywords before the video conference starts, and assigning a level to each keyword in advance; performing voice recognition on the current speaker during the video conference; when the speaking content of the current speaker includes one of the keywords, popping up a warning of the corresponding level on the client of the current speaker according to the level of the keyword; and when the keyword in the speaking content of the current speaker is of the highest level, muting the current speaker.
In an embodiment of the present invention, the video conference method further includes: turning off the intelligent microphone detection function when, within a period of time, the speaking decibel level of the current speaker is not detected to be higher than the first threshold, the speaking frequency is not detected to be higher than the second threshold, and no keyword is detected in the speaking content of the current speaker.
In an embodiment of the present invention, the video conference method further includes: during the video conference, performing face recognition on each person in the video stream and repeatedly recording the facial contour position of each person; when the overlap between two successively recorded facial contour positions of a person is lower than a third threshold, recording one large movement; and when the number of large movements accumulated by a person within a period of time is detected to exceed a fourth threshold, sending a reminder to pay attention to the client of that person.
In an embodiment of the present invention, the video conference method further includes: during the video conference, performing face recognition on each person in the video stream, and when the face information of a person cannot be detected for a period of time, sending a reminder to pay attention to the client of that person.
In an embodiment of the present invention, the video conference method further includes: when the speaking decibel level of the current speaker is detected to be higher than the first threshold and/or the speaking frequency is detected to be higher than the second threshold, sending, in the form of voice or text, a reminder to pay attention to the key speaking content to the clients of the participants in the video conference.
Based on the same inventive concept, the present invention also provides a video conference system, comprising: a current speaker confirmation module, an expression recognition module, a microphone intelligent detection module and a first reminding module. The current speaker confirmation module is used for confirming the current speaker during the video conference. The expression recognition module is coupled with the current speaker confirmation module and is used for starting expression recognition for the current speaker in the video stream. The microphone intelligent detection module is coupled with the expression recognition module and is used for starting intelligent microphone detection when the expression recognition module judges that the expression of the current speaker is abnormal. The first reminding module is coupled with the microphone intelligent detection module and the current speaker confirmation module and is used for sending a reminder to the client of the current speaker when the speaking decibel level of the current speaker is detected to be higher than a first threshold and/or the speaking frequency is detected to be higher than a second threshold.
In one embodiment, the video conference system further comprises a keyword module and a voice recognition module. The keyword module is used for presetting and storing a plurality of keywords before the video conference starts, and for assigning a level to each keyword in advance. The voice recognition module is used for performing voice recognition on the current speaker during the video conference. If the voice recognition module recognizes that the speaking content of the current speaker includes a keyword, a warning of the corresponding level is popped up on the client of the current speaker according to the level of the keyword; and if the keyword in the speaking content of the current speaker is recognized as being of the highest level, the current speaker is muted.
In one embodiment, the video conference system further comprises a closing module, which is used for turning off the intelligent microphone detection function when, within a period of time, the speaking decibel level of the current speaker is not detected to be higher than the first threshold, the speaking frequency is not detected to be higher than the second threshold, and no keyword is detected in the speaking content of the current speaker.
In one embodiment, the video conference system further comprises a face recognition module, a large-movement recording module, a second reminding module and a third reminding module. The face recognition module is used for performing face recognition on each person in the video stream during the video conference and for repeatedly recording the facial contour position of each person. The large-movement recording module is coupled with the face recognition module and is used for recording one large movement when the overlap between two successively recorded facial contour positions of a person is lower than a third threshold. The second reminding module is coupled with the large-movement recording module and the face recognition module and is used for sending a reminder to pay attention to the client of a person, in the form of voice or text, when the number of large movements accumulated by that person within a period of time is detected to exceed a fourth threshold. The third reminding module is coupled with the face recognition module and is used for sending a reminder to pay attention to the client of a person when the face recognition module cannot detect the face information of that person for a continuous period of time.
In one embodiment, the video conference system further comprises a fourth reminding module, which is coupled with the microphone intelligent detection module and is used for sending a reminder to pay attention to the key speaking content to the clients of the participants in the video conference when the microphone intelligent detection module detects that the speaking decibel level of the current speaker is higher than the first threshold and/or the speaking frequency is higher than the second threshold.
Based on the same inventive concept, the present invention also provides a computer-readable storage medium for performing the video conference method as claimed in any one of claims 1 to 8.
Compared with the prior art, the video conference method and system first perform expression recognition on the speaker. After an abnormal expression is preliminarily detected, intelligent microphone detection is triggered and the speaker's voice is then assessed. Whether the speaker's emotion is out of control is judged from both the expression and the voice, so the emotion of the speaker can be judged accurately. When both the expression and the voice of the speaker are abnormal, a reminder is sent to guide the speaker to calm down, thereby assisting in managing the conference.
Drawings
Fig. 1 shows a video conference method according to an embodiment of the present invention;
Fig. 2 shows a video conference system according to an embodiment of the present invention.
Detailed Description
Embodiments of the invention are described in detail below with reference to the accompanying drawings; it should be understood that the scope of the invention is not limited to these specific embodiments.
Throughout the specification and claims, unless explicitly stated otherwise, the term "comprise" or variations thereof such as "comprises" or "comprising", etc. will be understood to include the stated element or component without excluding other elements or components.
Fig. 1 shows a video conference method according to an embodiment of the present invention. The video conference method includes steps S1 to S3.
In step S1, the current speaker is confirmed. The current speaker during the video conference can be confirmed by a change in microphone pickup.
In step S2, the intelligent microphone detection function is started after the expression is judged to be abnormal. Specifically, expression recognition is started for the current speaker in the video stream, and if the expression of the current speaker is abnormal, for example showing anger, irritability or agitation, the intelligent microphone detection function is started.
In step S3, an emotion reminder is issued. When the speaking decibel level of the current speaker is detected to be higher than the first threshold and/or the speaking frequency is detected to be higher than the second threshold, a reminder is sent to the client of the current speaker in the form of voice or text. The system can record the speaker's usual speaking decibel level and frequency and use them as the basis for determining the first threshold and the second threshold. Specifically, in this embodiment, when the system detects a plurality of times that the current speaking decibel level exceeds the speaker's usual level by 10% (for example, exceeding 70 dB twice), or that the speaking frequency exceeds the speaker's usual frequency by 10%, it sends an over-agitation reminder to the client of the current speaker, asking whether the speaker would like to pause before continuing to participate in the conference. It can also provide an option to turn off the audio and/or the camera, so that a speaker who has lost composure can turn off the audio and/or the camera themselves.
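The threshold logic of step S3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `SpeakerBaseline`, `reminder_for` and `repeat_limit` are hypothetical, the usual level is taken to be a simple mean of recorded values, and "speaking frequency" is treated as a plain number because the description does not give its unit.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class SpeakerBaseline:
    """Running record of a speaker's usual decibel level and speaking frequency."""
    levels_db: list = field(default_factory=list)
    frequencies: list = field(default_factory=list)

    def record(self, level_db: float, frequency: float) -> None:
        self.levels_db.append(level_db)
        self.frequencies.append(frequency)

    def thresholds(self, margin: float = 0.10) -> tuple:
        """First and second thresholds: usual value plus `margin` (10% here).

        Requires at least one recorded sample.
        """
        return (mean(self.levels_db) * (1 + margin),
                mean(self.frequencies) * (1 + margin))


def reminder_for(baseline: SpeakerBaseline, samples, repeat_limit: int = 2) -> str:
    """Classify a list of (decibel, frequency) samples.

    Returns 'none', 'reminder' (a threshold was exceeded once) or
    'over_agitation' (exceeded at least `repeat_limit` times; assumed value).
    """
    first, second = baseline.thresholds()
    exceeded = sum(1 for db, freq in samples if db > first or freq > second)
    if exceeded >= repeat_limit:
        return "over_agitation"  # the client would also offer to turn off audio/camera
    return "reminder" if exceeded else "none"
```

With a usual level of about 61 dB, a single sample at 70 dB yields `"reminder"`, and a second exceedance escalates to `"over_agitation"`.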
In this method, expression recognition is first performed on the speaker. After an abnormal expression is preliminarily detected, intelligent microphone detection is triggered and the speaker's voice is then assessed. Whether the speaker's emotion is out of control is judged from both the expression and the voice, so the emotion of the speaker can be judged accurately. When both the expression and the voice of the speaker are abnormal, a reminder is sent to guide the speaker to calm down, thereby assisting in managing the conference. In addition, only the current speaker is monitored in this method, so few resources are required.
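The first two stages (confirming the speaker from a change in microphone pickup, then gating microphone detection on the expression result) might look like the sketch below. Everything here is an assumption for illustration: the description does not say how pickup change is measured, so the example uses the rise in RMS level between two frames, and the names `level_db`, `current_speaker`, `min_rise_db` and the expression labels are hypothetical.

```python
import math


def level_db(samples) -> float:
    """RMS level of one audio frame in dB relative to full scale (samples in [-1, 1])."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-10))  # clamp to avoid log10(0) on silence


def current_speaker(prev_levels: dict, new_levels: dict, min_rise_db: float = 10.0):
    """Pick the participant whose microphone level rose the most.

    Returns None when no rise reaches `min_rise_db` (an assumed value).
    Participants without a previous level are treated as having no rise.
    """
    rises = {pid: new_levels[pid] - prev_levels.get(pid, new_levels[pid])
             for pid in new_levels}
    if not rises:
        return None
    pid = max(rises, key=rises.get)
    return pid if rises[pid] >= min_rise_db else None


# Hypothetical labels that an expression classifier might emit.
ABNORMAL_EXPRESSIONS = {"anger", "irritability", "agitation"}


def should_start_mic_detection(expression_label: str) -> bool:
    """Stage two: only an abnormal expression switches on microphone detection."""
    return expression_label in ABNORMAL_EXPRESSIONS
```

Running the audio checks only after the expression gate fires is what keeps the cost low: a single participant is analysed, and only while there is a reason to.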
Preferably, to improve conference quality, the video conference method of an embodiment further includes: presetting and storing a plurality of keywords before the video conference starts, and assigning a level to each keyword in advance; performing voice recognition on the current speaker during the video conference; if the speaking content of the current speaker includes a keyword, popping up a warning of the corresponding level on the client of the current speaker according to the level of the keyword; and, if the keyword is of the highest level, muting the current speaker. The keywords may be names of revered persons, particular events, names that must not be mentioned, uncivil language, and the like, and the keyword levels may be set as first level, second level, third level, and so on. For example, if uncivil language is first level, a warning pops up immediately and the speaker is muted; if matters of personal privacy are second level, a second-level warning is given, the intelligent microphone detection function is triggered, and the result is handled accordingly; and if addressing someone directly by name is third level, a third-level warning prompts the speaker to be polite.
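The graded keyword handling can be sketched as a lookup from recognized text to actions. This is an illustrative sketch only: the keyword table, the action strings and the function `actions_for` are hypothetical, matching is plain case-insensitive substring search on an already-transcribed string, and when several keywords match the sketch acts on the most severe one, which the description does not specify.

```python
from enum import IntEnum


class Level(IntEnum):
    FIRST = 1   # highest level: warn and mute the speaker
    SECOND = 2  # warn and start intelligent microphone detection
    THIRD = 3   # warn only


# Hypothetical keyword table, stored before the conference starts.
KEYWORDS = {
    "rude-word": Level.FIRST,
    "home address": Level.SECOND,
    "hey bob": Level.THIRD,
}


def actions_for(transcript: str, keywords: dict = KEYWORDS) -> list:
    """Return the actions triggered by the keywords found in `transcript`."""
    text = transcript.lower()
    hits = [level for kw, level in keywords.items() if kw in text]
    if not hits:
        return []
    level = min(hits)  # lowest number is the most severe level
    actions = [f"warn_level_{int(level)}"]
    if level == Level.FIRST:
        actions.append("mute_speaker")
    elif level == Level.SECOND:
        actions.append("start_mic_detection")
    return actions
```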
Preferably, to save network resources, the video conference method further includes: turning off the intelligent microphone detection function when, within a period of time, the speaking decibel level of the current speaker is not detected to be higher than the first threshold, the speaking frequency is not detected to be higher than the second threshold, and no keyword is detected in the speaking content of the current speaker.
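One way to picture this switch-off rule is a small timer that resets whenever any of the three conditions fires. The class name `MicDetection` and the 60-second default are assumptions; the description only says "a period of time".

```python
class MicDetection:
    """Tracks whether intelligent microphone detection should stay active."""

    def __init__(self, quiet_period_s: float = 60.0):  # assumed default
        self.quiet_period_s = quiet_period_s
        self.active = False
        self._last_event = None

    def start(self, now: float) -> None:
        """Called when the expression of the current speaker is judged abnormal."""
        self.active = True
        self._last_event = now

    def update(self, now: float, loud: bool, fast: bool, keyword: bool) -> bool:
        """Feed one observation; returns whether detection is still active.

        `loud`: decibel level above the first threshold,
        `fast`: speaking frequency above the second threshold,
        `keyword`: a preset keyword was recognized.
        """
        if not self.active:
            return False
        if loud or fast or keyword:
            self._last_event = now  # any event restarts the quiet period
        elif now - self._last_event >= self.quiet_period_s:
            self.active = False     # nothing happened for the whole period
        return self.active
```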
Preferably, to further improve the effectiveness of the conference, the video conference method further includes: during the video conference, performing face recognition on each person in the video stream and repeatedly recording the facial contour position of each person; when the overlap between two successively recorded facial contour positions of a person is lower than a third threshold, recording one large movement; and when the number of large movements accumulated by a person within a period of time is detected to exceed a fourth threshold, sending a reminder to pay attention to the client of that person in the form of voice or text.
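The large-movement check can be sketched with axis-aligned bounding boxes standing in for facial contours, which is a simplifying assumption. The defaults follow the example values the description gives for the system embodiment (overlap below 65%, more than 6 movements in 1 minute); the names `overlap_ratio` and `MovementMonitor` are hypothetical, and overlap is measured as the fraction of the earlier box covered by the later one.

```python
from collections import deque


def overlap_ratio(a, b) -> float:
    """Fraction of box `a` covered by box `b`; boxes are (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    if w <= 0 or h <= 0 or area_a <= 0:
        return 0.0
    return (w * h) / area_a


class MovementMonitor:
    """Counts large movements of one person inside a sliding time window."""

    def __init__(self, overlap_threshold=0.65, max_moves=6, window_s=60.0):
        self.overlap_threshold = overlap_threshold  # third threshold
        self.max_moves = max_moves                  # fourth threshold
        self.window_s = window_s
        self._prev_box = None
        self._moves = deque()  # timestamps of recorded large movements

    def observe(self, now: float, box) -> bool:
        """Record one face box; return True when a reminder should be sent."""
        if (self._prev_box is not None
                and overlap_ratio(self._prev_box, box) < self.overlap_threshold):
            self._moves.append(now)
        self._prev_box = box
        # Drop movements that have fallen out of the time window.
        while self._moves and now - self._moves[0] > self.window_s:
            self._moves.popleft()
        return len(self._moves) > self.max_moves
```

Calling `observe` once per second per person mirrors the example recording interval given for the face recognition module.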
Preferably, to further improve the effectiveness of the conference, the video conference method further includes: during the video conference, performing face recognition on each person in the video stream, and when the face information of a person cannot be detected for a continuous period of time, sending a reminder to pay attention to the client of that person in the form of voice or text.
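The absence check reduces to remembering when each person's face was last seen. In this sketch the class name `PresenceMonitor` is hypothetical, and the 10-second default follows the example value the description gives for the third reminding module.

```python
class PresenceMonitor:
    """Flags participants whose face has not been detected for too long."""

    def __init__(self, absence_limit_s: float = 10.0):
        self.absence_limit_s = absence_limit_s
        self._last_seen = {}  # person id -> time the face was last detected

    def observe(self, now: float, person_id: str, face_detected: bool) -> bool:
        """Feed one detection result; return True when a reminder should be sent."""
        if face_detected or person_id not in self._last_seen:
            # A first observation starts the clock even if no face is visible yet.
            self._last_seen[person_id] = now
            return False
        return now - self._last_seen[person_id] >= self.absence_limit_s
```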
Preferably, the video conference method further includes: when the speaking decibel level of the current speaker is detected to be higher than the first threshold and/or the speaking frequency is detected to be higher than the second threshold, sending, in the form of voice or text, a reminder to pay attention to the key speaking content to the clients of the participants in the video conference. This embodiment can be applied to remote teaching: during a lecture, if the lecturer's voice becomes louder, the system sends a reminder to the student clients that this is a key point and asks them to listen carefully. The reminder may be a conspicuous on-screen text message, two or three screen flashes, a short distinctive alert tone such as a "beep", or the like, so that a lengthy system reminder does not drown out the lecturer's voice.
Based on the same inventive concept, this embodiment also provides a video conference system. As shown in Fig. 2, the system includes a current speaker confirmation module 10, an expression recognition module 11, a microphone intelligent detection module 12 and a first reminding module 13.
The current speaker confirmation module 10 is configured to confirm the current speaker during a video conference. Specifically, in this embodiment the current speaker is confirmed by a change in microphone pickup.
The expression recognition module 11 is coupled to the current speaker confirmation module 10 and is configured to start expression recognition for the current speaker in the video stream.
The microphone intelligent detection module 12 is coupled to the expression recognition module 11 and is configured to start intelligent microphone detection when the expression recognition module 11 determines that the expression of the current speaker is abnormal.
The first reminding module 13 is coupled to the microphone intelligent detection module 12 and the current speaker confirmation module 10, and is configured to send a reminder to the client of the current speaker in the form of voice or text when the speaking decibel level of the current speaker is detected to be higher than the first threshold and/or the speaking frequency is detected to be higher than the second threshold. The first reminding module 13 is also configured to send an over-agitation reminder to the client of the current speaker, and to provide an option to turn off the audio or the camera, when the speaking decibel level of the current speaker is detected to be higher than the first threshold a plurality of times or the speaking frequency is detected to be higher than the second threshold a plurality of times.
Preferably, to improve conference quality, in an embodiment the video conference system further comprises a keyword module and a voice recognition module.
The keyword module is used for presetting and storing a plurality of keywords before the video conference starts, and for assigning a level to each keyword in advance.
The voice recognition module is used for performing voice recognition on the current speaker during the video conference. If the voice recognition module recognizes that the speaking content of the current speaker includes a keyword, a warning of the corresponding level is popped up on the client of the current speaker according to the level of the keyword; and if the keyword in the speaking content of the current speaker is recognized as being of the highest level, the current speaker is muted.
Preferably, to save network resources, in an embodiment the video conference system further comprises a closing module, which is used for turning off the intelligent microphone detection function when, within a period of time, the speaking decibel level of the current speaker is not detected to be higher than the first threshold, the speaking frequency is not detected to be higher than the second threshold, and no keyword is detected in the speaking content of the current speaker.
Preferably, to further improve the effectiveness of the conference, in an embodiment the video conference system further comprises a face recognition module, a large-movement recording module, a second reminding module and a third reminding module.
The face recognition module is used for performing face recognition on each person in the video stream during the video conference, and for repeatedly recording the facial contour position of each person (for example, once every 1 second).
The large-movement recording module is coupled with the face recognition module and is used for recording one large movement when the overlap between two successively recorded facial contour positions of a person is lower than a third threshold (for example, 65%).
The second reminding module is coupled with the large-movement recording module and the face recognition module and is used for sending a reminder to pay attention to the client of a person, in the form of voice or text, when the number of large movements accumulated by that person within a period of time is detected to exceed a fourth threshold (for example, more than 6 times in 1 minute).
The third reminding module is coupled with the face recognition module and is used for sending a reminder to pay attention to the client of a person, in the form of voice or text, when the face recognition module cannot detect the face information of that person for a continuous period of time (for example, 10 seconds).
In one embodiment, the video conference system further comprises a fourth reminding module, which is coupled with the microphone intelligent detection module 12 and is used for sending, in the form of voice or text, a reminder to pay attention to the key speaking content to the clients of the participants in the video conference when the microphone intelligent detection module 12 detects that the speaking decibel level of the current speaker is higher than the first threshold and/or the speaking frequency is higher than the second threshold. This embodiment can be applied to remote teaching: during a lecture, if the lecturer's voice becomes louder, the system sends a reminder to the student clients that this is a key point and asks them to listen carefully. The reminder may be a conspicuous on-screen text message, two or three screen flashes, a short distinctive alert tone such as a "beep", or the like, so that a lengthy system reminder does not drown out the lecturer's voice.
Based on the same inventive concept, the present embodiment also provides a computer-readable storage medium for performing the video conference method of any one of the above embodiments.
In summary, according to the video conference method and system of the present embodiment, expression recognition is first performed on the speaker. After an abnormal expression is preliminarily detected, intelligent microphone detection is triggered and the speaker's voice is then assessed. Whether the speaker's emotion is out of control is judged from both the expression and the voice, so the emotion of the speaker can be judged accurately.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing descriptions of specific exemplary embodiments of the present invention are presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain the specific principles of the invention and its practical application to thereby enable one skilled in the art to make and utilize the invention in various exemplary embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims (9)

1. A video conference method, comprising:
confirming a current speaker during a video conference;
starting expression recognition on the current speaker in the video stream, and starting an intelligent microphone detection function when the expression of the current speaker is abnormal;
sending a prompt to a client of the current speaker when the speaking decibel level of the current speaker is detected to be higher than a first threshold value and/or the speaking frequency is detected to be higher than a second threshold value;
wherein the video conference method further comprises:
presetting and storing a plurality of keywords before the video conference starts, and assigning a grade to each keyword in advance;
performing voice recognition on the current speaker during the video conference;
when the speaking content of the current speaker includes one of the keywords, popping up a warning of the corresponding grade on the client of the current speaker according to the grade of the keyword;
and when the keyword in the speaking content of the current speaker is of the highest grade, muting the current speaker.
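The graded keyword check in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the keyword table, the numeric grades, the `Action` type, and the `check_transcript` function are hypothetical names chosen for illustration, and plain case-insensitive substring matching stands in for whatever matching the voice recognition step actually uses.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical preset table: keyword -> grade (a higher number is more severe).
KEYWORD_LEVELS: Dict[str, int] = {"nonsense": 1, "idiot": 2, "shut up": 3}
HIGHEST_LEVEL = max(KEYWORD_LEVELS.values())


@dataclass
class Action:
    warning_level: Optional[int]  # grade of the warning to pop up, or None
    mute: bool                    # True when the speaker should be muted


def check_transcript(text: str) -> Action:
    """Decide what to do for one recognized utterance."""
    lowered = text.lower()
    # Assumption: simple substring matching on the recognized text.
    hits = [lvl for kw, lvl in KEYWORD_LEVELS.items() if kw in lowered]
    if not hits:
        return Action(warning_level=None, mute=False)
    level = max(hits)  # warn at the most severe grade that was matched
    return Action(warning_level=level, mute=(level == HIGHEST_LEVEL))
```

When several keywords appear in one utterance, this sketch warns at the highest matched grade; the claim does not say how multiple hits are combined, so that choice is an assumption.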
2. The video conference method of claim 1, wherein confirming the current speaker during the video conference comprises:
confirming the current speaker during the video conference from changes in microphone pickup.
3. The video conference method of claim 1, wherein the video conference method further comprises:
when the speaking decibel level of the current speaker is detected to be higher than the first threshold value, or the speaking frequency is detected to be higher than the second threshold value, a plurality of times, sending a prompt indicating excessive agitation to the client of the current speaker, and providing an option to turn off the audio or the camera.
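One way to picture the escalation from an ordinary prompt (claim 1) to the agitation prompt (claim 3) is a small counter. The sketch below is illustrative only: `LoudnessMonitor` and its fields are hypothetical, "speaking frequency" is assumed here to mean speech rate in words per minute, and "a plurality of times" is assumed to be a configurable count.

```python
from dataclasses import dataclass


@dataclass
class LoudnessMonitor:
    db_threshold: float       # "first threshold value" (assumed unit: dB)
    rate_threshold: float     # "second threshold value" (assumed: words/minute)
    max_exceedances: int = 3  # assumed meaning of "a plurality of times"
    exceedances: int = 0      # how many measurements crossed a threshold so far

    def observe(self, db: float, rate: float) -> str:
        """Return 'none', 'prompt', or 'agitation_prompt' for one measurement."""
        if db <= self.db_threshold and rate <= self.rate_threshold:
            return "none"
        self.exceedances += 1
        if self.exceedances >= self.max_exceedances:
            # The client would also offer to turn off the audio or camera here.
            return "agitation_prompt"
        return "prompt"
```

The counter in this sketch never resets; whether exceedances should expire after some time is not stated in the claim and is left open.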
4. The video conference method of claim 1, wherein the video conference method further comprises:
when, within a period of time, the speaking decibel level of the current speaker is not detected to be higher than the first threshold value, the speaking frequency is not detected to be higher than the second threshold value, and none of the keywords is detected in the speaking content of the current speaker, turning off the intelligent microphone detection function.
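The switch-off condition in claim 4 amounts to an idle timeout. A minimal sketch follows, assuming timestamps in seconds supplied by the caller; `DetectionSwitch`, `quiet_period`, and the method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DetectionSwitch:
    quiet_period: float      # assumed length of "a period of time", in seconds
    active: bool = False
    last_event: float = 0.0  # time of the most recent exceedance or keyword hit

    def start(self, now: float) -> None:
        """Called when expression recognition reports an abnormal expression."""
        self.active = True
        self.last_event = now

    def report_event(self, now: float) -> None:
        """Called on a threshold exceedance or a keyword hit."""
        if self.active:
            self.last_event = now

    def tick(self, now: float) -> bool:
        """Turn detection off after a quiet period; return whether it is still on."""
        if self.active and now - self.last_event >= self.quiet_period:
            self.active = False
        return self.active
```

Passing the clock in as `now` keeps the sketch deterministic and easy to test; a real client would likely read a monotonic clock instead.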
5. The video conference method of claim 1, wherein the video conference method further comprises:
during the video conference, performing face recognition on each person in the video stream, and repeatedly recording the facial contour position of each person;
when the overlap between two consecutively recorded facial contour positions of a person is lower than a third threshold value, recording one large movement;
and when the number of large movements accumulated by a person within a period of time is detected to exceed a fourth threshold value, sending a reminder to pay attention to the client of that person.
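The movement counting in claim 5 can be sketched as follows. The claim does not define how overlap is measured, so this example assumes axis-aligned bounding boxes around the face and uses intersection over union (IoU) as the overlap measure; `MovementTracker`, `iou`, and the sliding time window are illustrative choices, not the patent's method.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2, y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class MovementTracker:
    def __init__(self, overlap_threshold: float, max_moves: int, window: float):
        self.overlap_threshold = overlap_threshold  # "third threshold value"
        self.max_moves = max_moves                  # "fourth threshold value"
        self.window = window                        # assumed period, in seconds
        self.prev: Optional[Box] = None
        self.moves: Deque[float] = deque()          # timestamps of large movements

    def update(self, box: Box, now: float) -> bool:
        """Record one face box; return True when a reminder should be sent."""
        if self.prev is not None and iou(self.prev, box) < self.overlap_threshold:
            self.moves.append(now)
        self.prev = box
        # Forget large movements that fall outside the time window.
        while self.moves and now - self.moves[0] > self.window:
            self.moves.popleft()
        return len(self.moves) > self.max_moves
```

One tracker instance would be kept per participant, fed with the box produced by whichever face detector the system uses.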
6. The video conference method of claim 1, wherein the video conference method further comprises:
during the video conference, performing face recognition on each person in the video stream, and when face information of a person cannot be detected within a period of time, sending a reminder to pay attention to the client of that person.
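The absence check in claim 6 reduces to tracking when a face was last seen. The sketch below assumes a per-frame boolean from some face detector and caller-supplied timestamps in seconds; `AbsenceMonitor` and `timeout` are hypothetical names.

```python
from typing import Optional


class AbsenceMonitor:
    def __init__(self, timeout: float):
        self.timeout = timeout                  # assumed "period of time", seconds
        self.last_seen: Optional[float] = None  # time a face was last detected

    def update(self, face_detected: bool, now: float) -> bool:
        """Return True when no face has been seen for at least `timeout` seconds."""
        if face_detected or self.last_seen is None:
            # A detected face, or the very first frame, (re)starts the timer.
            self.last_seen = now
            return False
        return now - self.last_seen >= self.timeout
```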
7. The video conference method of claim 1, wherein the video conference method further comprises:
when the speaking decibel level of the current speaker is detected to be higher than the first threshold value and/or the speaking frequency is detected to be higher than the second threshold value, sending, to the clients of the participants in the video conference, a reminder to pay attention to the key speaking content.
8. A video conference system, comprising:
a current speaker confirmation module, configured to confirm a current speaker during a video conference;
an expression recognition module, coupled to the current speaker confirmation module and configured to start expression recognition on the current speaker in the video stream;
an intelligent microphone detection module, coupled to the expression recognition module and configured to start intelligent microphone detection when the expression recognition module determines that the expression of the current speaker is abnormal;
a reminding module, coupled to the intelligent microphone detection module and the current speaker confirmation module and configured to send a reminder to a client of the current speaker when the speaking decibel level of the current speaker is detected to be higher than a first threshold value and/or the speaking frequency is detected to be higher than a second threshold value;
a keyword module, configured to preset and store a plurality of keywords before the video conference starts, and to assign a grade to each keyword in advance;
a voice recognition module, configured to perform voice recognition on the current speaker during the video conference, wherein if the recognition result shows that the speaking content of the current speaker includes one of the keywords, a warning of the corresponding grade is popped up on the client of the current speaker according to the grade of the keyword; and if the keyword recognized in the speaking content of the current speaker is of the highest grade, the current speaker is muted.
9. A computer readable storage medium for performing the video conference method of any one of claims 1-7.
CN202010845755.7A 2020-08-20 2020-08-20 Video conference method and system, and computer readable storage medium Active CN111986703B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010845755.7A CN111986703B (en) 2020-08-20 2020-08-20 Video conference method and system, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010845755.7A CN111986703B (en) 2020-08-20 2020-08-20 Video conference method and system, and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111986703A CN111986703A (en) 2020-11-24
CN111986703B CN111986703B (en) 2023-05-26

Family

ID=73442448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010845755.7A Active CN111986703B (en) 2020-08-20 2020-08-20 Video conference method and system, and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111986703B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116996801B (en) * 2023-09-25 2023-12-12 福州天地众和信息技术有限公司 Intelligent conference debugging speaking system with wired and wireless access AI

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6894714B2 (en) * 2000-12-05 2005-05-17 Koninklijke Philips Electronics N.V. Method and apparatus for predicting events in video conferencing and other applications
US7092001B2 (en) * 2003-11-26 2006-08-15 Sap Aktiengesellschaft Video conferencing system with physical cues
JP4287770B2 (en) * 2004-03-18 2009-07-01 日本電信電話株式会社 Information transmission method, communication apparatus for realizing the method, and program thereof
CN101080000A (en) * 2007-07-17 2007-11-28 华为技术有限公司 Method, system, server and terminal for displaying speaker in video conference
CN102843543B (en) * 2012-09-17 2015-01-21 华为技术有限公司 Video conferencing reminding method, device and video conferencing system
US9576190B2 (en) * 2015-03-18 2017-02-21 Snap Inc. Emotion recognition in video conferencing
CN107993674A (en) * 2016-10-27 2018-05-04 中兴通讯股份有限公司 A kind of emotion control method and device
JP2019176375A (en) * 2018-03-29 2019-10-10 株式会社アドバンスト・メディア Moving image output apparatus, moving image output method, and moving image output program
US11222199B2 (en) * 2018-12-05 2022-01-11 International Business Machines Corporation Automatically suggesting behavioral adjustments during video conferences

Also Published As

Publication number Publication date
CN111986703A (en) 2020-11-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant