CN102025972A - Mute indication method and device applied to video conferencing

Info

Publication number: CN102025972A
Application number: CN2010105916923A
Authority: CN
Inventors: 吴永明
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2010-12-16
Filing date: 2010-12-16
Publication date: 2011-04-20
Also published as: WO2012079510A1

Abstract

The invention provides a mute indication method and device applied to video conferencing. In the method, a multipoint conference unit (MCU) performs voice activity detection on the audio media stream sent by a terminal that participates in the video conference and has been muted; the MCU obtains the detection result for the terminal, the detection result being one of the following states: a voice active state or a voice inactive state; and when the detection result is the voice active state, the MCU superimposes a mute video indication on the video signal sent to the terminal. The invention improves the communication experience of video conferencing and makes video conferencing simple and efficient.

Description

Mute indication method and device applied to video conferencing

Technical field

The present invention relates to the field of communications, and in particular to a mute indication method and device applied to video conferencing.

Background art

A video conferencing system is a multimedia communication system that supports two-way remote transmission of audio and video. It lets users at different locations communicate visually, face to face, in real time and in both directions.

Standards bodies such as the International Telecommunication Union (ITU), the Internet Engineering Task Force (IETF), and the 3rd Generation Partnership Project (3GPP) each develop multimedia standards. The ITU has so far developed several multimedia communication standards, including ITU-T H.320, ITU-T H.323, and ITU-T H.324. ITU-T H.320 targets multimedia communication over narrowband circuit-switched networks, ITU-T H.323 targets multimedia communication over IP networks, and ITU-T H.324 targets multimedia communication over very low bit rate networks, such as the PSTN (Public Switched Telephone Network) and mobile networks. The IETF defines the Session Initiation Protocol (SIP) and the multimedia conferencing standards based on it. 3GPP defines the standards for the IP Multimedia Subsystem (IMS); building on the IETF standards, it has also specified a set of multimedia conferencing standards for IMS networks, which are very close to the SIP-based standards defined by the IETF.

Fig. 1 illustrates the basic principle of video conference communication. Terminals 101 are the devices operated by users and comprise terminals 1 to n. Each terminal contains a codec, which compresses, encodes, and decodes media such as audio and video. The terminal is also connected to a microphone, a camera, a display, and a sound playback subsystem, which handle audio and video input and output. The terminal further includes a user input interface through which the user enters commands and information. When a video conference is held, each terminal 101 establishes a connection with the MCU (Multipoint Conference Unit) 102 that carries two-way control signaling, audio, and video. To save network bandwidth, audio and video are generally transmitted over the network in compressed, encoded form.

The MCU 102 carries out multi-party conference communication. Each terminal 101 participating in the multi-party conference establishes a connection with the MCU 102 and exchanges control signaling, audio, and video with it in both directions. The MCU 102 is responsible for switching and mixing the media streams. For audio, the MCU 102 generally outputs a mixed audio stream to each terminal 101; the mix is generally produced by selecting and superimposing the few input audio streams with the highest volume. For video, the MCU 102 can send a terminal the single-picture video stream of another terminal. If the MCU 102 supports a multi-picture function, it can also compose the video from multiple terminals into one multi-picture image and send it to one or more terminals.

Video conferences generally provide a conference control function to meet users' conference management needs. The conference control software 103 in Fig. 1 implements this function. One important function of the conference control software 103 is mute control of terminals. To achieve good audio quality, terminals that do not currently need to speak are usually muted. After a terminal is muted, the other terminals in the same conference cannot hear speech from that terminal.

If a muted terminal is not notified that it has been muted, its user may try to speak. Because users at the other terminals cannot hear that speech, the situation may be mistaken for a system fault, which reduces usability.

Traditional audio conferencing systems generally play a special prompt tone to the muted terminal, for example an intermittent beep. The drawback of this approach is that the prompt is not intuitive enough and interferes to some extent with listening to the normal conference audio.

No effective solution has yet been proposed for the problem in the related art that a prompt tone used as a mute prompt is not intuitive enough and interferes to some extent with listening to the normal conference audio.

Summary of the invention

The present invention aims to provide a mute indication method and device applied to video conferencing, to solve the problem in the related art that a prompt tone used as a mute prompt is not intuitive enough and interferes to some extent with listening to the normal conference audio.

According to one aspect of the present invention, a mute indication method applied to video conferencing is provided, comprising: a multipoint conference unit (MCU) performs voice activity detection on the audio media stream sent by a terminal that participates in the video conference and has been muted; the MCU obtains the detection result for the terminal, wherein the detection result is either of the following: a voice active state or a voice inactive state; and when the detection result is the voice active state, the MCU superimposes a mute video indication on the video signal sent to the terminal.

Preferably, the MCU performing voice activity detection on the audio media stream sent by the terminal participating in the video conference comprises: the MCU periodically performs voice activity detection on the audio media stream.

Preferably, the MCU obtaining the detection result for the terminal comprises: if an audio parameter of the audio media stream is higher than the voice activity detection threshold, the MCU determines that the detection result is the voice active state; if the audio parameter of the audio media stream is not higher than the voice activity detection threshold, the MCU determines that the detection result is the voice inactive state.

Preferably, the MCU superimposing the mute video indication on the video signal sent to the terminal comprises: the MCU superimposes text or an icon on the video signal sent to the terminal, the text or icon indicating that the terminal has been muted.

Preferably, the MCU superimposing the mute video indication on the video signal sent to the terminal comprises: the MCU repeats the superimposition of the mute video indication on every video frame sent to the terminal until the mute video indication is cancelled.

According to another aspect of the present invention, a mute indication device applied to video conferencing is provided. The device is arranged in a multipoint conference unit (MCU) and comprises: a detection module, configured to perform voice activity detection on the audio media stream sent by a terminal that participates in the video conference and has been muted; an acquisition module, configured to obtain the detection result for the terminal, wherein the detection result is either of the following: a voice active state or a voice inactive state; and a superimposing module, configured to superimpose a mute video indication on the video signal sent to the terminal when the detection result is the voice active state.

Preferably, the detection module is further configured to periodically perform voice activity detection on the audio media stream.

Preferably, the acquisition module comprises: a first determining submodule, configured to determine that the detection result is the voice active state if an audio parameter of the audio media stream is higher than the voice activity detection threshold; and a second determining submodule, configured to determine that the detection result is the voice inactive state if the audio parameter of the audio media stream is not higher than the voice activity detection threshold.

Preferably, the superimposing module is further configured to superimpose text or an icon on the video signal sent to the terminal, the text or icon indicating that the terminal has been muted.

Preferably, the superimposing module is further configured to repeat the superimposition of the mute video indication on every video frame sent to the terminal until the mute video indication is cancelled.

In the embodiments of the present invention, the MCU performs voice activity detection on the audio media stream sent by a terminal that participates in the video conference and has been muted, and when the detection result is the voice active state, the MCU superimposes a mute video indication on the video signal sent to the terminal. Thus, after a terminal has been muted, if its user tries to speak, a mute video indication message is displayed in the video signal the terminal receives, for example "You are currently not permitted to speak; please request the floor first." The purpose of the embodiments is to improve the communication experience of video conferencing and make video conferencing simple and efficient to use. The advantages are that the prompt is intuitive, its content can be rich and precise, and it appears dynamically: no prompt is shown under normal conditions, which keeps interference with the user to a minimum.

Brief description of the drawings

The accompanying drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:

Fig. 1 is a schematic diagram of the basic principle of video conference communication according to the related art;

Fig. 2 is a processing flowchart of the mute indication method applied to video conferencing according to an embodiment of the invention;

Fig. 3 is a schematic diagram of an MCU device that supports superimposing mute information on video, and of its processing flow, according to an embodiment of the invention;

Fig. 4 is a schematic diagram of another MCU device that supports mute information on video, and of its processing flow, according to an embodiment of the invention;

Fig. 5 is a processing flowchart of a specific embodiment of the invention;

Fig. 6 shows the display effect of a mute prompt using the video superimposition mode according to an embodiment of the invention;

Fig. 7 shows the display effect of a mute prompt using the video insertion mode according to an embodiment of the invention;

Fig. 8 is a structural diagram of the mute indication device applied to video conferencing according to an embodiment of the invention;

Fig. 9 is a structural diagram of the acquisition module according to an embodiment of the invention.

Detailed description of the embodiments

The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments. It should be noted that, provided they do not conflict, the embodiments in this application and the features of those embodiments can be combined with one another.

To solve the above technical problem, an embodiment of the invention provides a mute indication method applied to video conferencing. Its processing flow is shown in Fig. 2 and comprises:

Step 202: a multipoint conference unit (MCU) performs voice activity detection on the audio media stream sent by a terminal that participates in the video conference and has been muted;

Step 204: the MCU obtains the detection result for the terminal, wherein the detection result is either of the following: a voice active state or a voice inactive state;

Step 206: when the detection result is the voice active state, the MCU superimposes a mute video indication on the video signal sent to the terminal.

Preferably, the MCU performing voice activity detection (VAD) on the audio media stream sent by the terminal participating in the video conference comprises: the MCU periodically performs VAD on the audio media stream. The MCU runs VAD on the audio media stream continuously and outputs a voice activity detection result once every interval T1. The detection result takes one of two states: the voice active state or the voice inactive state. T1 can be an adjustable MCU configuration item.
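The periodic reporting described above might be sketched as follows. This is only an illustration under stated assumptions: the class name `PeriodicVadReporter`, the per-block boolean input, and the rule that a T1 window counts as active if any block in it was active are all hypothetical and are not specified by the source.

```python
from typing import Optional


class PeriodicVadReporter:
    """Hypothetical helper: collects per-block VAD decisions and reports once per T1."""

    def __init__(self, t1_ms: int = 1000) -> None:
        self.t1_ms = t1_ms          # reporting interval T1 (adjustable configuration item)
        self._elapsed_ms = 0        # audio time accumulated in the current window
        self._any_active = False    # assumption: one active block makes the window active

    def push(self, block_active: bool, block_ms: int) -> Optional[bool]:
        """Feed one audio block's decision; return a report when T1 has elapsed, else None."""
        self._any_active = self._any_active or block_active
        self._elapsed_ms += block_ms
        if self._elapsed_ms < self.t1_ms:
            return None
        result = self._any_active
        self._elapsed_ms = 0
        self._any_active = False
        return result


reporter = PeriodicVadReporter(t1_ms=1000)
# Twenty 100 ms blocks; only block 3 contains speech.
reports = [reporter.push(block_active=(i == 3), block_ms=100) for i in range(20)]
print([r for r in reports if r is not None])  # prints [True, False]
```

With 100 ms blocks and T1 = 1000 ms, one report is produced for every ten blocks.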

Preferably, the MCU obtaining the detection result for the terminal comprises: if an audio parameter of the audio media stream is higher than the VAD threshold, the MCU determines that the detection result is the voice active state; if the audio parameter of the audio media stream is not higher than the VAD threshold, the MCU determines that the detection result is the voice inactive state. The VAD threshold can be adjusted according to the specific situation.
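As a minimal sketch of this threshold comparison, the example below assumes that the "audio parameter" is the root-mean-square level of a block of PCM samples. The source does not say which parameter is used, so the functions `rms_level` and `detect_voice_activity` and the threshold value are hypothetical.

```python
import math
from typing import Sequence


def rms_level(samples: Sequence[int]) -> float:
    """Root-mean-square level of one block of PCM samples (assumed audio parameter)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_voice_activity(samples: Sequence[int], threshold: float) -> bool:
    """True (voice active) only when the level is strictly higher than the threshold."""
    return rms_level(samples) > threshold


silence = [0, 3, -2, 1] * 100               # near-silent block
speech = [4000, -3500, 3800, -4200] * 100   # loud block
print(detect_voice_activity(silence, threshold=500.0))  # prints False
print(detect_voice_activity(speech, threshold=500.0))   # prints True
```

A level exactly equal to the threshold is "not higher than" it, so it yields the voice inactive state.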

In implementation, depending on the result of step 204, the MCU can either superimpose (or insert) a mute video indication in the video signal sent to the terminal, or cancel the superimposed (or inserted) mute video indication. The MCU checks whether the terminal is muted. If it is, the MCU further determines whether the audio media stream currently sent by the terminal is in the active state. If it is in the voice active state, the mute video indication needs to be sent to the terminal; in all other cases the MCU stops sending the mute video indication. Here, "muted" refers to audio processing inside the MCU that prevents the other terminals participating in the video conference from hearing the sound of this terminal.
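The decision rule above reduces to a logical AND of two conditions. The sketch below illustrates it; the function name `should_send_mute_indication` is hypothetical.

```python
def should_send_mute_indication(terminal_muted: bool, voice_active: bool) -> bool:
    """Send the indication only when the terminal is muted AND its audio is voice active."""
    return terminal_muted and voice_active


# Enumerate all four combinations of the two conditions.
for muted in (False, True):
    for active in (False, True):
        print(muted, active, should_send_mute_indication(muted, active))
```

Only the combination "muted and voice active" yields `True`; the other three stop the indication.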

Preferably, when step 206 is implemented, the MCU superimposing the mute video indication on the video signal sent to the terminal comprises: the MCU superimposes text or an icon on the video signal sent to the terminal, the text or icon indicating that the terminal has been muted. Attributes such as the content of the text or icon, the font, the text size, the color, and the display position can be adjustable configuration items.

In implementation, the MCU repeats the superimposition of the mute video indication on every video frame sent to the terminal until the mute video indication is cancelled. Once the mute video indication is cancelled, video frames are no longer superimposed.
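Per-frame superimposition could look roughly like the sketch below. It assumes, purely for illustration, that a raw frame is a grid of grayscale values and that the icon is a small bitmap whose zero pixels are transparent. The names `overlay_icon` and `process_frame` are hypothetical; a real MCU would operate on its own raw video format.

```python
from typing import List

Frame = List[List[int]]  # assumed raw frame: rows of grayscale pixel values


def overlay_icon(frame: Frame, icon: Frame, top: int, left: int) -> Frame:
    """Return a copy of the frame with the icon's non-zero pixels stamped at (top, left)."""
    out = [row[:] for row in frame]
    for dy, icon_row in enumerate(icon):
        for dx, value in enumerate(icon_row):
            y, x = top + dy, left + dx
            # Skip transparent pixels and anything that falls outside the frame.
            if value and 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = value
    return out


def process_frame(frame: Frame, icon: Frame, indication_on: bool,
                  top: int = 0, left: int = 0) -> Frame:
    """Superimpose on every frame while the indication is on; otherwise pass through."""
    return overlay_icon(frame, icon, top, left) if indication_on else frame


frame = [[0] * 4 for _ in range(4)]
icon = [[255, 0], [255, 255]]
for row in process_frame(frame, icon, indication_on=True, top=2, left=1):
    print(row)
```

The display position (`top`, `left`) corresponds to one of the adjustable attributes mentioned above.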

As can be seen from the above, when the mute video indication is inserted rather than superimposed, the MCU replaces the normal conference video stream with a mute prompt video stream. The mute prompt video stream contains text or icon information indicating that the terminal has been muted. When the mute video indication is cancelled, the MCU resumes sending the normal conference video stream.

Fig. 3 shows an MCU device that supports superimposing mute information on video, together with its processing flow, according to an embodiment of the invention. The network interface module 301 is responsible for communicating with the terminals and for sending and receiving the audio and video media streams. The network interface module 301 passes the received audio stream (1) to the audio decoding module 302, which decodes the compressed audio into a raw-format audio stream and passes the raw-format audio stream (2) to both the audio mixing module 303 and the voice activity detection module 304. The audio mixing module 303 mixes the audio streams from multiple terminals to produce the effect of a multi-party call and passes the mixed audio stream (4) to the audio encoding module 305. The audio encoding module 305 compresses and encodes the raw audio and passes the encoded audio stream (3) to the network interface module 301. The network interface module 301 passes the received video stream (5) to the video decoding module 306. The voice activity detection module 304 performs voice activity detection on the audio media stream sent by a terminal that participates in the video conference and has been muted. In this embodiment T1 is 1000 ms, so the voice activity detection module 304 reports the voice activity state (7) to the main control module 307 every 1000 ms. The main control module 307 decides whether to give the video mute indication: when the terminal is muted and a voice active state indication is received, the mute video indication needs to be sent to the terminal; in all other cases the mute video indication is stopped. The main control module 307 sends the command (8) indicating whether to send the mute video indication to the graphics superimposing module 308, and the video decoding module 306 passes the raw-format video stream (6) destined for the terminal to the graphics superimposing module 308. The graphics superimposing module 308 superimposes the mute prompt information on the raw-format video stream destined for the terminal and passes the resulting raw-format video stream (9) to the video encoder 309. The video encoder 309 compresses and encodes the raw-format video stream and passes it to the network interface module 301, which sends it to the terminal. Through device configuration, the user can preset in the MCU the volume comparison threshold, the number of samples or the corresponding time interval used for volume calculation, the prompt text content, the text color, the font size, the font type, and the position at which the prompt text is displayed in the video frame.

Fig. 4 shows another MCU device, which supports mute information using the video insertion mode, together with its processing flow, according to an embodiment of the invention. The network interface module 401 is responsible for communicating with the terminals and for sending and receiving the audio and video media streams. The network interface module 401 passes the received audio stream (1) to the audio decoding module 402, which decodes the compressed audio into a raw-format audio stream and passes the raw-format audio stream (2) to both the audio mixing module 403 and the voice activity detection module 404. The audio mixing module 403 mixes the audio streams from multiple terminals to produce the effect of a multi-party call and passes the mixed audio stream (4) to the audio encoding module 405. The audio encoding module 405 compresses and encodes the raw audio and passes the encoded audio stream (3) to the network interface module 401. The video mixing and switching module 406 receives the video streams (5) sent by the terminals and either composes the video of multiple terminals into one multi-picture video or switches the video input of one terminal to other terminals. The output video stream (6) of the video mixing and switching module 406 is passed to the video switching module 407. The voice activity detection module 404 performs voice activity detection on the audio media stream sent by a terminal that participates in the video conference and has been muted. In this embodiment T1 is 1000 ms, so the voice activity detection module 404 reports the voice activity state (8) to the main control module 409 every 1000 ms. The main control module 409 decides whether to give the video mute indication: when the terminal is muted and a voice active state indication is received, the mute video indication needs to be sent to the terminal; in all other cases the mute video indication is stopped. The main control module 409 sends the command (9) indicating whether to send the mute video indication to the video switching module 407. According to the command from the main control module 409, the video switching module 407 selects either the normal conference video stream (6) or the mute prompt video stream (7) to send to the terminal. The visual prompt module 410 outputs the mute prompt video stream (7). The advantage of inserting a mute visual prompt is that it saves media computing resources, since video superimposition usually consumes considerable CPU resources.
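The selection performed by the video switching module in the insertion mode might be sketched as follows. Frames are represented as plain strings for brevity, and the class `VideoSwitch` with its method names is a hypothetical illustration rather than the module's actual interface.

```python
class VideoSwitch:
    """Hypothetical video switching module: picks one of two streams per frame."""

    def __init__(self, prompt_frame: str) -> None:
        self.prompt_frame = prompt_frame  # frame from the mute prompt video stream
        self.send_prompt = False          # latest command from the main control module

    def on_command(self, send_prompt: bool) -> None:
        """Store the main control module's command (send or stop the indication)."""
        self.send_prompt = send_prompt

    def select(self, conference_frame: str) -> str:
        """Return the prompt frame while the indication is on, else the conference frame."""
        return self.prompt_frame if self.send_prompt else conference_frame


switch = VideoSwitch(prompt_frame="MUTE-PROMPT")
sent = [switch.select("conf-0")]
switch.on_command(True)    # muted terminal detected as voice active
sent.append(switch.select("conf-1"))
switch.on_command(False)   # indication cancelled
sent.append(switch.select("conf-2"))
print(sent)  # prints ['conf-0', 'MUTE-PROMPT', 'conf-2']
```

Because the switch only chooses between two ready-made streams, no per-frame pixel processing is needed, which is consistent with the resource saving noted above.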

Fig. 5 is a processing flowchart of an embodiment of the invention; the flowchart is described with reference to the MCU embodiment of Fig. 3.

Step 501: receive raw-format audio stream data input from the terminal, for example audio data with a duration of 100 ms;

Step 502: perform voice activity detection using the most recently received audio stream data. Depending on the VAD algorithm, the calculation may need to use stored historical audio stream data and previous calculation results. The VAD decision threshold can be configured by the user to adjust the sensitivity of the decision;

Step 503: output the voice activity state;

Steps 501 to 503 may be executed by the VAD module; after step 503 completes, the flow returns to step 501 and repeats. The voice activity state is then output to the main control module, which executes steps 511 to 515;

Step 511: receive the input and update the voice activity state;

Step 512: determine whether the state is voice active; if so, execute step 513; if not, execute step 515;

Step 513: determine whether the terminal is muted; if so, execute step 514; otherwise execute step 515;

Step 514: send a request-superimposition message to notify the video superimposing module to perform video superimposition, then return to step 511 and repeat;

Step 515: send a cancel-superimposition message to notify the video superimposing module to cancel video superimposition, then return to step 512 and repeat;

The request-superimposition or cancel-superimposition message is then output to the video superimposing module, which executes steps 521 to 524;

Step 521: the video superimposing module updates the video superimposition state according to the input from the main control module;

Step 522: the video superimposing module determines whether to perform video superimposition; if so, execute step 523; otherwise execute step 524;

Step 523: the video superimposing module superimposes the prompt information on the video signal sent to the terminal. The prompt information can be an icon that expresses muting or a descriptive text string. Attributes such as the content of the prompt text, the font, the text size, the color, and the display position can be adjustable configuration items;

Step 524: the video superimposing module does not perform superimposition.
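The interaction between the main control module (steps 511 to 515) and the video superimposing module (steps 521 to 524) can be sketched as below. This is a simplified illustration: frames are plain strings, the VAD reports are supplied as a list of booleans, and the class names `OverlayModule` and `MainControl` are hypothetical.

```python
from typing import List


class OverlayModule:
    """Hypothetical video superimposing module (steps 521 to 524)."""

    def __init__(self, prompt_text: str) -> None:
        self.prompt_text = prompt_text
        self.overlay_on = False

    def update(self, request_overlay: bool) -> None:
        # Step 521: update the superimposition state from the main control message.
        self.overlay_on = request_overlay

    def process(self, frame: str) -> str:
        # Step 522 decides; step 523 superimposes; step 524 leaves the frame unchanged.
        if self.overlay_on:
            return f"{frame} [{self.prompt_text}]"
        return frame


class MainControl:
    """Hypothetical main control module (steps 511 to 515)."""

    def __init__(self, overlay: OverlayModule, terminal_muted: bool) -> None:
        self.overlay = overlay
        self.terminal_muted = terminal_muted
        self.voice_active = False

    def on_vad_result(self, voice_active: bool) -> None:
        # Step 511: receive the input and update the voice activity state.
        self.voice_active = voice_active
        # Steps 512 and 513: both conditions must hold to request superimposition.
        if self.voice_active and self.terminal_muted:
            self.overlay.update(True)    # Step 514: request superimposition.
        else:
            self.overlay.update(False)   # Step 515: cancel superimposition.


def run_demo() -> List[str]:
    overlay = OverlayModule("You are muted")
    control = MainControl(overlay, terminal_muted=True)
    shown = []
    for index, voice_active in enumerate([False, True, True, False]):
        control.on_vad_result(voice_active)  # one VAD report (steps 501 to 503)
        shown.append(overlay.process(f"frame{index}"))
    return shown


print(run_demo())
```

In this sketch the prompt appears only on the frames produced while the muted terminal's audio is reported as voice active, and disappears once the report changes.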

With the mute indication method provided by the embodiment of the invention, a mute prompt can be generated in the video. For example, Fig. 6 shows the display effect of a mute prompt using the video superimposition mode: the outer rectangle represents the video screen, the person icon represents the video signal viewed at the terminal, and the text at the bottom is the superimposed mute prompt, for example "You are currently not permitted to speak; please request the floor first." As a further example, Fig. 7 shows the display effect of a mute prompt using the video insertion mode; the outer rectangle represents the video screen.

Based on the same inventive concept, an embodiment of the invention also provides a mute indication device applied to video conferencing. Its structure is shown in Fig. 8. The device is arranged in a multipoint conference unit (MCU) and comprises:

a detection module 801, configured to perform voice activity detection on the audio media stream sent by a terminal that participates in the video conference and has been muted;

an acquisition module 802, configured to obtain the detection result for the terminal, wherein the detection result is either of the following: a voice active state or a voice inactive state;

a superimposing module 803, configured to superimpose a mute video indication on the video signal sent to the terminal when the detection result is the voice active state.

In one embodiment, the detection module 801 can also be configured to periodically perform voice activity detection on the audio media stream.

In one embodiment, as shown in Fig. 9, the acquisition module 802 can comprise:

a first determining submodule 901, configured to determine that the detection result is the voice active state if an audio parameter of the audio media stream is higher than the voice activity detection threshold;

a second determining submodule 902, configured to determine that the detection result is the voice inactive state if the audio parameter of the audio media stream is not higher than the voice activity detection threshold.

In one embodiment, the superimposing module 803 can also be configured to superimpose text or an icon on the video signal sent to the terminal, the text or icon indicating that the terminal has been muted.

In one embodiment, the superimposing module 803 can also be configured to repeat the superimposition of the mute video indication on every video frame sent to the terminal until the mute video indication is cancelled.

As can be seen from the above description, the present invention achieves the following technical effects:

In the embodiments of the present invention, the MCU performs voice activity detection on the audio media stream sent by a terminal that participates in the video conference and has been muted, and when the detection result is the voice active state, the MCU superimposes a mute video indication on the video signal sent to the terminal. Thus, after a terminal has been muted, if its user tries to speak, a mute video indication message is displayed in the video signal the terminal receives, for example "You are currently not permitted to speak; please request the floor first." The purpose of the embodiments is to improve the communication experience of video conferencing and make video conferencing simple and efficient to use. The advantages are that the prompt is intuitive, its content can be rich and precise, and it appears dynamically: no prompt is shown under normal conditions, which keeps interference with the user to a minimum.

Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into an individual integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is thus not limited to any specific combination of hardware and software.

The above are only preferred embodiments of the present invention and do not limit the invention. For those skilled in the art, the present invention can have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A mute indication method applied to video conferencing, characterized in that it comprises:

a multipoint conference unit (MCU) performing voice activity detection on the audio media stream sent by a terminal that participates in the video conference and has been muted;

the MCU obtaining the detection result for the terminal, wherein the detection result is either of the following: a voice active state or a voice inactive state;

when the detection result is the voice active state, the MCU superimposing a mute video indication on the video signal sent to the terminal.

2. The method according to claim 1, characterized in that the MCU performing voice activity detection on the audio media stream sent by the terminal participating in the video conference comprises: the MCU periodically performing voice activity detection on the audio media stream.

3. The method according to claim 1 or 2, characterized in that the MCU obtaining the detection result for the terminal comprises:

if an audio parameter of the audio media stream is higher than the voice activity detection threshold, the MCU determining that the detection result is the voice active state;

if the audio parameter of the audio media stream is not higher than the voice activity detection threshold, the MCU determining that the detection result is the voice inactive state.

4. The method according to claim 3, characterized in that the MCU superimposing the mute video indication on the video signal sent to the terminal comprises: the MCU superimposing text or an icon on the video signal sent to the terminal, the text or icon indicating that the terminal has been muted.

5. The method according to claim 4, characterized in that the MCU superimposing the mute video indication on the video signal sent to the terminal comprises: the MCU repeating the superimposition of the mute video indication on every video frame sent to the terminal until the mute video indication is cancelled.

6. A mute indication device applied to video conferencing, characterized in that it is arranged in a multipoint conference unit (MCU) and comprises:

a detection module, configured to perform voice activity detection on the audio media stream sent by a terminal that participates in the video conference and has been muted;

an acquisition module, configured to obtain the detection result for the terminal, wherein the detection result is either of the following: a voice active state or a voice inactive state;

a superimposing module, configured to superimpose a mute video indication on the video signal sent to the terminal when the detection result is the voice active state.

7. The device according to claim 6, characterized in that the detection module is further configured to periodically perform voice activity detection on the audio media stream.

8. The device according to claim 6 or 7, characterized in that the acquisition module comprises:

a first determining submodule, configured to determine that the detection result is the voice active state if an audio parameter of the audio media stream is higher than the voice activity detection threshold;

a second determining submodule, configured to determine that the detection result is the voice inactive state if the audio parameter of the audio media stream is not higher than the voice activity detection threshold.

9. The device according to claim 8, characterized in that the superimposing module is further configured to superimpose text or an icon on the video signal sent to the terminal, the text or icon indicating that the terminal has been muted.

10. The device according to claim 9, characterized in that the superimposing module is further configured to repeat the superimposition of the mute video indication on every video frame sent to the terminal until the mute video indication is cancelled.