WO2013174115A1 - Method, device and system for picture control in a multi-picture video conference - Google Patents

Method, device and system for picture control in a multi-picture video conference

Info

Publication number
WO2013174115A1
Authority
WO
WIPO (PCT)
Prior art keywords
site
specified time
audio energy
time period
specified
Prior art date
Application number
PCT/CN2012/085024
Other languages
English (en)
French (fr)
Inventor
詹五洲
韦海斌
吴姣黎
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2013174115A1
Priority to US14/553,263 (published as US20150092011A1)


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/567 Multimedia conference systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2624 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects, for obtaining an image which is composed of whole input images, e.g. splitscreen
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L2025/783 Detection of presence or absence of voice signals based on threshold decision
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/42365 Presence services providing information on the willingness to communicate or the ability to communicate in terms of media capability or network connectivity
    • H04M3/42374 Presence services providing information on the willingness to communicate or the ability to communicate in terms of media capability or network connectivity where the information is provided to a monitoring entity such as a potential calling party or a call processing server

Definitions

  • The present invention relates to the field of video conferencing, and in particular to a method, device and system for picture control in a multi-picture video conference.
  • The current video conferencing system displays the multi-picture as follows: a multi-picture mode, such as 4-picture or 9-picture, is preset, and a fixed set of conference sites is filled into the sub-pictures of the multi-picture; during the conference, every site sees this preset multi-picture.
  • An object of the present invention is to provide a method, device and system for picture control in a multi-picture video conference, so as to adjust the sub-pictures in real time according to the situation of each site and thereby effectively improve the conference effect.
  • An embodiment of the invention discloses a picture control method for a multi-picture video conference, the method including:
  • receiving audio data of the conference sites; acquiring in real time, according to the audio data of each site, a voice feature value of the corresponding site within a first specified time period, where the voice feature value is used to represent an activation state of the site;
  • selecting a designated site from the multiple sites according to the activation state of each site;
  • filling an image of the designated site into the multi-picture as a sub-picture, so as to update the multi-picture in real time.
  • An embodiment of the invention further discloses a picture control device for a multi-picture video conference, the device including:
  • An audio receiving unit configured to receive audio data of the conference site
  • the voice feature value obtaining unit is configured to acquire, according to the audio data of each site in the site, a voice feature value of the corresponding site in a first specified time period, where the voice feature value is used to represent an activation state of the site;
  • a site selection unit configured to select a specified site from the plurality of sites according to an activation state of each site
  • the sub-picture updating unit is configured to fill the image of the specified site as a sub-picture into the multi-picture to update the multi-picture in real time.
  • the embodiment of the present invention further discloses a screen control system for a multi-screen video conference.
  • the system includes the foregoing device and one or more site terminals, and the site terminal is configured to display a multi-screen generated by the device control.
  • A time period is used as the statistical unit, and feature values within the time period are collected to determine whether a site is in an activated state, which serves as the basis for participation in multi-picture composition; dynamic adjustment of the sub-picture content in the multi-picture is thereby realized, significantly improving the conference effect and greatly improving the conference experience of participants. In addition, the embodiments of the present invention can also dynamically adjust the number and positions of the sub-pictures in the multi-picture, which also effectively improves the conference effect.
  • FIG. 2 is a schematic diagram of audio and video decoding in an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a multi-screen equal division method according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a nesting and splitting manner of a multi-picture size sub-picture according to an embodiment of the present invention
  • FIG. 5 is a schematic diagram of multi-party mixing in an embodiment of the present invention
  • FIG. 6 is a schematic diagram of an apparatus according to another embodiment of the present invention.
  • Figure 7 is a schematic illustration of a system in accordance with yet another embodiment of the present invention.
  • FIG. 1 is a flow chart of a method according to an embodiment of the present invention, where the method includes:
  • S101: Receive audio data of the conference sites. There may be one or more sites. In this embodiment, specifically, an MCU (Multipoint Control Unit) may receive the RTP (Real-time Transport Protocol) stream of each site and decode it according to the corresponding audio/video protocol; the decoded RTP packets are output as raw audio and video streams. See Figure 2, where Site denotes a conference site: after the Site 1 stream is decoded, the audio data is AudioData 1 and the video data is VideoData 1; after the Site X stream is decoded, the audio data is AudioData X and the video data is VideoData X.
  • S102: According to the audio data of each site, acquire in real time a voice feature value of the corresponding site within a first specified time period, where the voice feature value is used to represent an activation state of the site.
  • To select which sites should enter the multi-picture, a criterion is needed first; in this embodiment, the criterion is the voice feature value of each site. If the voice feature value of a site meets a certain condition, the site can be regarded as an activated, or active, site, and becomes a candidate for entering the multi-picture.
  • the voice feature values can be defined and evaluated in various ways, which will be described below by way of example. It should be noted that, in other embodiments of the present invention, the voice feature values may be defined and evaluated in other manners, and the embodiment of the present invention is not limited thereto.
  • Manner 1: Obtain the audio energy value of the corresponding site within the first specified time period and use the audio energy value as the voice feature value; if the audio energy value is greater than a specified energy threshold, determine that the site is in the activated state.
  • Preferably, the audio energy value may be obtained by the following two methods. The first method: select multiple second specified time periods within the first specified time period; acquire multiple audio energy samples within each second specified time period; obtain the audio energy data of each second time period from the root mean square of these samples; and take the mean of the audio energy data over the multiple second specified time periods as the audio energy value.
  • Specifically, T0 (typically 1 minute) may be used as the first specified time period, and the voice feature value of each site within T0 is then obtained.
  • The steps are as follows: for a site, select multiple second specified time periods T1 (e.g., 20 ms) within T0, i.e., use T1 as the energy-calculation subunit, and then sample within each T1 to obtain multiple audio energy samples of the site. For example, if N samples x1, x2, ..., xN are taken within one T1, the audio energy data xrms of that T1 can be calculated as xrms = sqrt((x1^2 + x2^2 + ... + xN^2) / N); the average of the xrms values of all T1 within T0 is then taken as the audio feature value of T0.
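As an illustration, the first method's per-T1 RMS and per-T0 averaging can be sketched as follows. This is a minimal sketch assuming the audio samples are available as plain numeric lists; the function names are invented for illustration and do not appear in the patent:

```python
import math

def t1_energy(samples):
    """RMS energy of one T1 sub-period (e.g. 20 ms): sqrt((x1^2+...+xN^2)/N)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def t0_energy(t1_windows):
    """Audio energy value of one T0 period (e.g. 1 minute):
    the mean of the per-T1 RMS values."""
    return sum(t1_energy(w) for w in t1_windows) / len(t1_windows)

def is_activated(t1_windows, energy_threshold):
    """Manner 1: the site is activated if its T0 energy exceeds the threshold."""
    return t0_energy(t1_windows) > energy_threshold
```

For example, two T1 windows with RMS values 2.0 and 4.0 give a T0 energy value of 3.0, which is then compared against the specified energy threshold.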
  • The second method: select multiple second specified time periods within the first specified time period, and within each second specified time period select multiple third specified time periods; acquire multiple audio energy samples within each third specified time period and obtain the audio energy data of that third time period from the root mean square of the samples; then obtain the audio energy data of each second specified time period as the mean over its third specified time periods; finally, weight the audio energy data of each second specified time period and sum the results, taking the sum as the audio energy value. The weighting rule is: the closer to the current time, the greater the weight.
  • The second method is based on the first and extends it. The difference is that the second method considers a longer time period T, selects multiple T0 within T, obtains the audio energy data of each T0 by the first method, and then weights and sums the audio energy data of the T0 periods, taking the result as the final audio energy value. Since the second method considers a longer time period (extended from T0 to T), it is somewhat more accurate than the first method.
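The "closer to now, larger weight" rule of the second method can be sketched as below. The geometric weighting scheme is an assumption made for illustration; the patent only requires that more recent T0 periods receive larger weights:

```python
def weighted_energy(t0_energies, base=0.5):
    """Weighted combination of per-T0 energies over a longer period T.
    t0_energies is ordered oldest -> newest; the newest T0 gets weight 1,
    and each step back in time multiplies the weight by `base` (< 1).
    Weights are normalized so the result stays on the input scale."""
    n = len(t0_energies)
    weights = [base ** (n - 1 - i) for i in range(n)]  # newest -> weight 1
    return sum(e * w for e, w in zip(t0_energies, weights)) / sum(weights)
```

A recent burst of speech therefore dominates the result: `weighted_energy([0.0, 4.0])` exceeds `weighted_energy([4.0, 0.0])` even though both sequences contain the same energies.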
  • Manner 2: Count the duration for which the corresponding site is in a continuous voice state within the first specified time period and use the duration as the voice feature value; if the duration is greater than a specified duration threshold, determine that the site is in the activated state. Specifically, VAD (Voice Activity Detection) may be performed to count the duration of the continuous voice state within the T0 period; the durations are compared, and the activated sites are selected according to the durations.
  • For example, for sites 1, 2, ..., N, the durations accumulated by VAD detection within T0 are VolTimeLen 1, VolTimeLen 2, ..., VolTimeLen N respectively; the VolTimeLen values are sorted and compared with a preset duration threshold GateVolTimeLen.
  • A site with a duration greater than or equal to GateVolTimeLen may be marked as an activated site; a site with a duration smaller than GateVolTimeLen is marked as an inactive site.
  • Of course, in other embodiments, the duration threshold may not be used; instead, the W sites with the longest continuous-voice durations among all sites are selected as the activated sites.
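Both variants of Manner 2 (the GateVolTimeLen threshold, or the top-W sites by duration) can be sketched with a hypothetical helper that takes a mapping from site id to accumulated VAD duration:

```python
def select_activated(durations, gate=None, top_w=None):
    """Manner 2 selection. `durations` maps site id -> accumulated
    continuous-voice duration within T0 (the VolTimeLen values).
    With `gate` set, sites at or above GateVolTimeLen are activated;
    otherwise the `top_w` sites with the longest durations are chosen."""
    if gate is not None:
        return {site for site, d in durations.items() if d >= gate}
    ranked = sorted(durations, key=durations.get, reverse=True)
    return set(ranked[:top_w])
```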
  • Manner 3: Obtain the audio energy value and the continuous-voice duration of the corresponding site within the first specified time period, and use the combination of the audio energy value and the duration as the voice feature value; if the combination meets a specified rule, determine that the site is in the activated state.
  • For example, the audio energy value may be used for an initial screening and the continuous-voice duration for a second filtering; or one value may serve as the primary criterion with the other as a reference: a site with a long voice duration but low voice energy may be regarded as activated, while a site with a short voice duration but high voice energy is not regarded as activated. This avoids wrongly judging a site as activated because a participant suddenly knocked on the table or coughed.
  • S103: Select a designated site from the multiple sites according to the activation state of each site.
  • There may be one or more designated sites. After the activation state of each site has been obtained from the voice feature values, there is a basis for judging which sites should enter the multi-picture as designated sites.
  • the specified site can be selected from the active site to be filled into the multi-screen in a plurality of manners, which will be described below by way of example. It should be noted that, in other embodiments of the present invention, there may be other ways to select, and the embodiment of the present invention is not limited thereto.
  • Mode A: Take the sites currently in the activated state as the designated sites, i.e., all currently activated sites become the designated sites. This is the simplest to implement.
  • Mode B: Take both the sites activated last time and the sites currently activated as the designated sites, so that historical display is also taken into account. Specifically, the current activated sites ActiveSite 1, 2, ..., ActiveSite N are recorded in a set CurActiveTable, and the activated sites of the previous switching round are recorded in a set PreActiveTable; the union of the site information of PreActiveTable and CurActiveTable is used as the sub-picture sites of this multi-picture and participates in the multi-picture splicing.
  • Mode C: Take as the designated sites the sites currently activated, together with the sites that were activated last time and whose voice feature value is greater than or equal to the minimum voice feature value among the currently activated sites. That is, all currently activated sites participate in the multi-picture splicing, and, by comparison of voice feature values, some of the sites activated last time may also participate: a site activated last time whose voice feature value is smaller than the minimum voice feature value among the currently activated sites does not participate in this multi-picture splicing, while one whose voice feature value is greater than or equal to that minimum may participate.
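The selection modes reduce to simple set operations; a sketch follows, where the site ids and the `feature` mapping are illustrative placeholders:

```python
def designated_sites(cur_active, prev_active, feature, mode):
    """Modes A/B/C for choosing the designated sites.
    `feature` maps site id -> voice feature value of the last period."""
    if mode == "A":   # all currently activated sites
        return set(cur_active)
    if mode == "B":   # union with the previous round's activated sites
        return set(cur_active) | set(prev_active)
    if mode == "C":   # previous-round sites kept only if loud enough
        floor = min(feature[s] for s in cur_active)
        return set(cur_active) | {s for s in prev_active if feature[s] >= floor}
    raise ValueError("unknown mode: %s" % mode)
```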
  • S104: Fill the images of the designated sites into the multi-picture as sub-pictures, so as to update the multi-picture in real time.
  • In this way, each sub-picture in the multi-picture can be adjusted in real time as the conference proceeds and sites speak, avoiding the unchanging sub-pictures seen in the prior art: inactive sites can be removed from the multi-picture in time, and new active sites are added to the multi-picture in time.
  • the sub-pictures in the multi-picture can be one or more.
  • the step of filling the designated site as a sub-screen into a multi-screen may be specifically performed in various manners, which will be described below by way of example. It should be noted that in other embodiments of the present invention, the filling may be performed in a plurality of other manners, and the embodiment of the present invention is not limited thereto.
  • Method A: According to the number of designated sites, split the multi-picture by equal-ratio division, and fill the designated sites into the resulting sub-pictures in a specified order.
  • So-called equal-ratio division, also called width-and-height equal division, means that the number of cuts made to the multi-picture is one less than the number of designated sites, and each cut divides the window being cut into two equal parts. Referring to FIG. 3, FIG. 3 shows how the splitting form of the multi-picture changes with the number of sub-pictures as different numbers of sites enter it: with 2 pictures, the width ratio and height ratio of the sub-pictures are both 1:1; with 3 pictures, the width ratio of the sub-pictures is 1:1:1 and the height ratio is 2:1:1; with 4 pictures, the width ratio and height ratio of the sub-pictures are both 1:1:1:1; and so on.
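Equal-ratio division (n-1 cuts, each halving one window) can be sketched as follows. Always halving the currently largest window, with the cut direction chosen by the window's shape, is an assumption for illustration; with that choice, 3 pictures reproduce the 1:1:1 width and 2:1:1 height ratios described above:

```python
def equal_split(n):
    """Split a 1x1 screen into n sub-pictures using n-1 cuts, each cut
    halving one window. Returns rectangles as (x, y, w, h) tuples."""
    rects = [(0.0, 0.0, 1.0, 1.0)]
    for _ in range(n - 1):
        # halve the largest remaining window
        i = max(range(len(rects)), key=lambda k: rects[k][2] * rects[k][3])
        x, y, w, h = rects.pop(i)
        if w >= h:   # wide window: vertical cut
            rects += [(x, y, w / 2, h), (x + w / 2, y, w / 2, h)]
        else:        # tall window: horizontal cut
            rects += [(x, y, w, h / 2), (x, y + h / 2, w, h / 2)]
    return rects
```

The rectangles always tile the whole screen, and for n = 3 the heights come out as one full-height and two half-height windows, matching the 2:1:1 height ratio.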
  • Method B According to the number of the designated sites, the multi-screen is segmented by means of a large-screen nested small screen, and the specified site is filled in the sub-picture obtained by segmentation in a specified order.
  • Fig. 4 shows the process of changing the slice form of the multi-picture as the number of sub-pictures changes after different numbers of sites enter the multi-picture.
  • the filling order of the large and small sub-pictures is as follows: The site with the highest voice feature value is displayed as a large screen, and the other remaining sites are displayed as a small screen. For details, see the sequence 1 below.
  • In Methods A and B above, the sub-pictures sometimes differ in size; in that case the designated sites are filled into the resulting sub-pictures in a specified order, and the specified order may take several forms, preferably, for example:
  • Sequence 1: Sites with larger voice feature values are filled into larger sub-pictures, so that the most active site is displayed most prominently.
  • Sequence 2: Fill preferentially into the site's historical position in the multi-picture. That is, according to the historical display position information of the site in the multi-picture, an existing historical position is chosen, preferring the position displayed the most times, so that the relative position of the site in the multi-picture stays unchanged, frequent sub-picture jumps are avoided, and viewing is easier.
  • Specifically: if the historical display position information of site 1 is, for example, position 1 displayed X times, position 2 displayed Y times, ..., position N displayed Z times, then when site 1 needs to be displayed, the display counts of the historical positions are compared and the position with the larger count is preferred. If that position is already occupied by another site, the position with the next-lower count is selected, and so on, until a display position is found among the historical positions; if all historical positions are already occupied by sites, a new position outside the historical positions is selected.
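Sequence 2's position choice can be sketched as below; the data shapes are hypothetical (`history` maps position to display count for the site, `occupied` is the set of positions already taken this round):

```python
def pick_position(history, occupied, all_positions):
    """Prefer the historical position displayed most often; if taken,
    fall back through the history by descending count; if every
    historical position is taken, use a new free position."""
    for pos in sorted(history, key=history.get, reverse=True):
        if pos not in occupied:
            return pos
    for pos in all_positions:
        if pos not in occupied and pos not in history:
            return pos
    return None  # no free position at all
```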
  • When the site terminals display the multi-picture, there can also be several cases: all terminals may uniformly display the same multi-picture composed of all the designated sites; or a terminal selected as a designated site may not display the picture of its own site.
  • For example, if sites 1/2/3 are the designated sites, the terminal of site 1 displays 2 sub-pictures, namely sites 2/3; the terminal of site 2 displays 2 sub-pictures, namely sites 1/3; the terminal of site 3 displays 2 sub-pictures, namely sites 1/2; all remaining sites display 3 sub-pictures, namely sites 1/2/3.
  • Preferably, the method may further include multi-party audio mixing.
  • In the prior art, the voices of all conference sites are generally mixed together; in this embodiment, the set of sites participating in the mix can be narrowed to improve the mixing effect.
  • Two parts of rules may be included: first, the selection rule for the sites participating in the mix, i.e., a specified number of sites are selected from the activated sites for multi-party mixing; second, the output rule of the mix, i.e., a site's own voice is not output back to that site.
  • The output rule of the mix may be: a site in the multi-picture receives the mixed sound of the other sites participating in the mix, while a site not in the multi-picture receives the mixed sound of all the sites participating in the mix. See Figure 5: if the sites participating in the mix are 1/2/3, the four generated sound signals are AudioData 1/2/3, AudioData 1/2, AudioData 2/3 and AudioData 1/3. Site 1 hears AudioData 2/3; site 2 hears AudioData 1/3; site 3 hears AudioData 1/2; all remaining sites hear AudioData 1/2/3.
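The output rule can be sketched as below; for mix sites 1/2/3 it yields exactly the four signals of FIG. 5 (AudioData 2/3, 1/3, 1/2 and 1/2/3). The function names are illustrative:

```python
def mixes_to_generate(mix_sites):
    """Distinct mixes the MCU must produce: one mix excluding each
    participant (heard by that participant) plus the full mix
    (heard by every site outside the mix)."""
    full = frozenset(mix_sites)
    return {full - {s} for s in mix_sites} | {full}

def heard_by(site, mix_sites):
    """Which sites' audio a given terminal receives under the rule."""
    full = frozenset(mix_sites)
    return full - {site} if site in full else full
```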
  • A time period is used as the statistical unit, and feature values within the time period are collected to determine whether a site is in an activated state, which serves as the basis for participation in multi-picture composition; dynamic adjustment of the sub-picture content in the multi-picture is thereby realized, significantly improving the conference effect and greatly improving the conference experience of participants.
  • The embodiment of the present invention can also dynamically adjust the number and positions of the sub-pictures in the multi-picture, which also effectively improves the conference effect.
  • FIG. 6 is a schematic diagram of a device according to another embodiment of the present invention, where the device includes:
  • the audio receiving unit 601 is configured to receive audio data of the conference site
  • the voice feature value obtaining unit 602 is configured to acquire a voice feature value of the corresponding site in a first specified time period according to the audio data of each site in the site, where the voice feature value is used to represent an activation state of the site;
  • the site screening unit 603 is configured to select a designated site from the multiple sites according to an activation state of each site;
  • the sub-picture updating unit 604 is configured to fill the image of the specified site as a sub-picture into the multi-picture to update the multi-picture in real time.
  • the voice feature value acquiring unit specifically includes:
  • An audio energy value obtaining sub-unit configured to acquire an audio energy value of the corresponding site in a first specified time period, and use the audio energy value as the voice feature value, if the audio energy value is greater than If the specified energy threshold is used, the site is determined to be active; or
  • the continuous voice state duration acquisition sub-unit is configured to count the duration of the corresponding conference site in the continuous voice state during the first specified time period, and use the duration as the voice feature value, if the duration is greater than the specified duration threshold, Then the site is determined to be active.
  • the audio energy value obtaining subunit specifically includes:
  • a first sub-unit configured to select a plurality of second specified time periods in the first specified time period, and acquire a plurality of sample audio energy data in each second specified time period;
  • a first calculating subunit configured to acquire audio energy data of the second time period according to the root mean square value of the plurality of sample audio energy data, and then average the audio energy data of the plurality of second specified time segments As the audio energy value.
  • the audio energy value obtaining subunit specifically includes:
  • a second sub-unit configured to: select a plurality of second specified time periods in the first specified time period, and select a plurality of third specified time periods in each second specified time period; Acquiring multiple sample audio energy data within three specified time periods;
  • a second calculating subunit configured to: obtain audio energy data of a third time period according to a root mean square value of the plurality of sample audio energy data; and further, according to the audio energy data of the plurality of third specified time segments Mean value obtains audio energy data for each second specified time period;
  • a weighting processing subunit configured to weight the audio energy data of each second specified time period and sum the results, taking the sum as the audio energy value; where the weighting rule is: the closer to the current time, the greater the weight.
  • Since the device embodiment substantially corresponds to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
  • A time period is used as the statistical unit, and feature values within the time period are collected to determine whether a site is in an activated state, which serves as the basis for participation in multi-picture composition; dynamic adjustment of the sub-picture content in the multi-picture is thereby realized, significantly improving the conference effect and greatly improving the conference experience of participants.
  • The embodiment of the present invention can also dynamically adjust the number and positions of the sub-pictures in the multi-picture, which therefore also effectively improves the conference effect.
  • FIG. 7 is a schematic diagram of a system according to still another embodiment of the present invention.
  • the system includes the device and one or more site terminals in the previous embodiment, where the site terminal is configured to display a multi-screen generated by the device.
  • A time period is used as the statistical unit, and feature values within the time period are collected to determine whether a site is in an activated state, which serves as the basis for participation in multi-picture composition; dynamic adjustment of the sub-picture content in the multi-picture is thereby realized, significantly improving the conference effect and greatly improving the conference experience of participants.
  • The embodiment of the present invention can also dynamically adjust the number and positions of the sub-pictures in the multi-picture, which also effectively improves the conference effect. It should be noted that, in this context, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Embodiments of the present invention disclose a picture control method, device and system for a multi-picture video conference. The method includes: receiving audio data of the conference sites; acquiring in real time, according to the audio data of each site, a voice feature value of the corresponding site within a first specified time period, the voice feature value being used to represent an activation state of the site; selecting a designated site from the multiple sites according to the activation state of each site; and filling an image of the designated site into the multi-picture as a sub-picture, so as to update the multi-picture in real time. By collecting feature values within a time period to judge whether a site is activated, and using this as the basis for participation in multi-picture composition, dynamic adjustment of sub-picture content in the multi-picture is realized, which significantly improves the conference effect and improves the conference experience of participants. In addition, the number and positions of the sub-pictures in the multi-picture can be dynamically adjusted, which also effectively improves the conference effect.

Description

Method, Device and System for Picture Control in a Multi-Picture Video Conference. This application claims priority to Chinese Patent Application No. 201210166632.6, filed with the Chinese Patent Office on May 25, 2012 and entitled "Method, Device and System for Picture Control in a Multi-Picture Video Conference", the contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of video conferencing, and in particular to a method, device and system for picture control in a multi-picture video conference.
Background
In a video conferencing system, many sites participate and they are distributed across different locations. So that participants can communicate face to face with participants of other sites and see participants of other sites at the same time, multi-picture technology is widely adopted: by watching the multi-picture, a participant can communicate with participants of several sites simultaneously.
The current scheme for displaying the multi-picture is: a multi-picture mode, such as 4-picture or 9-picture, is preset, and a fixed set of sites is filled into the sub-pictures of the multi-picture; during the conference, every site sees this preset multi-picture. In the course of implementing the present invention, the inventors found that with this prior-art scheme, a site shown in a sub-picture may never speak while other actively speaking sites are not displayed in the multi-picture, so the video conference fails to achieve the expected effect; moreover, the multi-picture display form in the prior art is fixed and cannot be adjusted according to the on-site situation.
Summary of the Invention
An object of embodiments of the present invention is to provide a method, device and system for picture control in a multi-picture video conference, so as to adjust the sub-pictures in real time according to the situation of each site and thereby effectively improve the conference effect.
An embodiment of the present invention discloses a picture control method for a multi-picture video conference, the method including:
receiving audio data of the conference sites; acquiring in real time, according to the audio data of each site, a voice feature value of the corresponding site within a first specified time period, the voice feature value being used to represent an activation state of the site;
selecting a designated site from the multiple sites according to the activation state of each site;
filling an image of the designated site into the multi-picture as a sub-picture, so as to update the multi-picture in real time.
An embodiment of the present invention further discloses a picture control device for a multi-picture video conference, the device including:
an audio receiving unit, configured to receive audio data of the conference sites;
a voice feature value acquiring unit, configured to acquire in real time, according to the audio data of each site, a voice feature value of the corresponding site within a first specified time period, the voice feature value being used to represent an activation state of the site;
a site selection unit, configured to select a designated site from the multiple sites according to the activation state of each site;
a sub-picture updating unit, configured to fill an image of the designated site into the multi-picture as a sub-picture, so as to update the multi-picture in real time.
An embodiment of the present invention further discloses a picture control system for a multi-picture video conference, the system including the above device and one or more site terminals, the site terminals being configured to display the multi-picture generated under control of the device.
Embodiments of the present invention take a time period as the statistical unit and judge, by collecting feature values within the time period, whether a site is in an activated state, using this as the basis for participation in multi-picture composition; dynamic adjustment of sub-picture content in the multi-picture is thereby realized, which significantly improves the conference effect and greatly improves the conference experience of participants. In addition, embodiments of the present invention can also dynamically adjust the number and positions of the sub-pictures in the multi-picture, which also effectively improves the conference effect.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of audio and video decoding in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the equal-ratio splitting of the multi-picture in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the nested splitting of large and small sub-pictures in an embodiment of the present invention;
FIG. 5 is a schematic diagram of multi-party audio mixing in an embodiment of the present invention;
FIG. 6 is a schematic diagram of a device according to another embodiment of the present invention;
FIG. 7 is a schematic diagram of a system according to still another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings in the embodiments. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 is a flowchart of a method according to an embodiment of the present invention; the method includes:
S101: Receive audio data of the conference sites. There may be one or more sites. In this embodiment, specifically, an MCU (Multipoint Control Unit) may receive the RTP (Real-time Transport Protocol) stream of each site and decode it according to the corresponding audio/video protocol; the decoded RTP packets are output as raw audio and video streams. Referring to FIG. 2, Site denotes a conference site: after the Site 1 stream is decoded, the audio data is AudioData 1 and the video data is VideoData 1; after the Site X stream is decoded, the audio data is AudioData X and the video data is VideoData X.
S102: According to the audio data of each site, acquire in real time a voice feature value of the corresponding site within a first specified time period, the voice feature value being used to represent the activation state of the site. To select which sites should enter the multi-picture, a criterion is needed first; in this embodiment, this criterion is the voice feature value of each site. If the voice feature value of a site meets a certain condition, the site can be regarded as an activated, or active, site, and becomes a candidate for entering the multi-picture.
In this embodiment, the voice feature value can be defined and evaluated in various ways, described below by way of example. It should be noted that in other embodiments of the present invention the voice feature value may also be defined and evaluated in other ways, which is not limited by the embodiments of the present invention.
Manner 1: Obtain the audio energy value of the corresponding site within the first specified time period and use the audio energy value as the voice feature value; if the audio energy value is greater than a specified energy threshold, determine that the site is in the activated state. Preferably, the audio energy value may be obtained by the following two methods. The first method: select multiple second specified time periods within the first specified time period; acquire multiple audio energy samples within each second specified time period; obtain the audio energy data of each second time period from the root mean square of these samples; and take the mean of the audio energy data over the multiple second specified time periods as the audio energy value.
Specifically, T0 (typically 1 minute) may be used as the first specified time period, and the voice feature value of each site within T0 is then obtained. The steps are: for a site, select multiple second specified time periods T1 (e.g., 20 ms) within T0, i.e., use T1 as the energy-calculation subunit, and then sample within each T1 to obtain multiple audio energy samples of the site. For example, if N samples x1, x2, ..., xN are taken within one T1, the audio energy data xrms of that T1 can be calculated by the following formula:
xrms = sqrt((x1^2 + x2^2 + ... + xN^2) / N)
The average of the xrms values of all T1 within T0 is then taken as the audio feature value of T0.
The second method: select multiple second specified time periods within the first specified time period, and within each second specified time period select multiple third specified time periods; acquire multiple audio energy samples within each third specified time period and obtain the audio energy data of that third time period from the root mean square of the samples; then obtain the audio energy data of each second specified time period as the mean over its third specified time periods; finally, weight the audio energy data of each second specified time period, sum the results, and take the sum as the audio energy value, the weighting rule being: the closer to the current time, the greater the weight.
The second method is based on the first and extends it. The difference is that the second method considers a longer time period T, selects multiple T0 within T, obtains the audio energy data of each T0 by the first method, and then weights and sums the audio energy data of the T0 periods, taking the result as the final audio energy value. Since the second method considers a longer time period (extended from T0 to T), it is somewhat more accurate than the first method.
Manner 2: Count the duration for which the corresponding site is in a continuous voice state within the first specified time period and use the duration as the voice feature value; if the duration is greater than a specified duration threshold, determine that the site is in the activated state. Specifically, VAD (Voice Activity Detection) may be performed to count the duration of the continuous voice state within the T0 period; the durations are compared, and the activated sites are selected according to the durations.
For example, for sites 1, 2, ..., N, the durations accumulated by VAD detection within T0 are VolTimeLen 1, VolTimeLen 2, ..., VolTimeLen N respectively; the VolTimeLen values are sorted and compared with a preset duration threshold GateVolTimeLen: a site with a duration greater than or equal to GateVolTimeLen may be marked as an activated site, and a site with a duration smaller than GateVolTimeLen is marked as an inactive site. Of course, in other embodiments of the present invention, the duration threshold may not be used; instead, the W sites with the longest continuous-voice durations among all sites are selected as the activated sites.
Manner 3: Obtain the audio energy value and the continuous-voice duration of the corresponding site within the first specified time period, and use the combination of the audio energy value and the duration as the voice feature value; if the combination meets a specified rule, determine that the site is in the activated state. For example, the audio energy value may be used for an initial screening and the continuous-voice duration for a second filtering; or one value may serve as the primary criterion with the other as a reference, e.g., a site with a long voice duration but low voice energy may be regarded as activated, while a site with a short voice duration but high voice energy is not regarded as activated; this avoids wrongly judging a site as activated because a participant suddenly knocked on the table or coughed.
S103: Select specified sites from the multiple sites according to the activation state of each site. There may be one or more specified sites. Once the activation state of each site has been obtained from the speech feature values, there is a basis for judging which sites should enter the multi-picture as specified sites.
In this embodiment, the specified sites to fill into the multi-picture may be selected from the activated sites in multiple ways, described below by way of example. It should be noted that in other embodiments of the present invention other selection ways are likewise possible, which is not limited by the embodiments of the present invention.
Manner I: Use the sites currently in the activated state as the specified sites, that is, all currently activated sites become the specified sites. This is the simplest to implement.
Manner II: Use both the sites that were activated last time and the sites currently activated as the specified sites, which takes historical display into account. Specifically, the current activated sites ActiveSite 2, 3, ... ActiveSite N are recorded in a set CurActiveTable, while the activated sites of the previous switching round are recorded in a set PreActiveTable; the union of the site information of PreActiveTable and CurActiveTable is used as the sub-picture sites of this multi-picture and participates in the multi-picture composition.
Manner III: Use as the specified sites the sites currently in the activated state, together with the sites that were activated last time and whose speech feature values are greater than the minimum speech feature value among the currently activated sites. That is, all currently activated sites participate in the multi-picture composition, and some of the previously activated sites may also participate according to a speech-feature comparison: a previously activated site whose speech feature value is smaller than the minimum speech feature value among the current activated sites does not participate in this multi-picture composition, whereas a previously activated site whose speech feature value is greater than or equal to that minimum may participate.
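The three selection manners just described can be sketched together as one function (the mode names are ours, not the patent's):

```python
def select_sites(cur_active, prev_active, features, mode="filtered-union"):
    """Sketch of the three selection manners:
    "current"        -> only the currently activated sites;
    "union"          -> current plus last round's activated sites;
    "filtered-union" -> current sites, plus previous sites whose speech
                        feature value is >= the minimum among current ones."""
    if mode == "current":
        return set(cur_active)
    if mode == "union":
        return set(cur_active) | set(prev_active)
    floor = min(features[s] for s in cur_active)
    return set(cur_active) | {s for s in prev_active if features[s] >= floor}
```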
S104: Fill the images of the specified sites into the multi-picture as sub-pictures, so as to update the multi-picture in real time. In this way, the sub-pictures in the multi-picture can be adjusted in real time as the speaking situation of each site changes during the conference, avoiding the fixed sub-pictures seen in the prior art: inactive sites can be removed from the multi-picture in time, and newly active sites can be added to the multi-picture in time. There may be one or more sub-pictures in the multi-picture.
In this embodiment, the step of filling the specified sites into the multi-picture as sub-pictures may specifically be performed in multiple ways, described below by way of example. It should be noted that in other embodiments of the present invention other filling ways are likewise possible, which is not limited by the embodiments of the present invention.
Manner A: Split the multi-picture in an equal-ratio manner according to the number of specified sites, and fill the specified sites into the resulting sub-pictures in a specified order. So-called equal-ratio splitting, also called width-height equal-ratio splitting, means: the number of splits of the multi-picture is the number of specified sites minus one, and each split divides the window being split into two equal parts. Referring to FIG. 3, which shows how the splitting form of the multi-picture changes with the number of sub-pictures as different numbers of sites enter the multi-picture: with 2 pictures, the width ratio and height ratio of the sub-pictures are both 1:1; with 3 pictures, the width ratio of the sub-pictures is 1:1:1 and the height ratio is 2:1:1; with 4 pictures, the width and height ratios of the sub-pictures are both 1:1:1:1, and so on.
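A sketch of width-height equal-ratio splitting: n-1 cuts, each halving the currently largest sub-window along its longer side. On a 1920x1080 canvas this reproduces the 1:1, 2:1:1, and 1:1:1:1 ratios described for FIG. 3, though the cut-orientation rule here is our assumption (the patent fixes the exact layouts in its figures):

```python
def equal_ratio_layout(n, width=1920, height=1080):
    """Return n sub-picture rectangles (x, y, w, h). Each of the n-1 splits
    bisects the largest remaining sub-window along its longer side."""
    rects = [(0, 0, width, height)]
    for _ in range(n - 1):
        rects.sort(key=lambda r: r[2] * r[3], reverse=True)
        x, y, w, h = rects.pop(0)
        if w >= h:   # wide window: vertical cut into left/right halves
            rects += [(x, y, w // 2, h), (x + w // 2, y, w - w // 2, h)]
        else:        # tall window: horizontal cut into top/bottom halves
            rects += [(x, y, w, h // 2), (x, y + h // 2, w, h - h // 2)]
    return rects
```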
Manner B: Split the multi-picture in a manner of nesting small pictures within a large picture according to the number of specified sites, and fill the specified sites into the resulting sub-pictures in a specified order. Referring to FIG. 4, which shows how the splitting form of the multi-picture changes with the number of sub-pictures as different numbers of sites enter the multi-picture. In addition, in FIG. 4, the filling order of the large and small sub-pictures is: the site with the highest speech feature value is displayed in the large picture, and the remaining sites are displayed in the small pictures; see Order 1 below for details.
In Manner A and Manner B above, the sub-pictures sometimes differ in size, so the specified sites are filled into the resulting sub-pictures in a specified order, and the specified order may take multiple forms; preferably, for example:
Order 1: Sites with larger speech feature values are filled into larger sub-pictures, so that the most active sites are displayed most prominently.
Order 2: Preferentially fill a site into its historical position in the multi-picture. That is, according to the historical display-position information of the site in the multi-picture, an existing historical position is chosen, preferring the position displayed most often, so that the relative position of the site in the multi-picture stays unchanged, avoiding frequent jumps of sub-pictures and making viewing easier. In this embodiment, specifically: if the historical display-position information of site 1 is, say, position 1 shown X times, position 2 shown Y times, ... position N shown Z times, then when site 1 needs to be displayed, the historical display counts are compared and the position with the larger count is preferred; if that position is already occupied by a site, the position with the next-highest count is chosen, and the comparison proceeds in turn until a display position is found among the historical positions; if all historical positions are already occupied by sites, a new position outside the history is chosen.
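Order 2's position choice can be sketched as follows (the data shapes and names are illustrative):

```python
def pick_position(history, occupied, all_positions):
    """history maps position -> number of times this site was displayed there.
    Prefer the most frequently used historical position that is still free;
    if every historical position is taken, fall back to a free position
    outside the history."""
    for pos in sorted(history, key=history.get, reverse=True):
        if pos not in occupied:
            return pos
    for pos in all_positions:
        if pos not in occupied and pos not in history:
            return pos
    return None   # no free position at all
```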
In addition, when the site terminals display the multi-picture, there may also be multiple cases: all terminals may uniformly display the same multi-picture, composed of all the specified sites; alternatively, a terminal whose site is selected as a specified site may not display that site's own picture. For example, if sites 1/2/3 are the specified sites, the terminal of site 1 displays 2 sub-pictures, namely sites 2/3; the terminal of site 2 displays 2 sub-pictures, namely sites 1/3; the terminal of site 3 displays 2 sub-pictures, namely sites 1/2; and the terminals of all remaining sites display 3 sub-pictures, namely sites 1/2/3.
Furthermore, in this embodiment, after step S103, the method may further include:
selecting a specified number of sites from the activated sites for multi-party audio mixing, and/or performing multi-party mixing according to the rule that a site's own sound is not output to that site. In the prior art, mixing is generally performed on the speech of all sites, whereas in this embodiment, because the activated sites can be identified, the range of sites participating in the mix can be narrowed to improve the mixing effect. Two kinds of rules may be involved: one is the rule for selecting the sites that participate in the mix, namely selecting a specified number of sites from the activated sites for multi-party mixing; the other is the rule for outputting the mix, namely performing multi-party mixing according to the rule that a site's own sound is not output to that site.
As for selecting a specified number of sites from the activated sites for multi-party mixing, it may be that all activated sites participate in the mix; or that all sites in the multi-picture, i.e., the M specified sites, participate; or that the user first sets an upper limit X on the number of mixed sites (e.g., X = 4) and the number N of activated sites is then compared with X: if N <= X, all N activated sites are mixed, and if N > X, the X sites with the largest speech feature values among the N activated sites are selected for mixing. As for the rule for outputting the mix, it may be that a site in the multi-picture receives the sound of the other sites participating in the mix, while a site not in the multi-picture receives the sound of all sites participating in the mix. Referring to FIG. 5: if the sites participating in the mix are 1/2/3, four sound signals are generated, denoted AudioData 1/2/3, AudioData 1/2, AudioData 2/3 and AudioData 1/3. Site 1 hears AudioData 2/3; site 2 hears AudioData 1/3; site 3 hears AudioData 1/2; all remaining sites hear AudioData 1/2/3.
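The output rule of the mix (each mixed site hears the other mixed sites; everyone else hears the full mix) matches the FIG. 5 example and can be sketched as:

```python
def mix_outputs(mix_sites, all_sites):
    """Return, for every site, the sorted list of sites whose audio it should
    receive: a site inside the mix hears the other mixed sites, a site
    outside the mix hears all mixed sites."""
    return {
        s: sorted(mix_sites - {s}) if s in mix_sites else sorted(mix_sites)
        for s in all_sites
    }
```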
This embodiment takes a time period as the statistical unit, judges whether a site is in the activated state by collecting certain feature values within that time period, and uses the result as the basis for participating in the multi-picture composition, thereby achieving dynamic adjustment of the sub-picture contents in the multi-picture, significantly improving the conference effect and greatly improving the participants' conference experience. Furthermore, the embodiments of the present invention can also dynamically adjust the number and positions of the sub-pictures in the multi-picture, which likewise effectively improves the conference effect. FIG. 6 is a schematic diagram of a device according to another embodiment of the present invention, the device including:
an audio receiving unit 601, configured to receive audio data of the sites;
a speech feature value obtaining unit 602, configured to obtain, in real time and according to the audio data of each site, the speech feature value of the corresponding site within a first specified time period, the speech feature value being used to characterize the activation state of the site;
a site screening unit 603, configured to select specified sites from the multiple sites according to the activation state of each site; and
a sub-picture updating unit 604, configured to fill the images of the specified sites into the multi-picture as sub-pictures, so as to update the multi-picture in real time.
Preferably, the speech feature value obtaining unit specifically includes:
an audio energy value obtaining subunit, configured to obtain the audio energy value of the corresponding site within the first specified time period and use the audio energy value as the speech feature value, where if the audio energy value is greater than a specified energy threshold, the site is determined to be in the activated state; or
a continuous-speech-state duration obtaining subunit, configured to count the duration for which the corresponding site is in a continuous speech state within the first specified time period and use the duration as the speech feature value, where if the duration is greater than a specified duration threshold, the site is determined to be in the activated state.
Preferably, the audio energy value obtaining subunit specifically includes:
a first sampling subunit, configured to select multiple second specified time periods within the first specified time period and obtain multiple sample-point audio energy data within each second specified time period; and
a first calculating subunit, configured to obtain the audio energy data of each second time period from the root-mean-square value of the multiple sample-point audio energy data, and then take the mean of the audio energy data of the multiple second specified time periods as the audio energy value.
Preferably, the audio energy value obtaining subunit specifically includes:
a second sampling subunit, configured to: select multiple second specified time periods within the first specified time period, and then select multiple third specified time periods within each second specified time period; and obtain multiple sample-point audio energy data within each third specified time period;
a second calculating subunit, configured to: obtain the audio energy data of each third time period from the root-mean-square value of the multiple sample-point audio energy data; and then obtain the audio energy data of each second specified time period from the mean of the audio energy data of the multiple third specified time periods; and
a weighting subunit, configured to: weight the audio energy data of each second specified time period and sum them, taking the result as the audio energy value, where the weighting rule is: the closer to the current moment, the larger the weight.
As the device embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, reference may be made to the description of the method embodiment.
This embodiment takes a time period as the statistical unit, judges whether a site is in the activated state by collecting certain feature values within that time period, and uses the result as the basis for participating in the multi-picture composition, thereby achieving dynamic adjustment of the sub-picture contents in the multi-picture, significantly improving the conference effect and greatly improving the participants' conference experience. Furthermore, the embodiments of the present invention can also dynamically adjust the number and positions of the sub-pictures in the multi-picture, which likewise effectively improves the conference effect. FIG. 7 is a schematic diagram of a system according to a further embodiment of the present invention; the system includes the device described in the previous embodiment and one or more site terminals, the site terminals being configured to display the multi-picture generated by the device.
As the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, reference may be made to the description of the method embodiment.
This embodiment takes a time period as the statistical unit, judges whether a site is in the activated state by collecting certain feature values within that time period, and uses the result as the basis for participating in the multi-picture composition, thereby achieving dynamic adjustment of the sub-picture contents in the multi-picture, significantly improving the conference effect and greatly improving the participants' conference experience. Furthermore, the embodiments of the present invention can also dynamically adjust the number and positions of the sub-pictures in the multi-picture, which likewise effectively improves the conference effect. It should be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article or device that includes the element.
Persons of ordinary skill in the art may understand that all or part of the steps in the above method embodiments may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, such as a ROM, a RAM, a magnetic disk or an optical disc.
The above are merely preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention; the description of the embodiments is only meant to help understand the method of the present invention and its core idea. Meanwhile, for persons of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of the present invention. In conclusion, the content of this specification should not be construed as a limitation on the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. An image control method for a multi-picture video conference, characterized in that the method comprises: receiving audio data of conference sites;
obtaining, in real time and according to the audio data of each site, a speech feature value of the corresponding site within a first specified time period, the speech feature value being used to characterize an activation state of the site;
selecting specified sites from the multiple sites according to the activation state of each site; and
filling images of the specified sites into a multi-picture as sub-pictures, so as to update the multi-picture in real time.
2. The method according to claim 1, characterized in that the step of obtaining the speech feature value of the corresponding site within the first specified time period specifically comprises:
obtaining an audio energy value of the corresponding site within the first specified time period and using the audio energy value as the speech feature value, wherein if the audio energy value is greater than a specified energy threshold, the site is determined to be in the activated state.
3. The method according to claim 2, characterized in that the step of obtaining the audio energy value of the corresponding site within the first specified time period specifically comprises:
selecting multiple second specified time periods within the first specified time period, obtaining multiple sample-point audio energy data within each second specified time period, obtaining audio energy data of each second time period from a root-mean-square value of the multiple sample-point audio energy data, and then taking a mean of the audio energy data of the multiple second specified time periods as the audio energy value.
4. The method according to claim 2, characterized in that the step of obtaining the audio energy value of the corresponding site within the first specified time period specifically comprises:
selecting multiple second specified time periods within the first specified time period, and then selecting multiple third specified time periods within each second specified time period; obtaining multiple sample-point audio energy data within each third specified time period, and obtaining audio energy data of each third time period from a root-mean-square value of the multiple sample-point audio energy data; then obtaining audio energy data of each second specified time period from a mean of the audio energy data of the multiple third specified time periods; and finally weighting and summing the audio energy data of each second specified time period, taking the result as the audio energy value, wherein the weighting rule is: the closer to the current moment, the larger the weight.
5. The method according to claim 1, characterized in that the step of obtaining the speech feature value of the corresponding site within the first specified time period specifically comprises:
counting a duration for which the corresponding site is in a continuous speech state within the first specified time period and using the duration as the speech feature value, wherein if the duration is greater than a specified duration threshold, the site is determined to be in the activated state; or
obtaining an audio energy value and a continuous-speech-state duration of the corresponding site within the first specified time period, and using a combination of the audio energy value and the duration as the speech feature value, wherein if the combination satisfies a specified rule, the site is determined to be in the activated state.
6. The method according to claim 1, characterized in that the step of selecting the specified sites from the multiple sites according to the activation state of each site specifically comprises:
using sites currently in the activated state as the specified sites; or
using both sites that were in the activated state last time and sites currently in the activated state as the specified sites; or
using, as the specified sites, sites currently in the activated state, together with sites that were in the activated state last time and whose speech feature values are greater than a minimum speech feature value among the sites currently in the activated state.
7. The method according to claim 1, characterized in that the step of filling the images of the specified sites into the multi-picture as sub-pictures specifically comprises:
splitting the multi-picture in an equal-ratio manner according to the number of the specified sites, and filling the specified sites into the resulting sub-pictures in a specified order; or
splitting the multi-picture in a manner of nesting small pictures within a large picture according to the number of the specified sites, and filling the specified sites into the resulting sub-pictures in a specified order.
8. The method according to claim 7, characterized in that the specified order is specifically: filling sites with larger speech feature values into larger sub-pictures; or
an order of preferentially filling a site into its historical position in the multi-picture.
9. The method according to claim 1, characterized in that after the step of selecting the specified sites from the multiple sites according to the activation state of each site, the method further comprises: selecting a specified number of sites from the activated sites for multi-party audio mixing, and/or performing multi-party mixing according to a rule that a site's own sound is not output to that site.
10. An image control device for a multi-picture video conference, characterized in that the device comprises: an audio receiving unit, configured to receive audio data of conference sites;
a speech feature value obtaining unit, configured to obtain, in real time and according to the audio data of each site, a speech feature value of the corresponding site within a first specified time period, the speech feature value being used to characterize an activation state of the site;
a site screening unit, configured to select specified sites from the multiple sites according to the activation state of each site; and
a sub-picture updating unit, configured to fill images of the specified sites into a multi-picture as sub-pictures, so as to update the multi-picture in real time.
11. The device according to claim 10, characterized in that the speech feature value obtaining unit specifically comprises:
an audio energy value obtaining subunit, configured to obtain an audio energy value of the corresponding site within the first specified time period and use the audio energy value as the speech feature value, wherein if the audio energy value is greater than a specified energy threshold, the site is determined to be in the activated state; or
a continuous-speech-state duration obtaining subunit, configured to count a duration for which the corresponding site is in a continuous speech state within the first specified time period and use the duration as the speech feature value, wherein if the duration is greater than a specified duration threshold, the site is determined to be in the activated state.
12. The device according to claim 11, characterized in that the audio energy value obtaining subunit specifically comprises:
a first sampling subunit, configured to select multiple second specified time periods within the first specified time period and obtain multiple sample-point audio energy data within each second specified time period; and
a first calculating subunit, configured to obtain audio energy data of each second time period from a root-mean-square value of the multiple sample-point audio energy data, and then take a mean of the audio energy data of the multiple second specified time periods as the audio energy value.
13. The device according to claim 11, characterized in that the audio energy value obtaining subunit specifically comprises:
a second sampling subunit, configured to: select multiple second specified time periods within the first specified time period, and then select multiple third specified time periods within each second specified time period; and obtain multiple sample-point audio energy data within each third specified time period;
a second calculating subunit, configured to: obtain audio energy data of each third time period from a root-mean-square value of the multiple sample-point audio energy data; and then obtain audio energy data of each second specified time period from a mean of the audio energy data of the multiple third specified time periods; and
a weighting subunit, configured to: weight and sum the audio energy data of each second specified time period, taking the result as the audio energy value, wherein the weighting rule is: the closer to the current moment, the larger the weight.
14. An image control system for a multi-picture video conference, characterized in that the system comprises the device according to any one of claims 10 to 13 and one or more site terminals, the site terminals being configured to display the multi-picture generated under control of the device.
PCT/CN2012/085024 2012-05-25 2012-11-22 Image control method, device and system in a multi-picture video conference WO2013174115A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/553,263 US20150092011A1 (en) 2012-05-25 2014-11-25 Image Controlling Method, Device, and System for Composed-Image Video Conference

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210166632.6A CN102857732B (zh) 2012-05-25 2012-05-25 一种多画面视讯会议中的画面控制方法、设备及系统
CN201210166632.6 2012-05-25


Publications (1)

Publication Number Publication Date
WO2013174115A1 true WO2013174115A1 (zh) 2013-11-28



Also Published As

Publication number Publication date
CN102857732B (zh) 2015-12-09
CN102857732A (zh) 2013-01-02
US20150092011A1 (en) 2015-04-02

