WO2019011189A1 - Audio and video acquisition method and apparatus for conference television, and terminal device - Google Patents

Info

Publication number
WO2019011189A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
video
collection
conference
sound
Application number
PCT/CN2018/094807
Inventor
张泽良
Original Assignee
中兴通讯股份有限公司
Application filed by 中兴通讯股份有限公司
Publication of WO2019011189A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H04N7/155 Conference systems involving storage of or access to video conference sessions

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

An audio and video acquisition method for a conference television comprises: acquiring audio data obtained by an audio and video acquisition device by means of sound acquisition, and determining an audio and video source position of a speech in a conference site according to the audio data; moving, according to the audio and video source position, the position of the audio and video acquisition device to a sound acquisition position that satisfies a preset sound acquisition condition; and moving, according to the audio and video source position and the sound acquisition position, the position of the audio and video acquisition device to an image acquisition position that satisfies a preset image acquisition condition.

Description

Audio and video collection method, apparatus and terminal device for conference television
Technical field
The present disclosure relates to, but is not limited to, the field of conference television systems, and in particular to an audio and video collection method, apparatus and terminal device for conference television.
Background
During a conference, the speaker is often outside the camera's capture range, or the sound picked up by the microphone is indistinct. In such cases the camera or the microphone position can be adjusted manually so that the captured video or audio is as good as possible.
For example, in a small conference scenario, video is usually captured by only one or two cameras and audio by a single microphone, and the capture angles and positions of the cameras and the microphone are preset in advance.
Summary of the invention
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
The audio and video collection approach used in the small conference scenario described above can only guarantee that people at specific positions are covered by the preset image capture and sound capture. If other participants want to speak, the image capture may fail to frame the speaker and the sound capture may be unclear. The capture angles and positions of the camera and the microphone then have to be adjusted manually so that the speaker is again covered by the preset image capture and sound capture.
The present disclosure provides an audio and video collection method, apparatus and terminal device for conference television that can locate a speaker from collected sound and automatically move to a preset position for collecting the speaker's audio and video.
An embodiment of the present disclosure provides an audio and video collection method for conference television, the method including:
acquiring audio data obtained through sound collection by an audio and video collection device, and locating, according to the audio data, the audio and video source position of speech in the venue;
moving, according to the audio and video source position, the audio and video collection device to a sound collection position that satisfies a preset sound collection condition; and
moving, according to the audio and video source position and the sound collection position, the audio and video collection device to an image collection position that satisfies a preset image collection condition.
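The three steps above can be sketched as a small orchestration function. This is only an illustrative outline, not the disclosed implementation: the function name `run_acquisition_cycle` and the four callables passed to it (`locate_source`, `pick_sound_position`, `pick_image_position`, `move_to`) are hypothetical placeholders for whatever localization, positioning and motion control a real device would provide.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in metres; the coordinate frame is an assumption


def run_acquisition_cycle(
    audio: Sequence[float],
    locate_source: Callable[[Sequence[float]], Point],
    pick_sound_position: Callable[[Point], Point],
    pick_image_position: Callable[[Point, Point], Point],
    move_to: Callable[[Point], None],
) -> Tuple[Point, Point, Point]:
    """Run one pass of the three-step method with caller-supplied (hypothetical) helpers."""
    # Step 1: locate the audio and video source position from the audio data.
    source = locate_source(audio)
    # Step 2: move to a position that satisfies the sound collection condition.
    sound_position = pick_sound_position(source)
    move_to(sound_position)
    # Step 3: refine to a position that satisfies the image collection condition,
    # using both the source position and the sound collection position.
    image_position = pick_image_position(source, sound_position)
    move_to(image_position)
    return source, sound_position, image_position
```

Passing the helpers in as callables keeps the ordering of the steps separate from how each step is actually carried out, which the text leaves open.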
An embodiment of the present disclosure further provides an audio and video collection apparatus for conference television, the apparatus including:
a positioning module, configured to acquire audio data obtained through sound collection by an audio and video collection device, and to locate, according to the audio data, the audio and video source position of speech in the venue;
a sound collection position moving module, configured to move, according to the audio and video source position, the audio and video collection device to a sound collection position that satisfies a preset sound collection condition; and
an image collection position moving module, configured to move, according to the audio and video source position and the sound collection position, the audio and video collection device to an image collection position that satisfies a preset image collection condition.
An embodiment of the present disclosure further provides a terminal device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
acquiring audio data obtained through sound collection, and locating, according to the audio data, the audio and video source position of speech in the venue;
moving, according to the audio and video source position, the audio and video collection device to a sound collection position that satisfies a preset sound collection condition; and
moving, according to the audio and video source position and the sound collection position, the audio and video collection device to an image collection position that satisfies a preset image collection condition.
An embodiment of the present disclosure further provides a computer-readable storage medium storing computer-executable instructions which, when executed, implement the above audio and video collection method for conference television.
With the audio and video collection method, apparatus and terminal device for conference television provided by the embodiments of the present disclosure, audio data obtained through sound collection is acquired, and the audio and video source position of speech in the venue is located according to the audio data; the audio and video collection device is moved, according to the audio and video source position, to a sound collection position that satisfies a preset sound collection condition; and the device is then moved, according to the audio and video source position and the sound collection position, to an image collection position that satisfies a preset image collection condition. Compared with conference television audio and video collection methods known in the art, the method provided by the embodiments of the present disclosure is based on sound recognition and combines sound collection, image collection and movement of the collection device implemented with unmanned technology. The system needs to be configured only once, when the conference television system is installed. Afterwards, the speaker's position is located from the collected sound of the conference speaker, and the audio and video collection device approaches the speaker automatically, without much manual intervention by participants. The device automatically moves to a sound collection position that satisfies the preset sound collection condition and to an image collection position that satisfies the preset image collection condition. The collection position is thus adjusted automatically, the preset collection quality for conference television is achieved, the labor cost of conference television is reduced, and the efficiency of video conferences is improved.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.
Brief description of the drawings
FIG. 1 is a flowchart of an audio and video collection method for conference television provided by an embodiment of the present disclosure;
FIG. 2(a) to FIG. 2(d) are schematic diagrams of audio and video collection using the audio and video collection method for conference television, applied to a small conference, provided by an optional embodiment of the present disclosure;
FIG. 3(a) to FIG. 3(f) are schematic diagrams of audio and video collection using the audio and video collection method for conference television, applied to a large conference, provided by an optional embodiment of the present disclosure;
FIG. 4 is a program module architecture diagram of an audio and video collection apparatus for conference television provided by an embodiment of the present disclosure.
Preferred embodiments of the present disclosure
Embodiments of the present disclosure are described below with reference to the accompanying drawings.
The audio and video collection method for conference television provided by an embodiment of the present disclosure may be applied to a conference television system, which may broadly include:
(1) A display device. The display device, such as a display screen, is configured to display the venue's audio and video at the local end inside the video conference venue or at a remote end outside it;
(2) A video terminal device. The video terminal device, such as a host computer, is configured to receive and output the conference audio and video collected by the audio and video collection device, and to control and manage the movement of the audio and video collection device;
(3) A movable audio and video collection device. The audio and video collection device carries video and audio capture equipment and may be implemented with unmanned mobility technology, for example as a mobile collection device that works on the same or a similar principle as a drone; it can hover and move in the air.
The movable audio and video collection device may be connected to the video terminal device through a wireless technology such as Wireless Fidelity (WIFI) or Bluetooth, and the audio and video image data it collects may be transmitted to the video terminal device over the same kind of link.
Using the video and audio capture equipment it carries, the audio and video collection device may collect sound data and video image data at the conference site and send them to the video terminal device. The video terminal device may receive the data collected in the venue, process the video and audio data, and present them through the display device.
Referring to FIG. 1, which is a flowchart of an audio and video collection method for conference television provided by an embodiment of the present disclosure, the method may include:
S100: Acquire audio data obtained through sound collection by the audio and video collection device, and locate, according to the audio data, the audio and video source position of speech in the venue.
Optionally, before step S100 the method further includes:
when the conference starts, moving the audio and video collection device to a preset initial position in the venue.
Optionally, the preset initial position of the audio and video collection device in the venue is configured in advance when the conference television system is installed.
For a small venue that needs only one audio and video collection device, the device carrying the video and audio capture equipment may by default be configured at the center of the venue. When the venue has a conference table, the table is generally placed in the middle of the room, so it can serve as the reference coordinate, and the audio and video collection device may be positioned above the middle of the conference table.
For a large venue, a single audio and video collection device can cover only a limited range, for example an area 6 meters in diameter (a coverage radius of 3 meters), so several devices may be needed. An appropriate number of devices is deployed according to the size of the venue so that together they cover the whole venue. Starting from a given position, for example near the podium, the devices may be placed one after another at a preset spacing, for example about every 6 meters, until the whole venue is covered.
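As a rough illustration of the spacing rule for large venues, the sketch below lays out initial positions on a grid for a rectangular room. The rectangular floor plan, the function name `initial_positions` and the even spreading of devices are all assumptions made for this example; the text only says that devices are placed about every 6 meters starting near the podium. Note that circles of 3 m radius centered on 6 m grid cells leave small gaps at the cell corners, so a real deployment might choose a tighter spacing.

```python
import math
from typing import List, Tuple


def initial_positions(
    length_m: float, width_m: float, spacing_m: float = 6.0
) -> List[Tuple[float, float]]:
    """Return one (x, y) initial position per device for a rectangular venue.

    The room is divided into cells no larger than spacing_m on a side, and one
    device is placed at the centre of each cell (an assumed layout rule).
    """
    cols = max(1, math.ceil(length_m / spacing_m))
    rows = max(1, math.ceil(width_m / spacing_m))
    return [
        ((c + 0.5) * length_m / cols, (r + 0.5) * width_m / rows)
        for r in range(rows)
        for c in range(cols)
    ]


# A 12 m x 6 m room needs two devices under this rule.
print(initial_positions(12, 6))
```

A small room, such as 5 m by 4 m, yields a single position at the center, which matches the default for small venues described earlier.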
When no video conference is in progress, the audio and video collection device may be kept at a suitable place in the venue. In particular, in a large venue with multiple devices, they may be stored together at one suitable place in the venue.
When a video conference begins, the conference television system is started and the terminal device is switched on, and the audio and video collection device may move automatically to its preset initial position in the venue. For example, in a small venue with a single device, receiving the system start instruction may trigger an initial instruction to move to the middle position above the conference table, so that the device starts up and hovers in the air above the middle of the table. In a large venue with multiple devices, once the conference television terminal is switched on and each device has started, each device may move to the preset initial position configured for it when the system was installed, so that the whole venue lies within the devices' coverage.
Because the audio and video collection device carries video and audio capture equipment, it may begin collecting audio data when someone in the venue speaks. Once it has acquired the audio data, it may locate the audio and video source position of the speech from that data, that is, perform sound localization.
S200: Move, according to the audio and video source position, the audio and video collection device to a sound collection position that satisfies a preset sound collection condition.
Optionally, after the audio and video collection device has detected the speaker's position in the venue through sound localization based on the acquired audio data, it may automatically move to a preset sound collection position, relative to the speaking position, that satisfies the preset sound collection condition. This brings the device close to the speaking position and puts it in a good position for sound collection. The preset sound collection position may generally be an arc-shaped region 1 to 1.5 meters from the speaking position; that is, once the speaking position is located, the device may automatically move into an arc-shaped region 1 to 1.5 meters from it.
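One simple way to pick a target inside the 1 to 1.5 meter band is to move along the straight line between the device and the speaker until the distance falls inside the band. The sketch below assumes 2D floor coordinates in meters and a hypothetical function name `sound_collection_position`; the text does not say how the point within the arc-shaped region is chosen.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def sound_collection_position(
    device: Point, speaker: Point, near_m: float = 1.0, far_m: float = 1.5
) -> Point:
    """Return the nearest point to `device` whose distance to `speaker` is in [near_m, far_m]."""
    dx, dy = device[0] - speaker[0], device[1] - speaker[1]
    dist = math.hypot(dx, dy)
    if near_m <= dist <= far_m:
        return device  # already inside the band, no move needed
    if dist == 0.0:
        # Device is exactly at the speaker position; back off in an arbitrary direction.
        return (speaker[0] + near_m, speaker[1])
    # Too far: stop at the outer edge. Too close: retreat to the inner edge.
    target = far_m if dist > far_m else near_m
    scale = target / dist
    return (speaker[0] + dx * scale, speaker[1] + dy * scale)
```

Stopping at the nearest edge of the band keeps the movement as short as possible; the later image collection step can then choose a better point within the same band.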
For example, in a small venue, because the audio and video collection device can move, a single device is generally sufficient. The conference table in a small venue is typically less than 5 meters long, while one device can cover an area 6 meters in diameter. Although the table lies within the device's coverage, the collected audio and video are not necessarily the best obtainable. The device can therefore be repositioned dynamically according to the speaking position so that it is suitably placed relative to the speaker, that is, at the preset position for audio and video collection, and the collected audio and video reach the preset clarity. This clarity can generally be achieved simply by moving the collection position.
After collecting audio data in the venue, the audio and video collection device may analyze the data and determine whether it is below a preset audio threshold, and it may also determine the distance and direction of the speaking position. If the audio data is below the preset audio threshold, the device can conclude that it is too far from the speaker, which makes the collected sound too quiet, and it may move in the direction of the speech to get closer to the speaker.
When it is determined that the position of the audio and video collection device needs to be adjusted, the device may be moved so that the audio it collects satisfies the preset audio condition.
For example, suppose the analysis shows that the collected voice of the current speaker is below a preset sound threshold and the venue has two sound sources, A and B. The audio and video collection device may detect that the current sound source A is below 600 Hz while another sound source B is above 700 Hz and at 25 decibels or more. The self-tuning system may then lock onto the position of sound source B, and the device may automatically move to the position determined for sound source B so that the audio it collects satisfies the preset audio condition. The preset audio condition may mean that the audio level in decibels is above a preset threshold, so that the collected conference sound is sufficiently clear.
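The source selection in the A and B example can be sketched as a filter over detected sources. The 700 Hz and 25 dB figures come from the example in the text; the `SoundSource` record, the function name `select_source_to_lock` and the tie-break by loudest level are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class SoundSource:
    name: str
    frequency_hz: float
    level_db: float
    position: Tuple[float, float]  # (x, y) in metres, assumed coordinate frame


def select_source_to_lock(
    sources: Iterable[SoundSource],
    min_frequency_hz: float = 700.0,
    min_level_db: float = 25.0,
) -> Optional[SoundSource]:
    """Return the source to lock onto, or None if no source meets the thresholds."""
    candidates = [
        s
        for s in sources
        if s.frequency_hz > min_frequency_hz and s.level_db >= min_level_db
    ]
    if not candidates:
        return None
    # Assumed tie-break: prefer the loudest qualifying source.
    return max(candidates, key=lambda s: s.level_db)
```

With a source A at 550 Hz and a source B at 750 Hz and 28 dB, the function returns B, and the device would then move toward `B.position`.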
If the audio and video collection device determines that its position does not need to be adjusted, it may treat the current position as the preset optimal position for sound collection.
S300: Move, according to the audio and video source position and the sound collection position, the audio and video collection device to an image collection position that satisfies a preset image collection condition.
Once the audio and video collection device has moved to the preset position for sound collection, the sound collection requirement is met. The device may at the same time acquire video image data of the venue and process it, and the processed video image data is shown on the display device.
From the current preset sound collection position, the audio and video collection device may acquire video image data of the venue and determine whether the video image captured there satisfies the preset video image condition, that is, whether the captured image is sufficiently clear. If it does not, the device may determine, from the speaking position and the sound collection position, the image collection position that satisfies the preset image collection condition, and move there, so that the captured video image satisfies the preset condition. If the captured image already satisfies the preset video condition, the device may treat its current position as the preset position for video image collection.
Optionally, when the audio and video collection device determines that the video image collection position needs to be adjusted, for example because analysis shows that the clarity or resolution of the captured video image of the speaker is below a preset video image threshold, the preset video image collection position may be determined by the video self-tuning system. The device then moves, within the preset range for sound collection, to the preset video collection point relative to the speaking position, closer to the speaker, so that the captured video satisfies the preset condition for video image collection. That condition may mean that the video clarity or the video resolution is above a preset threshold, or, for example, that the captured face lies in the very middle of the video frame, so that the captured conference video is sufficiently clear.
Here, the video self-tuning system may mean that the audio and video collection device determines the current arc-shaped position range from the loudness of the speaker's voice (for example above 700 Hz and at 25 decibels or more), determines the best point on the current arc according to an optimal face image principle evaluated by the video capture sensor (the face lies in the middle of the image, between 2/5 and 3/5 of the frame, with the nose on the central axis), and obtains the device's movement direction and distance from that best point and the currently received video image of the speaker.
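The face framing rule can be illustrated with a small check on the horizontal position of the detected face. This sketch assumes that the 2/5 to 3/5 band refers to the horizontal axis of the frame and that a face detector (not shown, and not specified in the text) supplies the face center in pixels; the function name `framing_offset` is hypothetical.

```python
def framing_offset(
    face_center_x: float, image_width: float, low: float = 0.4, high: float = 0.6
) -> float:
    """Return 0.0 if the face centre lies in the middle band of the frame.

    Otherwise return the signed offset from the central axis as a fraction of
    the image width: negative means the face is left of centre, positive right.
    """
    relative = face_center_x / image_width
    if low <= relative <= high:
        return 0.0
    return relative - 0.5
```

A controller could use the sign of the result to decide which way to move along the arc and its magnitude to scale the step, repeating until the offset is zero.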
If the audio and video collection device determines that its image collection position does not need to be adjusted, it may treat the current position as the preset optimal position for capturing video images of the speaking position.
Optionally, after the step of moving the audio and video collection device to the preset initial position in the venue when the conference starts, the method further includes:
when the conference collects audio and video according to a speaking order, receiving the audio and video source position of the speech, and moving the audio and video collection device, according to that position, to a sound collection position that satisfies the preset sound collection condition and an image collection position that satisfies the preset image collection condition.
Administrative meetings generally follow a fixed speaking order and fixed speaking times; that is, the order and the time allotted to each speaking position are set in advance. When the video conference is an administrative meeting, audio and video may therefore also be collected according to the speaking order. Based on the preset order of speaking positions and the preset speaking time for each position, the video conference system may be configured to identify the position for sound and image capture automatically in speaker order. The video terminal device may send the position of the current speaker in advance, and the audio and video collection device may move according to the coordinates sent by the video terminal device. In other words, the video terminal device may directly send an instruction to move the collection device to the current speaking position, so that the device moves to the corresponding speaking position at the preset time.
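For a meeting with a fixed speaking order, the terminal's lookup of the current speaking position can be sketched as a search over a schedule sorted by start time. The schedule format (start time in seconds since the meeting began, paired with a position) and the function name `scheduled_position` are assumptions for this example.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def scheduled_position(
    schedule: List[Tuple[float, Point]], elapsed_s: float
) -> Optional[Point]:
    """Return the speaking position active at `elapsed_s`, or None before the first slot.

    `schedule` must be sorted by start time; each slot lasts until the next one starts.
    """
    starts = [start for start, _ in schedule]
    index = bisect_right(starts, elapsed_s) - 1
    if index < 0:
        return None
    return schedule[index][1]
```

The terminal could call this periodically and send the returned coordinates to the collection device whenever the result changes.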
可选地,所述根据所述音视频源位置和所述声音采集位置,移动所述音视频采集设备的自身位置到满足图像采集预设条件的图像采集位置的步骤之后,还包括:Optionally, after the step of moving the position of the audio and video collection device to the image collection location that meets the image acquisition preset condition, according to the audio and video source location and the sound collection location, the method further includes:
When the audio data is not received within a preset time threshold, moving the audio and video collection device back to the preset initial position.
Optionally, when no audio data is received from the audio and video collection device within a preset time threshold, for example 30 seconds, the current speaker may be judged to have finished speaking, and the audio and video collection device may move back to the preset initial position. For example, in a small conference site with only one audio and video collection device, when the current speaker finishes, the device may automatically move to the middle of the site, that is, above the middle of the conference table, so that when someone else speaks it can reach the vicinity of the new speaking position over a short distance.
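The timeout rule above can be sketched as a small watchdog that remembers when audio was last received. The class name `SilenceWatchdog`, its methods, and the use of plain seconds as timestamps are assumptions made for illustration; only the 30-second example value comes from the text.

```python
from typing import Tuple

Position = Tuple[float, float]


class SilenceWatchdog:
    """Decides when the device should return to its preset initial position."""

    def __init__(self, initial_position: Position, timeout_s: float = 30.0) -> None:
        self.initial_position = initial_position
        self.timeout_s = timeout_s  # preset time threshold (30 s is the example in the text)
        self.last_audio_s = 0.0     # timestamp of the most recent audio data

    def on_audio(self, now_s: float) -> None:
        """Record that audio data was received at time now_s."""
        self.last_audio_s = now_s

    def target(self, now_s: float, current_position: Position) -> Position:
        """Return the initial position once the silence has lasted timeout_s, else stay put."""
        if now_s - self.last_audio_s >= self.timeout_s:
            return self.initial_position
        return current_position
```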
Because the embodiments of the present disclosure use a movable audio and video collection device that can hover in mid-air, the device can move to a position above the middle of the conference site.
Optionally, when the video conference ends, the audio and video collection device may receive a shutdown instruction from the video conference system and automatically move to a pre-designated position in the conference site, where it is parked. In particular, when a large conference site has multiple audio and video collection devices, the devices may be parked together in an unused area of the site while no video conference is in progress.
Optionally, the step of moving, according to the audio-video source position, to a sound collection position that satisfies the preset sound collection condition further includes:
When at least two people are detected speaking at the same time, forming a geometric figure whose vertices are the audio-video source positions of the simultaneous speakers, taking a position within a preset threshold range of the center of the geometric figure as the target position, and moving the audio and video collection device to the target position.
Optionally, when at least two people are detected speaking at the same time, a target position may be obtained from the speaking positions of the simultaneous speakers. The target position takes every one of those speaking positions into account, and the device may move to it.
Specifically, when at least two people are detected speaking, the speakers may be located by sound and a target position determined from their positions. The target position takes every speaking position into account, and the device may move to it so that every speaker is covered when the audio and video collection device collects audio and video.
For example, if a speaker is interrupted by someone else, or at least two people need to speak interactively in turn, the audio and video collection device may receive audio data from at least two people. In that case the device may locate the speakers by sound, determine an intermediate position from their positions, and automatically move to a preset collection position between the speakers. Optionally, this position lies within a preset threshold range of the center of the polygon whose vertices are the speakers and whose edges are the lines connecting them. That is, a geometric figure is formed with the simultaneously speaking audio-video source positions as vertices, a position within the preset threshold of the center of the figure is taken as the target position, and the device is moved to that target position. For example, when two people speak at the same time, the target lies within the preset threshold range of the midpoint of the line connecting them; when three people speak at the same time, it lies within the preset threshold range of the center of the triangle. Because the center is a single point while the audio and video collection position is a region, any position within the preset threshold range of the center of the figure can serve as the collection position.
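The geometric rule above can be sketched as follows. The disclosure does not say which "center" is meant; this sketch assumes the mean of the vertices, which is the midpoint for two speakers and the centroid for three. The function names, the two-dimensional coordinates, and the Euclidean distance test are likewise assumptions.

```python
from math import hypot
from typing import Sequence, Tuple

Position = Tuple[float, float]


def center_of_speakers(speakers: Sequence[Position]) -> Position:
    """Mean of the speaker positions (assumed definition of the figure's center)."""
    count = len(speakers)
    return (
        sum(x for x, _ in speakers) / count,
        sum(y for _, y in speakers) / count,
    )


def is_valid_target(candidate: Position, speakers: Sequence[Position], threshold: float) -> bool:
    """True if candidate lies within the preset threshold distance of the center."""
    cx, cy = center_of_speakers(speakers)
    return hypot(candidate[0] - cx, candidate[1] - cy) <= threshold
```

With speakers at (0, 0) and (2, 0), the center is (1, 0), and any candidate within the threshold radius of that point is an acceptable target position.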
Optionally, after the step of moving the audio and video collection device, according to the audio-video source position and the sound collection position, to an image collection position that satisfies the preset image collection condition, the method further includes:
Receiving an instruction to adjust the audio and video collection position, and moving the audio and video collection position of the audio and video collection device according to the instruction.
When the audio and video collection device receives an instruction to adjust the audio and video collection position, it may move according to the instruction so that the collected audio and video satisfy the preset audio condition and the preset video image condition.
For example, when the video conference is displayed at the local end and the remote end, and either end finds the video or audio insufficiently clear, the local end and the remote end may control the audio and video collection device by infrared remote control. Either end may send an adjustment instruction to the device. On receiving an instruction to adjust its position, the device may move according to the instruction so that the collected audio satisfies the preset audio condition or the video image satisfies the preset video image condition.
Optionally, the method further includes: when at least two audio and video collection devices are included, selecting one of them as a master audio and video collection device, and controlling the other audio and video collection devices in the conference site through the master audio and video collection device.
When at least two audio and video collection devices are included, one of them may be selected as the master audio and video collection device and the others serve as slave audio and video collection devices, which the master device coordinates. The master device collects audio and video of the conference site and coordinates the slave devices; the remaining devices, which are responsible for improving or optimizing the video and audio collection positions in the site, are the slave devices.
Generally, the device near the rostrum may serve as the master audio and video collection device, and the other devices as slave audio and video collection devices. The master device may direct the slave devices into collection range and switch which device's output is transmitted to the conference television terminal. A slave device near the speaker, after adjusting the quality of the audio and video it collects, may transmit the collected audio and video to the conference television terminal through the master device.
Optionally, in a large conference site with multiple audio and video collection devices, one of the devices may be set as the master audio and video collection device and the others as slave audio and video collection devices. Data exchange between the slave devices and the video terminal device is carried out through the master device. Optionally, the device near the rostrum is set as the master device and the others as slave devices.
Optionally, after the conference television terminal of a large conference site is switched on, the master audio and video collection device may deliver to the slave audio and video collection devices their preset initial positions (that is, preset positions) in the site, and each slave device may move to its own preset initial position according to the position delivered by the master device.
Optionally, in a large conference site one master audio and video collection device may coordinate multiple slave audio and video collection devices. The slave devices may be responsible for improving or optimizing the video and audio collection positions, and comply with the position finally delivered by the master device. The master device may select the audio and video of one or more slave devices and combine them into a single audio-video source as input; for sound, only the current speaker is selected as the sound source.
For example, the video terminal device processes the collected audio and video image data after receiving it. If the local end or the remote end finds the video or audio insufficiently clear, feedback may be sent so that the audio and video collection device adjusts. When the video terminal device receives an instruction to adjust the position of an audio and video collection device in a large conference site, it may first send the instruction to the master audio and video collection device, which may then adjust the position of the slave audio and video collection device.
Optionally, when the audio data is not received within the preset time threshold, the audio and video collection device may be moved back to the preset initial position. In a large conference site, the master audio and video collection device may schedule the movement of the slave audio and video collection devices centrally.
Optionally, when the local end or the remote end sends an instruction to control a slave audio and video collection device, the video terminal device likewise first sends the instruction to the master audio and video collection device, which may then control the slave audio and video collection device.
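The routing rule above, in which every adjustment passes through the master device, can be sketched as follows. The classes `Device` and `Master`, the method `handle_adjust`, and the integer device identifiers are hypothetical; the disclosure does not define a message format.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Position = Tuple[float, float]


@dataclass
class Device:
    device_id: int
    position: Position


@dataclass
class Master(Device):
    # Slave devices keyed by their (hypothetical) identifiers.
    slaves: Dict[int, Device] = field(default_factory=dict)

    def handle_adjust(self, target_id: int, position: Position) -> None:
        """Apply an adjustment received from the video terminal device.

        The terminal never addresses a slave directly: the master either moves
        itself or forwards the new position to the addressed slave.
        Raises KeyError if target_id is unknown.
        """
        if target_id == self.device_id:
            self.position = position
            return
        self.slaves[target_id].position = position
```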
In a large conference site, when the current speaker finishes, the slave audio and video collection devices may move to their corresponding preset initial positions in the site, again scheduled centrally by the master audio and video collection device.
In a large conference site, if the current speech is interrupted by someone else, the slave audio and video collection devices may send the position coordinates and sound information they have collected to the master audio and video collection device. The master device processes the information collected by the slave devices and then schedules each slave device according to the result.
In a large conference site, when the conference is configured to identify sound and image positions in speaker order, the video terminal device may deliver the position of the current speaker to the master audio and video collection device in advance. According to the coordinates delivered by the video terminal device, the master device may move itself or direct one or more slave audio and video collection devices to move, and may select the audio and video collected by one or more devices and combine them into a single audio-video source as input.
The above process is described in detail below by way of optional application embodiments.
Before the conference starts, one or more automatic audio and video collection devices may be configured in advance according to the size of the conference site. After the conference television system terminal is powered on, a collection position may be preset according to the configured number of devices, as shown in FIG. 2(a) and FIG. 3(a). The two cases are described below.
(I) Small conference site scenario
Refer to FIG. 2(a) to FIG. 2(d), which are schematic diagrams of collection in a small conference with a single audio and video collection device.
1) Refer to FIG. 2(a). When the conference starts, the audio and video collection device may be preset in the middle of the conference site. When local speaker 1 speaks, the position of speaker 1 may be located by sound. The device moves close to speaker 1, may tune the image until it is improved or optimal, and then sends the image to the video terminal device.
If the video terminal device requests tuning of the audio and video collection device, the device may act on the coordinates sent by the video terminal device, which are treated as optimal. If the video terminal device does not request tuning, the device may treat its current position as optimal by default, as shown in FIG. 2(b).
2) If two people (speaker 1 and speaker 2) speak at the same time during the conference, the audio and video collection device may move automatically, according to the coordinates obtained from sound collection, to a preset position for the two people. Generally this position faces the two people and lies in the arc-shaped region containing the midpoint of the line connecting them.
If the video terminal device requests tuning of the audio and video collection device, the device may act on the coordinates sent by the video terminal device, which are treated as optimal. If the video terminal device does not request tuning, the device may treat its current position as optimal by default, as shown in FIG. 2(d).
3) If the conference is configured to collect according to the speaker priority principle, the video terminal device may deliver the position coordinates of the current speaker. The audio and video collection device may move according to the coordinates, automatically adjust its own position based on the audio and video it collects until the position is improved or optimal, and then send the collected data to the video terminal device. In FIG. 2(b), speaker 1 is currently speaking: according to the received coordinates, the device may move to the collection position for speaker 1 and adjust its position until it is improved or optimal. When the next speaker, speaker 2, speaks, the video terminal device may deliver the position coordinates of speaker 2, and the device may move to the collection position for speaker 2 and adjust its position in the same way, as shown in FIG. 2(c).
4) During a discussion session, the position may be fine-tuned according to sound strength. As shown in FIG. 2(b), when the voice of current speaker 1 is detected to be the strongest, the audio and video collection device may move to the improved or optimal collection position for speaker 1. If the voices of speaker 1 and speaker 2 are detected to be equally strong, the device may return to the collection position shown in FIG. 2(d). Alternatively, the position delivered by the video terminal device may be taken as the optimal collection position by default.
When the discussion ends, the device returns by default to the position shown in FIG. 2(a).
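The loudness-based fine-tuning in item 4) can be sketched as choosing which speakers to favor. The tolerance used to decide that two voices are "equally strong", the level units, and the function name `discussion_focus` are assumptions; the function returns the point to favor, and the offset from that point to the actual collection position is left out.

```python
from typing import Dict, Tuple

Position = Tuple[float, float]


def discussion_focus(levels: Dict[Position, float], tolerance: float = 1.0) -> Position:
    """Return the point the device should favor during a discussion.

    levels maps each speaker position to a measured sound level (units assumed).
    Speakers within `tolerance` of the loudest level count as equally strong;
    the result is the mean of their positions, so a single dominant speaker
    yields that speaker's own position.
    """
    loudest = max(levels.values())
    active = [pos for pos, level in levels.items() if loudest - level <= tolerance]
    return (
        sum(x for x, _ in active) / len(active),
        sum(y for _, y in active) / len(active),
    )
```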
(II) Large conference site scenario
Refer to FIG. 3(a) to FIG. 3(f), which are schematic diagrams of collection in a large conference with multiple audio and video collection devices. Note that the label "collection" in the drawings is shorthand for "audio and video collection device".
1) The terminal device is switched on, and the audio and video collection devices may be preset at specific positions in the conference site, as shown in FIG. 3(a).
2) When the conference starts, the audio and video collection devices may be preset at the default initial positions of the conference site, as shown in FIG. 3(b).
When someone at the local site speaks, master audio and video collection device 1 may locate the speaker by analyzing the sound reported by each of slave audio and video collection devices 2, 3 and 4, and may send an instruction to approach the speaker to the nearest one or more slave devices. At the same time, master device 1 may tune the one or more slave devices so that the image and sound (combined or not) are improved or optimal. The process may include:
(1) As shown in FIG. 3(c), speaker 2 starts to speak. If slave audio and video collection device 3 collects alone, it may send the data directly to master audio and video collection device 1, which forwards it to the video terminal device.
If the video terminal device of the conference television requests tuning of slave audio and video collection device 3, the video terminal device may send the coordinates to master audio and video collection device 1, which assigns them to slave device 3, and slave device 3 may act on the coordinates sent by the video terminal device. If the video terminal device does not request tuning of slave device 3, slave device 3 may treat its current position as optimal by default.
(2) As shown in FIG. 3(d), speaker 2 starts to speak. If several devices collect cooperatively (slave audio and video collection devices 2 and 3), the data collected by slave devices 2 and 3 may be sent to master audio and video collection device 1 for image and sound synthesis, after which master device 1 may send the combined data to the video terminal device of the conference television.
Tuning is handled separately for each slave device taking part in the collection, here slave audio and video collection devices 2 and 3. If the video terminal device of the conference television requests tuning of slave device 2, it may send the coordinates to master audio and video collection device 1, which assigns them to slave device 2, and slave device 2 may act on the coordinates sent by the video terminal device. If it requests tuning of slave device 3, the coordinates are likewise sent to master device 1 and assigned to slave device 3, which may act on them. If the video terminal device does not request tuning of slave device 2, slave device 2 may treat its current position as optimal by default; if it does not request tuning of slave device 3, slave device 3 may treat its current position as optimal by default.
(3) If several people speak at the same time during the conference (speaker 2 and speaker 3), as shown in FIG. 3(c), master audio and video collection device 1 may send an instruction to approach the speakers to the nearest one or more slave audio and video collection devices (for example slave device 2, 3 or 4). At the same time, master device 1 may tune those slave devices so that the combined image and sound are improved or optimal. The slave devices then send their images to master device 1, which combines them and sends the combined image data to the video terminal device.
As shown in FIG. 3(d), if the video terminal device requests tuning of the audio and video collection devices, it may send the coordinates to master audio and video collection device 1, which assigns position coordinates to slave audio and video collection device 2, 3 or 4. If the video terminal device does not request tuning, the master and slave devices may treat their current positions as optimal by default.
(4) If the conference is configured to collect according to the speaker priority principle, the video terminal device may deliver the position coordinates of current speaker 1, as shown in FIG. 3(e). Master audio and video collection device 1 may move to the position of current speaker 1 according to the coordinates. If master device 1 needs slave audio and video collection devices 2 and 3 to assist with collection, it may send them a command. Slave devices 2 and 3 may then automatically adjust their own positions, based on the audio and video they collect, until the positions are improved or optimal, and send the collected image and audio data to master device 1. After combining the images and processing the audio, master device 1 may send the combined image data and the processed audio data to the video terminal device.
If speaker 2 is the current speaker, as shown in FIG. 3(f), master audio and video collection device 1 may move to the position of current speaker 2 according to the coordinates. If master device 1 needs slave audio and video collection devices 2 and 3 to assist with collection, it may send them a command. Slave devices 2 and 3 may then automatically adjust their own positions, based on the audio and video they collect, until the positions are improved or optimal, and send the collected image and audio data to master device 1. After combining the images and processing the audio, master device 1 may send the combined image data and the processed audio data to the video terminal device.
(5) When the video conference ends, the audio and video collection devices may move to the predetermined positions by default, as shown in FIG. 3(a).
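The dispatch step in the large-site scenario, where master device 1 sends the approach instruction to the nearest one or more slave devices, can be sketched as a distance ranking. The function name `nearest_slaves`, the integer device identifiers, and the use of straight-line distance are assumptions; the disclosure does not specify how "nearest" is measured.

```python
from math import dist
from typing import Dict, List, Tuple

Position = Tuple[float, float]


def nearest_slaves(speaker: Position, slaves: Dict[int, Position], count: int = 1) -> List[int]:
    """Return the identifiers of the `count` slave devices closest to the speaker.

    slaves maps a (hypothetical) device identifier to that device's current position.
    """
    ranked = sorted(slaves, key=lambda device_id: dist(slaves[device_id], speaker))
    return ranked[:count]
```

For slave devices 2, 3 and 4 at (0, 0), (5, 0) and (10, 0) and a speaker at (4, 0), a single-device dispatch selects device 3, and a cooperative dispatch with `count=2` selects devices 3 and 2.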
The method for locating the audio and video collection position of a conference television provided by the embodiments of the present disclosure uses a movable audio and video collection device implemented with unmanned mobile technology. The position of a conference speaker is located from the collected sound of the speaker; the device automatically approaches the speaker and automatically moves to a sound collection position that satisfies the preset sound collection condition and an image collection position that satisfies the preset image collection condition, so that the audio and video collection position of the conference television is adjusted automatically. Compared with audio and video collection methods for conference television known in the art, the present disclosure is based on sound recognition and relies on sound collection, image collection and movement of the movable device. The system needs to be configured only once, when the conference television system is set up; thereafter the automatic collection system adjusts the collection result by itself, and the audio and video collection reaches the preset quality without excessive manual intervention by the participants. This reduces the labor cost of conference television and improves the efficiency of video conferences.
Referring to FIG. 4, an embodiment of the present disclosure further provides an audio and video acquisition apparatus for conference television. The apparatus may include:
A positioning module 10, configured to: acquire audio data obtained by the audio and video acquisition device through sound acquisition, and locate, according to the audio data, the audio and video source position of speech in the conference site.
Optionally, when the video conference begins, the conference television system starts and the terminal device is turned on, and the audio and video acquisition device may automatically move to a preset initial position in the conference site. Because the audio and video acquisition device carries video and audio capture equipment, it may begin acquiring audio data and video image data when someone in the conference site speaks. After acquiring the audio data, the audio and video acquisition device may locate the speaking position according to the audio data, that is, perform sound localization.
A sound acquisition position moving module 20, configured to: move, according to the audio and video source position, the audio and video acquisition device to a sound acquisition position that satisfies the preset sound acquisition condition.
Optionally, after the audio and video acquisition device detects the speaker's position in the conference site through sound localization based on the acquired audio data, it may automatically move to a preset sound acquisition position, relative to the speaking position, that satisfies the preset sound acquisition condition. This brings the device close to the speaking position and places it in a favorable position for sound acquisition.
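As an illustration of the move performed by module 20, the following sketch assumes that the preset sound acquisition condition is a maximum distance from the speaker, which the disclosure does not specify. The function name `sound_acquisition_position`, the 2D coordinates, and the `preset_distance` parameter are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def sound_acquisition_position(device: Point, source: Point, preset_distance: float) -> Point:
    """Return the point on the device-to-source line that lies preset_distance from the source.

    Assumption: the preset sound acquisition condition is "no farther than
    preset_distance from the speaker"; the disclosure leaves the condition open.
    """
    dx, dy = device[0] - source[0], device[1] - source[1]
    gap = math.hypot(dx, dy)
    if gap <= preset_distance:
        return device  # already satisfies the assumed condition, no move needed
    scale = preset_distance / gap
    return (source[0] + dx * scale, source[1] + dy * scale)
```

Moving along the straight line to the speaker is only one possible policy; a real device would also have to account for obstacles in the conference site.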
An image acquisition position moving module 30, configured to: move, according to the audio and video source position and the sound acquisition position, the audio and video acquisition device to an image acquisition position that satisfies the preset image acquisition condition.
Optionally, the audio and video acquisition device may acquire video image data of the conference site from the current preset sound acquisition position and determine whether the video image captured there satisfies the preset video image condition, that is, whether the captured image is sufficiently clear. When the captured image does not satisfy the preset video image condition, the audio and video acquisition device may determine, according to the speaking position and the sound acquisition position, the image acquisition position that satisfies the preset image acquisition condition, and move to it, so that the video image captured by the device meets the preset condition for video image acquisition.
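The check-then-move behavior of module 30 might be sketched as follows. The disclosure does not say how the image acquisition position is derived, so this example assumes a simple search that backs away from the speaker along the line through the sound acquisition position until a caller-supplied clarity test passes. The names `image_acquisition_position`, `is_clear`, `step`, and `max_steps` are all hypothetical.

```python
import math
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]


def image_acquisition_position(
    source: Point,
    sound_pos: Point,
    is_clear: Callable[[Point], bool],
    step: float = 0.25,
    max_steps: int = 8,
) -> Optional[Point]:
    """Search outward from sound_pos, away from source, for a position where is_clear holds.

    is_clear stands in for the preset image acquisition condition (assumption).
    Returns None if no tested position satisfies it.
    """
    dx, dy = sound_pos[0] - source[0], sound_pos[1] - source[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("sound_pos must differ from source")
    ux, uy = dx / norm, dy / norm
    pos = sound_pos
    for _ in range(max_steps):
        if is_clear(pos):
            return pos  # the current position already gives a clear image
        pos = (pos[0] + ux * step, pos[1] + uy * step)
    return pos if is_clear(pos) else None
```

If the image is already clear at the sound acquisition position, the function returns that position unchanged, matching the case in which no further move is required.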
Optionally, the apparatus further includes:
An initial position module, configured to: when the conference begins, move the audio and video acquisition device to a preset initial position in the conference site.
Optionally, when the video conference begins, the conference television system starts and the terminal device is turned on, and the audio and video acquisition device may automatically move to the preset initial position in the conference site.
A position return module, configured to: when the audio data is not received within a preset time threshold, move the audio and video acquisition device back to the preset initial position.
Optionally, when the audio and video acquisition device does not receive the audio data within the preset time threshold, for example 30 seconds, it may determine that the current speaker has finished speaking, and the audio and video acquisition device may then move back to the preset initial position.
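The timeout rule of the position return module can be sketched as a small timer. The 30-second default comes from the example above; the class name `IdleReturnTimer`, its methods, and the use of caller-supplied timestamps in seconds are assumptions made for illustration.

```python
from typing import Optional


class IdleReturnTimer:
    """Tracks the time since audio data was last received (hypothetical helper)."""

    def __init__(self, threshold_s: float = 30.0) -> None:
        self.threshold_s = threshold_s
        self.last_audio_at: Optional[float] = None

    def on_audio(self, now: float) -> None:
        # Call whenever audio data is received from the current speaker.
        self.last_audio_at = now

    def should_return(self, now: float) -> bool:
        # True once no audio has arrived for at least threshold_s seconds.
        if self.last_audio_at is None:
            return False  # nothing heard yet, device is assumed to be at its initial position
        return now - self.last_audio_at >= self.threshold_s
```

Passing the timestamp in explicitly keeps the sketch deterministic; a real device would read a monotonic clock instead.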
Embodiments of the present disclosure further provide one or more non-volatile computer-readable storage media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the method.
Embodiments of the present disclosure further provide a terminal device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method when executing the computer program.
Embodiments of the present disclosure further provide a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method when executing the computer program.
Embodiments of the present disclosure further provide a computer-readable storage medium storing computer-executable instructions that, when executed, implement the above audio and video acquisition method for conference television.
Those of ordinary skill in the art will understand that all or part of the processes of the above method embodiments may be carried out by a computer program instructing the relevant hardware, and that the program may be stored in a computer-readable storage medium. In the embodiments of the present disclosure, for example, the program may be stored in a storage medium of a computer system and executed by at least one processor of that computer system to implement the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and apparatuses, may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned above does not necessarily correspond to a division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Those of ordinary skill in the art will understand that modifications or equivalent substitutions may be made to the technical solutions of the present disclosure without departing from their spirit and scope, and that all such modifications and substitutions shall fall within the scope of the claims of the present disclosure.
Industrial Applicability
The audio and video acquisition method, apparatus, and terminal device for conference television provided by the embodiments of the present disclosure acquire audio data obtained through sound acquisition and locate, according to the audio data, the audio and video source position of speech in the conference site; move, according to the audio and video source position, the audio and video acquisition device to a sound acquisition position that satisfies the preset sound acquisition condition; and move, according to the audio and video source position and the sound acquisition position, the audio and video acquisition device to an image acquisition position that satisfies the preset image acquisition condition. Compared with audio and video acquisition methods for conference television known in the art, the method provided by the embodiments of the present disclosure is based on sound recognition and relies on sound acquisition, image acquisition, and movement of an audio and video acquisition device implemented with unmanned technology. The system needs to be configured only once, when the conference television system is set up. Thereafter the speaker's position is located from the sound captured from the conference speaker, and the audio and video acquisition device automatically approaches the speaker without excessive manual intervention by participants. The device automatically moves to a sound acquisition position that satisfies the preset sound acquisition condition and to an image acquisition position that satisfies the preset image acquisition condition, so that the audio and video acquisition position of the conference television is adjusted automatically and the preset acquisition effect is achieved. This reduces the labor cost of conference television and improves the efficiency of video conferences.

Claims (10)

  1. An audio and video acquisition method for conference television, the method comprising:
    acquiring audio data obtained by an audio and video acquisition device through sound acquisition, and locating, according to the audio data, an audio and video source position of speech in a conference site;
    moving, according to the audio and video source position, the audio and video acquisition device to a sound acquisition position that satisfies a preset sound acquisition condition; and
    moving, according to the audio and video source position and the sound acquisition position, the audio and video acquisition device to an image acquisition position that satisfies a preset image acquisition condition.
  2. The method according to claim 1, further comprising, before the step of acquiring the audio data obtained through sound acquisition and locating, according to the audio data, the audio and video source position of speech in the conference site:
    when the conference begins, moving the audio and video acquisition device to a preset initial position in the conference site.
  3. The method according to claim 2, further comprising, after the step of moving the audio and video acquisition device to the preset initial position in the conference site when the conference begins:
    when audio and video are acquired according to a speaking order in the conference, receiving the audio and video source position of the speech, and moving, according to the audio and video source position, the audio and video acquisition device to a sound acquisition position that satisfies the preset sound acquisition condition and an image acquisition position that satisfies the preset image acquisition condition.
  4. The method according to claim 2, further comprising, after the step of moving, according to the audio and video source position and the sound acquisition position, the audio and video acquisition device to the image acquisition position that satisfies the preset image acquisition condition:
    when the audio data is not received within a preset time threshold, moving the audio and video acquisition device back to the preset initial position.
  5. The method according to claim 1, wherein the step of moving, according to the audio and video source position, the audio and video acquisition device to the sound acquisition position that satisfies the preset sound acquisition condition comprises:
    when it is detected that at least two people are speaking simultaneously, forming a geometric figure whose vertices are the audio and video source positions of the simultaneous speeches, taking a position within a preset threshold range of the center of the geometric figure as a target position, and moving the audio and video acquisition device to the target position.
  6. The method according to claim 1, further comprising, after the step of moving, according to the audio and video source position and the sound acquisition position, the audio and video acquisition device to the image acquisition position that satisfies the preset image acquisition condition:
    receiving an instruction to adjust the audio and video acquisition position, and moving the audio and video acquisition position of the audio and video acquisition device according to the instruction.
  7. The method according to any one of claims 1 to 6, further comprising:
    when at least two audio and video acquisition devices are included, selecting one of them as a master audio and video acquisition device, and controlling the other audio and video acquisition devices in the conference site through the master audio and video acquisition device.
  8. An audio and video acquisition apparatus for conference television, the apparatus comprising:
    a positioning module, configured to: acquire audio data obtained by an audio and video acquisition device through sound acquisition, and locate, according to the audio data, an audio and video source position of speech in a conference site;
    a sound acquisition position moving module, configured to: move, according to the audio and video source position, the audio and video acquisition device to a sound acquisition position that satisfies a preset sound acquisition condition; and
    an image acquisition position moving module, configured to: move, according to the audio and video source position and the sound acquisition position, the audio and video acquisition device to an image acquisition position that satisfies a preset image acquisition condition.
  9. The apparatus according to claim 8, further comprising:
    an initial position module, configured to: when the conference begins, move the audio and video acquisition device to a preset initial position in the conference site; and
    a position return module, configured to: when the audio data is not received within a preset time threshold, move the audio and video acquisition device back to the preset initial position.
  10. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
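For illustration only, the target position recited in claim 5 can be sketched as follows. The claim does not define the "center" of the geometric figure; this example assumes the mean of the vertices (the vertex centroid), and the function names `multi_speaker_center` and `within_center_range` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def multi_speaker_center(sources: List[Point]) -> Point:
    """Return the vertex centroid of the simultaneous speakers' positions (assumed center)."""
    if len(sources) < 2:
        raise ValueError("at least two simultaneous speakers are required")
    n = len(sources)
    return (sum(x for x, _ in sources) / n, sum(y for _, y in sources) / n)


def within_center_range(pos: Point, center: Point, threshold: float) -> bool:
    """Check whether pos lies within the preset threshold range of the center."""
    return math.hypot(pos[0] - center[0], pos[1] - center[1]) <= threshold
```

With two speakers the figure degenerates to a line segment and the assumed center is its midpoint.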
PCT/CN2018/094807 2017-07-12 2018-07-06 Audio and video acquisition method and apparatus for conference television, and terminal device WO2019011189A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710566930.7 2017-07-12
CN201710566930.7A CN109257558A (en) 2017-07-12 2017-07-12 Audio/video acquisition method, device and the terminal device of video conferencing

Publications (1)

Publication Number Publication Date
WO2019011189A1 true WO2019011189A1 (en) 2019-01-17

Family

ID=65001038

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094807 WO2019011189A1 (en) 2017-07-12 2018-07-06 Audio and video acquisition method and apparatus for conference television, and terminal device

Country Status (2)

Country Link
CN (1) CN109257558A (en)
WO (1) WO2019011189A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110351629B (en) * 2019-07-16 2021-01-19 广州国音智能科技有限公司 Radio reception method, radio reception device and terminal
CN110536101A (en) * 2019-09-29 2019-12-03 广州视源电子科技股份有限公司 Electronic platform, video conferencing system and method
CN111107296B (en) * 2019-11-26 2022-12-23 视联动力信息技术股份有限公司 Audio data acquisition method and device, electronic equipment and readable storage medium
CN111294681B (en) * 2020-02-28 2021-10-22 联想(北京)有限公司 Classroom terminal system and control method, controller and master control equipment thereof
CN111883186B (en) * 2020-07-10 2022-12-23 上海明略人工智能(集团)有限公司 Recording device, voice acquisition method and device, storage medium and electronic device
CN112689116A (en) * 2020-12-04 2021-04-20 北京芯翌智能信息技术有限公司 Video conference system, control method thereof, storage medium and terminal
CN116684735B (en) * 2023-06-14 2024-04-09 广州市远知初电子科技有限公司 Audio and video acquisition system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070233321A1 (en) * 2006-03-29 2007-10-04 Kabushiki Kaisha Toshiba Position detecting device, autonomous mobile device, method, and computer program product
CN201294577Y (en) * 2008-10-31 2009-08-19 比亚迪股份有限公司 Signal acquisition device for conference system
CN102137318A (en) * 2010-01-22 2011-07-27 华为终端有限公司 Method and device for controlling adapterization
CN104010251A (en) * 2013-02-27 2014-08-27 晨星半导体股份有限公司 Radio system and related method
CN105283775A (en) * 2013-04-12 2016-01-27 株式会社日立制作所 Mobile robot and sound source position estimation system
CN105765964A (en) * 2013-11-27 2016-07-13 思科技术公司 Shift camera focus based on speaker position

Also Published As

Publication number Publication date
CN109257558A (en) 2019-01-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18831798

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29.05.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18831798

Country of ref document: EP

Kind code of ref document: A1